CockroachDb single HOT node

Hi, I tried to deploy cockroachdb (v2.0.0) on kubernetes using https://github.com/kubernetes/charts/tree/master/stable/cockroachdb

The deployment worked fine, however I found the following interesting and would like to know why

  • A single node would have much higher usage in cpu and memory than others, this happened in 3-node, 5-node and 10-node settings. The high usage node also showed
    higher lease holders per store
    higher Raft Log Queue
    higher Queue processing times

on the dashboard

  • If setting resource limit on cpu and memory, insert, select performance would drop, even the limit was never reached. I think the data is evenly distributed.

Hey Jeorge,

Sorry for the delay getting back to you on this. Could you tell us a bit more about where you deployed and whether this is in production or testing? If testing, how are you testing? The symptoms you describe make it sound like there’s an issue balancing load across nodes. How are you or your test application connecting to the cluster?

Hi Tim,

Thanks for the reply! We deployed on in-house Kubernetes cluster based on AWS. This is for performance testing to see if cockroachdb can be of our use. I used Jmeter on a Kubernetes pod with connection pool to hit the public service endpoint (my-release-cockroachdb-public:26257)
The load is 800 inserts per second while each insert has 10 records in it.

I’ve tried to adjust cluster setting “kv.allocator.lease_rebalancing_aggressiveness” but it didn’t seem to help.

Thanks!

Hey Jeorge,

Got it - by any chance, are you using sequential IDs on the inserts? If so, you might be running into issues with transaction contention. If that were the case, the inserts would be adding to the same range over and over again, leading to a ‘hot node’. The solution would be to switch from sequential IDs to UUIDs instead. That would avoid new writes hitting the same range over and over again. There are suggestions in the previous link and here in the SQL FAQs.

If not, could you share the schema you’re testing and the queries you’re using in jmeter?

Hope that helps!

Hi Tim,

I have tried using with an automatic Primary key (i.e. without specifying primary key in the table) but still led to single hot node.

One schema I also tried, a fairly simple one
CREATE TABLE id_table (
id UUID,
PRIMARY KEY (id)
);
with insert that has one record
INSERT INTO id_table VALUES (gen_random_uuid());

I still observed the following (cpu, memory) while inserting
n3-cockroachdb-0 1103m 702Mi
n3-cockroachdb-1 731m 251Mi
n3-cockroachdb-2 893m 431Mi

Let me know what you think,

Thanks!

Thanks, that’s helpful! Out of curiosity, are these results similar to your tests on the 17th? I was just speaking with our CTO @bdarnell, and the CPU and memory distribution here doesn’t look wrong from his perspective. We don’t expect that nodes have identical load even with load balancing and random UUIDs. How did n0 perform compared to the others apart from the memory and CPU usage?

It’s probably more unevenly distributed because I was using timestamp as one of the primary keys

the major differences on the dashboard I can observe now for n0 (with only UUID in the table), and I probably observed similarly (Differences in Raft log queue and Queue processing time) in the previous tests, note that the node at the 3rd row seems relatively idle, while n0 is more active than the rest of the nodes.

Hi Tim, I’d like to get back to the original schema that caused single hot node and see how it can be improved.

CREATE TABLE my_table (
timestamp TIMESTAMP,
an_id INT,
a_type STRING,
id UUID,
another_filed INT,
PRIMARY KEY (timestamp, an_id, a_type)
);

timestamp can be somewhat serial because it’s a continuous time stream, while an_id and a_type are quite randomized.

The reason I chose the first 3 columns as primary key is because there will be SELECTs based on timestamp, an_id, a_type, whereas timestamp is a must. At the beginning I thought the primary key with 3 columns should have a more randomized appearance since “an_id” and “a_type” are quite random. But it may have hit transaction contention as the 1st column is somewhat serial?

I also noticed that when doing SELECT with columns in the primary key
explain(EXPRS) select * from my_table where timestamp BETWEEN ‘2018-05-08 16:13:00’::timestamp AND ‘2018-05-08 16:15:00’::timestamp AND an_id =2 ;
±-----±-------±------------------------------------------------+
| Tree | Field | Description |
±-----±-------±------------------------------------------------+
| scan | | |
| | table | my_table@primary |
| | spans | /2018-05-08T16:13:00Z/2-/2018-05-08T16:15:00Z/3 |
| | filter | an_id = 2 |
±-----±-------±------------------------------------------------+
explain (EXPRS) select * from my_table where a_type = ‘type’; ±-----±-------±----------------------+
| Tree | Field | Description |
±-----±-------±----------------------+
| scan | | |
| | table | my_table@primary |
| | spans | ALL |
| | filter | a_type = ‘type’ |
±-----±-------±----------------------+

So I was wondering, since under the hood it is using kv store, does it mean the order of primary columns matter? It seems that I can avoid whole table scan if I have the 1st column in WHERE constraints.

Another question is, if I change the primary key to an uuid or an automatic one just by ignoring the line starting with PRIMARY KEY…, because I’ll still need to SELECT over timestamp, an_id and a_type, I’ll need to add indexes on these columns, but that could involve some degrees of transaction contention too, right?

On a side note, is this still true that time series use cases are not optimized for cockroachdb? Time Series usage

Hey Jeorge,

To answer the last question first: we’re still not optimized for time series data, in part because of the issues you’re running into here. To add a bit more detail:

When we create rows in the KV store, we do it based on the order of the columns in the PRIMARY KEY statement, as you intuited. So if the first item in the primary key is serial, then the KV pairs would be serial as well. If you’re looking to minimize contention, you could try putting the least serial key first.

For the sake of argument, I’ll assume an_id above is somewhat random. You could swap the keys around and put serial data last: PRIMARY KEY (an_id, an_type, timestamp). This would avoid (or reduce) the contention issue, but the caveat is that if you want to avoid a full table scan when querying based on timestamp, you’d need to provide an_id and an_type in the WHERE clause as well.

Does that help? Happy to go into more detail about why if it’d be helpful.

Hey Jeorge,

Did that help? Let me know if you’re still running into issues.