Hi Tim, I’d like to get back to the original schema that caused single hot node and see how it can be improved.
CREATE TABLE my_table (
timestamp TIMESTAMP,
an_id INT,
a_type STRING,
id UUID,
another_filed INT,
PRIMARY KEY (timestamp, an_id, a_type)
);
timestamp can be somewhat serial because it’s a continuous time stream, while an_id and a_type are quite randomized.
The reason I chose the first 3 columns as primary key is because there will be SELECTs based on timestamp, an_id, a_type, whereas timestamp is a must. At the beginning I thought the primary key with 3 columns should have a more randomized appearance since “an_id” and “a_type” are quite random. But it may have hit transaction contention as the 1st column is somewhat serial?
I also noticed that when doing SELECT with columns in the primary key
explain(EXPRS) select * from my_table where timestamp BETWEEN ‘2018-05-08 16:13:00’::timestamp AND ‘2018-05-08 16:15:00’::timestamp AND an_id =2 ;
±-----±-------±------------------------------------------------+
| Tree | Field | Description |
±-----±-------±------------------------------------------------+
| scan | | |
| | table | my_table@primary |
| | spans | /2018-05-08T16:13:00Z/2-/2018-05-08T16:15:00Z/3 |
| | filter | an_id = 2 |
±-----±-------±------------------------------------------------+
explain (EXPRS) select * from my_table where a_type = ‘type’; ±-----±-------±----------------------+
| Tree | Field | Description |
±-----±-------±----------------------+
| scan | | |
| | table | my_table@primary |
| | spans | ALL |
| | filter | a_type = ‘type’ |
±-----±-------±----------------------+
So I was wondering, since under the hood it is using kv store, does it mean the order of primary columns matter? It seems that I can avoid whole table scan if I have the 1st column in WHERE constraints.
Another question is, if I change the primary key to an uuid or an automatic one just by ignoring the line starting with PRIMARY KEY…, because I’ll still need to SELECT over timestamp, an_id and a_type, I’ll need to add indexes on these columns, but that could involve some degrees of transaction contention too, right?
On a side note, is this still true that time series use cases are not optimized for cockroachdb? Time Series usage