How to improve CockroachDB query Performance

We have 8 millions records in one table, I build 10 nodes for this cluster with the latest version(CockroachDB V2.0 20180326). the same query show the poor performance comparing with mysql database. the same query in cockroachDB need about 6 seconds, but in mysql, it just need 100ms. I confirmed the right indexes were build with the sql explain result below.

Is it possible to improve the single query time into 1 second? Which approach I can try?

±------------------------±------±---------------------------------------------------------------+
| Tree | Field | Description |
±------------------------±------±---------------------------------------------------------------+
| limit | | |
| └── render | | |
| └── index-join | | |
| ├── scan | | |
| │ | table | t_sync_data@t_sync_data_last_sync_id_user_id_data_type_idx |
| │ | spans | /1/"!\xcd\x19\xe4\x83}K\xffJv\xae\xfa\r\xdb\xecd"/“highlight”- |
| └── scan | | |
| | table | t_sync_data@primary |
±------------------------±------±---------------------------------------------------------------+
(8 rows)

the original query is,
SELECT * FROM t_sync_data where last_sync_id>0 and user_id=‘11cd19e4-837d-4bff-4a76-aefa0ddbec64’ and data_type=‘a01’ order by last_sync_id asc limit 50;

the original indexes,
CREATE INDEX ON t_sync_data (last_sync_id, user_id, data_type);
CREATE INDEX ON t_sync_data (user_id, data_type);
CREATE INDEX ON t_sync_data (last_sync_id);

The query time is about 6 ~ 20 seconds. Then I dropped the first index (last_sync_id, user_id, data_type),
the query time is still very slow. And then I modified the query
SELECT * FROM t_sync_data@t_sync_data_user_id_data_type_idx where last_sync_id>0 and user_id=‘11cd19e4-837d-4bff-4a76-aefa0ddbec64’ and data_type=‘a01’ order by last_sync_id asc limit 50;

Now the query performance is about 50ms. it’s pretty good! I resolved this issue through force index selection!

Hi @levifan, it’s unfortunate that the original query plan was slow. Can you post the whole schema of the t_sync_data table? It may help us understand why this is being planned this way. Could you also post the query plan for the second version of the query, that was fast?

@vilterp, please see the below table schema,

CREATE TABLE t_sync_data (
remote_id UUID NOT NULL,
conflict_remote_id UUID NULL,
user_id UUID NOT NULL,
last_sync_id INT NOT NULL,
data_type STRING NOT NULL,
sync_data STRING NULL,
deleted BOOL NOT NULL,
create_device_id STRING NOT NULL,
original_data_id STRING NOT NULL,
create_time INT NOT NULL,
last_update_time INT NOT NULL,
CONSTRAINT “primary” PRIMARY KEY (remote_id ASC),
INDEX t_sync_data_user_id_data_type_idx (user_id ASC, data_type ASC),
INDEX t_sync_data_last_sync_id_idx (last_sync_id ASC),
FAMILY f_meta (remote_id, conflict_remote_id, user_id, last_sync_id, data_type, deleted, create_device_id, original_data_id, create_time, last_update_time),
FAMILY f_data (sync_data)
)

Thanks @levifan. I filed https://github.com/cockroachdb/cockroach/issues/24297 for our SQL team to follow up on why this query plan was suboptimal.

By the way, we are working on a brand new query optimizer which will eventually make smarter decisions in cases like this.