Hi, I’m evaluating CockroachDB, and previously I had a 5 node cluster in GCP setup with the below single table schema which has ~50 million rows in it:
CREATE TABLE cellevents (
ckm_partition INT8 NOT NULL,
revision_partition INT8 NOT NULL,
revision INT8 NOT NULL,
value DECIMAL(18,8) NULL,
composite_key JSONB NOT NULL,
eventSequence INT8 NOT NULL,
CONSTRAINT revision_partition_pk PRIMARY KEY (revision_partition, revision, eventSequence),
INVERTED INDEX ckm_composite_key_idx (composite_key)
);
In version 2.1.5 I could execute a query such as:
SELECT value, composite_key from cellevents WHERE revision_partition=3562736 AND revision=2 AND composite_key->‘3’=‘3722’ order by eventsequence;
This could return up to 5000 rows in ~25ms. We’ve recently upgraded the cluster to version 19.1.3 (same cluster size), and the same query on the same schema and data volume now takes ~200ms. In both cases the execution plan is the same, and in both cases it is a distributed read query.
I’ve also noticed that using the bulk import command to get this test data in takes in the region of 1.5x - 2x longer.
Can you explain why such a massive speed degradation would occur?
Thanks