Performance with many INSERT ON CONFLICTs


(Chervine Majeri) #1

Hello,

We’re in the process of migrating our application from SQL Server to CockroachDB, and we’re running into big performance issues when doing many inserts.

We run a cluster of 3 CockroachDB nodes, with the memory for cache and the memory for SQL queries both set to 35% (70% combined). The nodes run in containers on VMs with 10 CPUs and 32 GB of RAM each, backed by SSDs.
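
For reference, each node is started with flags roughly like these (a simplified sketch; the store path and join addresses are placeholders, not our real values):

cockroach start \
  --cache=.35 \
  --max-sql-memory=.35 \
  --store=/mnt/ssd/cockroach-data \
  --join=<the other two nodes>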

For our application, we need to insert a lot of rows initially (preferably as fast as possible), followed by a lot of periodic inserts (around 30k rows every 5-6 minutes). Some of these rows are updates of previously inserted data and others are brand new (roughly 40% of the inserts are actually updates, the rest are new rows). On the application side we are unable to tell whether a row will be an update or a brand-new insert without checking the database.

So the logical thing for us seemed to be grouping our rows batch by batch and running one long “INSERT … ON CONFLICT” with all the values inside. This is slower than I would have expected, even accounting for the fact that our table has 7 different indexes.
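
The statement is shaped roughly like this (a simplified sketch: the table name is a placeholder, only a few of the columns are shown, and the real batches contain 500 rows):

INSERT INTO objects (hash, idx, owner, marked_idx, pinned, timestamp)
VALUES
  ('h1', 0, 'owner-a', 0, false, now()),
  ('h2', 0, 'owner-b', 0, false, now())
  -- ... up to 500 rows per batch
ON CONFLICT (hash, idx)
DO UPDATE SET
  owner = excluded.owner,
  marked_idx = excluded.marked_idx,
  pinned = excluded.pinned,
  timestamp = excluded.timestamp;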

For instance, a batch of 500 rows routinely takes more than 10-20 s to complete on a “small-ish” table of 180k rows, which is very small compared to the tens of millions of rows the final table will hold.

So I have a few questions. Is this sort of performance normal, or is there something wrong with my nodes? I’m currently not load balancing across nodes; I’m simply sending all my requests to the same one.
I’ve turned on tracing to see which parts of the queries take the most time, and there is a lot of “waited Xs for overlapping requests”. But our batches are sent one by one, and there is no overlap inside a batch (all the rows have distinct primary keys; no value in a batch conflicts with another one in the same batch). Is this expected?
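
(For reference, this is roughly how I collected the trace, running one batch in between:)

SET tracing = on;
-- run one INSERT ... ON CONFLICT batch here
SET tracing = off;
SHOW TRACE FOR SESSION;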

How could I increase performance? Spin up more nodes? Load balance my write requests? Is there some way I could hint to CockroachDB about what I need it to do?

Thanks.


(Tim O'Brien) #2

Hey @CMajeri,

It’s difficult to draw conclusions without specifics, but 40ms per insert is definitely on the long end. To really dig in, we’d need to know:

  • How is the cluster provisioned? (i.e.: # of cores and their speed, available memory, disk type and speed)
  • How many nodes are there? What is the replication factor, and where are the nodes located?
  • What is the schema?

I suspect that you’re running into transaction contention, based on the “waited for overlapping requests” message. Usually this happens because the primary key is sequential, which means that your write throughput is limited by the resources of a single node. But I’ll wait to hear more about your setup before drawing conclusions.
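
To illustrate what I mean by a sequential key (a generic example, not your schema): consecutive key values all land in the same range, so a single node absorbs every write, whereas a randomly distributed key such as a UUID spreads the writes out:

-- sequential values: every new row lands in the same "hot" range
CREATE TABLE events_sequential (id SERIAL PRIMARY KEY, payload STRING);

-- random values: writes are spread across ranges and nodes
CREATE TABLE events_uuid (id UUID PRIMARY KEY DEFAULT gen_random_uuid(), payload STRING);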


(Chervine Majeri) #3

The cluster is 3 VMs with 10 CPUs at 3.1 GHz each and 32 GB of RAM (for the system), of which CockroachDB can freely use 70% (35% cache, 35% SQL query memory), with 600 GB SSD disks (I don’t know their speed, but they’re decent enough).

The replication factor is 3, for 3 VMs.

Initially, the queries were all sent to the same VM.

The schema is this:

(
hash STRING,
idx INT4, 
owner STRING,
marked_hash STRING,
marked_idx INT4 NOT NULL,
timestamp TIMESTAMPTZ,
marked_timestamp TIMESTAMPTZ,
pinned BOOL NOT NULL,
PRIMARY KEY (hash, idx),
INDEX(marked_hash),
INDEX(owner, marked_idx),
INDEX(owner, timestamp),
INDEX(owner, marked_timestamp),
INDEX(pinned, timestamp, marked_idx),
INDEX(pinned, timestamp),
INDEX(owner, pinned)
).

As they get marked, previously unmarked objects are updated with a reference to where they were marked, hence the upsert.

I have since changed the way I handle upserts: rather than doing INSERT … ON CONFLICT on one large batch, I first run a large transaction with many UPDATE statements, and then one with many INSERT statements using ON CONFLICT DO NOTHING. This works much better, with writes taking about 10 ms each. For my workload I don’t strictly need the whole upsert to be atomic, so things work out in the end, but it still feels odd.
When I also distribute the load in parallel across the nodes instead of sending everything to a single one, performance improves even further: a batch of 300 takes less than a second, even though the table now has more than 100 million rows.
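
Roughly, the new flow looks like this (a simplified sketch; the table name is a placeholder and only a few columns and rows are shown):

-- Phase 1: one big transaction of UPDATEs for the rows that already exist
BEGIN;
UPDATE objects
  SET marked_hash = 'mh1', marked_idx = 7, marked_timestamp = now()
  WHERE hash = 'h1' AND idx = 0;
-- ... many more UPDATE statements
COMMIT;

-- Phase 2: one big transaction of INSERTs, silently skipping rows that already exist
BEGIN;
INSERT INTO objects (hash, idx, owner, marked_idx, pinned, timestamp)
  VALUES ('h2', 0, 'owner-b', 0, false, now())
  ON CONFLICT (hash, idx) DO NOTHING;
-- ... many more INSERT statements
COMMIT;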

With my “new way” of doing upserts things go much better, with no query taking longer than a second, but I don’t quite know why yet.