We’re in the process of migrating our application from SQL Server to CockroachDB, and we’re running into serious performance issues when doing many inserts.
We run a cluster of 3 CockroachDB nodes, with the memory for cache and the memory for SQL queries both set to 35% (so 70% combined). Each node runs in a container on a VM with 10 CPUs and 32 GB of RAM, backed by SSD storage.
For our application, we need to insert a large number of rows initially (preferably as fast as possible), followed by large periodic inserts (around 30k rows every 5-6 minutes). Some of these “inserts” are actually updates of previously inserted data (about 40%), and the rest are brand-new rows. On the application side we cannot tell whether a row will be an update or a new insert without checking the DB first.
So the logical approach for us seemed to be grouping our rows into batches and running one long “INSERT … ON CONFLICT” statement with all the values inlined. This is slower than I would have expected, even accounting for the 7 indexes on the table.
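To be concrete, this is roughly how we build the statement on the client side (a minimal sketch; the table and column names `measurements`, `id`, `val`, `updated_at` are made up for illustration, and the placeholders are psycopg2-style `%s`):

```python
# Sketch: build one multi-row INSERT ... ON CONFLICT DO UPDATE statement
# for a whole batch, instead of issuing 500 single-row statements.
# Table/column names here are hypothetical stand-ins for our real schema.

def build_upsert(table, columns, key_column, n_rows):
    """Build a single multi-row upsert with %s placeholders, one tuple per row."""
    row = "(" + ", ".join(["%s"] * len(columns)) + ")"
    values = ", ".join([row] * n_rows)
    updates = ", ".join(
        f"{c} = excluded.{c}" for c in columns if c != key_column
    )
    return (
        f"INSERT INTO {table} ({', '.join(columns)}) "
        f"VALUES {values} "
        f"ON CONFLICT ({key_column}) DO UPDATE SET {updates}"
    )

sql = build_upsert("measurements", ["id", "val", "updated_at"], "id", 500)
# One statement carrying all 500 rows; when executing, flatten the row
# tuples into a single flat parameter list: cur.execute(sql, flat_params)
```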
For instance, a batch of 500 rows routinely takes 10-20s or more to complete on a small-ish table of 180k rows, which is tiny compared to the tens of millions of rows the final table will hold.
So I have a few questions. Is this level of performance normal, or is there something wrong with my nodes? I’m currently not load balancing across nodes, and simply sending all my requests to the same one.
I’ve enabled tracing to see which parts of the queries take the most time, and I see a lot of “waited Xs for overlapping requests”. But our batches are sent one at a time, and there is no overlap within a batch (all rows have distinct primary keys, and no value in a batch conflicts with another value in the same batch). Is this expected?
How could I increase performance? Spin up more nodes? Load balance my write requests? Is there some way I could hint to CockroachDB about what I need to do?
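For the load-balancing option, this is roughly what I imagine on the client side: rotating batches across the three nodes instead of pointing every connection at one. A rough sketch (node addresses are placeholders, not our actual cluster):

```python
# Sketch: round-robin batches across the nodes of the cluster instead of
# sending every write to a single node. Addresses are hypothetical.
from itertools import cycle

NODES = ["node1:26257", "node2:26257", "node3:26257"]  # placeholder addresses

def assign_batches(batches, nodes=NODES):
    """Pair each batch with a node address in round-robin order."""
    return list(zip(batches, cycle(nodes)))

plan = assign_batches([f"batch{i}" for i in range(5)])
# batch0 -> node1, batch1 -> node2, batch2 -> node3, batch3 -> node1, ...
```

Whether spreading the writes like this actually helps, or whether a proper load balancer in front of the cluster is the better answer, is part of what I’m asking.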