Query Performance Issues

I am seeing high latencies for some really simple insert queries. I have attached screenshots of the diagnostics page as well.

For example, the following query takes about 3.8 seconds. It's inserting about 128 rows into a table that looks like this:

CREATE TABLE public.profile_updates (
    id INT8 NOT NULL DEFAULT nextval('profile_updates_auto_inc':::STRING),
    profileoperation_id INT8 NOT NULL,
    profile_id INT8 NULL,
    "timestamp" INT8 NOT NULL,
    action VARCHAR(255) NOT NULL,
    details STRING NOT NULL,
    CONSTRAINT "primary" PRIMARY KEY (id ASC),
    INDEX profileoperation_id (profileoperation_id ASC),
    INDEX profile_id (profile_id ASC),
    FAMILY "primary" (id, profileoperation_id, profile_id, "timestamp", action, details)
)

Query:
INSERT INTO profiles.profile_updates (profileoperation_id, profile_id, "timestamp", action, details) VALUES ($1, $2, more3), (more100)
I am using a JDBC batched statement to insert the rows.


I see that most of the latency is attributed to overhead. What exactly does overhead include, and what would be the best way to optimize it if this is the root cause?


Hi again @vramkris!

To answer your question about “overhead”, we have a bug in how that is computed – it can sometimes include time spent executing other statements if multiple statements are submitted in a single request.

Can you confirm that you see the same latency if you run the individual query from the command line?

A starting point for improving query performance is here. Could you share the output of the EXPLAIN plan? You could also capture a statement bundle as described here, and see if there are any particularly long operations in the resulting trace.
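For example, something along these lines would give us what we need (the literal values are just placeholders, and EXPLAIN ANALYZE (DEBUG) assumes a version that supports statement bundles):

-- Plan for a representative insert:
EXPLAIN INSERT INTO profiles.profile_updates
    (profileoperation_id, profile_id, "timestamp", action, details)
VALUES (1, 1, 1600000000, 'update', '{}');

-- Collects a statement bundle (plan, trace, stats) for the same statement:
EXPLAIN ANALYZE (DEBUG) INSERT INTO profiles.profile_updates
    (profileoperation_id, profile_id, "timestamp", action, details)
VALUES (1, 1, 1600000000, 'update', '{}');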

Also, we recommend against using a sequential key for the primary key. See these docs. This is most likely your performance problem.
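As a sketch of the recommended pattern (illustrative only; it reuses your columns with a UUID key instead of the sequence, and the _v2 table name is just for the example):

-- IDs are generated on the gateway node, so inserts spread across the keyspace
-- and there is no per-row sequence increment.
CREATE TABLE public.profile_updates_v2 (
    id UUID NOT NULL DEFAULT gen_random_uuid(),
    profileoperation_id INT8 NOT NULL,
    profile_id INT8 NULL,
    "timestamp" INT8 NOT NULL,
    action VARCHAR(255) NOT NULL,
    details STRING NOT NULL,
    CONSTRAINT "primary" PRIMARY KEY (id ASC),
    INDEX profileoperation_id (profileoperation_id ASC),
    INDEX profile_id (profile_id ASC)
);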

There are more best practices here.

Hey @rafiss
Thanks for the response.

I don't have the EXPLAIN plan for it now. I will rerun and post it here if I can reproduce it.
Since I imported the DB from MySQL, this was the schema that was generated. I altered the primary key column's default to
ALTER TABLE profile_updates ALTER COLUMN id SET DEFAULT unique_rowid();
and the latencies seem to have improved.
Unfortunately, it might be hard to switch to a UUID since it would break existing downstream consumers who rely on it.
Would enabling hash-based sharding help in this situation for all tables that have an auto-incremented primary key?

On the topic of batching, is the preferred way to use JDBC batching or to manually write a single query that does multiple inserts?
I ask because the former results in multiple queries of batch size 128 or lower.
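To make the comparison concrete, by "manually writing a query" I mean sending one multi-row statement like this instead of 128 batched single-row inserts (the values are made up):

INSERT INTO profiles.profile_updates
    (profileoperation_id, profile_id, "timestamp", action, details)
VALUES
    (1, 10, 1600000000, 'create', 'details 1'),
    (2, 11, 1600000001, 'update', 'details 2'),
    -- ...and so on, up to 128 rows per statement
    (3, 12, 1600000002, 'delete', 'details 3');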

Thanks,
Vinay

The problem with the sequence is twofold: the workload is sequential, so writes always land in the same place, and incrementing the sequence requires a synchronous write to the database per row. This PR (sql: implement sequence caching and cached sequence serial normalization by jayshrivastava · Pull Request #56954 · cockroachdb/cockroach · GitHub) would likely help with the second issue. unique_rowid() is probably the right choice for now. If you still see a bottleneck at write time, then consider hashing. Generally, I'd try replacing the sequences with unique_rowid() as the next step.
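To make both options concrete (a sketch only; the sequence-caching syntax assumes a build that includes that PR):

-- Option 1: take the sequence off the hot path; IDs stay INT8 and unique.
ALTER TABLE profile_updates ALTER COLUMN id SET DEFAULT unique_rowid();

-- Option 2: once sequence caching is available, cache a block of values per session
-- so each insert no longer pays a synchronous write to bump the sequence.
ALTER SEQUENCE profile_updates_auto_inc CACHE 256;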

That does make sense.
I enabled hash-based sharding for 2 tables and saw a significant improvement in latency. What is the general recommendation around the number of buckets to specify for the index?
I currently specified 4 buckets based on the number of nodes in the cluster. Should the number of buckets be higher, and is there a methodical approach to determine how many buckets the index should be split into?
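For context, this is roughly what I ran on each of the 2 tables (4 buckets to match the node count; the session setting and syntax are from the hash-sharded index docs for the version I'm on, so treat it as a sketch):

-- Enable the feature for the session, then shard the primary key into 4 buckets.
SET experimental_enable_hash_sharded_indexes = true;

ALTER TABLE profile_updates ALTER PRIMARY KEY
    USING COLUMNS (id) USING HASH WITH BUCKET_COUNT = 4;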