DELETE statement hangs indefinitely with automatic retries

Hi All,

We are exploring CockroachDB on a 3-node Kubernetes cluster. Each node has 2 CPUs and 3 GB of memory.

I’m very confused by DELETE statement performance and am probably doing something wrong.

On a table of about 7 million records it can take an hour to delete 25K rows (see the attempt in the screenshot below). The longest run was a statement that executed for 5 hours, after which I just restarted the cluster.
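For reference, the statement was shaped roughly like this (the table and column names here are made up for illustration):

```sql
-- Hypothetical example; 'events' and 'created_at' stand in for my real schema.
-- A single unbatched DELETE covering tens of thousands of rows.
DELETE FROM events WHERE created_at < '2021-01-01';
```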

What seems off to me is why a single statement on an otherwise idle cluster can’t complete without lock contention and retries, and why there’s no limit on the number of automatic retries.

I have a trace to share: https://gist.github.com/acnot/d4f4a039fc066deb125c46b75afc02b7
This time it took 236840.902 ms (roughly 4 minutes) to delete 10K rows. In the trace you can see a lot of locks and retries.
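For context, this is roughly how the trace was captured from a SQL shell, using session tracing (again, the table name is made up):

```sql
-- Turn on session tracing, run the slow statement, then dump the trace.
SET tracing = on;
DELETE FROM events WHERE created_at < '2021-01-01';
SET tracing = off;
SHOW TRACE FOR SESSION;
```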

Is this normal behavior for CockroachDB? Is there anything I’m doing wrong?

Kind Regards.

An update here.

To stop CRDB from retrying indefinitely I had to use explicit transactions. That way I get an error back after the first failed attempt.
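For anyone hitting the same thing, this is what I mean by an explicit transaction (hypothetical table name again). Inside BEGIN/COMMIT, a serialization conflict comes back to the client as a restart error (SQLSTATE 40001) rather than being retried automatically:

```sql
-- With an explicit transaction, a conflicting write surfaces as a
-- 'restart transaction' error (SQLSTATE 40001) on the client side
-- instead of CRDB retrying the implicit single-statement transaction.
BEGIN;
DELETE FROM events WHERE created_at < '2021-01-01';
COMMIT;
```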

The cause of the failure seems to be something else, internal, writing to the same table; my statement was the only one coming from external clients on an otherwise idle cluster. I’m not sure how to confirm this, but it may be range split/merge operations that prevent me from updating any significant number of records at a time.

After exploring this more I learned that on my setup I can’t update/delete more than a few thousand records at a time while something else is writing to the table. When I make sure I’m the only one writing to the table, I can update roughly as much as CRDB can process in 5 minutes, which is about 2 million rows for me.
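For completeness, the batched workaround I ended up with looks roughly like this (made-up names, batch size picked by trial and error). CockroachDB supports LIMIT on DELETE, so the statement is simply re-run until it reports 0 rows deleted:

```sql
-- Delete in small batches so each transaction stays short and is less
-- likely to conflict with concurrent writers. Loop this from the client
-- until it deletes 0 rows.
DELETE FROM events
WHERE created_at < '2021-01-01'
LIMIT 5000;
```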

I guess my updated question is the same: is this kind of write performance and sensitivity to concurrent updates normal for CockroachDB?