Cockroach cluster freeze when deleting rows

We are deleting rows from a table that contains over 500 million rows. Deleting more than 2,000 rows in a single statement gets us into trouble. When that happens we see:

  • CPU utilization on all CockroachDB nodes reaches 100% (24-core boxes)
  • Clients are blocked (e.g. they cannot load data) for at least 15 minutes

We are using batched deletes on an indexed filter, and gc.ttlseconds is configured to 2 hours on a 3-node cluster.

The delete query is:

DELETE FROM table WHERE (ts, f2, f3, f4) IN (SELECT ts, f2, f3, f4 FROM table WHERE (f2 >= '4400000902' AND f2 < '4400000910') AND ts < 1626270139000000);

Is there anything we can do to avoid this problem? And how can we recover more quickly when it happens?

Any help in this regard is really appreciated.

Hi! Welcome to the forum! This might be what you’re looking for: Bulk-delete Data | CockroachDB Docs

As that document explains, you can add a LIMIT clause to the selection subquery and experiment with different batch sizes until you find the optimal one. Let me know if it works!
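For reference, here is a minimal sketch of that pattern applied to the query from the original post (table and column names are copied from it; the batch size of 1000 is an assumed starting point to tune). The idea is to run the statement repeatedly from a client loop until it reports 0 rows deleted:

```sql
-- Batched delete: LIMIT caps how many rows each statement touches.
-- Re-run this statement in a loop until it deletes 0 rows.
DELETE FROM table WHERE (ts, f2, f3, f4) IN (
    SELECT ts, f2, f3, f4
    FROM table
    WHERE (f2 >= '4400000902' AND f2 < '4400000910')
      AND ts < 1626270139000000
    LIMIT 1000  -- assumed batch size; tune up or down
);
```

Keeping each batch small bounds the write footprint of every transaction, which should avoid the cluster-wide CPU spike and client stalls you saw with one huge delete.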