We are deleting rows from a table containing over 500 million rows. Attempting to delete more than ~2000 rows in a single statement gets us into trouble. When that happens we see:
- CPU utilization on all CockroachDB nodes reaches 100% (24-core boxes)
- Clients can do nothing (e.g., load data) for at least 15 minutes
We are using batched deletes on an indexed filter, and `gc.ttlseconds` is configured as 2 hours on a 3-node cluster.

The delete query is:
```sql
DELETE FROM table
WHERE (ts, f2, f3, f4) IN (
    SELECT ts, f2, f3, f4
    FROM table
    WHERE (f2 >= '4400000902' AND f2 < '4400000910')
      AND ts < 1626270139000000
);
```
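For context, our client-side batching loop looks roughly like the sketch below. The `run_delete` callable and the 2000-row batch size are illustrative, not our exact code; in practice `run_delete` would issue the DELETE above with a `LIMIT` clause through our driver, in its own transaction, and return the number of rows it deleted.

```python
def delete_in_batches(run_delete, batch_size=2000):
    """Repeatedly issue a LIMIT-ed DELETE until no matching rows remain.

    run_delete(batch_size) is a hypothetical callable that executes
    something like:
        DELETE FROM table
        WHERE f2 >= '4400000902' AND f2 < '4400000910'
          AND ts < 1626270139000000
        LIMIT <batch_size>;
    in its own transaction and returns the number of rows deleted.
    Returns the total number of rows deleted across all batches.
    """
    total = 0
    while True:
        deleted = run_delete(batch_size)
        total += deleted
        # A short (partial or empty) batch means nothing is left to delete.
        if deleted < batch_size:
            return total
```

Each iteration commits independently, so no single transaction touches more than `batch_size` rows.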
Is there anything we can do to avoid this problem? And when it does happen, how can we recover more quickly?
Any help in this regard is really appreciated.