Table with 8 million rows has very slow delete yet same delete statement with LIMIT clause seems MUCH faster


A story where deleting 6 million rows from an 8-million-row table takes more than 10 hours and never completes, yet the same delete statement with limit 100000 completes in 8 seconds.

I have a straightforward table with 8.4 million rows and an index on a column named identifier. The actual indexes are as follows:

show indexes from dataseries_itemqtys;
      table_name      |                     index_name                     | non_unique | seq_in_index | column_name | direction | storing | implicit
----------------------+----------------------------------------------------+------------+--------------+-------------+-----------+---------+-----------
  dataseries_itemqtys | primary                                            |   false    |            1 | rowid       | ASC       |  false  |  false  
  dataseries_itemqtys | dataseries_itemqtys_identifier_actiondate_item_idx |    true    |            1 | identifier  | ASC       |  false  |  false  
  dataseries_itemqtys | dataseries_itemqtys_identifier_actiondate_item_idx |    true    |            2 | actiondate  | ASC       |  false  |  false  
  dataseries_itemqtys | dataseries_itemqtys_identifier_actiondate_item_idx |    true    |            3 | item        | ASC       |  false  |  false  
  dataseries_itemqtys | dataseries_itemqtys_identifier_actiondate_item_idx |    true    |            4 | rowid       | ASC       |  false  |   true  
  dataseries_itemqtys | dataseries_itemqtys_identifier_item_actiondate_idx |    true    |            1 | identifier  | ASC       |  false  |  false  
  dataseries_itemqtys | dataseries_itemqtys_identifier_item_actiondate_idx |    true    |            2 | item        | ASC       |  false  |  false  
  dataseries_itemqtys | dataseries_itemqtys_identifier_item_actiondate_idx |    true    |            3 | actiondate  | ASC       |  false  |  false  
  dataseries_itemqtys | dataseries_itemqtys_identifier_item_actiondate_idx |    true    |            4 | rowid       | ASC       |  false  |   true  
(9 rows)

This is the count of rows for each identifier value:

  identifier |  count
-------------+----------
  2626       |  636182
  3223       |  248249
  3470       |  917217
  1300       |   25642
  1505       | 6634123
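(For reference, that breakdown comes from a simple aggregate, something like: select identifier, count(*) from dataseries_itemqtys group by identifier;)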

If I run this query:

delete from dataseries_itemqtys where identifier = '1505';

it takes 10+ hours and never finishes (I have to kill it), but if I run this query:

delete from dataseries_itemqtys where identifier = '1505' limit 100000;

it runs in 8.432 seconds total, which leads me to believe/extrapolate that the first delete should finish WAY before the 10-hour mark.
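To put numbers on that extrapolation: identifier 1505 has 6,634,123 rows, so at 100,000 rows per batch that is about 67 batches, and 67 × 8.432 s ≈ 565 s - under 10 minutes, nowhere near 10 hours even with generous overhead.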

EXPLAIN says:

explain  delete from dataseries_itemqtys where identifier = '1505';              
    tree    |        field        |                              description
------------+---------------------+-------------------------------------------------------------------------
            | distribution        | local
            | vectorized          | false
  delete    |                     |
   │        | from                | dataseries_itemqtys
   │        | auto commit         |
   └── scan |                     |
            | estimated row count | 6641365
            | table               | dataseries_itemqtys@dataseries_itemqtys_identifier_actiondate_item_idx
            | spans               | [/'1505' - /'1505']
(9 rows)

The actual table definition is this:

show columns from dataseries_itemqtys;
     column_name    |   data_type   | is_nullable | column_default |  generation_expression   |                                                     indices                                                      | is_hidden
--------------------+---------------+-------------+----------------+--------------------------+------------------------------------------------------------------------------------------------------------------+-----------
  dataseries        | STRING        |    false    | NULL           |                          | {}                                                                                                               |   false
  identifier        | STRING        |    false    | NULL           |                          | {dataseries_itemqtys_identifier_actiondate_item_idx,dataseries_itemqtys_identifier_item_actiondate_idx}         |   false
  item              | STRING        |    false    | NULL           |                          | {dataseries_itemqtys_identifier_actiondate_item_idx,dataseries_itemqtys_identifier_item_actiondate_idx}         |   false
  actiondate        | DATE          |    false    | NULL           |                          | {dataseries_itemqtys_identifier_actiondate_item_idx,dataseries_itemqtys_identifier_item_actiondate_idx}         |   false
  quantity_increase | DECIMAL(18,4) |    false    | NULL           |                          | {}                                                                                                               |   false
  quantity_decrease | DECIMAL(18,4) |    false    | NULL           |                          | {}                                                                                                               |   false
  running_quantity  | DECIMAL(18,4) |    false    | NULL           |                          | {}                                                                                                               |   false
  identifier_clean  | STRING        |    true     | NULL           | ltrim(rtrim(identifier)) | {}                                                                                                               |   false
  rowid             | INT8          |    false    | unique_rowid() |                          | {primary,dataseries_itemqtys_identifier_actiondate_item_idx,dataseries_itemqtys_identifier_item_actiondate_idx} |   true

This is on a 5-node cluster, all physical machines (no VMs), all SSDs only, all with 8 GB RAM, all wired to the same physical 1 Gbps network switch - each is just inches away from the others :slight_smile:

SELECT version() shows:

  CockroachDB CCL v20.2.3 (x86_64-unknown-linux-gnu, built 2020/12/14 18:33:39, go1.13.14)

All data was loaded with IMPORT from CSV files; my current workaround is to drop the table and recreate it, which means re-IMPORTing the data each time.

Any ideas? Thanks!!!

P.S. Here’s a dashboard excerpt showing the spike while the delete was attempting to execute - it kept everything busy.


After 4 and a half days, the delete is still running. I don’t understand why deleting 10,000 rows is quick (8 seconds) but deleting 8 million is not.

Extrapolating: 8 seconds times 1,000 batches is 8,000 seconds, which is about 2.2 hours. Even with some increase, it should not take 4 days.

None of the data has been updated; it was imported and nothing has changed since.

Below is a graph snippet for a week - you can see the previous attempt that I stopped after a day, then the second attempt, started Jan 27th at 16:30, which I let run but which never completes…

As an experiment I am upgrading one node to the newest version, v20.2.4 (with 30% cache and 30% SQL memory instead of the default 25%).

Maybe I’ll stop the delete statement after everything syncs again and try the delete again…

Sorry nobody got back to you sooner. What I think you’re seeing is that the very large delete without the LIMIT is running into a few different issues that combine to force the transaction to restart internally but prevent it from ever succeeding. While it’s running, consider having a look at crdb_internal.cluster_transactions and seeing what the num_retries field looks like. https://github.com/cockroachdb/cockroach/pull/46275 in 20.2 may help and allow this to survive; https://github.com/cockroachdb/cockroach/issues/44645 ("kv: txn giving up on refresh span collection causes closed ts to kick it out") is perhaps the related parent issue.

In general, I’d strongly encourage you to always perform your DELETE statements with a LIMIT clause and to run them in a loop if you need to delete more than the limit. The increase in work being done here is totally non-linear. Once you reach the point that restarts happen, everything is going to get bad.
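If it helps, here is a minimal sketch of that loop, assuming a Python client with psycopg2 and autocommit so that each batch commits as its own small transaction (the connection string and batch size are placeholders, and retry handling on contention is omitted):

import psycopg2

# Hypothetical connection string - point it at your cluster.
conn = psycopg2.connect("postgresql://root@localhost:26257/defaultdb")
conn.autocommit = True  # each DELETE batch commits as its own small txn

BATCH = 100000
with conn.cursor() as cur:
    while True:
        cur.execute(
            "DELETE FROM dataseries_itemqtys WHERE identifier = %s LIMIT %s",
            ("1505", BATCH),
        )
        if cur.rowcount < BATCH:  # a short batch means nothing is left to delete
            break
conn.close()

Because each batch is a small independent transaction, none of them tracks enough read spans to trip over the refresh machinery, so no batch gets stuck in the restart loop the big delete hits.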

Thanks for getting back. The performance graphs look very periodic, so the theory seems correct. I will look into the retries counter after I retry the transaction - currently upgrading nodes, so it may be a day or two.

OTOH, 8 million rows is not that many rows :slight_smile:

Sure - the limit on how many reads we’ll track for refreshes is governed by the kv.transaction.max_refresh_spans_bytes cluster setting. It defaults to 256KiB, which is clearly not enough to track all of 8M rows. If you increase it to something like 64MiB or 128MiB then it may work out. However, that has the downside of potentially exposing the cluster to OOMs, as I do not think we track this memory usage - we assume it will stay small.
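For reference, raising it is a one-statement cluster setting change; a sketch, assuming the setting takes a raw byte count:

SET CLUSTER SETTING kv.transaction.max_refresh_spans_bytes = 67108864;  -- 64MiB, the example figure above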

P.S. I did try a delete with limit=10000000 with the same results, but it sounds like that would hit the same limit :frowning:

OK. Upgraded all nodes to the newest v20.2.4.

Running this:

select node_id, num_stmts, num_retries, num_auto_retries, txn_string from crdb_internal.cluster_transactions order by num_retries desc;

returns:

  node_id | num_stmts | num_retries | num_auto_retries |                                                                   txn_string
----------+-----------+-------------+------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
        1 |         1 |           2 |                2 | "sql txn" meta={id=c0782119 key=/Table/94/2/"1505"/13888/"QBL810"/-9223352884633616199/0 pri=0.03656968 epo=2 ts=1612145101.002729734,2 min=1612142698.224514013,0 seq=3270000} lock=true stat=PENDING rts=1612144888.712018674,2 wto=false max=1612142698.724514013,0

…so it has failed and is retrying, as you described.