A story in which deleting 6 million rows from an 8-million-row table runs for 10+ hours and never completes, yet the same DELETE statement with LIMIT 100000 completes in 8 seconds.
I have a straightforward table with 8.4 million rows and an index on a column named identifier. The actual indexes are as follows:
show indexes from dataseries_itemqtys;
table_name | index_name | non_unique | seq_in_index | column_name | direction | storing | implicit
----------------------+----------------------------------------------------+------------+--------------+-------------+-----------+---------+-----------
dataseries_itemqtys | primary | false | 1 | rowid | ASC | false | false
dataseries_itemqtys | dataseries_itemqtys_identifier_actiondate_item_idx | true | 1 | identifier | ASC | false | false
dataseries_itemqtys | dataseries_itemqtys_identifier_actiondate_item_idx | true | 2 | actiondate | ASC | false | false
dataseries_itemqtys | dataseries_itemqtys_identifier_actiondate_item_idx | true | 3 | item | ASC | false | false
dataseries_itemqtys | dataseries_itemqtys_identifier_actiondate_item_idx | true | 4 | rowid | ASC | false | true
dataseries_itemqtys | dataseries_itemqtys_identifier_item_actiondate_idx | true | 1 | identifier | ASC | false | false
dataseries_itemqtys | dataseries_itemqtys_identifier_item_actiondate_idx | true | 2 | item | ASC | false | false
dataseries_itemqtys | dataseries_itemqtys_identifier_item_actiondate_idx | true | 3 | actiondate | ASC | false | false
dataseries_itemqtys | dataseries_itemqtys_identifier_item_actiondate_idx | true | 4 | rowid | ASC | false | true
(9 rows)
This is the count of rows for each identifier value:
identifier | count
-------------+----------
2626 | 636182
3223 | 248249
3470 | 917217
1300 | 25642
1505 | 6634123
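The per-identifier counts above came from a simple group-by; for reference (the exact query shape here is an assumption, not something quoted from my session):

```sql
-- Count rows per identifier value (assumed form of the query behind the table above).
SELECT identifier, count(*) FROM dataseries_itemqtys GROUP BY identifier;
```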
If I run this query:
delete from dataseries_itemqtys where identifier = '1505';
it takes 10+ hours and never finishes (I have to kill it), but if I run this query:
delete from dataseries_itemqtys where identifier = '1505' limit 100000;
it runs in 8.432 seconds total, which leads me to believe/extrapolate that the first delete should finish WAY before the 10-hour mark.
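Since this version of CockroachDB has no procedural loops in SQL, the LIMIT timing suggests a manual batched-delete approach: re-running the same statement until it reports DELETE 0 (a sketch extrapolated from the timing test above; the batch size is just the one I happened to measure, not a tuned value):

```sql
-- Re-run until the statement tag reads "DELETE 0";
-- each execution removes at most 100,000 matching rows.
DELETE FROM dataseries_itemqtys WHERE identifier = '1505' LIMIT 100000;
```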
EXPLAIN says:
explain delete from dataseries_itemqtys where identifier = '1505';
tree | field | description
------------+---------------------+-------------------------------------------------------------------------
| distribution | local
| vectorized | false
delete | |
│ | from | dataseries_itemqtys
│ | auto commit |
└── scan | |
| estimated row count | 6641365
| table | dataseries_itemqtys@dataseries_itemqtys_identifier_actiondate_item_idx
| spans | [/'1505' - /'1505']
(9 rows)
The actual table definition is this:
show columns from dataseries_itemqtys;
column_name | data_type | is_nullable | column_default | generation_expression | indices | is_hidden
--------------------+---------------+-------------+----------------+--------------------------+-----------------------------------------------------------------------------------------------------------------+-----------
dataseries | STRING | false | NULL | | {} | false
identifier | STRING | false | NULL | | {dataseries_itemqtys_identifier_actiondate_item_idx,dataseries_itemqtys_identifier_item_actiondate_idx} | false
item | STRING | false | NULL | | {dataseries_itemqtys_identifier_actiondate_item_idx,dataseries_itemqtys_identifier_item_actiondate_idx} | false
actiondate | DATE | false | NULL | | {dataseries_itemqtys_identifier_actiondate_item_idx,dataseries_itemqtys_identifier_item_actiondate_idx} | false
quantity_increase | DECIMAL(18,4) | false | NULL | | {} | false
quantity_decrease | DECIMAL(18,4) | false | NULL | | {} | false
running_quantity | DECIMAL(18,4) | false | NULL | | {} | false
identifier_clean | STRING | true | NULL | ltrim(rtrim(identifier)) | {} | false
rowid | INT8 | false | unique_rowid() | | {primary,dataseries_itemqtys_identifier_actiondate_item_idx,dataseries_itemqtys_identifier_item_actiondate_idx} | true
(9 rows)
This is on a 5-node cluster; all nodes are physical machines (no VMs) with SSDs only and 8 GB of RAM each, and all are wired to the same physical 1 Gbps network switch, each just inches away from the others.
SELECT version() shows:
CockroachDB CCL v20.2.3 (x86_64-unknown-linux-gnu, built 2020/12/14 18:33:39, go1.13.14)
All of the data was loaded via IMPORT from CSV files; my current workaround is to drop the table and recreate it, which means re-IMPORTing the data each time.
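For reference, the drop-and-reload workaround looks roughly like this (the nodelocal paths are placeholders, not the real file locations):

```sql
-- Drop the table, then recreate and reload it in one step.
DROP TABLE dataseries_itemqtys;
IMPORT TABLE dataseries_itemqtys
  CREATE USING 'nodelocal://1/dataseries_itemqtys_schema.sql'  -- placeholder schema file
  CSV DATA ('nodelocal://1/dataseries_itemqtys.csv');          -- placeholder data file
```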
Any ideas? Thanks!!!
P.S. Here's a dashboard excerpt showing the spike while the delete was attempting to execute; it kept everything busy.