Understanding performance and query optimisation

Hello,

I have a very simple table containing integers (see end of post). With around 20,000 rows of test data, I’ve got a performance issue I can’t explain when I run this simple SQL:

DELETE FROM propagation WHERE year=‘2018’ AND month=‘1’ AND sourcedxcc=‘376’

This has been running for over 10 minutes. There is no other traffic on the cluster (which is healthy, three nodes). I have a compound index on year,month,sourcedxcc. Here’s the EXPLAIN output:

root@:26257/clublog> explain DELETE FROM propagation WHERE year=‘2018’ AND month=‘1’ AND sourcedxcc=‘376’
-> ;
±---------------------±------±--------------------------------------+
| Tree | Field | Description |
±---------------------±------±--------------------------------------+
| delete | | |
| │ | from | propagation |
| └── render | | |
| └── index-join | | |
| ├── scan | | |
| │ | table | propagation@year_month_sourcedxcc_idx |
| │ | spans | /2018/1/376-/2018/1/377 |
| └── scan | | |
| | table | propagation@primary |
±---------------------±------±--------------------------------------+
(9 rows)

Time: 21.531196ms

Any ideas what might be wrong? I feel this can’t be correct in a such a small set of test data. Is there anything else I can run to see what is going on?

I was also wondering if there is a better way to post code snippets here.

Best wishes
Michael

Table structure:

root@:26257/clublog> show create table propagation;
±------------±-------------------------------------------------------------------------+
| Table | CreateTable |
±------------±-------------------------------------------------------------------------+
| propagation | CREATE TABLE propagation ( |
| | |
| | sourcedxcc INT NOT NULL, |
| | |
| | bandid INT NOT NULL, |
| | |
| | year INT NOT NULL, |
| | |
| | month INT NOT NULL, |
| | |
| | hour INT NOT NULL, |
| | |
| | zone INT NOT NULL, |
| | |
| | sourcezone INT NOT NULL, |
| | |
| | qsl INT NOT NULL, |
| | |
| | dxcc INT NOT NULL, |
| | |
| | sfi INT NULL, |
| | |
| | INDEX sourcezone_idx (sourcezone ASC), |
| | |
| | INDEX sourcedxcc_idx (sourcedxcc ASC), |
| | |
| | INDEX dxcc_idx (dxcc ASC), |
| | |
| | INDEX zone_idx (zone ASC), |
| | |
| | INDEX year_month_sourcedxcc_idx (year ASC, month ASC, sourcedxcc |
| | ASC), |
| | |
| | FAMILY “primary” (sourcedxcc, bandid, year, month, hour, zone, |
| | sourcezone, qsl, dxcc, sfi, rowid) |
| | |
| | ) |
±------------±-------------------------------------------------------------------------+
(1 row)

Hi Michael,

I don’t know what’s going on with your query… What version of CRDB are you running? Would you mind trying with the latest 2.0 beta?
Another thing to try is to put a limit on your DELETE and see if that helps.

Hi Andrei

Having looked at it every which way, I’m now pretty sure it will never complete - perhaps something is corrupt, deadlocked or similar? I wish I knew a little more about how to investigate it. Are there any other commands I can use via sql, etc to find out what it’s doing?

It’s v2.0-beta.20180319 by the way.

Thanks,
Michael

I realise this is not entirely a satisfactory result, but I can report that the problem was resolved by dropping and re-adding all the indexes. I will do my best to make a reproducible test case for you.

Best wishes
Michael

Very interesting… If you manage to repro, please let us know…

I should have said something before: is it possible that, when the DELETE was hanging, there was some other transaction open that had written to the rows being deleted? An open transaction that wrote something would block other overlapping writers until its committed/rolled back.

Hi Andrei

It wasn’t a transaction that got stuck (I did have a look for one of course). In my script, I was iterating over months and years, and I found it worked correctly - and repeatedly - until it got to this particular query. So, perhaps just that span (/2018/1/376-/2018/1/377) was involved in whatever the issue was? Prior to this point, I had run the script - found it was going too slowly - and dropped a few other indexes on the propagation table. Then I added the single compound index. I did this via the sql interface and in both cases (dropping, adding indexes) the process was pretty snappy. I haven’t identified anything odd in cockroach.log yet. The only other thing I can think of is that I moved to the 2.0 beta binary (so the database was actually created under the 1.1 binary, but this test was on the 2.0 beta binary).

I will keep looking for something reproducible for you, Andrei.

Best wishes
Michael