Diagnosing query performance on large data set - not using index

I created a table with roughly 70 million rows from which there is a username and timestamp column that are the most frequently used columns in the where clause. Both those columns have an index created on them, but I am seeing some odd performance characteristics when I run queries that bound the timestamp data. After reading the documentation it suggested placing the column that can reduce the dataset the most first in the order of the where query, which in this case is the username, and follow that with aggregation columns that would further reduce the dataset.

However when I do a simple lookup for the username I get a fairly quick response, and when I attempt to bound the timestamp column the execution time increases dramatically. In this case from ~300ms up to 4s:

user@host:26257/test> SELECT count(*) from table1 WHERE username = 'user';
+----------+
| count(*) |
+----------+
|   103826 |
+----------+
(1 row)

Time: 293.097588ms

user@host:26257/test> SELECT count(*) from table1 WHERE username = 'user' and timestamp > '2017-11-1' and timestamp < '2017-12-31';
+----------+
| count(*) |
+----------+
|        0 |
+----------+
(1 row)

Time: 4.051862505s

These are the explain statements for each query:

user@host:26257/test> explain SELECT count(*) from table1 WHERE username = 'user';
+-------+--------+-------+--------------------------------------------------+
| Level |  Type  | Field |                   Description                    |
+-------+--------+-------+--------------------------------------------------+
|     0 | group  |       |                                                  |
|     1 | render |       |                                                  |
|     2 | scan   |       |                                                  |
|     2 |        | table | table1@table1_username_idx |
|     2 |        | spans | /"user"-/"user\x00"                      |
+-------+--------+-------+--------------------------------------------------+

user@host:26257/test> explain SELECT count(*) from table1 WHERE username = 'user' and timestamp > '2017-11-1' and timestamp < '2017-12-31';
+-------+------------+-------+--------------------------------------------------+
| Level |    Type    | Field |                   Description                    |
+-------+------------+-------+--------------------------------------------------+
|     0 | group      |       |                                                  |
|     1 | render     |       |                                                  |
|     2 | index-join |       |                                                  |
|     3 | scan       |       |                                                  |
|     3 |            | table | table1@table1_username_idx |
|     3 |            | spans | /"user"-/"user\x00"                      |
|     3 | scan       |       |                                                  |
|     3 |            | table | table1@primary                        |
+-------+------------+-------+--------------------------------------------------+
(8 rows)

Time: 26.786089ms

It appears the index for the timestamp column is not being used at all in the second query. Just wondering how I can improve the performance of this query and use the index for the timestamp column?

Hi @somecallmemike. A query won’t ever use more than a single index, so it makes sense that just the index for the user column is selected. But I also find the 4 second response unexpectedly slow.

Here are a few things to try:

  • Create a single index on both columns:
    CREATE INDEX ON table1 (user, timestamp);
    
  • Use an index hint to force the query to use the index on timestamp:
    EXPLAIN SELECT count(*) FROM table1@timestamp WHERE username = 'user' and timestamp > '2017-11-1' and timestamp < '2017-12-31';
    

Could you try those and share how they affect performance? It’d also be interesting to hear how performance changes when using just the primary index:

EXPLAIN SELECT count(*) FROM table1@primary WHERE username = 'user' and timestamp > '2017-11-1' and timestamp < '2017-12-31';

cc, @andrei and @knz, who might have other ideas.

Thanks @jesse I was just about to post that in https://www.cockroachlabs.com/docs/stable/indexes.html#selection it states that only a single index is used during a query, and I am currently waiting for the index you suggested to be created. I’ll post again when I get the compound index created.

@jesse that did the trick thank you!

user@host:26257/test> SELECT count(*) from table1 WHERE username = 'user' and timestamp > '2017-11-1' and timestamp < '2017-12-31';
+----------+
| count(*) |
+----------+
|        0 |
+----------+
(1 row)

Time: 70.442505ms

user@host:26257/test> explain SELECT count(*) from table1 WHERE username = 'user' and timestamp > '2017-11-1' and timestamp < '2017-12-31';
+-------+--------+-------+--------------------------------------------------------------------------+
| Level |  Type  | Field |                               Description                                |
+-------+--------+-------+--------------------------------------------------------------------------+
|     0 | group  |       |                                                                          |
|     1 | render |       |                                                                          |
|     2 | scan   |       |                                                                          |
|     2 |        | table | table1@table1_username_timestamp_idx               |
|     2 |        | spans | /"user"/2017-11-01T00:00:00.000001Z-/"user"/2017-12-31T00:00:00Z |
+-------+--------+-------+--------------------------------------------------------------------------+
(5 rows)

Time: 780.981µs

@jesse Would it be wise to now delete the exiting single column indexes?

Glad that helped, @somecallmemike! Sure, if you no longer expect to need those single-column indexes, you can drop them.