Sql plan, strange behavior

Thank you for this incredible database, i’ve already love it.
Just made some test, and can’t figure out some strange behavior of sql execution.
I’m testing v2.1.4 version.

I have simple table:

> show columns from speed_test;
  column_name |  data_type   | is_nullable | column_default | generation_expression |     indices     | is_hidden
+-------------+--------------+-------------+----------------+-----------------------+-----------------+-----------+
  id          | INT          |    false    | unique_rowid() |                       | {"primary","g"} |   false
  a           | VARCHAR(128) |    true     | NULL           |                       | {}              |   false
  b           | VARCHAR(128) |    true     | NULL           |                       | {}              |   false
  c           | VARCHAR(128) |    true     | NULL           |                       | {}              |   false
  d           | VARCHAR(128) |    true     | NULL           |                       | {}              |   false
  e           | VARCHAR(128) |    true     | NULL           |                       | {}              |   false
  f           | INT          |    true     | NULL           |                       | {}              |   false
  g           | INT          |    true     | NULL           |                       | {"g"}           |   false
(8 rows)

With such data:

> select * from speed_test limit 5;
          id         |     a     |     b     |     c     |     d     |     e     | f | g
+--------------------+-----------+-----------+-----------+-----------+-----------+---+---+
  427404369821859841 | field_a_1 | field_b_1 | field_c_1 | field_d_1 | field_e_1 | 1 | 1
  427404370053693441 | field_a_1 | field_b_1 | field_c_1 | field_d_1 | field_e_1 | 1 | 1
  427404370056609793 | field_a_2 | field_b_2 | field_c_2 | field_d_2 | field_e_2 | 2 | 1
  427404370056642561 | field_a_3 | field_b_3 | field_c_3 | field_d_3 | field_e_3 | 3 | 1
  427404370056675329 | field_a_4 | field_b_4 | field_c_4 | field_d_4 | field_e_4 | 4 | 1

Questions:
1 MIN, MAX queries
These queries works fast
SELECT MIN(id) FROM speed_test;
Time: 1.109965ms

SELECT MAX(id) FROM speed_test;
Time: 1.648666ms

But this one is slow
SELECT MIN(id), MAX(id) FROM speed_test;
Time: 796.550218ms

For some reason it try to use not primary index

> explain SELECT MIN(id), MAX(id) FROM speed_test;
    tree    |    field    | description
+-----------+-------------+--------------+
  group     |             |
   │        | aggregate 0 | min(id)
   │        | aggregate 1 | max(id)
   │        | scalar      |
   └── scan |             |
            | table       | speed_test@g
            | spans       | ALL

Anyway, if I will force to use primary index, it still works slowly.

SELECT MIN(id), MAX(id) FROM speed_test@primary;
Time: 902.318747ms

> explain SELECT MIN(id), MAX(id) FROM speed_test@primary;
    tree    |    field    |    description
+-----------+-------------+--------------------+
  group     |             |
   │        | aggregate 0 | min(id)
   │        | aggregate 1 | max(id)
   │        | scalar      |
   └── scan |             |
            | table       | speed_test@primary
            | spans       | ALL

2 Range query
This first query works fast

> explain SELECT id FROM speed_test@primary WHERE id BETWEEN 1 AND 500 AND ((id % 16) = 0);
  tree | field  |    description
+------+--------+--------------------+
  scan |        |
       | table  | speed_test@primary
       | spans  | /1-/500/#
       | filter | (id % 16) = 0

But this second query works very slow (in query changed 500 -> 1000 only)

> explain SELECT id FROM speed_test@primary WHERE id BETWEEN 1 AND 1000 AND ((id % 16) = 0);
  tree | field  |                   description
+------+--------+--------------------------------------------------+
  scan |        |
       | table  | speed_test@primary
       | spans  | ALL
       | filter | ((id >= 1) AND (id <= 1000)) AND ((id % 16) = 0)

Can’t get why this happening, and how can I tell database not to scan whole table in second case and use behavior from the first query

Thank you in advance

Hi @mykola,

The problem here is that when we see the individual MIN and MAX queries, we recognize that we can replace them with a simple scan of the relevant index. However, if we have both of them, we don’t recognize the slightly more involved transformation to get both of them. That’s a great observation—we should definitely be finding a better plan here. I’ve opened #35038 to track this issue.

As a temporary workaround, you can use subqueries to get the fast behaviour:

SELECT (SELECT MIN(id) FROM speed_test) AS min, (SELECT MAX(id) FROM speed_test) AS max

Thanks!
Justin

For 2: very interesting case. The problem is that we don’t (yet) collect statistics on tables. The optimizer operates on the assumption that all tables have 1000 rows. But even so, we should prefer the first plan; I filed https://github.com/cockroachdb/cockroach/issues/35042. Also note that the next major version will collect table statistics automatically.

In the meantime, you can tell the optimizer that the table is large using this:
ALTER TABLE speed_test INJECT STATISTICS ‘[
{
“columns”: [“id”],
“created_at”: “2018-01-01 1:00:00.00000+00:00”,
“row_count”: 1000000,
“distinct_count”: 1000000
}
]’

But this might affect other queries, especially if they involve other tables in conjunction with this table.

Thanks for help guys