How to use index in case of range operators (>, <, etc.) with jsonb column?

Hi,

I created a table with a jsonb type column, and inserted a row like below:

create table t (
        id decimal primary key,
        j jsonb
);

create index t_idx on t using gin (j);

insert into t (id, j) values (1, '{"d":123,"s":"foobar","n":null,"b":true}');

And then query explanation is like below:

root@roach1:26257/test> explain select * from t where (j->>'d')::decimal = 123;
  tree |    field    |       description
-------+-------------+---------------------------
       | distributed | true
       | vectorized  | false
  scan |             |
       | table       | t@primary
       | spans       | FULL SCAN
       | filter      | (j->>'d')::DECIMAL = 123
(6 rows)

Time: 1.7316ms

root@roach1:26257/test> explain select * from t where (j->'d') @> '{"d":123}';
     tree    |    field    |             description
-------------+-------------+--------------------------------------
             | distributed | false
             | vectorized  | false
  index-join |             |
   │         | table       | t@primary
   │         | key columns | id
   └── scan  |             |
             | table       | t@t_idx
             | spans       | /"d"/"d"/123-/"d"/"d"/123/PrefixEnd
(8 rows)

Time: 2.9443ms

I know that it’s possible to use index in case of the EQ (=) operator with this syntax @>, but I’m not sure how to do the same for range operators like LT (<), GT (>), etc.

root@roach1:26257/test> explain select * from t where (j->>'d')::decimal >= 123;
  tree |    field    |        description
-------+-------------+----------------------------
       | distributed | true
       | vectorized  | false
  scan |             |
       | table       | t@primary
       | spans       | FULL SCAN
       | filter      | (j->>'d')::DECIMAL >= 123
(6 rows)

Time: 1.5771ms

Any suggestion? Appreciate your help!

Unfortunately, we don’t support this yet. See this issue: https://github.com/cockroachdb/cockroach/issues/24960

Thanks. Hope it’s implemented soon.

So as a workaround, I just tried to use computed column to pull out a field from the jsonb column, and index it like below:

ALTER TABLE t ADD COLUMN computed_d decimal AS ((j->>'d')::decimal) STORED;
CREATE INDEX idx_d ON t (computed_d);

But it’s still doing full scan. Is this a bug?

root@roach1:26257/test> explain select * from t where computed_d > 1;
  tree |    field    |  description
-------+-------------+-----------------
       | distributed | true
       | vectorized  | false
  scan |             |
       | table       | t@primary
       | spans       | FULL SCAN
       | filter      | computed_d > 1
(6 rows)

Time: 2.8003ms

This is the table and the indexes I have.

root@roach1:26257/test> show index from t;
  table_name | index_name | non_unique | seq_in_index | column_name | direction | storing | implicit
-------------+------------+------------+--------------+-------------+-----------+---------+-----------
  t          | primary    |   false    |            1 | id          | ASC       |  false  |  false
  t          | t_idx      |    true    |            1 | j           | ASC       |  false  |  false
  t          | t_idx      |    true    |            2 | id          | ASC       |  false  |   true
  t          | idx_d      |    true    |            1 | computed_d  | ASC       |  false  |  false
  t          | idx_d      |    true    |            2 | id          | ASC       |  false  |   true
(5 rows)

Time: 4.1263ms

root@roach1:26257/test> show create t;
  table_name |                      create_statement
-------------+--------------------------------------------------------------
  t          | CREATE TABLE t (
             |     id DECIMAL NOT NULL,
             |     j JSONB NULL,
             |     computed_d DECIMAL NULL AS ((j->>'d')::DECIMAL) STORED,
             |     CONSTRAINT "primary" PRIMARY KEY (id ASC),
             |     INVERTED INDEX t_idx (j),
             |     INDEX idx_d (computed_d ASC),
             |     FAMILY "primary" (id, j, computed_d)
             | )
(1 row)

Time: 246.9445ms

(*Updates) If it’s forced to use the index of the computed column, then it works. But the problem is, I have to force it.

root@roach1:26257/test> explain select * from t@idx_d where computed_d > 1;
     tree    |    field    |  description
-------------+-------------+----------------
             | distributed | false
             | vectorized  | false
  index-join |             |
   │         | table       | t@primary
   │         | key columns | id
   └── scan  |             |
             | table       | t@idx_d
             | spans       | /1/PrefixEnd-
(8 rows)

Time: 4.1956ms

Is there any roacher who can help answer my question above? :grin:

Isn’t it supposed to use the index without forcing it? Looks like a kind of bug to me.

I’m pretty certain that the optimizer is choosing to not use the index because it estimates that doing to lookup join to render the rows after scanning off the index is more expensive than just doing the full table scan. If you change the select * to select count(*) it will use the index.

If you insert rows and then let stats get collected, then the query will also use the index. For example:

INSERT INTO t (SELECT d::DECIMAL, json_build_object('d', d)  FROM generate_series(1, 5000, 1) as d)
CREATE STATISTICS stats FROM t;
> EXPLAIN SELECT * FROM t WHERE computed_d <= 1;
     tree    |    field     |     description
-------------+--------------+----------------------
             | distribution | local
             | vectorized   | false
  index join |              |
   │         | table        | t@primary
   │         | key columns  | id
   └── scan  |              |
             | table        | t@idx_d
             | spans        | /!NULL-/1/PrefixEnd

Ah, that’s where the optimizer kicks in. Makes sense. Thanks a lot for your answer!