Efficiently select rows based on array of ranges

I have this table. Column val is indexed.

create table public.playground (
  foo text,
  val bigint,
  rowid bigint primary key not null default unique_rowid()
);
create index val_index on playground (val);

I have a list of ranges [(min1, max1), (min2, max2), ..., (minN, maxN)] and I want to select all rows whose val column falls in any of those ranges.
E.g. my ranges look like this: [(1, 5), (20, 25), (200, 400)]. Here is a simple query that extracts the corresponding rows:

select p.*
from playground@val_index p
where (p.val between 1 AND 5) or
    (p.val between 20 and 25) or
    (p.val between 200 and 400);

It uses the index and the query is efficient, here is the output of the explain:

""	distributed	true
""	vectorized	false
index-join	""	""
 β”‚	table	playground@primary
 └── scan	""	""
""	table	playground@val_index
""	spans	/1-/6 /20-/26 /200-/401
""	filter	((val <= 5) OR ((val >= 20) AND (val <= 25))) OR (val >= 200)

But the problem is that the number of ranges is dynamic; I can have an array of 300+ ranges. (The ranges don't overlap; my app ensures that.)
Here is how I tried to rewrite the query:

select p.*
from playground@val_index p, (values (1, 5), (20, 25), (200, 400)) as r(min_val, max_val)
where p.val between r.min_val and r.max_val;

And the explain output:

""	distributed	true
""	vectorized	false
render	""	""
 └── cross-join	""	""
      β”‚	type	inner
      β”‚	pred	(val >= column1) AND (val <= column2)
      β”œβ”€β”€ values	""	""
      β”‚	size	2 columns, 3 rows
      └── index-join	""	""
           β”‚	table	playground@primary
           └── scan	""	""
""	table	playground@val_index
""	spans	ALL

As I understand it, this doesn't use the index on the val column even though I put a hint on it.
I tried the same query (without the index hint syntax) in Postgres and it did use the index:

Nested Loop  (cost=0.13..12.52 rows=2 width=44) (actual time=0.057..0.069 rows=4 loops=1)
  ->  Values Scan on "*VALUES*"  (cost=0.00..0.04 rows=3 width=8) (actual time=0.004..0.005 rows=3 loops=1)
  ->  Index Scan using val_index on playground p  (cost=0.13..4.15 rows=1 width=36) (actual time=0.016..0.017 rows=1 loops=3)
        Index Cond: ((val >= "*VALUES*".column1) AND (val <= "*VALUES*".column2))
Planning Time: 0.276 ms
Execution Time: 0.129 ms

How can I achieve the same efficiency with CockroachDB?
If the only way is to generate a BETWEEN clause for each range, is it OK to send CockroachDB such a long query (300+ ranges)?

Hi @georgysavva,

Sorry for the delay in response.

The only way to achieve the same efficiency with CockroachDB is to generate a BETWEEN clause for each range. It should be fine to send CockroachDB a long query with 300+ ranges.
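For illustration, here is a minimal sketch of how the client side could build such a query dynamically. It assumes a driver that uses %s-style placeholders (e.g. psycopg2); the helper name `build_ranges_query` is just for this example:

```python
def build_ranges_query(ranges):
    """Build a parameterized query with one BETWEEN clause per range.

    `ranges` is a list of (min, max) tuples. Returns (sql, params) suitable
    for a driver that uses %s placeholders, such as psycopg2.
    """
    if not ranges:
        raise ValueError("at least one range is required")
    # One "(p.val between %s and %s)" clause per range, joined with OR.
    clauses = " or ".join("(p.val between %s and %s)" for _ in ranges)
    # Flatten [(min1, max1), (min2, max2), ...] into [min1, max1, min2, max2, ...].
    params = [bound for rng in ranges for bound in rng]
    sql = f"select p.* from playground@val_index p where {clauses}"
    return sql, params

sql, params = build_ranges_query([(1, 5), (20, 25), (200, 400)])
```

Keeping the bounds as bind parameters (rather than interpolating them into the SQL text) avoids injection issues and keeps the statement shape stable even with hundreds of ranges.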

Regards,
Florence
Technical Support Engineer
