Creating index with geometry and other non-spatial data

Hi,

Is there any way to create an index that includes both spatial data and non-spatial data? For example, I have this table:

create table "location"
(
    id    uuid primary key,
    timestamp    timestamptz,
    point    geometry,
    name    string
)

And I would like to search for points that are in an area provided in the query within a time range. Can I simply create an index with timestamp and point, or do I have to separate both of them into two indexes? Thank you.

Hello!

It’s possible to create multi-column inverted indexes where the point column will be after the timestamp column (e.g., Inverted Indexes | CockroachDB Docs). However, the primary use case is queries where only a few constant values can be provided for the columns preceding the geo column (like regional by row). An INVERTED INDEX (timestamp, point) will only help if timestamp is held constant. If querying for a range of timestamps, this index can’t be used.

Your best bet at the moment is to create an index on the column where the filter is most selective - or create an index on both and the optimizer will choose the best index based on selectivity. Both indexes won’t be used in the same query, though. Which is fine - it likely wouldn’t help.

Do you expect most of the cardinality reduction to be due to the timestamp predicate or the point predicate?

Hi, thanks for the quick reply!

There will be > 10M rows inserted into the table in production every day, so I expect the queries will mainly be filtered by timestamp first. You mentioned that an inverted index couldn’t be used with timestamp range, and only a single index will be used per query. So I am wondering if the following query can use two indexes…

  1. Create two indexes, one for timestamp and one for point.
  2. When query, first filtered by timestamp using CTE and called time_range.
  3. Then filtered by point in time_range.

There is a way to rewrite this query such that two indexes are used. Suppose your schema and query look like:

CREATE TABLE t (
  id UUID PRIMARY KEY,
  t TIMESTAMPTZ,
  p GEOMETRY,
  INDEX (t),
  INVERTED INDEX (p)
);

SELECT * FROM t WHERE t > '2021-01-01' AND ST_Equals(p, 'POINT(0 0)');

You can rewrite the query to use both indexes:

SELECT * FROM t WHERE id IN (
  SELECT id FROM t@t_t_idx WHERE t > '2021-01-01'
  INTERSECT
  SELECT id FROM t@t_p_idx WHERE ST_Equals(p,'POINT(0 0)')
);

You can see that the resulting query plan utilizes both indexes:

  • lookup join
  │ estimated row count: 0
  │ table: t@primary
  │ equality: (k) = (k)
  │ equality cols are key
  │
  └── • intersect
      │ estimated row count: 0
      │
      ├── • scan
      │     estimated row count: 1 (100% of the table; stats collected 6 minutes ago)
      │     table: t@t_t_idx
      │     spans: [/'2021-01-01 00:00:00.000001+00:00' - ]
      │
      └── • filter
          │ estimated row count: 0
          │ filter: st_equals(p, '010100000000000000000000000000000000000000')
          │
          └── • index join
              │ estimated row count: 0
              │ table: t@primary
              │
              └── • inverted filter
                  │ inverted column: p_inverted_key
                  │ num spans: 31
                  │
                  └── • scan
                        estimated row count: 0 (11% of the table; stats collected 6 minutes ago)
                        table: t@t_p_idx
                        spans: 31 spans

Note that this isn’t guaranteed to be more efficient, so you’ll have to experiment. It’s also potentially dangerous because it will prevent the optimizer from falling back to using a single index which could be more efficient with different constants in the filter, or as the dataset changes over time.

The optimizer doesn’t currently consider this transformation, but ideally it would try to perform this rewrite automatically. This would allow the optimizer to compare this alternate amongst other query plans that use a single index on a per-query basis. I’ve created an issue to track this optimizer feature: opt: split conjunctions into PK intersections of two constrained index scans · Issue #72886 · cockroachdb/cockroach · GitHub

1 Like