Regional by row table slow on non-unique index queries

I have a 3 region 9 nodes cluster and a regional by row table named todos with a primary varchar column called id and a non-unique varchar column called group_id. I created a secondary index on group_id and SHOW INDEXES FROM todos looks like this:

  table_name |     index_name     | non_unique | seq_in_index | column_name | direction | storing | implicit
-------------+--------------------+------------+--------------+-------------+-----------+---------+-----------
  todos      | primary            |   false    |            1 | crdb_region | ASC       |  false  |   true
  todos      | primary            |   false    |            2 | id          | ASC       |  false  |  false
  todos      | todos_group_id_idx |    true    |            1 | crdb_region | ASC       |  false  |   true
  todos      | todos_group_id_idx |    true    |            2 | group_id    | ASC       |  false  |  false
  todos      | todos_group_id_idx |    true    |            3 | id          | ASC       |  false  |   true

To my surprise, although executed from the same region where the rows are created, SELECT * FROM todos WHERE id = _ is very fast while SELECT * FROM todos WHERE group_id = _ is very slow (reaching other regions).

explain select * from todos where id = 'abc':

------------------------------------------------------------------------------------------------------------------------------
  distribution: local
  vectorized: true

  • union all
  │ estimated row count: 1
  │ limit: 1
  │
  ├── • scan
  │     estimated row count: 1 (33% of the table; stats collected 9 minutes ago)
  │     table: todos@primary
  │     spans: [/'gcp-asia-east2'/'abc' - /'gcp-asia-east2'/'abc']
  │
  └── • scan
        estimated row count: 1 (33% of the table; stats collected 9 minutes ago)
        table: todos@primary
        spans: [/'gcp-europe-west4'/'abc' - /'gcp-europe-west4'/'abc'] [/'gcp-us-central1'/'abc' - /'gcp-us-central1'/'abc']
(16 rows)

Time: 47ms total (execution 1ms / network 45ms)

explain select * from todos where group_id = 'abc';:

------------------------------------------------------------------------------------
  distribution: local
  vectorized: true

  • filter
  │ estimated row count: 0
  │ filter: group_id = 'abc'
  │
  └── • scan
        estimated row count: 3 (100% of the table; stats collected 10 minutes ago)
        table: todos@primary
        spans: [/'gcp-asia-east2' - /'gcp-us-central1']
(11 rows)

Time: 48ms total (execution 1ms / network 46ms)

Is this because the group_id column is not unique? Any related documentation on this?

Hi @thezjy

Tech writer here. We do have some docs on this that I think will answer your question

On the ADD CONSTRAINT docs page we have a section Add a unique index to a REGIONAL BY ROW table, which has info about how indexes work on these tables. It also calls out that a unique index can allow the optimizer to limit the search to data on machines in the local region:

Note that there is a performance benefit for queries that select a single row (e.g., SELECT * FROM users WHERE email = 'anemailaddress@gmail.com' ). If 'anemailaddress@gmail.com' is found in the local region, there is no need to search remote regions. This feature, whereby the SQL engine will avoid sending requests to nodes in other regions when it can read a value from a unique column that is stored locally, is known as locality optimized search .

Therefore I think the answer to your question

Is this because the group_id column is not unique?

is yes.