Subquery evaluation on simple table structure

Hi there.

I experience a weird behavior and would like to have your opinion on it.

Given two tables, users and user_role, I’d like ho have all roles, a certain user is in (could be quite a lot because of the design of the system which does not matter here).

Table users consist of pkuserid and username (beside others) and table user_role contains only fkuserid referencing users table and pkroleid referencing the roles (in a third table that is out of interest here). Furthermore I have an index on users.username and on user_role.fkuserid.

I experienced quite long query times and investigated further.

Having the two tables joined takes approx 5 seconds!

select ur.fkuserid, ur.fkroleid 
from 
    user_role ur,
    users u
where 
    ur.fkuserid = u.pkuserid
and u.username = 'username'
limit 10000;

as well as having a subselect.

select ur.fkuserid, ur.fkroleid 
from 
    user_role@active_user_role ur
where 
    ur.fkuserid = (select u.pkuserid from users u where u.username = 'username')
limit 10000;

However, when I avoid the join by previously acquiring the user’s primary key it only takes 5ms!

select ur.fkuserid, ur.fkroleid 
from
    user_role ur
where 
    ur.fkuserid = 123456
limit 10000;

The user_role table is quite big: approx. 2 Mio entries and the users table has about 170.000 rows.

It seems to me that the subquery and the join expression gets evaluated every time for each hit in the user_role table.

Query plan: is

Level |Type   |Field      |Description                 |
------|-------|-----------|----------------------------|
0     |limit  |           |                            |
1     |render |           |                            |
2     |scan   |           |                            |
2     |       |table      |user_role@active_user_role  |
2     |       |spans      |ALL                         |
2     |       |subqueries |1                           |
3     |limit  |           |                            |
4     |render |           |                            |
5     |scan   |           |                            |
5     |       |table      |users@users_username_key    |
5     |       |spans      |/"username"-/"username\x00" |

Hi @sahlex,

Subqueries only get evaluated once per statement in CockroachDB currently. We don’t currently support what you are guessing at - that the subquery would be evaluated once per row.

My guess is that the slow part of this query is simply SELECT u.pkuserid FROM users u WHERE u.username = 'username'. Could you try running that one by itself? Perhaps you don’t have an index on users.username?

You can investigate further yourself by looking at the output of EXPLAIN on these queries.

Jordan

@jordan from the query plan, it looks like there is a users_username_key index that we’re able to use efficiently in the join. On the other hand, the join is scanning all of user_role@active_user_role, which I suspect is taking a long time. Perhaps we’re using the wrong index for some reason.

@sahlex, would you mind providing the full schema of each table? You can use SHOW CREATE TABLE <tablename> on each table to accomplish. That way, we can play around with the query on our end.

Also, to test my theory from above, could we try running the following query and see how long it takes:

select ur.fkuserid, ur.fkroleid 
from
    user_role@{FORCE_INDEX=active_user_role} ur
where 
    ur.fkuserid = 123456
limit 10000;

We don’t incorporate values from subqueries into index constraints (we plan the query before running the subquery). So with the second query as written, we will scan the entire user_role table (as the plan shows: “spans ALL”).

The feature that we’re missing here is nested-loop joins, unfortunately we don’t implement that yet (but it’s definitely on our radar).

@jordan Actually there is an index on username. The query takes only 2ms to evaluate.

@nathan when I try to execute the given query, I’m getting a syntax error:

select ur.fkuserid, ur.fkroleid
            -> from
            ->     user_role{FORCE_INDEX=active_user_role} ur
            -> where
            ->     ur.fkuserid = 123456
            -> limit 10000;
invalid syntax: statement ignored: syntax error at or near "{"
DETAIL: source SQL:
select ur.fkuserid, ur.fkroleid
from
    user_role{FORCE_INDEX=active_user_role} ur

I think it’s rather:

select ur.fkuserid, ur.fkroleid 
from
     user_role@active_user_role ur
where 
    ur.fkuserid = 123456
limit 10000;

which only takes 2ms.

@radu Well that certainly explains why it’s happening.

I work around the issue by firing two queries…
(Certainly not very elegant as it involves another round-trip to the database)

Sorry that’s my mistake, I meant user_role@{FORCE_INDEX=active_user_role}.

@sahlex could you try the following queries?

select ur.fkuserid, ur.fkroleid 
from 
    users u,
    user_role ur
where 
    ur.fkuserid = u.pkuserid
and u.username = 'username'
limit 10000;

and

select ur.fkuserid, ur.fkroleid 
from 
    users u,
    user_role@{FORCE_INDEX=active_user_role} ur
where 
    ur.fkuserid = u.pkuserid
and u.username = 'username'
limit 10000;

@nathan,
the first one takes 5-6 seconds, the second one around 4 seconds.

@sahlex could you provide the EXPLAIN output from the second one?

As @radu mentioned, the feature missing here that would make this work as expected is nested-loop joins (or “lookup joins”). This is on our radar and has recently been prioritized. See https://github.com/cockroachdb/cockroach/issues/19038.

Sure. Here you go:

Level |Type   |Field    |Description                 |
------|-------|---------|----------------------------|
0     |limit  |         |                            |
1     |render |         |                            |
2     |join   |         |                            |
2     |       |type     |inner                       |
2     |       |equality |(pkuserid) = (fkuserid)     |
3     |scan   |         |                            |
3     |       |table    |users@users_username_key    |
3     |       |spans    |/"username"-/"username\x00" |
3     |scan   |         |                            |
3     |       |table    |user_role@active_user_role  |
3     |       |spans    |ALL                         |