I admit to being a bit lazy and not empirically validating some assumptions here, but I am curious if there’s a simple answer to: how many compound or clauses will the optimizer unroll into index/PK lookups? And what the general performance practices are around choosing when to union to hit the PK vs an in clause.
Say I have the following table:
CREATE TABLE thing_labels ( id UUID NOT NULL PRIMARY KEY, key STRING NOT NULL, value STRING NOT NULL, thing_id UUID NOT NULL, -- notice this is not a unique index INDEX (key, value) );
In general which is the better query on disjointed label lookup? Is it:
SELECT thing_id FROM thing_labels WHERE (key = 'fookey' AND value = 'foovalue') OR (key = 'barkey' AND value = 'barvalue') OR (key = 'bazkey' AND value = 'bazvalue')
Or is it:
SELECT thing_id FROM thing_labels WHERE key = 'fookey' AND value = 'foovalue' UNION ALL SELECT thing_id FROM thing_labels WHERE key = 'barkey' AND value = 'barvalue' UNION ALL SELECT thing_id FROM thing_labels WHERE key = 'bazkey' AND value = 'bazvalue'
Or might the optimizer convert them to the same thing? Does that hold no matter the key/value lookup count I have? I would like a single query to be atomic w/out explicit transaction and client-side retry. My use case doesn’t support inverted index for this use case.
I am wondering if there is general best practice here? E.g. “union is better because you’ll always hit the indexes no matter how many key/values you match” or “OR clauses are better because union does an extra expensive thing” or “the cost-based optimizer is not predictable here and it really depends on table size and key/value lookup count, and there isn’t a specific number for when it switches to scanning vs index use”.