I’m trying to model an existing system (based on a different DB) in CockroachDB.
The current schema (fetched from CRDB UI) is:
When I do a test fill with the following SQL statement (again, from the UI):
WITH inputrows (store, object_id, bucket, bucket_id, size, disk, start, "end") AS (VALUES (_, _::BYTES, _, _, _, ARRAY[_, _], ARRAY[_, _], ARRAY[_, _]), (__more10__)) , inserted_objects AS (INSERT INTO objects(store, id, size) (SELECT DISTINCT ON (store, object_id) store, object_id, size FROM inputrows) ON CONFLICT (store, id) DO NOTHING RETURNING store, id) , inserted_bucket_objects AS (INSERT INTO bucket_objects(store, oid, bucket, id) (SELECT store, object_id, bucket, bucket_id FROM inputrows) ON CONFLICT (store, oid, bucket, id) DO NOTHING RETURNING bucket, id) INSERT INTO object_locations(store, id, disk, start, "end") (SELECT input.store, input.object_id, unnest(input.disk), unnest(input.start), unnest(input.end) FROM inputrows AS input, inserted_objects WHERE input.object_id = inserted_objects.id)
I can see pretty low response times and heavy IO on one of the nodes of a three nodes cluster:
Is there anything inherently bad with the above schema/SQL statement?
The IDs are randomized (hashlib.sha512(struct.pack(‘q’, i)).digest()[:32], where i is a counter) and I can see balanced load amongst the machines.
Yet, only one of them does this crazy IO.
Also, I’m not sure 15 seconds of response time is normal for a 10 row insert with this CTE.
The query plan is here: