I would like some advice on building a multi-tenant database in cockroach db.
We have a column “tenantId” in each table, identifying the rows that are owned by a tenant. This column is the first column in our PK, followed by the id of the entity. When joining multiple tables, I assume that it would be preferable if all the data from one tenant are (as much as possible) on the same machine (range/shard?), however, reading https://github.com/cockroachdb/cockroach/blob/master/docs/design.md suggests that keys starts with database and table names, which to me makes it more likely that the data in a table is co-located, rather than the data for a tenant. If this is true, joins would often span multiple ranges, which seems inefficient.
Suppliers: tenantId, supplierId, supplierName SupplierAddresses: tenantId, supplierId, address, zipcode
Would give these keys (
<db>:<table>:<primary-key>[:<secondary-key>]) for tenants 1&2 with one supplier and one address each:
supplierdb:suppliers:1:1 supplierdb:suppliers:2:1 supplierdb:supplieraddresses:1:1 supplierdb:supplieraddresses:2:1
Assuming a range can only contain 2 key-value pairs, the above data is sharded like this:
But it would have been preferable to shard it like this:
Are my assumptions about the sharding correct? Is there any way of co-locating a tenant’s data, to improve joins? Any best practices for this kind of design in cockroachdb?