CRDB for real-time time-series workloads


I currently have a workload of more than 10k rps of data-point inserts.
Here is an example of the table's columns:

INSERT INTO data VALUES (timestamp, user_id, event_id, amount)

All data arrives from Kafka in real time (timestamp is always close to now()).
There is also a read workload (100-500 rps) of range queries like:

SELECT SUM(amount) FROM data WHERE timestamp > now() - INTERVAL '24 hours'

Is this a suitable workload for CRDB?

As I understand it, there is no way to use range-based sharding here, because all writes go to a single shard (the same hotspot problem exists in Spanner). On the other hand, all reads are efficient, because the timestamp column is first in the primary key.

So, is there any way to get both: efficient writes (spread across multiple shards) and reads that still use the index?
The first thing that comes to mind is a hidden column that splits the data into ranges during writes but is not part of the key the reads filter on, so each range can still be read efficiently.
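To make the idea concrete, here is a rough sketch of how such a manual bucket column might look. This is an assumption on my part, not tested: the bucket count (8), the use of the fnv32 built-in for hashing, and the fan-out read are all illustrative choices.

```sql
-- Hypothetical sketch: a stored computed "bucket" column spreads
-- sequential-timestamp writes across 8 ranges instead of one.
CREATE TABLE data (
    bucket INT AS (fnv32(user_id::STRING) % 8) STORED,
    timestamp TIMESTAMPTZ,
    user_id INT,
    event_id INT,
    amount DECIMAL,
    PRIMARY KEY (bucket, timestamp, user_id, event_id)
);

-- The cost: every read must enumerate all buckets so the
-- (bucket, timestamp) prefix of the primary key is usable.
SELECT SUM(amount)
FROM data
WHERE bucket IN (0, 1, 2, 3, 4, 5, 6, 7)
  AND timestamp > now() - INTERVAL '24 hours';
```

The drawback is that the bucket list has to be kept in sync with the bucket count by hand, which is exactly the bookkeeping I was hoping the database could hide.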


Hi @kbespalov

This blog post describes CockroachDB’s native support for creating hash sharded indexes as a way to significantly improve performance on sequential workloads.

CockroachDB uses range partitioning by default, as opposed to hash partitioning. Hash sharded indexes allow users to use hash partitioning for any table or index in CockroachDB.

In 20.1, CockroachDB will introduce the following syntax to natively support hash-sharded indexes and thereby achieve hash partitioning. The syntax was inspired by SQL Server and Postgres:

USING HASH WITH BUCKET_COUNT = <number of buckets>
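For your table, a sketch of what this could look like (assuming a 20.1 cluster; the bucket count of 8 is an illustrative choice, and in 20.1 the feature is gated behind a session setting):

```sql
-- Enable the (experimental in 20.1) hash-sharded index feature.
SET experimental_enable_hash_sharded_indexes = on;

-- A hash-sharded primary key: CockroachDB maintains a hidden shard
-- column for you, spreading timestamp-ordered inserts across 8 buckets.
CREATE TABLE data (
    timestamp TIMESTAMPTZ,
    user_id INT,
    event_id INT,
    amount DECIMAL,
    PRIMARY KEY (timestamp, user_id, event_id) USING HASH WITH BUCKET_COUNT = 8
);
```

Range queries on timestamp continue to work unchanged, because the optimizer fans the scan out across the hidden shard column automatically rather than requiring you to enumerate buckets in the query.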

Hope this info helps!

Technical Support Engineer