CockroachDB for Event Store


(Hyun Min Choi) #1

Hi,

I’m currently developing a backend application using event sourcing and CQRS, and am considering CockroachDB as its database. Like many event-sourced applications, we store each event as a BLOB in its own row, and periodically save an entity snapshot, roughly once every 50 persisted events. A blob may be as large as 10 kilobytes, and we expect 0.5 million concurrent users, each persisting approximately 0.5 events/sec. (Of course, this is a really wild guess.)
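For reference, the load described above works out as follows. This is a back-of-envelope sketch assuming exactly the figures in this post: 500,000 users, 0.5 events/s each, one snapshot per 50 events, and event blobs of at most 10 KB (taking 1 KB = 1,000 bytes). Snapshot sizes aren't given, so they are left out of the byte estimate.

```python
# Back-of-envelope write load using the (admittedly rough) figures above.
CONCURRENT_USERS = 500_000
EVENTS_PER_USER_PER_SEC = 0.5
EVENTS_PER_SNAPSHOT = 50        # "a snapshot ... every 50 persisted events or so"
MAX_EVENT_BYTES = 10 * 1000     # assumed worst case: 10 KB per event blob


def write_load(users, events_per_user, events_per_snapshot, max_event_bytes):
    """Return (events/s, snapshots/s, total inserts/s, worst-case event bytes/s)."""
    events = users * events_per_user
    snapshots = events / events_per_snapshot
    return events, snapshots, events + snapshots, events * max_event_bytes


events, snapshots, inserts, bytes_per_sec = write_load(
    CONCURRENT_USERS, EVENTS_PER_USER_PER_SEC, EVENTS_PER_SNAPSHOT, MAX_EVENT_BYTES
)
print(f"events/s:    {events:,.0f}")     # 250,000
print(f"snapshots/s: {snapshots:,.0f}")  # 5,000
print(f"inserts/s:   {inserts:,.0f}")    # 255,000
print(f"event payload, worst case: {bytes_per_sec / 1e9:.1f} GB/s")  # 2.5 GB/s
```

Snapshots add only about 2% to the insert count; the event journal dominates both row count and bytes written.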

In this case, would CockroachDB be a good option for the event store? I’m not hoping for unbelievable latency numbers, and will be happy if P50 latency stays below 12 ms or so. So far, from what I understand of the design from the docs, it seems really promising: scaling out the cluster looks like a no-brainer, and CockroachDB’s consistency guarantees seem like a good reason to use it as an event store.

Thanks!


(nathan) #2

Hi,

Thanks for the interest in CockroachDB! Without hard numbers and a full spec of your desired cluster topology, it’s tough to predict exactly what kinds of latency figures you should be able to expect from Cockroach. For instance, how do you plan to distribute the nodes within your cluster? Will there be substantial inter-node latency?

Still, given a reasonable schema and typical access patterns, your plan to use Cockroach as an event store sounds well within the kinds of workloads CockroachDB handles well. From the rough numbers you gave, you’ll be writing about 5000 entities per second, which Cockroach should be able to handle without issue.


(Hyun Min Choi) #3

Hi Nathan, thanks for the reply!

I was considering running CockroachDB on AWS (or maybe GCP), with 10 to 20 i3.4xlarge nodes inside a VPC. Do you have a rough guess of what the latency would look like? And what if we ran Cockroach in a multi-VPC topology, say with some instances in Tokyo, Seoul, Oregon, and so on (which I guess would add some inter-node latency)?


(Hyun Min Choi) #4

And by the way, we will be persisting entity snapshots along with events (snapshots are taken periodically, and both events and snapshots are stored), so I’m estimating about 5,000 * 50 = 250,000 inserts per second.

Our schema is going to be quite simple. The journal table holds the persisted events, and the snapshot table holds the persisted snapshots.

DROP TABLE IF EXISTS journal;

CREATE TABLE IF NOT EXISTS journal (
    ordering BIGSERIAL,
    persistence_id VARCHAR(255) NOT NULL,
    sequence_number BIGINT NOT NULL,
    deleted BOOLEAN DEFAULT FALSE,
    tags VARCHAR(255) DEFAULT NULL,
    message BYTEA NOT NULL,
    PRIMARY KEY(persistence_id, sequence_number)
);

CREATE UNIQUE INDEX journal_ordering_idx ON journal(ordering);

DROP TABLE IF EXISTS snapshot;

CREATE TABLE IF NOT EXISTS snapshot (
    persistence_id VARCHAR(255) NOT NULL,
    sequence_number BIGINT NOT NULL,
    created BIGINT NOT NULL,
    snapshot BYTEA NOT NULL,
    PRIMARY KEY(persistence_id, sequence_number)
);
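To make the roles of the two tables concrete, here is an in-memory sketch (not CockroachDB client code) of how an event-sourced application typically uses them. All class and function names are hypothetical. The dictionary keys mirror PRIMARY KEY(persistence_id, sequence_number), so reusing a sequence number fails the way a primary-key violation on INSERT would, which is what lets the journal act as an optimistic-concurrency check. Recovery loads the latest snapshot and replays only the journal rows written after its sequence_number.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Tuple

SNAPSHOT_EVERY = 50  # assumed policy: "a snapshot ... every 50 persisted events"


class ConcurrencyConflict(Exception):
    """Hypothetical error standing in for a PRIMARY KEY violation on insert."""


@dataclass
class InMemoryEventStore:
    """Hypothetical stand-in for the journal and snapshot tables."""
    # Both dicts are keyed like PRIMARY KEY(persistence_id, sequence_number).
    journal: Dict[Tuple[str, int], bytes] = field(default_factory=dict)
    snapshots: Dict[Tuple[str, int], Any] = field(default_factory=dict)

    def append(self, persistence_id: str, sequence_number: int, message: bytes) -> None:
        key = (persistence_id, sequence_number)
        if key in self.journal:  # another writer already used this sequence number
            raise ConcurrencyConflict(key)
        self.journal[key] = message

    def save_snapshot(self, persistence_id: str, sequence_number: int, state: Any) -> None:
        self.snapshots[(persistence_id, sequence_number)] = state

    def latest_snapshot(self, persistence_id: str) -> Optional[Tuple[int, Any]]:
        seqs = [seq for (pid, seq) in self.snapshots if pid == persistence_id]
        if not seqs:
            return None
        latest = max(seqs)
        return latest, self.snapshots[(persistence_id, latest)]

    def events_after(self, persistence_id: str, after_seq: int) -> List[Tuple[int, bytes]]:
        # Mirrors: WHERE persistence_id = ? AND sequence_number > ? ORDER BY sequence_number
        return sorted(
            (seq, msg)
            for (pid, seq), msg in self.journal.items()
            if pid == persistence_id and seq > after_seq
        )


def persist(store, persistence_id, sequence_number, message, state_after):
    """Append one event, then snapshot the resulting state every SNAPSHOT_EVERY events."""
    store.append(persistence_id, sequence_number, message)
    if sequence_number % SNAPSHOT_EVERY == 0:
        store.save_snapshot(persistence_id, sequence_number, state_after)


def recover(store, persistence_id, initial, apply: Callable[[Any, bytes], Any]):
    """Rebuild state from the latest snapshot plus the events written after it."""
    snap = store.latest_snapshot(persistence_id)
    last_seq, state = (0, initial) if snap is None else snap
    for seq, message in store.events_after(persistence_id, last_seq):
        state = apply(state, message)
        last_seq = seq
    return last_seq, state


# Usage: a hypothetical counter entity where every event is b"+1".
store = InMemoryEventStore()
count = 0
for seq in range(1, 121):
    count += 1
    persist(store, "counter-1", seq, b"+1", count)

last_seq, state = recover(store, "counter-1", 0, lambda s, m: s + int(m.decode()))
print(last_seq, state)  # 120 120: snapshot at 100, then 20 events replayed

conflict_detected = False
try:
    store.append("counter-1", 120, b"+1")  # stale writer reuses sequence number 120
except ConcurrencyConflict:
    conflict_detected = True
```

With one snapshot per 50 events, recovery never replays more than 49 journal rows per entity, which keeps reads short even as the journal grows.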

Most of the inserts will go to the journal table, and since the insert rate is likely to be quite high, I’m wondering whether a cluster of more than 10 i3.4xlarge-class nodes will give us the performance we need.
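Whether more than 10 i3.4xlarge nodes are enough is ultimately a benchmarking question, but the replication arithmetic gives a sense of scale. This sketch assumes CockroachDB’s default replication factor of 3, writes spread evenly across nodes, and roughly 255,000 inserts/s (the 250,000 event inserts above plus about 5,000 snapshots/s from one snapshot per 50 events). It ignores the extra key written for journal_ordering_idx, Raft log writes, and compaction, so real per-node work would be higher.

```python
# Back-of-envelope per-node write load. Assumptions, not measurements:
# default replication factor of 3 and a perfectly even spread of writes.
# Ignores secondary-index keys, Raft log writes, and compaction.
REPLICATION_FACTOR = 3
TOTAL_INSERTS_PER_SEC = 255_000  # ~250,000 events/s + ~5,000 snapshots/s


def replica_writes_per_node(inserts_per_sec, nodes, replication_factor=REPLICATION_FACTOR):
    """Row writes each node applies per second, counting every replica copy."""
    return inserts_per_sec * replication_factor / nodes


for nodes in (10, 15, 20):
    per_node = replica_writes_per_node(TOTAL_INSERTS_PER_SEC, nodes)
    print(f"{nodes} nodes: ~{per_node:,.0f} replicated row writes/s per node")
# Prints ~76,500, ~51,000 and ~38,250 for 10, 15 and 20 nodes respectively.
```

These figures say nothing about latency; whether a single i3.4xlarge can sustain that rate for this schema is something only a realistic benchmark can answer.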

Once again, thanks for your help!


(nathan) #5

Thanks! The schema looks pretty straightforward and is certainly one that Cockroach can handle.

In terms of provisioning and making latency and throughput estimates, that part is always a bit harder to know in advance. I reached out to one of our sales engineers who has experience making these kinds of assessments. He mentioned that we have a simulator to provide estimates and that he’d be happy to help. Specifically, he would like to set up a call to discuss your assumptions about client locality, data distribution, and workload, as all of these affect expected performance and necessary provisioning. Would you mind requesting an evaluation through https://www.cockroachlabs.com/pricing/sales/?


(Hyun Min Choi) #6

Hi Nathan, thanks for the confirmation!

Our team is currently discussing the characteristics a database for our product should have. After this is over, I’ll make sure to request an evaluation. Again, thanks for all your support.