CockroachDB for Realtime Updates

Hi,
I wanted to get some input on using CockroachDB as a store for all the transactional data that is being pushed to Kafka in real time. The tables would hold less than a week or month of data, which gets updated based on the change data received from Kafka. I wanted to know a few things:

  • How frequent can the updates be? Are bulk/batch updates every few minutes OK?
  • How would the query performance be in terms of complex joins and concurrent queries?

Bulk/batch updates every few minutes are “good” insofar as you are not trying to update millions of rows at a time :slight_smile: In general we can give advice about the average and burst number of inserts/updates per unit of time, not the number of batches. If you tell us more about your workload, we can probably give you more information.
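
For reference, here is a minimal sketch of what reasonably sized batches can look like from a client. The `events(id, payload)` table and the connection parameters are assumptions, not from your setup; CockroachDB listens on port 26257 and speaks the Postgres wire protocol, so a standard driver such as psycopg2 works:

```python
# A minimal sketch, not a drop-in implementation: the table "events(id, payload)"
# and the connection parameters are hypothetical.
import psycopg2
from psycopg2.extras import execute_values

def upsert_in_batches(rows, batch_size=1000):
    """Apply a large update as a series of small transactions."""
    conn = psycopg2.connect(host="localhost", port=26257, dbname="defaultdb",
                            user="root", sslmode="disable")  # insecure local cluster
    try:
        with conn.cursor() as cur:
            for i in range(0, len(rows), batch_size):
                chunk = rows[i:i + batch_size]
                # UPSERT is CockroachDB's insert-or-update statement.
                execute_values(cur,
                               "UPSERT INTO events (id, payload) VALUES %s",
                               chunk)
                conn.commit()  # one small transaction per chunk
    finally:
        conn.close()
```

Keeping each transaction to a bounded chunk avoids the multi-million-row statements I warned about above.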

The same response also applies to query performance: “it depends”. There is not enough information in your question to say for sure. CockroachDB can support complex joins and concurrent queries very well in some cases, not in others.

Hi Knz,
Thanks for the feedback :grin:. The workload is as follows: there are around 100 tables which I am planning to keep in CockroachDB as warm data for the last 1 week to 1 month. For each of these tables there is a separate processor which reads from a specific Kafka topic with a moving window of 15-30 minutes, and the order of inserts/updates per table would be around 10K-100K records. Additionally, these tables would be used in near real time, multiple times, for doing some analytics and generating reports. Hence, I wanted to know if CockroachDB is the right candidate for this?

Did you just write that there are 100 tables and you are inserting 10,000 to 100,000 records into each of them every minute? That is between 1 and 10 million rows updated every minute. Remember what I wrote first above :slight_smile:
There are probably ways to set up your cluster and your schema so that this works properly with CockroachDB, but this workload would be stretching CockroachDB’s ability to “make data easy”.

Now on the other hand, if each table contains 10K-100K records in total and only a fraction is updated every minute, then it’s not a problem at all.

No, not every minute :sweat_smile:. Some tables have millions of records; the 10K-100K range I shared is the fraction that either updates some of those millions of records or just gets inserted as a batch every 15/30 minutes.

100K rows updated every 15 minutes per table sounds reasonable. The way to achieve this is to distribute the writes using concurrent connections to multiple nodes simultaneously. I’m also assuming the selected primary key allows the writes to be distributed (i.e. it enables scattering).

By distribute you mean, if I have 4 nodes for 100K records, each node gets a micro-batch of 25K, right? These tables are all mainly transactional data from different MySQL instances; most of them just have an AUTO_INCREMENT or some transaction id which is always unique, and I am creating the tables similarly in CockroachDB. Since I am new to CockroachDB, kindly share some leads, if possible, which can be useful for my use case.

Yes, I meant indeed that if you have 4 nodes and want to insert 100K records, you arrange your clients to send approximately 25K rows through each node. You can probably speed up the inserts a little more by opening multiple connections to each node.
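
To illustrate, a minimal sketch of spreading one batch across the nodes with concurrent connections, assuming four hypothetical node hostnames and the same hypothetical `events(id, payload)` table as in the earlier snippet:

```python
# A minimal sketch: node addresses and the events table are assumptions.
# Each node gets its own connection and roughly a quarter of the batch.
from concurrent.futures import ThreadPoolExecutor
import psycopg2
from psycopg2.extras import execute_values

NODES = ["node1.example.com", "node2.example.com",
         "node3.example.com", "node4.example.com"]

def write_through_node(host, rows):
    conn = psycopg2.connect(host=host, port=26257, dbname="defaultdb",
                            user="root", sslmode="disable")
    try:
        with conn.cursor() as cur:
            execute_values(cur,
                           "UPSERT INTO events (id, payload) VALUES %s",
                           rows)
        conn.commit()
    finally:
        conn.close()

def distributed_upsert(rows):
    """Split ~100K rows into one chunk per node and write them concurrently."""
    per_node = -(-len(rows) // len(NODES))  # ceiling division
    chunks = [rows[i:i + per_node] for i in range(0, len(rows), per_node)]
    with ThreadPoolExecutor(max_workers=len(NODES)) as pool:
        futures = [pool.submit(write_through_node, host, chunk)
                   for host, chunk in zip(NODES, chunks)]
        for f in futures:
            f.result()  # propagate any errors
```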

As for the other question, the general idea you need to understand is that CockroachDB decides which node is responsible for a write based on the value of the primary key. If all your new rows use values that are very close to each other, they will be handled by just one node or very few, which is bad for performance. The CockroachDB equivalent of AUTO_INCREMENT is Postgres’s SERIAL type, which CockroachDB internally replaces with a suitably scattered ID generator. If you use anything else with too much value locality (for example, if you use sequential txn ids as the PK), you may encounter issues.
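
As a hedged illustration (the `orders` table name and columns are made up, not from your schema), one way to handle tables imported from MySQL is to use a scattered key as the primary key and keep the source’s sequential id as a secondary UNIQUE column:

```python
# A minimal sketch of a schema whose primary key scatters writes, assuming a
# hypothetical "orders" table mirroring a MySQL source. gen_random_uuid() is a
# CockroachDB built-in; the sequential MySQL id is kept as a UNIQUE secondary
# column so lookups by it remain possible.
import psycopg2

conn = psycopg2.connect(host="localhost", port=26257, dbname="defaultdb",
                        user="root", sslmode="disable")
with conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS orders (
            id           UUID DEFAULT gen_random_uuid() PRIMARY KEY, -- scattered
            mysql_txn_id BIGINT NOT NULL UNIQUE,  -- source's sequential txn id
            payload      JSONB
        )
    """)
conn.commit()
conn.close()
```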

Oh that’s great, thanks for your time :grin:, will get back if I encounter any issues.
