How does CockroachDB scale


#1

I have played a bit with CockroachDB and build a 3 node cluster to test write performance.

One simple script doing one line inserts in a loop. (I know multilines is preferd, but thats not the use case)
For my system I get the following
1x thread per node, on 1x node =~ 220 one line inserts /s
1x thread per node, on 3x node =~ 400 one line inserts /s

2x thread per node, on 1x node =~ 320 one line inserts /s
2x thread per node, on 3x node =~ 600 one line inserts /s

8x thread per node, on 1x node =~ 800 one line inserts /s
8x thread per node, on 3x node =~ 1800 one line inserts /s

32x thread per node, on 1x node =~ 2000 one line inserts /s
32x thread per node, on 3x node =~ 3000 one line inserts /s

64x thread per node, on 1x node =~ 3000 one line inserts /s
64x thread per node, on 3x node =~ 3000 one line inserts /s

my question is, how does cockroachDB scale for writes?
as you can see at least on my systems it does not matter if I used 1 or 3 nodes to make inserts.
the maximum is 3000/s

Should the write speed go up with more nodes?
or will it stay flat and for writes number of nodes is irrelevant?
eg. if I have 30 nodes instead of 3, will i get seomthing like ~30k inserts/s?


(Raphael 'kena' Poss) #2

Thanks Dave for your interest in CockroachDB!

How much scale you can get depends on whether your writes cause contention.

If the values in the indexed columns (either in the primary key or the secondary indexes) are “close to each other” (high data locality, e.g. a timestamp column) then the writes will be contended and scalability will be limited by the processing speed (CPU+disk) of a single node. Then scalability must be achieved vertically.

If the values written to indexed columns are more uniformly distributed (e.g. a UUID column, e-mail, names in non-alphabetical order, etc, or a SERIAL column which has a special behavior in CockroachDB), then the writes are not contended and the scalability can happen horizontally by increasing the number of nodes.


(Radu Berinde) #3

Dave, are you using the default zone config? The default is 3x replication; so it would be no surprise that 3 nodes are no better for writes than 1 (we are effectively doing 3 writes instead of 1). If you want a fair comparison, either change the zone config to 1 replica, or compare 3 nodes to 6.


#4

Is there anywhere we can find more documentation on write contention and how to avoid it? I’m confused how a timestamp column can cause write contention.


(Raphael 'kena' Poss) #5

If you populate a timestamp column with the “current time” (e.g. via now()) then two successive writes will have two values close to each other. When CockroachDB projects the SQL value into KV, the key representations become also close to each other.

Then consider that CockroachDB splits the key space into ranges. Writes to different ranges can be processed independently, but writes to the same range are serialized: two nodes writing to the same range will send their write command to a single node (the raft leader for the range) to “resolve” the write.

So if you write values logically close to each other in the same table from separate nodes, they will land in the same KV range, and the write will be processed (resolved) on a single node. This is called contention: with more nodes attempting to write to the same range, the performance becomes limited to the maximum throughput of a single node.


#6

Is this only when the timestamp is an index key? Or does it happen even if timestamp is a regular data field.


(Rebecca Taft) #7

It should only matter if there is a primary or secondary index on the timestamp field.


(tim page) #8

This post was flagged by the community and is temporarily hidden.