Rebalancing replicas of the same table across all stores on a node

Setup: a 3-node cluster in which each node has two stores, for 6 stores in total.

I created a single table and have been inserting data into it continuously; so far the table has 1300+ ranges and 0.3 billion rows. As the number of concurrent DB connections grows, latency grows.
I found that all replicas of this table are located on only 3 stores, one store on each node, and the performance bottleneck is the write bandwidth of the SSDs.

So, does CockroachDB support distributing replicas of the same table across all stores on all nodes?


Hi @wudi, you’ve hit an unfortunate known edge case in our rebalancing system. We never put two replicas for the same range on the same node (even if the node has more than one store) for availability reasons. We wouldn’t want one node going down to make an entire range unavailable.

As a result, we can’t properly rebalance in a cluster that only has 3 nodes, no matter how many stores those nodes have, because there’s no way for us to move a replica from one store on a node to another without at some point exposing the cluster to the risk of a single node failure causing unavailability for the range being moved.

If you were to add a fourth node, everything would rebalance evenly across all your stores, because everything would be able to rebalance through it. Does that make sense?
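
To make the constraint concrete, here is a toy model in Python. It is not CockroachDB's allocator, just a sketch under two assumptions: a replica is moved by adding the new copy before removing the old one, and a store only qualifies as a target if its node holds no replica of that range yet. The function name `rebalance_targets` and the store/node numbering are made up for illustration.

```python
from typing import Dict, List, Set


def rebalance_targets(store_to_node: Dict[int, int], replica_stores: Set[int]) -> List[int]:
    """Return the stores that could take a new replica of a range without
    putting two replicas of that range on the same node (toy model)."""
    # Nodes that already hold a replica of this range.
    used_nodes = {store_to_node[s] for s in replica_stores}
    # Any store on one of those nodes is ruled out, including the idle ones.
    return sorted(s for s, n in store_to_node.items() if n not in used_nodes)


# Hypothetical layout: store ID -> node ID, two stores per node.
three_nodes = {1: 1, 2: 1, 3: 2, 4: 2, 5: 3, 6: 3}
four_nodes = {**three_nodes, 7: 4, 8: 4}

# A range with one replica on each of the first three nodes.
replicas = {1, 3, 5}

print(rebalance_targets(three_nodes, replicas))  # []
print(rebalance_targets(four_nodes, replicas))   # [7, 8]
```

With three nodes every node already holds a replica, so the idle stores 2, 4 and 6 never qualify as targets. With a fourth node there is somewhere to put the new copy first, which is what lets replicas shuffle between stores afterwards.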


Yeah, never putting two replicas of the same range on the same node makes sense to me, but one table may have lots of ranges, so why can’t different ranges be located on different stores on the same node?

Suppose we have three nodes: Node1 (with store1 and store2), Node2 (with store3 and store4), and Node3 (with store5 and store6), and a table named T1 that has two ranges, R1 and R2. I think that placing the replicas of R1 on store1, store3, store5 and the replicas of R2 on store2, store4, store6 would be reasonable. However, in my benchmark, the replicas of both R1 and R2 are located on store1, store3, store5.

The only issue I’m aware of is the one I explained in my original response. Ranges from the same table can be located on different stores on the same node; they just can’t be properly rebalanced in a cluster that only contains 3 nodes.
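
As a small illustration of the R1/R2 example above, the sketch below checks both layouts against the "no two replicas of a range on one node" rule and counts replicas per store. It is a toy model written for this thread, not anything from CockroachDB; `placement_is_valid`, `replicas_per_store`, and the store-to-node map are hypothetical names.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical layout from the example: store ID -> node ID.
STORE_TO_NODE = {1: 1, 2: 1, 3: 2, 4: 2, 5: 3, 6: 3}


def placement_is_valid(replica_stores: List[int]) -> bool:
    """True if no two replicas of the range share a node."""
    nodes = [STORE_TO_NODE[s] for s in replica_stores]
    return len(nodes) == len(set(nodes))


def replicas_per_store(placements: Dict[str, List[int]]) -> Dict[int, int]:
    """Count how many replicas each store holds across all ranges."""
    counts = Counter(s for stores in placements.values() for s in stores)
    return {s: counts.get(s, 0) for s in sorted(STORE_TO_NODE)}


# The layout proposed in the question, and the one seen in the benchmark.
proposed = {"R1": [1, 3, 5], "R2": [2, 4, 6]}
observed = {"R1": [1, 3, 5], "R2": [1, 3, 5]}

for name, layout in (("proposed", proposed), ("observed", observed)):
    valid = all(placement_is_valid(stores) for stores in layout.values())
    print(name, valid, replicas_per_store(layout))
```

Both layouts satisfy the per-range rule, so the proposed one is legal. The difference is only in load: the observed layout leaves stores 2, 4 and 6 empty, and in a 3-node cluster nothing can move to them after the fact.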