5-node cluster, 2 nodes down = unavailable ranges?


I was doing some reliability testing and I saw that on our 5-node cluster, when 2 nodes go down, we see unavailable ranges. I would have expected the cluster to survive this without issue, as quorum is still maintained.

Is this normal? How can I resolve it?


If your cluster has a replication factor of 3, it can only tolerate 1 node going down; with a replication factor of 5 it can tolerate 2 nodes down. The default replication factor is 3.
So you can set the replication factor to 5.
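The arithmetic behind this can be sketched in shell: each range is a Raft group that needs a majority of its replicas alive, so a replication factor `rf` tolerates `(rf - 1) / 2` (integer division) node failures.

```shell
# Failures a range can tolerate = (rf - 1) / 2, since a majority
# (rf / 2 + 1 replicas) must survive for the range to stay available.
for rf in 3 5; do
  echo "rf=$rf tolerates $(( (rf - 1) / 2 )) failed node(s)"
done
```

This is why the default replication factor of 3 cannot survive 2 simultaneous node failures, regardless of cluster size.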

Thanks I’ll give it a try :slight_smile:

@kedare, you might also want to check out this Fault Tolerance training module. Slides 6 and 7 speak to your question.


I’ve been doing similar testing and have found the same issue with unavailable ranges, even though I set the replication factor to 5 for a 5-node cluster. This is against version v2.1.0-beta.20181008. I start a 5-node cluster and have 0 unavailable/under-replicated ranges. If I then stop 2 nodes at the same time, I end up with unavailable ranges. Based on the solution above, I didn’t think this was supposed to happen?

The commands I used to set the replication factor are below:

echo 'num_replicas: 5' | ./cockroach zone set .default --insecure -f -

echo 'num_replicas: 5' | ./cockroach zone set .liveness --insecure -f -

echo 'num_replicas: 5' | ./cockroach zone set .meta --insecure -f -

echo 'num_replicas: 5' | ./cockroach zone set system.jobs --insecure -f -

echo 'num_replicas: 5' | ./cockroach zone set evo --insecure -f -
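For reference, the five commands above can be condensed into a single loop. This is a dry-run sketch that only prints each command; replace `echo` with actual execution once verified against your cluster. Note that clusters of this version may have additional system zones (for example `.timeseries`) beyond those listed; if so, any zone left at `num_replicas: 3` would still lose quorum with 2 nodes down, which is one possible explanation for the behavior, so it may be worth checking the full zone list (I believe `./cockroach zone ls --insecure` listed zones in this CLI version, but verify against your binary).

```shell
# Dry-run: print the zone-set command for each zone named in the post above.
for zone in .default .liveness .meta system.jobs evo; do
  echo "echo 'num_replicas: 5' | ./cockroach zone set $zone --insecure -f -"
done
```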