Thanks Alex.
I wasn’t aware that a 3-node cluster could lose 2 nodes and still run. The docs you linked to say that an even number of nodes doesn’t give added resiliency and that “to survive two simultaneous failures, you must have five replicas.”
I suppose you mean that a 3-node cluster can withstand 2 non-simultaneous failures, but now I wonder why we don’t get a split-brain situation if one of the two remaining nodes fails. Would this not be similar to a 4-node cluster where 2 nodes fail and the cluster no longer has a quorum? My experience is with XtraDB Cluster, and those docs say that split brain occurs when half the nodes go down, even if the failures aren’t simultaneous.
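Just to make sure I’m reading the quorum rule correctly, this is the arithmetic I have in mind (my own sketch, not something from the docs):

```python
# Sketch of the quorum math as I understand it: each range needs a majority
# of its replicas alive to keep serving reads and writes.
def failures_tolerated(replication_factor: int) -> int:
    quorum = replication_factor // 2 + 1      # majority of the replicas
    return replication_factor - quorum        # replicas you can lose at the same time

for rf in (3, 4, 5):
    print(f"replication factor {rf}: survives {failures_tolerated(rf)} simultaneous failure(s)")

# replication factor 3: survives 1 simultaneous failure(s)
# replication factor 4: survives 1 simultaneous failure(s)   <- even count adds nothing
# replication factor 5: survives 2 simultaneous failure(s)
```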
If I understand you correctly, although the docs lean towards using an odd number of nodes to survive multiple simultaneous failures, using an even number of nodes still provides resiliency as long as the failures don’t happen simultaneously.
For example, in a 4-node cluster one node fails, then hours later a second node fails, then later a third node fails. It sounds like CRDB would gracefully make sure that all replicas are available on whatever nodes remain, provided there is time to re-replicate between failures.
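To make that concrete, here’s the toy model I have in my head for that timeline, assuming the default replication factor of 3 and that replicas of a range have to sit on distinct live nodes (both my assumptions). The last step is the part I’m unsure about, so please correct me if the mechanics are wrong:

```python
# Toy walk-through of the 4-node example above. Assumptions (mine, not from the
# docs): replication factor 3, and ranges fully re-replicate between failures
# whenever at least 3 nodes are still alive.
REPLICATION_FACTOR = 3
QUORUM = REPLICATION_FACTOR // 2 + 1   # 2 of 3 replicas

nodes_alive = 4
for failure in range(1, 4):
    nodes_alive -= 1
    # Replicas of a range that can live on distinct surviving nodes.
    live_replicas = min(REPLICATION_FACTOR, nodes_alive)
    status = "available" if live_replicas >= QUORUM else "unavailable"
    print(f"failure {failure}: {nodes_alive} node(s) left, "
          f"{live_replicas}/{REPLICATION_FACTOR} replicas per range -> {status}")

# failure 1: 3 node(s) left, 3/3 replicas per range -> available
# failure 2: 2 node(s) left, 2/3 replicas per range -> available
# failure 3: 1 node(s) left, 1/3 replicas per range -> unavailable
```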