Split-brain scenarios in multi-AZ?

The docs are remarkably silent on how (if at all) split-brain scenarios are handled.

For example, given the following scenario:

  • Two zones, A & B (representing two buildings)
  • Three Cockroach nodes in each zone
  • Replication Factor of 3 or 5

What happens in a split-brain scenario where the link between the two buildings is severed?

With both RF 3 and RF 5, there would be an immediate failure of > 2 nodes, correct? So what happens to (a) ongoing operations and (b) recovery?

Would this change in the following alternative scenario?

  • Three zones, A, B & C
  • Three Cockroach nodes in each zone
  • Replication Factor of 3 or 5
  • The break happens as follows: A <-/line-fault/-> B&C

Hello!

With both RF 3 and RF 5, there would be an immediate failure of > 2 nodes, correct? So what happens to (a) ongoing operations and (b) recovery?

In a split-brain scenario, ongoing operations on the minority side are suspended and block until the partition resolves. No writes are accepted there during the partition (they are blocked), so there are no conflicting writes to re-sync afterwards. The block applies only to the minority side; the majority side can proceed.

  • Writes don’t get committed until a majority has acknowledged them. If the acknowledgement went through, the write is ok; if the acknowledgement is blocked, the write is considered aborted.
  • Reads that encounter a write that’s too recent (i.e. we’re not 100% sure that it’s not going to abort) need to wait, retry, or push the write forward in time.
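The majority rule in the first bullet can be sketched as a toy model. This is only an illustration of the arithmetic, not CockroachDB's actual replication code; the function names `quorum` and `write_outcome` are made up for this example.

```python
def quorum(replication_factor: int) -> int:
    """Smallest number of replicas that forms a strict majority."""
    return replication_factor // 2 + 1


def write_outcome(replication_factor: int, acks: int) -> str:
    """Classify a write by how many replicas acknowledged it (toy model)."""
    if not 0 <= acks <= replication_factor:
        raise ValueError("acks must be between 0 and the replication factor")
    if acks >= quorum(replication_factor):
        return "committed"
    return "not committed"


for rf in (3, 5):
    print(f"RF {rf}: a write needs {quorum(rf)} acknowledgements")

# Two acknowledgements are enough with RF 3, but not with RF 5.
print("RF 3, 2 acks:", write_outcome(3, 2))
print("RF 5, 2 acks:", write_outcome(5, 2))
```

So RF 3 tolerates the loss of one replica per range and RF 5 tolerates the loss of two, which is why the number of replicas that end up on each side of the break matters more than the number of nodes.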

Ongoing operations resume when the partition resolves.
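For the two-building case in the question, here is a rough sketch of which side keeps a majority for a given range. The replica placements below are hypothetical (the real placement is chosen by the cluster, not by this code), and `side_with_quorum` is a name invented for the example.

```python
from collections import Counter
from typing import FrozenSet, List, Optional, Sequence


def side_with_quorum(
    replica_zones: Sequence[str],
    sides: List[FrozenSet[str]],
) -> Optional[FrozenSet[str]]:
    """Return the side of the partition holding a strict majority of the
    range's replicas, or None if no side does.

    replica_zones lists the zone of each replica of one range.
    sides lists the groups of zones that can still talk to each other.
    """
    needed = len(replica_zones) // 2 + 1
    per_zone = Counter(replica_zones)  # missing zones count as 0
    for side in sides:
        if sum(per_zone[zone] for zone in side) >= needed:
            return side
    return None


# Link between building A and building B is severed.
two_buildings = [frozenset({"A"}), frozenset({"B"})]

# Hypothetical placements, one list entry per replica.
ranges = {
    "range 1 (RF 3)": ["A", "A", "B"],
    "range 2 (RF 3)": ["A", "B", "B"],
    "range 3 (RF 5)": ["A", "A", "A", "B", "B"],
}
for name, placement in ranges.items():
    winner = side_with_quorum(placement, two_buildings)
    print(name, "->", sorted(winner) if winner else "no side has quorum")
```

Under these assumptions, every range stays available on exactly one side, but with only two zones that side can differ from range to range, so neither building can serve all of the data on its own.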

Would this change in the following alternative scenario?

  • Three zones, A, B & C
  • Three Cockroach nodes in each zone
  • Replication Factor of 3 or 5
  • The break happens as follows: A <-/line-fault/-> B&C

Three zones are better (assuming their zone configs are properly set up), because if one zone fails, the remaining two zones can continue operating.
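A small sketch of why the three-zone layout helps, and why the zone configuration matters. The replica counts per zone are assumptions chosen for illustration (an even spread versus a deliberately skewed one), and `survives_zone_loss` is a hypothetical helper, not part of any CockroachDB API.

```python
from typing import Dict


def survives_zone_loss(replicas_per_zone: Dict[str, int], lost_zone: str) -> bool:
    """True if a range still has a strict majority after losing one zone."""
    total = sum(replicas_per_zone.values())
    remaining = total - replicas_per_zone.get(lost_zone, 0)
    return remaining >= total // 2 + 1


# Assumed replica counts per zone for a single range.
layouts = {
    "RF 3, spread evenly": {"A": 1, "B": 1, "C": 1},
    "RF 5, spread evenly": {"A": 2, "B": 2, "C": 1},
    "RF 3, skewed": {"A": 2, "B": 1, "C": 0},
}
for name, layout in layouts.items():
    results = {zone: survives_zone_loss(layout, zone) for zone in layout}
    print(name, results)
```

With an even spread, losing any single zone leaves a majority in the other two for both RF 3 and RF 5. In the skewed layout, losing zone A takes two of the three replicas with it, so the range becomes unavailable even though two zones are still up.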

That’s very useful, thank you.

I was under the incorrect impression that the whole cluster (e.g. across 3 AZs) would stop if there was no quorum, not that the majority side (2 AZs) would still work. So that’s good news.

Just to clarify, in the A <-/line-fault/-> B&C scenario, would A still be available for reads? Or does it block for both reads and writes?