Replication question

(Benny) #1

This is probably a question that is asked to dead by now.

If the data replication is 3 and you have 5 nodes.

n1: dataset1, dataset2
n2: dataset2
n3: dataset2
n4: dataset1
n5: dataset1

When n4 and n5 get removed we come to:

n1: dataset1, dataset2
n2: dataset2
n3: dataset2

dataset1 became unavailable as expected. dataset2 keeps going, and introducing another dataset3 is possible because we got 3 nodes up.

From this behavior, it seems that a range in CRDB acts like a Raid5.

dataset1: n1 data, n4 data, n5 parity

Is this how CRDB really works under the hood? This also explain why a 5 set can handle 2 failures ( technically Raid 6 ) and beyond … If this is correct, it may be better to add this information to the manual.

On the other hand, if the data is

dataset1: n1 data, n4 data, n5 data ( where each is a copy ).

Going to n1 data alone, with 2 “spare” nodes like n2 and n3 really needs to result in a new replication on n2 and n3, does it not?

I understand that a RAFT needs two at minimum to work but it feels strange to actually have a full set of data and it being unavailable because the synchronization mechanism is the real stopper?


(Ron Arévalo) #2

Hey @Benny,

In your example of having a replication factor of 3 and 5 nodes, you would only be able to survive the loss of 1 node. The number of failures that can be tolerated is equal to (Replication factor - 1)/2. This is because, as you mentioned, the raft consensus protocol requires 2/3 to maintain quorum. So when you lose 1 node in a 5 node cluster, you’d still maintain the majority (2/3), if you were to lose 2 nodes, some ranges would lose their majority, if you happen to lose system ranges, the entire cluster becomes unavailable. If those are table ranges, then the corresponding data become unavailable. In either case, losing 2 nodes with a replication factor of 3 will cause you to lose access to data.

Let me know if you have any questions.




(Benny) #3

Hello @ronarev

My question goes more into the direction:

  • Lets say you have replication 3 and two nodes die. Your system is unworkable. Now for hypothetical, lets assume that we can not get node 2 and 3 back up ( both have their disks destroyed and lets say no backups or backups are older then the actual server data ).

We introduce two new nodes. We know that node 1 still has a full up-to-date copy of the data and the problem is simply the lack of a quorum to replicate.

Can CRDB be forced to start replicating its ranges on the two new nodes, using the original leftover range? We already found out that CRDB keeps a query of table/range changes until a data range is back active, so any data loss is also covered.

Its a question that i can not find the answer because most explain the raft / quorum but neglect the “leftover” node / range and what is still possible ( or what is not possible ).

  • If the above is not possible: Why?



(Ron Arévalo) #4

Hey @Benny,

So to clarify, you want to know if you can recover from 1 if you have lost everything else? Is that correct?

If so, this is not possible, yes, the data is still there, however there is no quorum and so the solo node will not be able to do anything until it gets quorum back. If you just turn on one node, for all it knows, the other two are still out there somewhere, operating as a cluster and missing a node
so it can’t just start serving traffic from its copy, since that copy could be out of date
that’s what is sometimes called a “split-brain” scenario in distributed systems if the cluster becomes separated, you don’t want both sides thinking that they are the cluster.

Does that answer your question?

Let me know!



(Benny) #5

Does that answer your question?

Indeed. But does it not make sense ( if needed ), that we can issue a command to tell a node to replicate the missing node? In other words, manual intervention can resolve a issue where automation is more dangerous.


(Raphael 'kena' Poss) #6

So in this conversation there are two separate problems:

  • recovery from loss of quorum, for example your replication factor is 7 and you lose 4 nodes. There are still 3 copies.

    CockroachDB does not yet provide tooling to recover from loss of quorum, however this is planned. We have some internal tooling for debugging and we plan to productionize this in a future version.

  • recovery from a single replica, for example your replication factor is 3 and you lose 2 nodes, or 2 and you lose 1 node. This is an extremely serious situation because then the consistency of the data is harder to guarantee. Consider that with just 1 replica, there could be data errors in that replica and there would be no way to detect it (with 2 more copies you can start to detect an inconsistency; with 3+ you can correct it).

    CockroachDB does not yet provide tooling to recover from a single replica. It is likely that such tooling will be provided in the future; however, we will always strongly discourage attempts to recover from a single replica and instead recommend to restore from backups instead.