This question has probably been asked to death by now.
Suppose the replication factor is 3 and you have 5 nodes:
n1: dataset1, dataset2
n2: dataset2
n3: dataset2
n4: dataset1
n5: dataset1
When n4 and n5 are removed, we are left with:
n1: dataset1, dataset2
n2: dataset2
n3: dataset2
dataset1 becomes unavailable, as expected. dataset2 keeps working, and creating a new dataset3 is still possible because 3 nodes are up.
From this behavior, it seems that a range in CRDB acts like RAID 5:
dataset1: n1 data, n4 data, n5 parity
Is this how CRDB really works under the hood? It would also explain why a set of 5 can handle 2 failures (technically RAID 6) and beyond … If this is correct, it may be worth adding this information to the manual.
On the other hand, if the data is laid out as
dataset1: n1 data, n4 data, n5 data (where each is a copy),
then dropping to n1's copy alone, with 2 “spare” nodes such as n2 and n3 available, should really result in new replicas being created on n2 and n3, should it not?
I understand that Raft needs at least two replicas to work, but it feels strange to have a full copy of the data and still have it be unavailable. Is the synchronization mechanism really the only thing stopping it?
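To make the second interpretation concrete, here is a minimal sketch of how I picture it. It assumes every replica is a full copy and that a dataset stays available only while a majority of its replicas sit on live nodes. The names `quorum` and `is_available` are my own, made up for illustration, and none of this is taken from CockroachDB's code.

```python
# Sketch only: plain majority-quorum arithmetic, not CockroachDB's implementation.

def quorum(replicas: int) -> int:
    """Smallest majority of a replica group."""
    return replicas // 2 + 1


def is_available(replica_nodes: set, live_nodes: set) -> bool:
    """Assumption: a dataset is available while a majority of its replicas are live."""
    live_replicas = replica_nodes & live_nodes
    return len(live_replicas) >= quorum(len(replica_nodes))


# Placement from my example, each replica assumed to be a full copy.
placement = {
    "dataset1": {"n1", "n4", "n5"},
    "dataset2": {"n1", "n2", "n3"},
}
live = {"n1", "n2", "n3"}  # n4 and n5 have been removed

for name, nodes in placement.items():
    state = "available" if is_available(nodes, live) else "unavailable"
    print(name, state)

# Failures a group can lose while still keeping a majority.
for n in (3, 5):
    print(f"{n} replicas: quorum {quorum(n)}, tolerates {n - quorum(n)} failures")
```

Under these assumptions dataset1 has 1 live replica against a quorum of 2, so it is unavailable, while dataset2 has all 3 and keeps working. That matches what I observed without any parity being involved, which is why I am unsure which of the two models is the right one.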