Recovering a cluster when 2/3 nodes fail

#1

Let’s say 2 out of my 3 nodes permanently fail; disks burned, CPUs fried, they’re gone. I still have my 1 node with its data files. How do I start a new (1-node) cluster using these data files? If this isn’t possible, that would mean that I permanently lose my databases even though they’re technically still right there on disk, which would be a little silly.

(Ron Arévalo) #2

Hi @stijnv404,

Recovering from a single replica after losing 2 nodes at a replication factor of 3 is an extremely serious situation, because the consistency of the data becomes much harder to guarantee. Consider that with just 1 replica, there could be data errors in that replica and there would be no way to detect them (with 2 copies you can start to detect an inconsistency; with 3 or more you can correct it).
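To make the point about copy counts concrete, here is a minimal Python sketch. It is only an illustration and not how CockroachDB compares replicas internally; the function name `check_replicas` and the status labels are hypothetical, and each copy is assumed to be a plain value that can be compared for equality.

```python
from collections import Counter


def check_replicas(copies):
    """Classify what can be concluded from the surviving copies of one value.

    Hypothetical helper for illustration only; not a CockroachDB API.
    Returns a (status, value) tuple.
    """
    if not copies:
        raise ValueError("at least one copy is required")

    counts = Counter(copies)
    value, votes = counts.most_common(1)[0]

    if len(counts) == 1:
        # All copies agree. With a single copy this tells us nothing:
        # there is nothing to compare it against.
        if len(copies) == 1:
            return ("unverifiable", value)
        return ("consistent", value)

    # Copies disagree. A strict majority lets us pick a winner;
    # without one we only know that something is wrong.
    if votes > len(copies) / 2:
        return ("corrected", value)
    return ("inconsistent", None)


print(check_replicas(["v1"]))              # 1 copy: errors are undetectable
print(check_replicas(["v1", "v2"]))        # 2 copies: mismatch detected, not fixable
print(check_replicas(["v1", "v1", "v2"]))  # 3 copies: majority outvotes the bad copy
```

With one surviving copy the function can only ever report `unverifiable`, which is the situation described above: a corrupted value and a correct one look exactly the same.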

CockroachDB does not yet provide tooling to recover from a single replica. It is likely that such tooling will be provided in the future; however, we will always strongly discourage attempts to recover from a single replica and instead recommend that you have consistent backups that you are able to restore from.

Please let me know if you have any questions.

Thanks,

Ron

#3

I do understand the danger, but at least starting a new cluster with the existing data would give me the opportunity to manually verify and fix that data. I’d very much prefer that to the data just being gone. As you said, it would be perfect if the option were there, but with huge warning signs.

(Tim O'Brien) #4

@stijnv404 - the issue to track is https://github.com/cockroachdb/cockroach/issues/17186. Feel free to follow it and add your $0.02 there.