Cluster Restart After Disaster

Hi,

I deployed CockroachDB using the Kubernetes Operator. A sudden hardware problem caused all CockroachDB pods except one to terminate and be lost. The remaining pod (cockroachdb-0) is running but not serving, since it is the only one. The PVCs are still bound even though cockroachdb-1 and cockroachdb-2 are lost. How can I restart the cluster so that the lost pods run again? I tried a rolling restart of the StatefulSet, but neither pod came back up.

Best regards.

Hi Lazuardi, have you had any success reconnecting to the lost pods?

Hi Lauren,

I have had no success with that. Fortunately, the scheduled full backup had finished a few hours before the disaster, so I did a full restore from it.

Has anyone had success bringing the missing pods back?

Best regards.

Hi Lazuardi,

I got some information from the team, and unfortunately there isn’t a way to get the lost pods running again.

If quorum is lost and the drives are lost, a restore from backup is needed. If you lose two thirds of a CockroachDB cluster, however you installed it, it’s unrecoverable.
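
To make the quorum point concrete, here is a minimal sketch of the majority arithmetic. It assumes a replication factor of 3 with one replica on each of the three pods; the function names are made up for illustration and are not part of any CockroachDB tool.

```python
def quorum_size(replicas: int) -> int:
    """Smallest majority of a replica group with `replicas` voting members."""
    return replicas // 2 + 1


def has_quorum(live: int, replicas: int) -> bool:
    """True if the surviving replicas can still form a majority."""
    return live >= quorum_size(replicas)


if __name__ == "__main__":
    replicas = 3  # assumption: replication factor 3, one replica per pod
    for live in (3, 2, 1):
        print(f"{live}/{replicas} replicas live -> quorum: {has_quorum(live, replicas)}")
```

With only cockroachdb-0 left, 1 of 3 replicas is live, which is below the majority of 2. That would explain why the pod is running but not serving.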

It sounds like you did successfully restore from a backup to a new cluster?

Hi Lauren,

Yes, I was able to restore the last backup. But it takes time, and I think it would be better if we could bootstrap the cluster the way MySQL Galera allows.

Best regards.