In the meantime, I’m doing some disaster recovery tests using the new 2.0 with 3 datacenters (3x3 setup).
Will this still be available in 2.1? And do you have any documentation yet that I can review on how it will be achieved? Thanks.
Curious about this as well. I just ran into it on a test cluster after someone shut down too many nodes.
In 2.1 we implemented a repair tool for a particular class of failure (specifically, when some ranges have been reduced to a single replica in a 3-replica configuration). This tool has risks (such as corrupting secondary indexes), so we’re not ready to promote it for general use, but if you have an emergency that might require such a tool, please talk to us.
We’re going to continue investing in this kind of recovery tooling for 2.2, but tools like this will never be perfect, and your primary defense against data loss will always be replication plus regular, well-tested backups (replication is not a substitute for backups, as this issue demonstrates).