When testing how CockroachDB survives a single node failure, I did the following:
- start a 3-node cluster, which ends up with 93 ranges, all fully replicated
- shut one node down uncleanly and destroy its disk
- bring the machine back
- now I have 4 nodes (1 down), 74 ranges, and 34 under-replicated ranges
The lost node is never removed from the node list.
In the metrics, I can see that all 74 ranges have been resynced to the machine that crashed, so that part works fine,
but after waiting several days the dead node is still listed.
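
For anyone who wants to reproduce this, the steps above can be sketched roughly as the dry-run script below. It only builds and prints the commands; it does not run them. Everything specific here is an assumption and not what I actually typed: a local insecure cluster on one host, the ports, the store paths, the `pkill` pattern, and the helper names `start_command` and `repro_plan`. The flags are the ones I know from recent `cockroach` versions and may differ on older releases.

```python
from pathlib import Path

# Assumed layout: three nodes on localhost, node number -> SQL/RPC port.
NODES = {1: 26257, 2: 26258, 3: 26259}


def start_command(node, base_dir):
    """Build the argv for starting one node (hypothetical helper)."""
    store = Path(base_dir) / f"node{node}"
    join = ",".join(f"localhost:{port}" for port in NODES.values())
    return [
        "cockroach", "start", "--insecure",
        f"--store={store}",
        f"--listen-addr=localhost:{NODES[node]}",
        f"--http-addr=localhost:{8080 + node - 1}",
        f"--join={join}",
    ]


def repro_plan(base_dir, victim=3):
    """Return the list of commands for the failure test, in order."""
    store = Path(base_dir) / f"node{victim}"
    plan = [start_command(node, base_dir) for node in NODES]
    # One-time cluster initialization against the first node.
    plan.append(["cockroach", "init", "--insecure", "--host=localhost:26257"])
    # Unclean shutdown: SIGKILL the process whose command line names the store.
    plan.append(["pkill", "-9", "-f", f"--store={store}"])
    # Destroy the disk of the killed node.
    plan.append(["rm", "-rf", str(store)])
    # Bring the machine back with an empty store.
    plan.append(start_command(victim, base_dir))
    # Inspect the node list from a surviving node.
    plan.append(["cockroach", "node", "status", "--insecure",
                 "--host=localhost:26257"])
    return plan


if __name__ == "__main__":
    for argv in repro_plan("/tmp/crdb"):
        print(" ".join(argv))
```

Running the script prints eight commands. In a real run the `cockroach start` commands would each need to go in their own terminal or be backgrounded.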