Restarting from decommissioned nodes

Hello,

I am testing CockroachDB on Kubernetes using the example StatefulSet. While load testing, I scaled the cluster up to ~40 nodes (k8s pods) and then began scaling it back down. I added a pre-stop lifecycle hook that decommissions the node, thinking it would make scaling down cleaner.

However, I accidentally rolled all nodes, so the hook ran on every pod and all nodes were decommissioned. What are the proper steps to recover from this? I first tried running cockroach node recommission from the decommissioned pods, but all cockroach node commands hang. I then tried adding new nodes to the cluster, thinking I could recommission the old nodes from the healthy new ones. However, the new nodes quickly start failing liveness checks, and any cockroach node * commands executed in those pods hang as well. Here are the startup logs from one of the newly added nodes:

++ hostname -f
+ exec /cockroach/cockroach start --logtostderr --insecure --advertise-host cockroachdb-21.cockroachdb.default.svc.cluster.local --http-host 0.0.0.0 --join cockroachdb-0.cockroachdb,cockroachdb-1.cockroachdb,cockroachdb-2.cockroachdb --cache 25% --max-sql-memory 25%
W180712 18:32:20.073295 1 cli/start.go:909  RUNNING IN INSECURE MODE!

- Your cluster is open for any client that can access <all your IP addresses>.
- Any user, even root, can log in without providing a password.
- Any user, connecting as root, can read or write any data in your cluster.
- There is no network encryption nor authentication, and thus no confidentiality.

Check out how to secure your cluster: https://www.cockroachlabs.com/docs/v2.0/secure-a-cluster.html
I180712 18:32:20.076214 1 cli/start.go:923  CockroachDB CCL v2.0.3 (x86_64-unknown-linux-gnu, built 2018/06/18 16:11:33, go1.10)
I180712 18:32:20.182768 1 server/config.go:330  available memory from cgroups (8.0 EiB) exceeds system memory 15 GiB, using system memory
I180712 18:32:20.182817 1 server/config.go:430  system total memory: 15 GiB
I180712 18:32:20.182898 1 server/config.go:432  server configuration:
max offset             500000000
cache size             3.7 GiB
SQL memory pool size   3.7 GiB
scan interval          10m0s
scan max idle time     200ms
event log enabled      true
I180712 18:32:20.182936 1 cli/start.go:789  using local environment variables: COCKROACH_CHANNEL=kubernetes-insecure
I180712 18:32:20.182963 1 cli/start.go:796  process identity: uid 0 euid 0 gid 0 egid 0
I180712 18:32:20.182986 1 cli/start.go:461  starting cockroach node
I180712 18:32:20.183850 5 storage/engine/rocksdb.go:552  opening rocksdb instance at "/cockroach/cockroach-data/cockroach-temp272311970"
I180712 18:32:20.201737 5 server/server.go:734  [n?] monitoring forward clock jumps based on server.clock.forward_jump_check_enabled
I180712 18:32:20.203690 5 storage/engine/rocksdb.go:552  opening rocksdb instance at "/cockroach/cockroach-data"
I180712 18:32:20.942507 5 server/config.go:538  [n?] 1 storage engine initialized
I180712 18:32:20.942552 5 server/config.go:541  [n?] RocksDB cache size: 3.7 GiB
I180712 18:32:20.942566 5 server/config.go:541  [n?] store 0: RocksDB, max size 0 B, max open file limit 1043576
W180712 18:32:20.947635 5 gossip/gossip.go:1293  [n?] no incoming or outgoing connections
I180712 18:32:20.955579 98 gossip/client.go:129  [n45] started gossip client to cockroachdb-0.cockroachdb:26257
I180712 18:32:20.957938 5 storage/store.go:1303  [n45,s45] [n45,s45,r4/?:/System/{NodeLive…-tsd}]: added to replica GC queue
I180712 18:32:20.958938 5 storage/store.go:1303  [n45,s45] [n45,s45,r411/?:/Table/54/1/"b{08567…-147eb…}]: added to replica GC queue
I180712 18:32:20.959349 5 storage/store.go:1303  [n45,s45] [n45,s45,r162/?:/Table/54/1/"f{3e040…-469c1…}]: added to replica GC queue
I180712 18:32:20.958097 98 gossip/client.go:134  [n45] closing client to node 1 (cockroachdb-0.cockroachdb:26257): received forward from node 1 to 2 (cockroachdb-2.cockroachdb.default.svc.cluster.local:26257)
I180712 18:32:20.963352 406 gossip/client.go:129  [n45] started gossip client to cockroachdb-2.cockroachdb.default.svc.cluster.local:26257
I180712 18:32:20.964757 5 server/node.go:507  [n45] initialized store [n45,s45]: disk (capacity=98 GiB, available=93 GiB, used=79 MiB, logicalBytes=192 MiB), ranges=8, leases=0, writes=0.00, bytesPerReplica={p10=0.00 p25=795287.00 p50=36912902.00 p75=40966061.00 p90=47855689.00 pMax=47855689.00}, writesPerReplica={p10=0.00 p25=0.00 p50=0.00 p75=0.00 p90=0.00 pMax=0.00}
I180712 18:32:20.964800 5 server/node.go:347  [n45] node ID 45 initialized
I180712 18:32:20.964900 5 gossip/gossip.go:333  [n45] NodeDescriptor set to node_id:45 address:<network_field:"tcp" address_field:"cockroachdb-21.cockroachdb.default.svc.cluster.local:26257" > attrs:<> locality:<> ServerVersion:<major_val:2 minor_val:0 patch:0 unstable:0 >
I180712 18:32:20.965487 406 gossip/client.go:134  [n45] closing client to node 2 (cockroachdb-2.cockroachdb.default.svc.cluster.local:26257): received forward from node 2 to 12 (cockroachdb-9.cockroachdb.default.svc.cluster.local:26257)
I180712 18:32:20.965778 5 storage/stores.go:331  [n45] read 22 node addresses from persistent storage
I180712 18:32:20.966135 5 server/node.go:648  [n45] connecting to gossip network to verify cluster ID...
I180712 18:32:20.966221 5 server/node.go:673  [n45] node connected via gossip and verified as part of cluster "e8e02116-7c96-4fcb-8007-384e4cd269a8"
I180712 18:32:20.966303 5 server/node.go:441  [n45] node=45: started with [<no-attributes>=/cockroach/cockroach-data] engine(s) and attributes []
I180712 18:32:20.966444 5 server/server.go:1430  [n45] starting http server at 0.0.0.0:8080
I180712 18:32:20.966547 5 server/server.go:1431  [n45] starting grpc/postgres server at cockroachdb-21:26257
I180712 18:32:20.966604 5 server/server.go:1432  [n45] advertising CockroachDB node at cockroachdb-21.cockroachdb.default.svc.cluster.local:26257
W180712 18:32:20.966646 5 sql/jobs/registry.go:300  [n45] canceling all jobs due to liveness failure
I180712 18:32:20.968635 415 gossip/client.go:129  [n45] started gossip client to cockroachdb-9.cockroachdb.default.svc.cluster.local:26257
I180712 18:32:20.970485 415 gossip/client.go:134  [n45] closing client to node 12 (cockroachdb-9.cockroachdb.default.svc.cluster.local:26257): received forward from node 12 to 8 (cockroachdb-6.cockroachdb.default.svc.cluster.local:26257)
I180712 18:32:20.974508 818 gossip/client.go:129  [n45] started gossip client to cockroachdb-6.cockroachdb.default.svc.cluster.local:26257
I180712 18:32:20.976437 818 gossip/client.go:134  [n45] closing client to node 8 (cockroachdb-6.cockroachdb.default.svc.cluster.local:26257): received forward from node 8 to 1 (cockroachdb-0.cockroachdb.default.svc.cluster.local:26257)
I180712 18:32:20.978406 951 gossip/client.go:129  [n45] started gossip client to cockroachdb-0.cockroachdb.default.svc.cluster.local:26257
I180712 18:32:20.980249 951 gossip/client.go:134  [n45] closing client to node 1 (cockroachdb-0.cockroachdb.default.svc.cluster.local:26257): received forward from node 1 to 12 (cockroachdb-9.cockroachdb.default.svc.cluster.local:26257)
I180712 18:32:20.980437 937 gossip/client.go:129  [n45] started gossip client to cockroachdb-9.cockroachdb.default.svc.cluster.local:26257
I180712 18:32:20.982131 937 gossip/client.go:134  [n45] closing client to node 12 (cockroachdb-9.cockroachdb.default.svc.cluster.local:26257): received forward from node 12 to 18 (cockroachdb-17.cockroachdb.default.svc.cluster.local:26257)
I180712 18:32:20.985579 1241 gossip/client.go:129  [n45] started gossip client to cockroachdb-17.cockroachdb.default.svc.cluster.local:26257
I180712 18:32:20.987072 1241 gossip/client.go:134  [n45] closing client to node 18 (cockroachdb-17.cockroachdb.default.svc.cluster.local:26257): received forward from node 18 to 44 (cockroachdb-20.cockroachdb.default.svc.cluster.local:26257)
I180712 18:32:20.991022 1439 gossip/client.go:129  [n45] started gossip client to cockroachdb-20.cockroachdb.default.svc.cluster.local:26257
I180712 18:32:20.992855 1439 gossip/client.go:134  [n45] closing client to node 44 (cockroachdb-20.cockroachdb.default.svc.cluster.local:26257): received forward from node 44 to 1 (cockroachdb-0.cockroachdb.default.svc.cluster.local:26257)
I180712 18:32:20.992986 1521 gossip/client.go:129  [n45] started gossip client to cockroachdb-0.cockroachdb.default.svc.cluster.local:26257
I180712 18:32:20.994381 1521 gossip/client.go:134  [n45] closing client to node 1 (cockroachdb-0.cockroachdb.default.svc.cluster.local:26257): received forward from node 1 to 12 (cockroachdb-9.cockroachdb.default.svc.cluster.local:26257)
I180712 18:32:20.994495 1661 gossip/client.go:129  [n45] started gossip client to cockroachdb-9.cockroachdb.default.svc.cluster.local:26257
I180712 18:32:20.996604 1661 gossip/client.go:134  [n45] closing client to node 12 (cockroachdb-9.cockroachdb.default.svc.cluster.local:26257): received forward from node 12 to 18 (cockroachdb-17.cockroachdb.default.svc.cluster.local:26257)
I180712 18:32:20.996839 1860 gossip/client.go:129  [n45] started gossip client to cockroachdb-17.cockroachdb.default.svc.cluster.local:26257
I180712 18:32:21.000473 1860 gossip/client.go:134  [n45] closing client to node 18 (cockroachdb-17.cockroachdb.default.svc.cluster.local:26257): received forward from node 18 to 44 (cockroachdb-20.cockroachdb.default.svc.cluster.local:26257)
I180712 18:32:21.000631 1546 gossip/client.go:129  [n45] started gossip client to cockroachdb-20.cockroachdb.default.svc.cluster.local:26257
I180712 18:32:21.002245 1546 gossip/client.go:134  [n45] closing client to node 44 (cockroachdb-20.cockroachdb.default.svc.cluster.local:26257): received forward from node 44 to 8 (cockroachdb-6.cockroachdb.default.svc.cluster.local:26257)
I180712 18:32:21.002367 2037 gossip/client.go:129  [n45] started gossip client to cockroachdb-6.cockroachdb.default.svc.cluster.local:26257
I180712 18:32:21.004041 2037 gossip/client.go:134  [n45] closing client to node 8 (cockroachdb-6.cockroachdb.default.svc.cluster.local:26257): received forward from node 8 to 18 (cockroachdb-17.cockroachdb.default.svc.cluster.local:26257)
I180712 18:32:21.004217 2317 gossip/client.go:129  [n45] started gossip client to cockroachdb-17.cockroachdb.default.svc.cluster.local:26257
I180712 18:32:21.005755 2317 gossip/client.go:134  [n45] closing client to node 18 (cockroachdb-17.cockroachdb.default.svc.cluster.local:26257): received forward from node 18 to 44 (cockroachdb-20.cockroachdb.default.svc.cluster.local:26257)
I180712 18:32:21.005899 2415 gossip/client.go:129  [n45] started gossip client to cockroachdb-20.cockroachdb.default.svc.cluster.local:26257
I180712 18:32:21.007831 2415 gossip/client.go:134  [n45] closing client to node 44 (cockroachdb-20.cockroachdb.default.svc.cluster.local:26257): received forward from node 44 to 21 (cockroachdb-19.cockroachdb.default.svc.cluster.local:26257)
I180712 18:32:21.010697 2608 gossip/client.go:129  [n45] started gossip client to cockroachdb-19.cockroachdb.default.svc.cluster.local:26257
I180712 18:32:21.012373 2608 gossip/client.go:134  [n45] closing client to node 21 (cockroachdb-19.cockroachdb.default.svc.cluster.local:26257): received forward from node 21 to 7 (cockroachdb-4.cockroachdb.default.svc.cluster.local:26257)
I180712 18:32:21.015359 2612 gossip/client.go:129  [n45] started gossip client to cockroachdb-4.cockroachdb.default.svc.cluster.local:26257
I180712 18:32:21.017137 2612 gossip/client.go:134  [n45] closing client to node 7 (cockroachdb-4.cockroachdb.default.svc.cluster.local:26257): received forward from node 7 to 12 (cockroachdb-9.cockroachdb.default.svc.cluster.local:26257)
I180712 18:32:21.017257 2774 gossip/client.go:129  [n45] started gossip client to cockroachdb-9.cockroachdb.default.svc.cluster.local:26257
I180712 18:32:21.018934 2774 gossip/client.go:134  [n45] closing client to node 12 (cockroachdb-9.cockroachdb.default.svc.cluster.local:26257): received forward from node 12 to 18 (cockroachdb-17.cockroachdb.default.svc.cluster.local:26257)
I180712 18:32:21.019218 2240 gossip/client.go:129  [n45] started gossip client to cockroachdb-17.cockroachdb.default.svc.cluster.local:26257
I180712 18:32:21.020796 2240 gossip/client.go:134  [n45] closing client to node 18 (cockroachdb-17.cockroachdb.default.svc.cluster.local:26257): received forward from node 18 to 10 (cockroachdb-7.cockroachdb.default.svc.cluster.local:26257)
I180712 18:32:21.020985 2924 gossip/client.go:129  [n45] started gossip client to cockroachdb-7.cockroachdb.default.svc.cluster.local:26257
I180712 18:32:21.022116 2924 gossip/client.go:134  [n45] closing client to node 10 (cockroachdb-7.cockroachdb.default.svc.cluster.local:26257): received forward from node 10 to 9 (cockroachdb-5.cockroachdb.default.svc.cluster.local:26257)
I180712 18:32:21.024817 3342 gossip/client.go:129  [n45] started gossip client to cockroachdb-5.cockroachdb.default.svc.cluster.local:26257
I180712 18:32:21.026252 3342 gossip/client.go:134  [n45] closing client to node 9 (cockroachdb-5.cockroachdb.default.svc.cluster.local:26257): received forward from node 9 to 11 (cockroachdb-8.cockroachdb.default.svc.cluster.local:26257)
I180712 18:32:21.029189 3064 gossip/client.go:129  [n45] started gossip client to cockroachdb-8.cockroachdb.default.svc.cluster.local:26257
I180712 18:32:21.030696 3064 gossip/client.go:134  [n45] closing client to node 11 (cockroachdb-8.cockroachdb.default.svc.cluster.local:26257): received forward from node 11 to 7 (cockroachdb-4.cockroachdb.default.svc.cluster.local:26257)
I180712 18:32:21.030848 3473 gossip/client.go:129  [n45] started gossip client to cockroachdb-4.cockroachdb.default.svc.cluster.local:26257
I180712 18:32:21.032818 3473 gossip/client.go:134  [n45] closing client to node 7 (cockroachdb-4.cockroachdb.default.svc.cluster.local:26257): received forward from node 7 to 12 (cockroachdb-9.cockroachdb.default.svc.cluster.local:26257)
I180712 18:32:21.032998 3661 gossip/client.go:129  [n45] started gossip client to cockroachdb-9.cockroachdb.default.svc.cluster.local:26257
I180712 18:32:21.034808 3661 gossip/client.go:134  [n45] closing client to node 12 (cockroachdb-9.cockroachdb.default.svc.cluster.local:26257): received forward from node 12 to 18 (cockroachdb-17.cockroachdb.default.svc.cluster.local:26257)
I180712 18:32:21.034929 3812 gossip/client.go:129  [n45] started gossip client to cockroachdb-17.cockroachdb.default.svc.cluster.local:26257
I180712 18:32:21.036613 3812 gossip/client.go:134  [n45] closing client to node 18 (cockroachdb-17.cockroachdb.default.svc.cluster.local:26257): received forward from node 18 to 44 (cockroachdb-20.cockroachdb.default.svc.cluster.local:26257)
I180712 18:32:21.036733 3830 gossip/client.go:129  [n45] started gossip client to cockroachdb-20.cockroachdb.default.svc.cluster.local:26257
I180712 18:32:21.038405 3830 gossip/client.go:134  [n45] closing client to node 44 (cockroachdb-20.cockroachdb.default.svc.cluster.local:26257): received forward from node 44 to 1 (cockroachdb-0.cockroachdb.default.svc.cluster.local:26257)
I180712 18:32:21.038599 4112 gossip/client.go:129  [n45] started gossip client to cockroachdb-0.cockroachdb.default.svc.cluster.local:26257
I180712 18:32:21.039945 4112 gossip/client.go:134  [n45] closing client to node 1 (cockroachdb-0.cockroachdb.default.svc.cluster.local:26257): received forward from node 1 to 12 (cockroachdb-9.cockroachdb.default.svc.cluster.local:26257)
I180712 18:32:21.040099 4117 gossip/client.go:129  [n45] started gossip client to cockroachdb-9.cockroachdb.default.svc.cluster.local:26257
I180712 18:32:21.041530 4117 gossip/client.go:134  [n45] closing client to node 12 (cockroachdb-9.cockroachdb.default.svc.cluster.local:26257): received forward from node 12 to 8 (cockroachdb-6.cockroachdb.default.svc.cluster.local:26257)
I180712 18:32:21.041648 4548 gossip/client.go:129  [n45] started gossip client to cockroachdb-6.cockroachdb.default.svc.cluster.local:26257
I180712 18:32:21.043476 4548 gossip/client.go:134  [n45] closing client to node 8 (cockroachdb-6.cockroachdb.default.svc.cluster.local:26257): received forward from node 8 to 1 (cockroachdb-0.cockroachdb.default.svc.cluster.local:26257)
I180712 18:32:21.043648 4709 gossip/client.go:129  [n45] started gossip client to cockroachdb-0.cockroachdb.default.svc.cluster.local:26257
I180712 18:32:21.045328 4709 gossip/client.go:134  [n45] closing client to node 1 (cockroachdb-0.cockroachdb.default.svc.cluster.local:26257): received forward from node 1 to 2 (cockroachdb-2.cockroachdb.default.svc.cluster.local:26257)
I180712 18:32:21.045489 4544 gossip/client.go:129  [n45] started gossip client to cockroachdb-2.cockroachdb.default.svc.cluster.local:26257
I180712 18:32:21.047112 4544 gossip/client.go:134  [n45] closing client to node 2 (cockroachdb-2.cockroachdb.default.svc.cluster.local:26257): received forward from node 2 to 3 (cockroachdb-1.cockroachdb.default.svc.cluster.local:26257)
I180712 18:32:21.049973 4999 gossip/client.go:129  [n45] started gossip client to cockroachdb-1.cockroachdb.default.svc.cluster.local:26257
I180712 18:32:21.051706 4999 gossip/client.go:134  [n45] closing client to node 3 (cockroachdb-1.cockroachdb.default.svc.cluster.local:26257): received forward from node 3 to 20 (cockroachdb-16.cockroachdb.default.svc.cluster.local:26257)
I180712 18:32:21.055734 5013 gossip/client.go:129  [n45] started gossip client to cockroachdb-16.cockroachdb.default.svc.cluster.local:26257
I180712 18:32:21.057626 5013 gossip/client.go:134  [n45] closing client to node 20 (cockroachdb-16.cockroachdb.default.svc.cluster.local:26257): received forward from node 20 to 15 (cockroachdb-11.cockroachdb.default.svc.cluster.local:26257)
I180712 18:32:21.057870 5162 gossip/client.go:129  [n45] started gossip client to cockroachdb-11.cockroachdb.default.svc.cluster.local:26257
I180712 18:32:21.059518 5162 gossip/client.go:134  [n45] closing client to node 15 (cockroachdb-11.cockroachdb.default.svc.cluster.local:26257): received forward from node 15 to 8 (cockroachdb-6.cockroachdb.default.svc.cluster.local:26257)
I180712 18:32:21.059667 5303 gossip/client.go:129  [n45] started gossip client to cockroachdb-6.cockroachdb.default.svc.cluster.local:26257
I180712 18:32:21.060563 5303 gossip/client.go:134  [n45] closing client to node 8 (cockroachdb-6.cockroachdb.default.svc.cluster.local:26257): received forward from node 8 to 14 (cockroachdb-12.cockroachdb.default.svc.cluster.local:26257); already have active connection, skipping
I180712 18:32:21.954254 5649 gossip/client.go:129  [n45] started gossip client to cockroachdb-1.cockroachdb:26257
I180712 18:32:21.956093 5649 gossip/client.go:134  [n45] closing client to node 3 (cockroachdb-1.cockroachdb:26257): received forward from node 3 to 19 (cockroachdb-15.cockroachdb.default.svc.cluster.local:26257)
I180712 18:32:21.959777 5764 gossip/client.go:129  [n45] started gossip client to cockroachdb-15.cockroachdb.default.svc.cluster.local:26257
I180712 18:32:21.961874 5764 gossip/client.go:134  [n45] closing client to node 19 (cockroachdb-15.cockroachdb.default.svc.cluster.local:26257): received forward from node 19 to 17 (cockroachdb-14.cockroachdb.default.svc.cluster.local:26257)
I180712 18:32:21.965248 5677 gossip/client.go:129  [n45] started gossip client to cockroachdb-14.cockroachdb.default.svc.cluster.local:26257
I180712 18:32:21.966046 5677 gossip/client.go:134  [n45] closing client to node 17 (cockroachdb-14.cockroachdb.default.svc.cluster.local:26257): received forward from node 17 to 14 (cockroachdb-12.cockroachdb.default.svc.cluster.local:26257); already have active connection, skipping
W180712 18:32:21.967078 609 sql/jobs/registry.go:300  canceling all jobs due to liveness failure
I180712 18:32:22.955523 6001 gossip/client.go:129  [n45] started gossip client to cockroachdb-2.cockroachdb:26257
I180712 18:32:22.957016 6001 gossip/client.go:134  [n45] closing client to node 2 (cockroachdb-2.cockroachdb:26257): received forward from node 2 to 12 (cockroachdb-9.cockroachdb.default.svc.cluster.local:26257)
I180712 18:32:22.957182 6354 gossip/client.go:129  [n45] started gossip client to cockroachdb-9.cockroachdb.default.svc.cluster.local:26257
I180712 18:32:22.958689 6354 gossip/client.go:134  [n45] closing client to node 12 (cockroachdb-9.cockroachdb.default.svc.cluster.local:26257): received forward from node 12 to 8 (cockroachdb-6.cockroachdb.default.svc.cluster.local:26257)
I180712 18:32:22.958865 6180 gossip/client.go:129  [n45] started gossip client to cockroachdb-6.cockroachdb.default.svc.cluster.local:26257
I180712 18:32:22.959895 6180 gossip/client.go:134  [n45] closing client to node 8 (cockroachdb-6.cockroachdb.default.svc.cluster.local:26257): received forward from node 8 to 14 (cockroachdb-12.cockroachdb.default.svc.cluster.local:26257); already have active connection, skipping
W180712 18:32:22.967255 609 sql/jobs/registry.go:300  canceling all jobs due to liveness failure
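
For anyone trying to read a log like the one above, a small script can tally the gossip forwards and the liveness warnings instead of scanning them by eye. This is only a rough sketch: the regular expression is based on the message text in the lines above, and `summarize` and `SAMPLE` are names made up for illustration, not part of any CockroachDB tooling.

```python
import re
from collections import Counter

# Matches the tail of the gossip/client.go messages shown in the log above.
FORWARD_RE = re.compile(r"received forward from node (\d+) to (\d+)")
LIVENESS_MSG = "canceling all jobs due to liveness failure"


def summarize(log_text):
    """Count (from_node, to_node) forwards and liveness-failure warnings."""
    forwards = Counter()
    liveness_failures = 0
    for line in log_text.splitlines():
        match = FORWARD_RE.search(line)
        if match:
            forwards[(int(match.group(1)), int(match.group(2)))] += 1
        if LIVENESS_MSG in line:
            liveness_failures += 1
    return forwards, liveness_failures


# Three lines copied from the log above, used as sample input.
SAMPLE = """\
I180712 18:32:20.958097 98 gossip/client.go:134  [n45] closing client to node 1 (cockroachdb-0.cockroachdb:26257): received forward from node 1 to 2 (cockroachdb-2.cockroachdb.default.svc.cluster.local:26257)
I180712 18:32:20.965487 406 gossip/client.go:134  [n45] closing client to node 2 (cockroachdb-2.cockroachdb.default.svc.cluster.local:26257): received forward from node 2 to 12 (cockroachdb-9.cockroachdb.default.svc.cluster.local:26257)
W180712 18:32:21.967078 609 sql/jobs/registry.go:300  canceling all jobs due to liveness failure
"""

if __name__ == "__main__":
    forwards, failures = summarize(SAMPLE)
    for (src, dst), count in sorted(forwards.items()):
        print(f"n{src} -> n{dst}: {count}")
    print(f"liveness failure warnings: {failures}")
```

Running it over the full log gives a per-pair count of how often the new node was bounced between peers, alongside the number of liveness warnings.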

Up to this point everything was working phenomenally. I realize this is the result of user error, but is there an escape hatch for situations like this? Any help would be greatly appreciated.

Thanks

Hi @amilli,

I have to admit I’m not totally sure how a large cluster would react to having all its nodes decommissioned, especially without knowing which --wait mode you used when decommissioning. The logs you provided are unfortunately not enough for me to tell what’s preventing recovery.

Would you mind running cockroach debug zip against one of the nodes and sharing the resulting zip file with me? It’d be the fastest way for me to understand what’s going on. You can send it to me at alex@cockroachlabs.com if you’re up for that.

If not, let me know and we can work out what’s going on more iteratively.

Hi Alex,

Will do. Really appreciate the quick response!

To follow up publicly on our email thread: it turns out that this is a bug. I’ve written up some more info, based on a local 3-node reproduction, in https://github.com/cockroachdb/cockroach/issues/27444. To summarize, a cluster in which every node is decommissioned can’t currently recover because decommissioned nodes refuse to acquire range leases. Without range leases, none of the data in the cluster can be read or written, so the cluster can’t do anything.
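
To make the circular dependency concrete, here is a toy model of the situation described above. It is not CockroachDB code: `Node`, `ToyCluster`, and every method on them are hypothetical names, the cluster has a single range, and the model assumes that recommissioning is itself a write that needs a leaseholder and that only a node holding a replica can take the lease.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    decommissioned: bool = False


@dataclass
class ToyCluster:
    nodes: dict = field(default_factory=dict)
    # IDs of nodes holding a replica of the single range in this toy model.
    replicas: set = field(default_factory=set)

    def add_node(self, node_id, with_replica=False):
        self.nodes[node_id] = Node(node_id)
        if with_replica:
            self.replicas.add(node_id)

    def leaseholder(self):
        """Return a replica-holding node willing to take the lease, or None."""
        for node_id in sorted(self.replicas):
            node = self.nodes[node_id]
            if not node.decommissioned:  # decommissioned nodes refuse leases
                return node
        return None

    def write(self, description):
        """Assumption: every read or write needs a leaseholder."""
        if self.leaseholder() is None:
            raise RuntimeError(f"cannot {description}: no node will take the lease")

    def decommission(self, node_id):
        self.write(f"decommission n{node_id}")
        self.nodes[node_id].decommissioned = True

    def recommission(self, node_id):
        self.write(f"recommission n{node_id}")
        self.nodes[node_id].decommissioned = False


def demo():
    cluster = ToyCluster()
    for node_id in (1, 2, 3):
        cluster.add_node(node_id, with_replica=True)
    for node_id in (1, 2, 3):
        cluster.decommission(node_id)
    cluster.add_node(4)  # fresh node: holds no replica, so it cannot help
    try:
        cluster.recommission(1)
    except RuntimeError as err:
        return str(err)
    return "recommissioned"
```

In this model, `demo()` ends with the recommission request failing, because the only nodes that could take the lease are the ones that refuse it. The toy raises an error where the real commands simply hang.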

If you’d like a quick workaround while we sort out a real fix, I can send you a Docker image with the relevant 5 lines of code removed that you can use to recover your cluster. Or if you’re feeling adventurous, you can try building it yourself. The process (after modifying the code in a checkout of the git repository) is documented here: https://github.com/cockroachdb/cockroach/tree/master/build#deployment