Unable to recover cluster after accidentally shutting down the OS

The cluster is not working after I accidentally shut down the instance and brought it back up.
W190909 06:52:47.391491 246 storage/store.go:1525 [n1,s1,r6/1:/Table/{SystemCon…-11}] could not gossip system config: [NotLeaseHolderError] r6: replica (n1,s1):1 not lease holder; lease holder unknown
W190909 06:52:48.112846 231 storage/replica_range_lease.go:506 can’t determine lease status due to node liveness error: node not in the liveness table
W190909 06:52:48.480924 246 storage/store.go:1525 [n1,s1,r6/1:/Table/{SystemCon…-11}] could not gossip system config: [NotLeaseHolderError] r6: replica (n1,s1):1 not lease holder; lease holder unknown
W190909 06:52:48.736314 252 storage/store_rebalancer.go:227 [n1,s1,store-rebalancer] StorePool missing descriptor for local store
W190909 06:52:49.438508 246 storage/store.go:1525 [n1,s1,r6/1:/Table/{SystemCon…-11}] could not gossip system config: [NotLeaseHolderError] r6: replica (n1,s1):1 not lease holder; lease holder unknown
I190909 06:52:49.960130 309706 internal/client/txn.go:618 [n1] async rollback failed: context deadline exceeded
W190909 06:52:50.459527 246 storage/store.go:1525 [n1,s1,r6/1:/Table/{SystemCon…-11}] could not gossip system config: [NotLeaseHolderError] r6: replica (n1,s1):1 not lease holder; lease holder unknown
W190909 06:52:50.716642 254 server/node.go:799 [n1] [n1,s1]: unable to compute metrics: [n1,s1]: system config not yet available
I190909 06:52:50.952493 259 server/status/runtime.go:500 [n1] runtime stats: 277 MiB RSS, 257 goroutines, 126 MiB/104 MiB/271 MiB GO alloc/idle/total, 15 MiB/27 MiB CGO alloc/total, 16.5 CGO/sec, 6.2/1.6 %(u/s)time, 0.0 %gc (1x), 711 KiB/577 KiB (r/w)net
W190909 06:52:51.220367 267 jobs/registry.go:341 unable to get node liveness: node not in the liveness table
W190909 06:52:51.460125 266 storage/node_liveness.go:523 [n1,hb] slow heartbeat took 4.5s
W190909 06:52:51.460136 266 storage/node_liveness.go:463 [n1,hb] failed node liveness heartbeat: aborted during DistSender.Send: context deadline exceeded
W190909 06:52:51.572196 246 storage/store.go:1525 [n1,s1,r6/1:/Table/{SystemCon…-11}] could not gossip system config: [NotLeaseHolderError] r6: replica (n1,s1):1 not lease holder; lease holder unknown

Hi @madhu72,

How are you connecting to your cluster and can you query it?

What specifically is not working? Can you provide more information?

What version of CRDB is this?

Thanks,
Matt

Actually we have two data centers, each having three nodes.

We start the nodes using the command below. Initially I tried with only the first datacenter, and at that time it did not work, but once I started the second datacenter, the cluster worked fine. How can we make it work when one of the datacenters is not available?

nohup ./cockroach start --insecure \
  --store=data \
  --locality=region=US,datacenter=GUCA \
  --max-sql-memory=.25 --cache=.25 \
  --advertise-addr=$HOSTNAME --listen-addr=$HOSTNAME \
  --http-addr=$HOSTNAME:58080 \
  --join=gcasa01v:26257,gcasa02v:26257,gcasa03v:26257,gcasc01v:26257,gcasc02v:26257,gcasc03v:26257 \
  1>./data/logs/startcrdb.log 2>&1 &

Build Tag: v19.1.4
Build Time: 2019/08/06 15:34:13
Distribution: CCL
Platform: linux amd64 (x86_64-unknown-linux-gnu)
Go Version: go1.11.6
C Compiler: gcc 6.3.0
Build SHA-1: 51a6fdedf0ce1d1329d40d801a7deaf8206b6b07
Build Type: release

Thanks for the info.

Can you give me some more information on the exact steps you were taking when it did not work?

Have you changed the replication factor by any chance?

What were you trying to do when something wasn’t working?

Thanks!
Matt

I verified whether the ports were listening, and they were, but when I tried to connect with SQL, it always timed out with an i/o error. So I tried bringing up the 2nd data center too, and that resolved the issue.
Even though port 26257 was listening, I was not able to access the dashboard.

Sorry, I did not try to change the replication factor at all. I just wondered what was going on and searched online, but none of the results gave the right information to resolve it.

I understand.

If by dashboard you mean the Admin UI, then that will be accessible on --http-addr=$HOSTNAME:58080.

Let me know if you have any other questions.

Yes, that did not work either: it returned a 404 Not Found error when I tried to access the Admin UI while the 2nd data center was not up.

Anyway, both data centers are currently running, so there are no issues at present.

Awesome!

Thanks for the update.

If only half of your nodes are running, a majority cannot be reached, so there is no consensus and your database will stop responding. Having even 1 node running in the second DC would have been enough to restore a majority and get a response again.
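The quorum arithmetic behind this can be sketched as follows. This is only an illustration: it assumes CockroachDB's default replication factor of 3 and a worst-case range that happens to have 2 of its replicas in the downed datacenter (neither detail is confirmed for this particular cluster).

```shell
#!/bin/sh
# Sketch of Raft quorum arithmetic (assumed default replication factor 3).
rf=3
quorum=$(( rf / 2 + 1 ))    # a range needs a strict majority of replicas
echo "replication factor: $rf, quorum needed: $quorum"

# Worst case with only 2 datacenters: a range has 2 of its 3 replicas in
# the datacenter that went down, leaving just 1 reachable replica.
surviving=1
if [ "$surviving" -lt "$quorum" ]; then
  echo "range unavailable ($surviving surviving replica < quorum of $quorum)"
else
  echo "range still available"
fi
```

This is why a two-datacenter layout cannot survive the loss of either datacenter: some ranges will inevitably lose their majority. A third locality (even a single extra node) changes the arithmetic.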