CockroachDB pod restart issue on GKE

We are running a CockroachDB cluster on a Kubernetes cluster. Today I observed that one of the CockroachDB pods restarted automatically, and some time after it came back up, another pod was also restarted. The cluster is stable now, but what caused the restarts? Here are the Stackdriver logs from the Kubernetes cluster:

E 2019-09-30T07:10:35.162900451Z W190930 07:10:35.162656 159 storage/node_liveness.go:523  [n3,hb] slow heartbeat took 1.1s
 
E 2019-09-30T07:10:35.682476099Z W190930 07:10:35.682270 150 storage/node_liveness.go:523  [n2,hb] slow heartbeat took 1.0s
 
E 2019-09-30T07:10:37.850522101Z W190930 07:10:37.839867 174 storage/node_liveness.go:523  [n1,hb] slow heartbeat took 1.4s
 
E 2019-09-30T07:10:37.916989770Z I190930 07:10:37.916771 1 cli/start.go:765  received signal 'terminated'
 
I 2019-09-30T07:10:38.279542711Z initiating graceful shutdown of server
 
E 2019-09-30T07:10:38.279636503Z I190930 07:10:38.189084 1 cli/start.go:830  initiating graceful shutdown of server
 
E 2019-09-30T07:10:38.752799223Z I190930 07:10:38.719680 167 server/status/runtime.go:500  [n1] runtime stats: 960 MiB RSS, 212 goroutines, 241 MiB/185 MiB/472 MiB GO alloc/idle/total, 430 MiB/527 MiB CGO alloc/total, 188.3 CGO/sec, 2.6/0.6 %(u/s)time, 0.0 %gc (0x), 243 KiB/292 KiB (r/w)net
 
E 2019-09-30T07:10:40.695469439Z I190930 07:10:40.695242 143 server/status/runtime.go:500  [n2] runtime stats: 816 MiB RSS, 200 goroutines, 114 MiB/156 MiB/339 MiB GO alloc/idle/total, 446 MiB/536 MiB CGO alloc/total, 165.4 CGO/sec, 5.0/0.8 %(u/s)time, 0.0 %gc (1x), 244 KiB/334 KiB (r/w)net
 
E 2019-09-30T07:10:41.066399860Z W190930 07:10:41.066073 159 storage/node_liveness.go:523  [n3,hb] slow heartbeat took 2.5s
 
E 2019-09-30T07:10:41.066608482Z W190930 07:10:41.065727 150 storage/node_liveness.go:523  [n2,hb] slow heartbeat took 1.9s
 
E 2019-09-30T07:10:42.466792354Z W190930 07:10:42.466637 174 storage/node_liveness.go:523  [n1,hb] slow heartbeat took 1.2s
 
E 2019-09-30T07:10:43.039568111Z W190930 07:10:43.039335 138 storage/store.go:3704  [n1,s1,r151/1:/System/tsd/cr.store.r{a…-e…}] handle raft ready: 0.5s [processed=0]
 
E 2019-09-30T07:10:43.419860255Z I190930 07:10:43.419680 3733061 cli/start.go:840  8 running tasks
 
E 2019-09-30T07:10:43.610224746Z I190930 07:10:43.609879 152 server/status/runtime.go:500  [n3] runtime stats: 249 MiB RSS, 184 goroutines, 122 MiB/61 MiB/202 MiB GO alloc/idle/total, 70 MiB/84 MiB CGO alloc/total, 206.8 CGO/sec, 3.8/0.8 %(u/s)time, 0.0 %gc (0x), 224 KiB/742 KiB (r/w)net
 
E 2019-09-30T07:10:44.703615891Z W190930 07:10:44.703390 150 storage/node_liveness.go:523  [n2,hb] slow heartbeat took 1.1s
 
E 2019-09-30T07:10:48.239915903Z I190930 07:10:48.239737 3733061 cli/start.go:840  8 running tasks
 
E 2019-09-30T07:10:48.828346119Z W190930 07:10:48.828029 159 storage/node_liveness.go:523  [n3,hb] slow heartbeat took 1.3s
 
E 2019-09-30T07:10:50.692636524Z W190930 07:10:50.691012 87 gossip/gossip.go:1496  [n3] no incoming or outgoing connections
 
E 2019-09-30T07:10:50.692705497Z W190930 07:10:50.691233 189 vendor/google.golang.org/grpc/clientconn.go:1304  grpc: addrConn.createTransport failed to connect to {cockroachdb-0.cockroachdb.default.svc.cluster.local:26257 0  <nil>}. Err :connection error: desc = "transport: Error while dialing cannot reuse client connection". Reconnecting...
 
E 2019-09-30T07:10:50.692734505Z W190930 07:10:50.691621 92 vendor/google.golang.org/grpc/clientconn.go:1304  grpc: addrConn.createTransport failed to connect to {cockroachdb-0.cockroachdb:26257 0  <nil>}. Err :connection error: desc = "transport: Error while dialing cannot reuse client connection". Reconnecting...
 
E 2019-09-30T07:10:50.692741805Z W190930 07:10:50.691800 167 storage/raft_transport.go:583  [n3] while processing outgoing Raft queue to node 1: rpc error: code = Unavailable desc = transport is closing:
 
E 2019-09-30T07:10:50.692748127Z W190930 07:10:50.692060 92 vendor/google.golang.org/grpc/clientconn.go:1440  grpc: addrConn.transportMonitor exits due to: context canceled
 
E 2019-09-30T07:10:50.693278292Z W190930 07:10:50.693178 189 vendor/google.golang.org/grpc/clientconn.go:1440  grpc: addrConn.transportMonitor exits due to: context canceled
 
E 2019-09-30T07:10:50.693936883Z W190930 07:10:50.693522 72 gossip/gossip.go:1496  [n2] no incoming or outgoing connections
 
E 2019-09-30T07:10:50.694099800Z I190930 07:10:50.694013 18727 gossip/client.go:128  [n3] started gossip client to cockroachdb-1.cockroachdb.default.svc.cluster.local:26257
 
E 2019-09-30T07:10:50.705579043Z W190930 07:10:50.705423 18729 vendor/google.golang.org/grpc/clientconn.go:1304  grpc: addrConn.createTransport failed to connect to {cockroachdb-0.cockroachdb.default.svc.cluster.local:26257 0  <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp: lookup cockroachdb-0.cockroachdb.default.svc.cluster.local: no such host". Reconnecting...
 
E 2019-09-30T07:10:50.705754521Z I190930 07:10:50.705677 18723 vendor/github.com/cockroachdb/circuitbreaker/circuitbreaker.go:322  [n3] circuitbreaker: rpc [::]:26257->1 tripped: failed to connect to n1 at cockroachdb-0.cockroachdb.default.svc.cluster.local:26257: initial connection heartbeat failed: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp: lookup cockroachdb-0.cockroachdb.default.svc.cluster.local: no such host"
 
E 2019-09-30T07:10:50.705827825Z I190930 07:10:50.705728 18723 vendor/github.com/cockroachdb/circuitbreaker/circuitbreaker.go:447  [n3] circuitbreaker: rpc [::]:26257->1 event: BreakerTripped
 
E 2019-09-30T07:10:50.705838332Z I190930 07:10:50.705758 18723 rpc/nodedialer/nodedialer.go:143  [intExec=lookup-auth-session,n3,txn=7400555f] unable to connect to n1: failed to connect to n1 at cockroachdb-0.cockroachdb.default.svc.cluster.local:26257: initial connection heartbeat failed: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp: lookup cockroachdb-0.cockroachdb.default.svc.cluster.local: no such host"
 
E 2019-09-30T07:10:50.775211848Z E190930 07:10:50.774986 1375974 sql/distsqlrun/server.go:614  [n2] communication error: rpc error: code = Canceled desc = context canceled
 
E 2019-09-30T07:10:50.775390610Z E190930 07:10:50.775162 1375952 sql/distsqlrun/server.go:614  [n2] communication error: rpc error: code = Canceled desc = context canceled
 

Hey @vishal,

Is this the same cluster from the other post?

Hey @ronarev, no, this is not the cluster from my other post. This is a different cluster, and I am observing the issue on it.