Hey @wonko
The error message you are seeing about the pod's CrashLoopBackOff state comes from k8s itself. The default restart policy for a k8s pod is “Always” (check out the k8s docs here), which means k8s always tries to restart a pod's containers when they go down. The CrashLoopBackOff state indicates that it has tried to do that repeatedly, with the pod crashing again each time.
The cockroach logs suggest the server was unable to start correctly because of an authentication error. That would take the pod down, and k8s would then try to restart it over and over.
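To make the "back-off" part concrete: per the k8s docs, the kubelet waits longer between each restart attempt (10s, 20s, 40s, and so on), capped at five minutes. Here is a rough sketch of that schedule; the function name `restart_delays` and its parameters are only illustrative, not anything k8s exposes.

```python
def restart_delays(attempts, base=10, cap=300):
    """Approximate wait in seconds before each restart attempt.

    Assumes the documented defaults: start at 10s, double each time,
    cap at 300s. Illustrative only, not part of any k8s API.
    """
    return [min(base * 2 ** i, cap) for i in range(attempts)]


print(restart_delays(7))  # [10, 20, 40, 80, 160, 300, 300]
```

So a pod that has been in this state for a while will only be retried about every five minutes, which is why it can look "stuck".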
E190919 17:29:28.450594 486 server.go:2977 http: TLS handshake error from 10.244.0.1:37658: EOF
I190919 17:29:29.545154 409 cli/start.go:840 14 running tasks
W190919 17:29:30.582104 485 vendor/google.golang.org/grpc/server.go:666 grpc: Server.Serve failed to create ServerTransport: connection error: desc = "transport: http2Server.HandleStreams failed to receive the preface from client: EOF"
W190919 17:29:30.635677 493 vendor/google.golang.org/grpc/server.go:666 grpc: Server.Serve failed to create ServerTransport: connection error: desc = "transport: http2Server.HandleStreams failed to receive the preface from client: EOF"
W190919 17:29:30.735050 114 storage/store.go:3704 [n3,s3,r38/3:/Table/7{1-2}] handle raft ready: 12.6s [processed=0]
E190919 17:29:30.798208 495 server.go:2977 http: TLS handshake error from 10.244.0.1:37752: EOF
E190919 17:29:30.860518 492 server.go:2977 http: TLS handshake error from 10.244.0.1:37692: EOF
W190919 17:29:30.444072 113 storage/engine/rocksdb.go:2040 batch [1/51/0] commit took 504.036489ms (>= warning threshold 500ms)
I190919 17:29:31.907425 539 gossip/client.go:128 [n3] started gossip client to cockroachdb-shared-cockroachdb-2.cockroachdb-shared-cockroachdb.data-lake.svc.cluster.local:26257
W190919 17:29:35.351312 174 storage/node_liveness.go:523 [n3,hb] slow heartbeat took 5.7s
W190919 17:29:35.363183 174 storage/node_liveness.go:463 [n3,hb] failed node liveness heartbeat: operation "node liveness heartbeat" timed out after 4.5s
I190919 17:29:35.375590 409 cli/start.go:840 19 running tasks
W190919 17:29:35.546463 61 vendor/google.golang.org/grpc/clientconn.go:1304 grpc: addrConn.createTransport failed to connect to {cockroachdb-shared-cockroachdb-1.cockroachdb-shared-cockroachdb.data-lake.svc.cluster.local:26257 0 <nil>}. Err :connection error: desc = "transport: authentication handshake failed: io: read/write on closed pipe". Reconnecting...
The CrashLoopBackOff state is usually just an indicator that some more digging needs to be done.
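As one way to start that digging, a small script can pull the handshake-related warnings and errors out of a log dump. This is only a sketch: the regex is inferred from the layout of the lines pasted above (severity letter plus date, time, goroutine id, file:line, message) rather than taken from an official parser, and the names `LINE` and `handshake_problems` are made up for this example.

```python
import re

# Layout inferred from the lines in this post:
# <severity><yymmdd> <time> <goroutine id> <file>:<line> <message>
LINE = re.compile(
    r"^([IWEF])(\d{6}) (\d{2}:\d{2}:\d{2}\.\d{6}) (\d+) (\S+?):(\d+) (.*)$"
)


def handshake_problems(lines):
    """Return (time, source file, message) for W/E lines mentioning a handshake."""
    hits = []
    for line in lines:
        m = LINE.match(line.strip())
        if m and m.group(1) in ("W", "E") and "handshake" in m.group(7):
            hits.append((m.group(3), m.group(5), m.group(7)))
    return hits


# A few lines copied from the log output above.
SAMPLE = r'''
E190919 17:29:28.450594 486 server.go:2977 http: TLS handshake error from 10.244.0.1:37658: EOF
I190919 17:29:29.545154 409 cli/start.go:840 14 running tasks
W190919 17:29:35.351312 174 storage/node_liveness.go:523 [n3,hb] slow heartbeat took 5.7s
W190919 17:29:35.546463 61 vendor/google.golang.org/grpc/clientconn.go:1304 grpc: addrConn.createTransport failed to connect to {cockroachdb-shared-cockroachdb-1.cockroachdb-shared-cockroachdb.data-lake.svc.cluster.local:26257 0 <nil>}. Err :connection error: desc = "transport: authentication handshake failed: io: read/write on closed pipe". Reconnecting...
'''

for when, source, message in handshake_problems(SAMPLE.splitlines()):
    print(when, source, message[:60])
```

Run against the full log of the crashing pod, this should make it easier to see which peers the handshake failures involve.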
Let me know if there are any other questions.
Cheers,
Ricardo