Issue creating a secure deployment on EKS using the helm chart

deployment

(Berthold Alheit) #1

I’m evaluating CockroachDB for use in one of our products. We are using Helm, so I’m following the Helm-based Kubernetes orchestration guide from the website: https://www.cockroachlabs.com/docs/stable/orchestrate-cockroachdb-with-kubernetes.html

We are using a fairly standard EKS setup, using Kubernetes 1.11.

Setting up an insecure cluster works perfectly. However, setting up a secure cluster runs into issues with volume binding. The only difference between the two setups is that the secure flag is enabled.

This is what my values look like for the secure cluster, just FYI:

Secure:
  Enabled: true
Storage: "32Gi"
StorageClass: gp2

The specific error message (with details censored) I’m running into with the helm chart is:

pod has unbound PersistentVolumeClaims (repeated 4 times)
AttachVolume.Attach failed for volume "pvc-****" : "Error attaching EBS volume \"vol-****\"" to instance "i-******" since volume is in "creating" state
Readiness probe failed: Get https://*****:8080/health?ready=1: net/http: request canceled (Client.Timeout exceeded while awaiting headers)

I’m really not sure why the secure flag causes this behaviour. The pods never recover from this state, and deleting pods obviously just results in the same thing happening to the new pod.


(Berthold Alheit) #2

OK, leaving it alone for about an hour made the volume issues go away. TLS is broken, though:

W190131 05:22:41.741580 261 vendor/google.golang.org/grpc/clientconn.go:942 Failed to dial secure-****-cockroachdb-0.****-cockroachdb.****-roachdb.svc.cluster.local:26257: context canceled; please retry.

W190131 05:22:41.748003 270 vendor/google.golang.org/grpc/clientconn.go:1293 grpc: addrConn.createTransport failed to connect to {****-cockroachdb-2.****-cockroachdb.****-roachdb.svc.cluster.local:26257 0 <nil>}. Err :connection error: desc = "transport: authentication handshake failed: x509: certificate is valid for node, not ****cockroachdb-2.****-cockroachdb.****-roachdb.svc.cluster.local". Reconnecting...

W190131 05:22:41.838191 257 vendor/google.golang.org/grpc/server.go:603 grpc: Server.Serve failed to complete security handshake from "****:38110": remote error: tls: bad certificate

W190131 05:22:42.411839 294 vendor/google.golang.org/grpc/server.go:603 grpc: Server.Serve failed to complete security handshake from "****:32994": remote error: tls: bad certificate
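
My reading of the x509 warning above is that the node certificate only lists "node" as a name, while the peers dial each other by their full pod DNS names, so hostname verification fails. Below is a minimal sketch of that check, assuming simplified matching rules (exact match, or a single left-most "*" wildcard). The function names and the hostnames in the example are hypothetical placeholders, not the real values from my cluster, and this is not how CockroachDB or Go's TLS stack is actually implemented.

```python
def san_matches(hostname, san):
    """Return True if one DNS name from a certificate covers `hostname`.

    Simplified rules (an assumption for illustration): names are compared
    label by label, case-insensitively, and a "*" is only honoured as the
    entire left-most label.
    """
    host_labels = hostname.lower().rstrip(".").split(".")
    san_labels = san.lower().rstrip(".").split(".")
    if len(host_labels) != len(san_labels):
        return False
    if san_labels[0] == "*":
        # Wildcard stands in for exactly one left-most label.
        return host_labels[1:] == san_labels[1:]
    return host_labels == san_labels


def cert_covers(hostname, sans):
    """Return True if any name in `sans` covers `hostname`."""
    return any(san_matches(hostname, san) for san in sans)


# Hypothetical peer address in the same shape as the censored one in the log.
peer = "db-cockroachdb-2.db-cockroachdb.default.svc.cluster.local"

# A certificate that is only "valid for node" does not cover the peer name.
print(cert_covers(peer, ["node"]))  # False

# A certificate that also lists a matching wildcard name would cover it.
print(cert_covers(peer, ["node", "*.db-cockroachdb.default.svc.cluster.local"]))  # True
```

If that reading is right, the certificates issued for the nodes would need to include the pod DNS names (or a wildcard covering them) for the handshake to succeed.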

(I noticed that there is an older issue that also touched on the TLS certs on AWS: "Secure cockroachdb cluster on AWS EKS".)