Pod name resolution across multiple Kubernetes clusters on AWS

(Andrei) #1

I’m trying to install CockroachDB across two Kubernetes clusters on AWS. The clusters are connected using VPC Peering, so pod-to-pod connectivity is guaranteed.
I’m facing a problem with exposing the DNS server to enable pod name resolution between clusters, as described in https://github.com/cockroachdb/cockroach/tree/master/cloud/kubernetes/multiregion#exposing-dns-servers-to-the-internet
The load balancer definition provided in the GitHub project (https://github.com/cockroachdb/cockroach/blob/master/cloud/kubernetes/multiregion/dns-lb.yaml) defines a UDP LoadBalancer, but AWS does not support UDP load balancers, so configuring stubDomains is not possible.

Are there alternative mechanisms for enabling cross cluster pod names resolution on AWS?


(Tim O'Brien) #2

Hi @ak-icc,

You should be able to add the following to dns-lb.yaml, or use it to replace the UDP configuration there:

  - name: dns
    port: 53
    protocol: TCP
    targetPort: 53

The docs on kube-dns are a bit thin, but as far as I can see that should be all that’s necessary to switch the protocol from UDP to TCP (or add it if it’s TCP only).

Let me know if that doesn’t work for you.


(Andrei) #3

Hi @tim-o,

I’ve tried it with TCP. The created LoadBalancer resource looks like this:

kind: Service
  - name: dns
    nodePort: 31166
    port: 53
    protocol: TCP
    targetPort: 53
    - hostname: internal-ad2a0449824aa11e9b54f02f5a217943-1440569433.eu-central-1.elb.amazonaws.com

The setup.py expects that the LoadBalancer will have an IP address:

external_ip = check_output(['kubectl', 'get', 'svc', 'kube-dns-lb', '--namespace', 'kube-system', '--context', context, '--template', '{{range .status.loadBalancer.ingress}}{{.ip}}{{end}}'])

In my case AWS provides only the hostname, so the script stays hanging in the wait loop.
I will try to extend the setup script and resolve IP for stubDomains from the hostname.
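
A minimal sketch of that idea, assuming the ingress entries have already been read from the Service (for example with kubectl get svc ... -o json). The helper names resolve_ipv4 and ingress_ips are hypothetical and are not part of setup.py. The resolved addresses are only a snapshot of what DNS returns at call time:

```python
import socket


def resolve_ipv4(hostname):
    """Return the sorted, de-duplicated IPv4 addresses a hostname resolves to."""
    infos = socket.getaddrinfo(hostname, 53, socket.AF_INET, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def ingress_ips(ingress, resolve=resolve_ipv4):
    """Collect IPs from .status.loadBalancer.ingress entries.

    Entries that carry an "ip" field are used directly (the GKE-style case);
    entries that carry only a "hostname" (the AWS ELB case) are resolved.
    """
    ips = []
    for entry in ingress:
        if entry.get("ip"):
            ips.append(entry["ip"])
        elif entry.get("hostname"):
            ips.extend(resolve(entry["hostname"]))
    return ips
```

The resolve parameter is injectable so the lookup can be replaced in tests or swapped for a different resolver.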


(Andrei) #4

Hi @tim-o,

After reading the AWS documentation, I’m not sure that cross-connecting the DNS servers in both clusters via the load balancers’ IPs will work reliably. Elastic Load Balancers on AWS can change their IPs (that’s why kubectl shows a hostname for the LoadBalancer rather than an IP, e.g. internal-ad2a0449824aa11e9b54f02f5a217943-1440569433.eu-central-1.elb.amazonaws.com).
So the IPs we configure as stubDomains during deployment can change after some time, and pod name resolution will then stop working. Or am I wrong?


(Jesse) #5

Hi @ak-icc

Thanks for continuing to investigate and dig in here. You’re right that the current configuration and docs require a stable public IP address for load balancing. That’s the approach we took for the documentation and our testing, which focused on GKE. Unfortunately, we just don’t have precise insight into getting this working on AWS at the moment. I’ve put this in our backlog to investigate and get documented: https://github.com/cockroachdb/docs/issues/4314.

In the meantime, I’d suggest you look into ways to update the configuration to use a load balancer hostname that is resolvable and routable from all the clusters.




I’m also trying this approach but I also encountered problems that I haven’t successfully resolved…

I’ve done:

  • 3 gossip-based Kubernetes clusters in different regions, created with kops
  • Port 26257 is open in the k8s nodes’ security group
  • I don’t have a Federated Kubernetes setup because the v1 image is gone and v2 is not ready
  • VPC Peering is set up so that node-to-node connections over internal IPs are available
  • Services are accessible via an external IP (AWS ELB)

After some tweaking of setup.py, I still get “Readiness probe failed” errors, and in the log I found “Secure node-node and SQL connections are likely to fail”.

What I tweaked:

First, I set service.beta.kubernetes.io/aws-load-balancer-type: "nlb" in the annotations in dns-lb.yaml so that static IPs are returned.

Then, after external_ip is defined on line 120, I get the external address of the kube-dns load balancer, which on AWS is an ELB domain name. I use dig to wait until it returns IPs, and I use those IPs to prepare the kube-dns ConfigMap for the other clusters…
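
Roughly, the wait-then-configure step can be sketched as below. This is an illustration under assumptions, not the actual change to setup.py: wait_for_ips and stub_domains_configmap are hypothetical names, the resolver is passed in (it could wrap dig or socket.getaddrinfo), and the "<zone>.svc.cluster.local" domain pattern is assumed to match how each cluster's DNS domain was configured.

```python
import json
import time


def wait_for_ips(resolve, hostname, attempts=30, delay=10.0, sleep=time.sleep):
    """Poll resolve(hostname) until it returns at least one IP."""
    for _ in range(attempts):
        try:
            ips = resolve(hostname)
        except OSError:
            # A DNS failure while the load balancer is still provisioning
            # is treated as "not ready yet".
            ips = []
        if ips:
            return ips
        sleep(delay)
    raise TimeoutError("no IPs for %s after %d attempts" % (hostname, attempts))


def stub_domains_configmap(dns_ips_by_zone, local_zone):
    """Build a kube-dns ConfigMap that forwards every other zone's domain
    to that zone's DNS load balancer IPs."""
    stub = {
        "%s.svc.cluster.local" % zone: ips  # assumed domain pattern
        for zone, ips in dns_ips_by_zone.items()
        if zone != local_zone
    }
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": "kube-dns", "namespace": "kube-system"},
        # kube-dns expects stubDomains as a JSON string: domain -> list of IPs.
        "data": {"stubDomains": json.dumps(stub)},
    }
```

The resulting dictionary could be dumped to a file and applied with kubectl apply -f in each cluster's context. Because the IPs are captured once, the ConfigMap would need to be regenerated if the load balancer addresses ever change.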

It seems fine… However, I’m still not able to connect across clusters to the Pod IP (100.x.x.x). Is there a way to do that? Do I need Federated Kubernetes in order to achieve that?

Also… is it possible, instead of one 3-replica stateful service, to create 3 separate services, each with its own service IP, and not run cockroach start until everything is up and we have all the external service addresses to put in “join”? (Or just capture the main one and have the others join that address.)
I think in theory this should address my issue… But I’m curious whether there is a better way to achieve it…