Peer-finder hanging

Has anyone seen issues with the peer-finder init container hanging when deploying the CockroachDB Helm chart? We are deploying to a non-default namespace and passing the proper namespace and domain args. It works fine for the first two nodes, but hangs on init for the third.

Running `dig` and `nslookup` from the pod returns the peer list successfully.

A teammate has suggested it may stem from this issue in Go in the underlying peer-finder image:


Hi @rjaxin,

I haven’t personally seen that or seen it reported, but it wouldn’t shock me. Which versions of Kubernetes and the peer-finder are you using? Does the peer-finder log anything? What happens if you restart it?


Kube server 1.8.6, Helm 2.7.2


`kubectl logs` for the bootstrap init container repeatedly shows `error: lookup cockroachdb on x.x.x.x:x: no such host`.

Deleting the pod doesn't change the behaviour.


Thanks @rjaxin. We recently removed the peer-finder from our non-helm Kubernetes configuration. Would also removing it from our Helm configuration be a reasonable solution from your perspective?

@a-robinson so you are relying just on the `--join` arg list? Are there any downsides? Does this reduce elastic scalability, or does each node just connect to one of those peers to join the cluster?

The only downside is that I have to make sure we can do a run-once init job via helm that won’t get re-run during helm upgrades and the like. Once the cluster has started, the new approach is equally scalable and actually better than the old one, because it removes the edge case of node 1 re-initializing a new cluster that the peer-finder was meant to protect against.
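For context, the `--join`-based approach described above looks roughly like this in a StatefulSet container command. This is a sketch, not the actual chart: the image tag, flags other than `--join`, and the pod/service names (`cockroachdb-0.cockroachdb`, etc.) are illustrative assumptions.

```yaml
# Sketch of a CockroachDB StatefulSet container using a fixed --join list
# instead of the peer-finder init container. Names are illustrative.
containers:
  - name: cockroachdb
    image: cockroachdb/cockroach
    command:
      - "/cockroach/cockroach"
      - "start"
      - "--logtostderr"
      # Every node lists the same fixed set of seed peers; any one
      # reachable peer is enough to join the already-running cluster.
      - "--join=cockroachdb-0.cockroachdb,cockroachdb-1.cockroachdb,cockroachdb-2.cockroachdb"
```

Because joining nodes never initialize a new cluster on their own, this avoids the "node 1 re-initializes a fresh cluster" edge case the peer-finder guarded against.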

Sounds like a good approach


looks like the cluster-init job can be annotated with `"helm.sh/hook": post-install` so it runs on install but not on upgrades?
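For reference, a Helm 2 hook annotation on the cluster-init job template might look like the sketch below. The `helm.sh/hook` key is Helm's standard hook annotation; the job name and labels are illustrative assumptions.

```yaml
# Sketch: run the cluster-init job once, after the initial install only.
# Because the hook list omits post-upgrade, `helm upgrade` will not re-run it.
apiVersion: batch/v1
kind: Job
metadata:
  name: cluster-init   # illustrative name
  annotations:
    "helm.sh/hook": post-install
spec:
  template:
    spec:
      restartPolicy: OnFailure
      containers:
        - name: cluster-init
          image: cockroachdb/cockroach
          command: ["/cockroach/cockroach", "init"]   # flags omitted for brevity
```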


Perfect, thanks for the pointer!

I’ve sent out a change to remove the peer-finder. Thanks again for the help!

…And the updated version is now live:
