Node discovery - best practices

Hi Everyone,

I’m currently perfecting the bash scripts that build my entire cloud infrastructure on Digital Ocean from scratch. As part of that process, I have all nodes join the first node. It works flawlessly.

However, does anyone have a good method for node discovery on reboot? Let’s say I reboot one of the nodes: what is the best practice for having it find the cluster again?

Should I just have each node write its IP to a shared file, so that a rebooted node can read one of the lines and attempt to join that address?

How do you guys do it? I have a feeling I am making life complicated here. I’m wondering if there is some easy way to check for existing clusters on the private network. Thanks for any help in this regard.

There are two different service discovery needs for CockroachDB: The nodes need to be able to find each other, and clients need to be able to find the nodes.

For nodes finding each other, it’s a little bit simpler than typical dynamic service discovery, since you don’t need to give every node a complete list of its peers - you can just give it any current member(s) and it will find out the rest. I would make a file containing the addresses of your first three droplets and copy that file to every node (current and future) to use in its --join flag. You could even just write your --join flag directly into your systemd service file (or whatever process manager you’re using). You don’t need to update this file as you add new nodes; the only reason you’d need to change it is if you lose all three of those first three nodes.
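As a rough illustration of the seed-file idea, here is a minimal sketch in bash. The function name build_join_list, the file layout (one host or host:port per line, with optional # comments), and the paths in the usage comment are all assumptions made for this example, not anything CockroachDB prescribes; only the --join flag itself comes from the discussion above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn a seed file into a comma-separated value
# suitable for the --join flag.
# Assumed file format: one address per line; blank lines and anything
# after a '#' are ignored.
build_join_list() {
  local seeds_file="$1"
  # Strip comments and whitespace, drop empty lines, then join the
  # remaining lines with commas.
  sed -e 's/#.*//' -e 's/[[:space:]]//g' -e '/^$/d' "$seeds_file" | paste -sd, -
}

# Example usage (the seed file path is a placeholder):
#   cockroach start --join="$(build_join_list /etc/cockroach/seeds)" ...
```

Because the same three seed addresses are valid for every node, the file can be copied unchanged to each droplet, and a rebooted node simply rejoins through whichever seed is reachable.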

For clients discovering nodes, you could do something similar, but in this case you do want every node to get used so you’d need to update the file as the set of nodes changes. A simple trick you can do is run both a client application and a cockroach server on every node and then just point the application to localhost. If that doesn’t suit your needs (e.g. if you need more of one kind of process than the other), we recommend using a load balancer. Digital Ocean has managed load balancers or you can run your own with something like haproxy. Either way, configuring the load balancer is currently a bit of a manual process for DO. There’s nothing cockroach-specific here, you just need to keep the LB configuration up to date as nodes are added and removed.
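If you run your own haproxy, keeping the configuration up to date could be scripted along these lines. This is only a sketch: the function name render_haproxy_servers, the node-list file format, and the server naming scheme are assumptions for illustration, and 26257 is used as the default because it is CockroachDB's default SQL port. The output is just the "server" lines of a backend section, which you would still need to place into a full haproxy configuration yourself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one HAProxy "server" line per node.
# Assumed input: a file with one node address per line; blank lines
# and lines starting with '#' are skipped.
render_haproxy_servers() {
  local nodes_file="$1" port="${2:-26257}" i=0 addr
  # The "|| [ -n ... ]" part handles a last line without a trailing newline.
  while IFS= read -r addr || [ -n "$addr" ]; do
    case "$addr" in
      ''|'#'*) continue ;;
    esac
    i=$((i + 1))
    printf '    server cockroach%d %s:%s check\n' "$i" "$addr" "$port"
  done < "$nodes_file"
}

# Example usage (the node file path is a placeholder):
#   render_haproxy_servers /etc/cockroach/nodes
```

Unlike the --join seed file, this node list does need to be regenerated whenever nodes are added or removed, since every node should receive client traffic.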

Have you seen our Digital Ocean deployment docs? They don’t cover this issue specifically (they don’t talk at all yet about arranging for automatic restarts), but they may answer other questions for you.


Thank you for your reply and the hints you provided.