AWS Auto Scaling Group and Network Load Balancer

Would it make sense to build an ASG whose EC2 instances register with a Network Load Balancer (NLB), and have new instances join the cluster by passing the NLB address to CockroachDB (CRDB) at startup on the newly created nodes?

In an ASG, all EC2 instances get their IP addresses dynamically, so the only thing I know that always points to the running instance(s) is the load balancer. If this is possible, the next challenge is finding out which region and zone the newly created EC2 instance was started in, so I can put that into the locality flags.
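For the region/zone question, one option is to read the placement from the EC2 Instance Metadata Service (IMDSv2) inside the startup script and turn it into a --locality value. This is a minimal sketch, assuming IMDSv2 is enabled on the instance and that a two-tier region/zone locality is what you want; the function names are illustrative, not from any library.

```python
import urllib.request

# IMDS base URL; only reachable from inside an EC2 instance.
IMDS = "http://169.254.169.254/latest"


def imds_get(path: str, timeout: float = 2.0) -> str:
    """Fetch one metadata value using an IMDSv2 session token."""
    token_req = urllib.request.Request(
        f"{IMDS}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    with urllib.request.urlopen(token_req, timeout=timeout) as resp:
        token = resp.read().decode()
    req = urllib.request.Request(
        f"{IMDS}/meta-data/{path}",
        headers={"X-aws-ec2-metadata-token": token},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode().strip()


def locality_flag(region: str, zone: str) -> str:
    """Build a CockroachDB --locality flag, most general tier first."""
    return f"--locality=region={region},zone={zone}"

# On the instance (not run here):
#   locality_flag(imds_get("placement/region"),
#                 imds_get("placement/availability-zone"))
```

On an instance in us-east-1a this would produce --locality=region=us-east-1,zone=us-east-1a, which can be appended to the cockroach start command in user_data.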

If this approach works for scale-out, the next step is handling scale-in. I think one of the remaining nodes should do that by monitoring the nodes registered in the NLB and comparing them with the nodes the database is aware of…
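The scale-in comparison described above boils down to a set difference. The sketch below assumes you have already fetched the NLB target group members and the cluster's node list (for example from `cockroach node status`) and normalized both to the same "ip:port" form; the function names and input shapes are hypothetical.

```python
from typing import Dict, Iterable, List


def nodes_to_decommission(
    nlb_targets: Iterable[str], crdb_nodes: Dict[int, str]
) -> List[int]:
    """Return node IDs whose address is no longer registered in the NLB.

    nlb_targets: "ip:port" strings currently in the NLB target group.
    crdb_nodes:  node ID -> "ip:port" as reported by the cluster.
    """
    registered = set(nlb_targets)
    return sorted(nid for nid, addr in crdb_nodes.items() if addr not in registered)


def decommission_command(node_ids: List[int], host: str) -> List[str]:
    """argv for `cockroach node decommission`, run against a live node.

    Security flags (--certs-dir or --insecure) are omitted here.
    """
    return ["cockroach", "node", "decommission", *map(str, node_ids), f"--host={host}"]
```

Because decommissioning moves all replicas off a node, it is probably wise to require a node to be missing from the NLB for some grace period before acting, so a briefly unhealthy target is not removed by mistake.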


Hi Ronald,

I’ll give you a short answer, but it deserves a bigger discussion. When you are using an ASG with an NLB, you do not want to pass the address of the load balancer itself to the new CockroachDB nodes. The addresses used in a join flag should point to individual nodes, not to a load-balanced collection of nodes. Historically I’ve solved this problem using k8s StatefulSets, which give each node a stable network identity regardless of the IP assigned by AWS. New nodes then know that, for example, the pods named cockroachdb-0, -1, and -2 are always valid nodes to join, and k8s resolves the IP addresses behind the scenes.
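To illustrate the StatefulSet pattern: pods get ordinal names, and with a headless Service each pod is resolvable as `<pod>.<service>` within the same namespace, so the join list can be written once and never changes. This sketch assumes a StatefulSet and Service both named `cockroachdb` and the default port 26257; the helper itself is hypothetical.

```python
def statefulset_join_flag(
    statefulset: str, service: str, replicas: int, port: int = 26257
) -> str:
    """Build a --join flag from the stable pod DNS names of a StatefulSet.

    Pods are named <statefulset>-0, -1, ...; the headless Service makes
    each one resolvable as <pod>.<service>, whatever IP it currently has.
    """
    hosts = [f"{statefulset}-{i}.{service}:{port}" for i in range(replicas)]
    return "--join=" + ",".join(hosts)
```

For three replicas this yields --join=cockroachdb-0.cockroachdb:26257,cockroachdb-1.cockroachdb:26257,cockroachdb-2.cockroachdb:26257, and every pod can start with that same flag.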

I’m sure we can come up with a pattern that works with an ASG and doesn’t involve k8s. Let me ask around.

Another option from the team: create a non-auto-scaling group of ~3 nodes to use as the join targets everywhere, then use an auto-scaling group for the rest.
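With a fixed seed group, every node, static or auto-scaled, can be started with the same join list, while each node advertises its own address. A minimal sketch of building that command line, assuming the seeds have stable DNS names (the names, cert directory, and helper below are placeholders, not from the thread):

```python
from typing import List, Sequence


def start_args(seeds: Sequence[str], advertise_addr: str, locality: str) -> List[str]:
    """argv for `cockroach start` on any node in the cluster.

    seeds:          stable addresses of the non-auto-scaled join targets.
    advertise_addr: this node's own address (e.g. its private IP).
    locality:       e.g. "region=us-east-1,zone=us-east-1b".
    """
    return [
        "cockroach", "start",
        "--join=" + ",".join(seeds),
        f"--advertise-addr={advertise_addr}",
        f"--locality={locality}",
        "--certs-dir=certs",  # placeholder path
    ]
```

The ASG members never need to be join targets themselves; only the seed group's addresses have to stay stable.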

Hi there,

I attempted this for our setup but found that there is no out-of-the-box way to do it, even when using CloudFormation.

It requires a few Lambdas to create and tear down Route 53 records so that every one of your CRDB servers is DNS-addressable. It also requires a Lambda to update a pre-created SSM parameter that keeps a list of your nodes for joining (when that list is empty, set the CRDB join parameter to the node itself, using its DNS address).
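The bookkeeping behind that SSM parameter can be sketched as pure functions that a Lambda (or user_data) would wrap with boto3's `ssm.get_parameter` / `ssm.put_parameter`. This assumes the parameter holds a comma-separated list of node DNS names; the function names are hypothetical.

```python
from typing import List


def parse_members(param_value: str) -> List[str]:
    """Split the comma-separated SSM parameter value into DNS names."""
    return [h.strip() for h in param_value.split(",") if h.strip()]


def join_targets(param_value: str, own_dns: str) -> List[str]:
    """Join list for a starting node; fall back to itself if no one else is listed."""
    others = [h for h in parse_members(param_value) if h != own_dns]
    return others or [own_dns]


def on_launch(param_value: str, new_dns: str) -> str:
    """Parameter value after an instance launches (idempotent)."""
    members = parse_members(param_value)
    if new_dns not in members:
        members.append(new_dns)
    return ",".join(members)


def on_terminate(param_value: str, gone_dns: str) -> str:
    """Parameter value after an instance terminates."""
    return ",".join(h for h in parse_members(param_value) if h != gone_dns)
```

Two caveats: a node that joins only itself still needs a one-time `cockroach init` to bootstrap the cluster, and concurrent launches can race on a read-modify-write of the parameter, so some form of locking or retry is likely needed in practice.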

That should work in theory, but it was too much work for the time constraints I had. We ended up with a CloudFormation template that creates a cluster of 4 nodes with predetermined names (not using an ASG) and uses a manually set SSM parameter for joining, so we can spin up more node clusters and have them simply join an existing cluster.

Cheers,
Stefan

Thanks for your responses @charsleysa and @nate,

I have been fiddling for a while, and I will script it using user_data from the ScalingGroupConfiguration template, setting the host and DNS names depending on the known hosts in the cluster.
Being able to register through the NLB would be easier, but scripting it is not that hard either.
I imagine that during registration the new node just has to pass its own name to the existing nodes. The existing nodes then integrate the new node and start communicating with it using the name/address they were given. Is there a reason this is not possible (yet)? It would simplify things a lot.
For scale-in I need some scripting anyway to decommission dead nodes. That will take some extra tinkering…
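One piece of the user_data approach above is choosing a host name for the new instance based on the names already in the cluster. A minimal sketch, assuming a hypothetical `crdb-<n>` naming scheme and reusing the lowest free number so a replacement takes over a terminated node's name:

```python
import re
from typing import Iterable


def next_hostname(known_hosts: Iterable[str], prefix: str = "crdb") -> str:
    """Return the lowest unused <prefix>-<n> name (n starts at 1)."""
    pattern = re.compile(rf"^{re.escape(prefix)}-(\d+)$")
    used = set()
    for host in known_hosts:
        short = host.split(".", 1)[0]  # strip any domain suffix
        m = pattern.match(short)
        if m:
            used.add(int(m.group(1)))
    n = 1
    while n in used:
        n += 1
    return f"{prefix}-{n}"
```

If two instances launch at the same time they can pick the same name, so the script probably needs a conditional write (for example on the DNS record) to claim a name safely.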

Hi Ronald,

In terms of node registration, the node passes its address (either a DNS name or an IP, whichever you configure) to the cluster. The issue is that the node needs an entry point to the cluster, so it has to know the address of another node already in it, which can be difficult in an AutoScalingGroup.

I have not yet tried to register nodes through the NLB but that’s an idea I’ll keep in mind when tinkering in the future.

In our CloudFormation file we use the user_data section to configure CockroachDB, and it can both commission and shut down a node (fully decommissioning a node is still a manual process, as I have yet to figure out a nice way to automatically recommission a node).

If you would like, I can share our CloudFormation files, which might help you out.

Cheers,
Stefan

Hi @nate,

I understand that it does not make sense to pass the address of the load balancer to a node. But being able to use the load balancer as a fixed point for registration would make life a lot easier. What I imagine is this: a new node that wants to join the cluster makes a call to the load balancer. The load balancer passes the call through to a healthy node. That node puts its own address in the response to the caller, so the caller now has the address of an existing healthy node it can join. In this scenario, the healthy node could of course return a set of addresses.
So in my idea, it is not so much a direct join as a slightly extended join protocol. This removes the need to hard-code join addresses and makes things a lot more flexible.
And yes, I have looked at k8s, but that has its own challenges.
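The proposed discovery step can be simulated in a few lines. This is only a model of the idea in the post above: the `handle_discovery` endpoint and these classes are hypothetical and not part of CockroachDB. The point it shows is that the load balancer is used only to reach some healthy node, while the join list itself contains concrete node addresses.

```python
import random
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    addr: str
    healthy: bool = True
    peers: List[str] = field(default_factory=list)  # addresses this node knows

    def handle_discovery(self) -> List[str]:
        """Hypothetical endpoint: reply with own address, then known peers."""
        return [self.addr] + [p for p in self.peers if p != self.addr]


class LoadBalancer:
    def __init__(self, nodes: List[Node]):
        self.nodes = nodes

    def forward(self) -> Node:
        """Route a request to a random healthy target, like an NLB would."""
        healthy = [n for n in self.nodes if n.healthy]
        if not healthy:
            raise RuntimeError("no healthy targets")
        return random.choice(healthy)


def discover_join_list(lb: LoadBalancer, limit: int = 3) -> List[str]:
    """New node asks 'any node' via the LB, then joins the addresses returned."""
    return lb.forward().handle_discovery()[:limit]
```

The returned list could then be passed as --join; since CockroachDB tries the listed addresses until one responds, an occasional stale peer in the reply should not block the join.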