Multi-active Availability

Hi there, I’m new to crdb, so I might have overlooked something. Feel free to point me to docs etc.

My question is about multi-active availability, which I understand to mean that all nodes are available for reads and writes at all times. Does that mean that, in a distributed application, every app node that wants to read or write data can simply round-robin through the crdb nodes, and that this saves me a set of load balancers in front of the crdb nodes?

Yes, I’ll need health checks to take crdb nodes out of and back into the rotation. But other than that, what particular advantage do I gain by load balancing the crdb nodes separately? Here I am assuming that the client-facing app nodes will have to be load balanced anyway.
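To make this concrete, here is roughly what I have in mind (only a rough Go sketch; the node addresses, user, and database name are made-up placeholders):

```go
package main

import (
	"database/sql"
	"fmt"
	"log"
	"sync/atomic"

	_ "github.com/lib/pq" // CockroachDB speaks the PostgreSQL wire protocol
)

// Placeholder node addresses; in reality these would come from configuration.
var crdbNodes = []string{"crdb-1:26257", "crdb-2:26257", "crdb-3:26257"}

var counter uint64

// nextDSN returns a connection string for the next crdb node, round-robin.
func nextDSN() string {
	i := atomic.AddUint64(&counter, 1)
	node := crdbNodes[i%uint64(len(crdbNodes))]
	return fmt.Sprintf("postgresql://app@%s/appdb?sslmode=disable", node)
}

func main() {
	// Each new connection pool targets the next node in the rotation.
	db, err := sql.Open("postgres", nextDSN())
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	if err := db.Ping(); err != nil {
		log.Fatal(err)
	}
	log.Println("connected")
}
```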

Ulrich

@ulim, that’s right: you can direct your reads/writes to any node in the cluster. You’ll pay a latency cost, of course, for accessing nodes in distant data centers. See Multi-Datacenter Fast Read Setup for some additional considerations.

Anyway, I assume you’re asking because our deployment documentation suggests using a separate load balancer? Most existing database drivers—at least the ones we’ve seen—don’t support sending requests round-robin to multiple hosts. That, in turn, means that most app frameworks (like Rails) want a single database URL. Putting a load balancer out front of your Cockroach cluster is the shortest path to balancing traffic across your cluster, since you can just specify the address of the load balancer in your existing app.
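For illustration, with a load balancer in front the app only ever needs a single connection string (a minimal Go sketch; `crdb-lb` is a made-up load-balancer hostname, and the details will differ with your driver and framework):

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/lib/pq" // CockroachDB speaks the PostgreSQL wire protocol
)

func main() {
	// The app is configured with one URL; the balancer behind "crdb-lb"
	// spreads connections across the CockroachDB nodes.
	db, err := sql.Open("postgres",
		"postgresql://app@crdb-lb:26257/appdb?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	if err := db.Ping(); err != nil {
		log.Fatal(err)
	}
	log.Println("connected via load balancer")
}
```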

What you’re suggesting amounts to baking a simple load balancer into your app, I think. You’ll need some way of communicating to every client which Cockroach nodes are healthy and which aren’t. (Alternatively, you can have each client pay the cost of discovering that a Cockroach node is offline and trying the next one.) What if you want to do something more complicated than round-robin, like routing traffic to the server with the fewest active connections? So the two benefits of standalone load balancers that I see are a) you don’t have to write them yourself, and b) they can make better decisions about routing traffic than any individual instance of your client app can.
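To give a feel for what that in-app balancer ends up looking like, here is a rough Go sketch (the node addresses and the 10-second health-check interval are assumptions, not recommendations):

```go
package main

import (
	"context"
	"database/sql"
	"errors"
	"fmt"
	"sync"
	"sync/atomic"
	"time"

	_ "github.com/lib/pq" // CockroachDB speaks the PostgreSQL wire protocol
)

// Placeholder node addresses.
var crdbNodes = []string{"crdb-1:26257", "crdb-2:26257", "crdb-3:26257"}

type balancer struct {
	mu      sync.RWMutex
	healthy []string // nodes that passed the most recent health check
	next    uint64
}

// refresh pings every node and rebuilds the healthy list.
func (b *balancer) refresh(ctx context.Context) {
	var ok []string
	for _, node := range crdbNodes {
		dsn := fmt.Sprintf("postgresql://app@%s/appdb?sslmode=disable", node)
		db, err := sql.Open("postgres", dsn)
		if err != nil {
			continue
		}
		pingCtx, cancel := context.WithTimeout(ctx, 2*time.Second)
		if db.PingContext(pingCtx) == nil {
			ok = append(ok, node)
		}
		cancel()
		db.Close()
	}
	b.mu.Lock()
	b.healthy = ok
	b.mu.Unlock()
}

// pick returns the next healthy node, round-robin.
func (b *balancer) pick() (string, error) {
	b.mu.RLock()
	defer b.mu.RUnlock()
	if len(b.healthy) == 0 {
		return "", errors.New("no healthy crdb nodes")
	}
	i := atomic.AddUint64(&b.next, 1)
	return b.healthy[i%uint64(len(b.healthy))], nil
}

func main() {
	b := &balancer{}
	ctx := context.Background()
	b.refresh(ctx)

	// Re-check node health periodically in the background.
	go func() {
		for range time.Tick(10 * time.Second) {
			b.refresh(ctx)
		}
	}()

	node, err := b.pick()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("routing new connections to", node)
}
```

A standalone load balancer gives you the equivalent of this for free, plus smarter policies like least-connections, without you having to write or maintain it yourself.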

That said, your strategy is totally valid, provided you’re happy with whatever solution you engineer to update the list of hosts on every client app as nodes come in and out of service.

You are quite rightly singing the praises of professional load balancers such as HAProxy, but please don’t forget that you need a bunch of them, plus failover between them via a floating IP or something similar. This comparatively complicated and expensive infrastructure is already in place in front of my app nodes.

Assuming the traffic is already balanced correctly across my app nodes, why load balance again in front of the db nodes?

Since I have to do web session clustering across my app nodes anyway, using something like Hazelcast, it would not be a problem to communicate lists of failing/working crdb nodes across the cluster. But I also think it won’t be necessary, because each app node is only ever interested in exactly one crdb node and only has to monitor that one. Every failure will therefore be discovered exactly once, and by a deterministic app node.
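Roughly what I mean, as a Go sketch (the assigned node address is a placeholder, and the broadcast to the other app nodes via Hazelcast is only hinted at in a comment):

```go
package main

import (
	"context"
	"database/sql"
	"log"
	"time"

	_ "github.com/lib/pq" // CockroachDB speaks the PostgreSQL wire protocol
)

// Each app node is assigned exactly one crdb node; placeholder address here.
const assignedNode = "postgresql://app@crdb-1:26257/appdb?sslmode=disable"

func main() {
	db, err := sql.Open("postgres", assignedNode)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Periodically ping "our" crdb node; on failure, this app node is the one
	// that notices and would announce it to its peers (e.g. via Hazelcast).
	for range time.Tick(5 * time.Second) {
		ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
		if err := db.PingContext(ctx); err != nil {
			log.Printf("assigned crdb node unreachable: %v", err)
			// ...broadcast the failure to the other app nodes here...
		}
		cancel()
	}
}
```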

Ulrich

PS: I know we’re leaving CockroachDB territory with this architectural discussion; I was just wondering whether reading from and writing to all crdb nodes at the same time is considered typical usage.