Highly Available Data

Hi
Anyone got a link on HA design for handling failed nodes during upgrade/patching?

E.g. inserting at 100 ops/sec, taking a node offline, and handling that?

The only doc I can find is:

https://www.cockroachlabs.com/docs/stable/demo-fault-tolerance-and-recovery.html

And that's not really useful.

Thanks!

Hey @fakka

I believe this documentation may be what you are looking for. It goes into configuring HAProxy and how it works with the cockroach process.

Let me know if there are any other questions.

Hey @rickrock
That would certainly be part of it. But I'm really looking more at how to develop with the appropriate driver to handle transient errors from the load balancer/connection. Let's say I send a request, it's being served by a node in the cluster, and that node gets shut down.

Something like this

https://docs.couchbase.com/python-sdk/current/failure-considerations.html

Hey @fakka

It sounds like the missing piece would be some client logic that retries when the connection is severed. The load balancer would then receive the new request and forward it to an available node.
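
A minimal sketch of that client logic, not an official recommendation. It assumes your driver surfaces a severed connection as an exception; the built-in ConnectionError stands in for that here (with psycopg2, for example, you would catch psycopg2.OperationalError instead). The names run_with_reconnect, connect and work are hypothetical.

```python
import random
import time
from typing import Any, Callable, TypeVar

T = TypeVar("T")


def run_with_reconnect(
    connect: Callable[[], Any],
    work: Callable[[Any], T],
    max_attempts: int = 5,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run work(conn); if the connection is severed, reconnect and try again.

    connect should point at the load balancer, so each new connection
    can land on a node that is still up.
    """
    last_error = None
    for attempt in range(max_attempts):
        try:
            conn = connect()      # fresh connection on every attempt
            return work(conn)
        except ConnectionError as err:  # stand-in for the driver's error type
            last_error = err
            if attempt < max_attempts - 1:
                # Exponential backoff with a little jitter between attempts.
                sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error
```

Keep in mind that work may run more than once, so it should be safe to repeat (for example an upsert rather than a blind insert).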

Let me know if there are any other questions.

Sure.

But are there no specific best practices on handling those types of error messages?

E.g. which error messages to handle, and how to retry.

Or should I just follow Postgres best practices?
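
To make the question concrete, this is the kind of mapping I have in mind. It is only a sketch that assumes the standard PostgreSQL SQLSTATE codes carry over (40001 is also the code CockroachDB uses for transaction retry errors); the Action enum and classify function are names I made up, and the choice of action per code is my guess rather than documented guidance.

```python
from enum import Enum


class Action(Enum):
    RETRY_TRANSACTION = "retry the whole transaction"
    RECONNECT = "open a new connection, then retry"
    FAIL = "surface the error to the caller"


# PostgreSQL class 57 codes that indicate the server is going away:
# admin_shutdown, crash_shutdown, cannot_connect_now.
SHUTDOWN_CODES = {"57P01", "57P02", "57P03"}


def classify(sqlstate):
    """Map a five-character SQLSTATE code (or None) to a suggested action."""
    if sqlstate is None:
        # Assumption: the driver reports no SQLSTATE when the socket just drops.
        return Action.RECONNECT
    if sqlstate == "40001":        # serialization_failure
        return Action.RETRY_TRANSACTION
    if sqlstate.startswith("08"):  # class 08: connection exception
        return Action.RECONNECT
    if sqlstate in SHUTDOWN_CODES:
        return Action.RECONNECT
    return Action.FAIL             # e.g. constraint violations, syntax errors
```

So classify("08006") would say reconnect, while something like a unique violation (23505) would just be raised to the caller.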

Hi @fakka,

Just want to point out that in a traditional 3-node cluster you can allow at most one node to be down at any given time for transitional purposes (although that's not ideal).

So if your load balancer receives an error message (i.e. the SQL statement couldn’t execute because of a severed connection) then the load balancer should fire up a new connection and send it to a node that can serve your request.

Thanks.

Where is that documented?

So if your load balancer receives an error message (i.e. the SQL statement couldn’t execute because of a severed connection) then the load balancer should fire up a new connection and send it to a node that can serve your request.

I'm not aware of load balancing functionality that handles that.

See https://www.cockroachlabs.com/docs/stable/deploy-cockroachdb-on-premises.html#step-6-set-up-load-balancing for info on setting up our on-prem recommendation for load balancing.

Thanks @mattvardi

I was looking at how Cockroach suggests handling this from more of a development perspective, for the case where there isn't a problem connecting, i.e. the session is aborted midway through a transaction.

An example from Oracle would be TAF (Transparent Application Failover) … but also handling session context. If the session was killed, I'm not sure what context the load balancer would even know about.
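
To illustrate what I'd otherwise have to hand-roll in the application: a rough sketch where the client, not the load balancer, owns the session context and replays the whole transaction on a fresh connection. Everything here is hypothetical: run_transaction is my own name, the connection is assumed to expose an execute(sql) method, and ConnectionError stands in for whatever the driver raises when the session is killed.

```python
from typing import Any, Callable, Sequence, TypeVar

T = TypeVar("T")


def run_transaction(
    connect: Callable[[], Any],
    session_setup: Sequence[str],
    txn_body: Callable[[Any], T],
    max_attempts: int = 3,
) -> T:
    """Replay session setup and the full transaction after a killed session."""
    for attempt in range(1, max_attempts + 1):
        try:
            conn = connect()
            # Session context is lost with the old session, so re-apply it.
            for stmt in session_setup:
                conn.execute(stmt)
            conn.execute("BEGIN")
            result = txn_body(conn)  # statements restart from the top
            conn.execute("COMMIT")
            return result
        except ConnectionError:
            # The uncommitted work died with the session; retry or give up.
            if attempt == max_attempts:
                raise
    raise ValueError("max_attempts must be at least 1")
```

The open question with this approach is a connection that drops during COMMIT itself: the client can't tell whether the commit landed, so txn_body would need to be idempotent.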