GCE Load Balancer Connection Issues

Hello. I followed the GCE secure setup guide exactly and have a lot of experience setting up production environments on GCE. I’m stumped by this error I’m getting and have tried and failed to Google an answer. My deployment consists of 9 instances in 3 localities around the world, all behind a TCP Proxy LB.

When I connect directly to an instance, everything works as expected. I get no errors. However, when I connect via cockroach sql through the LB, everything is fine for a minute or two and then I start getting these errors:

warning: error retrieving the transaction status: driver: bad connection
warning: connection lost!
opening new connection: all session settings will be lost

My application, which is in Python, also works for a short period of time and then also begins to fail.

I’ve tried changing the backend timeout, session affinity, and other such settings, but nothing appears to have any effect. I have not tried turning on proxy protocol, but I doubt CockroachDB supports it, and the docs don’t mention it.

Any ideas why the LB would be disrupting communications like this? Thanks everyone!

It turns out it was the backend timeout setting. It took a good 30 minutes for the change to kick in. It’s normally set to 100 seconds. I set it to 3000 seconds and now everything is working fine. I have a pool recycle time in my SQL connection setup, which should keep the SQL connection active permanently.
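For anyone wondering how a pool recycle time interacts with the LB backend timeout, here is a minimal, library-free sketch of the idea. It is not the code from my application: `RecyclingConnection`, `connect`, and the 2700-second value are hypothetical names and numbers chosen for illustration. The only figure taken from this thread is the 3000-second backend timeout. The point is that a connection is reopened once it is older than a threshold that sits below the LB timeout, so a connection handed to the application can never have been idle longer than the LB allows.

```python
import time

# Backend timeout configured on the load balancer (value from this thread).
LB_BACKEND_TIMEOUT_S = 3000
# Assumed recycle threshold; it only needs to be lower than the LB timeout.
RECYCLE_AFTER_S = 2700


class RecyclingConnection:
    """Hypothetical helper: reopen the connection once it reaches a maximum age.

    `connect` is any zero-argument callable that returns a new connection
    object (for example, a wrapper around your SQL driver's connect call).
    """

    def __init__(self, connect, recycle_after_s, clock=time.monotonic):
        self._connect = connect
        self._recycle_after_s = recycle_after_s
        self._clock = clock
        self._conn = None
        self._opened_at = 0.0

    def get(self):
        """Return a connection whose age is below the recycle threshold."""
        now = self._clock()
        too_old = now - self._opened_at >= self._recycle_after_s
        if self._conn is None or too_old:
            # Close the stale connection if the driver object supports it.
            close = getattr(self._conn, "close", None)
            if close is not None:
                close()
            self._conn = self._connect()
            self._opened_at = now
        return self._conn
```

Connection pool libraries usually offer this as a built-in setting (SQLAlchemy, for example, has a `pool_recycle` argument on `create_engine`), so in practice you would set that option rather than write a wrapper like this.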

Devs - I would suggest adding this to the GCE documentation here: https://www.cockroachlabs.com/docs/stable/deploy-cockroachdb-on-google-cloud-platform.html

Sorry to be a bother with the thread! I hope this is helpful to someone in the future.

Thanks for sharing what the problem was! I’ve created an issue to track getting it into our docs: https://github.com/cockroachdb/docs/issues/2554

@gsibble, mind clarifying something for me? Under what conditions was the backend timeout closing your connections? When the client wasn’t sending requests? When a particularly long query was running without any results being returned?