Understanding CockroachDB Keepalive

I have been trying to understand the purpose of the environment variable below in terms of a problem we are facing in our system.


We have a system in which client A connects to a standalone CockroachDB server B via HAProxy.

A <—> haproxy <—> B

Every now and then we see a “Broken Pipe” error.

All I know about COCKROACH_SQL_TCP_KEEP_ALIVE is that it is set to send a TCP keepalive probe every 60 seconds.

We have also used the HAProxy settings mentioned in the CockroachDB guide:

# TCP keep-alive on Client side. Server already enables them.
    option              clitcpka

With this option, I’m assuming keepalive is enabled on the connection between the client and HAProxy.

Finally, I want to understand whether

option srvtcpka

would actually help.

TCP keep-alive is purely to ensure the TCP connection remains active as long as both the client and server expect it to. It says nothing about the application-layer connection.

In SQL protocols, including the postgres wire protocol used with CockroachDB, a SQL connection can be very short-lived, or it can live for days, weeks, or months, depending on the application and the driver.

If you send a query once a day from the same client (a single program that remains alive the whole time), your postgres driver may keep the connection open for reuse. However, since no data is flowing through it, haproxy will close the connection once its idle timeout expires (1 minute in this setup).

TCP keep-alives work at a much lower level: as long as neither endpoint of a TCP connection (your client and haproxy, or haproxy and a CockroachDB node) has closed it, the keep-alive probes make sure the connection is not considered dead by intermediate networking equipment. They also let each party detect that the other one dropped off the map without explicitly terminating the connection. Keep-alive probes carry no application data, so they do not count as activity for haproxy's idle timeouts.
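To make the socket-level mechanism concrete, here is a minimal Python sketch of turning on TCP keep-alive for a client socket. The helper name `enable_keepalive` and the 60/10/3 values are assumptions chosen for illustration, not settings taken from CockroachDB or HAProxy, and the per-socket tuning options are only applied where the platform exposes them.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, probes=3):
    """Hypothetical helper: enable TCP keep-alive on an existing socket.

    idle:     seconds of silence before the first probe is sent
    interval: seconds between unanswered probes
    probes:   unanswered probes before the kernel drops the connection
    """
    # Portable switch that turns keep-alive probes on.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # Per-socket tuning is platform-specific, so apply it only when
    # the constant is available in this build of Python.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
    return sock


# No network traffic happens here: the socket is created but never connected.
sock = enable_keepalive(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
keepalive_on = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
sock.close()
```

The probes are sent by the kernel, not by the application, which is why they never show up as SQL traffic.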

To summarize: using TCP keep-alive is a good idea given the many aggressive firewalls/load balancers/intermediates out there.
Client timeouts should be set based on the profile of your application.
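
As a rough illustration of matching client timeouts to the application profile, the sketch below reopens a connection that has been idle longer than a chosen limit, so the client never reuses one that the proxy has probably already closed. The class `IdleAwareConnection`, its `connect` callable, and the 50-second limit (picked to sit under a 1-minute proxy timeout) are all hypothetical; a real driver or connection pool may offer its own setting for this.

```python
import time


class IdleAwareConnection:
    """Hypothetical wrapper: hand out a fresh connection after long idleness."""

    def __init__(self, connect, max_idle=50.0, clock=time.monotonic):
        self._connect = connect    # callable that opens a new connection
        self._max_idle = max_idle  # seconds; keep below the proxy idle timeout
        self._clock = clock        # injectable so the logic can be tested
        self._conn = None
        self._last_used = None

    def get(self):
        now = self._clock()
        stale = (
            self._conn is None
            or (now - self._last_used) > self._max_idle
        )
        if stale:
            # The old connection is simply dropped here; a real client
            # would also close it explicitly.
            self._conn = self._connect()
        self._last_used = now
        return self._conn
```

An application that queries once a day would reconnect on every call with this wrapper, while one that queries every few seconds would keep reusing the same connection.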