Can't find the cause of "SQLSTATE[HY000]: General error: 7 SSL SYSCALL error: EOF detected"

I keep getting the following error message on my website when connecting to my CockroachDB cluster through haproxy:

SQLSTATE[HY000]: General error: 7 SSL SYSCALL error: EOF detected

The error occurs randomly. Sometimes I can run 10 queries before the message reappears, other times it appears for every query. Usually I can re-run the query a second after the message appears and the query succeeds.

So far I haven’t found anything in the log files for haproxy or CockroachDB.

If I connect directly to any of the database servers, the error never appears.

My connection code, which uses PHP’s PDO extension:

try {
     // Connect to the server and the database
     $this->link = new PDO('pgsql:host='.$this->serverName.';port=26257;dbname='.$usedb.';sslmode=require;sslrootcert='.$rootcert.';sslkey='.$userkey.';sslcert='.$usercert, $this->userName, null, $options);

} catch (PDOException $e) {
     echo 'Caught PDOException<br>';
     echo 'Error Message: '.$e->getMessage();
     echo '<br>Error Code: '.$e->getCode();

     exit(1);
}

Haproxy is supposed to be passing through the SSL connections.

My /etc/haproxy/haproxy.cfg:

global
     log /dev/log local0
     log /dev/log local1 notice
     chroot /var/lib/haproxy
     stats socket /run/haproxy/admin.sock mode 660 level admin expose-fd listeners
     stats timeout 30s
     user haproxy
     group haproxy
     daemon
     maxconn 4096

     cpu-map 1 0
     cpu-map 2 1

defaults
     log global
     mode tcp
     option dontlognull
     option tcplog

     # Timeout values should be configured for your specific use.
     # See: HAProxy version 1.8.30 - Configuration Manual
     timeout connect 10s
     timeout client 1m
     timeout server 1m

     # TCP keep-alive on client side. Server already enables them.
     option clitcpka

     errorfile 400 /etc/haproxy/errors/400.http
     errorfile 403 /etc/haproxy/errors/403.http
     errorfile 408 /etc/haproxy/errors/408.http
     errorfile 500 /etc/haproxy/errors/500.http
     errorfile 502 /etc/haproxy/errors/502.http
     errorfile 503 /etc/haproxy/errors/503.http
     errorfile 504 /etc/haproxy/errors/504.http

listen psql
     bind :26257
     mode tcp
     #balance roundrobin
     balance static-rr
     option httpchk GET /health?ready=1
     server cockroach1 192.168.17.150:26257 check port 8080
     server cockroach2 192.168.17.151:26257 check port 8080
     server cockroach3 192.168.17.152:26257 check port 8080

CockroachDB version: 21.1.8
HAProxy version: 2.4.7 (installed from source)

Where should I be looking for the cause of the error?

Does this happen when you have longer running queries?

I imagine it’s probably something similar to this StackOverflow thread

I also thought it was a timing issue but all queries are affected. Even a SELECT name FROM cooks WHERE cookid = 1 query can cause the error to display.

Interesting, I think this is definitely an HAProxy issue. Hard to debug without logs at the moment.
Is it possible to configure TCP keepalive?

In HAProxy? I have option clitcpka in haproxy.cfg. I’ll need to do some reading of HAProxy’s documentation to see what other keepalive options are available.

Got the following from HAProxy’s log file:

Oct 14 18:36:30 videorender haproxy[18029]: 192.168.17.103:38590 [14/Oct/2021:18:35:30.242] psql psql/cockroach2 1/0/60744 69730 cD 2/2/1/0/0 0/0
Oct 14 18:36:36 videorender haproxy[18029]: 192.168.17.103:38598 [14/Oct/2021:18:35:36.651] psql psql/cockroach3 1/0/60228 37551 cD 1/1/0/0/0 0/0
Oct 14 18:48:23 videorender haproxy[18029]: 192.168.17.103:38628 [14/Oct/2021:18:47:23.822] psql psql/cockroach1 1/0/60027 4168 cD 1/1/0/0/0 0/0
Oct 14 20:43:52 videorender haproxy[18029]: 192.168.17.103:38734 [14/Oct/2021:20:42:38.813] psql psql/cockroach2 1/0/74022 228270 cD 2/2/1/0/0 0/0
Oct 14 20:44:14 videorender haproxy[18029]: 192.168.17.103:38772 [14/Oct/2021:20:43:11.955] psql psql/cockroach3 1/0/62780 218654 cD 1/1/0/0/0 0/0

My webserver (192.168.17.103) is the only computer that uses the database.

I originally had HAProxy on an Odroid MC1, which uses a USB-to-Ethernet adapter, so I installed HAProxy on a different server (more RAM, faster CPU, PCIe network card), but that didn’t make a noticeable difference.

The “EOF detected” part of the error made me think the user’s certificate and/or key file were corrupted, so I generated a new client certificate and key pair. Still getting the error. Going to check the node keys next.

Connecting to the database through the load balancer from my main computer also has connection problems. As long as the queries are less than a minute apart, everything is fine. If I go more than 1 minute between queries, the error message below appears. This time period matches the timeout client and timeout server settings in haproxy.cfg. I connected as the root user from my main computer.

Error message seen on cockroachdb command line:

invalid syntax: statement ignored: unexpected error: driver: bad connection
warning: connection lost!
opening new connection: all session settings will be lost

I am able to continue running queries successfully after this error message appears.

Ran sudo tail -f /var/log/haproxy.log while using the command line and about a minute after my last query to the database the following line appeared:

Oct 15 20:21:51 dbproxyone haproxy[18656]: 192.168.17.12:32984 [15/Oct/2021:20:17:16.903] psql psql/cockroach1 1/1/274664 44541 cD 2/2/1/0/0 0/0
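
For anyone reading these log lines: with option tcplog, the field after the frontend and backend/server names is Tw/Tc/Tt in milliseconds, followed by the bytes read and a two-character termination state. Per HAProxy's documentation, a first character of c means the client-side timeout expired and a second character of D means the session was in the data phase. Below is a minimal PHP sketch that pulls those fields out of a line. The function names parseHaproxyTcpLog and describeTermination are made up for illustration, and the lookup tables cover only a few of the documented codes.

```php
<?php
declare(strict_types=1);

/**
 * Parse one HAProxy "option tcplog" line (hypothetical helper).
 * Returns null when the line does not look like a tcplog entry.
 */
function parseHaproxyTcpLog(string $line): ?array
{
    $pattern = '~(?P<client>\d+\.\d+\.\d+\.\d+:\d+) '   // client ip:port
        . '\[(?P<accepted>[^\]]+)\] '                    // accept timestamp
        . '(?P<frontend>\S+) (?P<backend>[^/\s]+)/(?P<server>\S+) '
        . '(?P<tw>-?\d+)/(?P<tc>-?\d+)/(?P<tt>\+?\d+) '  // Tw/Tc/Tt in ms
        . '(?P<bytes>\+?\d+) '                           // bytes read
        . '(?P<state>\S\S) ~';                           // termination state

    if (preg_match($pattern, $line, $m) !== 1) {
        return null;
    }

    return [
        'client'   => $m['client'],
        'server'   => $m['server'],
        'total_ms' => (int) ltrim($m['tt'], '+'),
        'bytes'    => (int) ltrim($m['bytes'], '+'),
        'state'    => $m['state'],
    ];
}

/**
 * Translate a termination state such as "cD" into words.
 * Only a partial table; anything else is reported as unknown.
 */
function describeTermination(string $state): string
{
    $cause = [
        '-' => 'normal completion',
        'c' => 'client-side timeout expired',
        's' => 'server-side timeout expired',
        'C' => 'aborted by the client',
        'S' => 'aborted or refused by the server',
    ];
    $phase = [
        '-' => 'after all data was transferred',
        'C' => 'while connecting to the server',
        'D' => 'during the data phase',
    ];

    $first  = $cause[$state[0] ?? ''] ?? 'unknown cause';
    $second = $phase[$state[1] ?? ''] ?? 'in an unknown phase';

    return $first . ' ' . $second;
}

// Example using the first log line quoted earlier in the thread.
$line = 'Oct 14 18:36:30 videorender haproxy[18029]: 192.168.17.103:38590 '
    . '[14/Oct/2021:18:35:30.242] psql psql/cockroach2 1/0/60744 69730 cD 2/2/1/0/0 0/0';

$entry = parseHaproxyTcpLog($line);
if ($entry !== null) {
    printf(
        "%s via %s: %s after %.1f s\n",
        $entry['client'],
        $entry['server'],
        describeTermination($entry['state']),
        $entry['total_ms'] / 1000
    );
}
```

Every log line quoted in this thread ends in cD, and most of them total just over 60 seconds, which is consistent with timeout client 1m closing connections that sat idle.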


Other items I’ve checked:

  • Re-ran cockroach gen haproxy to see if I had removed or changed something and noticed that the IP addresses for cockroach2 and cockroach3 are reversed. The IP addresses are also reversed in the DB console.

  • Got the CockroachDB log files to show in the DB console. All of the lines are marked as “info”.

  • Checked the node certificates and keys on each node and they show the correct info.

  • When I installed HAProxy from source, I did not include multi-threading support so I removed the cpu-map 1 0 and cpu-map 2 1 lines from haproxy.cfg. After restarting HAProxy, the website actually seems faster.

Do you happen to have idle_in_session_timeout set? You can check with show idle_in_session_timeout;

“I am able to continue running queries successfully after this error message appears.”

I suspect the connection is being killed by some timeout, HAProxy probably just spins up a new connection afterwards.

Problem solved!

HAProxy does not seem to like PDO::ATTR_PERSISTENT being set to (boolean)True.

I changed it to (boolean)False and the error went away.
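
A plausible reading of why this works, given the cD entries in the HAProxy log: with persistent connections, PHP keeps the socket open between requests. Once it has been idle for longer than timeout client / timeout server (1m here), HAProxy closes it, and the next request that reuses the dead socket gets "SSL SYSCALL error: EOF detected". Here is a minimal sketch of an options array that keeps persistence off. The helper name buildPdoOptions and its $extra parameter are assumptions for illustration and are not part of the original code.

```php
<?php
declare(strict_types=1);

/**
 * Build PDO driver options for connections that go through HAProxy
 * (hypothetical helper). Caller-supplied options are merged in, but
 * PDO::ATTR_PERSISTENT is always forced to false.
 */
function buildPdoOptions(array $extra = []): array
{
    $base = [
        // Throw PDOException on errors so the existing catch block sees them.
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
    ];

    // The left operand of "+" wins on duplicate keys, so $extra overrides $base.
    $options = $extra + $base;

    // Open a fresh connection per request instead of reusing one that
    // HAProxy may already have closed after its idle timeout.
    $options[PDO::ATTR_PERSISTENT] = false;

    return $options;
}

// Usage, following the connection code earlier in the thread
// (left commented out because it needs a reachable cluster):
// $options = buildPdoOptions();
// $this->link = new PDO($dsn, $this->userName, null, $options);
```

If persistent connections are needed for performance, the alternative would be raising HAProxy's timeout client and timeout server above the longest expected idle period, but I have not tested that here.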