I have a 5-node CockroachDB cluster running v19.1.3.
I am not exactly sure what the cluster is doing but about every 8 minutes the nodes crash with
The servers are KVM CentOS 7 Linux servers with 64GB of RAM. It looks like at the time of the crash the free -m command will have
My /var/log/messages file is rather large, at 2.4GB just for the day. I see a lot of messages like this (sorry, I have to type everything manually; no copy/paste ability):
node: dbserver01 type=SYSCALL msg=audit arch=c000003e syscall=263 success=no exit=-2 items=1 comm="cockroach" exe="/data/cockroach-v19.1.3.linux-amd64/cockroach" subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 key="delete"
There are tons of these that fill the log. I am working with the developer to see if there are deletes going on.
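To put a number on how many of these records there are and which process produces them, something like the following could tally the audit lines. This is only a minimal sketch: the function names (`parse_audit_fields`, `tally_audit`, `tally_file`) are made up for illustration, and it assumes each record is a single line of space-separated `name=value` pairs like the one I typed above, with no spaces inside values. For reference, on x86_64 (arch=c000003e) syscall 263 is unlinkat and exit=-2 is ENOENT, so these look like attempts to delete files that do not exist.

```python
from collections import Counter

# Quote characters to strip from values, including the curly quotes that
# can sneak in when a log line is retyped by hand.
QUOTES = '"\u201c\u201d'


def parse_audit_fields(line):
    """Split one audit record into a dict of name -> value strings."""
    fields = {}
    for token in line.split():
        name, sep, value = token.partition("=")
        if sep:  # skip tokens without '=', such as the "node:" prefix
            fields[name] = value.strip(QUOTES)
    return fields


def tally_audit(lines):
    """Count SYSCALL audit records grouped by (comm, syscall, exit, key)."""
    counts = Counter()
    for line in lines:
        if "type=SYSCALL" not in line:
            continue
        f = parse_audit_fields(line)
        group = (f.get("comm"), f.get("syscall"), f.get("exit"), f.get("key"))
        counts[group] += 1
    return counts


def tally_file(path="/var/log/messages"):
    """Stream the log file line by line so a multi-GB file is never held in memory."""
    with open(path, errors="replace") as handle:
        return tally_audit(handle)
```

Calling `tally_file().most_common(10)` would then list the ten most frequent combinations, which should show whether the cockroach binary with key "delete" accounts for most of the 2.4GB.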
I saw this post
and it seems very similar to what we are seeing. However, I do not see any delete commands under Statements in the cockroach GUI. My data directory is 250GB in size.
Is there a way to reduce the logging to /var/log/messages? Also, from what I can see, the std::bad_alloc means we ran out of memory. However, the available memory under free -m always shows about 30GB free right up until the crash. After the crash, available jumps back to 50GB, counts back down to ~30GB, and the loop continues.
I have a 1GB swapfile. It seems like my first step might be to increase that, and then maybe increase the RAM on one of the members in the cluster and see if that node survives.
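To capture the memory countdown more precisely than by watching free -m by hand, a small sampler along these lines could record /proc/meminfo at a fixed interval until the next crash. It is a sketch under assumptions: the function names are hypothetical, it assumes the kernel exposes MemAvailable in /proc/meminfo, and the extra fields (SwapFree, Committed_AS, CommitLimit) are only my guess at what might be worth watching in case a limit other than physical RAM is involved.

```python
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> integer value."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            # Most fields are reported in kB; a few (e.g. HugePages_Total) are plain counts.
            values[name.strip()] = int(parts[0])
    return values


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the current memory counters."""
    with open(path) as handle:
        return parse_meminfo(handle.read())


def log_memory(interval_s=5, samples=120,
               fields=("MemAvailable", "MemFree", "SwapFree",
                       "Committed_AS", "CommitLimit")):
    """Print one timestamped line per sample; redirect the output to a file to keep it."""
    for _ in range(samples):
        info = read_meminfo()
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        row = " ".join(f"{name}={info.get(name, 'n/a')}" for name in fields)
        print(stamp, row, flush=True)
        time.sleep(interval_s)
```

With the defaults, `log_memory()` takes 120 samples 5 seconds apart, which covers about 10 minutes and so should span one full crash cycle. The timestamps can then be lined up against the crash time in the cockroach logs.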
I’m new to CockroachDB, so any help is welcome.