CockroachDB dies after a week of doing nothing

I have a cluster of 5 servers, which I’ve been testing CDB on. I have not put any new data into it in about two weeks, and do not have any clients reading from it.

Even though it’s not doing anything, the servers still kill themselves every now and then, with no FATAL logs, and nothing standing out in the ERROR logs.

Also, it’s using an average of about 700MB RAM even though it has not been asked to do anything /in two weeks/.

I don’t know if I’m overlooking something, but what is the minimum spec for a machine to have before it can have Cockroach running on it without exploding periodically? I see you put up a page recently explaining how to add CDB to a Digital Ocean droplet, but you did not specify /what spec/ you chose for the virtual machine.

Hi kae,

For your information, most of the “natural growth” of a CockroachDB cluster right now is due to internal monitoring – monitoring data is collected in the database over time so you can analyze activity retroactively in the admin UI.

There is a known limitation where old monitoring data is not properly deleted, causing slow but unchecked growth. Hopefully this will be addressed in the next beta.

Besides this, RAM usage grows asymptotically toward the maximum RocksDB cache size. RocksDB naturally grows its cache up to the limit set with --cache-size (by default, 25% of physical RAM), and that memory is not released until the node is shut down or restarted. In your case, the cache activity driving this growth is mainly the internal monitoring described above.
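To keep the plateau small on a low-memory machine, you can cap the cache explicitly at startup. A hedged sketch – the exact flag name and whether it accepts a size suffix vary by release, so check `cockroach start --help` on your version first:

```shell
# Hypothetical example: start a node with the RocksDB cache capped at
# 256 MiB instead of the default 25% of physical RAM.
# Flag spelling and size syntax are assumptions -- verify with
# `cockroach start --help` before relying on this.
cockroach start --store=/mnt/data1 --cache-size=256MiB
```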

We do not expect other sources of unbounded memory growth. If you suspect there may be a memory leak, we have some additional tools that we can help you use in your deployment to collect debugging data. Let us know if you are interested.

You missed the other question (thanks for the info on RocksDB, though - didn’t know that)

My other question:

We haven’t attempted to define a minimum spec for CockroachDB yet; most of our long-term testing has been on larger machines. Our current recommendation is at least 2GB of RAM for production deployments (this appears in our new blog post about deploying on DigitalOcean, but hasn’t been integrated into the other deployment docs yet).

That said, if you’re seeing continued memory growth after the RocksDB cache reaches its 25% plateau, that’s a bug we’d like to fix.
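For a rough sense of where that plateau sits, here is the arithmetic on the recommended 2GB machine, assuming the default cache limit of 25% of physical RAM:

```shell
# Cache plateau = 25% of physical RAM (the default limit).
# On a 2 GB (2048 MB) machine:
physical_ram_mb=2048
echo $(( physical_ram_mb / 4 ))   # prints 512
```

So a node on a 2GB box should level off around 512MB of cache, plus the database’s own working memory on top of that.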