I’m running a single-node CockroachDB cluster to power Caddy’s telemetry data. We’re on a tight budget (we don’t make any money from telemetry), so I’m trying to keep the machine as lean as possible.
The vast majority of the telemetry workload is inserts, with a few very small (one-row) selects mixed in. Originally, CPU use was off the charts until I started batching the inserts. Now the inserts happen on the order of 10–2,000 rows at a time (2,000 small rows, or at most about 100 larger rows, no more than 1 MB total). Although CPU usage is more manageable now, I was surprised to see that memory usage continues to skyrocket.
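In case it helps to see concretely what I mean by batching: here’s a simplified sketch of the approach, building one parameterized multi-row INSERT instead of issuing a statement per row. (Table and column names here are made up for illustration, not our real schema.)

```python
def build_batch_insert(table, columns, rows):
    """Return a parameterized multi-row INSERT plus its flat argument list.

    Produces numbered placeholders ($1, $2, ...) in the style CockroachDB
    (like Postgres) accepts, one group per row.
    """
    ncols = len(columns)
    placeholder_groups = []
    args = []
    for i, row in enumerate(rows):
        base = i * ncols
        group = ", ".join(f"${base + j + 1}" for j in range(ncols))
        placeholder_groups.append(f"({group})")
        args.extend(row)
    sql = (f"INSERT INTO {table} ({', '.join(columns)}) "
           f"VALUES {', '.join(placeholder_groups)}")
    return sql, args

# Hypothetical table/columns, just to show the shape of the output:
sql, args = build_batch_insert("metrics", ["key", "value"],
                               [("cpu", 1), ("mem", 2)])
# sql  -> "INSERT INTO metrics (key, value) VALUES ($1, $2), ($3, $4)"
# args -> ["cpu", 1, "mem", 2]
```

One statement like this per batch is what got CPU under control for us.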
I’ve followed the recommendations for running in production by setting --cache=.25 and the other things in the deployment guides (great docs, by the way, thanks for those), but no luck.
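For reference, this is roughly how I’m starting the node (paths and hostnames elided; as I understand the docs, --cache and --max-sql-memory are the two main memory knobs, both settable as a fraction of total RAM):

```shell
# Fractions of system RAM; lowering these should cap CockroachDB's two
# largest memory consumers (storage cache and SQL working memory).
cockroach start \
  --cache=.25 \
  --max-sql-memory=.25 \
  --store=/path/to/store \
  --listen-addr=... \
  --join=...
```

If either of those flags is the wrong lever here, corrections welcome.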
Just to see how high it would get, I tried increasing swap space to around 6 GB, and it still fills it all up. So I’m not sure that adding more memory will even solve the problem.
I did come across this article which says:
Finally, note how the formula above has the number of nodes as divisor: adding more nodes, keeping all parameters equal, will decrease memory usage per node. This property can be used advantageously when running off cloud resources, where memory is often not priced linearly: adding 10 new nodes with 2GB of RAM each is usually cheaper than switching five existing nodes from 2GB to 6GB each. Scale can lower your costs!
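Just to check that I’m reading the claim right, here’s the arithmetic with made-up prices (both options add 20 GB of RAM overall; the dollar figures are hypothetical, purely to illustrate the non-linear pricing the quote describes):

```python
# Hypothetical monthly instance prices -- NOT real cloud pricing.
price_2gb = 10.0   # $/month for a 2 GB instance (assumed)
price_6gb = 40.0   # $/month for a 6 GB instance (assumed)

cost_scale_out = 10 * price_2gb              # add ten new 2 GB nodes
cost_scale_up = 5 * (price_6gb - price_2gb)  # upgrade five 2 GB nodes to 6 GB

# With these assumed prices, scaling out is the cheaper way to add 20 GB:
assert cost_scale_out < cost_scale_up  # 100.0 < 150.0
```

Whether that holds obviously depends on the actual instance pricing, which is exactly what I’m asking about below.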
Can anyone here independently verify this? I’d be willing to double our spending on telemetry for now if I knew it would work. (I can’t afford to 10x our costs, though, so adding 10 nodes is out of the question. It’s a fairly small node, I admit: the minimum recommended spec.)
Any other recommendations? Specifically, are there any quick wins to reduce memory use? Slightly slower performance is OK. I’m happy to provide more information; just tell me what you need, as I’m still a bit new at this.