Cockroach-temp folder not cleaning up

Hello,

I’m trying to get some help dealing with a cockroach-temp folder that has expanded beyond expectations and I’m uncertain if manual intervention is needed.

Running cockroach 2.1.0 with the following command: “cockroach start --certs-dir=<cert_dir> --store=<store_dir> --port=26257 --http-port=443 --cache=.25 --max-sql-memory=.25 --logtostderr=ERROR --join=<other_node_1>,<other_node_2>”

I have a 3-node cluster with 250 GiB per node. The admin UI reports the data I have in it to be less than 20 MiB, while the time series data (for the admin UI, I believe) is 750 MiB.

Yet somehow I have 100 GiB on two nodes in their respective cockroach-temp folders. Is there any way for me to see what is causing this giant set of data? How did it become this size given my configuration and the default for “--max-disk-temp-storage” of 32 GiB? When can I expect it to be cleaned up? Can I force a cleanup?

Some background before this happened:

  • We had some unknown issue cause one node’s (n3) disk to completely fill up which caused the node to go offline.
  • While it was offline we only had 2 nodes. Looking at the logs, that was causing errors related to being unable to replicate to 3 nodes. We believe this is what led one node (n1) to write 100 GiB to its temp folder in about 24 hours (but are unsure).
  • When we were able to bring n3 back online, the disk usage stopped growing on n1, but over the next 48 hours 100 GiB was written to the temp folder of n3. The writes to n3’s temp folder appear to have nearly stopped over the last few days.
  • Also, all the while, n2 has had less than 30 MiB in its temp folder.

Hey @crudolph,

The cockroach-temp directory holds intermediate data for queries that exceed the available SQL memory and therefore spill to disk. Raising the --max-sql-memory setting in the cockroach start command gives queries more memory to work with and helps reduce usage of cockroach-temp. The directory is cleaned up asynchronously in the background after the queries complete. If a particular node ends up with a large temp directory, I would suspect that it was the node selected to process a particularly large query or transaction.
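To see how much is actually sitting in a temp directory and how stale it is, something like the rough sketch below may help. It is plain Python that walks a directory tree and reports the total size, the file count, and the age of the oldest file. It assumes nothing about how CockroachDB lays out its temp files, and the function names (summarize_dir, format_report) and the path you pass in are placeholders of my own, not part of any cockroach tooling.

```python
import os
import sys
import time


def summarize_dir(root):
    """Return (total_bytes, file_count, oldest_mtime) for all files under root."""
    total_bytes = 0
    file_count = 0
    oldest_mtime = None
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                # Files can disappear while scanning a live node; skip them.
                continue
            total_bytes += st.st_size
            file_count += 1
            if oldest_mtime is None or st.st_mtime < oldest_mtime:
                oldest_mtime = st.st_mtime
    return total_bytes, file_count, oldest_mtime


def format_report(root, now=None):
    """Build a one-line, human-readable summary for root."""
    now = time.time() if now is None else now
    total_bytes, file_count, oldest_mtime = summarize_dir(root)
    if oldest_mtime is None:
        return f"{root}: empty"
    gib = total_bytes / 2**30
    age_hours = (now - oldest_mtime) / 3600
    return (f"{root}: {gib:.2f} GiB in {file_count} files, "
            f"oldest modified {age_hours:.1f} h ago")


if __name__ == "__main__":
    # Pass one or more directories to inspect (hypothetical paths, e.g. the
    # temp folder inside each node's store directory).
    for root in sys.argv[1:]:
        print(format_report(root))
```

Running it periodically on each node and comparing the output over time should show whether the directory is still growing and whether old files ever get removed. It only reads file metadata, so it does not touch the data itself.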

Since 2.1.0 is the earliest 2.1 release, I would advise upgrading to a much newer 2.1.X minor release, so that the fixes made between Oct 2018 and now are included in your cockroach binary. Let me know if this is possible, and we can take a closer look.

Cheers,
Ricardo

Okay, I will look into upgrading to a newer 2.1.X release.

As for it being cleaned up asynchronously, can you give an expected order of magnitude for when it should be cleaned up? Once an hour / day / week / month? Can some things in the temp directory be long-lived?