Reclaiming storage capacity


I am testing cockroach and have imported / dropped data several times.

I have now dropped all the databases I created, so only the system database is left (700 KiB, for a total of 6 ranges).

On the disks and on the main dashboard, I can see that the cockroach store is using 1.6GiB with 133 ranges.

I can understand that many ranges were created during my tests, but my understanding is that only 6 of the 133 ranges actually contain useful data (the system database) now that I have dropped all my databases.

Is this correct? Will cockroach do some kind of garbage collection to release the unused space?

I would expect that “capacity used” (cluster dashboard) = 3 × “database size” (database dashboard) with a default configuration and after cockroach has done some garbage collection of leftover replicas. Is this correct?
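To make that expectation concrete, here is a napkin sketch of what “capacity used” should be if the system database were the only data left; the replication factor of 3 is the CockroachDB default, and the 700 KiB figure comes from the observation above:

```python
# Napkin check: with the default replication factor of 3, each logical
# byte should occupy roughly 3 bytes of raw capacity (ignoring metrics,
# internal data, and values not yet garbage collected).
REPLICATION_FACTOR = 3        # CockroachDB default
database_size_kib = 700       # logical size of the remaining system database

expected_capacity_kib = REPLICATION_FACTOR * database_size_kib
print(expected_capacity_kib)  # 2100 KiB -- far below the 1.6 GiB observed
```

The gap between ~2 MiB expected and 1.6 GiB observed is what the rest of the thread tries to explain.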

There are two non-obvious ways that Cockroach may be utilizing more storage capacity than you’d expected.

First, Cockroach continually writes the time series metrics data that backs the admin UI. This data isn’t directly exposed, but it still uses some storage. That said, unless you’ve been running your cluster for quite some time, I doubt this data makes up the 1.6 GiB of storage utilization you’re seeing.

The second (and more likely, given your description) place that storage may be unexpectedly used has to do with the GC process, as you alluded to in your final statement. When data is deleted (or even updated, since Cockroach uses multi-version concurrency control), the old values are not immediately purged from the system. There is a GC period, during which the old values can still be retrieved using time-travel queries. The GC period also affects things like incremental backups, which must be performed within the GC interval since the last backup.

By default, the GC period is 90,000 seconds (25 hours), but it can be configured via replication zone configs at the Cluster, Database, or Table level.
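A quick sanity check on that conversion, plus the 1-hour value used later in this thread (zone configs express the TTL in seconds):

```python
# Default GC TTL: 90,000 seconds is exactly 25 hours.
DEFAULT_GC_TTL_SECONDS = 90_000
print(DEFAULT_GC_TTL_SECONDS / 3600)  # 25.0

# The 1-hour TTL tried later in the thread, expressed in seconds.
ONE_HOUR_TTL_SECONDS = 1 * 3600
print(ONE_HOUR_TTL_SECONDS)           # 3600
```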

You should see your disk usage drop after the TTL period, unless you continue to add and delete lots of data. If that’s an expected workflow for you and you don’t need to retain data for historic queries or incremental backups, I’d suggest lowering the GC TTL interval, and that should reclaim the capacity more quickly :slight_smile:

Also, a note for @jesse: it looks like the docs have not been updated to reflect the change of the default GC TTL interval from 24 to 25 hours, introduced in

Thanks for your clarifications.

Is there a way to list the ranges that are “ready for GC”?

Regarding the metrics, I thought they were stored inside the system database. Is there a way to surface them and see how many ranges they occupy (supposing they are stored as cockroach tables)?

I read the section regarding metrics growth on but do you have an approximation of how the metrics grow, e.g. “KiB per metrics sample tick per node in the cluster”?

According to (old issue), it would not be surprising to see 2.5 GiB after a “couple of days” on a 6-node cluster?

I think it would be very nice for onboarding to have a way of explaining the “capacity used” that is shown on the dashboard, by summing up things that relate to cockroach concepts (internal dbs, metrics, ranges, replicas).

For now, I will try reducing the TTL to 1 hour and report what happens in this thread.

echo 'gc: { ttlseconds: 3600 }' | cockroach zone set .default --insecure --echo-sql -f -

Thanks, @twrobel. Will fix this asap.

After a few hours, I can observe that cockroach has reclaimed some ranges.

From 144 ranges, it went down to 135 ranges, with 1.3 GiB of capacity still used, so it reclaimed 300 MiB. After a burst in “keys written per second”, everything seems pretty idle now, so I guess that after 3 hours the GC reclaimed everything it could.

What seems strange to me is that before the GC I had 432 replicas (144 × 3), and this number has not changed at all despite the decrease in total ranges. The “replicas per store” numbers also stayed constant. I would have thought they would go down to 135 × 3 = 405 replicas.

All in all, would the 1.3 GiB be consumed by the metrics (6 nodes alive for 6 days, all databases dropped, gc.ttl = 1 hour, cluster left alone for 3 hours)?

I found this old RFC about time series culling -

which states

Thus, the bytes needed per hour on a ten-node cluster is:

Total Bytes (hour) = 5500 * 242 * 10 = 13310000 (12.69 MiB)

After just one week:

Total Bytes (week) = 12.69MiB * 168 hours = 2.08 GiB

which is more or less what I observe.
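The RFC’s hourly and weekly figures quoted above can be reproduced directly; the 5500 bytes per metric per hour and the 242 metrics per node are the RFC’s own estimates:

```python
# Napkin math from the time series culling RFC quoted above.
BYTES_PER_METRIC_HOUR = 5500  # bytes stored per metric per hour (RFC estimate)
NUM_METRICS = 242             # metrics recorded per node (RFC estimate)
NODES = 10

bytes_per_hour = BYTES_PER_METRIC_HOUR * NUM_METRICS * NODES
print(bytes_per_hour)               # 13310000
print(bytes_per_hour / 2**20)       # ~12.69 MiB per hour

bytes_per_week = bytes_per_hour * 168  # 168 hours in a week
print(bytes_per_week / 2**30)       # ~2.08 GiB per week
```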

10s metrics seem to be truncated after 30 days, so the retention for 1 node with the default configuration is approximately

Bytes Retained per node = 5500 * 242 * 24 * 30 = 958,320,000 bytes ≈ 1 GiB

These metrics seem to be stored in the replicated store, so with the default configuration, this data will be replicated 3 times.

If this reasoning is correct, an N-node cluster will grow its capacity usage for metrics until it reaches around 3 × N GiB of raw capacity (N logical GiB, replicated 3 times).
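Extending the same napkin math to the 30-day retention and 3× replication gives the per-node and cluster-wide figures:

```python
# Per-node retention with the assumed 30-day truncation of 10s metrics,
# using the RFC's per-metric and metric-count estimates from above.
BYTES_PER_METRIC_HOUR = 5500
NUM_METRICS = 242
HOURS_RETAINED = 24 * 30      # 30 days of retained samples

bytes_per_node = BYTES_PER_METRIC_HOUR * NUM_METRICS * HOURS_RETAINED
print(bytes_per_node)                     # 958320000, ~0.89 GiB logical
print(3 * bytes_per_node / 2**30)         # ~2.68 GiB raw once replicated 3x
```

So “≈ 1 GiB per node” is a round-up of ~0.89 GiB, before replication and before any compression.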

There may be some compression going on, and this is just napkin math.

From the observations in my tests with cockroach 1.1, it seems that the metrics capacity usage will grow during the first 30 days of operation and reach 1 GiB of capacity usage per node in the cluster. So ~ N GiB for N nodes.

Does that seem correct?

Is there a way to reset the metrics ?

Not that I’m aware of, unfortunately. I’d also get value out of some way to tell how much storage is being used by older versions of data that are due to be GC’d.

A complicating factor here is that a range may include both old data due to be garbage collected and live, current versions of data. So if a range includes any live data at all, the range itself can’t be GC’d; only some of the values contained within it can be. I’m actually not sure how Cockroach handles this (whether ranges are deleted once all the values they contain have expired, or whether sparse ranges are “merged” at any point).

I’m not 100% positive about the semantics of changing the GC TTL interval, but I think the change only affects newly modified data. This is pretty standard for other systems that handle row-level TTLs, such as Cassandra. So you may need to wait a full 25 hours after your last data modification under the prior TTL to see its storage reclaimed (if a CRDB dev would like to confirm/correct, feel free).

WRT your napkin calculations of the storage utilization of time series metrics, those numbers seem about right to me, but I’ll defer to @matt (who authored that RFC) to confirm and to make sure no other updates have been made since that RFC.