Cockroach Update Behaviour

Hi,

I have a table that contains two columns:

CREATE TABLE processLog (
    url STRING NOT NULL,
    log BYTES,
    CONSTRAINT "primary" PRIMARY KEY (url ASC)
);

There are only 47 rows in this table, but the BYTES value of every row is updated every 5 minutes with roughly 3 kB of data. Within roughly one day the table has grown to 104.6 MiB. It seems like Cockroach stores every version of every row. Is this correct? Is there anything I can do to keep such a table from growing out of proportion?

Best Regards
J

Hey there Jens,

It seems like cockroach stores every version of every row. Is this correct?

Yes, that’s more or less correct. Cockroach uses multi-version concurrency control (MVCC), so previous versions are retained for the length of the configured GC period, which is 25 hours by default. So once you get past a day’s worth of updates, you should actually see that usage number flatten out, as older versions expire.
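You can actually see those retained versions yourself with a time-travel query, as long as the timestamp you ask for is still inside the GC window. A sketch (the table and column names are from your post; the URL value is hypothetical):

```sql
-- Read the row as it existed 10 minutes ago. This only works for
-- timestamps that are still within the configured GC TTL window.
SELECT url, length(log) AS log_bytes
FROM processLog
AS OF SYSTEM TIME '-10m'
WHERE url = 'https://example.com';
```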

If you don’t need or want that day’s worth of prior versions, the GC TTL value is configurable via Replication Zones, which you can apply to only your processLog table so that the behavior of the rest of the system is unaffected. See this section of the docs for details on configuring, and note that it’s recommended to not use a value less than 600 seconds (10 minutes).
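For example, assuming a CockroachDB version recent enough to support `CONFIGURE ZONE` statements (the exact syntax may vary by release):

```sql
-- Lower the GC TTL for just this table to the recommended minimum of
-- 600 seconds; all other tables keep the cluster default of 25 hours.
ALTER TABLE processLog CONFIGURE ZONE USING gc.ttlseconds = 600;

-- Verify the zone config that now applies to the table.
SHOW ZONE CONFIGURATION FOR TABLE processLog;
```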


Hi Taylor,

thanks for your reply. I've reduced the GC TTL to 1200 seconds and the table size has come down. However, it's still at 45.0 MiB, which seems quite large given that only 47 rows are present. Is there anything else I have to change besides ttlseconds?

Best Regards
J

Is the usage steady at 45, or still decreasing? I'm not positive, but I believe the TTL change only affects newly written data (a dev should be able to confirm that). If that's the case, I'd expect it to decrease over the next day until it reaches a steady state where everything is expired after the 20-minute window you configured.

We can do some back-of-the envelope math to see what we would expect usage to be:

47 rows * 5 versions per row (1 live and 4 historic within the 20-minute window) * 3 kB per version * 3 replicas = 2115 kB, or right around 2 MB. There's also other metadata the database tracks for internal use that I believe counts toward that total usage number. It should still be quite a bit smaller than the 45 MB you're seeing, though.
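The same estimate as quick arithmetic (numbers taken from this thread; internal metadata overhead is not included):

```python
# Back-of-the-envelope estimate of expected on-disk size for the table.
rows = 47
versions_per_row = 1 + (20 // 5)   # 1 live + 4 historic in a 20-minute GC window
kb_per_version = 3                 # ~3 kB written per update
replicas = 3                       # default replication factor

expected_kb = rows * versions_per_row * kb_per_version * replicas
print(expected_kb)  # 2115 kB, i.e. roughly 2 MB
```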

Guess you were right again. The table size is now around 2-4 MiB. :slightly_smiling_face:
