Age-off strategies


#1

I am new to CockroachDB and am considering moving a Postgres table over. The table in postgres has approx 1billion rows, the table is partitioned by “week” (using a timestamp column). The partitioning by week is to enable “faster” age-off of data (we’ll only keep N weeks of data, dropping a whole partition has been faster than say “DELETE FROM X WHERE created_at < some_time”.

Any suggestions for how to accomplish this in CockroachDB?

Thanks


(Alex Narayan) #2

As suggested on the Gitter one way to do this without subscribing to Cockroach and obtaining an Enterprise license (which comes with some super nice perks) is to employ a trigger. The trigger could be written / applied to the table you want to keep current and then for every row added / updated etc., this trigger would fire and determine whether or not to allow the row to be added to the main table or have it added to an “old” table. The one caveat with this is that generally triggers come with large performance overhead and can make set based operations RBAR ones instead.


(Matt Jibson) #3

Cockroach doesn’t have an analogous way to do fast deletes of part of a table. The way to get the best performance for large deletes is to make sure that all indexes have a key prefix on whatever thing you are deleting on (created_at in this case), to prevent any full index scans during the delete.

Cockroach doesn’t have support for triggers.


#4

Such as: CREATE INDEX ON tbl (created_at)? what if I want to delete based on two columns, should I have an index with both columns? CREATE INDEX ON tbl (created_at, name);


(Matt Jibson) #5

That’s not quite it. You don’t need to add any indexes (which will only make a delete slower since there’s more things to delete). I mean that your existing primary key index and any other indexes that already exist should have their first column be created_at if delete speed is the most important thing. This may require changing your application or schema in ways that are not acceptable.

However this comes with problems if you have indexes on other columns that can’t have a prefix on created_at. The overall answer to your question may just be “sorry, we don’t do that yet”.


#6

What if I start from scratch and don’t have a primary key explicitly defined.


(Matt Jibson) #7

Then you should instead define the PK so that its first column is created_at. And, if possible, any indexes you create should also start with created_at. Note, however, that you may need an index on another column first (that’s kinda the whole point of indexes), so you may have to make a tradeoff between fast deletes and fast lookups.