Auto-Expiring data

I am evaluating CockroachDB's features with a view to possibly migrating data from Cassandra to CockroachDB.
One thing I was wondering:

Does CockroachDB support the expiration of data (rows and/or cells), or will it ever support it?

Having TTLs in Cassandra is very useful for a lot of use cases. Expiring data in common SQL DBs can be very slow and painful, especially if a small portion of the data has to be expired out of millions or hundreds of millions of rows. Maintaining an index just to be able to discover old data adds overhead to inserts and may be even more inefficient in distributed environments.
Not doing so requires full table scans. I don’t know how that would perform with CockroachDB, but for a “plain old” RDBMS this can be very expensive.

CockroachDB does not currently support any sort of expiration for table data. It’s something we’ve talked about a little and we might support it in the future, but we don’t have any definite plans to build it at this time.

Thanks very much for your response!

This feature would be essential for users of Cassandra and other NoSQL databases who want to transition back to SQL with CockroachDB. I highly encourage you to invest some time in developing this feature, as time-series workloads that can be mixed with OLTP and ad hoc queries would be a game changer for companies looking to consolidate database platforms.

+1

On 22.06.2017 at 21:48, “somecallmemike” <cockroachlabs@discoursemail.com> wrote:

This is fairly easy to work around: you can include a timestamp column and have a job delete rows every so often based on their age. Although it would be nice to have, there are a few things that I think are more of a showstopper than this…

Sure. There is always a workaround. TTLs are fairly cheap in Cassandra because expired cells are simply dropped during compaction, which has to be done anyway.
Maintaining an expiry column in SQL and deleting row by row is far from cheap.
Elasticsearch has another nice approach. It allows a kind of partitioning by adding multiple indexes to an alias. That lets you have, say, one index per week and simply drop expired indices. Maybe this approach could be worth thinking about for CockroachDB?
MySQL, for example, also supports partitioning. If there were an atomic command to create new partitions and drop old ones, or even an automatic schedule for that, that would be awesome. I could imagine that it would not have a huge performance overhead and should not collide with CockroachDB's internal architecture, as it already partitions data into chunks (shards, whatever). Maybe it is possible to group these chunks into partitions, or to put write barriers on them so that only a subset of them is writable and expired chunks can then be safely deleted.
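To make the "one index per week, drop the expired ones" idea concrete, here is a rough sketch that emulates it at the application level with one table per ISO week. It uses Python's built-in `sqlite3` purely as a stand-in. The `events_` prefix, the table layout, and the helper names are made up for illustration, and this is not how any of the databases mentioned implement partitioning internally.

```python
import sqlite3
from datetime import date, timedelta

PREFIX = "events_"  # hypothetical table-name prefix


def bucket_name(day):
    """Return the per-week table name for a date, e.g. events_2017w25."""
    iso_year, iso_week, _ = day.isocalendar()
    # Zero-padded week so that names sort chronologically as strings.
    return f"{PREFIX}{iso_year}w{iso_week:02d}"


def ensure_bucket(conn, day):
    """Create the weekly table for `day` if it does not exist yet."""
    name = bucket_name(day)
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {name} "
        "(id INTEGER PRIMARY KEY, payload TEXT)"
    )
    return name


def drop_expired_buckets(conn, today, keep_weeks):
    """Drop weekly tables older than the last `keep_weeks` weeks.

    The current week counts as one of the kept weeks.
    Returns the sorted names of the dropped tables.
    """
    oldest_kept = bucket_name(today - timedelta(weeks=keep_weeks - 1))
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    dropped = []
    for (name,) in rows:
        if name.startswith(PREFIX) and name < oldest_kept:
            # Dropping a whole table avoids deleting row by row.
            conn.execute(f"DROP TABLE {name}")
            dropped.append(name)
    return sorted(dropped)
```

Writes would go to `ensure_bucket(conn, date.today())`, and a periodic job would call `drop_expired_buckets`. The obvious downside is that reads spanning several weeks have to query several tables, which is what the Elasticsearch alias hides.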

Has TTL or auto-expiration been addressed by your team yet, Ben @bdarnell? I would really like to see this added to CRDB.

No, we didn’t. We don’t have CRDB in production, but I plan to do some evaluation soon.

Benjamin Roth, sorry, I should have been more specific. I was asking Ben Darnell @bdarnell of Cockroach the question.

We haven’t implemented it yet either. :slight_smile: It’s not on the roadmap for 1.2, but we’re tentatively planning it for 1.3.

Checking in on this: we’re up to 2.1 now and I don’t think this feature is available. Is there an issue number tracking this feature? If so, I couldn’t find it. Thanks

The issue is https://github.com/cockroachdb/cockroach/issues/20239
