Question about backup/dump consistency

Hi all,

first of all, thank you very much for this great database. I read your blog post about consistency, but the topic of backup consistency still isn’t clear to me. Depending on licensing, we have 4 cases:

  1. dump with as of system time
  2. dump without as of system time
  3. backup with as of system time
  4. backup without as of system time

Case 3 can create an inconsistent backup, right? But what about the other cases?

Another question: how large a database does dump support? Or is there only a time-based limit, like 25 hours?


AS OF SYSTEM TIME explicitly sets the logical time at which all the chunks of the backup (or dump) are read. If it is not specified, we pick the current time as that time before we start, and it remains fixed throughout the backup/dump.

Basically, you can think of AS OF SYSTEM TIME now() being added by default if not specified.
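As a concrete sketch (the database name and destination URL are placeholders), pinning a backup to an explicit timestamp looks like this; omitting the clause behaves as if the backup were pinned to the time it started:

```sql
-- Pin every chunk of the backup to one logical timestamp (here: 10s ago).
-- 'nodelocal:///backups/1' is a placeholder destination.
BACKUP DATABASE startrek TO 'nodelocal:///backups/1' AS OF SYSTEM TIME '-10s';

-- Omitting the clause is conceptually the same as pinning to "now":
BACKUP DATABASE startrek TO 'nodelocal:///backups/1';
```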

So in all your cases, the whole op has one time and should remain consistent. Does that make sense?

As far as size: there are some practical limits, since dump essentially runs a SELECT * under the hood, meaning all the data has to be streamed back to the one dumping node to produce the single, unified dump file. BACKUP, by contrast, uses a custom chunked storage format, so it can write individual chunks directly from the nodes storing them to the backup destination in parallel. But there are no explicit limits; just at some point a DB might be too big to easily stream it all through one node, given that node’s hardware, network, config, other traffic, or other factors.
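To make that contrast concrete (names and paths are placeholders): dump is a CLI command that funnels one SQL file through a single node, while BACKUP is a SQL statement whose chunks are written in parallel:

```sql
-- BACKUP: each node writes its own chunks directly to the destination,
-- in parallel ('nodelocal:///backups/1' is a placeholder).
BACKUP DATABASE startrek TO 'nodelocal:///backups/1';

-- dump, by contrast, runs from the CLI and streams everything through
-- the node you connect to:
--   cockroach dump startrek > startrek.sql
```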

Thank you very much for your quick response!

So in all your cases, the whole op has one time and should remain consistent. Does that make sense?

I’m referencing this quote:

Relatedly, a backup, taken through the backup database command, is using as of system time queries under the hood, and so a particular backup might contain row 2 but not row 1.

If the backup contains only the second row, then it can’t be restored due to the foreign key constraint, right? Or has this been fixed since the blog post?

So in the case mentioned in the blog, technically, the BACKUP is consistent at that time: what’s happening is that the second insert manages to pick a timestamp before the first, and thus a select (or backup) run at a time between the two sees only the second, since according to the timestamps it now actually “happened” first. This is consistent, but certainly strange.

If there had actually been a foreign key in place when those inserts were run, though, the second insert would have needed to read the value from the first and thus had its transaction ordered after it, so it would not be possible to see the second and not the first.
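A minimal sketch of that (hypothetical schema): with the FK in place, the child insert has to read the parent row, which orders its transaction after the parent’s insert and rules out the causal reverse:

```sql
CREATE TABLE parents (id INT PRIMARY KEY);
CREATE TABLE children (
  id        INT PRIMARY KEY,
  parent_id INT NOT NULL REFERENCES parents (id)
);

INSERT INTO parents  VALUES (1);     -- "row 1"
-- This insert must check parents(id = 1), so its transaction is
-- ordered after the one above; no timestamp can invert them.
INSERT INTO children VALUES (1, 1);  -- "row 2"
```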

In any case, the backup is restorable: we don’t re-check foreign keys during restore, since we only restore the foreign key relationship if we’re restoring both tables, and, since the backup is known to be consistent at a given time, we can assume that if the relationship was valid in the backed-up cluster, it is still valid in the restored data.

Thank you again. I understand that the backup is considered consistent with the current global state of this distributed system. But it isn’t consistent from the data’s point of view. Even worse, you don’t notice the inconsistency, and I can’t see any way to check the integrity of the data after a restore. As I understand it, the mentioned example is just one example, and there are more cases.

Do you think it is technically possible to avoid this, or is it even planned? I’ve read this blog post and can imagine some workarounds (yep, not solutions):

  1. I think waiting 250ms (or more) on writes as backup preparation is a really reasonable workaround, if the goal is a data-consistent backup. I can imagine something like:

cockroach dump startrek episodes --insecure --user=maxroach *--data-consistent* > backup.sql

Maybe something more powerful, like creating data-consistent checkpoints, which could be used for backups (or Time-Travelling). This would allow creating backups in multiple steps (like an iterator; sure, it is already possible with Time-Travelling). But it introduces CRDB-specific commands.

While Time-Travel is a nice-to-have feature, I really don’t care about it, and it can stay inconsistent at the data level if checkpoints aren’t used.

  2. Activate a global clock temporarily. A temporary drop in write performance for the same 250ms to get a consistent timestamp is less disturbing in comparison to 1.

  3. Play clock ping-pong between nodes to get a more precise clock delta between the CRDB-internal clock and the system clock. While this doesn’t lead to data-consistent backups, it makes the probability far smaller and can be combined with 1. or 2.

What do you think about it?

In case it wasn’t clear: if the two tables have an FK relationship, then the “causal reverse” can’t happen in the first place. The second insert is forced to check the value from the first, at which point the transactions interact and thus the “first” actually happens first. In general, that is the best way to avoid “causal reverse” situations: ensuring that writes re-read whatever they’re related to, either by explicitly selecting in the same txn or implicitly via an FK.

Without something like that, these anomalies can happen to any backup, dump, or even a SELECT that just happens to run at an unlucky timestamp (or is assigned one via AS OF SYSTEM TIME). That said, in normal application traffic, where SELECTs always run at the current time, the window to observe such an anomaly closes quickly as current time moves beyond both writes, while a BACKUP or time-travel query that runs in that window will continue to see it.
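For example (table name taken from the dump command above, timestamp offset illustrative), a time-travel query pinned inside that unlucky window keeps showing the anomaly on every re-run, even after current-time reads stop seeing it:

```sql
-- A read pinned between the two writes' timestamps would see "row 2"
-- but not "row 1", and will do so every time it is re-run:
SELECT * FROM episodes AS OF SYSTEM TIME '-30s';
```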

It sounds like what you’re suggesting is, briefly prior to a backup, temporarily bumping the whole system from serializable to linearizable, so such anomalies are impossible just long enough to get a timestamp we can pick for BACKUP? We don’t currently have any support for dynamically making such a change, but it is an interesting idea! Unfortunately, it is probably also pretty complex to implement, so I’m not sure it is something we’d be able to do in the near term. For now, if causal reverses are a deal breaker, I think the best bet is to avoid them in the first place (e.g. with FKs) as described above.

Does that make sense?

Even though it was stated, it was not clear to me that:

SELECT amount FROM table WHERE id = 1;
UPDATE table SET amount = amount + 1 WHERE id = 1;

avoids the “causal reverse”, at least for updates.

Yes, exactly. Like what’s described in the blog post for Spanner, but only temporary, and maybe without blocking writes.

Well, I can’t know that, but from my point of view it is a critical feature. I understand that the whole point of CockroachDB is to be an operationally reliable database. But if backups are not data-consistent, that can have a huge impact when they are needed for disaster recovery.

I think this topic can be marked as resolved. Should we track it via GitHub?

Hey @david, I have another idea for this challenge:

What do you think about it?