Rare race condition issue with AS OF SYSTEM TIME across selects

I have one table that has a non-explicitly-FK’d reference to another. In separate SELECTs I first grab a row from the one table along with clock_timestamp(), then I use that timestamp in an AS OF SYSTEM TIME query for the other table. Recently after millions of invocations, I found one failure in my logs that said:

AS OF SYSTEM TIME: cannot specify timestamp in the future

I’m using lib/pq in Go, Cockroach 2.1, and literally taking the time.Time from the result and passing its UnixNano() digits as the AOST value. I am trying to get some kind of consistency using historical queries without explicitly FK’ing or running in an explicit transaction.

So, is this a bad way of doing it (e.g. because that timestamp is node specific)? Is there a better way while still avoiding fk’s and txn’s such as a different current-time value/function, even if I’m willing to accept rare cases of mismatch? Maybe just subtract my maximum expected cluster skew from the AOST value and accept I may miss values? Or, and pardon my lack of update, has anything changed in the more recent versions that might solve this (e.g. this PR) that may affect resolution?


What version exactly are you running?

How bad is the latency between modes when this happens?

Does it go away if you link the tables with a foreign key?

Does subsequent queries result in data coming back or once you see this all selects are like that?

clock_timestamp() is the wall clock and is unrelated to anything that’s happening inside the database. Heck, your system time may be going backward (and clock_timestamp() too) or a century in the future for a few seconds and CockroachDB would not notice.
However — of course — that means it also has nothing to do with transaction ordering and is thus unsuitable for use with AS OF SYSTEM TIME.

Also, the PR you’re linking too is unrelated.

There are various approaches available to create useable properties out of transaction ordering, depending on the use case. What about you first try to explain to us what you’re trying to achieve (and why you’re avoiding SQL transactions), and then after the problem is stated we look at solutions together?

Thanks for responses!


I am unaware as this was one rare case in my logs.

Have not tested. Surely it does because that gives referential integrity that cockroach respects.

Yes, everything else works fine.

Yup, figured it was the wrong one. The question is what is the right one?

Basically, I have a table named foo with a column bar_ids which has an array of identifiers to bar. I do a SELECT bar_ids, clock_timestamp() FROM foo WHERE foo_id = 'someid'. Then, with the result I do SELECT * FROM bar AS OF SYSTEM TIME <result_of_clock_timestamp> WHERE bar_id IN ('result1', 'result2').

Clearly clock_timestamp is wrong and I should have read the docs better (should have at least used current_timestamp). I want to avoid a FK on bar back to foo and in general this is not the exact schema obviously, but curious about the answer without reworking schema. I wanted to avoid transaction logic (especially retries) in this fast, read-only path. I’ll even take dirty/stale data, and am considering just querying as of a certain amount of time in the past for all calls. Would that be a better approach?

am considering just querying as of a certain amount of time in the past for all calls. Would that be a better approach?

Yes. That would be the better approach. (AS OF SYSTEM TIME accepts a negative value to indicate a time in the past relative to the current transaction).
Note that in 19.1 we are releasing a new feature called “follower reads” which may help solve your problem too.

1 Like

Thanks! I will do that. So, in my first select, can I have SELECT bar_ids, statement_timestamp() - '5s' FROM foo AS OF SYSTEM TIME '-5s' WHERE foo_id = 'someid' and trust that the returned timestamp is the same a was used for the AOST part? The docs seem to indicate that this is the case.

No, statement_timestamp() is also unrelated.

Also, the behavior of any of the _statement() timestamps in combination with AS OF SYSTEM TIME is somewhat unpredictable and subject to change.

If I look at your previous example:

I have a table named foo with a column bar_ids which has an array of identifiers to bar .

  1. I do a SELECT bar_ids, clock_timestamp() FROM foo WHERE foo_id = 'someid' .
  2. Then, with the result I do SELECT * FROM bar AS OF SYSTEM TIME <result_of_clock_timestamp> WHERE bar_id IN ('result1', 'result2') .

My suggestion was to do step 1, then wait 5 seconds, then step 2 with select * from bar as of system time '-5s'. You do not need to use a _timestamp() function.

The AOST page mentions that the interval is added to that value.

Hrmm. This may be worth noting on the AOST page because otherwise I assume I can use timestamps as though the AOST clause was not there.

This doesn’t really fit my use case. My use case is to do the two selects as fast as possible, remaining separate selects, and outside of a transaction. Basically as though I was doing a JOIN (but I can’t do a join because the derivation of bar_ids is more involved than just the select in my example) though I’ll tolerate a (very) rare race condition of mismatch which I had previously but had that one rare case of clock_timestamp in the future across selects.

Can’t you first pick a timestamp using SELECT now() or SELECT now() - interval '1s' and then run both SELECTS with that same AS OF SYSTEM TIME timestamp? I think this would achieve what you want.

Note that AS OF SYSTEM TIME is designed for historical queries, and trying to use it for times too close to the current time can result in certain anomalies. You’ve seen one of them, that you may reach a node whose clock is running a bit behind and it complains about the time “in the future”. More subtly, if you do a write and then a read with AOST now(), you’re not guaranteed to see the value you just wrote (because you may do the read on a node with a slower clock than the one you wrote too).

The latter case is why read-only transactions sometimes need to retry: by default we detect possible reorderings like this and that may force us to change results we’ve already given to you. AOST suppresses this detection, so you’re not guaranteed to see the absolute latest writes even if you’re not giving it a time in the past.

With the exception of retries, read-only transactions don’t have any performance overhead. Doing these two reads in a transaction will be just as fast as doing them separately. I would generally recommend using transactions instead of linking up two separate statements using AOST. Using new syntax in the upcoming 19.1 release, you can combine AOST and transactions to avoid restarts:


or, if you’re using a framework doesn’t let you customize the BEGIN statement,


Then you can do your two selects in a transaction, which is far enough in the past to avoid the risk of retries.

Occasionally you may still want to run separate statements at the same timestamp (for example, if you want to run them in parallel). In this case it’s best to select a timestamp first and pass it in to all the queries. Getting the timestamp at which the first query ran is possible, but subtle, and it doesn’t avoid the “timestamp in future” problem (we deliberately don’t document the function involved because it’s surprisingly difficult to make robust use of it).

Have you considered a subselect?

SELECT * FROM bar WHERE bar_id IN (SELECT bar_ids FROM foo WHERE foo_id='some_id')

If you can’t find a way to express what you want as a JOIN, it’s often easier to write it as a subselect. This will get everything in one statement so you don’t have to worry about either transactions or AOST, and it should be at least a little faster. It will at least avoid the extra round trip to send bar_ids back to the client and come back with the second query. In the best case the optimizer will be clever and see some way to optimize the combined query an a way that it couldn’t do with two separate queries.


Thanks, this is all good info, especially that read-only transaction as of a certain time. I also appreciate the value of selecting a timestamp and then using it in subsequent queries.

I had, but I have to do some side work before I get what I’d need to subselect/join with (and in my strange use case, it’s not reasonable for me to denormalize on the way in to have it ready).

Thanks again for the info.