Phantom Reads & Secondary Index Locks

tl;dr: Does a query inside a transaction take pessimistic read locks on all of the required ranges of secondary indexes for the duration of the transaction?

I have a question about a variant of the phantom read anomaly.

Example:
Let’s say we’re designing a system to detect collisions between objects on a line, where each object takes up some interval [x, y] on the line. Within each transaction we want to query a range [a, b] (likely larger than the object itself), calculate where there is room for a new [x, y] that doesn’t overlap any existing object, and commit this new object to the database.

The line is very large, and we want to support high throughput over the entire line, but we expect the throughput on any given range to be relatively low in practice. So we would like to perform the following transaction:

Pseudocode below:

tx = BeginTransaction();
objects = tx.Query("SELECT * FROM objects WHERE objects.x >= a AND objects.y <= b");
// calculate where we can place the new object; call its endpoints i and j
tx.Exec("INSERT INTO objects (x, y) VALUES (i, j)");
tx.Commit();

We want a guarantee that between the query and the commit, no new objects have been placed on the line between a and b (i.e., a read lock on the range[s] of the secondary index). By only grabbing the lock on that range we should maintain throughput on the rest of the line (or most of it, at least; it’s OK if the pessimistic lock covers a slightly larger range than we queried for).

Thanks for posting, @steeling1.

I’m going to let @nathan, the author of that blog post and one of our core engineers, respond with the truth here when he can. But my sense is that it’s not quite right to think in terms of locks in CockroachDB. I think we use a mechanism called the “timestamp cache” to make sure writes don’t conflict with ongoing reads. As we state here, if the timestamp for a write is less than the timestamp cache (the latest read), we attempt to push the timestamp for the write forward to a later time, which might cause the write transaction to restart.

Best,
Jesse

Thanks for the response, Jesse! From the footnote at the bottom of this post, https://www.cockroachlabs.com/blog/transaction-pipelining/, it sounds like transactions use pessimistic locks, such that other transactions must wait to acquire the lock.

I could see that this may not be the case for a read across an entire table, but we have strong safety requirements to ensure that each write has *considered* the other objects in its range.

  • I forgot to mention in the previous post that it is not enough for us to preserve uniqueness in the x-y range, because we can allow collisions in some special cases, but we need to make sure that each case is at least considered.

What @jesse said about the “timestamp cache” is spot on. CockroachDB does not perform any locking during reads, only during writes. Instead, when performing reads CockroachDB updates an in-memory data structure (backed by a concurrent skiplist) that records the largest timestamp of reads over arbitrary spans of rows. This “timestamp cache” is then consulted during write operations to prevent a write in a different transaction from invalidating a previous read at a higher timestamp. Remember that CockroachDB is an MVCC database, so reads and writes only interact if a write has a lower timestamp than a read. So the rules (sketched in code after the list below) are:

  • If a write finds that it overlaps with any prior reads at higher timestamps, then it is forced to move its write timestamp forward, which may induce a transaction restart if we’re not able to prevent it.
  • If a write finds that it overlaps only with reads at earlier timestamps, then it is free to proceed without issue.
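
To make those two rules concrete, here’s a rough toy model in Go. To be clear, this is not how the timestamp cache is actually implemented (the real one is a concurrent skiplist over key spans, and real timestamps aren’t small integers); every type and name below is made up purely for illustration.

package main

import "fmt"

// span is an inclusive key range touched by a read. In this toy model keys
// are plain strings and timestamps are plain integers.
type span struct{ start, end string }

func (s span) overlaps(o span) bool {
    return s.start <= o.end && o.start <= s.end
}

type readEntry struct {
    sp span
    ts int64
}

// tsCache records the timestamps at which spans of keys were read.
type tsCache struct{ reads []readEntry }

func (c *tsCache) recordRead(sp span, ts int64) {
    c.reads = append(c.reads, readEntry{sp, ts})
}

// checkWrite applies the two rules above: a write that overlaps a prior read
// at a higher (or equal) timestamp is pushed above that read; otherwise its
// timestamp is left alone.
func (c *tsCache) checkWrite(sp span, writeTS int64) (newTS int64, pushed bool) {
    newTS = writeTS
    for _, r := range c.reads {
        if r.sp.overlaps(sp) && r.ts >= newTS {
            newTS = r.ts + 1
            pushed = true
        }
    }
    return newTS, pushed
}

func main() {
    var cache tsCache
    cache.recordRead(span{"a", "b"}, 10) // the [a, b] range scan happened at ts=10

    // Rule 1: an insert into [a, b] at an earlier timestamp gets pushed to 11,
    // which in a real transaction can surface as a restart.
    fmt.Println(cache.checkWrite(span{"a5", "a5"}, 7)) // 11 true

    // Rule 2: a write elsewhere on the line is untouched.
    fmt.Println(cache.checkWrite(span{"q", "q"}, 7)) // 7 false
}

The first check models a concurrent insert into the [a, b] range that was just read: its timestamp gets pushed above the read, which is what can force that transaction to restart. The second check shows why writes elsewhere on the line are unaffected, which is the throughput property you asked about.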

For more on this approach, you can read Maysam Yabandeh’s A critique of snapshot isolation, which uses a similar structure (called a status oracle in the paper) to achieve serializability.

Thanks for your reply Nathan!

So to clarify, in your example here (https://www.cockroachlabs.com/blog/consistency-model/, under “CockroachDB does not offer strict serializability”):

As a next step, if Nathan were to attempt to apply any mutation to the database, would his write have failed because his read timestamp would have been behind the newly applied write timestamps?

If yes, it sounds like CockroachDB meets our requirements as described in the problem above.

As a next step, if Nathan were to attempt to apply any mutation to the database, would his write have failed because his read timestamp would have been behind the newly applied write timestamps?

It depends on what Nathan was trying to write. If he (I 🙂) was trying to write to rows that had been modified at later timestamps than my initial read, then I would have to restart my transaction. The problem in the blog post is fairly contrived, though, to show exactly the case where serializability and strict serializability diverge. As such, it’s not particularly easy to extend or talk about more abstractly.
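
As a side note on what that restart looks like from the client’s side: CockroachDB surfaces it as a serialization failure with SQLSTATE 40001, and the client is expected to re-run the whole transaction from the top. Here’s a minimal sketch of detecting that in Go; the lib/pq driver is just an assumed choice, and any Postgres-wire driver exposes the same code.

package main

import (
    "errors"
    "fmt"

    "github.com/lib/pq" // driver choice is an assumption for this sketch
)

// isRetryable reports whether err is the forced-restart error, i.e. a
// serialization failure with SQLSTATE 40001. On seeing it, roll back and
// re-run the whole transaction (read, gap calculation, and insert).
func isRetryable(err error) bool {
    var pqErr *pq.Error
    return errors.As(err, &pqErr) && pqErr.Code == "40001"
}

func main() {
    // Purely to show the shape of the check; a real error would come back
    // from tx.Exec or tx.Commit.
    fmt.Println(isRetryable(&pq.Error{Code: "40001"}))         // true
    fmt.Println(isRetryable(errors.New("some other failure"))) // false
}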

I think it’s actually easier to talk about the example you provided above, which doesn’t have anything to do with strict serializability. In that example, the property you want is that the read portion of your transaction can’t be invalidated before the write portion completes and commits. This is exactly the guarantee that serializable isolation gives you. Serializability says that the execution of transactions in a system must always be equivalent to some serial history of those transactions. Here’s a more formal definition: https://jepsen.io/consistency/models/serializable. Less formally, I find it useful to think of it this way: under snapshot isolation in an MVCC database, a transaction has a read timestamp and a write timestamp, while under serializable isolation it effectively has a single timestamp at which it both reads and writes.
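
Concretely, that means your pseudocode from the first post can stay naive; you just re-run it whenever it hits a retryable error. Below is a hedged sketch in Go using the cockroach-go helper crdb.ExecuteTx to do the re-running; the helper’s import path and signature, the connection string, the integer schema for objects, and the findGap helper are all assumptions made for the sake of the example.

package main

import (
    "context"
    "database/sql"
    "fmt"
    "log"

    "github.com/cockroachdb/cockroach-go/v2/crdb" // retry helper; path/signature assumed here
    _ "github.com/lib/pq"                         // Postgres-wire driver
)

// placeObject reads every object in [a, b], picks a free interval, and inserts
// it, all at the transaction's single serializable timestamp. If a concurrent
// transaction writes into [a, b] after our read, the commit fails with a
// retryable error and crdb.ExecuteTx re-runs the closure from scratch, so the
// new object is always chosen with full knowledge of its neighbors.
func placeObject(ctx context.Context, db *sql.DB, a, b int64) error {
    return crdb.ExecuteTx(ctx, db, nil, func(tx *sql.Tx) error {
        rows, err := tx.QueryContext(ctx,
            "SELECT x, y FROM objects WHERE x >= $1 AND y <= $2", a, b)
        if err != nil {
            return err
        }
        var intervals [][2]int64
        for rows.Next() {
            var x, y int64
            if err := rows.Scan(&x, &y); err != nil {
                rows.Close()
                return err
            }
            intervals = append(intervals, [2]int64{x, y})
        }
        rows.Close()
        if err := rows.Err(); err != nil {
            return err
        }

        // "Calculate where we can place the object" from the pseudocode above.
        i, j, ok := findGap(intervals, a, b)
        if !ok {
            return fmt.Errorf("no room for a new object in [%d, %d]", a, b)
        }
        _, err = tx.ExecContext(ctx,
            "INSERT INTO objects (x, y) VALUES ($1, $2)", i, j)
        return err
    })
}

// findGap is a stand-in for the application-specific gap-finding logic.
func findGap(intervals [][2]int64, a, b int64) (i, j int64, ok bool) {
    _ = intervals // real logic would scan the existing intervals here
    return a, a + 1, true
}

func main() {
    // Connection string for a local insecure cluster; adjust as needed.
    db, err := sql.Open("postgres",
        "postgresql://root@localhost:26257/defaultdb?sslmode=disable")
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()
    if err := placeObject(context.Background(), db, 100, 200); err != nil {
        log.Fatal(err)
    }
}

The property that matters is the one above: if another transaction inserts into [a, b] between the read and the commit, the closure never commits a decision based on stale data; it simply runs again and re-reads the range.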

Another way of phrasing this is that serializable isolation is immune to the phantom read anomaly, and CockroachDB provides serializable isolation.