Hi Jens, thank you for the thoughtful question.
What you’re observing is a known deviation from Postgres’s concurrency-control behavior, and it relates to more than just SELECT … FOR UPDATE. In Postgres, row-level locks never block reads; they only block writers or other lockers of those rows. This is discussed in https://www.postgresql.org/docs/current/explicit-locking.html.
In CockroachDB, things are different. Reads wait on both writes (e.g. a read will wait on an UPDATE) and on locks (e.g. a read will wait on a SELECT … FOR UPDATE, as you noticed).
There are two reasons I can give to justify this difference in behavior. The first is that CockroachDB is a distributed system, which means that our approach to concurrency control has additional challenges. The most relevant of these is that commit information for transactions is not centralized in CockroachDB, as it is in Postgres. So even with a policy of never blocking reads on writes, reads would still need to perform remote RPCs when they find an exclusive lock to determine whether the lock and the provisional value it is protecting should be visible. We’d lose much of the benefit of such a policy, as this RPC would impose latency and extra load on the system. As it turns out, waiting is actually a better policy for total system throughput, all things considered.
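To make that concrete, here is a toy sketch in plain Python (not CockroachDB internals; all names here are illustrative) of what a "never block reads" policy costs in a distributed store: every time a reader encounters a provisional value (an intent), it must look up the owning transaction's commit record, which may live on another node, so each lookup models one RPC.

```python
from dataclasses import dataclass

@dataclass
class Intent:
    """A provisional value written by an in-flight transaction."""
    txn_id: int
    value: str

class TxnRecords:
    """Stand-in for per-transaction commit records, possibly on remote nodes."""
    def __init__(self):
        self.status = {}    # txn_id -> "PENDING" | "COMMITTED"
        self.rpc_count = 0  # how many (possibly remote) status lookups occurred

    def lookup(self, txn_id):
        self.rpc_count += 1  # each status check models one RPC
        return self.status.get(txn_id, "PENDING")

def read_never_blocks(key, store, records):
    """Postgres-style policy: never wait on a lock, but resolve visibility.

    In a distributed store, this forces a txn-record lookup (an RPC) every
    time a reader runs into an intent, even if the writer never commits.
    """
    val = store[key]
    if isinstance(val, Intent):
        if records.lookup(val.txn_id) == "COMMITTED":
            return val.value  # provisional value is now visible
        return None           # ignore the uncommitted intent
    return val

records = TxnRecords()
store = {"k1": Intent(txn_id=7, value="v1")}

read_never_blocks("k1", store, records)   # writer still pending: returns None
records.status[7] = "COMMITTED"
read_never_blocks("k1", store, records)   # now visible: returns "v1"
print(records.rpc_count)                  # -> 2, one RPC per intent encounter
```

A waiting policy instead parks the reader in a local lock wait-queue and pays no per-encounter RPC, which is part of why waiting wins on total throughput.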
The other reason for this difference is that CockroachDB was designed with serializable isolation as its focus. Meanwhile, Postgres only gained support for real serializable isolation in version 9.1 (2011, 15 years after the initial release). This history is important when it comes to the behavior we’re discussing here. Having reads never block on writes is a big win for lower levels of isolation like snapshot and repeatable read. However, at higher levels of isolation, it can actually hurt performance, as it leads to scenarios that require transaction retries (discussed in https://www.postgresql.org/docs/current/transaction-iso.html#XACT-SERIALIZABLE). Here’s an example that demonstrates this:
session 1:

nathan=# show transaction_isolation;
 transaction_isolation
-----------------------
 serializable
(1 row)

nathan=# begin;

session 2:

nathan=# begin;

session 1:

nathan=# insert into kv values (100, 100);
INSERT 0 1

session 2:

nathan=# select * from kv;
 k | v
---+---
(0 rows)

session 1:

nathan=# select * from kv;
  k  |  v
-----+-----
 100 | 100
(1 row)

session 2:

nathan=# insert into kv values (200, 200);
INSERT 0 1

session 1:

nathan=# commit;

session 2:

nathan=# commit;
ERROR: could not serialize access due to read/write dependencies among transactions
DETAIL: Reason code: Canceled on identification as a pivot, during commit attempt.
HINT: The transaction might succeed if retried.
Notice that because the read in the second session did not block on the first session’s write, it returned a result that could not be serialized with the rest of that transaction’s effects, so the transaction had to throw an error. These retry errors are disruptive to users and require retry loops in applications. They also hurt system throughput, as they can lead to thrashing. CockroachDB’s designers were aware of this, and decided that since the database was primarily focused on serializable isolation, it would be better to avoid these situations by blocking reads on writes. We still need transaction retry loops, but in fewer situations.
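For reference, a client-side retry loop for these errors usually looks something like the following. This is a minimal sketch, not a real driver API: `SerializationError` stands in for whatever your driver raises on SQLSTATE 40001 (`serialization_failure`), which both Postgres and CockroachDB use for retryable transaction errors, and `flaky_txn` stands in for your transaction body.

```python
import random
import time

class SerializationError(Exception):
    """Stand-in for a driver error carrying SQLSTATE 40001."""

def with_retry(txn_fn, max_attempts=5):
    """Run txn_fn, retrying serialization failures with jittered backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return txn_fn()
        except SerializationError:
            if attempt == max_attempts:
                raise
            # Back off briefly so the conflicting transaction can finish.
            time.sleep(random.uniform(0, 0.01 * 2 ** attempt))

# Illustrative transaction body that fails twice before succeeding.
attempts = {"n": 0}
def flaky_txn():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise SerializationError("could not serialize access")
    return "committed"

print(with_retry(flaky_txn))  # -> committed, after two retries
```

The point is that every application talking to a serializable database needs a loop like this somewhere; blocking reads on writes just means it fires less often.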
That all said, these two arguments mostly apply to exclusive locks acquired by mutations, not to SELECT … FOR UPDATE, which itself isn’t actually protecting any provisional changes. For that reason, we are exploring ways to make it so that reads do not block on this form of lock. This is being tracked in https://github.com/cockroachdb/cockroach/issues/49684.
CockroachDB also supports the ability to override this behavior using transaction priorities (e.g. SET TRANSACTION PRIORITY HIGH). A high-priority transaction will not block on a lower-priority transaction’s locks, which gets close to the effect you are looking for. See https://www.cockroachlabs.com/docs/stable/transactions.html#transaction-priorities for more.
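Conceptually, the priority rule can be sketched as follows. This is a toy model, not CockroachDB internals: when a reader encounters another transaction's lock, it waits unless its priority is strictly higher, in which case it "pushes" the lock holder (forcing it to restart) and proceeds without blocking.

```python
# Hypothetical priority levels mirroring LOW / NORMAL / HIGH transaction
# priorities; the mapping to integers is illustrative.
PRIORITIES = {"low": 0, "normal": 1, "high": 2}

def resolve_conflict(reader_priority, holder_priority):
    """Return the action a reader takes on encountering a conflicting lock."""
    if PRIORITIES[reader_priority] > PRIORITIES[holder_priority]:
        return "push"  # don't block; the lock holder gets restarted
    return "wait"      # block until the holder commits or aborts

print(resolve_conflict("high", "normal"))    # -> push
print(resolve_conflict("normal", "normal"))  # -> wait
```

Note the asymmetry: the pushed (lower-priority) transaction pays the cost via a restart, so high priority is best reserved for transactions that genuinely must not wait.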