Read-only transaction / read statement is blocked by writing transaction

Hi!

We have a case of slow transactions which could be explained by the following example:

-- create a table
create table test(id int primary key);
-- run this in the first SQL console:
begin transaction; insert into test(id) values(0);
-- run this in the second SQL console, will be blocked until first transaction resolves:
begin transaction; select * from test where id = 0;
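
For completeness, the blocked read in the second console proceeds as soon as the first transaction is finished one way or the other:

-- back in the first SQL console, either of these releases the blocked read:
commit;
-- or
rollback;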

In the scenario above, the second transaction will be blocked until the first one resolves (commits or aborts). We have a similar case where a “read-only” transaction reads from “the end” of a table (the table is ordered by timestamps and we read its tail) while another transaction is writing at the end of that table (the write transaction is slow for some unidentified reason – we are still investigating – it could be contention with other write transactions).

The way it manifests is that our “read-only” transactions (they consist of a few SELECT statements) take too much time.

Is there any way to let those read-only transactions proceed without waiting for write transactions to resolve?

Here are the options we tried:

  • Marking the transaction as “read only”, with the theory that read-only operations should not be blocked in an MVCC database – no effect. A plain “select * from test” statement is blocked as well.
  • Snapshot isolation is no longer supported – this would have solved it as well, I think?
  • Doing “AS OF SYSTEM TIME ‘-1 usec’”. Going back some number of seconds (2 seconds on my laptop) solves the issue, but it is not clear which value we should use here (a rough sketch of this option follows this list). 2 seconds of “staleness” for these particular types of read-only transactions is acceptable for us, but something like 30 seconds is, perhaps, too much.
  • Setting priority to HIGH for these “read-only” transactions seems to do the trick, but it looks like a very crude hammer to use (and we only have 3 priorities, so if we ever needed to use them for something else, one priority would already be “used”…).
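
For reference, a rough sketch of the AS OF SYSTEM TIME variant against the “test” table from the example above (2 seconds of staleness here; picking that value is exactly the part we are unsure about):

-- read-only transaction pinned to a timestamp 2 seconds in the past
begin transaction;
set transaction as of system time '-2s';
select * from test where id = 0;
commit;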

Any other suggestions?


which database version are you using?

@idubrov I think you summed up your options nicely.

Using AS OF SYSTEM TIME -2s is probably your cleanest solution here, especially if it’s not critical to complete the read at the current time. Generally, you can expect to eliminate contention if you move your read further back than your longest-running transaction not using AOST.

Hope that helps!

We use v19.1.5.

Hmm… the -2s solution is problematic since we don’t know how far back we should go. The further we go, the less useful our reads are since they become more and more stale; at the same time, there is still no guarantee that it would be enough. -2s was enough for my toy test on a simple table.

This is something where I thought SNAPSHOT isolation would help, as, in theory, it should be able to completely avoid blocking: you get your own “snapshot” at the beginning and only check write-write conflicts at the end. Since we are only doing reads, there should never be any conflicts, so we should be fine.

However, to my surprise, it doesn’t look like this is how SNAPSHOT used to work in CockroachDB: I went back as far as v1.1.9 and it was still blocking.

No blocking in Postgres – though it seems like in general it is more optimistic: even with real conflicts it will let the transaction move forward and then roll it back…

This is something where I thought SNAPSHOT isolation would help, as, in theory, it should be able to completely avoid blocking: you get your own “snapshot” at the beginning and only check write-write conflicts at the end. Since we are only doing reads, there should never be any conflicts, so we should be fine.

As you’ve discovered, SNAPSHOT isolation did not help with write-read conflicts (it helped with read-write conflicts, meaning a write happening after a read performed by another txn). SNAPSHOT meant that you read a consistent snapshot. If a txn performs the same read twice, it’s supposed to read the same thing. This means that a read cannot simply ignore uncommitted data: what if you read it again later and this time it is committed?
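
To spell out that last question (a hypothetical sketch, using the “test” table from the original example):

-- one SNAPSHOT transaction; both reads are supposed to return the same rows
begin transaction;
select * from test where id = 0; -- suppose this simply skipped another txn's uncommitted row
-- ... that other transaction commits its insert here ...
select * from test where id = 0; -- a naive re-read could now see the row, so the snapshot would no longer be consistent
commit;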

No blocking in Postgres

The default isolation level in Postgres is read committed - a very weak isolation level (weaker than snapshot). With higher isolation levels you should see the blocking.

The -2s solution is problematic since we don’t know how far back we should go. The further we go, the less useful our reads are since they become more and more stale; at the same time, there is still no guarantee that it would be enough. -2s was enough for my toy test on a simple table.

The amount of staleness required for your reads to not block on overlapping writes is dictated by the duration of the respective writing transactions.

Just ignore it. Since it is “snapshot” isolation, the transaction should continue reading data from its “snapshot”. In MVCC, this would mean reading data that was committed before the transaction started.

Initially I thought this wouldn’t cause anomalies for read-only transactions, but it can: https://www.cs.umb.edu/~poneil/ROAnom.pdf. However, I would still expect true “snapshot” isolation not to block reads.

No, that’s not the behavior I’m observing. I tried serializable isolation in Postgres – it does not block, nor does it roll back either transaction. It would roll back one of the two transactions if both do writes, but if one writes and the other reads, there is no blocking. The reading transaction will see the state of the data as it was before the writing transaction committed, even if the writing transaction committed before the reading transaction “committed”.
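
For reference, a minimal sketch of that Postgres experiment (assuming the same “test” table and both sessions explicitly at SERIALIZABLE):

-- session 1: open a serializable transaction and leave the write uncommitted
begin transaction isolation level serializable;
insert into test(id) values(0);

-- session 2: this read is not blocked and returns no rows (its snapshot predates the insert)
begin transaction isolation level serializable;
select * from test where id = 0;

-- session 1: commit the write
commit;

-- session 2: still sees no row for id = 0 and commits without being rolled back
select * from test where id = 0;
commit;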

That’s terrible. I don’t want to make any assumptions about our write performance – especially given that it varies a lot.

Just ignore it. Since it is “snapshot” isolation, the transaction should continue reading data from its “snapshot”. In MVCC, this would mean reading data that was committed before the transaction started.

I think that in fact we did use to “ignore it” when running in snapshot isolation in older versions (contrary to what I said before), but I don’t remember the exact details. The “ignoring” happened by “pushing” uncommitted writes out of a reader’s way. So I’m a bit surprised that it didn’t work for you on older versions.

No, that’s not the behavior I’m observing. I tried serializable isolation in Postgres – it does not block, nor does it roll back either transaction

You’re right, I forgot that Postgres uses the Serializable Snapshot Isolation technique for its SERIALIZABLE isolation, and not some more traditional locking. Unfortunately for CRDB, that’s harder to implement in a distributed setting.

That’s terrible. I don’t want to make any assumptions about our write performance – especially given that it varies a lot.

If you don’t want to read stale data, and you also don’t want your readers to block for writers, you can set a high priority for your readers (as you’ve figured out). This will favor the readers over the writers, and can lead to possible starvation of the writers. What happens is that readers will “push” writers out of their way, forcing them to commit at a higher timestamp. They’ll only be allowed to commit at this higher timestamp if nobody has modified their read set since the transaction started - so there’s a performance penalty in checking the whole read set and, under contention, they might fail the check and restart over and over.
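
A minimal sketch of the reader side (assuming the “test” table from the original example):

-- the reader runs at high priority and pushes conflicting writers to a later timestamp
begin transaction priority high;
select * from test where id = 0;
commit;

The writers keep their default priority, so they are the ones carrying the retry risk described above.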

I am working on the following version on Windows and started a cluster using PowerShell. I tried the suggested commands but am not getting the expected behavior.
build: CCL v19.2.1 @ 2019/11/18 23:34:09 (go1.12.12)

Please confirm whether “read” is hanging for you even after commit.
Is there something wrong in the way I am trying to reproduce the scenario?
-- powershell 1
cockroach start --insecure --listen-addr=localhost:26257 --join=localhost:26257 --join=localhost:26258 --join=localhost:26259 --http-addr=localhost:8080 --store=node1

-- powershell 2
cockroach start --insecure --listen-addr=localhost:26258 --join=localhost:26257 --join=localhost:26258 --join=localhost:26259 --http-addr=localhost:8081 --store=node2

-- powershell 3
cockroach start --insecure --listen-addr=localhost:26259 --join=localhost:26257 --join=localhost:26258 --join=localhost:26259 --http-addr=localhost:8082 --store=node3

-- powershell 4
-- create a table
create table test(id int primary key);
-- run this in the first SQL console:
begin transaction; insert into test(id) values(0);

-- powershell 5
-- run this in the second SQL console, will be blocked until first transaction resolves:
begin transaction; select * from test where id = 0;
commit;
root@:26257/defaultdb> begin transaction; select * from test where id = 0;
Now adding input for a multi-line SQL transaction client-side (smart_prompt enabled).
Press Enter two times to send the SQL text collected so far to the server, or Ctrl+C to cancel.
You can also use \show to display the statements entered so far.
->
-> commit;
  id
+----+
   0
(1 row)

Time: 1.5424ms

Make sure you press “enter” twice after you run begin transaction; insert into test(id) values(0);. Otherwise the shell is still collecting the statements client-side (as the prompt above shows), nothing has been sent to the server, and there is no open write for the read in the other console to wait on.