SET and PREPARE statements are unsafe in retryable txns

Here’s something I wanted to get some opinions on: as you know, crdb automatically retries statements when it can, in response to retryable errors. The interaction between these retries and the behaviour of non-transactional statements has not been thought out. For example, PREPARE and some SET statements have effect that outlive the transaction. This can lead to serious consequences:


Assume the SELECT gets a retryable error. On the first attempt, the txn copies data from foo.t. On subsequent attempts, it will copy data from bar.t.
Less catastrophic, but more likely, PREPAREing in a txn will fail on anything but the first attempt because the prepared statement will already exist.

The reasoning behind our automatic retries is that, if you discard the intents laid down by a txn, there’s no other side effects to take into consideration. The problem exists with user-directed retries too, except that, as opposed to automatic retries, in this case the client at least theoretically can restore state / avoid re-executing some statements.

What do y’all think we should do about this? The only option I see is to make all statements transactional: apply all side-effects only when a txn commits.

@knz @jordan

For one I would recommend peeking into what pg has to say on this topic. Last time I checked:

  • PREPARE is not transactional. To fix this we should just disable automatic retries when PREPARE is encountered.

  • for SET there is a difference between SET SESSION and SET LOCAL. The latter is transactional, whereas the former is not. Again to fix the problem at hand we must disable auto retries when SET SESSION is issued.

From pg docs:

If SET (or equivalently SET SESSION) is issued within a transaction that is later aborted, the effects of the SET command disappear when the transaction is rolled back. Once the surrounding transaction is committed, the effects will persist until the end of the session, unless overridden by another SET.

The alternative is too dangerous, I think. Even if we disable automatic retries, I don’t think we can burden the client with restoring state on retries.

okay I hadn’t read that.

Nevertheless my answer still serves as a general solution: for every statement X which is specified by pg to not be transactional, we must disable auto-retry whenever X is encountered.

Nevertheless my answer still serves as a general solution: for every statement X which is specified by pg to not be transactional, we must disable auto-retry whenever X is encountered.

OK, so we agree that we have a problem with SET regardless of automatic retries, right? Perhaps we should just refuse it inside of txns (with exceptions such as SET TXN ISOLATION LEVEL).

Let’s talk about PREPARE (I don’t think there’s any other stmts to worry about?). Perhaps we could disable the auto-retry when we see a PREPARE (and also when the preparing is done through pgwire), but wouldn’t it be better if preparing was transactional? Seems to me pretty analogous to SET [SESSION]. Even without automatic retries, I think clients will make mistakes and not restore state proerly when they want to use user-directed retries.
I’m now inclining to say that, short of making preparing transactional, we should reject preparing named statements inside transactions. I think unnamed prepares can stay, as those are meant to be used immediately (I think).

Dude why are you making this so complicated.

What about for every statement X which is specified by pg to not be transactional, we must disable auto-retry whenever X is encountered.

And just stick to this. Let the txn proceed, report the retry error if one occurs; perhaps enhance the error message to explain to the user why auto-retry has been disabled.

X = { set, prepare, set cluster setting, probably ALTER … SPLIT AT too }

The point I’m trying to make here is that there is virtually zero reason to reject such transactions outright. We can absolutely run them no problem, it’s just we can’t support auto-retry. The client is responsible to “shape” their txn the way that our optimizer requires if they want to get max flexibility, but it is merely an optimization - we should not prevent users from not using it!

My concern is not just about auto-retries, but also client-directed retries. Will users be careful to restore whatever state needs to be restored when retrying?

Once we document and specify which statements are not transactional, then it is the client’s responsibility to shape their txns accordingly, yes.

This is incidentally already true in pg, which has also non-transactional statements.

The state doesn’t necessarily have to be restored. Suppose I want to use extract on timestamps with a client-supplied time zone from a Go application. I must start a transaction, use SET TIME ZONE within that transaction (because without a transaction I have no guarantee of transaction reuse), then do my query. (We should also support AT TIME ZONE to avoid this use of session state, but that’s another matter). In this application, I can make no assumptions about the state of the session time zone unless I’ve set it within my transaction. I don’t care what happens to the time zone when my transaction is rolled back.

For PREPARE, we should disable automatic retries when it is used in a transaction. But I don’t think that helps for SET SESSION. To match postgres, we need to restore the previous value when the transaction is rolled back, whether it is retried or not. Or we decide not to match postgres in this respect and the auto retry is fine.

wouldn’t it be better if preparing was transactional?

But that differs from postgresql’s implementation (and, I think, general sql expectations). Many clients maintain a set of statements that have been prepared and reuse them when possible; if these clients expect to be able to reuse statements after a transaction rollback, then we can’t make the preparation contingent on the transaction commit.