Automatically retrying the first batch of statements after a BEGIN


I’ve been investigating some problems we have with CREATE TABLE statements sometimes returning retryable errors to clients. This is a problem particularly when the client is not prepared to handle such an error (e.g. when the CREATE is done by an ORM). There are specific improvements we can make so that a CREATE is less likely to return an error, but in general retries can be necessary for many reasons, for CREATEs and other statements.

Clients do not observe these errors in two cases: when they use implicit transactions (single statements outside of a txn), or when the error is encountered by a statement sent in the same batch of statements as its BEGIN. In these cases transactions are automatically retried at the server. The reasoning for the latter is that we know that none of the statements in the BEGIN’s batch were the result of some conditional logic the client might have based on reads performed in the transaction.

Ben had a good observation the other day and I want to see if anyone has any thoughts: the reasoning about the client logic not being conditional extends to statements in the first batch after the BEGIN, if the BEGIN is trailing (or alone in) a batch. More generally, it extends to statements sent by the client before any results of previous statements from the same txn have been sent to it. So, we could also auto-retry such first batches. This would prevent clients from seeing retryable errors when they do, say, BEGIN, then CREATE TABLE, then COMMIT, with the BEGIN and CREATE being in separate batches.
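To make the two rules concrete, here is a minimal sketch of the decision, written as a toy model rather than anything resembling the actual executor code. The function name `can_auto_retry`, the `proposed` flag, and the representation of a session as a list of batches are all assumptions made up for illustration; the model also assumes the batches describe a single transaction.

```python
def can_auto_retry(batches, failing_batch, proposed=False):
    """Decide whether the server may silently retry after a retryable error.

    batches: list of batches, each a list of SQL statement strings, in the
             order the client sent them (hypothetical representation).
    failing_batch: index of the batch whose statement hit the error.
    proposed: False models today's rule, True models the proposed extension.
    """
    begin_batch = None
    begin_pos = None
    # Find where the BEGIN was sent, looking no further than the failing batch.
    for i, batch in enumerate(batches[: failing_batch + 1]):
        for j, stmt in enumerate(batch):
            if stmt.strip().upper() == "BEGIN":
                begin_batch, begin_pos = i, j

    if begin_batch is None:
        # No BEGIN: an implicit transaction, always retried by the server.
        return True
    if begin_batch == failing_batch:
        # Error in the same batch as the BEGIN: retried today.
        return True
    if not proposed:
        return False

    # Proposed extension: the BEGIN must be the last statement of its batch,
    # and the failing batch must be the very next one, so the client has not
    # yet seen results from any statement of this transaction.
    begin_is_trailing = begin_pos == len(batches[begin_batch]) - 1
    return begin_is_trailing and failing_batch == begin_batch + 1
```

With BEGIN, CREATE TABLE and COMMIT sent as three separate batches, an error in the CREATE's batch is not retryable under the current rule but is under the proposed one. If the BEGIN's batch also contained a SELECT, the client may have seen its results, so the sketch refuses the retry in either mode.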

It is obviously true that statements in such a first batch are not conditional on reads (or generally stmt results) generated by their transaction. They could, however, be conditioned on things observed through other concurrent transactions - and it’s theoretically possible that, if the client was in charge of directing a retry, it’d refuse to perform it (or perform different logic in the retry). This is already true with the existing automatic retries; I can’t see a clear difference with extending the retries in the way proposed… apart from the fact that, with this proposal, the server could start a retry an arbitrarily long time after the BEGIN has been sent, whereas before the time was controlled exclusively by how long the server took to execute statements.
Does anyone have any thoughts on this?

A related question is when we should choose a timestamp for (the first attempt of) a transaction with its BEGIN in a separate batch. Currently, we choose the timestamp when we see the BEGIN (as opposed to deferring it until the first kv operation is performed). This is because we want functions like cluster_logical_timestamp(), which might be evaluated early, to return a consistent timestamp. It seems weird that it’s possible for the reads in that txn to not observe state that the client has observed (through other concurrent txns) before issuing them. So a proposal would be to defer the timestamp assignment at least to when we see the first statement in the txn after the BEGIN. However, @tschottdorf was telling me in another context that choosing the timestamp early allows us to guarantee to clients that, if they wait after doing the BEGIN, and only then issue statements they get . But now I can’t reconstruct what that guarantee might have been exactly…
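The visibility oddity can be illustrated with a tiny model. This is only a sketch under simplifying assumptions: timestamps are plain integers, a read at timestamp T sees exactly the writes committed at or before T, and the names `txn_timestamp`, `visible`, and the two policy strings are invented for the example.

```python
def txn_timestamp(begin_time, first_stmt_time, policy):
    """Pick the transaction timestamp under one of two hypothetical policies."""
    if policy == "at_begin":
        # Current behavior as described above: fixed when the BEGIN is seen.
        return begin_time
    if policy == "at_first_statement":
        # Proposed alternative: deferred until the first statement after BEGIN.
        return first_stmt_time
    raise ValueError(f"unknown policy: {policy}")


def visible(write_commit_time, read_timestamp):
    """Simplified snapshot rule: a read sees writes committed at or before it."""
    return write_commit_time <= read_timestamp
```

Take a BEGIN at time 10, a write committed by another txn at time 12 (which the client then observes), and the first statement of our txn at time 15. Under "at_begin" the txn reads at 10 and misses the write the client already knows about; under "at_first_statement" it reads at 15 and sees it.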

cc @knz @bdarnell

I don’t think we’re violating any hard guarantees by deferring the execution of BEGIN, but it can be a little counter-intuitive. For example, take a database on which all inserts are basically instantaneous (i.e. no long-running txns).

The session BEGIN; sleep(5s); SELECT ...; COMMIT would never be expected to run into conflicts, but it may after the change. Either way, even with the pipelining, we can still choose the timestamp early, if we want a simple explanation of what timestamp is chosen. I think it’s intuitive that a transaction that begins earlier sees older data than a later one, but I think either option can work and doesn’t at all mean that we should be auto-retrying transactions whenever we can.