What does the Transaction Retry Error - RETRY_DELETE_RANGE mean?

Occasionally (quite rarely) I see an exception like the one below in my log:

ERROR [2017-11-06 19:31:44,684] [U:232,F:394,T:626,M:1,878] storage.config.CockroachConfigStore:[CockroachConfigStore:executeTransaction:200] - [config-store-exec-8] - Got exception with error code 0
org.postgresql.util.PSQLException: ERROR: restart transaction: HandledRetryableTxnError: TransactionRetryError: retry txn (RETRY_DELETE_RANGE): "sql txn" id=f6f9cba2 key=/Table/54/1/10000/9/1266132426/"_canonical"/1509996703546/0 rw=true pri=0.01513008 iso=SNAPSHOT stat=PENDING epo=0 ts=1509996704.672742665,1 orig=1509996704.669006245,0 max=1509996704.669006245,0 wto=false rop=false seq=3
        at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2284)
        at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2003)
        at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:200)
        at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:424)
        at org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:321)
        at org.postgresql.jdbc.PgConnection.execSQLUpdate(PgConnection.java:472)
        at org.postgresql.jdbc.PgConnection.releaseSavepoint(PgConnection.java:1592)
        at org.apache.commons.dbcp.DelegatingConnection.releaseSavepoint(DelegatingConnection.java:461)
        at org.apache.commons.dbcp.PoolingDataSource$PoolGuardConnectionWrapper.releaseSavepoint(PoolingDataSource.java:380)

Can someone let me know what this exception denotes? How can it be avoided?

The code that is throwing this error:

conn.setAutoCommit(false);
conn.setTransactionIsolation(Connection.TRANSACTION_READ_COMMITTED);
...
private boolean executeTransaction(Connection conn, Transaction transaction) throws SQLException {
        Savepoint sp = conn.setSavepoint(SAVE_POINT);
        boolean isSuccess = false;
        try {
            isSuccess = transaction.run(conn);
            conn.releaseSavepoint(sp); // This line is throwing the exception
            logger.info("Transaction successful");
        } catch (SQLException e) {
            logger.error(String.format("Got exception with error code %d", e.getErrorCode()), e);
            conn.rollback(sp);
            conn.releaseSavepoint(sp);
            return isSuccess;
        }
        conn.commit();
        return isSuccess;
    }

@tuk RETRY_DELETE_RANGE is returned when the internal DeleteRange command
is sent and conflicts. (The presence of a DeleteRange in a txn effectively
acts like a SERIALIZABLE transaction regardless of the chosen isolation
level.) I couldn’t tell you offhand why that happened without seeing the
SQL that caused this error.

As for the fix, it looks like your retry handling isn’t quite right (unless
there’s some context I’m missing from your code sample). Transactions
should be run in a loop and when a retryable exception is caught, the loop
should be retried.
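Here is a minimal sketch of such a retry loop. It assumes CockroachDB surfaces retryable transaction errors with SQLState `40001` (the standard serialization-failure code), and uses a stand-in `TxnBody` interface plus a fake body instead of your `Transaction` type and a real connection:

```java
import java.sql.SQLException;

public class RetryLoop {
    /** Stand-in for a unit of transactional work that may throw a retryable SQLException. */
    interface TxnBody<T> {
        T run() throws SQLException;
    }

    /**
     * Runs body; on a retryable error (SQLState 40001) it loops and tries
     * again, up to maxRetries additional attempts. Any other SQLException
     * is propagated immediately.
     */
    static <T> T runWithRetry(TxnBody<T> body, int maxRetries) throws SQLException {
        SQLException last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return body.run();
            } catch (SQLException e) {
                if (!"40001".equals(e.getSQLState())) {
                    throw e; // not retryable: propagate
                }
                last = e; // retryable: loop and re-run the body
            }
        }
        throw last; // retries exhausted
    }

    public static void main(String[] args) throws SQLException {
        int[] attempts = {0};
        // Fake body: fails twice with a retryable error, then succeeds.
        String result = runWithRetry(() -> {
            if (attempts[0]++ < 2) {
                throw new SQLException("restart transaction", "40001");
            }
            return "committed";
        }, 5);
        System.out.println(result + " after " + attempts[0] + " attempts");
    }
}
```

In your JDBC code the body would call `transaction.run(conn)`, and before re-running after a retryable error you would roll back to the savepoint (`conn.rollback(sp)`), much as your existing catch block does, instead of returning.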

@dan - The cockroach logs (--vmodule=executor=2,session=2) are attached. cockroach.log.ttf (570.1 KB)

Why is it acting like a SERIALIZABLE transaction even though I have set
conn.setTransactionIsolation(Connection.TRANSACTION_READ_COMMITTED);

How can I reliably use SNAPSHOT_ISOLATION?

My intention is to use SNAPSHOT_ISOLATION without doing a generic client-side retry loop; I want to retry based on my application logic.

Rename the file from *.ttf to *.gz.

The table structure:

CREATE TABLE config_versions (
    customerid integer NOT NULL,
    objecttype integer NOT NULL,
    objectid bigint NOT NULL,
    propertyname text NOT NULL,
    timeinstant bigint NOT NULL,
    updatetimestamp timestamp with time zone DEFAULT now() NOT NULL,
    PRIMARY KEY (customerid, objecttype, objectid, propertyname, timeinstant, updatetimestamp),
    INDEX config_versions_by_version (updatetimestamp)
);

CREATE TABLE config_str (
    customerid integer NOT NULL,
    objecttype integer NOT NULL,
    objectid bigint NOT NULL,
    propertyname text NOT NULL,
    timeinstant bigint NOT NULL,
    updatetimestamp timestamp with time zone DEFAULT now(),
    value text,
    PRIMARY KEY (customerid, objecttype, objectid, propertyname, timeinstant)
);

CREATE TABLE config_blob (
    customerid integer NOT NULL,
    objecttype integer NOT NULL,
    objectid bigint NOT NULL,
    propertyname text NOT NULL,
    timeinstant bigint NOT NULL,
    updatetimestamp timestamp with time zone DEFAULT now(),
    value text,
    PRIMARY KEY (customerid, objecttype, objectid, propertyname, timeinstant)
);

The issue here is not that you’re failing to use snapshot isolation reliably. If you asked for TRANSACTION_READ_COMMITTED, you are using snapshot isolation.

The issue is that snapshot isolation does not prevent all types of transaction restarts / retryable errors; it only prevents some of them. The client still needs to be prepared to retry. (If the frequency of these restarts in your workload is low enough, perhaps you don’t need to handle them explicitly; you may rely on whatever high-level retries your application already does for all sorts of other errors too, including, for example, serving an error page asking the user to press refresh :slight_smile: ). Conceptually, this state of affairs is the same with most, if not all, databases. What varies is just the frequency of restarts, and crdb is more retry-happy than most (but it’s also more consistent and “correct” than most!).

Now, in the particular case of the RETRY_DELETE_RANGE, I think things can be improved on our side. It turns out that, as the implementation currently stands, any DELETE query will essentially make a snapshot txn behave like a serializable one. That’s quite unfortunate and, I believe, not necessary. I filed https://github.com/cockroachdb/cockroach/issues/19876 to track it.

Hope this helps,

  • a_m