Transactional test: threads stuck on SELECTs


#1

Hello,

I am testing some transactional multi-threaded workload connecting from Python to CockroachDB.
My script is here

By some reason what I see is all threads getting stuck on

cur.execute(“SELECT SUM(val) av FROM __c”)

At this time I am completely stuck and do not know how to resolve this.

Any advice is appreciated.


(Ben Darnell) #2

Hi Vadim,

Thank you very much for your report, and I apologize for the delayed response. It looks like you’ve uncovered an interesting bug in our handling of deadlocking transactions. This code changed significantly in the beta-20170309 release, but I think the crucial part of the bug was the same before and after the change.

The issue occurs when transactions perform writes before reads. Each transaction is able to lay down intents for its initial write, which means that it must either commit or be aborted before any other transaction will be able to read that value. The problem is that if the only source of conflicting queries are reads (the writes in this workload never conflict with each other), we never attempt to abort the transaction. The faulty code is here, where we decide whether or not to abort the conflicting transaction based on whether the current operation is a read or a write. Instead, we should abort for reads as well, if they are part of a writing transaction.

Prior to beta-20170309, we aborted transactions based on their priority, so a few threads would get through while others would get stuck in a backoff loop. beta-20170309 made the process deterministic rather than priority-based, so all of the threads would get stuck (and it also removed the backoff from the loop, so CPU usage on the server would go to 100%)

We’ll work on getting this fixed for the next beta release.


#3

Ben,

Thank you for the response.
I will wait for the next beta release!


(spencer) #4

Vadim, the proximate problem is in your test code, where you mistakenly use the postgres connection to commit or rollback the active transaction, instead of the cursor where the transaction is actually being run (https://github.com/vadimtk/trx-c/blob/master/write-skew-crdb.py#L61-L66).

When I made that change I was able to let the test run, with the exception of your code not currently handling transaction retry errors, which you must handle when using CockroachDB (see https://www.cockroachlabs.com/blog/building-application-cockroachdb-sqlalchemy-2/) for Ben’s helpful article on using CockroachDB from SQLAlchemy. We provide a means to automatically retry transactions.