Transaction deadlocks in CRDB

I see that CRDB transactions are sometimes described as lockless, but the documentation also mentions distributed deadlock detection. Could you please elaborate on how CRDB protects transactions from deadlocks and what is meant by distributed deadlock detection in CRDB?

Hello,

Each transaction guarantees ACID semantics spanning arbitrary tables and rows, even when data is distributed. If a transaction succeeds, all mutations are applied together with virtual simultaneity.

If any part of a transaction fails, the entire transaction is aborted, and the database is left unchanged. CockroachDB guarantees that while a transaction is pending, it is isolated from other concurrent transactions with serializable isolation.

CockroachDB’s concurrency protocol has moved from optimistic to pessimistic. In practice, this means that contending transactions rarely restart each other anymore. Instead, they queue up to make modifications, so transaction intents behave much more like traditional locks. This approach is more complicated because queued transactions can end up waiting on each other, which requires a distributed deadlock detection algorithm; you can read more about it here.
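To make the queueing idea concrete, here is a minimal sketch of a per-key lock table in which a second writer waits in FIFO order instead of restarting. `LockTable`, `acquire`, and `release` are hypothetical names chosen for illustration; this is not CockroachDB's actual lock table, which also has to deal with reads, timestamps, and replication.

```python
from collections import deque


class LockTable:
    """Hypothetical per-key lock table (not CockroachDB's implementation):
    the first writer holds the key, later writers queue in FIFO order."""

    def __init__(self):
        self.holders = {}  # key -> txn id currently holding the "intent"
        self.waiters = {}  # key -> deque of txn ids queued behind the holder

    def acquire(self, key, txn):
        """Return True if txn now holds key, False if it was queued."""
        holder = self.holders.get(key)
        if holder is None or holder == txn:
            self.holders[key] = txn
            return True
        queue = self.waiters.setdefault(key, deque())
        if txn not in queue:
            queue.append(txn)  # wait instead of restarting
        return False

    def release(self, key, txn):
        """Release txn's lock on key and hand it to the next waiter.
        Returns the txn that now holds the key, or None if nobody was waiting."""
        if self.holders.get(key) != txn:
            raise ValueError(f"{txn} does not hold {key}")
        queue = self.waiters.get(key)
        if queue:
            next_txn = queue.popleft()
            self.holders[key] = next_txn
            if not queue:
                del self.waiters[key]
            return next_txn
        del self.holders[key]
        return None


locks = LockTable()
print(locks.acquire("k1", "txnA"))  # True: txnA writes an intent on k1
print(locks.acquire("k1", "txnB"))  # False: txnB queues instead of restarting
print(locks.release("k1", "txnA"))  # txnB: the lock passes to the next waiter
```

Queueing is also what makes deadlocks possible: if txnA holds k1 and waits for k2 while txnB holds k2 and waits for k1, neither release ever happens, so something has to detect the cycle and abort one of the two.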

Let me know if you have any other questions.

Thanks,
Matt

Hi Matt,

CockroachDB’s concurrency protocol has moved from optimistic to pessimistic

Thank you for pointing out that bit of history. I did not know.

But it still contradicts a docs page I linked earlier [1], which describes the concurrency control as optimistic:

Optimistic concurrency with distributed deadlock detection

Also, it would be great to learn more about deadlock detection. A page [2] describes it only briefly and, as I understand it, only for the single-node case. As far as I know, detecting a distributed deadlock may require querying several nodes. Could you please point me to additional documentation on deadlock detection? Is some well-known algorithm used for it?

Hi @pavlukhin. The original post you’re linking to is from 2016, so it looks like that blog post has gone stale. I’ll mark it as something we need to update.

As for deadlock detection, we use a data structure called the TxnWaitQueue to track which transactions are blocked and which transactions they are blocked by. To actually detect deadlocks, each node with blocked transactions polls the other nodes holding the transactions it's blocked on, to check whether the waits form a cycle.
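The polling idea can be sketched as edge chasing: start from a blocked transaction, follow its "blocked on" edges one node at a time, and report a deadlock if the chain leads back to the start. This is a generic technique in the spirit of probe-based algorithms such as Chandy-Misra-Haas, not CockroachDB's actual code. `Node`, `query`, `detect_deadlock`, and the `home` directory are hypothetical, and each transaction is assumed to wait on at most one other transaction.

```python
class Node:
    """Hypothetical node (not CockroachDB's implementation): records which of
    its local transactions are blocked, and on which transaction."""

    def __init__(self, name):
        self.name = name
        self.blocked_on = {}  # local txn id -> txn id it is waiting for

    def query(self, txn):
        """Answer a remote poll: which txn is `txn` waiting for (or None)?"""
        return self.blocked_on.get(txn)


def detect_deadlock(start, home):
    """Chase wait-for edges from `start`, polling the node that owns each txn.
    `home` is an assumed directory mapping txn id -> Node.
    Returns the cycle as a list if `start` is deadlocked, otherwise None."""
    path = [start]
    seen = {start}
    current = start
    while True:
        waiting_for = home[current].query(current)  # one remote poll per hop
        if waiting_for is None:
            return None  # chain ends at a txn that is not blocked
        if waiting_for == start:
            return path  # edge back to the start closes the cycle
        if waiting_for in seen:
            return None  # a cycle further along that does not include start
        path.append(waiting_for)
        seen.add(waiting_for)
        current = waiting_for


n1, n2 = Node("n1"), Node("n2")
n1.blocked_on["txnA"] = "txnB"  # txnA (on n1) waits for txnB
n2.blocked_on["txnB"] = "txnA"  # txnB (on n2) waits for txnA
home = {"txnA": n1, "txnB": n2}
print(detect_deadlock("txnA", home))  # ['txnA', 'txnB']
```

Once a cycle is found, one transaction in it has to be aborted so that the others can proceed.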

Hi @Sean. Thank you for an explanation!

@Sean and @mattvardi,
Is the TxnWaitQueue exposed in the web GUI or in an internal view, so that I can use it to monitor application contention issues, etc.?
Thanks
Allen

Hey @aherndon – we have a custom timeseries chart builder you can use to check out a lot of detail about the TxnWaitQueue. Here’s a quick example:

<YOUR_ADMIN_UI_URL>/#/debug/chart?charts=%5B%7B"metrics"%3A%5B%7B"downsampler"%3A3%2C"aggregator"%3A3%2C"derivative"%3A0%2C"perNode"%3Afalse%2C"source"%3A""%2C"metric"%3A"cr.store.txnwaitqueue.deadlocks_total"%7D%5D%2C"axisUnits"%3A0%7D%5D"

In the Metric Name box, you can start typing in TxnWaitQueue to see all of the individual metrics it publishes.
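If you want a link for a different metric, it can be generated rather than hand-edited. This is a minimal sketch that assumes the chart JSON shape seen in the example link: the numeric `downsampler`, `aggregator`, and `derivative` values are copied from that link without asserting what they mean, `chart_url` is a hypothetical helper, and `http://localhost:8080` is an assumed stand-in for your Admin UI URL. Unlike the example link, it also percent-encodes the quotes (`%22`), which assumes the UI decodes them the same way as raw quotes.

```python
import json
from urllib.parse import quote


def chart_url(admin_ui_url, metric):
    """Hypothetical helper: build a debug chart link for one metric.
    The JSON shape and numeric settings are copied from the example link."""
    charts = [{
        "metrics": [{
            "downsampler": 3,  # copied as-is from the example link
            "aggregator": 3,   # copied as-is from the example link
            "derivative": 0,   # copied as-is from the example link
            "perNode": False,
            "source": "",
            "metric": metric,
        }],
        "axisUnits": 0,
    }]
    # Compact JSON, then percent-encode every reserved character.
    encoded = quote(json.dumps(charts, separators=(",", ":")), safe="")
    return f"{admin_ui_url}/#/debug/chart?charts={encoded}"


print(chart_url("http://localhost:8080", "cr.store.txnwaitqueue.deadlocks_total"))
```

To chart another TxnWaitQueue metric, pass one of the names that the Metric Name box suggests.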

@sean,
Thanks! Interesting. What is the definition of a pusher and a pushee?
And is there a way to save these custom charts?
Allen

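A pusher is a transaction that runs into a conflicting transaction (for example, one of its intents) and tries to push it, that is, abort it or move its timestamp forward; the transaction being pushed is the pushee. If the push can't succeed right away, the pusher waits in the TxnWaitQueue on the leaseholder of the range that holds the pushee's transaction record.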

Unfortunately no way to save the charts atm. However, I have a PR open to add a bunch of predefined charts that you’d be able to load more easily.