I am working on a master's project to reduce abort rates in CockroachDB under high contention. I have outlined the idea of my work below, and I would really appreciate any insights and discussion on it.
CockroachDB uses a fixed timestamp allocation scheme: each transaction is assigned a timestamp at its start, and that timestamp is used as the transaction's commit timestamp. The commit timestamp orders transactions in a logical timestamp order, and hence enforces serializability. SSI, unlike SI, detects and handles read-write conflicts, but the restrictive fixed timestamp ordering leads to a higher number of aborts under contention.
To reduce the number of aborts at high contention, we plan to employ dynamic timestamps, rather than fixed timestamps, for allocating a transaction's commit timestamp. In this technique, the system dynamically allocates a timestamp range to the transaction, based on the conflicts on the elements the transaction accesses. At commit, it picks a timestamp from that range that fits the transaction into a logical serializable timestamp order. We plan to build on the dynamic timestamp allocation technique we designed in our previous work MaaT (paper attached to the mail), an abbreviation of "Multiaccess as a Transaction". Adapting the dynamic timestamp allocation technique to the SSI implementation in CockroachDB would eliminate some of the aborts. Our technique targets cases where transactions appear non-serializable under a fixed timestamp ordering, but are serializable when timestamps are dynamically allocated to fit them into a serializable timestamp order.
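To make the range idea concrete, here is a minimal sketch (all names are hypothetical, not CockroachDB or MaaT code) of a transaction carrying a commit-timestamp range instead of a single fixed timestamp. Conflicts shrink the range, and the transaction can commit as long as the range stays non-empty:

```python
class Txn:
    """A transaction carrying a dynamic commit-timestamp range [lower, upper)
    instead of a single fixed commit timestamp."""

    def __init__(self, start_ts):
        self.lower = start_ts       # earliest allowed commit timestamp
        self.upper = float("inf")   # latest allowed commit timestamp

    def can_commit(self):
        # A non-empty range means some serializable position still exists.
        return self.lower < self.upper

    def choose_commit_ts(self):
        # Any timestamp in [lower, upper) respects the ordering constraints
        # gathered so far; picking the lower bound is the simplest choice.
        assert self.can_commit()
        return self.lower
```

A fixed-timestamp scheme is the degenerate case where the range is a single point, so any conflict that would move the timestamp forces an abort.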
Consider the following two transactions.
T1 : r1(y) r1(x)
T2 : r2(x) w2(x)
Suppose the execution of the two transactions is as follows:
H1 : b2 r2(x) b1 r1(y) r1(x) c1 w2(x) c2
At w2(x), the SSI approach in CockroachDB sees that x has been read by T1, a transaction with a later timestamp. To preserve the fixed timestamp ordering, T2 is aborted, since its write causes an RW conflict with T1.
With the dynamic timestamping technique, suppose lowerbound(T1) and lowerbound(T2) are the lower bounds, and upperbound(T1) and upperbound(T2) the upper bounds, of transactions T1 and T2 respectively. On detecting the RW conflict at w2(x), lowerbound(T2) is made greater than upperbound(T1), so that the history becomes equivalent to the serialization order T1 → T2. Rather than aborting a transaction because of the initially allocated timestamp order, dynamic timestamp allocation can re-order the logical transaction order and, in this case, lets both transactions commit.
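The bound adjustment for history H1 can be traced with a small simulation (again a sketch with hypothetical names, not CockroachDB code). When w2(x) runs into T1's committed read of x, T2's lower bound is pushed past T1's commit timestamp, and T2 still finds a valid commit slot where the fixed scheme would have aborted it:

```python
INF = float("inf")

class Txn:
    def __init__(self, name, start_ts):
        self.name, self.lower, self.upper = name, start_ts, INF

def on_rw_conflict(reader_commit_ts, writer):
    # Order the committed reader before the writer by pushing the
    # writer's lower bound past the reader's commit timestamp.
    writer.lower = max(writer.lower, reader_commit_ts + 1)

# History H1: b2 r2(x) b1 r1(y) r1(x) c1 w2(x) c2
t2 = Txn("T2", start_ts=1)
t1 = Txn("T1", start_ts=2)

t1_commit = t1.lower             # c1: T1 commits at its timestamp, 2
on_rw_conflict(t1_commit, t2)    # w2(x) sees T1's committed read of x

# With fixed timestamps, T2 (ts=1) must abort; with dynamic bounds, T2
# commits at any timestamp in [3, inf), preserving the order T1 -> T2.
assert t2.lower == 3 and t2.lower < t2.upper
```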
The implementation changes would include:
1) introducing write soft locks and read soft locks on data items, purely to track conflicting operations on them (the existing write intents could be modified to act as soft write locks, but they would have to be changed to support multiple intents per key);
2) introducing fields in the transaction record to hold the list of transactions that had placed write soft locks on items this transaction read, and the list of transactions that had placed read soft locks on items it wrote;
3) modifying the read timestamp cache to hold only committed reads;
4) introducing a validation phase at the end of the transaction to decide on the commit timestamp, based on all read and write soft locks on the data items.
Insights shared by Andrei Matei (@andrei):
- As you’ve seen, CockroachDB’s model is quite different from MaaT’s. Changing it to fit would be a very big undertaking. Do not underestimate the complexity of our code. Things like introducing a validation phase before commit, at the very least, would be quite involved.
- supporting multiple “write intents” on a key is something that has been discussed a number of times before and never pursued because of complexity. Most recently, this conversation happened that you might want to skim through: https://github.com/cockroachdb/cockroach/issues/5861.
- One thing you should be aware of: transactions that consist of a single batch of operations (so there's no “client” logic) attempt execution as “1-phase-commit” (an atomic, non-transactional batch of commands). If that's successful (no conflicts), then there's no extra “commit” Raft command, no intents, etc. See replica.executeWriteBatch. This is a valuable optimization and it might affect your benchmarks.
Andrei Matei’s answers to my questions.
If you were to adapt MaaT to CockroachDB, what changes would you make in CockroachDB? (Just a brief outline of the design and code.)
Andrei : That’s a tough question :). I have no idea. CRDB’s transaction model is not my main area of work.
I’d start by thinking how MaaT needs to change for MVCC. Then, the idea of maintaining a window of time to which a txn can be pushed seems to be fundamental to MaaT, so I’d think about how that can be maintained (one thing to keep in mind is that, since we’re a general SQL database, there’s no delimited read/write/validation phases to our transactions, as there seemed to be in MaaT. However I couldn’t tell if the MaaT actually needs these phases to be distinct).
Which of those changes do you think could be a hiccup, or would disturb the working of CockroachDB too much?
Andrei : Well, one thing that again comes to mind are the complications around allowing multiple write intents on a key (https://github.com/cockroachdb/cockroach/issues/5861). In MaaT these are the write locks.
Overall, how much time do you think it would take a newcomer to CockroachDB like me to make all those changes?
Andrei : Let’s put it this way - changing anything around the transaction model is an ambitious enterprise. The codebase is large and this is one of the more complex layers - subtleties everywhere. Changing the transaction model completely is not something that I would personally attempt, even though I work on this product and understand most of the code.
There might, however, be limited cases that can be improved with ideas from MaaT without replacing everything. Perhaps along the lines you were already thinking about. I would personally encourage anyone to start small when it comes to this particular layer and attempt more targeted code changes before deciding to rewrite too much.
If I had to place MaaT's validator in CRDB, where do you think it would be appropriate? My understanding is that the DistSender would be a good place, because it has access to the transaction record and knows all the keys the transaction has touched.
Andrei : Depending on what this “validator” does exactly, I think maybe you want to assign this role to the replica that processes the commit request (i.e. the EndTransactionRequest). These requests are routed to the range where the txn record lives (which is the same as the range of the first key written to by the transaction). One of the replicas of that range will process this EndTransactionRequest, and it has (local) access to the txn record (see replica_command.go - evalEndTransaction).
Thanks Andrei for the inputs. It really helps.
I would appreciate it if anyone could pitch in to share more opinions and insights on my approach. I am eager to discuss, and to understand CockroachDB better.