Approaches for synchronizing client activity without race conditions

I’m developing a service in which users can add and remove content which exists as DB metadata and files-on-disk for the data. If two users add the same content the data file is shared. When adding content I don’t care about race conditions because adding the same data file twice is not a big problem.

However, when deleting content there is a chance that another user will add the same content at the same time and cause a “hole” in the on-disk data:

Thread1: DB row deleted
Thread2: DB row created (since no db rows matched the user must upload data)
Thread2: file data stored
Thread1: file data deleted (uh, oh - missing data)

I hope this simplified example makes sense.

I could solve this problem using a global lock around adding and deleting but this would create a fairly serious lock contention problem when many users are adding and removing many things. CockroachDB only supports transaction serialization errors to flag conflicts like this, but I can’t rollback the deletion of a file on disk. Is there a common pattern to deal with this type of problem?

Make the block ID a unique indexed column (eg a primary key), and next to it make a reference count column.

Whenever a new copy of the block is made increment the reference count.
When a copy is deleted decrement the reference count.

During a deletion if the reference count goes to 0 (or was 1 to start with) you also delete the file (after the SQL transaction has committed successfully)

Does this help?

Thanks for the reply Raphael,

The problem occurs if the transaction has completed successfully and the chunk is then deleted, but in between those two steps a new reference is created to that block and the file is re-uploaded. In this case the delete will remove the on-disk data but the db will think it exists. There is a very small chance that it will happen, but this system is dealing with millions of blocks being created and destroyed every day and at some point it will happen.

With PostgreSQL I was able to lock rows during a transaction so I could do client work while the lock was held and prevent any other transactions from re-referencing a newly deleted chunk while the file delete was happening. I initially had hoped that the transaction priorities in CDB would let me set the deletion transactions to high priority so they would always boot the low priority insertion transactions, but that’s not really a reliable way to do it because if the insertion transaction starts first it can still cancel a higher priority deletion transaction that starts slightly later.

With CDB I need to come up with a lockless way to do it. The solution probably involves implementing some kind of test-and-set on each db row that is a candidate for deletion. Is it possible to add a condition to a SQL expression inside a transaction that causes it to fail if the value of a column is set to a specific value?

Then you can generate a random salt every time you upload a new block that did not exist, and add that salt to both the SQL record and the file name.

This way it becomes impossible to upload “the same file” two times - if the same content is re-uploaded at the critical time you explained to us, it would get a new random salt and would not be deleted by the concurrent txn.

Would this help?

Unfortunately the system relies on deduplication to function efficiently so we can’t use a salt because it means each copy of the same data would get a unique “name” and therefore prevent deduplication - otherwise that’s a great solution.

I’ve just completed a working version which uses a test-and-set approach. The deletion code finds unreferenced data and marks it as “in progress” with a dedicated column inside a Tx. Then, the code is free to delete the data file and remove the row. The code that creates new data first checks if any pending deletes are in progress within the insert transaction and fails if any are marked as in progress - which triggers a retry. If none are in progress then the row can be inserted and data can be uploaded.

This approach works with any DB which is nice. The next step is to go through the code and reduce the chances of Tx collisions as much as possible since they all tend to be hitting the same tables at the moment and probably cause “restart transaction” errors more frequently than necessary.

Oh that is a nice idea indeed. Thanks for sharing it.