High transaction contention due to bad choice by optimizer

Hi!

I was investigating transaction contention issues we are having on one of our tables, and I found that in some cases the cost-based optimizer makes choices that reduce concurrency and cause transactions to conflict heavily.

Here is an example. I start with the following table:

create table test(id string primary key, key string, value string);
create index idx_test on test(key);
insert into test(id, key, value) values 
    ('1hello', '1hello', '1hello'),
    ('5hello', '5hello', '5hello'),
    ('9hello', '9hello', '9hello');

Then, if I run the following command immediately:

explain update test set value = '1hello' where key = '1hello';

I can see that it uses the index just fine:

root@:26257/bend> explain update test set value = '1hello' where key = '1hello';
            tree            |  field   |          description           
+---------------------------+----------+-------------------------------+
  count                     |          |                                
   └── update               |          |                                
        │                   | table    | test                           
        │                   | set      | value                          
        │                   | strategy | updater                        
        └── render          |          |                                
             └── index-join |          |                                
                  │         | table    | test@primary                   
                  └── scan  |          |                                
                            | table    | test@idx_test                  
                            | spans    | /"1hello"-/"1hello"/PrefixEnd  
(11 rows)

Time: 784µs

However, after a while it (presumably) decides that the table is small and that a full scan is better:

root@:26257/bend> explain update test set value = '1hello' where key = '1hello';
         tree         |  field   |  description    
+---------------------+----------+----------------+
  count               |          |                 
   └── update         |          |                 
        │             | table    | test            
        │             | set      | value           
        │             | strategy | updater         
        └── render    |          |                 
             └── scan |          |                 
                      | table    | test@primary    
                      | spans    | ALL             
                      | filter   | key = '1hello'  
(10 rows)

Time: 2.132ms

Now, the issue is that if I run two transactions concurrently:

begin transaction;
update test set value = '1hello' where key = '1hello';

and

begin transaction;
update test set value = '9hello' where key = '9hello';

They conflict with each other. (First, run the begin transaction statement in two SQL consoles, then run the update statements in both.) Sometimes one update is “blocked”; in other cases both updates are allowed to proceed, but the last transaction to commit is rolled back:

pq: restart transaction: TransactionRetryWithProtoRefreshError: TransactionRetryError: retry txn (RETRY_SERIALIZABLE): id=27d15def key=/Table/281/1/"9hello"/0 rw=true pri=0.00102576 stat=PENDING epo=0 ts=1568053031.112388000,1 orig=1568053029.328891000,0 max=1568053029.328891000,0 wto=false seq=2
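
For clarity, here is the exact interleaving across the two consoles (a sketch of what I ran; the session labels are just comments):

-- console 1:
begin transaction;
update test set value = '1hello' where key = '1hello';

-- console 2:
begin transaction;
update test set value = '9hello' where key = '9hello';  -- sometimes blocks here

-- console 1:
commit;

-- console 2:
commit;  -- the last transaction to commit fails with RETRY_SERIALIZABLE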

This is very dangerous behavior for our system, as one of our tables is used for internal synchronization. The table is small but receives frequent updates which should be independent. We would typically do a lot of work in other tables and then update this one small table.

However, the choice of a full scan instead of an index scan makes these transactions conflict with each other: because each transaction reads the whole table, its read set includes the row the other transaction writes. Transactions that should be independent end up “blocking” each other or getting rolled back with a serialization error.

If I turn off the optimizer (set optimizer = 'off';), things go as expected: CockroachDB chooses an index scan and I get no conflicts.
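
For reference, this is what I ran to verify (sketch):

set optimizer = 'off';
explain update test set value = '1hello' where key = '1hello';
-- the plan now uses test@idx_test again instead of a full scan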

I have not yet tried it in production to see if this resolves some of the issues we were seeing, but this little experiment is very disturbing.

Two questions:

  1. Is there a way to force index use on an update statement? I tried update test@idx_test set value = '1hello' where key = '1hello'; but it has no effect. (A possible workaround is sketched after this list.)
  2. What else can we do? I’m currently considering turning off the optimizer just for the queries that go to that “synchronization” table.
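
The workaround I’m considering (untested; it relies on the table@index hint being supported on SELECT) is to read the primary key through the index first and then update by primary key, so the update itself is a point lookup:

begin transaction;
-- read the primary key through the index, so only one index row is scanned
select id from test@idx_test where key = '1hello';
-- then update by primary key (using the id returned above, '1hello' here);
-- a point lookup on test@primary, no full scan
update test set value = '1hello' where id = '1hello';
commit;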

Hey Ivan,

What version of CockroachDB are you running?

  1. Currently, we don’t support forcing an index on an update statement, but we’re working on that feature!

  2. CockroachDB’s isolation level is serializable by default, so a transaction should still make progress even after it has been pushed and forced to roll back; on retry, the updates will be applied to the table as if they were independent. If turning the optimizer off works for that particular use case, then you can try that as well.
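
If you do see RETRY_SERIALIZABLE errors, the usual approach is a client-side retry loop. A rough SQL-level sketch using the cockroach_restart savepoint (assuming your client can react to error code 40001):

begin;
savepoint cockroach_restart;
update test set value = '9hello' where key = '9hello';
release savepoint cockroach_restart;
commit;
-- on a retryable error (SQLSTATE 40001), issue
-- rollback to savepoint cockroach_restart; and re-run the statements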

Does that help?

Best,
Matt

  1. Is there a GitHub issue I can watch?

  2. Yeah, I’m going to try switching off the optimizer for that one particular update. Not using the index for that query is really bad for concurrency in our case.
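
Concretely, something like this around just that statement (a sketch; I still need to verify it in our environment):

set optimizer = 'off';
update test set value = '1hello' where key = '1hello';
set optimizer = 'on';  -- restore the default for the rest of the session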

Here you go: https://github.com/cockroachdb/cockroach/issues/38799
