Hi!
I was investigating transaction contention issues we are having on one of our tables and I found that in some cases cost-based optimizer would make choices that reduce concurrency and make transactions to conflict heavily.
Here is an example. I start with the following table:
create table test(id string primary key, key string, value string);
create index idx_test on test(key);
insert into test(id, key, value) values
('1hello', '1hello', '1hello'),
('5hello', '5hello', '5hello'),
('9hello', '9hello', '9hello');
Then, if I run the following command immediately:
explain update test set value = '1hello' where key = '1hello';
I can see that it uses index just fine:
root@:26257/bend> explain update test set value = '1hello' where key = '1hello';
tree | field | description
+---------------------------+----------+-------------------------------+
count | |
└── update | |
│ | table | test
│ | set | value
│ | strategy | updater
└── render | |
└── index-join | |
│ | table | test@primary
└── scan | |
| table | test@idx_test
| spans | /"1hello"-/"1hello"/PrefixEnd
(11 rows)
Time: 784µs
However, after a while it (presumably) decides that table is small and doing full scan is better:
root@:26257/bend> explain update test set value = '1hello' where key = '1hello';
tree | field | description
+---------------------+----------+----------------+
count | |
└── update | |
│ | table | test
│ | set | value
│ | strategy | updater
└── render | |
└── scan | |
| table | test@primary
| spans | ALL
| filter | key = '1hello'
(10 rows)
Time: 2.132ms
Now, the issue is that if I run two transactions concurrently:
begin transaction;
update test set value = '1hello' where key = '1hello';
and
begin transaction;
update test set value = '9hello' where key = '9hello';
They conflict with each other (first, run begin transaction
statement in two SQL consoles, then run udpate
statements in both. Sometimes, one update
is “blocked”. In other cases, both updates are allowed to proceed, but last transaction to commit is rolled back:
pq: restart transaction: TransactionRetryWithProtoRefreshError: TransactionRetryError: retry txn (RETRY_SERIALIZABLE): id=27d15def key=/Table/281/1/"9hello"/0 rw=true pri=0.00102576 stat=PENDING epo=0 ts=1568053031.112388000,1 orig=1568053029.328891000,0 max=1568053029.328891000,0 wto=false seq=2
This is very dangerous behavior for our system as one of our tables is used for internal synchronization. The table is small, but with frequent updates which should be independent. We would typically do a lot of work in other tables and then update this one small table.
However, choice of full scan instead of index scan makes these transaction to conflict with each other, “blocking” other transactions which should be independent and also rolling them back due to serialization error.
If I turn off the optimizer (set optimizer = 'off';
), things go as expected: CockroachDB chooses to do index scan, I get no conflicts.
I have not tried yet it on our production to see if that resolves some of the issues we were seeing, but that little experiment is very disturbing.
Two questions:
- Is there a way to force index use on an
update
statement? I triedupdate test@idx_test set value = '1hello' where key = '1hello';
, but it does not have any effect. - What else can we do? I’m currently considering turning off optimizer just for the queries that go to that “synchronization” table.