Benchmark UPSERT

Hi,

im currently testing the UPSERT Statement.

The Cluster:

  • 4 Servers each 32 Cores, 128 GB ram, 2 SSDs
  • Each Server has one CockroachDB Instance with 2 Stores

The Data:

  • 200 sequential Transactions
  • Each Transaction on average 330 words
  • 23755 different words

Table:
CREATE TABLE words (
word STRING(255) NOT NULL,
frequency INT NULL,
CONSTRAINT “primary” PRIMARY KEY (word ASC),
FAMILY “primary” (word, frequency)
)

Transaction:
BEGIN;

SAVEPOINT cockroach_restart;

UPSERT INTO words (word, frequency) VALUES(?, ?);

RELEASE SAVEPOINT cockroach_restart;

COMMIT;

I have writen a Java App for this test. Executing all 200 Transaction takes over 5 minutes. I think this is very slow. Is there something wrong with my app or are this normal values for cockroach?

Hi @Enrico. That is much slower than I’d expect. If you can group your
query into a single upsert with multiple values (UPSERT INTO words VALUES (?, ?), (?, ?), ..., (?,?)) then it is likely to be much faster.

If that doesn’t help, we have some mechanisms for getting us the debug
data. One is SHOW TRACE FOR UPSERT INTO ..., which will show performance
debug info for one query. Another is to run SET CLUSTER SETTING sql.trace.txn.enable_threshold = '1s', which will print debug info in the
logs for every query slower than the threshold you specify. This slows down
every query, so don’t leave it on in production (you can remove it with
RESET CLUSTER SETTING sql.trace.txn.enable_threshold). You can also use
SHOW QUERIES to list the currently running queries and see if any of them
have been running for a long time.

Hi @dan

is the perfect tip. From over 5 minutes to under 30 seconds.

Thank you.

Now I use 1000 Transactions. With one connection the execution time is about 40 seconds. I think this is fast.

But if I use more than one connection the time increases enormously.

  • Number of parallel connections: execution time
  • 2: 50 s
  • 4: 8min to 39min
  • 8: 13min to 20min
  • 16: 10min to 33min

Why does the execution time grow so fast and why does it differ so much (8min vs 30 min)

Edit:

The Service Latency: SQL, 99th percentile shows often 10 seconds.

Hi,

I now used the tracing to log the queries. I looked at one trace of a transaction that took 11 seconds. Its an upsert with 646 different values.

The most of the time it waits for overlapping requests. For example:

212.237ms 8.547ms event:[n3,s5,r5903/1:/Table/33{4-5}] read-write path
212.542ms 0.305ms event:[n3,s5,r5903/1:/Table/33{4-5}] command queue
215.207ms 2.665ms event:[n3,s5,r5903/1:/Table/33{4-5}] waiting for 142 overlapping requests
282.763ms 67.556ms event:[n3,s5,r5903/1:/Table/33{4-5}] waited 67.551666ms for overlapping requests
285.246ms 2.483ms event:[n3,s5,r5903/1:/Table/33{4-5}] applied timestamp cache
298.022ms 12.776ms event:[n3,s5,r5903/1:/Table/33{4-5}] evaluated request
304.944ms 6.922ms event:[n3,s5,r5903/1:/Table/33{4-5}] replica.Send got error: conflicting intents on /Table/334/1/“xyz”/0
304.956ms 0.011ms event:[n3,s5] pushing 1 transaction(s)

It must wait often over 80ms. Is the {4-5} the range, where the key is in? Can it be faster, if I choose smaler range sizes?

In the logs are 48:

1597.617ms 1.815ms event:[n3,s5] 22ae3905-7fae-49f8-bb2e-7ff2a694ae85 is now ABORTED

Can anybody help me speed it up?

Can anyone help me?

Or is a scenario with many conflicts the wrong use case for cockroach?

@enricotal70 we’ll need more information to answer this question. Queries with many conflicts will be slow on any database because a database guarantees serializability. You could try turning on isolation level SNAPSHOT.