Cockroach Write Performance

Cockroach newbie here. I deployed a one-node Cockroach database, no tuning whatsoever. The start command is:

cockroach start --insecure --host=0.0.0.0

Inserting data into the database takes on average 400 ms. I made about 10000 requests at about 10 requests per second.

I took the very same test script and inserted the same data set into H2 (not embedded). H2 is running on the very same node as Cockroach. The client machine is exactly the same. With H2, I get an average of 7 ms to write to the database.

The machine has 32 GB RAM and 4 CPU Cores. OS is Ubuntu.

I presume this is not right. Is there anything I should do to set up Cockroach properly?

Hi @leon,

It’d be helpful to get a bit more detail before answering:

  • What is the schema, and how many rows are in each insert?
  • Is H2 configured to sync all its changes to disk?

CRDB will write all changes to disk, so if H2 is running in memory mode, that would explain the difference in part.

Hi @tim-o ,

There is just one row per insert (in fact it is an upsert). For H2, I use a merge. You can find the schema at: https://snag.gy/DfYhwn.jpg

I use the out-of-the-box H2 and the data is persisted. The start command is: ./bin/h2.sh -tcp -tcpAllowOthers. So H2 is not running in-memory.

Thanks leon. I realized I may have misunderstood originally, so to clarify: you’re seeing each individual upsert take 400ms? That’s definitely unusual. I rebuilt your table and with some dummy data I saw each individual upsert take several ms. It might be helpful to get a copy of the script if you don’t mind sharing it so we can try to reproduce on our own.

Yes, that’s 400 ms per individual upsert. I took the basic sample from the website and did some testing with it: https://www.cockroachlabs.com/docs/stable/build-a-java-app-with-cockroachdb.html.

I modified the insert section as follows:

        long total = 0;
        for (int i = 0; i < 10000; i++) {
            long start = System.currentTimeMillis();
            db.createStatement().execute("INSERT INTO accounts (id, balance) VALUES (" + i + ", 1000)");
            long end = System.currentTimeMillis();
            total = total + (end - start);
        }
        System.out.println("Average: " + (total / 10000.0) + "ms");

This gives me an average response time of 215 ms for a sample of 1000 requests and 269 ms for a sample of 10000 requests. I thought about possible network latency, but given that H2 gives 7 ms, it is hard to fault the network.

I’m not able to reproduce this at all. I’m seeing the entire operation take about 7.5-8.5 s, i.e. 0.7-0.8 ms per transaction. Here’s the entire script I’m running:

import psycopg2
import time
import uuid

conn = psycopg2.connect(user='root', host='localhost', port=26257)
conn.set_session(autocommit=True)

cur = conn.cursor()
cur.execute("CREATE DATABASE IF NOT EXISTS test")
cur.execute('''
DROP TABLE IF EXISTS test.transaction;
CREATE TABLE IF NOT EXISTS test.transaction (
  id STRING(255) PRIMARY KEY,
  address STRING(255),
  arrival_time BIGINT NOT NULL,
  bundle STRING(255),
  current_index BIGINT NOT NULL,
  hash STRING(255),
  last_index BIGINT NOT NULL,
  last_modified_date_millis BIGINT NOT NULL,
  tag STRING(255),
  timestamp BIGINT NOT NULL,
  value REAL NOT NULL,
  INDEX idx_transaction_address (id, address),
  INDEX idx_transaction_bundle (id, bundle)
)
''')
start = time.time()
print("Start time was {0}".format(start))
for i in range(10000):  # 10000 iterations, matching the divisor in the average below
    id = str(uuid.uuid4()) # For testing UUID performance
    #id = str(i) # For testing serial integers and testing for transaction contention.
    cur.execute('''
UPSERT INTO test.transaction
VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)''',
        (id, 'abc', 123, 'abc', 123, 'abc', 123, 123, 'abc', 123, 123.456))
end = time.time()
print("End time was {0}. Total time was {1}. Average time was {2}".format(end, end - start, (end - start) / 10000))

Could you try executing that and see if it delivers similar results for you? That’ll at least tell us whether the latency is caused by the application, or something about the node itself.
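One caveat on the script above: a single average can hide tail latency. If it helps, here is a small sketch (the helper names are mine, not from the script) that records per-statement latencies and reports a p95, demonstrated with a stand-in workload in place of the real cur.execute call:

```python
import time

def timed_calls(fn, n):
    """Call fn() n times; return a sorted list of per-call latencies in milliseconds."""
    samples = []
    for _ in range(n):
        start = time.time()
        fn()
        samples.append((time.time() - start) * 1000.0)
    samples.sort()
    return samples

def percentile(samples, p):
    # samples must be sorted; nearest-rank percentile.
    idx = min(len(samples) - 1, int(len(samples) * p))
    return samples[idx]

# Stand-in workload; when profiling, swap in the real cur.execute(...) upsert.
samples = timed_calls(lambda: time.sleep(0.001), 50)
print("avg %.2f ms, p95 %.2f ms" % (sum(samples) / len(samples), percentile(samples, 0.95)))
```

Swapping the lambda for the actual upsert would show whether the 400 ms is uniform across statements or driven by a slow tail.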

Here is what I got:

Remote machine (LAN): End time was 1529094886.11. Total time was 1827.13291001. Average time was 0.182713291001
Local: End time was 1529096570.08. Total time was 1373.38310289. Average time was 0.137338310289

I am going to do another run on EC2.

Thanks leon. Wanted to make sure you’re aware of our production checklist and deployment guide for AWS, since you mentioned this is your first test. Also, something I should have asked up front: what version are you running?

EC2 (m1.large with SSD): End time was 1529097541.67. Total time was 59.201392889. Average time was 0.0059201392889

EC2 (t2.large, no SSD): End time was 1529098499.23. Total time was 25.2586421967. Average time was 0.00252586421967

So, it seems the problem lies with my box, but I can’t say what it is. Nor can I say why H2 performs significantly better on the same box.

Version is v2.0.2.

I posted the same question on: https://serverfault.com/questions/916933/ubuntu-server-performance-issues-with-cockroachdb. This is baffling.

@leon is your H2 database configured to fsync all writes to a write-ahead log?

It might be an interesting experiment to configure CockroachDB to not sync all log writes by running SET CLUSTER SETTING kv.raft_log.synchronize=false; (warning: we don’t recommend this in a production deployment).
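Spelled out, together with the command to restore the default once the experiment is done:

```sql
-- Durability experiment only; do not leave this off in production.
SET CLUSTER SETTING kv.raft_log.synchronize = false;

-- Restore the default (synchronous raft log writes) afterwards:
SET CLUSTER SETTING kv.raft_log.synchronize = true;
```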

@nathan Thanks for pitching in. I did not do any extra H2 configuration, and I am not able to find how to even enable fsync for H2. But the following section of the documentation seems to imply that H2 is not syncing after each commit:

By default, MySQL calls fsync for each commit. When using one of those methods, only around 60 write operations per second can be achieved, which is consistent with the RPM rate of the hard drive used. Unfortunately, even when calling FileDescriptor.sync() or FileChannel.force(), data is not always persisted to the hard drive, because most hard drives do not obey fsync(): see Your Hard Drive Lies to You. In Mac OS X, fsync does not flush hard drive buffers. See Bad fsync?. So the situation is confusing, and tests prove there is a problem.
Trying to flush hard drive buffers is hard, and if you do the performance is very bad. First you need to make sure that the hard drive actually flushes all buffers. Tests show that this can not be done in a reliable way. Then the maximum number of transactions is around 60 per second. Because of those reasons, the default behavior of H2 is to delay writing committed transactions.

Applying the configuration you provided gives me these results:
End time was 1529186381.76. Total time was 8.10462498665. Average time was 0.000810462498665

So, is the following takeaway accurate?

  1. Cockroach syncs after each commit
  2. H2 does not sync after each commit

If this is correct, I am still left wondering why the EC2 t2.large box would outperform my box, given that my disk’s read/write parameters seem to be better.

Yes, I think that takeaway is accurate.

It would be very interesting to see whether H2 exhibits similar write latency with syncing enabled. If you do find a way to enable that, I think it would be a valuable test.
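For what it’s worth, H2’s SET WRITE_DELAY command appears to be the relevant knob: per H2’s documentation, it controls the maximum delay between a commit and a flush of the transaction log (default 500 ms). A sketch, with the caveat from the docs quoted above that a flush is not necessarily a hardware-level sync:

```sql
-- In an H2 session: flush the transaction log at every commit
-- instead of after the default 500 ms delay.
SET WRITE_DELAY 0;
```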

It might be interesting to run some filesystem-level benchmarks like Sysbench’s fileio test suite against your box.
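Even without sysbench, raw write-plus-fsync latency can be isolated in a few lines. This sketch (the iteration count and 4 KB payload are arbitrary choices of mine) compares buffered writes against synced writes on the local disk; the fsync column approximates the per-commit cost a durable database pays there:

```python
import os
import tempfile
import time

N = 200
payload = b"x" * 4096  # arbitrary 4 KB payload

def bench(sync):
    """Average milliseconds per write, optionally fsyncing after each one."""
    fd, path = tempfile.mkstemp()
    try:
        start = time.time()
        for _ in range(N):
            os.write(fd, payload)
            if sync:
                os.fsync(fd)
        return (time.time() - start) / N * 1000.0
    finally:
        os.close(fd)
        os.remove(path)

print("buffered: %.3f ms/write" % bench(sync=False))
print("fsync:    %.3f ms/write" % bench(sync=True))
```

A large gap between the two numbers on your box, compared with the gap on EC2, would point at the disk or filesystem rather than at CockroachDB.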

Here we go:

Command:

sysbench --test=fileio --file-total-size=150G --file-test-mode=rndrw --init-rng=on --max-time=300 --max-requests=0 run

Results on My Box:

Operations performed: 17700 Read, 11800 Write, 37683 Other = 67183 Total
Read 276.56Mb Written 184.38Mb Total transferred 460.94Mb (1.5364Mb/sec)
98.33 Requests/sec executed

Test execution summary:
total time: 300.0038s
total number of events: 29500
total time taken by event execution: 157.4719
per-request statistics:
min: 0.00ms
avg: 5.34ms
max: 58.13ms
approx. 95 percentile: 18.23ms

Threads fairness:
events (avg/stddev): 29500.0000/0.00
execution time (avg/stddev): 157.4719/0.00

Results on EC2. t2.large with magnetic EBS

Operations performed: 45900 Read, 30600 Write, 97910 Other = 174410 Total
Read 717.19Mb Written 478.12Mb Total transferred 1.1673Gb (3.9844Mb/sec)
255.00 Requests/sec executed

Test execution summary:
total time: 300.0006s
total number of events: 76500
total time taken by event execution: 273.9510
per-request statistics:
min: 0.00ms
avg: 3.58ms
max: 627.72ms
approx. 95 percentile: 18.98ms

Threads fairness:
events (avg/stddev): 76500.0000/0.00
execution time (avg/stddev): 273.9510/0.00

The previous test I did on EC2 might have had SSD-backed EBS. I have now changed it to magnetic. Below are the results of re-running the Cockroach script:

EC2

Start time was 1529345688.39
End time was 1529345713.91. Total time was 25.5243439674. Average time was 0.00255243439674

My Box:

Start time was 1529345453.66
End time was 1529346019.83. Total time was 566.170274973. Average time was 0.0566170274973

So, EC2 seems to be about 20 times faster on Cockroach inserts.

@leon Thanks for running those tests. It appears that EC2 is significantly faster on these tests than your box. I think that’s an area to explore further, as it’s going to have a significant negative effect on any database that provides ACID durability by syncing after each commit.