CockroachDB insert performance expectations

I am evaluating CockroachDB for my next project and I really love the whole concept. Being able to run symmetric self-balancing nodes and having SQL transactions with Postgres compatibility, is really a holy grail of relational databases.

So I tried to evaluate the performance of a cockroach cluster and this is where I got stuck. Here is what I did:

Cockroach Cluster

Created a 3 node CockroachDB cluster on digital ocean

Configuration of each node:

  • 2 CPU
  • 4GB Memory
  • 80 GB storage
  • Debian 10
  • Cockroach version: 20.2.6

Started the nodes using these commands:

cockroach start \
--insecure \
--store=node \
--locality=zone=blr \
--advertise-addr=10.122.0.3 \
--join=10.122.0.3:26257,10.122.0.4:26257,10.122.0.5:26257 \
--background

HAProxy

Created a node and hosted HA proxy to distribute the connections. Configuration of node:

  • 2 CPU
  • 4GB Memory
  • 80 GB
  • Debian 10

Postgres

Created a node for Postgres with exact same configuration.

  • 2 CPU
  • 4GB Memory
  • 80 GB
  • Debian 10
  • Postgres version: 13.2

Load Generation

Created another node to run benchmark app. Configuration:

  • 4 CPU
  • 8GB Memory
  • 160 GB

Used sysbench (version 1.0.20) to generate the load.

Results

Postgres

Command

sysbench --db-driver=pgsql --pgsql-host=10.122.0.2 --pgsql-port=5432 --pgsql-db=postgres --pgsql-user=postgres --pgsql-password=xxx --time=120 --report-interval=1 --threads=32 /usr/share/sysbench/oltp_insert.lua --auto-inc=off --tables=1 --table-size=10000000 run

Output

[ 115s ] thds: 32 tps: 8186.12 qps: 8185.12 (r/w/o: 0.00/8185.12/0.00) lat (ms,95%): 6.09 err/s: 0.00 reconn/s: 0.00
[ 116s ] thds: 32 tps: 8474.54 qps: 8474.54 (r/w/o: 0.00/8474.54/0.00) lat (ms,95%): 6.21 err/s: 0.00 reconn/s: 0.00
[ 117s ] thds: 32 tps: 9092.22 qps: 9092.22 (r/w/o: 0.00/9092.22/0.00) lat (ms,95%): 5.67 err/s: 0.00 reconn/s: 0.00
[ 118s ] thds: 32 tps: 9062.07 qps: 9062.07 (r/w/o: 0.00/9062.07/0.00) lat (ms,95%): 5.99 err/s: 0.00 reconn/s: 0.00
[ 119s ] thds: 32 tps: 9463.94 qps: 9463.94 (r/w/o: 0.00/9463.94/0.00) lat (ms,95%): 5.09 err/s: 0.00 reconn/s: 0.00
[ 120s ] thds: 32 tps: 8748.85 qps: 8748.85 (r/w/o: 0.00/8748.85/0.00) lat (ms,95%): 5.77 err/s: 0.00 reconn/s: 0.00
SQL statistics:
    queries performed:
        read:                            0
        write:                           1137820
        other:                           0
        total:                           1137820
    transactions:                        1137820 (9476.58 per sec.)
    queries:                             1137820 (9476.58 per sec.)
    ignored errors:                      0      (0.00 per sec.)
    reconnects:                          0      (0.00 per sec.)

General statistics:
    total time:                          120.0645s
    total number of events:              1137820

Latency (ms):
         min:                                    0.73
         avg:                                    3.37
         max:                                   70.30
         95th percentile:                        5.47
         sum:                              3836109.61

Threads fairness:
    events (avg/stddev):           35556.8750/269.73
    execution time (avg/stddev):   119.8784/0.02

Performance: 9500 inserts/second

Cockroach

Command

sysbench --db-driver=pgsql --pgsql-host=10.122.0.2 --pgsql-port=26257 --pgsql-db=defaultdb --pgsql-user=root --pgsql-password=xxx --time=120 --report-interval=1 --threads=32 /usr/share/sysbench/oltp_insert.lua --auto-inc=off --tables=1 --table-size=10000000 run

Output

[ 116s ] thds: 32 tps: 962.04 qps: 962.04 (r/w/o: 0.00/962.04/0.00) lat (ms,95%): 77.19 err/s: 0.00 reconn/s: 0.00
[ 117s ] thds: 32 tps: 775.88 qps: 775.88 (r/w/o: 0.00/775.88/0.00) lat (ms,95%): 86.00 err/s: 0.00 reconn/s: 0.00
[ 118s ] thds: 32 tps: 692.06 qps: 692.06 (r/w/o: 0.00/692.06/0.00) lat (ms,95%): 69.29 err/s: 0.00 reconn/s: 0.00
[ 119s ] thds: 32 tps: 700.75 qps: 700.75 (r/w/o: 0.00/700.75/0.00) lat (ms,95%): 81.48 err/s: 0.00 reconn/s: 0.00
[ 120s ] thds: 32 tps: 561.19 qps: 561.19 (r/w/o: 0.00/561.19/0.00) lat (ms,95%): 104.84 err/s: 0.00 reconn/s: 0.00
SQL statistics:
    queries performed:
        read:                            0
        write:                           102933
        other:                           0
        total:                           102933
    transactions:                        102933 (857.34 per sec.)
    queries:                             102933 (857.34 per sec.)
    ignored errors:                      0      (0.00 per sec.)
    reconnects:                          0      (0.00 per sec.)

General statistics:
    total time:                          120.0588s
    total number of events:              102933

Latency (ms):
         min:                                    2.93
         avg:                                   37.32
         max:                                 1036.31
         95th percentile:                       82.96
         sum:                              3840974.33

Threads fairness:
    events (avg/stddev):           3216.6562/100.22
    execution time (avg/stddev):   120.0304/0.01

Performance: 857 inserts/second

Question

So keeping everything else the same, Postgres is about 10x faster than Cockroach for inserts. I fully understand that Postgres is a single node database and Cockroach is multi-node, scalable database. So this comparison is not entirely fair. But a difference of 10x seems a bit too much. I just wanted to confirm whether this is expected? To get the same performance as one Postgres node, do I need to deploy a cluster of 30 cockroach servers?

One problem here may be that we aren’t getting much parallelism internally. Our load-based split threshold defaults to 2500 qps but that is likely too low for these small servers. Consider trying to set it to something like 300 just to see if it helps (SET CLUSTER SETTING kv.range_split.load_qps_threshold = 300).

Another thing with cockroach which I suspect is a big deal in this test is that cockroach has a reasonably high amount of background work that it has to do. When there are just 2 vCPUs, I suspect that that overhead comes at a greater cost.

One thing which might help diagnose problems would be to check out the hardware dashboard during the test. What resource is being squeezed? Is it CPU? If so, consider grabbing a CPU profile and sharing it here.

And here are the hardware dashboard snapshots during the test:

I executed this command on cockroach cluster:

SET CLUSTER SETTING kv.range_split.load_qps_threshold = 300

And again ran the insert test (using the same command as in original question). Here are the results:

SQL statistics:
    queries performed:
        read:                            0
        write:                           81863
        other:                           0
        total:                           81863
    transactions:                        81863  (681.91 per sec.)
    queries:                             81863  (681.91 per sec.)
    ignored errors:                      0      (0.00 per sec.)
    reconnects:                          0      (0.00 per sec.)

General statistics:
    total time:                          120.0473s
    total number of events:              81863

Latency (ms):
         min:                                    5.10
         avg:                                   46.92
         max:                                  310.31
         95th percentile:                      102.97
         sum:                              3840606.43

Threads fairness:
    events (avg/stddev):           2558.2188/270.46
    execution time (avg/stddev):   120.0190/0.01

In summary, I cannot see any improvements in the insert performance.

It is worth repeating again: I fully understand and accept that cockroach insert performance wouldn’t match Postgres considering there is so much more that is handled by cockroach behind the scenes. My only question is that how much the performance hit would be? I am getting 1/10th of Postgres. Is it expected? If not, how much is the norm?