We’re evaluating CockroachDB for our use case, and are starting by UPSERTing all our existing data (~100M rows, 4KB each, 30 columns, primary key + 1 index) into a 71-node (i3.4xlarge, 16-core, 2-SSD) cluster.
The SQL Byte Traffic graph says we’re dumping 150MB/sec cluster-wide, which is OK. Looking at random nodes, CPU utilization (via “htop”) seems to float between 50% and 100%, and “iostat -dxm 2” says the 2 SSD drives are writing about 100MB/sec combined, on average (about 20% utilization, fwiw).
Extrapolating to the 71-node cluster overall, that’s about 7100MB/sec of writes, which is a 47x write amplification. That seems high to me… can you help me understand whether I’m doing something wrong, or whether my mental model of how CockroachDB works is wrong? I can understand a 3x write increase due to replication, and I can imagine the data needs to move from logs to .sst files a few times over the lifetime of a row, but I’m surprised all that adds up to 47x. Also, I’m ignoring compression, which I would think would give back a factor of 2x or so (the data is mostly English text and URLs).
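For reference, here’s the back-of-envelope math behind that 47x figure as a quick script (all numbers taken straight from the graphs above; I’m assuming the iostat reading is representative of every node):

```python
# Crude write-amplification estimate from the observed numbers above.
sql_ingest_mb_s = 150        # cluster-wide SQL byte traffic (MB/sec)
per_node_disk_mb_s = 100     # iostat: both SSDs combined, per node (MB/sec)
nodes = 71

cluster_disk_mb_s = per_node_disk_mb_s * nodes            # ~7100 MB/sec
amplification = cluster_disk_mb_s / sql_ingest_mb_s       # ~47x
print(f"{cluster_disk_mb_s} MB/sec disk writes -> {amplification:.0f}x amplification")
```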
Similar question for CPU… these are 2.3GHz machines, and doing an (admittedly very crude) estimate: 50% CPU utilization cluster-wide (71 nodes x 16 cores) works out to about 1.3T clock cycles per second running code, or about 8700 cycles per SQL byte written, which seems a little high.
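Again, the crude arithmetic behind that estimate, in case I’ve slipped a decimal somewhere (treating the whole cluster as uniformly 50% busy, which is obviously a simplification):

```python
# Crude cycles-per-SQL-byte estimate from the numbers above.
nodes, cores_per_node = 71, 16
clock_hz = 2.3e9             # 2.3GHz machines
utilization = 0.5            # rough average from htop
sql_bytes_per_s = 150e6      # 150MB/sec SQL byte traffic

busy_cycles_per_s = nodes * cores_per_node * clock_hz * utilization  # ~1.3e12
cycles_per_byte = busy_cycles_per_s / sql_bytes_per_s                # ~8700
print(f"~{cycles_per_byte:.0f} cycles per SQL byte written")
```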
The rows being written have a pretty random distribution of primary keys (the key is effectively a GUID). If the answer is just that there is way more SSTable churn happening under this workload than I expected, would it make sense to try to group writes into batches that are likely close enough to end up in the same key range? Any other tips for getting large amounts of data into CockroachDB as fast as possible?
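To clarify what I mean by grouping writes: something like the sketch below, where I’d sort the pending rows by primary key before chunking them into multi-row UPSERT statements, so each batch mostly touches a small number of key ranges (the column name and batch size here are made up for illustration):

```python
# Hypothetical batching sketch: sort by primary key, then chunk.
# "id" and batch_size=500 are placeholders, not anything CockroachDB-specific.
def key_sorted_batches(rows, batch_size=500):
    rows = sorted(rows, key=lambda r: r["id"])  # nearby keys end up together
    for i in range(0, len(rows), batch_size):
        yield rows[i:i + batch_size]

# Each batch would then be sent as one multi-row statement, e.g.:
#   UPSERT INTO mytable (id, body) VALUES (...), (...), ...
```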
Running v1.2-alpha.20171204 with default options.