[import data performance] Importing data into CockroachDB is too slow

I used BenchmarkSQL 5.0 to generate 100 warehouses in PostgreSQL, dumped the data with pg_dump, and then imported it into CockroachDB with psql.
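For reference, the pipeline looks roughly like this (a sketch: the PostgreSQL hostname and user are placeholders for my environment, and the exact pg_dump flags may differ):

```shell
# Dump the BenchmarkSQL database from PostgreSQL.
# (pg-host and the benchmarksql user are placeholder names.)
pg_dump --no-owner -h pg-host -U benchmarksql benchmarksql > 100ware-cock.sql

# Replay the dump against CockroachDB's PostgreSQL-compatible SQL port (26257).
psql -p 26257 -h a1 -d benchmarksql -U root < 100ware-cock.sql
```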

But importing the data into CockroachDB is very slow:

[root@a1 run]# time psql -p 26257 -h a1 -d benchmarksql -U root< 100ware-cock.sql
COPY 4
COPY 100
COPY 1000
COPY 3000000
COPY 3000000
COPY 100000
COPY 3000000
COPY 900000
COPY 10000000
COPY 29994046

real	151m50.190s
user	0m11.515s
sys	0m4.673s

I deployed 3 nodes on three physical machines in one server room.

cockroach Version:

Build Tag: v1.0.6
Build Time: 2017/09/14 15:15:48
Distribution: CCL
Platform: linux amd64
Go Version: go1.8.3
C Compiler: gcc 6.3.0
Build SHA-1: 8edb4c9235a7bb8aa80761375db12bb4a7e5afeb
Build Type: release

I am using the default cluster settings, shown below:

+---------------------------------------------------+---------------+------+-----------------------------------------------------------------------------------------------------------------------------------------------+
|                       name                        | current_value | type |                                                                  description                                                                  |
+---------------------------------------------------+---------------+------+-----------------------------------------------------------------------------------------------------------------------------------------------+
| diagnostics.reporting.enabled                     | true          | b    | enable reporting diagnostic metrics to cockroach labs                                                                                         |
| diagnostics.reporting.interval                    | 1h0m0s        | d    | interval at which diagnostics data should be reported                                                                                         |
| diagnostics.reporting.report_metrics              | true          | b    | enable collection and reporting diagnostic metrics to cockroach labs                                                                          |
| diagnostics.reporting.send_crash_reports          | true          | b    | send crash and panic reports                                                                                                                  |
| kv.allocator.lease_rebalancing_aggressiveness     | 1E+00         | f    | set greater than 1.0 to rebalance leases toward load more aggressively, or between 0 and 1.0 to be more conservative about rebalancing leases |
| kv.allocator.load_based_lease_rebalancing.enabled | true          | b    | set to enable rebalancing of range leases based on load and latency                                                                           |
| kv.gc.batch_size                                  |        100000 | i    | maximum number of keys in a batch for MVCC garbage collection                                                                                 |
| kv.raft.command.max_size                          | 64 MiB        | z    | maximum size of a raft command                                                                                                                |
| kv.raft_log.synchronize                           | true          | b    | set to true to synchronize on Raft log writes to persistent storage                                                                           |
| kv.snapshot_rebalance.max_rate                    | 2.0 MiB       | z    | the rate limit (bytes/sec) to use for rebalance snapshots                                                                                     |
| kv.snapshot_recovery.max_rate                     | 8.0 MiB       | z    | the rate limit (bytes/sec) to use for recovery snapshots                                                                                      |
| kv.transaction.max_intents                        |        100000 | i    | maximum number of write intents allowed for a KV transaction                                                                                  |
| server.declined_reservation_timeout               | 1s            | d    | the amount of time to consider the store throttled for up-replication after a reservation was declined                                        |
| server.failed_reservation_timeout                 | 5s            | d    | the amount of time to consider the store throttled for up-replication after a failed reservation call                                         |
| server.remote_debugging.mode                      | local         | s    | set to enable remote debugging, localhost-only or disable (any, local, off)                                                                   |
| server.time_until_store_dead                      | 5m0s          | d    | the time after which if there is no new gossiped information about a store, it is considered dead                                             |
| sql.defaults.distsql                              |             1 | e    | Default distributed SQL execution mode [off = 0, auto = 1, on = 2]                                                                            |
| sql.metrics.statement_details.dump_to_logs        | false         | b    | dump collected statement statistics to node logs when periodically cleared                                                                    |
| sql.metrics.statement_details.enabled             | true          | b    | collect per-statement query statistics                                                                                                        |
| sql.metrics.statement_details.threshold           | 0s            | d    | minimum execution time to cause statistics to be collected                                                                                    |
| sql.trace.log_statement_execute                   | false         | b    | set to true to enable logging of executed statements                                                                                          |
| sql.trace.session_eventlog.enabled                | false         | b    | set to true to enable session tracing                                                                                                         |
| sql.trace.txn.enable_threshold                    | 0s            | d    | duration beyond which all transactions are traced (set to 0 to disable)                                                                       |
+---------------------------------------------------+---------------+------+-----------------------------------------------------------------------------------------------------------------------------------------------+

SQL file size

6.8G Oct 12 14:00 100ware-cock.sql

iostat on node1

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     3.00    0.00 1351.00     0.00 109411.50   161.97     1.76    1.30    0.00    1.30   0.04   5.30
dm-0              0.00     0.00    0.00 1354.00     0.00 109411.50   161.61     1.77    1.31    0.00    1.31   0.04   5.20
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-2              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     5.00    0.00 1112.00     0.00 57995.00   104.31     0.57    0.49    0.00    0.49   0.06   6.70
dm-0              0.00     0.00    0.00 1117.00     0.00 57995.00   103.84     0.55    0.49    0.00    0.49   0.06   7.00
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-2              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     1.00    0.00  928.00     0.00 21573.50    46.49     0.09    0.10    0.00    0.10   0.03   3.00
dm-0              0.00     0.00    0.00  929.00     0.00 21573.50    46.44     0.09    0.10    0.00    0.10   0.03   3.00
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-2              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00

vmstat on node1

[root@a1 run]# vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0 324420 523800     84 73691968    0    0     1     7    1    0  0  0 100  0  0
 1  0 324420 524972     84 73689160    0    0     0  9227 15628 23318  4  0 95  0  0
 1  0 324420 525000     84 73688632    0    0     8  8362 20180 31127  4  0 95  0  0
 3  0 324420 522212     84 73691704    0    0     0 10265 20996 30985  6  0 93  0  0
 0  0 324420 525268     84 73688760    0    0     0 11935 19104 29105  5  0 95  0  0
 0  0 324420 520768     84 73693872    0    0     0  6206 19787 30572  4  0 96  0  0

It seems that the disk is not very busy.
How can I get maximum performance out of CockroachDB?

Upgrading to the newly released 1.1 version will improve performance. We are currently working on improving performance across many aspects of the system, so things will continue to get faster.

I upgraded to v1.1.0. The import is faster than on v1.0.6, but still quite slow:

[root@a1 run]# time psql -p 26257 -h a1 -d benchmarksql -U root< 100ware-cock.sql

real    133m37.685s
user    0m10.923s
sys     0m4.174s

From 151m50.190s down to 133m37.685s, just to import a 6.8G SQL dump file.

You’re in luck! We have a new command that is specifically for importing CSV data. Can you try https://www.cockroachlabs.com/docs/stable/import.html and report your results? (Note that you can also see its progress in the jobs tab on the admin UI.)
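For anyone following along, a v1.1 IMPORT invocation looks roughly like this. This is a sketch, not my exact command: the table definition is abbreviated, and the file server host/port and URLs are placeholders (note that v1.1 requires a `temp` location for scratch space):

```shell
# Hypothetical IMPORT of one BenchmarkSQL table into CockroachDB v1.1.
# a1:26257 matches the cluster above; fileserver:8000 is a placeholder.
cockroach sql --insecure --host=a1 --port=26257 --database=benchmarksql --execute="
IMPORT TABLE bmsql_item (
    i_id INT PRIMARY KEY,
    i_im_id INT,
    i_name STRING,
    i_price DECIMAL(5,2),
    i_data STRING
)
CSV DATA ('http://fileserver:8000/bmsql_item.csv')
WITH temp = 'http://fileserver:8000/tmp/';"
```

Progress of the job is then visible in the jobs tab of the admin UI, as mentioned above.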

Thanks, I will try it and report the result!

IMPORT works well: loading 100 warehouses went from 132 minutes down to 12 minutes.

Awesome. That’s what we were hoping to hear.

@louis let us know if you have any suggestions on how we can improve the user experience of CSV import, but glad to hear that you are getting a good performance boost!

Two suggestions:

  1. Support importing data into a table that has already been created.
  2. Support importing data from a local file instead of an HTTP file server or cloud storage.

@louis thanks for the feedback! I opened an issue for importing into an existing table here: https://github.com/cockroachdb/cockroach/issues/19432 It is a bit complex in terms of implementation, so we won’t get to it until after 1.2.

I believe you can import from a local file, although it’s not really recommended: https://www.cockroachlabs.com/docs/v1.1/import.html. Perhaps you meant that you would prefer not to have to set up cloud storage for temp space; is that what you were looking for?
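In the meantime, one lightweight workaround is to serve the CSV directory over plain HTTP from a machine the nodes can reach, and point IMPORT at those URLs. A minimal sketch, assuming python3 (3.7+ for `--directory`) is available and using hypothetical paths and port:

```shell
# Serve a directory of CSV files over HTTP so IMPORT can fetch them.
# /tmp/csvdir and port 8631 are placeholder choices.
mkdir -p /tmp/csvdir
echo "1,example-item" > /tmp/csvdir/bmsql_item.csv

python3 -m http.server 8631 --directory /tmp/csvdir &
SRV=$!
sleep 1

# IMPORT on the cluster could now reference http://<this-host>:8631/bmsql_item.csv
curl -s http://localhost:8631/bmsql_item.csv

kill $SRV
```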