Why is loading data from mysqldump with the sql shell so slow?

Hi,

I tried to load data from a mysqldump file using the `sql` subcommand, and found it very slow compared to `mysql`.

Why does the `mysql` load take only a few seconds, while the CockroachDB `sql` subcommand load takes about 12 hours?

The SQL file has 193189 rows in total.
The two commands ran on the same type of hardware (CockroachDB as a cluster of three nodes).

Hardware info:
CPU: Intel® Xeon® CPU X5660 @ 2.80GHz
Memory: 128 GB
OS: CentOS 6.4

CockroachDB version:
Build Tag: beta-20161013
Build Time: 2016/10/13 20:03:56
Platform: linux amd64
Go Version: go1.7.1
C Compiler: gcc 4.9.2

MySQL (MariaDB) version:
mariadb-10.0.13-linux-x86_64

CockroachDB is running on nodes 1, 2, and 3 in a local area network (the SQL file was executed on node 1).
MySQL is running on node 3 (the SQL file was executed on node 3).

mysql < sql.file
start at 2016-10-25_09:59:27
end at 2016-10-25_09:59:42

cockroach sql < sql.file
start dic_project_compute_fee_rate_20161024.sql at 2016-10-24_22:36:14
end dic_project_compute_fee_rate_20161024.sql at 2016-10-25_07:41:03

Here is more information about the CockroachDB load:

For file: dic_project_compute_fee_rate_20161024.sql
line number: statement -> row count
49:INSERT INTO dic_project_compute_fee_rate -> 10926
50:INSERT INTO dic_project_compute_fee_rate -> 10610
51:INSERT INTO dic_project_compute_fee_rate -> 11052
52:INSERT INTO dic_project_compute_fee_rate -> 11014
53:INSERT INTO dic_project_compute_fee_rate -> 11006
54:INSERT INTO dic_project_compute_fee_rate -> 10835
55:INSERT INTO dic_project_compute_fee_rate -> 11008
56:INSERT INTO dic_project_compute_fee_rate -> 10953
57:INSERT INTO dic_project_compute_fee_rate -> 11014
58:INSERT INTO dic_project_compute_fee_rate -> 11070
59:INSERT INTO dic_project_compute_fee_rate -> 11013
60:INSERT INTO dic_project_compute_fee_rate -> 11003
61:INSERT INTO dic_project_compute_fee_rate -> 11013
62:INSERT INTO dic_project_compute_fee_rate -> 11007
63:INSERT INTO dic_project_compute_fee_rate -> 11015
64:INSERT INTO dic_project_compute_fee_rate -> 10677
65:INSERT INTO dic_project_compute_fee_rate -> 10655
66:INSERT INTO dic_project_compute_fee_rate -> 7318

CockroachDB `sql` subcommand load output:
OK
OK
OK
OK
OK
OK
OK
OK
OK
OK
DROP TABLE
OK
OK
CREATE TABLE
OK
OK
INSERT 10926
INSERT 10610
INSERT 11052
INSERT 11014
INSERT 11006
INSERT 10835
INSERT 11008
INSERT 10953
INSERT 11014
INSERT 11070
INSERT 11013
INSERT 11003
INSERT 11013
INSERT 11007
INSERT 11015
INSERT 10677
INSERT 10655
INSERT 7318
OK
OK
OK
OK
OK
OK
OK
OK
OK
end dic_project_compute_fee_rate_20161024.sql at 2016-10-25_07:41:03

Here is a sample of the SQL file (some syntax issues, such as comments and backquotes, have already been fixed):

-- Table structure for table dic_project_fee_rate
--

DROP TABLE IF EXISTS dic_project_fee_rate;
/*!40101 SET @saved_cs_client     = @@character_set_client */;
/*!40101 SET character_set_client = utf8 */;
CREATE TABLE dic_project_fee_rate (
  project_code varchar(32) NOT NULL,
  rate_type char(1) NOT NULL,
  fee_rate decimal(18,2) NOT NULL,
  fee_min_limit decimal(18,2) ,
  create_datetime char(14) ,
  PRIMARY KEY (project_code,rate_type,fee_rate)
) ;
/*!40101 SET character_set_client = @saved_cs_client */;

--
-- Dumping data for table dic_project_fee_rate
--

--LOCK TABLES dic_project_fee_rate WRITE;
/*!40000 ALTER TABLE dic_project_fee_rate DISABLE indexS */;
INSERT INTO dic_project_fee_rate VALUES ('20502400101     ','1',0.90,0.00,'20160815152008'),('20511180101     ','1',1.00,0.00,'20160815152008'),('21511010101     ','1',0.60,0.00,'20160815152008'),('21511010102     ','1',0.60,0.00,'20160815152008'),('22101120101     ','1',0.92,0.00,'20160815152008'),('22101120102     ','1',0.92,0.00,'20160815152008'),('23211030101     ','1',1.00,0.00,'20160815152008'),('23211040101     ','1',1.00,0.00,'20160815152008'),('30111010101     ','1',0.70,0.00,'20160815152008'),('30201010101     ','1',0.70,0.00,'20160815152008'),
... ( omitted )

@chinglin, we’re aware of this (see issue 5981) and are actively working on improvements.

@mjibson, can you add any details?


We have some stuff we’ve been working on to address exactly this problem. It takes in a SQL file and attempts to produce very fast writes. We are still some weeks away from having it work exactly like that, but we’re fully aware of this issue and have a team of people working on it as their main duty.


Thank you. Now I know the status of the issue.

Jesse's earlier post (link here) suggested:

insert lots of rows in a single statement. I’d recommend something in the 100-1000 range

If I understand that correctly, it means putting about 100-1000 rows into a single INSERT statement.
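That batching idea can be sketched with awk: merge consecutive single-row INSERT statements into multi-row statements of n rows each. The table name `t` and the sample rows below are made up for illustration; a real script would need to match the actual table names in the dump.

```shell
# Sketch: merge single-row INSERTs into multi-row INSERTs of n rows each.
# Assumes each input line looks like: INSERT INTO t VALUES (...);
batch_inserts() {
  awk -v n="$1" '
    { sub(/^INSERT INTO t VALUES /, ""); sub(/;$/, "")
      buf = (cnt % n == 0) ? "INSERT INTO t VALUES " $0 : buf "," $0
      cnt++
      if (cnt % n == 0) { print buf ";"; buf = "" } }
    END { if (buf != "") print buf ";" }'
}

# Merge three one-row INSERTs into batches of two rows:
printf 'INSERT INTO t VALUES (1);\nINSERT INTO t VALUES (2);\nINSERT INTO t VALUES (3);\n' \
  | batch_inserts 2
# prints:
# INSERT INTO t VALUES (1),(2);
# INSERT INTO t VALUES (3);
```

In practice a value in the 100-1000 range (as suggested above) would replace the `2`.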

Later on I may also try limiting the length of each INSERT statement:

mysqldump --net_buffer_length=4096
mysqldump --net_buffer_length=83000

and then load from the result again, to see if it gives different results.

Fewer rows per single INSERT statement did improve the speed (from 12 hours to 4 minutes).

I tested two mysqldump --net_buffer_length settings, 4096 and 83000, as follows.

The total row count is the same in both cases: 193189 rows.

dic_project_compute_fee_rate_20161026_83000.sql
start at 10:05:15
end at 10:55:08
about 872 rows per INSERT statement
the whole load takes 50 minutes

dic_project_compute_fee_rate_20161026_4096.sql
start at 11:30:30
end at 11:34:46
about 42 rows per INSERT statement
the whole load takes 4 minutes
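For anyone who wants to check their own dump, the rows-per-INSERT figure can be estimated directly from the file by counting the `),(` separators between value tuples. A rough sketch (the demo file and table name below are made up):

```shell
# Sketch: estimate the average number of rows per extended INSERT in a
# dump file by counting the "),(" separators between value tuples.
rows_per_insert() {
  awk '/^INSERT INTO/ { inserts++; rows += gsub(/\),\(/, "),(") + 1 }
       END { if (inserts) printf "%d rows in %d INSERTs (~%d rows/INSERT)\n",
                                 rows, inserts, rows / inserts }' "$1"
}

printf 'INSERT INTO t VALUES (1),(2),(3);\nINSERT INTO t VALUES (4);\n' > demo.sql
rows_per_insert demo.sql
# prints: 4 rows in 2 INSERTs (~2 rows/INSERT)
```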


Thanks for your experiment! This can be a useful result for other users too :slight_smile:

I can provide fixsql.sh too; I hope it helps other users as well.
(Even though I don't think my solution is as good as what you folks are building, it does solve the issue for me.)

Hand-written SQL statements for MySQL are mostly the same for CockroachDB, but mysqldump's output is more formal, so there are more differences.

Some of them are MySQL specific (say, InnoDB), so I wrote a command that fixes those small differences so the SQL load succeeds on CockroachDB.

The SQL semantics are not changed much; the useless parts are removed (or changed in some way). The intended function (the table and index definitions, and the inserts) is the same, but the result works with cockroach sql.

Here is my humble transformation command (it's a single command; I wrapped it in fixsql.sh so it's easy to run as ./fixsql.sh sql.file).

  #!/bin/sh
  # fixsql.sh: rewrite a mysqldump file in place so cockroach sql can load it
  sed -i -e 's/`//g' \
      -e 's/int(.*)/int/' \
      -e 's/ENGINE=InnoDB //' \
      -e 's/DEFAULT CHARSET=utf8//' \
      -e 's/DEFAULT NULL//' \
      -e 's/^LOCK/--LOCK/' \
      -e 's/^UNLOCK/--UNLOCK/' \
      -e 's/COMMENT=.*$/;/' \
      -e 's/COMMENT\ .*$/,/' \
      -e 's/int NOT NULL AUTO_INCREMENT/SERIAL NOT NULL/' \
      -e 's/AUTO_INCREMENT=.* //' \
      -e 's/COLLATE utf8_unicode_ci//' \
      -e 's/COLLATE=utf8_unicode_ci//' \
      -e '/PRIMARY/!s/KEY/index/' \
      -e 's/tinyint/SMALLINT/' \
      -e 's/int NOT NULL DEFAULT \x270\x27/int NOT NULL DEFAULT 0/' \
      -e 's/USING BTREE//' \
      -e 's/\x27\\\x27\x27/e\x27\\x27\x27/g' "$1"
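To show what the script does, here are two of its rewrites (backquote stripping and `int(N)` -> `int`) applied with a non-in-place sed to a sample line; the column name is made up:

```shell
# Demo of two of the fixsql.sh rewrites on a typical mysqldump column
# definition (without -i, so the input is not modified).
echo '  `user_id` int(11) NOT NULL,' | sed -e 's/`//g' -e 's/int(.*)/int/'
# prints:   user_id int NOT NULL,
```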

It may also illustrate some of the differences between the two.

(I wrote this for version beta-20161013; since CockroachDB iterates rapidly, things may have changed.)


Thanks for sharing your script! I have created a tech issue to follow up on this:
https://github.com/cockroachdb/cockroach/issues/10291
