Question about SERIAL and its performance

Hi everyone,

Several days ago I tested the performance of CRDB and found some problems that are critical for us (because our insert workload is auto-increment):

1. Although the single random-insert performance is fine for us (14k QPS on 5 nodes), the auto-increment single-insert performance is not very satisfying (1.3k QPS on 5 nodes, and it doesn't scale with the number of nodes).
I know this is mainly caused by requests piling up in the Raft procedure, but I also tested TiDB, and its auto-increment insert is better than CRDB's: 5k QPS on 1 node, scaling to 12k QPS on 5 nodes.
I guess there may be two main reasons for this result:

  • TiDB is not strictly auto-increment: each node of the cluster asks for a range of numbers (e.g. node 1 gets 1~10000, node 2 gets 10001~40000) and auto-increments within its own range, so the nodes do not contend with each other on auto-increment inserts (see the sketch after this list).
  • TiDB also seems to batch several requests and send them to Raft together; this feature may increase its auto-increment insert throughput.
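For illustration, this kind of block allocation could be approximated on top of a plain SQL sequence: each application node claims a whole block of IDs with one sequence call and hands them out locally. A minimal sketch, assuming a hypothetical block size of 10000:

-- One sequence shared by the whole cluster; each nextval() call
-- claims a block of 10000 IDs for the calling application node.
CREATE SEQUENCE id_blocks INCREMENT 10000 START 10000;

-- On each application node, claim a block once:
SELECT nextval('id_blocks');
-- e.g. a return value of 20000 means this node may hand out
-- IDs 10001..20000 locally, with no further database round trips.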

2. CRDB doesn't support serially increasing numbers (like 1, 2, 3, 4…), but as I said above, TiDB supports this. So I want to know: does CRDB have a plan to support it too? I think it is a really useful feature for MySQL users.

The above feature seems not very difficult to realize (because it is not strictly auto-increment, which is the part that is difficult for a distributed system), but it would make CRDB much friendlier for us and other MySQL users (I really hope CRDB keeps getting better and better) :smile:.
Besides, are there any tricks to improve single auto-increment insert performance currently?

Hello @floatsliang

First of all, please clarify: what feature do you call “auto increment”? How have you tested it?

CockroachDB currently supports two features to generate row IDs: the built-in function unique_rowid() and SQL sequences. Both have been optimized for performance, although SQL sequences are more restricted in scalability.
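For concreteness, the two options look roughly like this (table and sequence names are illustrative):

-- Option 1: unique_rowid(), which is also what SERIAL expands to.
CREATE TABLE t_rowid (
  id INT PRIMARY KEY DEFAULT unique_rowid(),
  payload STRING
);

-- Option 2: a SQL sequence; values are strictly ordered, but every
-- insert must increment the same sequence, which limits scalability.
CREATE SEQUENCE t_seq;
CREATE TABLE t_sequence (
  id INT PRIMARY KEY DEFAULT nextval('t_seq'),
  payload STRING
);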

Have you tried both these solutions? Can you report what you found for each of them?

@floatsliang, the docs on sequences are here, they are supported: https://www.cockroachlabs.com/docs/stable/create-sequence.html

What transaction isolation level are you using in TiDB? It’s worth noting that TiDB supports (at best) repeatable read isolation. See their docs here: https://github.com/pingcap/docs/blob/master/sql/transaction-isolation.md. All transactions in CRDB are serializable, which definitely accounts for part of the difference, though we’re generally able to achieve performance parity with other DBs. I’m just pointing out it’s not an apples to apples comparison.

Assuming the incremented ID is a key, CRDB and TiDB behave identically when it comes to distributing those rows to nodes. All data in CRDB is ordered, so if you’re adding rows with an auto-incrementing key, you’ll be limited by the speed at which the transactions can complete, which should be limited by IO on the disk doing the writing.
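For instance, the range layout of a table can be inspected directly (a sketch; SHOW EXPERIMENTAL_RANGES is the statement available in CRDB versions of this era, and sbtest1 stands in for the test table):

-- Lists the key spans, range IDs and leaseholders of the table.
-- With a sequentially increasing primary key, every new row lands
-- at the tail of the last range, i.e. on a single leaseholder.
SHOW EXPERIMENTAL_RANGES FROM TABLE sbtest1;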

Per @knz’s point above, more detail would be appreciated. If we could see the schema, and get more details about the tests you’ve run, it would be very helpful.

Hi @tim-o, @knz,

Sorry, I forgot that CRDB supports sequences, but yesterday I tested its insert performance on a single node and found the QPS is still about 1.5k.
For the test details, see the question I asked you before: Is this insert performance normal?. All my tests are based on sysbench 1.0. In these tests I achieved auto-increment in CRDB by using SERIAL.
And the schema for TiDB is:

CREATE TABLE `sbtest1` (
  `id` INTEGER NOT NULL AUTO_INCREMENT,
  `k` INTEGER DEFAULT '0' NOT NULL,
  `c` CHAR(120) DEFAULT '' NOT NULL,
  `pad` CHAR(60) DEFAULT '' NOT NULL,
  PRIMARY KEY (`id`),
  KEY k_1 (`k`)
) ENGINE=InnoDB AUTO_INCREMENT=200010001;

For the isolation question: I know TiDB is REPEATABLE READ (which is actually a kind of SNAPSHOT isolation), but I also tested SNAPSHOT isolation (which will be deprecated in v2.1) in CRDB, and the QPS is still 1.3k.

So I want to know what causes the auto-increment insert performance difference (it is really a big difference…) between CRDB and TiDB, and if it is an isolation problem, do you have a plan to make a compromise design for users who have a strong demand for auto-increment inserts?

Thanks a lot for your reply!

Thanks @floatsliang for clarifying.

So the question is why the performance with SERIAL is limited to 1.5k TPS.

I think it is likely you are encountering transaction contention as explained here: https://www.cockroachlabs.com/docs/stable/performance-best-practices-overview.html#understanding-and-avoiding-transaction-contention

This is because SERIAL uses unique_rowid() which causes new data from multiple concurrent clients to be inserted on the same ranges. See here for more details: https://www.cockroachlabs.com/docs/stable/sql-faqs.html#what-are-the-differences-between-uuid-sequences-and-unique_rowid

As you can see in the table on that page, if a primary key uses unique_rowid(), the maximum insert throughput is limited by the maximum performance of a single CockroachDB node.

To obtain scalable inserts, the solution is a UUID column with DEFAULT uuid_v4().
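A minimal sketch of such a table (gen_random_uuid() is the variant that returns a UUID; uuid_v4() returns BYTES):

CREATE TABLE events (
  -- Random UUIDs spread concurrent inserts across the whole keyspace
  -- instead of concentrating them at the tail of one range.
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  payload STRING
);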

Note that we may explore an extension to unique_rowid() in the future to do something more similar to TiDB. However this will likely not happen in the next release.

Does this help?

Hey @floatsliang, @knz’s point about transaction contention is true, and we generally do recommend using UUIDs, but I don’t know that the serial ID column is the reason you’re seeing such a drastic difference in performance. I’m a bit confused, given that in the earlier thread you mentioned you were able to achieve 6000 TPS with auto-increment inserts. We were able to achieve 11k TPS on a 3-node cluster, so we must be testing differently here. What changed between the 6k TPS test and now? Can you share the sysbench command you’re using to execute the test?

Hi @tim-o,

Sorry for confusing you; I made a mistake in the former thread: it was not auto-increment insert but a kind of random insert (I modified the sysbench test script so that it inserts random primary keys larger than the table size, so when the range splits in two, the performance doubles).

And I wish I could use random UUIDs as the primary key, but our current tables in MySQL use auto-increment primary keys, and it would be a big cost for us to change the schema.

But still, thank you sincerely for your help!

Hey @floatsliang, I think there’s something about the test that’s probably limiting throughput on CRDB. We can’t replicate such low throughput (a test over the weekend showed >13k TPS on serial inserts). What’s the sysbench command you’re using to execute the test? Are there any other details you can tell us about the test setup?

Yes, of course.

I’m using CRDB v2.1-alpha-730 and sysbench 1.0.15 for this test. I have 3 nodes (one node per machine) in the cluster and one extra machine acting as a load balancer (using HAProxy).
I run sysbench on the HAProxy machine.

Here is my sysbench command:
./sysbench --db-driver=pgsql --pgsql-host= --pgsql-port=6257 --pgsql-user=root --pgsql-password= --time=120 --report-interval=1 --threads=160 /home/fee/sysbench/share/sysbench/oltp_insert.lua --auto-inc=on --tables=1 --table-size=10000000 run

Here is my schema for serial insert:
[schema screenshot not preserved]
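Presumably (this is a reconstruction, not the original screenshot) the CRDB schema mirrors the TiDB one above with SERIAL in place of AUTO_INCREMENT, roughly:

CREATE TABLE sbtest1 (
  id SERIAL PRIMARY KEY,  -- SERIAL expands to INT DEFAULT unique_rowid()
  k INTEGER NOT NULL DEFAULT 0,
  c CHAR(120) NOT NULL DEFAULT '',
  pad CHAR(60) NOT NULL DEFAULT '',
  INDEX k_1 (k)
);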

Here is my result:

[results screenshot not preserved]

And the CPU and SSD status:

[screenshot not preserved]

And the configuration file of HAProxy:

[screenshot not preserved]

Thank you for the extra information.

For clarity, can you please also share the content of the script oltp_insert.lua and the meaning of the flag --auto-inc=on?

The real question for us is what SQL queries are sent to the server; you have not included this yet.

Thank you in advance!

Oh, sorry, I forgot it. :sweat_smile:

[script screenshot not preserved]

And --auto-inc sets the sysbench.opt.auto_inc variable in the oltp_insert.lua script, so with the pgsql driver it uses this SQL query:

INSERT INTO sbtest.sbtest1 (k, c, pad) VALUES (%d, %s, %s);

and lets CRDB auto-generate the primary key id.

Thank you!

@tim-o I think this is a combination of using a secondary index plus the co-locality of inserts on the primary key. This is different from our internal testing, where we do not use secondary indexes and use UUID primary keys. I’ll let you investigate further.
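For anyone re-running the comparison, a sketch of the internal-test style of schema described above (illustrative, not taken from this thread): a UUID primary key and no secondary index, so each INSERT touches a single index and spreads across ranges:

CREATE TABLE sbtest_uuid (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  k INTEGER NOT NULL DEFAULT 0,
  c CHAR(120) NOT NULL DEFAULT '',
  pad CHAR(60) NOT NULL DEFAULT ''
  -- no KEY k_1 (k): avoids a second index write per INSERT
);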