Sysbench Select performance test

I ran the sysbench test on CockroachDB v1.1.2, twice. The first time, I used only one machine to run the test. The second time, I used two machines of the cluster to run the same test. The results suggest there may be a problem: queries per second is always around 11000. The first run gave 10926, and the second run gave 11237 in total across the two machines (5544 plus 5693). I checked disk, network, memory and CPU usage during the test, and none of them reached a bottleneck. So I want to know why queries per second does not improve. What limits it?
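To make the comparison concrete, here is a small sketch of the arithmetic using only the figures quoted above. The function name is made up for illustration.

```python
# Figures quoted in the post (queries per second).
ONE_MACHINE_QPS = 10926
TWO_MACHINE_QPS = [5544, 5693]


def relative_change_percent(before, after):
    """Percentage change from `before` to `after`."""
    return (after - before) / before * 100


total_two_machines = sum(TWO_MACHINE_QPS)
change = relative_change_percent(ONE_MACHINE_QPS, total_two_machines)
print(f"two-machine total: {total_two_machines} qps")
print(f"change vs one machine: {change:.1f}%")
```

Adding a second client machine changed the total by only a few percent, which is why the result looks like a ceiling rather than a scaling curve.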
The details of the test are below.

testing environment

OS CentOS 7
CPU 64 vCPUs, Intel® Xeon® CPU E7-4820 v2 @ 2.00GHz
RAM 256G

node start command:

/usr/local/bin/cockroach start --insecure --host= --store=/ssd2/cockroach-data --join=,, --cache=25% --max-sql-memory=25% --http-port=2333 &

The first test

Cluster Monitoring


disk and cpu usage


the second test:
Cluster Monitoring

disk and cpu usage
-------the machine

--------the machine

the result:
the machine

the machine


In both cases, you used the same four node cluster, correct?

The results are very puzzling, especially the fact that there is almost no CPU usage on the second node (which definitely shouldn’t be the case if the node is receiving a lot of queries). The service latency graphs also seem to suggest that the first node is getting all the queries… Can you post the command lines for sysbench in the second test?

Thanks for posting this benchmark … and you found a bug (or so it appears).


Thanks. Yes, I used the same four-node cluster. The details of the cluster are shown in the pictures below. During the sysbench test, the node shown as dead (ID 1) was live.

the cluster information

I ran a bash script to do the sysbench test.
the command lines for sysbench on machine in the second test

the command lines for sysbench on machine in the second test

Both of those command lines use the same host…

Both tests were designed to stress-test one host. In the second test I used two machines to stress-test that host, because I thought that running sysbench from two machines would spread out the performance impact of running sysbench only on machine 101. But the results in the pictures show that it does not improve much.

I don’t expect the benchmark itself to take up a lot of resources; it probably doesn’t matter if you run it twice from the same host, or on two different machines.

In both cases, all queries go to .101, which is running all the queries and is the bottleneck. If it can only sustain around 11000 qps, it’s not surprising that running the test twice doesn’t give you more total throughput.
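A toy model of that reasoning, as a sketch only: it assumes the single node receiving the queries has a fixed ceiling of about 11000 qps (the figure from this thread) and that clients share it evenly. The function names and the offered-load numbers are hypothetical.

```python
def total_throughput(offered_qps_per_client, node_capacity_qps):
    """Aggregate qps when every client sends its queries to the same node.

    The node cannot serve more than its capacity, no matter how many
    clients are pushing load at it.
    """
    return min(sum(offered_qps_per_client), node_capacity_qps)


def even_share(n_clients, node_capacity_qps):
    """Per-client qps if a saturated node splits its capacity evenly."""
    return node_capacity_qps / n_clients


CAPACITY = 11000  # approximate ceiling observed in this thread

# Hypothetical offered loads: each client could push more than the node serves.
print(total_throughput([20000], CAPACITY))         # one client
print(total_throughput([20000, 20000], CAPACITY))  # two clients, same node
print(even_share(2, CAPACITY))                     # per-client share
```

Under these assumptions the total is the same with one or two clients, and each of two clients sees about half the ceiling, which is roughly the 5544 and 5693 reported above.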

If you are asking why we are hitting a limit even though CPU or disk usage isn’t 100%, it is a fair question. How many ranges are used by the sysbench data? It should be a small factor times the number of vCPUs if we are to use all of them.

The sysbench data uses 1814 ranges.
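For reference, a quick sketch of how that range count compares with the 64 vCPUs listed in the testing environment. The thread does not say what counts as a "small factor", so this only computes the ratio.

```python
RANGES = 1814  # range count reported for the sysbench data
VCPUS = 64     # from the testing environment above


def ranges_per_vcpu(ranges, vcpus):
    """How many ranges there are for each vCPU."""
    return ranges / vcpus


ratio = ranges_per_vcpu(RANGES, VCPUS)
print(f"{ratio:.1f} ranges per vCPU")
```

There are well over one range per vCPU, so the number of ranges by itself does not look like the thing keeping cores idle.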

I see. Can you get a profile from the .101 cockroach host while the second benchmark is running? You can go to http://<host>:8080/debug/pprof/profile?debug=1 and wait for a bit.
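If it is easier to script than to use a browser, something like the following might work. It is a sketch that assumes the endpoint is reachable over plain HTTP as in the URL above; the helper names, the output path and the timeout are made up for illustration. Note that the node start command earlier in the thread passes --http-port=2333, so the port may need to be 2333 rather than 8080.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def profile_url(host, port=8080, debug=1):
    """Build the pprof profile URL in the form given above."""
    query = urlencode({"debug": debug})
    return f"http://{host}:{port}/debug/pprof/profile?{query}"


def save_profile(host, port=8080, out_path="profile.out", timeout=120):
    """Download the profile to a local file.

    The request blocks while the server collects the profile, so the
    timeout is deliberately generous (an assumed value).
    """
    with urlopen(profile_url(host, port), timeout=timeout) as resp:
        data = resp.read()
    with open(out_path, "wb") as f:
        f.write(data)
    return out_path


# "example-host" is a placeholder, not a host from this thread.
print(profile_url("example-host", port=2333))
```

Calling save_profile with the real host while the benchmark is running would write the profile to a file that can then be attached here.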

This is the profile I got from the .101 cockroach host. I added a .pdf suffix so that it could be uploaded: profile.pdf (228.5 KB)