Sysbench Select performance test

I ran a sysbench test against CockroachDB v1.1.2, twice. The first time I ran sysbench from a single machine; the second time I ran the same test from two machines in the cluster. The results look suspicious: queries per second always stays around 11000. The first run reached 10926, and the second run reached 11237 in total across the two machines (5544 + 5693). I checked disk, network, memory and CPU usage during the test, and none of them reached a bottleneck. So why doesn't queries per second improve? What is limiting it?
The details of the test follow.

testing environment

OS CentOS 7
CPU 64 vCPUs, Intel® Xeon® CPU E7-4820 v2 @ 2.00GHz
RAM 256G
DISK 275G SSD

node start command:

/usr/local/bin/cockroach start --insecure --host=172.16.50.103 --store=/ssd2/cockroach-data --join=172.16.50.101:26257,172.16.50.102:26257,172.16.50.103:26257 --cache=25% --max-sql-memory=25% --http-port=2333 &

The first test

Cluster Monitoring

32_single_sqlquery

disk and CPU usage

result:

The second test
Cluster Monitoring

disk and CPU usage
the machine 172.16.50.101:

the machine 172.16.50.102:

the result:
the machine 172.16.50.101
32_merge_result_101

the machine 172.16.50.102
32_merge_result_102

Hi,

In both cases, you used the same four node cluster, correct?

The results are very puzzling, especially the fact that there is almost no CPU usage on the second node (which definitely shouldn’t be the case if the node is receiving a lot of queries). The service latency graphs also seem to suggest that the first node is getting all the queries… Can you post the command lines for sysbench in the second test?

Thanks for posting this benchmark… and you found a bug (or so it appears).

Hi,

Thanks. I used the same four-node cluster. The details of the cluster are shown in the pictures below. During the sysbench test, the node with ID 1, which is shown there as a dead node, was live.

the cluster information

I ran the sysbench test from a bash script.
the command lines for sysbench on machine 172.16.50.101 in the second test

the command lines for sysbench on machine 172.16.50.102 in the second test

Both of those command lines use the same 172.16.50.101 host…

Both tests were designed to load-test host 172.16.50.101. In the second test I ran sysbench from two machines, 172.16.50.101 and 172.16.50.102, both against host 172.16.50.101. I expected that using two machines would spread out the overhead of running sysbench itself, which in the first test fell entirely on machine 101. But the results in the pictures show almost no improvement.

I don’t expect the benchmark itself to take up a lot of resources; it probably doesn’t matter if you run it twice from the same host, or on two different machines.

In both cases, all queries go to .101, which is running all the queries and is the bottleneck. If it can only sustain around 11000 qps, it’s not surprising that running the test twice doesn’t give you more total throughput.
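To make that comparison concrete, here is a minimal sketch in Python using only the numbers reported earlier in this thread. The helper names `total_qps` and `relative_gain` are made up for illustration; nothing here measures CockroachDB itself.

```python
# Numbers reported in the thread (queries per second).
single_client_qps = 10926        # first test: one sysbench client
two_client_qps = [5544, 5693]    # second test: two clients, both targeting .101


def total_qps(per_client_qps):
    """Sum the throughput seen by each sysbench client."""
    return sum(per_client_qps)


def relative_gain(before, after):
    """Fractional change in throughput from `before` to `after`."""
    return (after - before) / before


combined = total_qps(two_client_qps)
gain = relative_gain(single_client_qps, combined)
print(f"combined: {combined} qps, gain over one client: {gain:.1%}")
# prints: combined: 11237 qps, gain over one client: 2.8%
```

A gain of under 3% from doubling the number of clients is consistent with the server side, not the sysbench client, being the limit.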

If you are asking why we are hitting a limit even though CPU or disk usage isn’t 100%, it is a fair question. How many ranges are used by the sysbench data? It should be a small factor times the number of vCPUs if we are to use all of them.

The sysbench data uses 1814 ranges.
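For reference, a quick check of that figure against the 64 vCPUs listed in the testing environment. This only computes the ratio; whether that ratio is enough to keep every core busy is not something this thread establishes.

```python
ranges = 1814   # range count reported for the sysbench data
vcpus = 64      # vCPU count from the testing environment

ranges_per_vcpu = ranges / vcpus
print(f"{ranges_per_vcpu:.1f} ranges per vCPU")
# prints: 28.3 ranges per vCPU
```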

I see. Can you get a profile from the .101 cockroach host while the second benchmark is running? You can go to http://<host>:8080/debug/pprof/profile?debug=1 and wait for a bit.
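If a browser is not convenient, the same URL can be fetched from a script. The following is a minimal sketch using only the Python standard library; the function names, the output file name and the timeout are assumptions for illustration. Note also that the node start command earlier in this thread passes `--http-port=2333`, so the port may need to be 2333 rather than 8080 on this cluster.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def profile_url(host, port=8080, debug=1):
    """Build the pprof profile URL for a node's HTTP endpoint."""
    query = urlencode({"debug": debug})
    return f"http://{host}:{port}/debug/pprof/profile?{query}"


def fetch_profile(host, port=8080, out_path="profile", timeout=120):
    """Download the profile to out_path.

    The request blocks while the node collects the profile, so the
    timeout (an assumed value) is deliberately generous.
    """
    with urlopen(profile_url(host, port), timeout=timeout) as resp:
        data = resp.read()
    with open(out_path, "wb") as f:
        f.write(data)
    return out_path


if __name__ == "__main__":
    # Hypothetical invocation: host and port must match the node under test.
    print(profile_url("172.16.50.101", port=2333))
```

Calling `fetch_profile("172.16.50.101", port=2333)` while the benchmark is running would save the result to a local file named `profile`.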

This is the profile I got from the .101 cockroach host. I added a .pdf suffix so that the upload would accept it.
profile.pdf (228.5 KB)