Different tpmc from repeated tpcc run

Hi

I’m trying to find the concurrency level (the --workers flag in tpcc) that maximizes the tpmc result. Currently, we are doing this:

cockroach start-single-node --insecure --background
			
workers_lst = [1, 5, 15 … , 150]
for x in workers_lst:
    cockroach workload init tpcc --warehouses=100 --drop 'postgresql://root@localhost:26257?sslmode=disable'
    cockroach workload run tpcc --warehouses=100 --workers=x --wait=false --ramp=3m --duration=10m 'postgresql://root@localhost:26257?sslmode=disable'

We fixed the number of warehouses and the duration, and tried different numbers of workers to measure the tpmc. But I found that with identical settings, the tpmc can vary a lot between runs. For example, from the above code, I got 35 as the worker count that maximizes the tpmc in my search space [1, 5, 15 … , 150]. I then repeated the following setting 10 times:

cockroach workload init tpcc --warehouses=100 --drop 'postgresql://root@localhost:26257?sslmode=disable'
cockroach workload run tpcc --warehouses=100 --workers=35 --wait=false --ramp=3m --duration=10m 'postgresql://root@localhost:26257?sslmode=disable'

And this is what I got
[image: tpmc for each of the 10 repeated runs with --workers=35]

Clearly, the tpmc with 35 workers is not very stable, so I then tried the default workers setting (warehouses x 10 = 1000) and repeated it 15 times:

cockroach workload init tpcc --warehouses=100 --drop 'postgresql://root@localhost:26257?sslmode=disable'
cockroach workload run tpcc --warehouses=100 --ramp=3m --duration=10m 'postgresql://root@localhost:26257?sslmode=disable'

Here is what I got. It seems much more stable than the 35 workers result.
[image: tpmc for each of the 15 repeated runs with the default number of workers]

So my questions are:

  1. Why does the tpmc vary so much when the number of workers is small (e.g. 35), while it is much more stable with the default number of workers (warehouses x 10 = 1000)? Does the stability depend on the number of workers?
  2. My ultimate goal is to find the number of workers that maximizes the tpmc. Given the variation shown above, I think it might be better to repeat each setting 10 times, use the averaged tpmc as my sample, and then compare the averages across the search space (e.g. [1, 5, 15 … , 150]). But this is very time-consuming. Is there a better way to find the number of workers that maximizes the tpmc?

Thanks!

Hi Cherie, sorry for the slow response; this one slipped through the cracks. Your proposed plan of repeating each worker count ten times sounds like a good start.
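
In case it helps, here is a minimal Python sketch of that plan: run each worker count several times, then compare the mean tpmc together with its standard error, so you can see whether two settings really differ or are within run-to-run noise. The commands and flags are copied from your post. The helper names (`parse_tpmc`, `run_tpcc`, `sweep`, `summarize`, `best_workers`) are my own, and the parser assumes that the final summary contains a header line with a `tpmC` column followed by a row of values, so please check it against the output you actually see before relying on it.

```python
import statistics
import subprocess

# Connection string taken from the original post.
URL = "postgresql://root@localhost:26257?sslmode=disable"


def parse_tpmc(output):
    """Extract tpmC from the workload output.

    Assumption: the summary has a header line such as
    "_elapsed_______tpmC____efc__..." followed by a row of values.
    """
    lines = output.splitlines()
    for i, line in enumerate(lines[:-1]):
        columns = [c for c in line.split("_") if c]
        if "tpmC" in columns:
            values = lines[i + 1].split()
            return float(values[columns.index("tpmC")])
    raise ValueError("no tpmC summary found in output")


def run_tpcc(workers):
    """Re-initialize the data set, run tpcc once, and return its tpmC."""
    subprocess.run(
        ["cockroach", "workload", "init", "tpcc",
         "--warehouses=100", "--drop", URL],
        check=True,
    )
    result = subprocess.run(
        ["cockroach", "workload", "run", "tpcc", "--warehouses=100",
         f"--workers={workers}", "--wait=false", "--ramp=3m",
         "--duration=10m", URL],
        check=True, capture_output=True, text=True,
    )
    # Assumption: the summary may land on either stream, so search both.
    return parse_tpmc(result.stdout + result.stderr)


def sweep(worker_counts, measure=run_tpcc, repeats=10):
    """Collect `repeats` tpmC samples for every worker count."""
    return {w: [measure(w) for _ in range(repeats)] for w in worker_counts}


def summarize(samples):
    """Map each worker count to (mean tpmC, standard error of the mean)."""
    summary = {}
    for workers, values in samples.items():
        mean = statistics.mean(values)
        if len(values) > 1:
            sem = statistics.stdev(values) / len(values) ** 0.5
        else:
            sem = float("nan")  # spread is undefined for a single run
        summary[workers] = (mean, sem)
    return summary


def best_workers(summary):
    """Return the worker count with the highest mean tpmC."""
    return max(summary, key=lambda w: summary[w][0])
```

You would call `summarize(sweep(your_worker_counts))` and look at the standard errors before trusting `best_workers`: if the top few means are within a couple of standard errors of each other, the sweep cannot really tell them apart, and it is cheaper to add repeats only for those candidates than for the whole search space.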

For stable performance, have you confirmed that your deployment follows the guidelines in the CockroachDB Production Checklist?