Help interpreting TPC-C performance with CockroachDB

I deployed a CockroachDB cluster with the default Helm chart settings:

helm install wh2500crdb cockroachdb/cockroachdb --namespace=thesis-crdb

Then I initialized the TPC-C workload with 2500 warehouses:

cockroach workload init tpcc --warehouses=2500 'postgresql://root@wh2500crdb-cockroachdb-public:26257?sslmode=disable'

This took around 5 hours to complete. Then I ran the workload:

cockroach workload run tpcc --warehouses=20 --ramp=3m --duration=10m 'postgresql://root@wh2500crdb-cockroachdb-public:26257?sslmode=disable'

I first ran the test with --warehouses=10, then increased it to 20, 40, 80, and so on, all the way up to 2500.
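The sweep described above can be scripted so that every run uses identical flags. Below is a minimal Python sketch that rebuilds the `cockroach workload run tpcc` command from the post for each warehouse count. It assumes the intermediate counts double from 10 (10, 20, 40, ..., 1280) before a final 2500 run, which the post implies but does not list in full, and that the `cockroach` binary is on PATH. By default it only prints the commands; call `main(dry_run=False)` to actually run them.

```python
"""Sketch: re-run the doubling TPC-C sweep from the post.

Assumptions (not stated in the post): counts double from 10 up to 1280,
then a final run at 2500; `cockroach` is on PATH.
"""
import subprocess

# Connection string copied from the post.
PG_URL = "postgresql://root@wh2500crdb-cockroachdb-public:26257?sslmode=disable"


def warehouse_sweep(start: int = 10, limit: int = 2500) -> list[int]:
    """Double the warehouse count from `start`, ending exactly at `limit`."""
    counts = []
    w = start
    while w < limit:
        counts.append(w)
        w *= 2
    counts.append(limit)  # last step uses the full initialized dataset
    return counts


def run_command(warehouses: int) -> list[str]:
    """Build the same `cockroach workload run tpcc` invocation used in the post."""
    return [
        "cockroach", "workload", "run", "tpcc",
        f"--warehouses={warehouses}",
        "--ramp=3m",
        "--duration=10m",
        PG_URL,
    ]


def main(dry_run: bool = True) -> None:
    for w in warehouse_sweep():
        cmd = run_command(w)
        if dry_run:
            # Only show what would be executed.
            print(" ".join(cmd))
        else:
            # Roughly 13 minutes per run, assuming --duration is counted
            # after the 3m ramp; the workload reports tpmC when it finishes.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    main(dry_run=True)
```

Scripting the sweep keeps `--ramp` and `--duration` fixed across steps, so differences in tpmC between runs come from the warehouse count alone.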

Everything seems to follow a pattern: throughput increases as the number of warehouses increases. But something strange happens between 1280 and 2500 warehouses:

1280 warehouses: tpmC of 15515
2500 warehouses: tpmC of 8918

Is there a logical explanation for this?
I’ll attach my Excel sheet so you can see for yourselves.

Hi @drogo681,

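The pattern you see up to 1280 warehouses looks reasonable. As the number of warehouses that the load generator accesses increases, the maximum achievable throughput increases too. This is because TPC-C deliberately caps throughput per warehouse: the spec's keying and think times pace every terminal, so each warehouse can contribute only a bounded number of New-Order transactions per minute. For more on this, see the references to “pacing” in the TPC-C spec.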
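To make the pacing limit concrete, here is a small Python sketch that turns a warehouse count into its maximum tpmC and an efficiency percentage, applied to the two measurements from the original post. It assumes the commonly quoted ceiling of about 12.86 New-Order transactions per minute per warehouse, derived from the spec's minimum keying and think times; treat that constant as an assumption and check it against the spec before relying on it.

```python
"""Sketch: compare measured tpmC with the TPC-C pacing ceiling.

Assumption: ~12.86 tpmC per warehouse is the most the spec's keying and
think times allow.
"""

MAX_TPMC_PER_WAREHOUSE = 12.86  # assumed pacing ceiling per warehouse


def max_tpmc(warehouses: int) -> float:
    """Highest tpmC the spec's pacing permits for this warehouse count."""
    return MAX_TPMC_PER_WAREHOUSE * warehouses


def efficiency(tpmc: float, warehouses: int) -> float:
    """Measured tpmC as a percentage of the pacing ceiling."""
    return 100.0 * tpmc / max_tpmc(warehouses)


# Measurements reported in the original post.
results = {1280: 15515, 2500: 8918}

for w, tpmc in results.items():
    print(f"{w:>5} warehouses: ceiling {max_tpmc(w):8.1f} tpmC, "
          f"measured {tpmc}, efficiency {efficiency(tpmc, w):5.1f}%")
```

Under this assumption, 1280 warehouses reach about 94.3% of their ceiling (16460.8 tpmC), while 2500 warehouses reach only about 27.7% of theirs (32150 tpmC). That points to the cluster, not the pacing limit, as the bottleneck at 2500.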

The jump from 1280 to 2500 warehouses likely crosses the saturation point of the system, where the CPU becomes saturated and the system becomes overloaded. Once the system is sufficiently overloaded, throughput can actually decrease, because the overload allows transaction contention to compound. It would be interesting to ramp up more gradually between 1280 and 2500 warehouses to get a better picture of the scaling curve. For instance, you could test every warehouse count from 1300 to 2500 in increments of 100.
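A finer sweep pairs naturally with a simple check for where throughput stops scaling. This Python sketch generates the counts from 1300 to 2500 in steps of 100 and, given your measured results, reports the warehouse count with the peak tpmC and the first count at which tpmC falls. `fine_sweep`, `find_peak`, and `first_decline` are hypothetical helpers, not part of any CockroachDB tool, and the tpmC values have to come from your own runs.

```python
"""Sketch: plan a finer sweep and locate where throughput stops scaling.

Assumption: you record one (warehouses, tpmC) pair per run yourself.
"""
from typing import Dict, List, Optional


def fine_sweep(start: int = 1300, stop: int = 2500, step: int = 100) -> List[int]:
    """Warehouse counts from start to stop inclusive, in fixed increments."""
    return list(range(start, stop + 1, step))


def find_peak(results: Dict[int, float]) -> int:
    """Warehouse count with the highest measured tpmC."""
    return max(results, key=lambda w: results[w])


def first_decline(results: Dict[int, float]) -> Optional[int]:
    """First warehouse count whose tpmC is lower than the previous count's."""
    counts = sorted(results)
    for prev, cur in zip(counts, counts[1:]):
        if results[cur] < results[prev]:
            return cur
    return None  # throughput never dropped within the measured range


print(fine_sweep())
measured = {1280: 15515, 2500: 8918}  # the two data points from the post
print(find_peak(measured), first_decline(measured))
```

With only the two measurements from the post, this can only say the drop happens somewhere after 1280 warehouses; the intermediate runs narrow it down. Since a single run can be noisy, a dip at one count is worth re-running before calling it the saturation point.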

By the way, what kind of hardware is this running on? We run TPC-C very regularly, so we have a pretty good understanding of what kind of throughput limits to expect on different hardware.

Nathan

Hi @nathan,

Thank you for the clarification. I will continue running the tests with smaller steps between 1280 and 2500 to see where the throughput starts to decrease.

I am writing my thesis at a large company that would probably prefer to remain unnamed. I was given an already-deployed Kubernetes cluster on which I am allowed to run the CockroachDB tests, so I don’t know what hardware is being used. Is there a command or some other way to get the hardware details?
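Kubernetes records each node's CPU and memory capacity, so some hardware details are available without access to the machines: `kubectl describe node <node-name>` shows them for one node, and the Python sketch below collects them for all nodes from `kubectl get nodes -o json`. It assumes `kubectl` is configured for the cluster and that you are allowed to list nodes. The `node.kubernetes.io/instance-type` label is typically only set on cloud-managed clusters, so it may come back as "unknown". Node capacity is also not necessarily what each CockroachDB pod may use if resource limits are set on the pods.

```python
"""Sketch: summarize node hardware from `kubectl get nodes -o json`.

Assumptions: kubectl is configured and may list nodes; the instance-type
label exists only on some (typically cloud) clusters.
"""
import json
import subprocess
from typing import Any, Dict, List


def summarize_nodes(node_list: Dict[str, Any]) -> List[Dict[str, str]]:
    """Extract CPU, memory, architecture, OS and instance type per node."""
    summary = []
    for node in node_list.get("items", []):
        metadata = node.get("metadata", {})
        status = node.get("status", {})
        capacity = status.get("capacity", {})
        info = status.get("nodeInfo", {})
        labels = metadata.get("labels", {})
        summary.append({
            "name": metadata.get("name", "?"),
            "cpu": capacity.get("cpu", "?"),
            "memory": capacity.get("memory", "?"),
            "arch": info.get("architecture", "?"),
            "os": info.get("osImage", "?"),
            "instance_type": labels.get("node.kubernetes.io/instance-type", "unknown"),
        })
    return summary


def fetch_nodes() -> Dict[str, Any]:
    """Run kubectl and parse its JSON node list (needs cluster access)."""
    out = subprocess.run(
        ["kubectl", "get", "nodes", "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def main() -> None:
    try:
        nodes = fetch_nodes()
    except (FileNotFoundError, subprocess.CalledProcessError) as err:
        # kubectl missing, or the cluster refused the request.
        print(f"could not query nodes: {err}")
        return
    for row in summarize_nodes(nodes):
        print(row)


if __name__ == "__main__":
    main()
```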

You mentioned that you run TPC-C very regularly. Do you have any public data, references, or documents you could share with me? They could be valuable for my thesis, “Evaluation of CockroachDB as a cloud-native distributed database” 🙂