Workload tpcc init failed

#1

When I use workload to init tpcc data, I get the following error:

time ./workload init tpcc --warehouses 1000 --drop "postgresql://root@172.16.50.103:26257?sslmode=disable"
Error: failed insert into customer: pq: result is ambiguous (error=rpc error: code = Unavailable desc = transport is closing [propagate])

The node 172.16.50.103 that tpcc connects to works all right. Its log is below:

172.16.50.103 crdb log

================================================

I retried workload init and got the following error:

[root] # time ./workload init tpcc --warehouses 1000 --drop "postgresql://root@172.16.50.103:26257?sslmode=disable"
Error: failed insert into customer: pq: TransactionStatusError: transaction deadline exceeded
real    340m12.730s
user    19m9.961s
sys     0m58.122s
(Jamie Hops) #2

I got the same error running the tpcc workload against a 3-node cluster on GCE instances. What causes this, and how do I get around it?

(Tim O'Brien) #3

@jhops what instance type are you using on GCE? What type of storage are you using and with what mount options?

(Jamie Hops) #4

I used the Helm chart that you guys provide on GitHub to spin up 3 pods. I got around that error by dropping the tpcc db and reloading; it worked the second time. I ran the test with 100 warehouses.
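(For reference, the drop and reload was roughly something like the sketch below; the host address is the same one I use in the run command further down, and the exact flags may differ for other setups.)

cockroach sql --insecure --host=10.22.193.175 -e "DROP DATABASE IF EXISTS tpcc CASCADE;"
cockroach workload init tpcc --warehouses 100 "postgres://root@10.22.193.175:26257?sslmode=disable"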
I'm now getting a new error after running tpcc init with 1000 warehouses. When I try to run the test I get the following:
bash-3.2$ cockroach workload run tpcc "postgres://root@10.22.193.175:26257?sslmode=disable" --warehouses 1000 --duration 10m

Error: pq: update-setting: split failed while applying backpressure: could not find valid split key

(Ron Arévalo) #5

Hey @jhops,

The error you're seeing usually happens when MVCC versions saturate a range, causing it to take up more than 128 MB (twice the max range size). This is normally due to repeatedly writing to a single range. I'm not sure whether the tpcc workload is causing it in this case, but a quick fix is to lower gc.ttlseconds from its default of 90000 (25 hours) to something much lower; once the oversized range has been garbage collected, you should ideally change it back to 25 hours. You'd need to issue the following command:

ALTER RANGE default CONFIGURE ZONE USING gc.ttlseconds = <NEW VALUE HERE>
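For example (just a sketch; the 600-second value is an arbitrary illustration, not a tuned recommendation for your cluster), the full sequence would look something like:

-- temporarily lower the GC TTL so old MVCC versions become eligible for garbage collection sooner
ALTER RANGE default CONFIGURE ZONE USING gc.ttlseconds = 600;

-- once the oversized range has been cleaned up, restore the 25-hour default
ALTER RANGE default CONFIGURE ZONE USING gc.ttlseconds = 90000;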

Also, it would be great to know the instance type, storage type, and mount options. That might help us debug and get to the root cause of why you're repeatedly writing to the same range.

Thanks,

Ron

(Tim O'Brien) #6