Java program with batch inserts

(Allen Herndon, FedEx) #1

I have a Java benchmark program that I use to test various relational DBMS products. First I load n tables with m rows, usually committing every 75 rows or so per batch. Then I benchmark against the loaded tables with OLTP SQL. Both the load and the benchmark SQL use isolated connections to the database; that is, each connection has its own separate table to load or run OLTP SQL against. It's not a concurrency test.

But when testing against CockroachDB (using the latest PostgreSQL JDBC driver), I repeatedly hit the exception below. After a lot of trial and error I found I had to limit my batch insert loops to one row per commit (my code turns off autocommit). How can I perform a batch of inserts with CockroachDB? I am using prepared statements with addBatch for each insert, then executeBatch when my configured batch size is reached… which for CockroachDB can now only be 1 per batch.
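In case it helps, here is a stripped-down sketch of the load loop. The URL, table, column names, and row count are placeholders, not my real schema:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    public class BatchLoad {
        public static void main(String[] args) throws Exception {
            // Placeholder URL; the real program is configured externally.
            String url = "jdbc:postgresql://localhost:26257/bench?user=root";
            try (Connection conn = DriverManager.getConnection(url)) {
                conn.setAutoCommit(false); // commit manually, once per batch
                // Hypothetical table; each connection loads its own table.
                try (PreparedStatement ps = conn.prepareStatement(
                        "INSERT INTO users (id, name) VALUES (?, ?)")) {
                    int batchSize = 75; // rows per commit
                    int rows = 10_000;
                    for (int i = 0; i < rows; i++) {
                        ps.setLong(1, i);
                        ps.setString(2, "user-" + i);
                        ps.addBatch();
                        if ((i + 1) % batchSize == 0) {
                            ps.executeBatch(); // send the accumulated inserts
                            conn.commit();     // the exception below fires here
                        }
                    }
                    ps.executeBatch(); // flush any trailing partial batch
                    conn.commit();
                }
            }
        }
    }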
Thanks

Here is the exception…
org.postgresql.util.PSQLException: ERROR: restart transaction: HandledRetryableTxnError: TransactionRetryError: retry txn (RETRY_ASYNC_WRITE_FAILURE): "sql txn" id=cc7f3d63 key=/Table/2622/1/427711737021530114/0 rw=true pri=0.00635316 iso=SERIALIZABLE stat=PENDING epo=0 ts=1550597663.495137117,0 orig=1550597663.495137117,0 max=1550597663.495137117,0 wto=false rop=false seq=7
at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2440)
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2183)
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:308)
at org.postgresql.jdbc.PgConnection.executeTransactionCommand(PgConnection.java:755)
at org.postgresql.jdbc.PgConnection.commit(PgConnection.java:777)
at PBJ.loadData.loadUser(loadData.java:122)
at PBJ.loadData.run(loadData.java:188)
at java.lang.Thread.run(Thread.java:748)

(Tim O'Brien) #2

Hi @aherndon,

How does your test go if you try committing either 64 or 128 rows per batch? Assuming you're using pgjdbc with the reWriteBatchedInserts=true option, adjusting the batch size to a power of two ought to lead to a big improvement in performance. This behavior is pretty obscure and is only documented in the driver's source and on Stack Overflow.
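For example, something along these lines (host, database, and user are placeholders):

    import java.sql.Connection;
    import java.sql.DriverManager;

    public class BatchConfig {
        public static void main(String[] args) throws Exception {
            // Placeholder connection details; the option is what matters.
            String url = "jdbc:postgresql://localhost:26257/bench"
                    + "?user=root"
                    + "&reWriteBatchedInserts=true"; // rewrite batches into multi-row INSERTs
            try (Connection conn = DriverManager.getConnection(url)) {
                conn.setAutoCommit(false);
                // Power-of-two batch sizes line up with how the driver
                // chunks the rewritten statements, hence 64 or 128.
                int batchSize = 128;
                // ... addBatch()/executeBatch()/commit() loop as in your post
            }
        }
    }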

If that doesn't work, can you run SET CLUSTER SETTING kv.transaction.write_pipelining_enabled = false and send us fresh logs? That should give us a bit more information, at some performance cost.
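You can run that from any SQL session with admin privileges; via JDBC it would look roughly like this (URL is a placeholder):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class DisablePipelining {
        public static void main(String[] args) throws Exception {
            // Placeholder URL; changing cluster settings requires an admin user.
            String url = "jdbc:postgresql://localhost:26257/bench?user=root";
            try (Connection conn = DriverManager.getConnection(url);
                 Statement st = conn.createStatement()) {
                st.execute("SET CLUSTER SETTING kv.transaction.write_pipelining_enabled = false");
            }
        }
    }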

Thanks!

(Allen Herndon, FedEx) #3

Tim,

I found a problem in my program. I've also found that the reWriteBatchedInserts connection string option is the key: with it enabled, I was able to insert around 6,000 rows/sec.
Thanks!
Allen

(Tim O'Brien) #4

@aherndon that’s great to hear! Could you outline what the issue was so that others can fix it if they encounter it?

(Allen Herndon, FedEx) #6

Using reWriteBatchedInserts=true in the connection string is what boosted throughput.