Not getting high enough write speeds but there is no resource bottleneck

(Arnab Kundu) #1

I am trying to write a DataFrame containing 1M rows from Spark to my CockroachDB cluster using the PostgreSQL JDBC driver. The write takes more than 15 minutes. I could not identify any throughput bottleneck on the cluster side: CPU sits at 2-3%, memory at about 1 GB, and write IOPS at 3.5k.
Current cluster topology is 3 instances (c5.4xlarge), each with:
16 vCPUs
32 GB RAM
15,000 provisioned IOPS

I need some help from the community to get this write under 2 minutes.

(Tim O'Brien) #2

@arnabkund how many rows are you attempting to write per insert statement?

(Arnab Kundu) #3

A batch size of 10,000, set on the JDBC connector.
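For context, in Spark this is typically set through the JDBC writer's `batchsize` option, roughly as below (host, database, and table names are placeholders, not from the original post):

```python
# Sketch of the original Spark JDBC write settings; "batchsize" is the
# Spark JDBC writer option that controls how many rows go into each batch.
write_options = {
    "url": "jdbc:postgresql://<cockroach-host>:26257/defaultdb",  # placeholder host/db
    "dbtable": "my_table",                # placeholder table name
    "driver": "org.postgresql.Driver",
    "batchsize": "10000",                 # batch size reported above
}
# The actual write requires a running cluster, so it is shown commented out:
# df.write.format("jdbc").options(**write_options).mode("append").save()
```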

(Rich Loveland) #4

Hi @arnabkund

We have a couple of open docs issues covering how to make JDBC inserts faster. Here are the suggestions (with the docs issue links for more context).

  1. Make sure your JDBC connection passes the rewriteBatchedInserts=true parameter (https://github.com/cockroachdb/docs/issues/3578).

  2. Try using 128 inserts per batch, or some other power of 2 such as 512 (https://github.com/cockroachdb/docs/issues/4399).

Please give these a try and let us know how it goes.
