Increase TPS using golang with pq driver

I am reading a text log file in Go and inserting the parsed records into a single CRDB table. CRDB is deployed on a single bare-metal node, and I am running the Go binary on the same node as CRDB. From the admin dashboard I can see I am getting 530 inserts per second, and it does not go any higher. I was wondering if there is any way to increase the number of INSERTs, since my CPU and RAM utilisation is very low. Please find the attached screenshots.

The following is my source code, in case there is any room for improvement on performance.

package main

import (
	"bufio"
	"database/sql"
	"encoding/csv"
	"fmt"
	"io"
	"log"
	"os"
	"time"

	_ "github.com/lib/pq"
)

func main() {
	db, err := sql.Open("postgres",
		"postgresql://root@x.x.x.x:26257/defaultdb?ssl=true&sslmode=require&sslrootcert=certs/ca.crt&sslkey=certs/client.root.key&sslcert=certs/client.root.crt")
	if err != nil {
		log.Fatal("error connecting to the database: ", err)
	}
	defer db.Close()

	csvFile, err := os.Open("log.txt")
	if err != nil {
		log.Fatal("error opening log file: ", err)
	}
	defer csvFile.Close()

	reader := csv.NewReader(bufio.NewReader(csvFile))
	reader.Comma = '|'

	for {
		line, err := reader.Read()
		if err == io.EOF {
			break
		} else if err != nil {
			fmt.Println(err)
			continue
		}

		// The first field starts with a "02-01-2006 15:04:05" timestamp
		// and ends with a 13-digit MSISDN.
		a := []rune(line[0])
		dt := string(a[0:19])

		t, err := time.Parse("02-01-2006 15:04:05", dt)
		if err != nil {
			fmt.Println(err)
			continue
		}

		msisdn := string(a[len(a)-13:])
		agent := line[2]

		// One INSERT (and therefore one implicit transaction) per row.
		if _, err := db.Exec(
			"INSERT INTO logstore_follow (log_date, msisdn, agent) VALUES ($1,$2,$3)",
			t, msisdn, agent); err != nil {
			fmt.Println("Failed to INSERT ", line)
			fmt.Println(err)
			return
		}
	}
}

Hey @ahmedreza,

It looks like you have a single node - you’re much more likely to be exhausting IO than CPU and RAM here. How much Disk IO do you have available while inserting? What type of disk are you using on the node? What’s the DDL?

Hi @tim-o

I am using 10K SAS on RAID 1. However, when I ran the log insert program from 2 nodes (one instance on the CRDB node and another on a machine on the same LAN), the QPS became ~950. So the node can actually write ~950 INSERTs per second with the existing HDD. The DDL is as follows:

CREATE TABLE logstore_follow (
    log_date TIMESTAMPTZ NULL,
    msisdn STRING NULL,
    agent STRING NULL,
    FAMILY "primary" (log_date, msisdn, agent, rowid)
);

After running the INSERTs from 2 nodes, the disk IO increased. I was wondering how I can increase the QPS when running it from a single node.

Hi Ahmed,

To increase insert throughput, you can:

  1. parallelize - use multiple connections (and so multiple goroutines). Split your CSV into multiple shards and process one shard per connection.
  2. batch inserts to get more throughput from a single connection. With lib/pq, the best way to do this might be with the copy statement. See the example in the docs, in the “bulk imports” section here:
    https://github.com/lib/pq/blob/9eb73efc1fcc404148b56765b0d3f61d9a5ef8ee/doc.go
    I’m not completely sure that works with CRDB since our copy support is not perfect, but I’d be curious to find out (there’s a sketch of the CopyIn pattern right after this list).
    If that doesn’t work, you need to manually put a bunch of rows into the same insert statement. I think you can’t use placeholders at that point and need string concatenation instead (and btw, if you remove the placeholder from your insert you should see a throughput improvement even with a single row per statement). So, the insert syntax you’d be using is insert into logstore_follow(...) values ('x','y', 'z'), ('x', 'y', 'z'), ....
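
For reference, here's a minimal sketch of the lib/pq CopyIn pattern from those docs, adapted to your table. parsedRows is just a placeholder for rows you've already parsed from the log file (it's not in your program), you'd need a non-blank import of github.com/lib/pq for pq.CopyIn, and whether CRDB accepts the COPY is the open question above:

txn, err := db.Begin()
if err != nil {
	log.Fatal(err)
}
// COPY-based bulk insert: prepare the CopyIn statement inside a transaction.
stmt, err := txn.Prepare(pq.CopyIn("logstore_follow", "log_date", "msisdn", "agent"))
if err != nil {
	log.Fatal(err)
}
for _, r := range parsedRows { // placeholder: your parsed (t, msisdn, agent) values
	if _, err := stmt.Exec(r.t, r.msisdn, r.agent); err != nil {
		log.Fatal(err)
	}
}
if _, err := stmt.Exec(); err != nil { // an Exec with no arguments flushes the COPY data
	log.Fatal(err)
}
if err := stmt.Close(); err != nil {
	log.Fatal(err)
}
if err := txn.Commit(); err != nil {
	log.Fatal(err)
}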

If you’re not married to lib/pq, you can try pgx, which is generally a better driver and has its own batching interface. See https://github.com/jackc/pgx/blob/master/batch.go.
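
Roughly, with pgx v4's batch interface (import github.com/jackc/pgx/v4) it would look like the sketch below. Treat it as a sketch: the batch API differs between pgx major versions, and connString and parsedRows are placeholders for your own connection URL and parsed rows:

ctx := context.Background()
conn, err := pgx.Connect(ctx, connString) // connString: your CRDB URL
if err != nil {
	log.Fatal(err)
}
defer conn.Close(ctx)

// Queue all the INSERTs, then send them to the server in one round trip.
batch := &pgx.Batch{}
for _, r := range parsedRows { // placeholder: your parsed rows
	batch.Queue(
		"INSERT INTO logstore_follow (log_date, msisdn, agent) VALUES ($1,$2,$3)",
		r.t, r.msisdn, r.agent)
}
br := conn.SendBatch(ctx, batch)
for i := 0; i < batch.Len(); i++ {
	if _, err := br.Exec(); err != nil {
		log.Println(err)
	}
}
if err := br.Close(); err != nil {
	log.Println(err)
}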

The reason for the slowness is that you are doing a single INSERT per transaction. When you run db.Exec with a single statement (the INSERT), it runs as an implicit transaction containing just that statement, and transactions have some overhead regardless of their size. To get more speed you should indeed parallelize your INSERTs.
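
To make the parallel version concrete, here's a minimal sketch that shares one *sql.DB (database/sql already pools connections) across several goroutines. The row struct, function name, and worker count are just placeholders for illustration; it needs "database/sql", "log", "sync", and "time" in your imports. Your CSV-reading loop would send each parsed row on the channel and close it when done.

type row struct {
	t      time.Time
	msisdn string
	agent  string
}

func runInsertWorkers(db *sql.DB, rows <-chan row, workers int) {
	db.SetMaxOpenConns(workers) // let the pool grow to one connection per worker
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Each worker drains rows from the channel and inserts them.
			for r := range rows {
				if _, err := db.Exec(
					"INSERT INTO logstore_follow (log_date, msisdn, agent) VALUES ($1,$2,$3)",
					r.t, r.msisdn, r.agent); err != nil {
					log.Println("failed to INSERT:", err)
				}
			}
		}()
	}
	wg.Wait()
}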

I suggest trying these two and seeing which is faster (Andrei’s suggestion of COPY will also work).

  1. (Similar to what Andrei said): build a single statement that inserts a lot of rows at once, say 10,000. Use Go’s strings.Builder for that. It should look like INSERT INTO logstore_follow (log_date, msisdn, agent) VALUES ($1, $2, $3), ($4, $5, $6), (...). Then you pass one big slice with all the placeholder values. Running db.Exec on that should work (see the sketch after this list).
  2. Use db.Begin to start a real transaction, do the 10,000 single INSERTs inside it, then tx.Commit (also sketched further below).
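
Here's a minimal sketch of the first approach, reusing the row struct from the goroutine sketch above; the function name is just for illustration. Keep the batch size bounded: the Postgres wire protocol tops out at 65535 bound parameters per statement, so with three columns stay comfortably below ~20,000 rows per batch.

func insertMultiRow(db *sql.DB, batch []row) error {
	var sb strings.Builder
	sb.WriteString("INSERT INTO logstore_follow (log_date, msisdn, agent) VALUES ")
	args := make([]interface{}, 0, len(batch)*3)
	for i, r := range batch {
		if i > 0 {
			sb.WriteString(", ")
		}
		// Numbered placeholders: ($1,$2,$3), ($4,$5,$6), ...
		fmt.Fprintf(&sb, "($%d,$%d,$%d)", i*3+1, i*3+2, i*3+3)
		args = append(args, r.t, r.msisdn, r.agent)
	}
	_, err := db.Exec(sb.String(), args...)
	return err
}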

I suspect the first way will be faster, but am not sure.
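
For comparison, a minimal sketch of the second approach (one explicit transaction per batch), again reusing the row struct from above:

func insertBatchTx(db *sql.DB, batch []row) error {
	tx, err := db.Begin()
	if err != nil {
		return err
	}
	for _, r := range batch {
		if _, err := tx.Exec(
			"INSERT INTO logstore_follow (log_date, msisdn, agent) VALUES ($1,$2,$3)",
			r.t, r.msisdn, r.agent); err != nil {
			tx.Rollback()
			return err
		}
	}
	// All the single INSERTs commit together, amortizing the transaction overhead.
	return tx.Commit()
}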

Another thing to consider, if this is a new table, is to use IMPORT, which can ingest CSV files directly.