Throughput not increasing when doubling nodes (3 to 6)

#1

I’m currently testing CockroachDB on AWS for our very simple yet demanding use case: registering as many users per second as possible using PHP (Apache module). I created a simplified ‘users’ table with two fields: id (string(36)) and name (string).

PHP is extremely annoying because each process is isolated, meaning each script execution has to open its own DB connection. This alone is a bottleneck for many DB systems I test. I also can’t batch the inserts; they have to be separate queries. To make matters worse, each connection triggers a SELECT on CockroachDB’s internal users table (system.users) for authentication. With all that in mind:

I started by running the following PHP script on a c5.2xlarge instance, with CockroachDB running on a single, separate c5.2xlarge instance.

<?php

require 'vendor/autoload.php';

use Ramsey\Uuid\Uuid;

// One new connection and one single-row INSERT per request (no pooling, no batching).
$dbh = new PDO('pgsql:host=x.x.x.x;port=26257;dbname=testdb;sslmode=disable',
  'testuser', null, array(
    PDO::ATTR_ERRMODE          => PDO::ERRMODE_EXCEPTION,
    PDO::ATTR_EMULATE_PREPARES => true,
));

$uuid4 = Uuid::uuid4();

$dbh->exec("INSERT INTO testdb.users (id, name) VALUES ('".$uuid4->toString()."', 'somename')");

Throughput: 2400 inserts/sec

I then added 2 more nodes and modified the PHP script to pick 1 of the 3 node IPs at random.

Throughput: 3400 inserts/sec

I was surprised the insert rate increased at all, since the default replication factor is 3. But maybe spreading connections across 3 nodes instead of 1 helped (for example, fewer open TCP connections per node).

I then added 3 more nodes, altered the PHP script again (more IPs), expecting the throughput to (sort of) double. But to my surprise…

Throughput: 3600 inserts/sec

A very, very modest increase. But hey, maybe the ranges hadn’t split yet (I hope I said that right/you know what I mean), so I kept the load test going. Ten minutes later, a couple of hundred MB of data had been added, which should have triggered more 64 MB range splits, but the throughput remained the same… 3600/sec.
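
(Side note on verifying that: one way to check how many ranges the table spans, and where each range’s leaseholder lives, is a query along the lines of this untested sketch. It assumes this CockroachDB version supports SHOW EXPERIMENTAL_RANGES; host and credentials are placeholders.)

<?php

// Untested sketch: count the ranges behind testdb.users and print each
// range's leaseholder. Assumes SHOW EXPERIMENTAL_RANGES is available in
// this CockroachDB version; host and credentials are placeholders.
$dbh = new PDO('pgsql:host=x.x.x.x;port=26257;dbname=testdb;sslmode=disable',
  'testuser', null, array(PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION));

$ranges = $dbh->query('SHOW EXPERIMENTAL_RANGES FROM TABLE testdb.users')->fetchAll(PDO::FETCH_ASSOC);

echo count($ranges)." range(s)\n";
foreach ($ranges as $r) {
  echo "range ".$r['range_id'].": leaseholder is node ".$r['lease_holder']."\n";
}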

So I suspected the client was the bottleneck and added another one. Running both clients simultaneously:

Throughput: 2x1800 inserts/sec

Looks like the client wasn’t the bottleneck after all. Does anyone here know what the bottleneck might be, or how I might find it?

Also, do I even understand the replication correctly? I’m under the assumption that if my replication factor is 3 and I have 6 nodes, these 6 nodes form 2 ‘groups’ of 3 nodes of some kind, each handling its own half (or range) of the data. If this assumption is incorrect, it might make a lot of sense that my throughput didn’t increase/double, but then I’d love to find out what exactly it is that I’m misunderstanding. Thanks!

(Ron Arévalo) #2

Hey @stijnv404,

Throughput can be affected by disk I/O, CPU usage, and network latency. Could you check the Admin UI and take note of the following metrics:

Disk I/O: Disk IOPS in progress
CPU usage: CPU percent
Network latency: Service latency SQL 99th percentile

Also, could you let me know the volume size you’re using on the c5.2xlarge machines?
As a side note, multi-row inserts are faster than multiple single-row INSERT statements, though I see that you’re not able to batch inserts.
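
In case batching ever becomes an option on your side, a rough, untested sketch of a multi-row INSERT through PDO could look like this (it reuses the testdb.users table and columns from your test, with bound parameters instead of string concatenation; the host is a placeholder):

<?php

require 'vendor/autoload.php';

use Ramsey\Uuid\Uuid;

// Untested sketch: insert several rows with a single INSERT statement.
$dbh = new PDO('pgsql:host=x.x.x.x;port=26257;dbname=testdb;sslmode=disable',
  'testuser', null, array(PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION));

$rows = array(
  array(Uuid::uuid4()->toString(), 'somename'),
  array(Uuid::uuid4()->toString(), 'othername'),
  array(Uuid::uuid4()->toString(), 'thirdname'),
);

// Build "(?, ?), (?, ?), (?, ?)" and flatten the row values for binding.
$placeholders = implode(', ', array_fill(0, count($rows), '(?, ?)'));
$values = array_merge(...$rows);

$stmt = $dbh->prepare("INSERT INTO testdb.users (id, name) VALUES $placeholders");
$stmt->execute($values);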

Thanks,

Ron

#3

Hi Ron,

Aren’t disk I/O, CPU, and network latency all things that should scale horizontally? I feel like I’m misunderstanding something about the replication factor in relation to the number of nodes. If my RF is 3 and I double the number of nodes from 3 to 6, should throughput theoretically double with my test? Or is that assumption wrong in the first place?

In any case, I’ll set up the test environment again and have a look at those stats. I’ll also set up a different test client in a language other than PHP to see if connection pooling helps (even though this won’t help me, as I have to use PHP). The volume is 100 GB General Purpose SSD block storage per node.

Thanks,
Stijn

#4

OK, so the test results were way better for Node.js using pg pool, matching my expectations. I did notice that the default replication factor of ‘system’ is 5, and each PDO connection causes a read there for authentication. I reduced this to RF 3 for my testing purposes, but I don’t know if it took effect since I did it after scaling up from 3 to 6 nodes. Scaling from 6 to 12 nodes still leaves me at 3.6k inserts/sec with PHP. All in all, it seems there’s something about PHP/PDO that prevents me from scaling out.
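
(For anyone following along: lowering the replication factor of the system database is a zone-config change along the lines of the sketch below. The exact syntax depends on the CockroachDB version, and it has to be run as an admin user such as root.)

<?php

// Untested sketch (version-dependent syntax): lower the replication factor
// of the `system` database from 5 to 3. Run as an admin user such as root;
// the host is a placeholder.
$dbh = new PDO('pgsql:host=x.x.x.x;port=26257;dbname=testdb;sslmode=disable',
  'root', null, array(PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION));

$dbh->exec('ALTER DATABASE system CONFIGURE ZONE USING num_replicas = 3');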


1 node / 1 client (PHP, pdo):

Throughput: 3k/s

Disk IOPS in progress: Back and forth between 1 and 0 every 10 to 30 seconds
CPU Percent: ~80%
Service latency SQL 99th percentile: ~80ms


1 node / 1 client (nodejs, pg pool):

Throughput: 11k/s

Disk IOPS in progress: between 1 and 5 (only dropped to 0 once in 3min)
CPU Percent: ~70%
Service latency SQL 99th percentile: ~20ms


3 nodes / 1 client (nodejs, pg pool):

Throughput: 9k/s

Disk IOPS in progress: 1 to 6 (mostly 1)
CPU Percent: ~50%
Service latency SQL 99th percentile: ~24ms


3 nodes / 2 clients (nodejs, pg pool):

Throughput: 13k/s

Disk IOPS in progress: 1 to 6 (mostly 1)
CPU Percent: ~65%
Service latency SQL 99th percentile: ~40ms


3 nodes / 3 clients (nodejs, pg pool):

Throughput: 14k/s

Disk IOPS in progress: 1 to 5 (mostly 1)
CPU Percent: ~70%
Service latency SQL 99th percentile: ~60ms


3 nodes / 1 client (PHP, pdo):

Throughput: 3k/s

Disk IOPS in progress: 0 to 1
CPU Percent: ~70%
Service latency SQL 99th percentile: ~170ms


6 nodes / 1 client (nodejs, pg pool):

Throughput: 11k/s

Disk IOPS in progress: ~1
CPU Percent: ~35%
Service latency SQL 99th percentile: ~25ms


6 nodes / 2 clients (nodejs, pg pool):

Throughput: 17k/s

Disk IOPS in progress: ~1
CPU Percent: ~60%
Service latency SQL 99th percentile: ~30ms


6 nodes / 3 clients (nodejs, pg pool):

Throughput: 22k/s

Disk IOPS in progress: 1 to 4
CPU Percent: ~60%
Service latency SQL 99th percentile: ~40ms


6 nodes / 1 client (PHP, pdo):

Throughput: 3.6k/s

Disk IOPS in progress: ~1
CPU Percent: ~30% but one node (node 1) is on 80% ???
Service latency SQL 99th percentile: ~160ms

More stats here: https://i.imgur.com/rPtCTZG.png (6-node cluster). The first ‘part’ of the graphs is the PHP test, then there’s a drop where I reset, and then the Node.js test starts.


12 nodes / 3 clients (nodejs, pg pool):

Throughput: 30k/s

Disk IOPS in progress: ~1
CPU Percent: ~50%
Service latency SQL 99th percentile: ~30ms


12 nodes / 6 clients (nodejs, pg pool):

Throughput: 45k/s

Disk IOPS in progress: ~1
CPU Percent: ~70%
Service latency SQL 99th percentile: ~40ms


12 nodes / 2 clients (PHP, pdo):

Throughput: 3.6k/s

Disk IOPS in progress: ~1
CPU Percent: ~20% and node1 ~80% (???)
Service latency SQL 99th percentile: ~250ms

Super weird that with PHP, 1 of the nodes has such high CPU while the rest remains fairly low.


PHP script

<?php

require 'vendor/autoload.php';

use Ramsey\Uuid\Uuid;

try {

  $hosts = array(
    "172.31.37.133", "172.31.39.37", "172.31.46.125", "172.31.45.100", "172.31.38.87", "172.31.39.11"
  );

  // New connection on every request, to a randomly chosen node.
  $dbh = new PDO('pgsql:host='.$hosts[array_rand($hosts)].';port=26257;dbname=testdb;sslmode=disable',
    'testuser', null, array(
      PDO::ATTR_ERRMODE          => PDO::ERRMODE_EXCEPTION,
      PDO::ATTR_EMULATE_PREPARES => true,
  ));

  $uuid4 = Uuid::uuid4();

  $dbh->exec("INSERT INTO testdb.users (id, name) VALUES ('".$uuid4->toString()."', 'myname')");

} catch (Exception $e) {
  // Errors are ignored for the purposes of this load test.
}

Node.js script

const pg = require('pg');
const uuidv4 = require('uuid/v4');
const cluster = require('cluster');
const numCPUs = require('os').cpus().length;

// One pool config per node; each worker picks one node at random.
const hosts = [
  '172.31.41.30', '172.31.41.217', '172.31.46.50',
  '172.31.37.133', '172.31.39.37', '172.31.46.125'
];
const configs = hosts.map(host => ({
  user: 'testuser',
  host: host,
  database: 'testdb',
  port: 26257
}));

if (cluster.isMaster) {
  console.log(`Master ${process.pid} is running`);

  // Fork one worker per CPU.
  for (let i = 0; i < numCPUs; i++) {
    cluster.fork();
  }

  cluster.on('exit', (worker, code, signal) => {
    console.log(`worker ${worker.process.pid} died`);
  });
} else {
  console.log(`Worker ${process.pid} started`);

  // Create a pool against a randomly chosen node.
  const pool = new pg.Pool(configs[Math.floor(Math.random() * configs.length)]);

  const concurrent = 600;

  // Fire `concurrent` single-row INSERTs; once they have all completed,
  // start the next batch.
  const insertBatch = function() {
    let doneSoFar = 0;

    for (let i = 0; i < concurrent; i++) {
      pool.connect(function(err, client, done) {
        if (err) {
          throw err;
        }
        client.query("INSERT INTO users (id, name) VALUES ('" + uuidv4() + "', 'myname');", function() {
          done();
          doneSoFar++;
          if (doneSoFar === concurrent) {
            insertBatch();
          }
        });
      });
    }
  };

  insertBatch();
}
(Tim O'Brien) #5

@stijnv404, I’ve been trying to reproduce your tests on GCE using those scripts without success. Probably something I’m doing wrong.

What do you see on the statements page when you run the tests? I suspect that the churn in connections is exhausting the leaseholder on the system.users table. You can confirm this in two steps:

  1. Run SHOW EXPERIMENTAL_RANGES FROM TABLE system.users; on your cluster, note the lease_holder, and see if this is the same node with high CPU utilization.
  2. Check the statements page in the UI and see whether or not there’s high latency on a query like SELECT "hashedPassword" FROM system.users WHERE (username = $1) AND ("isRole" = _).

If the high-utilization node = the system.users leaseholder, and you see high volume and latency on the SELECT "hashedPassword" query during your PHP test, then the culprit is the connection churn (every new connection re-runs that authentication query).

Generally we expect connections to be pooled or persistent, at least for some time. I’m not a PHP developer, but some searching turned up https://www.php.net/manual/en/function.pg-pconnect.php - it looks like this will at least keep a connection to a Postgres database open across requests.
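
A rough, untested sketch of what that could look like against your test table (the host is a placeholder; pg_pconnect keeps the connection open across requests handled by the same Apache/PHP worker process):

<?php

// Untested sketch: persistent connection via pg_pconnect, then one insert.
// Host and credentials are placeholders; the table is testdb.users from above.
$conn = pg_pconnect('host=x.x.x.x port=26257 dbname=testdb user=testuser sslmode=disable');

$id = bin2hex(random_bytes(16)); // stand-in for a UUID string
pg_query_params($conn, 'INSERT INTO users (id, name) VALUES ($1, $2)', array($id, 'somename'));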

Let me know if the leaseholder is the node with high utilization and whether pg_pconnect would work for you.

#6

Hi Tim,

You were correct on both counts, and with pg_pconnect the issue seems to be gone. With 3 nodes I can now get 9k inserts/s, just like with Node.js. I also found out that the same can be achieved with PHP PDO by using the extra setting PDO::ATTR_PERSISTENT => true, like so:

$dbh = new PDO('pgsql:host='.$hosts[array_rand($hosts)].';port=26257;dbname=testdb;sslmode=disable',
  'testuser', null, array(
    PDO::ATTR_ERRMODE          => PDO::ERRMODE_EXCEPTION,
    PDO::ATTR_EMULATE_PREPARES => true,
    PDO::ATTR_PERSISTENT       => true
));

Maybe this is something you’d want to add to the docs at https://www.cockroachlabs.com/docs/stable/build-a-php-app-with-cockroachdb.html, since that’s where I got the code sample from.

Thanks a lot for the info. If I still run into scaling issues, I’ll report back.

(Tim O'Brien) #7

@stijnv404 - thanks for the heads up. I just created a PR to fix the examples: https://github.com/cockroachdb/docs/pull/4804. It should be fixed within the next couple of days.