I’m currently testing CockroachDB on AWS for our very simple yet demanding use case: registering as many users per second as possible using PHP (Apache module). I created a simplified ‘users’ table with two fields: id (string(36)) and name (string).
PHP is extremely annoying here because each process is isolated, meaning each script execution has to open its own DB connection. That alone is a bottleneck for many DB systems I test. I also can’t batch the inserts; they have to be separate queries. To make matters worse, each connection triggers a SELECT on CockroachDB’s internal users table for authentication. With all that in mind:
I started by running the following PHP script on a c5.2xlarge instance, with CockroachDB running on a single, separate c5.2xlarge instance.
require 'vendor/autoload.php';

use Ramsey\Uuid\Uuid;

$dbh = new PDO(
    'pgsql:host=x.x.x.x;port=26257;dbname=testdb;sslmode=disable',
    'testuser',
    null,
    array(
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
        PDO::ATTR_EMULATE_PREPARES => true,
    )
);

$uuid4 = Uuid::uuid4();
$dbh->exec("INSERT INTO testdb.users (id, name) VALUES ('" . $uuid4->toString() . "', 'somename')");
Throughput: 2400 inserts/sec
I then added 2 more nodes, and modified the PHP script to select one of the three node IPs at random.
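The node selection I added is just a random pick from a hardcoded list, roughly like this (the IPs are placeholders, and the connection itself is the same as in the script above):

```php
<?php
// Hypothetical list of CockroachDB node addresses (placeholders).
$nodes = array('x.x.x.1', 'x.x.x.2', 'x.x.x.3');

// Pick one node at random so connections spread across the cluster.
$host = $nodes[array_rand($nodes)];

// Build the DSN for that node; connect with new PDO($dsn, ...) as before.
$dsn = 'pgsql:host=' . $host . ';port=26257;dbname=testdb;sslmode=disable';
```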
Throughput: 3400 inserts/sec
I was surprised the inserts increased at all, since the default replication factor is 3. But maybe spreading the connections across 3 nodes instead of 1 helped (for example, fewer open TCP connections per node than with everything hitting a single node).
I then added 3 more nodes, altered the PHP script again (more IPs), expecting the throughput to (sort of) double. But to my surprise…
Throughput: 3600 inserts/sec
A very, very moderate increase. But hey, maybe the ranges hadn’t split yet (I hope I’m saying that right / you know what I mean), so let’s keep the load test going. Ten minutes later, a few hundred MB of data had been added, which should have triggered more 64 MB splits, but the throughput remained the same… 3600/sec
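One way I could verify the split assumption instead of guessing is to count the table’s ranges. A rough sketch, assuming the SHOW EXPERIMENTAL_RANGES statement is available in this CockroachDB version (newer versions spell it SHOW RANGES FROM TABLE), and requiring a live cluster:

```php
<?php
// Assumption: SHOW EXPERIMENTAL_RANGES exists in this CockroachDB version.
$dbh = new PDO(
    'pgsql:host=x.x.x.x;port=26257;dbname=testdb;sslmode=disable',
    'testuser',
    null,
    array(PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION)
);

// Each row describes one range of the table and which nodes hold its replicas.
$stmt = $dbh->query('SHOW EXPERIMENTAL_RANGES FROM TABLE users');
$ranges = $stmt->fetchAll(PDO::FETCH_ASSOC);

// More than 1 range means the table has actually split.
echo count($ranges) . " range(s)\n";
```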
So I assumed the client was the bottleneck and added another. Running both clients simultaneously:
Throughput: 2x1800 inserts/sec
Looks like the client wasn’t the bottleneck after all. Does anyone here know what the bottleneck might be, or how I might find it?
Also, do I even understand the replication correctly? I’m under the assumption that with a replication factor of 3 and 6 nodes, those 6 nodes form 2 ‘groups’ of 3 of some kind, each handling its own half (or range) of the data. If that assumption is incorrect, it might make a lot of sense that my throughput didn’t increase/double, but then I’d love to find out what exactly it is that I’m misunderstanding. Thanks!