Reproducing YCSB results for CRDB

Hello,
I have been using CRDB for a while and it’s really an amazing DB. I first learned about CRDB from the published research paper.
I was interested specifically in reproducing the YCSB results mentioned in the paper for 3 servers.


Although the paper reports the results, it does not describe how to reproduce them. It would be great if someone could share the steps.
The post at https://www.cockroachlabs.com/blog/unpacking-competitive-benchmarks/ says "We will publish reproduction steps for the primary tests in this document and will update this post as soon as they are available", but I couldn’t find anything beyond that.

Thanks and Regards,
Ritesh Sinha.

Hi Ritesh,

Thanks for the question. I’m moving your specific questions over from this Slack thread: https://cockroachdb.slack.com/archives/CP4D9LD5F/p1599675728258000.

  1. I am trying to reproduce the workload A results for 3 servers (4 vCPUs). Can you please let me know the hardware config of the client and server machines in terms of memory and network bandwidth, and whether the storage was SSD?

This was run on a 3-node cluster of GCP n1-standard-{4,8,16} machines. Each machine used one local SSD with an NVMe interface. If we ran this today, we would instead recommend n2-standard-{4,8,16} machines with 2 local SSDs (the minimum) for moderately improved performance. Either will work.

  2. What config was used to start CockroachDB on the 3 servers, such as store, cache, and max-sql-memory?

We followed these steps to manually deploy the 3 servers: https://www.cockroachlabs.com/docs/v20.2/deploy-cockroachdb-on-google-cloud-platform-insecure. store was set to the SSD’s mountpoint, cache to .25 (--cache=.25), and max-sql-memory to .25 (--max-sql-memory=.25).
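
For anyone scripting the deployment, here is a minimal Python sketch that assembles the per-node `cockroach start` command line from the settings above. The mountpoint `/mnt/data1` and the node addresses are hypothetical placeholders, and the `--advertise-addr` and `--join` values are assumptions based on a typical 3-node insecure setup rather than the exact commands used for the paper. It only prints the commands; it does not run them.

```python
import shlex


def cockroach_start_args(store_path, advertise_addr, join_addrs):
    """Build an argv list for `cockroach start` with the cache settings quoted above."""
    return [
        "cockroach", "start",
        "--insecure",                 # the linked guide is the insecure deployment
        f"--store={store_path}",      # SSD mountpoint (hypothetical path in the demo below)
        "--cache=.25",                # value stated in the reply
        "--max-sql-memory=.25",       # value stated in the reply
        f"--advertise-addr={advertise_addr}",
        f"--join={','.join(join_addrs)}",
    ]


if __name__ == "__main__":
    # Hypothetical node addresses, for illustration only.
    nodes = ["node1:26257", "node2:26257", "node3:26257"]
    for node in nodes:
        print(shlex.join(cockroach_start_args("/mnt/data1", node, nodes)))
```

After the three nodes are started this way, the cluster still needs a one-time `cockroach init` against any one node, as described in the linked guide.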

Please let me know if you have other questions,

Nathan

Thanks @nathan for the information.
Can you also help me with the following YCSB configuration values, please?

  1. insertorder value
  2. db.batchsize value
  3. requestdistribution value
  4. insertcount value
  5. recordcount value
  6. operationcount value

Also, please let me know if there is any specific (non-default) constraint/config imposed in terms of YCSB or starting the CockroachDB servers apart from the steps mentioned here - https://www.cockroachlabs.com/docs/v20.2/deploy-cockroachdb-on-google-cloud-platform-insecure.

Hi Ritesh, these were the YCSB configurations:

insertorder = unspecified
db.batchsize = unspecified
requestdistribution = unspecified, used each workload’s default value
insertcount = unspecified
recordcount = 10000000
operationcount = 1000000
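
As a rough illustration of what this list means in practice, the Python sketch below records the six settings and renders only the explicitly set ones as `key=value` lines. The `key=value` output format is an assumption about how the values would be handed to the load generator; the thread does not say which YCSB client or invocation was used, so treat the names `SETTINGS` and `to_properties` as hypothetical helpers.

```python
# Values stated in the reply; None marks a setting that was left unspecified,
# meaning the workload's own default applies.
SETTINGS = {
    "insertorder": None,
    "db.batchsize": None,
    "requestdistribution": None,
    "insertcount": None,
    "recordcount": 10_000_000,
    "operationcount": 1_000_000,
}


def to_properties(settings):
    """Render only the explicitly set values as key=value lines."""
    return "\n".join(
        f"{key}={value}" for key, value in settings.items() if value is not None
    )


if __name__ == "__main__":
    print(to_properties(SETTINGS))
```

Only `recordcount` and `operationcount` appear in the output, which mirrors the reply: everything else was left at its default.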

Also, please let me know if there is any specific (non-default) constraint/config imposed in terms of YCSB or starting the CockroachDB servers apart from the steps mentioned here - https://www.cockroachlabs.com/docs/v20.2/deploy-cockroachdb-on-google-cloud-platform-insecure.

The only other thing I can think of is that the 3-node cluster was run across 3 zones in the same region. The load generator was run from another VM in that same region.

Also, just as a reminder, the schema was set up like:

CREATE TABLE usertable (
    ycsb_key VARCHAR(255) PRIMARY KEY NOT NULL,
    FIELD0 TEXT NOT NULL,
    FIELD1 TEXT NOT NULL,
    FIELD2 TEXT NOT NULL,
    FIELD3 TEXT NOT NULL,
    FIELD4 TEXT NOT NULL,
    FIELD5 TEXT NOT NULL,
    FIELD6 TEXT NOT NULL,
    FIELD7 TEXT NOT NULL,
    FIELD8 TEXT NOT NULL,
    FIELD9 TEXT NOT NULL,
    FAMILY (ycsb_key),
    FAMILY (FIELD0),
    FAMILY (FIELD1),
    FAMILY (FIELD2),
    FAMILY (FIELD3),
    FAMILY (FIELD4),
    FAMILY (FIELD5),
    FAMILY (FIELD6),
    FAMILY (FIELD7),
    FAMILY (FIELD8),
    FAMILY (FIELD9)
)
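
The schema above places the key and each of the ten fields in its own column family. In CockroachDB each column family is stored as a separate key-value pair, so an update to one field should not need to rewrite the others. A small Python sketch that regenerates the same statement, in case it is useful for scripting the setup (the function name `usertable_ddl` and the trailing semicolon are additions for illustration, not part of the original reply):

```python
def usertable_ddl(num_fields=10):
    """Return the CREATE TABLE statement with one column family per column."""
    columns = ["    ycsb_key VARCHAR(255) PRIMARY KEY NOT NULL"]
    columns += [f"    FIELD{i} TEXT NOT NULL" for i in range(num_fields)]
    # One family for the key, then one family per field.
    families = ["    FAMILY (ycsb_key)"]
    families += [f"    FAMILY (FIELD{i})" for i in range(num_fields)]
    body = ",\n".join(columns + families)
    return f"CREATE TABLE usertable (\n{body}\n);"


if __name__ == "__main__":
    print(usertable_ddl())
```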