Wondering what config can satisfy the following SLA

Hi,

We are evaluating CockroachDB for the following use case:

  1. UUID Primary Key - String Value
  2. ~ 5 Billion records
  3. ~ 15000 qps for read
  4. ~ 20000 records/s for write
  5. delete every record whose last_updated_at is more than 1 year old
  6. replicated across multiple data centers and cloud providers, but sync latency can be seconds
  7. read latency p95 < 15 ms

Wondering if CockroachDB can do this, and if so, how? What would the cluster/DB config look like? Can we deploy this on Kubernetes?

Unless you left something out, it sounds like you should use a key-value database. CockroachDB is great when you need a relational database that scales, but I wouldn’t choose it if you don’t need the relational features.

Take a look at Couchbase. It scales extremely well, can easily handle your load on a minimal cluster, and supports cross-datacenter replication; if you need more functionality than key-value storage, it also supports document storage and querying with N1QL, which is very similar to SQL.

But you don’t get foreign key constraints or transactions. Cockroach fills a really awesome need for scaling an RDBMS, but it’s not the best solution if you don’t need an RDBMS.

Thanks for your advice.
The reason I did not mention this previously is that the CockroachDB documentation says it has only a single-digit-percentage slowdown when used as a key-value store:
https://www.cockroachlabs.com/docs/stable/frequently-asked-questions.html#can-i-use-cockroachdb-as-a-key-value-store

And we are thinking of tech consolidation for our user metadata DB (relational data currently in MongoDB - I know it is a bad idea, we need to fix it) and our key-value transaction data storage (currently in DynamoDB, but we want to be cloud-agnostic in the long run).

I agree CockroachDB might not be the best solution for this use case, but I’m still wondering whether it is possible, and how expensive it would be for CockroachDB to hit that SLA, to see if the extra effort is worth the benefit of the tech consolidation.

Hi @Jiale, thanks for the interest in CockroachDB. As you noted above, CockroachDB can be used as a key value store through SQL with very little loss in performance. I’ll break down each of your requirements:

UUID Primary Key - String Value

CockroachDB provides a UUID data type specifically for this use case. Because UUIDs are a fixed 128 bits, they can be encoded more efficiently than a variable-length string.
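As a minimal sketch of such a schema (table and column names here are hypothetical, not from your description):

```sql
-- Hypothetical KV-style table: UUID primary key, string value.
-- gen_random_uuid() lets the database generate keys if the client doesn't.
CREATE TABLE kv (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    value STRING NOT NULL
);
```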

~ 5 Billion records

Given enough disk space, this should be manageable. Do you have an estimate of how large each record will be?

~ 15000 qps for read

This should be fine.

~ 20000 records/s for write

This should also be fine as long as these writes are not contended (i.e., not repeatedly updating the same rows).

delete every record whose last_updated_at is more than 1 year old

CockroachDB doesn’t yet support row-level TTLs. This is discussed further in this forum post: Auto-Expiring data. You can probably work around this with a cron job.
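As a sketch of that workaround (assuming a hypothetical kv table with a last_updated_at timestamp column), the cron job could issue a batched delete and repeat it until no rows remain, so a single statement never touches billions of rows at once:

```sql
-- Run periodically from a cron job; re-run until it reports 0 rows deleted.
-- The LIMIT keeps each transaction small and cheap.
DELETE FROM kv
WHERE last_updated_at < now() - INTERVAL '1 year'
LIMIT 10000;
```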

replicated across multiple data centers and cloud providers, but sync latency can be seconds

Check out this blog post :slight_smile:

read latency p95 < 15 ms

This is very reasonable given proper SQL indexing. Point lookups should be in the single-digit ms range.
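Assuming a hypothetical kv table keyed on a UUID id column, a read like the one below is a point lookup on the primary index, which needs no additional indexing:

```sql
-- A single-row lookup by primary key.
-- Running this under EXPLAIN should show a scan over exactly one key.
SELECT value FROM kv
WHERE id = 'a1b2c3d4-0000-0000-0000-000000000000';
```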

Can we deploy this on Kubernetes?

Yes, please see our docs on this.

Thanks for the info! Really helpful! @nathan
Could you please be more specific about how to do “proper SQL indexing” to achieve the single-digit-ms range?

Currently I do not have any indexes on my DB.

Here is what I can see from the dashboard:
Queries per second: 512.5
P50 latency: 10.5 ms
P99 latency: 27.3 ms

Here is my cluster config:
GKE 1.8.5-gke.0
5 nodes
n1-highcpu-16 (16 vCPUs, 14.4 GB memory)
Container-Optimized OS (cos)
standard storage class

So far only one record has been inserted; the string value is less than 100 characters.
The load is generated with random UUIDs.

How can I diagnose where the bottleneck is?

And it seems that if we go with the cron-job workaround for TTL, we will need an extra column for the timestamp, in addition to the value column.
In that case, can we still achieve single-digit-ms p95 read latency?

Also, when I use the pq client (Golang) to query the cluster, the p95 latency measured from the client side is almost always close to 100 ms just for the query to come back from CRDB, regardless of whether I co-locate the CockroachDB pod with the client pod on the same node.
Meanwhile, the cluster-side numbers are quite different, as mentioned above.
Not sure if there is anything I can improve.

How can I diagnose where the bottleneck is?

One thing that sticks out to me is GKE. @a-robinson was just explaining that, by default, Kubernetes is not configured optimally. I’m not aware of the details beyond the fact that it has to do with inefficiencies in the Kubernetes network stack. I think Alex is working on drafting a doc on how to run with a better Kubernetes config, but he may be able to provide specifics here as well.

we will need to have extra column to set the timestamp, in addition to the value column

Yes, you’ll want to add a second column to your KV table to track the timestamp. This shouldn’t have any noticeable impact on read or write latency.
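Assuming a hypothetical kv table, adding the column (plus an index on it, so the periodic cleanup job can find expired rows without a full table scan) might look like this:

```sql
-- Track the last update time; default keeps writes simple.
ALTER TABLE kv
    ADD COLUMN last_updated_at TIMESTAMPTZ NOT NULL DEFAULT now();

-- Index so the TTL cleanup can range-scan by age instead of scanning all rows.
CREATE INDEX kv_last_updated_at_idx ON kv (last_updated_at);
```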

The p95 latency measured from the client side is almost always close to 100 ms just for the query to come back from CRDB, regardless of whether I co-locate the CockroachDB pod with the client pod on the same node.

This may also be a Kubernetes issue. Is this only when using the pq client?

Hi @Jiale,

Nathan is correct to say that the performance you’re seeing on GKE is being affected by the fact that you’re running on Kubernetes with the default configuration. Unfortunately, the default configuration has been optimized for working everywhere rather than for being as performant as possible. Sometime next week I’ll write up documentation explaining the problems and different ways of deploying cockroach on Kubernetes to improve performance, but in my testing I was seeing about 55% worse throughput and about 2x worse latency on a default GKE configuration than directly on the same VMs.

If you don’t have strong reasons for running on Kubernetes, you’ll get the best performance by running directly on your VMs.

If you do have strong reasons for running on Kubernetes, it’ll take some work to improve the performance. You can try small things like using an SSD storage class, increasing the volume size (on GCE, bigger disks give more IOPS), and using host networking (example). Or you can try larger changes, like setting up a GKE node pool with local SSDs and deploying cockroach on it using a DaemonSet (example). Or you can go even further and try to improve the network configuration of your Kubernetes cluster itself, for example:


http://machinezone.github.io/research/networking-solutions-for-kubernetes/

Thanks for the insight, @a-robinson.

We are in the middle of moving all of our infrastructure into Kubernetes; anything not running on Kubernetes will require going through a special approval process. So it would be great if we can make Kubernetes work.

We will try your suggestions and look forward to your documentation.

@a-robinson, I am trying to deploy the DaemonSet version from your example and am running into the following two questions:

Are the three IPs the external IPs of the initial nodes in the CockroachDB node pool?
- “exec /cockroach/cockroach start --logtostderr --insecure --http-host 0.0.0.0 --cache 25% --max-sql-memory 25% --join=10.128.0.4,10.128.0.5,10.128.0.3”

And how to initialize the cluster with this daemonset?

Thanks!

They are the IP addresses of the VMs. In that example config, I was using the VMs’ internal IP addresses.

Once you’ve created the pods, you can initialize the cluster by picking any of the pods and running kubectl exec -it <podname> -- /cockroach/cockroach init --insecure

Hi @a-robinson,

Any updates on your new write up for the kubernetes deployment?
Would love to read it.

Nothing yet, sorry @Jiale. I’ll ping this thread once it’s written.

Well it took a little longer than expected to publish, but you can find Kubernetes-specific performance guidance here: https://www.cockroachlabs.com/docs/stable/kubernetes-performance.html