How does CockroachDB utilize multiple disks on a single node?

I have 3 nodes, each with 6 SSD disks.

As per the production recommendations (https://www.cockroachlabs.com/docs/stable/recommended-production-settings.html), we run one instance per node with multiple stores.

The command line looks like this:

/usr/local/bin/cockroach start --certs-dir=/etc/cockroach/ssl --http-host=127.0.0.1
--http-port=8080 --cache=5% --max-sql-memory=80%
--store=path=/mnt/sda4,attrs=ssd,size=750GB
--store=path=/mnt/sdb4,attrs=ssd,size=750GB
--store=path=/mnt/sdc,attrs=ssd,size=750GB
--store=path=/mnt/sdd,attrs=ssd,size=750GB
--store=path=/mnt/sde,attrs=ssd,size=750GB
--store=path=/mnt/sdf,attrs=ssd,size=750GB
--locality=region=US,datacenter=X --port=26257
--logtostderr
--join=10.1.1.2:26257,10.1.1.3:26257
--host=10.1.1.1
--advertise-host=10.1.1.1

What I see once the cluster is initialized and I start filling it with data is that, instead of spreading ranges across multiple disks, CockroachDB selects one disk per node and starts filling it.

This can be verified with tools like iostat, node_exporter, and df, which show that all of the I/O goes to one disk and disk usage grows on that one disk only.
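
For reference, these are the kinds of checks I am running (the device names match my layout above and would differ on other setups):

iostat -x 5
df -h /mnt/sda4 /mnt/sdb4 /mnt/sdc /mnt/sdd /mnt/sde /mnt/sdf

Both show activity and usage concentrated on a single disk per node.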

This is especially true at the beginning, when the database is freshly initialized and I start loading data into it.

Is there a way to control how the disks are utilized, for example creating new ranges in round-robin fashion across the stores?

What do you think about this in general? Does this make any sense at all?

My idea is that if we have multiple disks and write to them simultaneously, we will get higher write throughput.


Hello!
Thank you for this interesting question. We are aware of a current limitation in CockroachDB's ability to automatically spread ranges across multiple stores on one node.

The situation improves when there are multiple nodes: as ranges are rebalanced from one node to another, CockroachDB tends to spread the replicas across the multiple stores automatically.

We are working on this limitation for CockroachDB 2.1 and beyond.

In the meantime, as of CockroachDB 2.0, our recommendation for the most “automatic” load balancing between SSD drives is to use filesystem-level striping across the multiple disks. This increases read and write bandwidth just as well as using multiple stores in CockroachDB would.
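
For example, a minimal sketch of such striping with mdadm (the device names are assumptions derived from your mount points, and ext4 is just one possible filesystem; adapt both to your setup):

mdadm --create /dev/md0 --level=0 --raid-devices=6 /dev/sda4 /dev/sdb4 /dev/sdc /dev/sdd /dev/sde /dev/sdf
mkfs.ext4 /dev/md0
mount /dev/md0 /mnt/cockroach

cockroach start would then be pointed at a single store, e.g. --store=path=/mnt/cockroach,attrs=ssd, instead of six separate stores.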

The main reason the multiple-stores-per-node feature exists in CockroachDB is to combine it with zone configurations, so you can choose which table data goes to which store (and therefore which drives). This is important, for example, when you have stores with different performance characteristics and you want to exploit that difference when laying out your data.
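
For illustration, a sketch of such a zone configuration (the database and table names are hypothetical, and it assumes the stores were started with different attributes, e.g. attrs=ssd vs attrs=hdd); in CockroachDB 2.0 this would look roughly like:

echo 'constraints: [+ssd]' | cockroach zone set mydb.hot_table --certs-dir=/etc/cockroach/ssl -f -

and in CockroachDB 2.1 and later the equivalent can be expressed in SQL:

ALTER TABLE mydb.hot_table CONFIGURE ZONE USING constraints = '[+ssd]';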

When the underlying storage performance is the same, that benefit of partitioning the data disappears, and a filesystem-level solution may be more appropriate.

Does this clarify things?

To clarify: when a replica is rebalanced from one node to another, CockroachDB automatically picks the store on the new node that already has the fewest replicas. That is why it tends to use the multiple stores evenly during rebalancing.

The limitation thus mainly applies to intra-node rebalancing.
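
If you want to verify how replicas end up spread across the stores, besides the Admin UI you can, I believe, query the internal store status table, for example:

SELECT node_id, store_id, range_count FROM crdb_internal.kv_store_status;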

You clarified that perfectly. Thank you for the detailed answer @knz

I decided to create a pull request for this. I hope you will find it useful:

https://github.com/cockroachdb/docs/pull/3488

Once again thank you for your help.


Thank you for your contribution to our docs!