Docker Swarm Scaling

This was an excellent article on deploying Cockroach on Docker Swarm with secrets:

https://www.cockroachlabs.com/docs/stable/orchestrate-cockroachdb-with-docker-swarm.html

It creates 3 separate Docker services and lets them talk over an overlay network.

But, why 3 separate services?

Does each CockroachDB node require a separate Cert/Key?
Can you not use the same CA/Cert/Key created across all Cockroach nodes in a cluster?

Also, Docker Swarm 1.13.1 uses TLS encryption between each swarm node, so any “internal” communications between physical swarm nodes are secure. And since we specified a dedicated network, cockroachdb, other containers can’t sniff the packets.
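For reference, creating that dedicated overlay network looks roughly like this (the --opt encrypted flag is my addition, not from the article - swarm’s TLS covers management traffic, while overlay data traffic is only encrypted when the network is created with that option):

  # dedicated overlay network for the cockroach containers
  # (--opt encrypted turns on IPsec for container-to-container traffic)
  docker network create --driver overlay --opt encrypted cockroachdb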

So do we need to run in “secure” mode in Docker Swarm?


The key differences seem to be these lines:

--replicas 1
--mount type=volume,source=cockroachdb-3,target=/cockroach/cockroach-data,volume-driver=local

The first one makes the service run only one instance.

And the second line creates a named local data volume (cockroachdb-3), mounted locally on the machine. Ok, I get that. Basically this:

  • Allows the creation of a dedicated data volume for that one Cockroach node running in that one container, on whatever swarm node it starts on.
  • Mounts the data volume on the swarm cluster node’s local disk, instead of in a virtual data container - perhaps because it is much faster.
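For context, those lines come from a service definition shaped roughly like this (sketching from memory of the linked doc; the exact image tag and start flags may differ):

  docker service create \
    --replicas 1 \
    --name cockroachdb-3 \
    --network cockroachdb \
    --mount type=volume,source=cockroachdb-3,target=/cockroach/cockroach-data,volume-driver=local \
    --stop-grace-period 60s \
    cockroachdb/cockroach start \
    --join=cockroachdb-1:26257 \
    --logtostderr \
    --insecure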

Does Cockroach need fast SSD IOPS?
Can it survive on 300 IOPS poor-speed drives?

If so, can we drop the volume-driver=local option?

If we can, that would:

  • Create anonymous data volumes, linked to the running Cockroach instance. They should persist between restarts, though I’ll have to research how rolling upgrades behave.
  • Allow us to define just 1 cockroach service for the entire swarm, and use the scale=X feature to scale the cockroach Docker Swarm service to X instances - even when there are only Y physical swarm machines (nodes) available, as an example.

That would give us more saturation options for underutilized boxes in the swarm, specifically by allowing us to scale up to more instances than there are physical swarm boxes.
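Hypothetically, something like this (the join address pointing at the service’s own DNS name is a guess on my part, untested):

  # one service for the whole cluster; omitting source= in the mount
  # gives each task its own anonymous local volume
  docker service create \
    --name cockroachdb \
    --replicas 3 \
    --network cockroachdb \
    --mount type=volume,target=/cockroach/cockroach-data \
    cockroachdb/cockroach start --join=cockroachdb:26257 --insecure

  # later, saturate underutilized boxes
  docker service scale cockroachdb=5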

This could in theory work. But I am wondering if it is a bad idea.


Well, that’s a lot of questions :) I’ll try to take them one by one.

But, why 3 separate services?

Because each cockroachdb container needs to be independently addressable, and at the time we created the docs there were some issues in cockroach that made it not work if its network address changed across restarts. The separate services give each node its own network identity. Also, for secure mode, we need to know the network identity of each node in advance to properly sign its certificate.

Since we’ve fixed up all the known issues with the network address changing across restarts, you might now be able to run a cluster as just a single service, but I haven’t tried it out. And to do a secure cluster, you’d need something like dynamic certificate signing requests on startup to get certs generated for the container’s random hostname.

Does each CockroachDB node require a separate Cert/Key?
Can you not use the same CA/Cert/Key created across all Cockroach nodes in a cluster?

It’s best practice to give each node its own cert/key, but it’s not strictly required. If you wanted, you could generate a node cert/key with multiple hostnames included in it (e.g. cockroachdb-1,cockroachdb-2,cockroachdb-3) and use it for all of the nodes.
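For example, with cockroach’s cert commands, one node cert covering all three service names would look something like this (directory paths here are just placeholders):

  # a single node cert/key valid for all three service hostnames
  cockroach cert create-node \
    cockroachdb-1 cockroachdb-2 cockroachdb-3 \
    localhost 127.0.0.1 \
    --certs-dir=certs \
    --ca-key=my-safe-directory/ca.key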

So do we need to run in “secure” mode in Docker Swarm?

If your threat model is just to be concerned about packet sniffing, then probably not. But if your threat model is about being able to authenticate clients connecting to the cluster, then yes, since authentication only really works in secure clusters. See our docs for info on what you give up when running in insecure mode.
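Concretely, the difference is just a startup flag on each node (a sketch; the join address is illustrative):

  # insecure mode: no TLS or client authentication from cockroach itself
  cockroach start --insecure --join=cockroachdb-1:26257

  # secure mode: certs required, so clients can be authenticated
  cockroach start --certs-dir=certs --join=cockroachdb-1:26257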

Does Cockroach need fast SSD IOPS?

It doesn’t truly need it, but it’s usually worth provisioning SSD for your database if you can (no matter what database software you’re running).

Can it survive on 300 IOPS poor-speed drives?
If so, can we drop the volume-driver=local option?

Yes, but its performance won’t be very good. If it meets your needs, though, go for it.

I do think that what you propose should be feasible. I don’t think I’d do it for my personal projects though. Having two cockroach processes on the same machine, one using resources x and one using resources y, will almost always perform worse than having one cockroach process on the machine using x+y resources.

Thanks for the detailed replies!

Yes, I now see that certs are needed for authentication - at the time of writing I did not know that. And thanks for the link; it makes sense to protect it now.

I did not plan to expose the ports outside of the swarm (internal use only) - e.g., even “backups” would be performed from within the swarm itself. Admin UI access would still be needed, though.

Overall, I was trying to limit the bootstrapping and maintenance burden. Docker Swarm’s secrets feature is too good to pass up; therefore, running CockroachDB within the swarm makes total sense - especially to protect the certs from unauthorized access (swarm stores secrets encrypted in its Raft log, and as long as you use the autolock feature to protect them, they cannot be accessed).
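For example (sketching this from memory; the secret names are just ones I made up):

  # store the cert material as encrypted swarm secrets
  docker secret create cockroachdb-ca-crt ca.crt
  docker secret create cockroachdb-node-crt node.crt
  docker secret create cockroachdb-node-key node.key

  # each service can then mount them under /run/secrets, e.g.
  #   --secret source=cockroachdb-ca-crt,target=ca.crt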

I think I’ll fall back to the original link I supplied for now. It’s documented and should work without fail. If I have time, I can set up a test cluster to beat on.

Thank you!