Replication zone config format semantics

I’m reading through the Replication Zone documentation and am a bit confused by the zone config format semantics, specifically the attribute declarations for zone configurations. All the examples use 3 attributes, which makes sense for the availability zone examples (replicate across us-west-1a, us-east-1a, etc.), but the SSD example throws me off:

replicas:
- attrs: [ssd]
- attrs: [ssd]
- attrs: [ssd]

Is there a need for ssd to be stated as an attribute 3 times? Moreover, is there some magic in having 3 replica attributes? Is it possible to list more than 3 attributes?


Also, a small housekeeping note: the examples on that page for starting nodes with store attributes don’t work. They all declare the store as --store=path=node1-data,attr=ssd, but it should be attrs=, not attr=.

Thanks for catching that error, @twrobel. I’ll fix that shortly.

Regarding the ssd example, it assumes that:

  • there are 5 nodes in the cluster
  • 3 of the nodes have ssd storage
  • you want to replicate 3-ways by default
  • you want each replica to live on a node with ssd storage

The example shows how you would achieve those replication goals: each attrs entry under replicas corresponds to one replica, so for each replica, you specify that it should live on a node with ssd storage.

An upcoming change to the zone config format and logic will make it unnecessary to specify ssd 3 times, but for now, that’s what’s required. Does that help?

Does that mean that if I had a larger cluster and wanted to replicate data across 5 nodes, I would need to list the attrs entry 5 times? Or does “3-way” replication refer to something other than the number of physical nodes that the data is replicated to?

Yes, if you wanted 5 replicas across 5 nodes, you would need to list the attrs entry 5 times. I’m pretty sure we never allow more than 1 replica per node, but @Bram can tell us for sure.
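
To make that concrete, here is a rough sketch of what a 5-way version of the ssd example would look like under the current format. It assumes the cluster has at least 5 nodes whose stores were started with attrs=ssd. With only 3 ssd nodes, as in the original example, there would not be enough matching nodes for all 5 replicas if each node holds at most one replica.

```yaml
# Sketch only: one attrs entry per desired replica (5 entries = 5 replicas).
# Assumes at least 5 nodes in the cluster advertise the ssd attribute.
replicas:
- attrs: [ssd]
- attrs: [ssd]
- attrs: [ssd]
- attrs: [ssd]
- attrs: [ssd]
```

The number of entries, not the attribute names, sets the replication factor here, so going back to 3-way replication just means trimming the list back to 3 entries.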