Questions on multi-datacenter deployment architecture

Hi there,

I have a couple of questions about multi-DC deployment architecture. Some of them might not make sense, as CockroachDB is new to me. :-)

  1. Geo-partitioning of data within the same logical table
    Assume we have a user table keyed by user_id, with roughly half of the users homed on the West Coast and half on the East Coast. To strike a better balance between latency and availability, we want two topologies, a west topology (us-west1, us-west2, us-central) and an east topology (us-east1, us-east2, us-central), to store the data homed on each coast. The application can afford to pass in topology hints to tell CockroachDB which topology a request should go to. Does CockroachDB provide a mechanism like this? One workaround I can imagine is to create two separate CockroachDB clusters with the same schema but different default replication zones, but I’m not sure how much extra management cost multiple clusters would add.

  2. Region affinity of the leader
    Say we have the replication zone (us-west1, us-west2, us-central). When the nodes in us-west1 have issues, we want the new consensus leader to have region affinity, i.e. it should be elected in us-west2 rather than us-central. Does CockroachDB provide any knob to control this?

  3. Dynamically changing a replication zone by adding/removing datacenters
    Consensus protocols like Raft support membership changes, so at a high level it should be possible to add or remove datacenters in an existing replication zone. Say we want to migrate from us-west1 to us-west3; we could perform the membership-change sequence (us-west1, us-west2, us-central) -> (us-west1, us-west2, us-west3, us-central) -> (us-west2, us-west3, us-central).

  4. Support for different replica roles
    We’d like to support different types of replica roles. For example, an observer replica might just tail the latest write logs without participating in the quorum vote, and ideally the observer could tail the logs from non-leader replicas for performance reasons.

I’m curious whether any of the above are already supported. Where new development would be needed, I’d appreciate your assessment of the complexity and feasibility.



Yawei Li,

(Cockroach Labs employee here)

  1. CockroachDB already lets you control the replication and location of a table with zone configs. We plan to extend this to give users the same controls at row-level granularity, and we have recently started figuring out how this would work. We call this feature partitioning.

The linked document is preliminary, so a lot of details about how the feature will look to users are still missing. I’m happy to answer any specific questions it leaves unaddressed. The coding work is scheduled for Q2/Q3 2017.
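For a concrete feel of the direction, here is a hypothetical sketch of how a geo-partitioned table with per-partition zone configs might look. The SQL below uses syntax that later CockroachDB releases adopted, and the table, column, and locality names are illustrative, not from this thread:

```sql
-- Hypothetical sketch: split the users table by a leading "coast" column,
-- then pin each partition's replicas to its own set of datacenters.
CREATE TABLE users (
    coast   STRING NOT NULL,  -- 'west' or 'east', supplied by the application
    user_id INT NOT NULL,
    name    STRING,
    PRIMARY KEY (coast, user_id)
) PARTITION BY LIST (coast) (
    PARTITION west VALUES IN ('west'),
    PARTITION east VALUES IN ('east')
);

-- One replica in each datacenter of the west topology:
ALTER PARTITION west OF TABLE users CONFIGURE ZONE USING
    num_replicas = 3,
    constraints = '{"+datacenter=us-west1": 1, "+datacenter=us-west2": 1, "+datacenter=us-central": 1}';

-- And likewise for the east topology:
ALTER PARTITION east OF TABLE users CONFIGURE ZONE USING
    num_replicas = 3,
    constraints = '{"+datacenter=us-east1": 1, "+datacenter=us-east2": 1, "+datacenter=us-central": 1}';
```

With a layout like this, the application would not need to pass topology hints at all: the coast value in the primary key determines where a row’s replicas live.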

I’m not the expert on your questions 2, 3, and 4, so if I get anything wrong, one of my colleagues will correct me.

  2. Work has started on moving the leaseholder for a given range to follow traffic patterns; see Leaseholder Locality. This should address the performance issue you mention. I don’t know of any work to allow user configuration of the leaseholder’s region.

  3. This can be accomplished with zone configs. You would bring up the us-west3 nodes, use zone configs to move the data onto them from the us-west1 nodes, wait for all the data to move, and then turn down us-west1.
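Sketched with zone-config constraints, using SQL syntax from later releases (at the time of this thread the equivalent was expressed as YAML passed to `cockroach zone set`); the `users` table and the `datacenter` locality tier are illustrative:

```sql
-- Illustrative migration from us-west1 to us-west3: after bringing up the
-- us-west3 nodes, require one replica in each of the target datacenters.
-- The rebalancer then moves the us-west1 replicas over in the background.
ALTER TABLE users CONFIGURE ZONE USING
    num_replicas = 3,
    constraints = '{"+datacenter=us-west3": 1, "+datacenter=us-west2": 1, "+datacenter=us-central": 1}';
```

Once the cluster reports no under-replicated ranges, the us-west1 nodes can be turned down.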

  4. We’ve had a number of requests for non-voting replicas, but the work hasn’t been scheduled yet. See our Product Roadmap.
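For what it’s worth, a zone-config knob is one natural way such a feature could surface; the sketch below uses the `num_voters` syntax that much later CockroachDB releases eventually adopted, so treat it as illustrative rather than anything available at the time of this thread:

```sql
-- Sketch: five replicas in total, only three of which vote. The two extra
-- replicas receive the Raft log but do not participate in quorum, which is
-- close to the "observer" role described in the question.
ALTER TABLE users CONFIGURE ZONE USING
    num_replicas = 5,
    num_voters = 3;
```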

Thanks for your interest in CockroachDB!


To elaborate a little on what @dan wrote:

We have two concepts that deal with replica leadership, and they perform different functions. There is a Raft consensus leader, and we offer no control over which replica is the current leader; we have no plans to add such control anytime soon.

However, we also have a read leaseholder, which allows non-consensus reads; it is usually, but not always, aligned with the Raft leader. The lease can move quickly from one replica to another. Our next beta will include a heuristic that tries to automatically move the leaseholder to the replica that is receiving the most requests. Again, though, we have no plans to allow manual placement of the leaseholder.
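To make the distinction concrete: while there is no knob for the Raft leader, later CockroachDB releases did eventually add a zone-config knob for leaseholder placement, `lease_preferences`. A sketch in that later syntax, with illustrative locality names:

```sql
-- Try to keep the read leaseholder in us-west1, falling back to us-west2.
-- This affects only the lease, never the Raft leader.
ALTER TABLE users CONFIGURE ZONE USING
    lease_preferences = '[[+datacenter=us-west1], [+datacenter=us-west2]]';
```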

Yes, you can! You can always add a new node (or collection of nodes) to the system.

When removing nodes, there are a few options:
1 - Remove the old nodes one at a time, waiting for the system to report 0 under-replicated ranges after each removal.
2 - Set the default zone config with a negative (prohibited) constraint for the old zone. Once all the replicas have moved to the new area, disconnect the old nodes. Again, one at a time is a bit safer, but if they hold no replicas, it should be fine.
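Option 2 might look like this in zone-config SQL (later-release syntax; at the time of this thread the same thing was a `constraints: [-datacenter=us-west1]` line in the YAML passed to `cockroach zone set`, and the `datacenter` tier here is illustrative):

```sql
-- Prohibited ("-") constraint: no replica governed by the default zone
-- config may live in us-west1, so the rebalancer drains those nodes.
ALTER RANGE default CONFIGURE ZONE USING
    constraints = '[-datacenter=us-west1]';
```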

Just a note on removing a node: if you stop the cockroach process (Ctrl-C or a kill signal), it enters “draining” mode. This mode rejects all incoming replica transfers, lease transfers, and Raft leadership elections, and tells the other nodes to start adding new replicas to replace the ones that will be lost. In the future, we are considering letting the node itself actively move all of its replicas elsewhere before shutting down.

If you need any more info, let me know!