How long does node decommissioning take?

(Rishabh Jain) #1

Hi,

I am evaluating CockroachDB, and one of my main criteria is determining how long it takes to decommission a node. Where can I find this information? Does anybody know?

(Ricardo Rocha) #2

Hey @rj254,

The speed of the decommissioning process depends on the amount of data that needs to be rebalanced to the other nodes. The rebalance speed is controlled by the cluster setting kv.snapshot_rebalance.max_rate, which defaults to 8 MiB/s.
The overall decommissioning process is described in the documentation here, which also covers potential reasons why the process may take longer than expected.
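If you want to check or adjust that setting, you can do so from any SQL session. Here’s a minimal sketch using the cockroach CLI (the connection flags and the 32 MiB value are illustrative, not a recommendation):

```
# Inspect the current rebalance rate
cockroach sql --certs-dir=certs --host=node1:26257 \
  --execute="SHOW CLUSTER SETTING kv.snapshot_rebalance.max_rate;"

# Raise it only if your network and disks have headroom
cockroach sql --certs-dir=certs --host=node1:26257 \
  --execute="SET CLUSTER SETTING kv.snapshot_rebalance.max_rate = '32 MiB';"
```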

Let me know if you have any other questions or concerns.

Cheers,
Ricardo

(Rishabh Jain) #3

Hi @rickrock

That makes sense! I’m not sure I fully understand the CockroachDB architecture. If I had three replicas and one of the nodes failed, I understand that I can continue to connect to either of the two nodes that are still up. In this scenario, if I decommission a node rather than having a node fail on me, I’m assuming the amount of data to be rebalanced should be 0 (as it is already replicated), so the decommissioned node will finish serving its requests and then terminate. Is this correct?

Thanks for getting back to me!

(Ricardo Rocha) #4

Hey @rj254,

The example in the documentation here shows exactly this scenario: attempting to decommission a single node in a 3-node cluster. Essentially, a new node needs to be brought up so that 3-way replication of all ranges of data can be maintained; otherwise, the decommission process will hang.
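In practice the sequence looks roughly like this (a sketch only; the hostnames, ports, and node ID are placeholders for your own values):

```
# 1. Bring up the replacement node first so all ranges can stay 3-way replicated
cockroach start --certs-dir=certs --advertise-addr=node4:26257 \
  --join=node1:26257,node2:26257,node3:26257

# 2. Decommission the old node (assuming its node ID is 1)
cockroach node decommission 1 --certs-dir=certs --host=node2:26257

# 3. Watch the replica count on the decommissioning node drain to zero
cockroach node status --decommission --certs-dir=certs --host=node2:26257
```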

Let me know if you have any other questions or concerns.

Cheers,
Ricardo

(Rishabh Jain) #5

I see, that makes sense. What if the decommission process has not completed and the node is terminated before replication to the fourth node finishes?

(Ricardo Rocha) #6

Hey @rj254,

If the node’s decommission process was terminated before completing, the ranges of data on the first two nodes would still replicate over to the 4th node. However, in the meantime there would be a number of under-replicated ranges, and if something catastrophic were to happen to one of the other two nodes holding those under-replicated ranges, it could lead to data unavailability. If one of the under-replicated ranges happens to be a system range, this could make the entire cluster unavailable, which may mean needing to restore data from a previous backup.

Generally, it’s best to let the decommission process complete to ensure all ranges are available and replicated per the replication factor in the cluster settings.
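If you want to check for under-replicated or unavailable ranges yourself, one way is to query the replication reports (assuming a version that exposes crdb_internal.replication_stats; column names can vary by release):

```
cockroach sql --certs-dir=certs --host=node1:26257 --execute="
  SELECT zone_id, under_replicated_ranges, unavailable_ranges
  FROM crdb_internal.replication_stats;"
```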

Let me know if you have any other questions.

Cheers,
Ricardo

(Rishabh Jain) #7

Hi @rickrock,

Thanks for getting back to me! Is there a way to keep the system ranges on only a subset of nodes, or to prioritize transferring the system ranges first (sorry if this is a silly question)? Another question I had: what are the consequences of setting kv.snapshot_rebalance.max_rate higher than 8 MiB/s? Is it I/O bound?

To recap the situation: I can start the decommissioning process but am not guaranteed that my node will stay alive while it completes. I have a standard 3-way replication setup, so I’d like the in-flight requests to finish but would then want to immediately shift all future requests to my other nodes (I believe this is possible with a config reload in HAProxy) while I bring up a new node.

Thanks!

(Ricardo Rocha) #8

Hey @rj254,

Unfortunately, there doesn’t seem to be a way to prioritize up-replicating a node’s system ranges first during the decommission process. The kv.snapshot_rebalance.max_rate setting is described in the documentation here in the list of Cluster Settings; the appropriate value depends on the network resources available.

Per the documentation, the decommission process allows any in-flight requests to complete and stops new requests from going to the old node while it replicates its data to the new node. Is it possible to adjust your application to wait for the node to complete the decommission process?
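One way to observe this from the outside is the node’s HTTP readiness endpoint, which stops reporting ready once the node is draining (8080 is the default HTTP port; adjust for your deployment):

```
# Succeeds on a healthy node; returns an error once the node stops
# accepting new SQL connections
curl -s "http://node1:8080/health?ready=1"
```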

Let me know if you have any other questions or concerns.

Cheers,
Ricardo

(Rishabh Jain) #9

Hi Ricardo,

Unfortunately no - the nodes are managed by another team, so there is no real guarantee that they stay alive. Previously, we have been given advance notice, but always only shortly before the node actually terminated. If I understand correctly, having three complete replicas will ensure the system ranges are on all nodes, so the system ranges being deleted should not be a concern, correct?

I’m trying to understand stability and long-term planning. Another question I had: if I were to expand the cluster (bring another identical node into the mix), would the time it takes for the new node to become available depend on that same kv.snapshot_rebalance.max_rate value? To prevent a disastrous situation, could I start expanding the cluster ahead of time and then decommission my node later?

(Ricardo Rocha) #10

Hello @rj254,

To answer your first question, you are correct. In your case, with 3 nodes and a replication factor of 3, having three complete replicas will ensure that all ranges are available in the DB. Just to be clear, no ranges are ever deleted in the decommission process; I simply wanted to stress that we recommend letting the decommission process complete normally to ensure continued operation.

The kv.snapshot_rebalance.max_rate cluster setting also controls the rebalancing speed when a new node is introduced into the cluster. Expanding the cluster ahead of time is a sensible approach, as having more nodes available is always helpful.

If you are using HAProxy, then by default it is configured to check the health readiness of the nodes, and when a node is unavailable it will re-route requests as described here.
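For reference, the cockroach CLI can generate a starter HAProxy configuration with those health checks already wired up (the connection flags are illustrative):

```
# Writes an haproxy.cfg whose server entries health-check each node's
# HTTP /health?ready=1 endpoint, so draining nodes stop receiving new
# connections automatically
cockroach gen haproxy --certs-dir=certs --host=node1:26257
```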

Let me know if you have any other questions.

Cheers,
Ricardo