Playing with DC HA/Outages

Hi Guys,

I’m doing some tests with a CockroachDB cluster and I have a quick question. Let me illustrate my scenario: my cluster is distributed across three datacenters, each with 3 nodes, e.g.:

Note: CockroachDB 2.0.2, replication factor: 3.

dc1: id: one
vm1: locality: one
vm2: locality: one
vm3: locality: one

dc2: id: two
vm1: locality: two
vm2: locality: two
vm3: locality: two

dc3: id: three
vm1: locality: three
vm2: locality: three
vm3: locality: three
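
For readers reproducing this layout, here is a minimal sketch that turns the topology above into one `--locality` value per node. The locality key `datacenter` and the node names such as `dc1-vm1` are assumptions made for illustration; the post only gives the values one, two and three.

```python
# Hypothetical topology mirroring the post: three datacenters, three VMs each.
DATACENTERS = ["one", "two", "three"]
VMS_PER_DC = 3


def locality_flags(datacenters, vms_per_dc, key="datacenter"):
    """Return {node_name: locality_flag} for every VM in every datacenter.

    The key name ("datacenter") is an assumption; the post does not state it.
    """
    flags = {}
    for dc_index, dc in enumerate(datacenters, start=1):
        for vm in range(1, vms_per_dc + 1):
            # Node names like "dc1-vm1" are illustrative labels only.
            flags[f"dc{dc_index}-vm{vm}"] = f"--locality={key}={dc}"
    return flags


FLAGS = locality_flags(DATACENTERS, VMS_PER_DC)

for node, flag in FLAGS.items():
    print(node, flag)
```

Each printed flag would be passed to `cockroach start` on the matching node, alongside whatever other flags the deployment already uses.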

Introduction:

I block access to the “one” nodes and everything keeps working on the “two” and “three” nodes. At this point I just see the “under-replicated” count growing, but no problems. Later I bring the “one” nodes back into the cluster and, as expected, everything remains OK. \o/

The “Problem”:

After executing the steps described above, I wait for the under-replicated count to go to 0 and then I drop/block connectivity to the “two” instances. Seconds after doing that, the cluster enters a strange state and becomes unavailable, and eventually I can see a lot of under-replicated and unavailable ranges. Is this expected behaviour?

Am I doing something wrong? Is there something I didn’t get?

Note: I think this occurs because the range leases and replicas are on the “stack” that I drop/kill. Is it possible to solve this? How can I prevent a range lease and a replica from living in the same region?

Tks,
Andre

Maybe with num_replicas = 5? :frowning:
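
A quick way to reason about this idea is majority arithmetic: a range stays available only while more than half of its replicas are reachable. The sketch below checks a few placements against the loss of one datacenter. The placements are made-up examples, not values read from a real cluster.

```python
def quorum(num_replicas):
    """Smallest majority of num_replicas replicas."""
    return num_replicas // 2 + 1


def survives_dc_loss(replicas_per_dc, lost_dc):
    """True if the replicas outside lost_dc still form a majority."""
    total = sum(replicas_per_dc.values())
    remaining = total - replicas_per_dc.get(lost_dc, 0)
    return remaining >= quorum(total)


# Hypothetical placements of one range's replicas across the datacenters.
THREE_SPREAD = {"one": 1, "two": 1, "three": 1}
FIVE_SPREAD = {"one": 2, "two": 2, "three": 1}
# Five replicas, but skewed into two datacenters (e.g. before rebalancing).
FIVE_SKEWED = {"one": 0, "two": 3, "three": 2}

for name, placement in [
    ("3 replicas, spread", THREE_SPREAD),
    ("5 replicas, spread", FIVE_SPREAD),
    ("5 replicas, skewed", FIVE_SKEWED),
]:
    print(name, "survives losing 'two':", survives_dc_loss(placement, "two"))
```

Under these assumptions, five replicas tolerate the loss of a datacenter only when no single datacenter holds a majority of them, so raising num_replicas alone would not rule out the skewed case.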

I’m doing some tests and I think I resolved it with server.time_until_store_dead :joy:
(and I think I will need to adjust this setting whenever I really need to drop/remove a node/instance)
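
For reference, cluster settings like this one are changed with a `SET CLUSTER SETTING` SQL statement. Here is a minimal sketch that only builds the statement text. The helper name and the 10-minute value are illustrative assumptions; the post does not say which value was used.

```python
def store_dead_statement(minutes):
    """Build the SQL text that sets server.time_until_store_dead.

    `minutes` is a whole number of minutes; the value is rendered in the
    duration style used for cluster settings (for example '10m0s').
    """
    if not isinstance(minutes, int) or minutes <= 0:
        raise ValueError("minutes must be a positive integer")
    return (
        "SET CLUSTER SETTING server.time_until_store_dead = "
        f"'{minutes}m0s';"
    )


# Illustrative value only; pick one that matches your outage tests.
STATEMENT = store_dead_statement(10)
print(STATEMENT)
```

The resulting statement can be run from any SQL client connected to the cluster as a user allowed to change cluster settings.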

It would be cool if the cluster just migrated “under-replicated” ranges to nodes in the same locality.

Hey Andre,

Our metric for under-replication is currently based on the number of replicas, not the location of those replicas. So what probably happened is that the cluster hadn’t had enough time to rebalance replicas from DC2 back to DC1, even though three replicas of each range were present across the entire cluster. The solution here would be to flag when we’re “under-diversified”. I created an enhancement request for this here: https://github.com/cockroachdb/cockroach/issues/26757, feel free to add comments.
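
The difference between the two notions can be sketched in a few lines. This is not CockroachDB's implementation, only an illustration under assumed data: the range names and placements are hypothetical, and "under-diversified" is taken here to mean that losing a single datacenter would leave the range without a majority.

```python
from collections import Counter


def is_under_replicated(replica_dcs, num_replicas=3):
    """Count-based check: fewer replicas than the target."""
    return len(replica_dcs) < num_replicas


def is_under_diversified(replica_dcs):
    """Location-based check: some single datacenter holds so many replicas
    that losing it would leave fewer than a majority."""
    if not replica_dcs:
        return True
    majority = len(replica_dcs) // 2 + 1
    largest_group = max(Counter(replica_dcs).values())
    return len(replica_dcs) - largest_group < majority


def is_available_without(replica_dcs, lost_dc):
    """True if the replicas outside lost_dc still form a majority."""
    majority = len(replica_dcs) // 2 + 1
    remaining = sum(1 for dc in replica_dcs if dc != lost_dc)
    return remaining >= majority


# Hypothetical placements shortly after DC "one" has rejoined the cluster.
RANGES = {
    "r1": ["one", "two", "three"],  # fully spread
    "r2": ["two", "two", "three"],  # not yet rebalanced back to "one"
    "r3": ["two", "three", "three"],
}

for name, dcs in RANGES.items():
    print(
        name,
        "under-replicated:", is_under_replicated(dcs),
        "under-diversified:", is_under_diversified(dcs),
        "available without 'two':", is_available_without(dcs, "two"),
    )
```

In this toy data no range is under-replicated, so a count-based metric reads 0, yet r2 loses its majority as soon as datacenter two is blocked.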

Hope that helps - feel free to comment here or on the GitHub issue.
