Auto-heal: node-wide vs. follower of a range

Setup: more than 3 CockroachDB nodes.
Range: leaseholder – follower (sync) – follower (async)

How does the auto-heal work:

  • Range specific?
  • crdb node wide?

CRDB node wide:
If a CRDB node breaks, is the whole node recreated automatically?
With a data replication factor of 3, CRDB can tolerate one broken CRDB node at a time. (Quorum.)
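The quorum arithmetic behind that claim can be sketched like this (a minimal illustration, not CRDB code; the function names are made up):

```python
# With replication factor N, a Raft group needs a majority (quorum) of
# replicas alive to commit writes; the rest can fail without losing the range.
def quorum(replication_factor: int) -> int:
    """Minimum number of live replicas needed to commit writes."""
    return replication_factor // 2 + 1

def tolerated_failures(replication_factor: int) -> int:
    """How many replicas can be lost while the range stays available."""
    return replication_factor - quorum(replication_factor)

print(quorum(3))              # 2
print(tolerated_failures(3))  # 1  -> one broken node at a time with RF=3
print(tolerated_failures(5))  # 2
```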

Range specific:
If the async follower of a range gets into an inconsistent state, does only that follower get fixed (via Raft)?
So the whole CRDB node would not be recreated in such a case?
Is that correct?

Hi @roachman,

I’m not sure what you mean by sync & async followers.

For crdb node wide: If you have more than 3 nodes, then the replicas on the node that are no longer available will be “dispersed” onto the other available nodes (assuming that replica does not already exist on the new node).
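A minimal sketch of that dispersal, under the assumption that a replacement node must not already hold a replica of the range (all names here are illustrative, not the real CRDB allocator):

```python
# Hypothetical sketch of the "dispersal" described above: when a node dies,
# each range replica it held is re-created on some live node that does not
# already hold a replica of that range.
def reallocate(replicas_by_range, dead_node, live_nodes):
    for rng, nodes in replicas_by_range.items():
        if dead_node in nodes:
            # Candidate targets: live nodes with no replica of this range yet.
            candidates = [n for n in live_nodes if n not in nodes]
            if candidates:
                nodes.remove(dead_node)
                nodes.add(candidates[0])
    return replicas_by_range

ranges = {"r1": {"n1", "n2", "n3"}, "r2": {"n2", "n3", "n4"}}
healed = reallocate(ranges, "n1", ["n2", "n3", "n4", "n5"])
print(sorted(healed["r1"]))  # ['n2', 'n3', 'n4'] -- r1's lost replica moved to n4
print(sorted(healed["r2"]))  # ['n2', 'n3', 'n4'] -- r2 had no replica on n1
```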

For range specific:
Can you explain how the range will get to an “inconsistent state”?

Thanks,
Matt

Sync and async:
Ranges are handled in a semi-synchronous way:
"As soon as one follower has appended the write to its Raft log (and thus a majority of replicas agree based on identical Raft logs), it notifies the leader and the write is committed to the key-values on the agreeing replicas. "
The leaseholder plus one follower (the "sync" one) are enough for a majority.
https://www.cockroachlabs.com/docs/stable/architecture/reads-and-writes-overview.html
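The quoted commit rule can be illustrated with a small sketch (not CRDB code; `commit_index` is a made-up helper): the commit index is the highest log index that a majority of replicas have appended, so the leaseholder plus one follower suffice while the third follower lags asynchronously.

```python
# With three replicas, a write commits once 2 of 3 (leaseholder + one
# follower) have appended it to their Raft logs; the slowest follower
# catches up asynchronously.
def commit_index(ack_indexes):
    """Highest log index replicated on a majority of replicas."""
    ordered = sorted(ack_indexes, reverse=True)
    majority = len(ack_indexes) // 2 + 1
    return ordered[majority - 1]

# Leaseholder at index 7, fast (sync) follower at 7, lagging follower at 4:
print(commit_index([7, 7, 4]))  # 7 -> the write at index 7 is committed
```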

Ah now I understand what you’re referring to.

Ok.

In crash situations etc., the async "follower" may end up in an inconsistent state.
I think the following talks about this:

At 28 min: Raft.
And, from a quick internet search:
"Crashes can result in inconsistencies"

At 43 min 30 s.

Got it.

My previous answer still applies: as long as the Raft group can maintain quorum, the inconsistent replica will get "fixed".
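A hedged sketch of how that Raft-style repair converges (illustrative only; real Raft also compares term numbers and may ship a snapshot instead of log entries): the leader's log wins, and the follower's divergent suffix is overwritten.

```python
# A follower whose log diverges from the leader's has the conflicting suffix
# truncated and replaced by the leader's entries, so it converges without
# the whole node being rebuilt.
def repair_follower(leader_log, follower_log):
    # Find the longest common prefix of the two logs.
    i = 0
    while (i < len(leader_log) and i < len(follower_log)
           and leader_log[i] == follower_log[i]):
        i += 1
    # Drop the divergent suffix and copy the leader's remaining entries.
    return follower_log[:i] + leader_log[i:]

print(repair_follower(["a", "b", "c"], ["a", "x"]))  # ['a', 'b', 'c']
```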

I’m unable to think of a scenario where only a single replica would be inconsistent without the whole node being affected.