Replication Lag/Delay

Hey there,

I'm new here.

Has anybody run into issues like replication delay/lag between nodes?

Since clients may WRITE to any node, what happens if node-x is lagging and the next WRITE goes to it? Say a previous WRITE has not yet been applied on node-x because of replication lag, and a client then executes another WRITE on that delayed node. Will there be any inconsistency? How does CockroachDB handle such conflicts?

We’ve seen such cases in RDBMSs running in Master-Master topologies with delayed nodes.

Many thanks

Hey there Saten,

Unlike some other RDBMSs, which implement hot-hot topologies by copying data between nodes, potentially after acknowledging a transaction, CockroachDB requires Raft consensus among the nodes that hold replicas of the affected ranges of data before it acknowledges the write to the client, so there is no opportunity for “replication lag” to come into play.

This means that by the time your client completes its write on node A, the write has already been performed and committed on a quorum of the nodes that hold replicas of that range of data. A quorum is required for all writes. So even if the data from the first write has only reached a subset of the range's replicas (say 2 out of 3), a conflicting write attempted on the 3rd node, which is missing the prior write, would be aborted once the nodes establish Raft consensus and detect the conflict.
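To make the quorum idea concrete, here is a toy sketch in Python. It is not CockroachDB's code and not a Raft implementation; `Replica`, `quorum_size`, and `replicate_write` are hypothetical names made up for illustration. It only shows the counting rule: a write is acknowledged when a majority of a range's replicas accept it, and refused otherwise.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical stand-in for one replica of a range."""
    name: str
    reachable: bool = True
    log: list = field(default_factory=list)


def quorum_size(replica_count: int) -> int:
    """Smallest majority of replica_count replicas (2 of 3, 3 of 5, ...)."""
    return replica_count // 2 + 1


def replicate_write(replicas: list, entry: str) -> bool:
    """Toy model: acknowledge the write only if a majority can accept it."""
    reachable = [r for r in replicas if r.reachable]
    if len(reachable) < quorum_size(len(replicas)):
        # No majority available: nothing is written, nothing is acknowledged.
        return False
    for replica in reachable:
        replica.log.append(entry)
    return True


if __name__ == "__main__":
    nodes = [Replica("n1"), Replica("n2"), Replica("n3", reachable=False)]
    print(replicate_write(nodes, "x=1"))  # 2 of 3 replicas accept the write
    nodes[1].reachable = False
    print(replicate_write(nodes, "x=2"))  # only 1 of 3 is reachable
```

In this model the lagging third replica never gets to acknowledge a write on its own, which is the point of the paragraph above: a single behind-the-times node cannot form a majority by itself.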

If there are connection issues or latency between your nodes, this may surface as increased transaction latency, or as a hung or aborted transaction, depending on the nature of the problem.
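On the client side, an aborted transaction usually means you retry it. Below is a minimal Python sketch of such a retry loop, assuming your driver raises an exception carrying a `pgcode` attribute (psycopg2 errors do) and that retryable aborts are reported with SQLSTATE 40001. `run_with_retry` and its parameters are hypothetical names for illustration, not part of any CockroachDB client library.

```python
import time

# SQLSTATE for serialization failures, which clients are expected to retry.
RETRYABLE_SQLSTATE = "40001"


def run_with_retry(txn_fn, max_attempts=5, base_delay=0.1):
    """Call txn_fn(); retry with exponential backoff on retryable aborts.

    txn_fn is assumed to run one whole transaction (begin, statements,
    commit) and to raise an exception with a `pgcode` attribute on failure.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return txn_fn()
        except Exception as exc:
            code = getattr(exc, "pgcode", None)
            # Re-raise anything that is not retryable, or the final failure.
            if code != RETRYABLE_SQLSTATE or attempt == max_attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))
```

You would pass in a function that opens a transaction, runs your statements, and commits. A hung transaction is a different matter and would need a statement or connection timeout rather than a retry.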

Some more info about the consistency guarantees can be found in the docs.

Thanks Taylor,

I would love to test this to reassure myself. Can we do that somehow? Is there any config that enables “delayed replication” temporarily, so that I can test WRITING on the 3rd node with a SQL query, as described here?

Can I somehow configure one node in a CockroachDB cluster to run with delayed replication?

Thanks
Saten.

You may be interested in reading about Jepsen testing which tests for consistency in various kinds of failure scenarios (node death, network partitions, clock offsets, etc).
