Use of HLC in CRDB

Hi all

I am reading the CRDB docs about the use of HLC vs. Spanner’s TrueTime. In Spanner’s TrueTime, since it is a global wall-clock time with a bound (~7ms) on clock uncertainty, I can see how it is used to provide global timestamp ordering, as used in implementing transactions, snapshot isolation, etc.

In CRDB, since the HLC is a wall clock + vector clock, but scoped to a given Raft group (range shard), I am a bit confused about how you get global timestamp ordering across range shards.

Assuming you use NTP to synchronize server clocks, the drift could be as much as 100ms.

I am wondering how crdb gets global timestamp ordering across shards using these.

First of all, our HLC is not “within a given Raft group”. The HLC is maintained per node. Which documentation gave you that idea? We should fix it.

More importantly, the HLC is generally not required for correctness in CockroachDB; it’s more of an optimization: keeping clocks tightly synchronized minimizes transaction restarts.
See Spencer’s blog post for a contrast with TrueTime: https://www.cockroachlabs.com/blog/living-without-atomic-clocks/

TL;DR we don’t guarantee “global timestamp ordering across shards”; only transactions that interact somehow are guaranteed monotonically increasing timestamps (so we’re serializable whereas Spanner is linearizable).

Thanks @andrei, I don’t think any doc mentioned the HLC being per Raft group. I imagined the vector clock needed to be part of the Raft log entry and hence assumed it would be per Raft shard. But it looks like this is per node and is not part of the Raft log entries.

@ashwinmurthy I stumbled on this discussion a little late but want to clarify that the HLCs used in CockroachDB are a combination of a wall clock and a single logical clock, not a wall clock and a vector clock. This, of course, means that our clocks are subject to a loss of causal independence information which vector clocks avoid. However, using a single logical counter also provides a significant reduction in overhead, especially in larger deployments.
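To make the wall clock + single logical counter idea concrete, here is a minimal sketch of an HLC following the update rules from the hybrid logical clocks literature. The type and method names are illustrative, not CockroachDB’s actual implementation:

```go
package main

import "fmt"

// Hlc is a hybrid logical clock: a physical (wall) component plus a
// single logical counter, rather than a per-node vector.
type Hlc struct {
	Wall    int64 // physical component (e.g. nanoseconds)
	Logical int32 // breaks ties among events with the same wall time
}

// Now advances the clock for a local or send event, given the current
// physical clock reading pt.
func (c *Hlc) Now(pt int64) Hlc {
	if pt > c.Wall {
		c.Wall, c.Logical = pt, 0
	} else {
		c.Logical++
	}
	return *c
}

// Update merges a timestamp received from another node, given the
// current physical clock reading pt, so the result exceeds both.
func (c *Hlc) Update(remote Hlc, pt int64) Hlc {
	switch {
	case pt > c.Wall && pt > remote.Wall:
		c.Wall, c.Logical = pt, 0
	case remote.Wall > c.Wall:
		c.Wall, c.Logical = remote.Wall, remote.Logical+1
	case c.Wall > remote.Wall:
		c.Logical++
	default: // equal wall components: take the larger counter, then bump
		if remote.Logical > c.Logical {
			c.Logical = remote.Logical
		}
		c.Logical++
	}
	return *c
}

func main() {
	a := Hlc{}
	a.Now(100) // local event on node A at wall time 100
	b := Hlc{}
	b.Update(a, 90) // node B's wall clock lags; it inherits A's wall time
	fmt.Println(b.Wall, b.Logical) // 100 1
}
```

Note how the receive rule captures causality: B’s timestamp ends up strictly greater than A’s even though B’s physical clock is behind, using only a fixed-size scalar pair regardless of cluster size.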

We have a small section in our design docs under Lock-Free Distributed Transactions that complements Spencer’s blog post.

Hi! In that link it says:

It allows us to track causality for related events similar to vector clocks, but with less overhead

Can you elaborate on what kind of overhead you saw with vector clocks, and why it was prohibitive? Really interesting stuff to read about, thanks!

-david

PS: also posted this question to https://gitter.im/cockroachdb/cockroach?at=5f1518d5b2dad248b6c8b088 – sorry for the dupe, not sure which is more active.

The overhead being referred to is almost certainly space. @nathan can chime in if he disagrees.

not sure which is more active.

Slack is almost certain to be the most active channel to engage with this community. Gitter is mostly dead. The forum is more async and less watched.

Hope you’re not yet sick of me on this topic :stuck_out_tongue:. I’ve had periods of strong obsession with systems providing high availability and stronger than eventual semantics. I find this topic area near and dear to my heart. Vector clocks and HLCs are both just tools in the toolkit of a system builder.

My take on HLCs and their power as a causality propagation tool is that with some basic synchrony assumptions (namely clocks are very loosely synchronized) you can propagate causality implicitly without the need for communication.

Vector clocks provide a tool for finer granularity of causality propagation at the cost of space (and other less tangible things like unwieldiness relative to a scalar). The two are not opposed. A vector clock could be a vector of logical clocks, or it could be a vector of HLCs. A vector of HLCs provides the best of both worlds but also opens the door to compression due to the convergent nature of HLCs (as utilized in https://www.cs.princeton.edu/~wlloyd/papers/occult-nsdi17.pdf).
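For contrast with the scalar HLC, here is a minimal vector clock sketch showing where the space overhead comes from: one counter per node, so timestamps grow linearly with cluster size. This is illustrative code, not taken from any real system:

```go
package main

import "fmt"

// VClock is a minimal vector clock: one counter per node ID, so its
// size grows with the number of nodes (the overhead discussed above).
type VClock map[string]int64

// Tick records a local event on node id.
func (v VClock) Tick(id string) { v[id]++ }

// Merge folds in a received clock (elementwise max), then ticks locally.
func (v VClock) Merge(id string, other VClock) {
	for n, t := range other {
		if t > v[n] {
			v[n] = t
		}
	}
	v[id]++
}

// HappenedBefore reports whether v causally precedes w: every component
// of v is <= the corresponding component of w, and at least one is <.
func (v VClock) HappenedBefore(w VClock) bool {
	less := false
	for n, t := range v {
		if t > w[n] {
			return false
		}
		if t < w[n] {
			less = true
		}
	}
	for n, t := range w {
		if _, ok := v[n]; !ok && t > 0 {
			less = true
		}
	}
	return less
}

func main() {
	a := VClock{}
	a.Tick("n1") // event on n1
	b := VClock{}
	b.Merge("n2", a) // n2 receives n1's clock
	fmt.Println(a.HappenedBefore(b)) // true
	fmt.Println(b.HappenedBefore(a)) // false
}
```

Unlike the scalar HLC, the vector form also detects concurrency (neither `HappenedBefore` holds in either direction), which is exactly the causal-independence information a single counter gives up.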