Timestamp format in abort message

I’m debugging some aborts that happen with an error message like this:

restart transaction: TransactionRetryWithProtoRefreshError: WriteTooOldError: write at timestamp 1635298418.171191194,0 too old; wrote at 1635298418.293532763,1: "sql txn" meta={id=6611743f pri=0.01301086 epo=0 ts=1635298418.293532763,1 min=1635298418.171191194,0 seq=0} lock=true stat=PENDING rts=1635298418.171191194,0 wto=false gul=1635298418.671191194,0

From the docs about transaction retries, I assume this error message means that TxnA (intended to be serialized at 1635298418.171191194,0) tried to write a row, but TxnB (serialized later at 1635298418.293532763,1) has already written to that row and commited.

To begin with, is that reading correct?

Second, I’m interested in the fact that my errors all have timestamps of the form X,M and X,N, where the X component is the same and N>M are integers. I’m assuming that this is a representation of HLC time, with X being physical and M and N being logical. If that’s the case, is it interesting that my errors always have X the same with the conflicting transactions differing only in logical time? I don’t know how the timestamps are assigned, but this makes it seem like it isn’t just two competing clients connecting at nearly the same time.

Any insight would be appreciated!

Yes, I think so.

That’s right. You can read a bit more about our HLCs here.

I’m a little confused about this part, since the error you posted above has timestamps with two different values of X (1635298418.171191194 < 1635298418.293532763).

I think the client actually doesn’t matter because transaction timestamps are managed by the CockroachDB database nodes, rather than by the client.

Oh gosh, I had a ton of these example errors and ended up pulling one that actually didn’t have matching physical time without noticing. Sorry about that. Rest assured, I do have plenty of example specifically that show a conflict between X,0 and X,2 (always that combination it seems looking again) where X is identical:

restart transaction: TransactionRetryWithProtoRefreshError: WriteTooOldError: write at timestamp 1635298418.295648343,0 too old; wrote at 1635298418.295648343,2: “sql txn” meta={id=9735cf4d pri=0.02278229 epo=0 ts=1635298418.295648343,2 min=1635298418.295648343,0 seq=0} lock=true stat=PENDING rts=1635298418.295648343,0 wto=false gul=1635298418.795648343,0

1 Like

The engineering team thinks that these duplicate physical timestamps are due to something called “HLC ratchet.” CockroachDB relies on all nodes’ clock being synchronized and if a node notices that its own clock is falling too far behind the rest of cluster, it will ratchet its clock forward to match the other nodes. The node with the slow clock will be pinned at the higher physical timestamp for a while, so only the logical part is ticking.

1 Like

Interesting - thanks for looking into it. It sounds like this isn’t something I should worry about then? I had been wondering if somehow I had written something bad such that steps inside the same transaction were conflicting with each other or something.