We haven’t had a chance to do much testing with Amazon’s time sync service yet, but it looks like a good solution for applications deployed on that platform. One warning, though: it does leap-second smearing, which is good but non-standard. If any of your nodes use time sources that smear leap seconds, they all must, and they must all smear the leap second the same way. (I haven’t seen any official confirmation of whether Google’s and Amazon’s leap-second smearing are compatible with each other.)
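For reference, Amazon exposes the service as an NTP endpoint at the link-local address 169.254.169.123 inside EC2. If chrony is your NTP client, the configuration fragment would look something like this (the `minpoll`/`maxpoll` values follow AWS’s documented suggestion; adjust to taste):

```conf
# /etc/chrony/chrony.conf
# Amazon Time Sync Service (link-local, reachable only from EC2 instances).
# "prefer" makes chrony favor this source over any others that are configured.
server 169.254.169.123 prefer iburst minpoll 4 maxpoll 4
```

If you mix this with other NTP sources, remember the caveat above: the other sources must smear leap seconds the same way, or not be used at all.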
In general, better clock synchronization can improve your tail latencies (by reducing retries) but not mean/median performance. To realize these gains, you’ll need to set the --max-offset flag to something smaller than the default of 500ms. I don’t know what the best value for Amazon Time Sync would be; my recommendation would be to run a test cluster for a while to collect data. (We’re not graphing clock offsets yet, so the results won’t be visible in a current build, but the cluster will be collecting data that can be displayed once the UI is updated.) One nice thing about TrueTime is that it explicitly models the uncertainty in the clocks, so the delay can be adjusted dynamically based on actual conditions instead of a fixed max-offset chosen in advance.
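As a sketch of what that looks like, here’s a node started with a tighter bound (the 250ms value is purely illustrative, not a recommendation, and the `--store`/`--join` values are placeholders for your own deployment):

```
# Start a node with a clock-offset bound tighter than the 500ms default.
# A node whose clock drifts beyond this bound relative to the cluster
# will shut itself down rather than risk serving inconsistent data.
cockroach start \
  --store=/mnt/data1 \
  --join=node1:26257,node2:26257 \
  --max-offset=250ms
```

Note that all nodes in a cluster must be started with the same --max-offset value.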
On our AWS test clusters (just using standard NTP with external sources, not AWS time sync), we typically see offsets in the single-digit milliseconds with spikes up to ~20ms. The default max-offset of 500ms is pretty conservative.
If the max clock offset gets small enough, you could switch to linearizable mode (which adds the max clock offset to all your read latencies). However, this mode hasn’t been tested much in practice, so I wouldn’t recommend it yet.
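To make the tradeoff concrete, here’s a toy sketch (my own illustration, not CockroachDB’s actual implementation) of why that mode pays max-offset on every read: after performing the operation, the node waits out the maximum possible clock offset before acknowledging, so that by the time the client sees the result, no node’s clock can still consider the operation’s timestamp to be in the future:

```python
import time

MAX_OFFSET_S = 0.250  # hypothetical --max-offset of 250ms

def linearizable_read(read_fn):
    """Toy model: perform the read, then wait out the clock
    uncertainty before returning, so no node can still believe
    the read's timestamp is in its future."""
    start = time.monotonic()
    result = read_fn()
    remaining = (start + MAX_OFFSET_S) - time.monotonic()
    if remaining > 0:
        time.sleep(remaining)  # this wait is the added read latency
    return result

value = linearizable_read(lambda: 42)
```

This is why shrinking max-offset matters so much in that mode: the wait is a fixed tax on every operation, and with a 500ms bound it would dominate your latencies.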