Received! A couple observations:
- Given only 20-30MB of data, I’m surprised to see so many ranges; it looks like there are at least a couple hundred. Did you pre-split a table, or is the data spread across many tables?
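To put numbers on why that range count is surprising, here's a quick back-of-the-envelope check. The ~200 range figure is just my rough read of the logs, not an exact count:

```python
# Rough arithmetic only -- both numbers below are illustrative
# assumptions, not measurements from your cluster.
data_bytes = 30 * 1024 * 1024   # upper end of the reported 20-30MB
num_ranges = 200                # rough guess from the logs

avg_bytes_per_range = data_bytes / num_ranges
print(f"~{avg_bytes_per_range / 1024:.0f}KB per range on average")
# That's far below the size at which a range would split on its own,
# so the range count must come from pre-splitting or from the data
# being spread across many tables rather than from size-based splits.
```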
- It looks like the cluster begins to upreplicate almost immediately after n6 is taken down, as expected.
- The logs indicate that n9 is busy applying new snapshots (new ranges) for roughly half an hour, as you described. You can see a number of "store
What’s interesting is how long it takes to apply snapshots: if you search through the logs for “streamed snapshot”, you can see that each one takes somewhere between 2ms and 1.5-1.6s to apply. They vary in size from tens of kv pairs to thousands, and the more kv pairs in a snapshot, the longer it takes to apply.
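If it helps, here's a rough way to pull those apply times out of the logs and get a min/max in one pass. The log line format below is an assumption for illustration; the real “streamed snapshot” messages may be shaped differently, so adjust the regex to match what you actually see:

```python
import re

# Hypothetical log lines -- the exact shape of the "streamed snapshot"
# messages is assumed here; only the trailing duration matters.
log_lines = [
    'I180101 12:00:01 store.go:100 streamed snapshot: kv pairs: 24, 2ms',
    'I180101 12:00:02 store.go:100 streamed snapshot: kv pairs: 4096, 1.5s',
]

# Capture the trailing duration and normalize it to milliseconds.
dur_re = re.compile(r'streamed snapshot.*?(\d+(?:\.\d+)?)(ms|s)\s*$')

def duration_ms(line):
    m = dur_re.search(line)
    if not m:
        return None
    value, unit = float(m.group(1)), m.group(2)
    return value * 1000 if unit == 's' else value

times = [t for t in map(duration_ms, log_lines) if t is not None]
print(f"min={min(times)}ms max={max(times)}ms")
```

Running something like this over the full n9 log would also tell you whether apply time scales linearly with kv pair count or whether there are outliers, which would point at the disk instead.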
How large is each row in your dataset? I can’t see any indication of a problem on n8, at least; the logs are consistent with a cluster trying to upreplicate a significant amount of data. The only part that doesn’t fit is that the dataset is only 20-30MB. Has the cluster been up and running for a long time? Is there more data in the system ranges? You can check the size of the system and time series data at
The only other thing I can think of checking is the log and command commit latency on n9. What kind of disks are you using?
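On the disk question: a quick way to get a feel for sync-write latency (a rough proxy for log commit latency) is to time a series of small appends, each followed by an fsync. This is a crude sketch, not a proper benchmark like fio, and it only means something if you run it on the same volume as the store:

```python
import os
import tempfile
import time

# Crude disk sync-latency probe: append a 4KB block and fsync,
# repeatedly, then report median and tail latency in milliseconds.
N = 100
BLOCK = b"x" * 4096

with tempfile.NamedTemporaryFile(dir=".") as f:
    latencies = []
    for _ in range(N):
        start = time.perf_counter()
        f.write(BLOCK)
        f.flush()
        os.fsync(f.fileno())
        latencies.append((time.perf_counter() - start) * 1000)

latencies.sort()
print(f"p50={latencies[N // 2]:.2f}ms p99={latencies[int(N * 0.99)]:.2f}ms")
```

Spinning disks typically show several milliseconds per fsync, SSDs well under one; if your p99 here is high, that would line up with the slow snapshot applies.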