Upper bound on number of ranges and increasing range_max_bytes

What’s the upper bound on the number of ranges per node before more nodes should be added or range_max_bytes should be increased? I don’t need an exact number, just an order of magnitude (100, 1,000, 10,000, etc.). I assume there is some performance decrease because there’s a Raft group for each range, and having tons of them per node would increase chatter and CPU time, right?

Also, I saw that range merging was added in 2.1. Does that mean it’s safe to raise range_max_bytes, and that it will merge ranges that now fall under the new limit?

  1. with default settings we have seen successful clusters with tens of thousands (x * 10,000) of ranges per node. Meanwhile we are also hearing reports that things break down north of 100,000 ranges per node. YMMV.

  2. you are right that more ranges conceptually cause more “chatter”. However, since 1.1 and 2.0 there are algorithms to merge the network chatter across active ranges (“heartbeat coalescing”) and to remove the chatter for idle ranges (“range quiescence”). So the impact of many ranges is not as bad as intuition would suggest.

    That said, there is still some per-range chatter and this is why very large numbers of ranges per node, even if the ranges are idle, still have an adverse effect on performance as noted in the previous point.

  3. we do not recommend tweaking the max range size too much, as this is currently untested. We know for sure that very small sizes cause extremely bad behavior, and we also know that increasing the size from the defaults also increases overall RAM usage and network throughput.

  4. range merging was added in 2.1 but not enabled by default. A correctness bug prevents correct operation of range merges currently. We will not be able to advise you to enable this feature until a later 2.1.x release.
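To put the figures in point 1 in perspective, you can do a back-of-the-envelope estimate of replicas per node from your logical data size, replication factor, and range_max_bytes. This is a rough sketch, not a CockroachDB API; the 64 MiB default and the half-full-range assumption are my own approximations:

```python
# Back-of-the-envelope estimate of range replicas hosted per node.
# Assumption: ranges split when they reach range_max_bytes, so on
# average a range holds roughly half that much data.

def estimate_ranges_per_node(logical_bytes, replication_factor=3,
                             num_nodes=3, range_max_bytes=64 << 20):
    """Estimate how many range replicas each node hosts."""
    avg_range_bytes = range_max_bytes / 2          # assumed average fill
    num_ranges = max(1, logical_bytes / avg_range_bytes)
    total_replicas = num_ranges * replication_factor
    return total_replicas / num_nodes

# Example: 2 TiB of logical data, 3-way replication, 9 nodes.
per_node = estimate_ranges_per_node(2 << 40, replication_factor=3,
                                    num_nodes=9)
print(f"~{per_node:,.0f} replicas per node")
```

With these assumed numbers the estimate lands in the tens of thousands per node, i.e. in the zone where, per point 1, you would want to add nodes before growing much further.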

Does this help?

Thanks for the excellent info!

Regarding 2: What’s the best way for me to measure the “chatter”? Not necessarily to understand it exactly, just to monitor it at an order-of-magnitude level, so that when I’m having perf issues I can tell whether it’s high or low relative to everything else.

There are various range queues where every replica on a node lands periodically. You can monitor the activity on these queues using the exported Prometheus metrics. I think there is also a debug screen in the CockroachDB web UI which shows this information.
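As a concrete starting point, CockroachDB exposes its metrics in Prometheus text format over HTTP (the `/_status/vars` endpoint on the admin port). A small sketch that filters queue-related series out of a scraped payload might look like this; the sample metric names below are illustrative, so check your deployment’s actual output:

```python
# Sketch: pull queue-related metrics out of a Prometheus text-format
# payload, e.g. one scraped from CockroachDB's /_status/vars endpoint.
# The sample payload and metric names below are illustrative only.

def queue_metrics(prom_text, prefix="queue_"):
    """Return {metric_name: value} for series whose name starts with prefix."""
    out = {}
    for line in prom_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):    # skip HELP/TYPE comments
            continue
        name_and_labels, _, value = line.rpartition(" ")
        name = name_and_labels.split("{", 1)[0]  # drop {label="..."} part
        if name.startswith(prefix):
            out[name] = float(value)
    return out

sample = """\
# HELP queue_replicate_pending Pending replicas in the replicate queue
queue_replicate_pending{store="1"} 12
queue_gc_pending{store="1"} 0
ranges{store="1"} 5210
"""
print(queue_metrics(sample))
```

Graphing these per-queue counters over time gives exactly the order-of-magnitude view asked about above: a sustained rise in pending-queue depth as the range count grows is the “chatter” becoming visible.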