A node in CRDB can hold many (100k+) ranges so it seems common that for any two nodes in a CRDB cluster they will both be members of the same raft range. Thus the communication overhead for replication is at least (1/hearbeat_interval_ms)xNumberOfNodes (i.e. the sum over all nodes of (1/hearbeat_interval_ms)x(NumberOfNodes-1)). And this assumes the multiraft batching - otherwise the overhead would be (1/hearbeat_interval_ms)xNumberOfRanges?
How does the auto-rebalancing algorithm work? Is there a preference for trying to reduce the communication overhead mentioned above? What tradeoffs are taken into consideration for rebalancing optimization?
Couldn’t it be possible that a transaction that spans many ranges could end up creating O(NumberOfNodes) overhead due to the first question? It seems possible that I could be interested in some dataset such that the raft leaders that owns that dataset includes every node in the cluster?
What’s the fundamental scaling bottleneck that keeps CRDB from scaling past 64 nodes? Is it the gossip mechanism (seems unlikely)? Is it multi-raft with O(NumberOfNodes) overhead (seems unlikely)? Is it the possibility of pathological behavior from my hypothetical third point?