Few powerful nodes vs many less powerful nodes

In general, are there circumstances where having fewer (e.g., 3) powerful nodes is better or worse than having many (7-9+) less powerful nodes? Low-powered servers would still meet the minimum requirements of 4-8 GB of RAM, while high-powered servers could have 32-96 GB.

For example:

  1. Many inserts and updates vs mostly selects?
  2. More complex queries (joins, etc.) vs mostly simple selects from one table?
  3. Low latency network vs high latency or cross datacenter network?
  4. Many connecting clients (100s) vs dozens?

With cloud hosting, the same budget can go either way.

Thanks.

Hi Dave,

As mentioned in our recommended production settings, you’ll almost always get better performance from using a smaller number of more powerful nodes. I expect that you’d notice the biggest performance difference when running large, complex queries, but it should be better for simple workloads too.

The tradeoff (as you could probably imagine) is that you’ll have somewhat less resiliency to failures than if you had more nodes, but you’d have to lose two of the three nodes to have any problems.

Thanks, Alex.

I wasn’t aware that a 3-node cluster could lose 2 nodes and still run. The docs you linked to say that an even number of nodes doesn’t give added resiliency and that “to survive two simultaneous failures, you must have five replicas.”

I suppose you mean that a 3-node cluster can withstand 2 non-simultaneous failures, but now I wonder why we don’t get a split-brain situation if one of the two remaining nodes fails. Would this not be similar to a 4-node cluster where 2 nodes fail and the cluster no longer has a quorum? My experience is with XtraDB Cluster, and those docs say that split brain occurs when half the nodes go down, even if the failures aren’t simultaneous.

If I understand you correctly, although the docs lean towards using an odd number of nodes to survive multiple simultaneous failures, using an even number of nodes still provides resiliency as long as the failures don’t happen simultaneously.

For example, in a 4-node cluster 1 node fails, then hours later a second node fails, then later a third node fails. It sounds like CRDB would gracefully make sure that all replicas are available on whatever nodes remain, provided there is time to do so between failures.

Sorry for the misunderstanding; I simply meant that you wouldn’t have any problems until you lost 2 nodes, not that you wouldn’t have problems if you do lose a second node. The cluster will work fine after a single node outage, but not if 2 of the 3 nodes are down at the same time.

I suppose you mean that a 3-node cluster can withstand 2 non-simultaneous failures, but now I wonder why we don’t get a split-brain situation if one of the two remaining nodes fails. Would this not be similar to a 4-node cluster where 2 nodes fail and the cluster no longer has a quorum?

Regardless of when the nodes fail, if your cluster ever goes down from multiple nodes to just a single node, that node won’t be able to serve any requests (because, from the remaining node’s perspective, the other nodes going down is indistinguishable from a network partition).
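
To put rough numbers on that, here’s a small sketch of the majority arithmetic (plain Python for illustration, not CockroachDB code; the function names are made up for this example):

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must be reachable for a range to serve requests."""
    return replication_factor // 2 + 1

def tolerated_simultaneous_failures(replication_factor: int) -> int:
    """Replica losses a range can absorb while still reaching quorum."""
    return replication_factor - quorum(replication_factor)

for rf in (3, 5):
    print(f"replication factor {rf}: quorum = {quorum(rf)}, "
          f"tolerates {tolerated_simultaneous_failures(rf)} simultaneous failure(s)")

# With 3 nodes every range has a replica on every node, so a lone surviving
# node holds only 1 of 3 replicas -- below the quorum of 2 -- and it cannot
# tell whether the other nodes are down or merely unreachable.
```

This prints that 3 replicas tolerate 1 simultaneous failure and 5 replicas tolerate 2, which matches the “five replicas to survive two simultaneous failures” line from the docs.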

If I understand you correctly, although the docs lean towards using an odd number of nodes to survive multiple simultaneous failures, using an even number of nodes still provides resiliency as long as the failures don’t happen simultaneously.

Correct.
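
To make the sequential-failure scenario above concrete, here’s a hypothetical walkthrough of the 4-node example (again just illustrative Python, assuming the default replication factor of 3 and enough time between failures for ranges to up-replicate onto the surviving nodes):

```python
REPLICATION_FACTOR = 3
QUORUM = REPLICATION_FACTOR // 2 + 1  # 2 of 3 replicas

live_nodes = 4
for failure in range(1, 4):
    live_nodes -= 1
    # A range keeps at most one replica per node, so after re-replication the
    # number of reachable replicas is capped by the number of surviving nodes.
    reachable = min(REPLICATION_FACTOR, live_nodes)
    status = "available" if reachable >= QUORUM else "unavailable"
    print(f"after failure {failure}: {live_nodes} node(s) left, "
          f"{reachable}/{REPLICATION_FACTOR} replicas reachable -> {status}")
```

Under those assumptions the cluster keeps serving through the first two non-simultaneous failures and only becomes unavailable once a single node is left, which is the point made above.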