Actual usable storage for user data in a multi-DC setup

Setup:
6 nodes, 2 in each of 3 DCs, each node with 1TB of storage.
Replica count of 3, with the restriction that exactly one replica is placed in each DC.

Question: For a node N1 of size 1TB, how much storage should be provisioned for the user data?

Considering that:

  1. the other node, N2, in the same DC might go down, in which case its data will be migrated to N1;
  2. the data on N2 (the node going down) might be stored in compressed form, and so could require additional storage on N1 during rebalancing.

As I understand it, a rough estimate based on point (1) above would be 50% of 1TB, but once point (2) is also taken into account, it would be less than 50%.
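A minimal sketch of the arithmetic behind this estimate. It assumes each DC's copy is spread evenly over that DC's nodes and that a failed node's share is redistributed evenly over the surviving nodes in the same DC. Point (2) is modelled with a hypothetical `expansion_factor`: how much more space the migrated data takes on the receiving node than it did on the failed one. That factor is a placeholder to be measured for the actual database, not a known value.

```python
def usable_fraction(nodes_per_dc: int, failures_tolerated: int = 1,
                    expansion_factor: float = 1.0) -> float:
    """Fraction of one node's capacity that can hold user data in steady state.

    Hypothetical model: each DC stores one full replica split evenly across
    its nodes; when `failures_tolerated` nodes in the DC fail, their share is
    redistributed evenly over the survivors and grows by `expansion_factor`.
    """
    survivors = nodes_per_dc - failures_tolerated
    if survivors < 1:
        raise ValueError("at least one node per DC must survive")
    if expansion_factor < 1.0:
        raise ValueError("expansion_factor is assumed to be >= 1.0")
    # Steady state: each node fills a fraction f of its capacity.
    # After failures, a survivor holds f + f * expansion_factor * failures / survivors,
    # which must not exceed 1.0 (the whole node). Solve for f.
    return 1.0 / (1.0 + expansion_factor * failures_tolerated / survivors)


if __name__ == "__main__":
    node_tb = 1.0
    for factor in (1.0, 1.25, 1.5):
        frac = usable_fraction(nodes_per_dc=2, failures_tolerated=1,
                               expansion_factor=factor)
        print(f"expansion {factor}: {frac:.0%} of node, {frac * node_tb:.2f} TB")
```

With two nodes per DC and no expansion this gives exactly 50%, matching point (1); any expansion factor above 1.0 pushes the result below 50%, matching point (2).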

Hi Ramshankar

In your example you have 6 nodes spread evenly between 3 data centres and each data centre must always have one complete set of the data.

Given these requirements, we would advise that each node be 1TB; this allows one node to be lost while every data centre still holds a full copy of the data.
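This sizing rule can be checked with a small sketch. It assumes the surviving nodes in a DC can pool their free space. For each DC it takes the worst case, losing the largest node, and the smallest result over all DCs is the largest full copy (on-disk size) that every DC can still hold after any single node loss. The `Node` type and node/DC names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    dc: str
    capacity_tb: float


def max_replica_tb(nodes: list[Node]) -> float:
    """Largest full copy each DC can still hold after losing any one of its nodes.

    Hypothetical model: the surviving nodes in a DC pool their capacity.
    """
    by_dc: dict[str, list[Node]] = {}
    for node in nodes:
        by_dc.setdefault(node.dc, []).append(node)
    worst = float("inf")
    for members in by_dc.values():
        total = sum(m.capacity_tb for m in members)
        # Losing the largest node is the worst case for this DC.
        remaining = total - max(m.capacity_tb for m in members)
        worst = min(worst, remaining)
    return worst


if __name__ == "__main__":
    # 6 nodes of 1TB: N1, N2 in DC1; N3, N4 in DC2; N5, N6 in DC3.
    cluster = [Node(f"N{i}", f"DC{(i - 1) // 2 + 1}", 1.0) for i in range(1, 7)]
    print(max_replica_tb(cluster))
```

With two 1TB nodes per DC this yields 1.0: one full copy may occupy at most 1TB on disk, before any allowance for the rebalancing overhead raised in the question.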

How much of the 1TB is consumed by user data is harder to answer, as it depends on a number of factors, including the number and size of indexes, the use of compression, data types, etc.
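Once those factors have been measured on a representative sample of the data, converting a disk budget into raw user data is simple arithmetic. A minimal sketch, where `compression_ratio` (raw size divided by stored size) and `index_overhead` (index size as a fraction of stored data) are hypothetical inputs rather than values for any particular database; data-type effects are assumed to be folded into the measured compression ratio.

```python
def raw_data_capacity_tb(disk_budget_tb: float, compression_ratio: float,
                         index_overhead: float) -> float:
    """Raw (uncompressed) user data that fits in `disk_budget_tb` of disk.

    Hypothetical inputs, to be measured on real data:
      compression_ratio: raw bytes / stored bytes for the data itself (> 0).
      index_overhead: index bytes as a fraction of stored data bytes (>= 0).
    """
    if compression_ratio <= 0 or index_overhead < 0:
        raise ValueError("invalid compression_ratio or index_overhead")
    # Disk consumed per TB of raw data: the compressed data plus its indexes.
    disk_per_raw_tb = (1.0 / compression_ratio) * (1.0 + index_overhead)
    return disk_budget_tb / disk_per_raw_tb


if __name__ == "__main__":
    # E.g. the ~0.5TB per-node estimate from the question, with made-up factors.
    budget = raw_data_capacity_tb(0.5, compression_ratio=2.0, index_overhead=0.25)
    print(f"{budget:.2f} TB")
```

The factors multiply, so a modest index overhead can cancel much of the gain from compression; measuring both on real data matters more than the formula itself.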