What happens when one of multiple disks used with the --store flag breaks


(Reinhard Fischer) #1

Hi,

I was looking for a more detailed explanation of what happens if one of several disks, each declared with a separate --store= flag, breaks. Does the node itself rewrite the content onto the other disks (if there is enough space), or is the whole node broken?

(If this has already been answered somewhere else, please point me to it; I just haven’t found it.)

In the thread “Storage on multiple disks”, this question was answered:

If a disk dies, can I restore the data to a replacement disk, and re-start CDB?

Yes, or as another reply suggested you can also wipe everything and create a new node; it will repopulate automatically from the data on the other nodes.

The reason I’d like to know is that we would have a setup with approximately 6 SSDs (likely SATA) per node, and I need to know whether it is better to create a RAID out of the disks (probably RAID 0) or to use them separately with --store flags. RAID 0 might perform better, but I was assuming that it would be less effort to replace one broken SSD out of six than to replace the disk, rebuild the RAID, and set up the node again. (I know RAID 0 is recommended, but write performance is probably not our bottleneck.)

Thanks in advance for any help on this,
Reinhard


(Tim O'Brien) #2

Hey @enterldestodes, sorry for the delay getting back to you - I’ll get you an answer today.


(Ron Arévalo) #3

Hey @enterldestodes,

Regarding your question:

If you’re asking whether the replicas would rebalance to another store on the same node, the answer is no. Nodes rebalance replicas across other nodes, not across stores on the same node.
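
To make that rule concrete, here is a minimal toy model in Python. It is only a sketch of the behavior described above, not CockroachDB’s actual allocator; the `Store` class, the `rebalance_targets` function, and the node names and mount points are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Store:
    node: str  # hypothetical node name
    disk: str  # hypothetical path passed via a --store flag


def rebalance_targets(failed: Store, stores: list[Store]) -> list[Store]:
    """Return the stores that could receive replicas from a failed store.

    Encodes only the rule stated above: candidates live on *other* nodes,
    never on a sibling store of the same node.
    """
    return [s for s in stores if s.node != failed.node]


stores = [
    Store("node1", "/mnt/ssd1"),
    Store("node1", "/mnt/ssd2"),  # sibling store on the same node
    Store("node2", "/mnt/ssd1"),
    Store("node3", "/mnt/ssd1"),
]

failed = stores[0]  # node1's first disk breaks
targets = rebalance_targets(failed, stores)
print([f"{s.node}:{s.disk}" for s in targets])
```

In this model the healthy second disk on node1 is never a candidate, which is why a single broken disk does not simply get absorbed by the other stores on the same node.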

Does this make sense?

Thanks,

Ron


(Tim O'Brien) #4

To flesh it out a bit, we’d like to match your initial expectations and have each node utilize and balance replicas across its stores, allowing for individual disk failures without losing the node. Currently this doesn’t work in three-node clusters - the blocking issue is here if you’d like to track it.

So in your use case, it’s likely best to proceed with the RAID setup - as it stands, adding more stores won’t reliably allow you to tolerate a disk failure while keeping the rest of the replicas on the node active.
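
For reference, the two layouts discussed in this thread differ only in how many --store flags the node is started with. The Python sketch below just builds the two argument lists side by side; the `start_args` helper and the mount points (`/mnt/ssd1` … `/mnt/ssd6`, `/mnt/raid0`) are hypothetical, and every other flag a real deployment needs is left out.

```python
def start_args(store_paths):
    """Build a `cockroach start` argument list with one --store flag per path.

    Illustration only: join addresses, certificates and all other
    flags a real node needs are deliberately omitted.
    """
    return ["cockroach", "start"] + [f"--store={p}" for p in store_paths]


# Layout A: six SSDs, each declared as its own store (hypothetical mount points).
separate = start_args([f"/mnt/ssd{i}" for i in range(1, 7)])

# Layout B: the six SSDs combined into one RAID volume, exposed as a single store.
raid = start_args(["/mnt/raid0"])

print(" ".join(separate))
print(" ".join(raid))
```

With layout B the node sees a single store, so the question of balancing between stores on one node never comes up.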

Hope that’s helpful, and again apologies for the delay.


(Reinhard Fischer) #5

Thanks to both of you for the answers! It now looks to me, too, as if RAID makes more sense for my setup.