Is it possible to use EBS snapshots as a backup strategy for nodes running in AWS? If not, why?
To get an accurate backup, you would need to take the snapshots while the nodes are stopped. While the nodes are running, different nodes may flush data to disk at different times, so snapshots taken across the cluster would be difficult to make consistent with one another. Stopping the nodes first guarantees a single consistent backup across the entire cluster.
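To make the flush-timing problem concrete, here is a toy model in Python. It is only a sketch: the `Node` class, its methods, and the flush schedule are hypothetical illustrations, not how the database or EBS works internally, and it assumes that a clean stop flushes all pending writes to disk.

```python
class Node:
    """Hypothetical toy node: buffers writes in memory, flushes on its own schedule."""

    def __init__(self, name):
        self.name = name
        self.memory = []  # writes accepted but not yet flushed
        self.disk = []    # writes that have reached the volume

    def write(self, record):
        self.memory.append(record)

    def flush(self):
        self.disk.extend(self.memory)
        self.memory.clear()

    def stop(self):
        # Assumption: a clean shutdown flushes everything still in memory.
        self.flush()

    def snapshot(self):
        # A volume-level snapshot only sees what is already on disk.
        return list(self.disk)


def snapshots_agree(snapshots):
    """True when every node's snapshot holds the same data."""
    return all(s == snapshots[0] for s in snapshots)


n1, n2, n3 = Node("n1"), Node("n2"), Node("n3")
nodes = [n1, n2, n3]

for record in ["a", "b", "c"]:
    for node in nodes:
        node.write(record)
    n1.flush()            # n1 flushes after every write
    if record == "b":
        n2.flush()        # n2 flushes only once, after "b"
    # n3 has not flushed yet

# Snapshots taken while the nodes are running capture different states.
running = [node.snapshot() for node in nodes]
running_consistent = snapshots_agree(running)

# Stop every node first, then snapshot.
for node in nodes:
    node.stop()
stopped = [node.snapshot() for node in nodes]
stopped_consistent = snapshots_agree(stopped)

print(running, running_consistent)
print(stopped, stopped_consistent)
```

In this model the snapshots taken while running disagree with one another, whereas the snapshots taken after stopping all hold the same records.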
Also bear in mind that, because the data is replicated internally, your snapshots would contain three copies (by default) of every table’s data. There are more details about how and why we built BACKUP the way we did in this blog post.
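As a rough back-of-the-envelope illustration of the storage cost, the sketch below multiplies the logical data size by the replication factor. The function name and the 500 GiB figure are made up for the example, and the estimate ignores compression, incremental snapshot storage, and any other on-disk overhead.

```python
def snapshot_footprint_gib(logical_gib, replication_factor=3):
    """Approximate data captured when every node's volume is snapshotted.

    Hypothetical helper: each replica lives on some node's disk, so
    volume-level snapshots capture every replica of every table.
    """
    return logical_gib * replication_factor


# Example with an assumed 500 GiB of table data and the default of three replicas.
footprint = snapshot_footprint_gib(500)
print(footprint)
```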