Thank you for the answer! While data corruption may happen infrequently, it is important for me to make sure CRDB is covering all its bases. Trust, but verify.
This “scrub” functionality does not seem to be discussed anywhere in the wiki. Is the checksum also verified every time a page is read from disk?
When you say “The node will crash”, does that mean that the whole data folder for that node will be deleted? Or will the node restart with the corrupted ranges deleted? Or will the node be ejected from the cluster? I would imagine that k8s would not just automatically bring the node back online with corrupt data?
And of course, I do not expect CRDB to be able to recover from a loss of quorum. Nor do I wish to run CRDB on degraded hardware. Nor do I expect multiple simultaneous disk corruptions. But failures can happen anywhere, and at any time, and being prepared is important.
For your consideration, I think it would be good for the documentation to have at least one paragraph talking about scrubbing. I would argue that disk/file corruption is an item pretty high in the minds of sysadmins, and such a paragraph would help put people like me at ease.