Pebble: corrupt manifest - issue

Is there a way to recreate or fix a MANIFEST file? One of the nodes in our cluster crashed because it ran out of disk space. As a result, the MANIFEST file got corrupted and we are not able to start that node back up.

My question is: is there a way to get around this? We don't mind a little data loss, especially of the latest records, but we would like to recover the unavailable ranges.

Hello! And welcome to the forum.

The short answer is no: unfortunately, we do not offer a way to fix a corrupt MANIFEST file. The recommended way forward is to decommission the failed node, then create a new node and add it to the cluster so that the cluster can heal.

With only a single node failure, there shouldn’t be unavailable ranges, so adding a new node should be able to restore the cluster. If replacing the node still results in unavailable ranges, perhaps there is something else going on. Are you seeing any other failures beyond losing a single node?

WARNING: If you are encountering MANIFEST corruption, do not perform the steps outlined in this post. Instead, reach out to Cockroach Labs support. Truncating a MANIFEST is unsafe and may allow corruption to spread to other nodes or make any manual recovery of a store impossible.—Cockroach Labs


Hello @nathastilwell,

Thank you for your reply. The problem is that we had another node down: we were running a big job (altering the primary index of a big table), and some of the nodes ran out of disk space and crashed.
So we needed to bring this node back up to recover the unavailable ranges.

This is the solution I was able to come up with, for future reference:

1 I checked the MANIFEST file to find where the corruption starts:

# cockroach debug pebble manifest check ./MANIFEST-5100654 
./MANIFEST-5100654: offset: 4455759 err: pebble: corrupt manifest

2 Since we don't mind losing a few transactions, I created a new file by truncating the old MANIFEST at the reported offset, leaving out everything from the corruption onward:

# head -c 4455759 ./MANIFEST-5100654 > ./MANIFEST-5100654_truncked

3 I checked the truncated file and it passed:

# cockroach debug pebble manifest check ./MANIFEST-5100654_truncked 
OK

4 I replaced the old file with the new one, started the node, and it worked.
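
To make the byte arithmetic in the steps above concrete, here is a minimal sketch that runs against a throwaway file rather than a real MANIFEST. It only illustrates the mechanics of reading the offset from a check line and cutting a file at that byte, not a recovery procedure. The file names, the 12-byte offset, and the sample check line are all made up for illustration, and it assumes (as the steps above do) that the reported offset marks the end of the valid prefix.

```shell
#!/bin/sh
# Demonstration on a throwaway file only; this is NOT a real MANIFEST.
set -eu

workdir=$(mktemp -d)
demo="$workdir/demo-file"               # hypothetical stand-in file
printf 'GOODGOODGOODbadbad' > "$demo"   # 12 "good" bytes followed by 6 "bad" bytes

# Sample line in the same shape as the check output shown in step 1
# (the offset value 12 is invented for this demo).
check_line="$demo: offset: 12 err: pebble: corrupt manifest"

# Pull the numeric byte offset out of that line.
offset=$(printf '%s\n' "$check_line" | sed -n 's/.*offset: \([0-9][0-9]*\) err.*/\1/p')

# Keep an untouched copy of the original before producing the truncated one.
cp "$demo" "$demo.orig"

# head -c N writes only the first N bytes of the file.
head -c "$offset" "$demo" > "$demo.truncated"

truncated_size=$(wc -c < "$demo.truncated" | tr -d ' ')
original_size=$(wc -c < "$demo.orig" | tr -d ' ')
echo "offset=$offset truncated=$truncated_size original=$original_size"
```

Keeping the `.orig` copy means the pre-truncation bytes stay available for later inspection, whatever happens to the truncated file.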


Hi @eljeilany — Manually truncating the MANIFEST is dangerous. The consequence is not necessarily just a few lost transactions.

If you still have the pre-truncation MANIFEST available, I'd like to take a look. If you're able to provide it, please email it to me (jackson @ cockroachlabs.com), along with any node logs you have.

Hi Jackson,

I shot you the file by mail. :slight_smile: