How to ensure GC happens after a failure?

I currently have an index build that fails (see #67487, #67523). Unfortunately, the GC also failed because I decommissioned a node. It got this error:

attempting to GC indexes: clearing index 8: failed to connect to n16 at <node addr>: initial connection heartbeat failed: rpc error: code = PermissionDenied desc = n16 was permanently removed from the cluster at 2021-07-25 02:54:14.58800097 +0000 UTC; it is not allowed to rejoin the cluster

Do I have to do anything to ensure the relevant ranges/data ultimately get GCed? Will they be GCed less aggressively, or not at all?

Hi Dan!

We would expect this job to be retried eventually. However, we had some bugs around the handling of decommissioned nodes that were fixed in v21.1.6, so it might be worth upgrading to see whether that fixes the error.

Restarting all nodes might also make the error go away for now (if it’s caused by stale range descriptor caches that aren’t refreshed on error), but it’s hard to say exactly which bug you’re hitting, so it may or may not help.
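
To see whether the GC job does get retried (before or after an upgrade or restart), something like the rough sketch below might help. It assumes you run the query yourself, for example in `cockroach sql`, and pass the resulting rows in as dicts. The `SCHEMA CHANGE GC` job type and the `job_id`, `status`, and `error` columns of `SHOW JOBS` are what I would expect on recent versions, but please check them against your own; the helper name `stuck_gc_jobs` is made up for illustration.

```python
# Assumption: this query is run by hand or through whatever SQL driver you use.
# Verify the job type and column names against your CockroachDB version.
GC_JOBS_QUERY = """
SELECT job_id, status, error
FROM [SHOW JOBS]
WHERE job_type = 'SCHEMA CHANGE GC'
"""

# Substring taken from the error message quoted in the question.
REMOVED_NODE_MARKER = "permanently removed from the cluster"


def stuck_gc_jobs(rows):
    """Return the IDs of GC jobs that look stuck.

    `rows` is a list of dicts with the keys job_id, status, and error
    (hypothetical shape, matching the columns selected above).
    A job is flagged if it has failed or if its error mentions a
    permanently removed node.
    """
    stuck = []
    for row in rows:
        error = row.get("error") or ""  # the error column may be NULL/None
        if row["status"] == "failed" or REMOVED_NODE_MARKER in error:
            stuck.append(row["job_id"])
    return stuck
```

If the list comes back empty after a while, the job has presumably been retried successfully. If the same job IDs keep showing up, that points more towards one of the decommissioning bugs mentioned above.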
