Decommissioned Nodes Cycling between Dead and Decommissioned

We’ve seen this situation in our cluster twice now, most recently right after an automatic Kubernetes upgrade. We run a 3-region CockroachDB cluster on Kubernetes with 7 nodes total, and the us-west2 region hasn’t been part of the cluster for some time.

The 3 nodes in us-west2 (n13, n15, n16) flip back and forth between “Dead Nodes” and “Decommissioned Nodes” every few minutes. Last time, we resolved it by cycling through the nodes in the cluster one at a time until what looked like the “bad” node was cycled and the behavior stopped. It would be nice to get a concrete answer as to why this is happening.

Hey @dmcqueen,

It’d be helpful to get the output of a cockroach debug zip.
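
In case it helps, here is a rough sketch of how that could be collected, along with the node membership view. The host address, certs directory, and output filename below are placeholders I made up, not values from your cluster, and on Kubernetes you would likely need to run this from inside a pod or through a port-forward.

```shell
# Placeholder values (assumptions): replace with your own deployment's settings.
HOST="localhost:26257"   # address of any live node
CERTS_DIR="certs"        # directory holding the client certificates
OUT="debug.zip"          # where to write the archive

collect_debug_zip() {
  # Gathers logs and cluster details from the nodes into a single archive.
  cockroach debug zip "$OUT" --host="$HOST" --certs-dir="$CERTS_DIR"
}

show_decommission_status() {
  # Prints node status with the decommissioning-related columns included.
  cockroach node status --decommission --host="$HOST" --certs-dir="$CERTS_DIR"
}

# Once the values above are right, run:
#   collect_debug_zip
#   show_decommission_status
```

Running the status command a few times while the nodes are flipping might also show whether what the cluster reports for n13, n15, and n16 changes along with the UI.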

This definitely seems strange, but there’s not enough info here to understand what might be causing it. Do you have a set of steps I can follow to reproduce the issue?

Thanks!