Remove decommissioned nodes that don't exist anymore

Hi there,

I have two decommissioned nodes: one with an IP that is no longer available for use (an old server) and one duplicate (same IP and port as a currently live node). The quit command obviously doesn't help. Is there any way to remove decommissioned nodes manually, whether from the database, from files on disk, or with an extra command?

Thanks in advance

Hi @kresic.ivan,

What version of CockroachDB are you running? If you're on 2.0, decommissioning nodes should remove them from the web UI and from the output of the cockroach node commands. For both cases, I'd suggest following our docs on decommissioning dead nodes.

If you have any trouble, please let us know here.

Best,
Jesse

I’m not sure about OP, but I have the same problem with v2.0.3.

I destroyed and re-created each of the 3 CockroachDB servers, one by one, while allowing the new one (with a new IP address) to connect to the cluster before moving on to the next.

I was left with 3 dead nodes, so I followed your guide to decommission them.

Now I have 3 “decommissioned” nodes in the dashboard and no way to remove them since, as OP pointed out, they no longer exist so the “quit” command is unhelpful.

Also, I'm not sure whether this is related, but it seems wrong: the dashboard says "decommissioned since" and gives the time they became dead, which for 2 of them was actually yesterday, even though I only decommissioned them a few minutes ago.

Hi @jazoom,

Thanks for revisiting the issue; I had forgotten about it. I simply gave up, since no harm is done; it's just a bit annoying. I have two decommissioned nodes, one decommissioned 3 months ago and the other 4 months ago. Any instructions for manual removal would be helpful and appreciated, @jesse, since the quit command does not apply.

Thanks

You can decommission a node that is not alive anymore.

To do this, use the command:

cockroach node decommission <nodeid> <nodeid> <nodeid>...

The cockroach node decommission command does its work by connecting to any of the remaining live nodes; use --host and --port to point it at one of them.
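A minimal Python sketch of that invocation, which only assembles the argument list for cockroach node decommission and points --host and --port at a node that is still live. The helper name decommission_command, the example address, and the default port 26257 (the port that appears in the node listings later in this thread) are illustrative assumptions; a secure cluster would typically also need its certificate flags, which are left out here.

```python
import shlex
from typing import List, Sequence


def decommission_command(node_ids: Sequence[int], host: str, port: int = 26257) -> List[str]:
    """Build argv for `cockroach node decommission` (hypothetical helper).

    The dead nodes themselves are never contacted: the command is sent to
    `host`, which must be one of the remaining live nodes.
    """
    if not node_ids:
        raise ValueError("at least one node ID is required")
    return [
        "cockroach", "node", "decommission",
        *[str(n) for n in node_ids],  # several dead node IDs can go in one call
        f"--host={host}",
        f"--port={port}",
    ]


if __name__ == "__main__":
    cmd = decommission_command([1, 3], host="10.46.24.32")
    print(shlex.join(cmd))  # paste into a shell, or pass cmd to subprocess.run
```

Building an argument list instead of a single string keeps the node IDs and flags as separate arguments, so the list can be handed to subprocess.run without shell quoting.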

Does this help?

Hi @knz,

Not really. We want to remove already-decommissioned nodes from the list. I, for example, have 2 decommissioned nodes: one using a duplicate IP and port of a currently live node, and the other using an old IP, so I cannot connect to those nodes anymore. It's just an aesthetic issue concerning the dashboard.

Ivan,

Maybe there was a misunderstanding. I believe that if the node still appears in the list in the UI, the node is considered dead (terminated) but not decommissioned.

The word "decommission" does not mean "stop the node"; instead, it means "remove the node from the list of nodes". It is possible to decommission a node that is already stopped. Isn't that what you want?

Hi @knz,

I literally want to remove nodes from the UI list that are listed in the "decommissioned nodes" section. They are not listed as "dead", but as "decommissioned". I just don't want to see them in the list anymore, since they have been unavailable and nonexistent for a long time.

@kresic.ivan, we did make a change to remove decommissioned and dead nodes from timeseries graphs: https://github.com/cockroachdb/cockroach/issues/23110. It also looks like we intended to remove decommissioned and dead nodes from the nodes list page: https://github.com/cockroachdb/cockroach/issues/20639. However, I can't tell whether that work actually got done. From your experience, it seems like it didn't.

@tschottdorf, @marc, do either of you know whether there’s a way to get dead and decommissioned nodes to stop appearing on the nodes list page?

Exactly as @kresic.ivan says. And I guess since they’re in the UI they’re also still in the database somewhere.

As far as I can tell, they’re not causing any trouble, but it’s silly to have them there forever and not be able to do anything about it.

All right, now I understand better. Thanks for explaining.

There was a display bug, which I think I recently fixed: https://github.com/cockroachdb/cockroach/pull/26821. The fix will be available in crdb 2.1; hopefully you can test it in the July 30 alpha release.

Cheers

I'm on v2.1.1 and still see obliterated hosts under "Decommissioned Nodes"; one goes back 5 days.
I used cockroach node decommission <node_nbr>, and it did not hang.
In node status, the node went from is_available=is_live=false to no longer showing up in the listing, and it moved from Live Nodes to Decommissioned Nodes.

Hi Timothy,
Thank you for your inquiry. For now, decommissioned nodes will indeed remain in the UI, albeit only on that one screen. You can discuss this feature further here: https://github.com/cockroachdb/cockroach/issues/24636

To follow up here with the same comment I made on the above-linked issue: that issue is strictly about the display of nodes that have been decommissioned but that the cluster still remembers. I have opened a new issue for the suggestion to let the cluster completely forget some nodes: https://github.com/cockroachdb/cockroach/issues/33542

I am facing the same problem on a CockroachDB cluster that I run on a DC/OS cluster.

+----+-------------------+--------+---------------------+---------------------+---------+------------------+-----------------------+--------+--------------------+------------------------+------------+-----------+-------------+--------------+--------------+-------------------+--------------------+-------------+
| id | address | build | updated_at | started_at | is_live | replicas_leaders | replicas_leaseholders | ranges | ranges_unavailable | ranges_underreplicated | live_bytes | key_bytes | value_bytes | intent_bytes | system_bytes | gossiped_replicas | is_decommissioning | is_draining |
+----+-------------------+--------+---------------------+---------------------+---------+------------------+-----------------------+--------+--------------------+------------------------+------------+-----------+-------------+--------------+--------------+-------------------+--------------------+-------------+
| 1 | 10.46.24.38:26257 | v2.0.7 | 2019-11-10 15:53:09 | 2019-11-05 13:51:50 | false | 67 | 67 | 277 | 0 | 62 | 9180582124 | 101507003 | 9079461165 | 0 | 321398 | 0 | true | true |
| 2 | 10.46.24.32:26257 | v2.0.7 | 2019-11-12 10:01:23 | 2019-11-03 01:36:50 | true | 54 | 54 | 277 | 0 | 0 | 9191030118 | 102172223 | 9089443619 | 0 | 434627 | 278 | false | false |
| 3 | 10.46.24.18:26257 | v2.0.7 | 2019-11-10 15:14:21 | 2019-11-05 09:25:50 | false | 57 | 57 | 277 | 0 | 0 | 9167215758 | 101497475 | 9066035078 | 0 | 265050 | 0 | true | true |
| 4 | 10.46.24.24:26257 | v2.0.7 | 2019-11-12 10:01:29 | 2019-11-11 14:45:05 | true | 60 | 59 | 277 | 0 | 0 | 9191091750 | 102172283 | 9089505369 | 0 | 434633 | 278 | false | false |
| 5 | 10.46.24.34:26257 | v2.0.7 | 2019-11-12 10:01:27 | 2019-11-10 17:41:37 | true | 55 | 54 | 277 | 0 | 0 | 9191071206 | 102172259 | 9089484779 | 0 | 434633 | 278 | false | false |
| 6 | 10.46.24.18:26257 | v2.0.7 | 2019-11-12 10:01:26 | 2019-11-11 09:26:36 | true | 54 | 54 | 277 | 0 | 0 | 9191050662 | 102172235 | 9089464187 | 0 | 434627 | 278 | false | false |
| 7 | 10.46.24.38:26257 | v2.0.7 | 2019-11-12 10:01:21 | 2019-11-12 08:46:40 | true | 54 | 50 | 277 | 0 | 0 | 9191009574 | 102172175 | 9089422981 | 0 | 434627 | 278 | false | false |
+----+-------------------+--------+---------------------+---------------------+---------+------------------+-----------------------+--------+--------------------+------------------------+------------+-----------+-------------+--------------+--------------+-------------------+--------------------+-------------+
(7 rows)

I am upgrading the DC/OS nodes, and to do this successfully I need to replace nodes. I keep the same IPs. When I deployed the first master node, all went well, but when I tried the second node, I saw the Master Node Poststart Checks failing with an 'underreplicated ranges' error.

I checked the zone configuration: the replication factor is 5.
When I looked further, I saw that dcos-check uses 127.0.0.1:8090/_status/nodes to get stats about the nodes, but that includes dead nodes. In my case, one of the dead/decommissioned nodes has under-replicated ranges (node 1: 62), but because it uses the same IP as node 7, I cannot recommission it and then decommission it correctly.

What is the best approach to fix this?
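As a rough sketch of the situation above, the following Python parses a trimmed copy of the node status table (only the id, address, is_live, ranges_underreplicated, and is_decommissioning columns from the post) and lists the dead nodes whose address has since been reused by a live node, which are the ones that cannot simply be restarted and recommissioned. The names parse_node_status, shadowed_dead_nodes, and NodeRow are hypothetical, and the under-replicated count shown for a dead node is presumably only what it last reported before going down, so treat it as stale.

```python
from __future__ import annotations

from dataclasses import dataclass

# Trimmed copy of the node status output from the post above
# (only the columns this sketch needs; values copied from the thread).
SAMPLE = """
+----+-------------------+---------+------------------------+--------------------+
| id | address           | is_live | ranges_underreplicated | is_decommissioning |
+----+-------------------+---------+------------------------+--------------------+
|  1 | 10.46.24.38:26257 | false   |                     62 | true               |
|  2 | 10.46.24.32:26257 | true    |                      0 | false              |
|  3 | 10.46.24.18:26257 | false   |                      0 | true               |
|  4 | 10.46.24.24:26257 | true    |                      0 | false              |
|  5 | 10.46.24.34:26257 | true    |                      0 | false              |
|  6 | 10.46.24.18:26257 | true    |                      0 | false              |
|  7 | 10.46.24.38:26257 | true    |                      0 | false              |
+----+-------------------+---------+------------------------+--------------------+
(7 rows)
"""


@dataclass
class NodeRow:
    node_id: int
    address: str
    is_live: bool
    ranges_underreplicated: int
    is_decommissioning: bool


def parse_node_status(table: str) -> list[NodeRow]:
    """Parse a pipe-delimited node status table into NodeRow records."""
    header = None
    rows: list[NodeRow] = []
    for raw in table.splitlines():
        line = raw.strip()
        if not line.startswith("|"):
            continue  # skip +----+ borders, blank lines and the "(N rows)" footer
        cells = [c.strip() for c in line.strip("|").split("|")]
        if header is None:
            header = cells  # the first pipe row holds the column names
            continue
        rec = dict(zip(header, cells))
        rows.append(NodeRow(
            node_id=int(rec["id"]),
            address=rec["address"],
            is_live=rec["is_live"] == "true",
            ranges_underreplicated=int(rec["ranges_underreplicated"]),
            is_decommissioning=rec["is_decommissioning"] == "true",
        ))
    return rows


def shadowed_dead_nodes(rows: list[NodeRow]) -> list[NodeRow]:
    """Return dead nodes whose address is now used by a live node."""
    live_addresses = {r.address for r in rows if r.is_live}
    return [r for r in rows if not r.is_live and r.address in live_addresses]


if __name__ == "__main__":
    for node in shadowed_dead_nodes(parse_node_status(SAMPLE)):
        print(f"node {node.node_id} at {node.address}: dead, address reused by a live node, "
              f"last reported {node.ranges_underreplicated} under-replicated ranges")
```

On this sample, nodes 1 and 3 are flagged (their addresses now belong to nodes 7 and 6), and only node 1 carries the 62 under-replicated ranges that a check reading every node's status would pick up.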

Hey @azzeddine.faik

Please see my response on your other forum thread here in order to continue troubleshooting your specific issue. Please update that thread with your findings.

Cheers,
Ricardo