Why do live nodes try to connect to a decommissioned node?

I am using version 2.17.
I have been testing availability,
so I have decommissioned many nodes.
But I found something strange in the log.

W190522 01:51:07.763134 17823130 vendor/google.golang.org/grpc/clientconn.go:942 Failed to dial 182.195.49.205:26257: context canceled; please retry.
W190522 01:51:08.503693 17823268 vendor/google.golang.org/grpc/clientconn.go:1293 grpc: addrConn.createTransport failed to connect to {182.195.49.205:26257 0 }. Err :connection error: desc = "transport: Error while dialing dial tcp 182.195.49.205:26257: connect: connection refused". Reconnecting...

That node is already decommissioned. Why do the others still try to connect to it?
All nodes are trying to connect to the decommissioned node.
How can I solve this problem?

Hi @yeri,

Can you send us the output of cockroach node status --decommission?

Thanks,

Ron

Hi Ron,
Here is the node status output.

[epsvc@epkttcrd02 /square]./cockroach node status --decommission --insecure --host=****
  id |       address        | build  |            started_at            |            updated_at            | is_available | is_live | gossiped_replicas | is_decommissioning | is_draining
+----+----------------------+--------+----------------------------------+----------------------------------+--------------+---------+-------------------+--------------------+-------------+
   1 | 182.195.49.206:26257 | v2.1.6 | 2019-05-17 04:26:40.203308+00:00 | 2019-05-23 06:35:10.727496+00:00 |         true |  true   |             21364 |       false        |    false
   2 | 182.195.49.207:26257 | v2.1.6 | 2019-05-17 04:26:47.636389+00:00 | 2019-05-23 06:35:10.464325+00:00 |         true |  true   |             21365 |       false        |    false
   3 | 182.195.49.210:26257 | v2.1.6 | 2019-05-17 04:28:05.533632+00:00 | 2019-05-23 06:35:10.045242+00:00 |         true |  true   |             21365 |       false        |    false
   4 | 182.195.49.209:26257 | v2.1.6 | 2019-05-17 04:26:41.931852+00:00 | 2019-05-23 06:35:10.423769+00:00 |         true |  true   |             20847 |       false        |    false
   5 | 182.195.49.208:26257 | v2.1.6 | 2019-05-17 04:26:44.68071+00:00  | 2019-05-23 06:35:10.022847+00:00 |         true |  true   |             20853 |       false        |    false
   6 | NULL                 | NULL   | NULL                             | 2019-05-21 02:32:48.877475+00:00 |        false |  false  |              NULL |        true        |    false
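
To pick out the decommissioned entries from output like the table above without reading it by eye, here is a minimal sketch in Python. It assumes the pipe-separated layout shown in this thread; the helper names parse_node_status and decommissioning_ids are hypothetical, and the sample is trimmed to four columns for brevity.

```python
# Sketch only: parses text copied from `cockroach node status --decommission`.
# The column layout is assumed from the output pasted in this thread.
SAMPLE = """\
  id |       address        | is_live | is_decommissioning
+----+----------------------+---------+--------------------+
   1 | 182.195.49.206:26257 |  true   |       false
   6 | NULL                 |  false  |        true
"""


def parse_node_status(text):
    """Turn the pipe-separated table into a list of dicts keyed by column name."""
    rows = []
    header = None
    for line in text.splitlines():
        # Skip shell prompts, blank lines, and the +----+ separator line.
        if "|" not in line or line.lstrip().startswith("+"):
            continue
        cells = [cell.strip() for cell in line.split("|")]
        if header is None:
            header = cells  # first table line holds the column names
            continue
        if len(cells) == len(header):
            rows.append(dict(zip(header, cells)))
    return rows


def decommissioning_ids(rows):
    """Return the ids of nodes whose is_decommissioning column reads true."""
    return [row["id"] for row in rows if row["is_decommissioning"] == "true"]


if __name__ == "__main__":
    print(decommissioning_ids(parse_node_status(SAMPLE)))
```

On the full table above, this would report only node 6, whose address column is NULL.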

Hi yeri

The message is simply a warning and I expect your cluster to be otherwise healthy.

The log messages you observe suggest that one of the nodes still “remembers” the decommissioned node somehow. There are two places inside CockroachDB where the “list of nodes” is stored, and it’s possible that either only one was successfully updated during the decommissioning process, or the other needs a delay before it lets the entry expire on its own.

In any case, I agree the behavior is surprising, and we could take a further look. Could you tell us:

  • on which of the remaining node(s) do you see this message occur?
  • are there other log entries that pertain to this address 182.195.49.205 in the remainder of the log file since you started to decommission that node?
  • is the cluster otherwise healthy?
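
One way to answer the first two questions is to scan each node's log file for the old address. The following is a rough sketch in Python, not an official tool: the line layout (severity letter, date, time, a numeric id, source file and line, then the message) is assumed from the excerpts in this thread, and the function names are hypothetical.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed layout, e.g.:
# W190522 01:51:07.763134 17823130 vendor/.../clientconn.go:942 Failed to dial ...
LOG_LINE = re.compile(
    r"^(?P<sev>[IWEF])(?P<date>\d{6}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{1,6}) "
    r"(?P<id>\d+) (?P<source>\S+:\d+)\s+(?P<msg>.*)$"
)


def entries_for_address(lines, address):
    """Return (timestamp, severity, source, message) for lines mentioning address."""
    found = []
    for line in lines:
        m = LOG_LINE.match(line)
        if m is None or address not in m.group("msg"):
            continue  # not a log entry, or unrelated to the address
        ts = datetime.strptime(
            m.group("date") + " " + m.group("time"), "%y%m%d %H:%M:%S.%f"
        )
        found.append((ts, m.group("sev"), m.group("source"), m.group("msg")))
    return found


def count_by_source(entries):
    """Count matching entries per source location (file:line)."""
    return Counter(source for _, _, source, _ in entries)


if __name__ == "__main__":
    sample = [
        "W190522 01:51:07.763134 17823130 vendor/google.golang.org/grpc/clientconn.go:942 "
        "Failed to dial 182.195.49.205:26257: context canceled; please retry.",
        "I190522 01:51:09.000000 12 some/other/file.go:10 unrelated message",
    ]
    hits = entries_for_address(sample, "182.195.49.205")
    print(len(hits), dict(count_by_source(hits)))
```

To run it against a real file, pass an open file object as lines, for example entries_for_address(open(path), "182.195.49.205"), once per node, and compare the counts and the earliest timestamp on each node.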

I see these log messages on all nodes.
The cluster consists of five nodes, on 206, 207, 208, 209, and 210.

The 205 node was decommissioned on 05-21.

As of now (05-23), I still see these entries in the log
"cockroach.epkttcrd03.epsvc.2019-05-17T04_26_43Z.024917.log":

W190523 08:17:06.091961 20105058 vendor/google.golang.org/grpc/clientconn.go:1293 grpc: addrConn.createTransport failed to connect to {182.195.49.205:26257 0 }. Err :connection error: desc = "transport: Error while dialing dial tcp 182.195.49.205:26257: connect: connection refused". Reconnecting...
W190523 08:17:07.091861 20105058 vendor/google.golang.org/grpc/clientconn.go:1293 grpc: addrConn.createTransport failed to connect to {182.195.49.205:26257 0 }. Err :connection error: desc = "transport: Error while dialing cannot reuse client connection". Reconnecting...
W190523 08:17:07.091913 20105058 vendor/google.golang.org/grpc/clientconn.go:942 Failed to dial 182.195.49.205:26257: context canceled; please retry.
W190523 08:17:16.093969 20105290 vendor/google.golang.org/grpc/clientconn.go:1293 grpc: addrConn.createTransport failed to connect to {182.195.49.205:26257 0 }. Err :connection error: desc = "transport: Error while dialing dial tcp 182.195.49.205:26257: connect: connection refused". Reconnecting...
W190523 08:17:17.092102 20105290 vendor/google.golang.org/grpc/clientconn.go:1293 grpc: addrConn.createTransport failed to connect to {182.195.49.205:26257 0 }. Err :connection error: desc = "transport: Error while dialing cannot reuse client connection". Reconnecting...
W190523 08:17:17.092155 20105290 vendor/google.golang.org/grpc/clientconn.go:942 Failed to dial 182.195.49.205:26257: context canceled; please retry.

The cluster is healthy, though. I can run CRUD transactions.
But I wonder why the cluster is giving me these warnings.

If these messages didn't matter, they wouldn't be logged.
Right?