Inconsistent is_live node status after decommissioning a node

Hi Roachers :) After decommissioning a node, I'm getting inconsistent is_live values from two cockroach commands (output below). I decommissioned node 3 with:

root@ip-10-207-52-213:~# cockroach node decommission 3 --wait=live --certs-dir=/cockroach/certs --host="${IP}" --port=11400

The ASG then terminated the instance, since its health check is the ELB health check, which points at HTTP:8080/health?ready=1. Now node 4 (10.187.107.240) is reporting an inconsistent is_live status:

root@ip-10-207-52-213:~# echo $IP
10.207.52.213

root@ip-10-207-52-213:~# cockroach node status --certs-dir=/cockroach/certs --host="${IP}" --port=11400
+----+----------------------+--------+---------------------+---------------------+---------+
| id |       address        | build  |     updated_at      |     started_at      | is_live |
+----+----------------------+--------+---------------------+---------------------+---------+
|  1 | 10.207.52.213:11400  | v2.0.4 | 2018-08-29 18:24:16 | 2018-08-29 17:40:34 | true    |
|  2 | 10.207.37.230:11400  | v2.0.4 | 2018-08-29 18:24:19 | 2018-08-29 16:12:49 | true    |
|  4 | 10.187.107.240:11400 | v2.0.4 | 2018-08-29 18:24:18 | 2018-08-29 16:15:38 | false   |
|  5 | 10.187.108.184:11400 | v2.0.4 | 2018-08-29 18:24:21 | 2018-08-29 16:15:41 | true    |
|  6 | 10.187.104.243:11400 | v2.0.4 | 2018-08-29 18:24:18 | 2018-08-29 16:16:08 | true    |
|  7 | 10.207.44.157:11400  | v2.0.4 | 2018-08-29 18:24:16 | 2018-08-29 18:22:46 | true    |
+----+----------------------+--------+---------------------+---------------------+---------+
(6 rows)

root@ip-10-207-52-213:~# cockroach node status --decommission --certs-dir=/cockroach/certs --host="${IP}" --port=11400
+----+----------------------+--------+---------------------+---------------------+---------+-------------------+--------------------+-------------+
| id |       address        | build  |     updated_at      |     started_at      | is_live | gossiped_replicas | is_decommissioning | is_draining |
+----+----------------------+--------+---------------------+---------------------+---------+-------------------+--------------------+-------------+
|  1 | 10.207.52.213:11400  | v2.0.4 | 2018-08-29 18:24:26 | 2018-08-29 17:40:34 |    true |        22         |              false |    false    |
|  2 | 10.207.37.230:11400  | v2.0.4 | 2018-08-29 18:24:29 | 2018-08-29 16:12:49 |    true |        22         |              false |    false    |
|  3 | 10.207.42.36:11400   | v2.0.4 | 2018-08-29 18:20:39 | 2018-08-29 16:12:49 |   false |         0         |               true |    true     |
|  4 | 10.187.107.240:11400 | v2.0.4 | 2018-08-29 18:24:28 | 2018-08-29 16:15:38 |    true |        22         |              false |    false    |
|  5 | 10.187.108.184:11400 | v2.0.4 | 2018-08-29 18:24:31 | 2018-08-29 16:15:41 |    true |        22         |              false |    false    |
|  6 | 10.187.104.243:11400 | v2.0.4 | 2018-08-29 18:24:28 | 2018-08-29 16:16:08 |    true |        22         |              false |    false    |
|  7 | 10.207.44.157:11400  | v2.0.4 | 2018-08-29 18:24:26 | 2018-08-29 18:22:46 |    true |         0         |              false |    false    |
+----+----------------------+--------+---------------------+---------------------+---------+-------------------+--------------------+-------------+
(7 rows)
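To make the disagreement between the two commands above explicit, here is a minimal sketch that parses the pasted tables and lists every node whose is_live value differs. It assumes only the pretty-printed table layout shown above; parse_status_table and is_live_mismatches are hypothetical helpers, not part of the cockroach CLI. The sample data is copied verbatim from the outputs above, abbreviated to the relevant rows.

```python
# Sketch: compare is_live between two pasted `cockroach node status` tables.
# parse_status_table / is_live_mismatches are hypothetical helpers (not CLI features).

# Rows 1 and 4 from `cockroach node status` (v2.0.4 output above).
PLAIN = """
+----+----------------------+--------+---------------------+---------------------+---------+
| id |       address        | build  |     updated_at      |     started_at      | is_live |
+----+----------------------+--------+---------------------+---------------------+---------+
|  1 | 10.207.52.213:11400  | v2.0.4 | 2018-08-29 18:24:16 | 2018-08-29 17:40:34 | true    |
|  4 | 10.187.107.240:11400 | v2.0.4 | 2018-08-29 18:24:18 | 2018-08-29 16:15:38 | false   |
+----+----------------------+--------+---------------------+---------------------+---------+
"""

# Rows 1, 3 and 4 from `cockroach node status --decommission` (output above).
DECOM = """
| id |       address        | build  |     updated_at      |     started_at      | is_live | gossiped_replicas | is_decommissioning | is_draining |
|  1 | 10.207.52.213:11400  | v2.0.4 | 2018-08-29 18:24:26 | 2018-08-29 17:40:34 |    true |        22         |              false |    false    |
|  3 | 10.207.42.36:11400   | v2.0.4 | 2018-08-29 18:20:39 | 2018-08-29 16:12:49 |   false |         0         |               true |    true     |
|  4 | 10.187.107.240:11400 | v2.0.4 | 2018-08-29 18:24:28 | 2018-08-29 16:15:38 |    true |        22         |              false |    false    |
"""


def parse_status_table(text):
    """Map node id -> {column name: cell text}; border and row-count lines are skipped."""
    lines = [ln.strip() for ln in text.strip().splitlines()]
    rows = [ln.strip("|").split("|") for ln in lines if ln.startswith("|")]
    header = [cell.strip() for cell in rows[0]]  # first "|" line is the header
    table = {}
    for cells in rows[1:]:
        row = dict(zip(header, (cell.strip() for cell in cells)))
        table[int(row["id"])] = row
    return table


def is_live_mismatches(a, b):
    """Return {node_id: (is_live in a, is_live in b)} for nodes present in both that disagree."""
    return {
        nid: (a[nid]["is_live"], b[nid]["is_live"])
        for nid in sorted(a.keys() & b.keys())
        if a[nid]["is_live"] != b[nid]["is_live"]
    }


plain = parse_status_table(PLAIN)
decom = parse_status_table(DECOM)
print(is_live_mismatches(plain, decom))  # → {4: ('false', 'true')}
```

Matching rows by node id rather than by position means the extra columns and the extra row for node 3 in the --decommission output don't affect the comparison. This only reproduces the symptom from the pasted text; it says nothing about the cause.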

I believe this is some sort of display bug. Here is what happens when I specify node 4 versus when I don't:

root@ip-10-207-52-213:~# cockroach node status 4 --certs-dir=/cockroach/certs --host="${IP}" --port=11400
+----+----------------------+--------+---------------------+---------------------+---------+
| id |       address        | build  |     updated_at      |     started_at      | is_live |
+----+----------------------+--------+---------------------+---------------------+---------+
|  4 | 10.187.107.240:11400 | v2.0.4 | 2018-08-29 19:16:21 | 2018-08-29 19:04:00 | true    |
+----+----------------------+--------+---------------------+---------------------+---------+
(1 row)

root@ip-10-207-52-213:~# cockroach node status --certs-dir=/cockroach/certs --host="${IP}" --port=11400
+----+----------------------+--------+---------------------+---------------------+---------+
| id |       address        | build  |     updated_at      |     started_at      | is_live |
+----+----------------------+--------+---------------------+---------------------+---------+
|  1 | 10.207.52.213:11400  | v2.0.4 | 2018-08-29 19:16:36 | 2018-08-29 17:40:34 | true    |
|  2 | 10.207.37.230:11400  | v2.0.4 | 2018-08-29 19:16:29 | 2018-08-29 16:12:49 | true    |
|  4 | 10.187.107.240:11400 | v2.0.4 | 2018-08-29 19:16:31 | 2018-08-29 19:04:00 | false   |
|  5 | 10.187.108.184:11400 | v2.0.4 | 2018-08-29 19:16:31 | 2018-08-29 16:15:41 | true    |
|  6 | 10.187.104.243:11400 | v2.0.4 | 2018-08-29 19:16:38 | 2018-08-29 16:16:08 | true    |
|  7 | 10.207.44.157:11400  | v2.0.4 | 2018-08-29 19:16:36 | 2018-08-29 18:22:46 | true    |
+----+----------------------+--------+---------------------+---------------------+---------+
(6 rows)

This looks like a bug to me. I've created a GitHub issue for you here: https://github.com/cockroachdb/cockroach/issues/29308


@fat0 - it sounds like this was fixed in v2.0.5. Can you try upgrading and see whether it persists?

@tim-o - I just built a v2.0.5 cluster, and the issue is not fixed; the behavior remains the same:

# cockroach node status 3 --certs-dir=/cockroach/certs --host="${IP}" --port=11400
+----+--------------------+--------+---------------------+---------------------+---------+
| id |      address       | build  |     updated_at      |     started_at      | is_live |
+----+--------------------+--------+---------------------+---------------------+---------+
|  3 | 10.207.42.52:11400 | v2.0.5 | 2018-09-11 20:49:05 | 2018-09-11 19:57:45 | true    |
+----+--------------------+--------+---------------------+---------------------+---------+
(1 row)

# cockroach node status 4 --certs-dir=/cockroach/certs --host="${IP}" --port=11400
+----+---------------------+--------+---------------------+---------------------+---------+
| id |       address       | build  |     updated_at      |     started_at      | is_live |
+----+---------------------+--------+---------------------+---------------------+---------+
|  4 | 10.187.108.48:11400 | v2.0.5 | 2018-09-11 20:49:16 | 2018-09-11 20:01:06 | true    |
+----+---------------------+--------+---------------------+---------------------+---------+
(1 row)

# cockroach node status --certs-dir=/cockroach/certs --host="${IP}" --port=11400
+----+----------------------+--------+---------------------+---------------------+---------+
| id |       address        | build  |     updated_at      |     started_at      | is_live |
+----+----------------------+--------+---------------------+---------------------+---------+
|  3 | 10.207.42.52:11400   | v2.0.5 | 2018-09-11 20:49:25 | 2018-09-11 19:57:45 | false   |
|  4 | 10.187.108.48:11400  | v2.0.5 | 2018-09-11 20:49:26 | 2018-09-11 20:01:06 | false   |
|  5 | 10.187.106.151:11400 | v2.0.5 | 2018-09-11 20:49:26 | 2018-09-11 20:01:06 | true    |
|  6 | 10.187.105.113:11400 | v2.0.5 | 2018-09-11 20:49:25 | 2018-09-11 20:01:14 | true    |
|  7 | 10.207.51.197:11400  | v2.0.5 | 2018-09-11 20:49:21 | 2018-09-11 20:19:21 | true    |
+----+----------------------+--------+---------------------+---------------------+---------+
(5 rows)
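The v2.0.5 outputs above follow the same pattern for both nodes 3 and 4: queried individually they report is_live = true, while the full listing reports false. A minimal sketch that checks this mechanically, assuming the pretty-printed table layout shown above (parse_status_table is a hypothetical helper, not a cockroach CLI feature; the data is copied verbatim from the outputs above, abbreviated to the relevant rows):

```python
# Sketch: compare per-node `cockroach node status N` results with the full listing.
# parse_status_table is a hypothetical helper, not part of the cockroach CLI.

NODE_3 = """
| id |      address       | build  |     updated_at      |     started_at      | is_live |
|  3 | 10.207.42.52:11400 | v2.0.5 | 2018-09-11 20:49:05 | 2018-09-11 19:57:45 | true    |
"""

NODE_4 = """
| id |       address       | build  |     updated_at      |     started_at      | is_live |
|  4 | 10.187.108.48:11400 | v2.0.5 | 2018-09-11 20:49:16 | 2018-09-11 20:01:06 | true    |
"""

# Rows 3, 4 and 5 from the full `cockroach node status` listing above.
FULL = """
| id |       address        | build  |     updated_at      |     started_at      | is_live |
|  3 | 10.207.42.52:11400   | v2.0.5 | 2018-09-11 20:49:25 | 2018-09-11 19:57:45 | false   |
|  4 | 10.187.108.48:11400  | v2.0.5 | 2018-09-11 20:49:26 | 2018-09-11 20:01:06 | false   |
|  5 | 10.187.106.151:11400 | v2.0.5 | 2018-09-11 20:49:26 | 2018-09-11 20:01:06 | true    |
"""


def parse_status_table(text):
    """Map node id -> {column name: cell text} for a pretty-printed status table."""
    lines = [ln.strip() for ln in text.strip().splitlines()]
    rows = [ln.strip("|").split("|") for ln in lines if ln.startswith("|")]
    header = [cell.strip() for cell in rows[0]]
    return {
        int(row["id"]): row
        for row in (dict(zip(header, (c.strip() for c in cells))) for cells in rows[1:])
    }


# Collect the individually queried nodes into one id -> row mapping.
single = {}
for text in (NODE_3, NODE_4):
    single.update(parse_status_table(text))
full = parse_status_table(FULL)

# Node ids where the single-node query and the full listing disagree on is_live.
disagree = {
    nid: (row["is_live"], full[nid]["is_live"])
    for nid, row in sorted(single.items())
    if row["is_live"] != full[nid]["is_live"]
}
print(disagree)  # → {3: ('true', 'false'), 4: ('true', 'false')}
```

The started_at values match between the single-node and full outputs for each node, which suggests the rows describe the same running process even though the reported is_live differs.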