Localities, init container not in "Completed" state

Hi!

Problem: init container not in “Completed” state

I am trying out the localities functionality.
In this post, DC == data center == locality.

DC1: 3 crdb nodes
DC2: 3 crdb nodes
DC3: 3 crdb nodes
All DCs are running in the same lab, in k8s, in three different namespaces.

All nodes try to join to:
dc1node 0, 1, 2
dc2node 0
dc3node 0

Step 1: DC1 was started up. “tampere”

Step 2: DC2 was started up. “oslo”

The init container in “oslo” never reaches the “Completed” state:

`oslo              crdboslo-cockroachdb-0                     1/1     Running     0          4m23s   192.168.224.91    neo0025node11       <none>           <none>`
`oslo              crdboslo-cockroachdb-1                     1/1     Running     0          4m23s   192.168.184.215   neo0025node07       <none>           <none>`
`oslo              crdboslo-cockroachdb-2                     1/1     Running     0          4m23s   192.168.190.83    neo0025node05       <none>           <none>`
`oslo              crdboslo-cockroachdb-init-r97bc            1/1     Running     0          4m23s   192.168.215.17    neo0025node09       <none>           <none>`

DC1 is ok:

`tampere           crdbtampere-cockroachdb-0                  1/1     Running     0          5m12s   192.168.99.30     neo0025node12       <none>           <none>`
`tampere           crdbtampere-cockroachdb-1                  1/1     Running     0          5m12s   192.168.113.218   neo0025node06       <none>           <none>`
`tampere           crdbtampere-cockroachdb-2                  1/1     Running     0          5m12s   192.168.79.210    neo0025node08       <none>           <none>`
`tampere           crdbtampere-cockroachdb-init-hwmn5         0/1     Completed   0          5m12s   192.168.224.90    neo0025node11       <none>           <none>`

Error message in init container of “oslo” locality:

+ sleep 5
+ /cockroach/cockroach init --insecure --host=crdbwarsaw-cockroachdb-0.crdbwarsaw-cockroachdb --port 26257
*
* ERROR: rpc error: code = Unknown desc = already connected to cluster
*
E191007 06:15:36.579453 1 cli/error.go:229  rpc error: code = Unknown desc = already connected to cluster
Error: rpc error: code = Unknown desc = already connected to cluster
Failed running "init"
+ sleep 5
+ /cockroach/cockroach init --insecure --host=crdbwarsaw-cockroachdb-0.crdbwarsaw-cockroachdb --port 26257
*
* ERROR: rpc error: code = Unknown desc = already connected to cluster
*
E191007 06:15:41.644741 1 cli/error.go:229  rpc error: code = Unknown desc = already connected to cluster
Error: rpc error: code = Unknown desc = already connected to cluster
Failed running "init"
+ sleep 5

The nodes are joined together successfully:

kubectl exec -it crdbtampere-cockroachdb-0 -ntampere -- ./cockroach node status --insecure
id | address | build | started_at | updated_at | is_available | is_live
+----+---------+-------+------------+------------+--------------+---------+
1 | crdbtampere-cockroachdb-0.crdbtampere-cockroachdb.tampere.svc.cluster.local:26257 | v19.1.5 | 2019-10-07 06:05:22.639332+00:00 | 2019-10-07 06:13:19.695771+00:00 | true | true
2 | crdbtampere-cockroachdb-1.crdbtampere-cockroachdb.tampere.svc.cluster.local:26257 | v19.1.5 | 2019-10-07 06:05:26.15803+00:00 | 2019-10-07 06:13:18.713603+00:00 | true | true
3 | crdbtampere-cockroachdb-2.crdbtampere-cockroachdb.tampere.svc.cluster.local:26257 | v19.1.5 | 2019-10-07 06:05:30.007338+00:00 | 2019-10-07 06:13:18.06478+00:00 | true | true
4 | crdboslo-cockroachdb-1.crdboslo-cockroachdb.oslo.svc.cluster.local:26257 | v19.1.5 | 2019-10-07 06:06:06.457908+00:00 | 2019-10-07 06:13:18.473494+00:00 | true | true
5 | crdboslo-cockroachdb-2.crdboslo-cockroachdb.oslo.svc.cluster.local:26257 | v19.1.5 | 2019-10-07 06:06:08.337818+00:00 | 2019-10-07 06:13:15.898984+00:00 | true | true
6 | crdboslo-cockroachdb-0.crdboslo-cockroachdb.oslo.svc.cluster.local:26257 | v19.1.5 | 2019-10-07 06:06:11.560959+00:00 | 2019-10-07 06:13:19.078033+00:00 | true | true

So, CockroachDB is working ok otherwise.
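To check the same thing without reading the table by eye, the node status output can be summarized per namespace. This is only a rough sketch: the function name `count_nodes_per_namespace` is made up for illustration, and it assumes the pipe-separated table layout and the `<pod>.<service>.<namespace>.svc.cluster.local:<port>` address shape shown above.

```shell
# Count nodes per Kubernetes namespace (used here as the locality name)
# from `cockroach node status` table output read on stdin.
# Hypothetical helper, not part of CockroachDB or the Helm chart.
count_nodes_per_namespace() {
  awk -F'|' '
    # Data rows start with a numeric node id in the first column;
    # the header and separator rows are skipped.
    $1 ~ /^[ \t]*[0-9]+[ \t]*$/ {
      addr = $2
      gsub(/[ \t]/, "", addr)
      n = split(addr, parts, ".")
      # Assumed shape: <pod>.<service>.<namespace>.svc.cluster.local:<port>
      if (n >= 3) count[parts[3]]++
    }
    END { for (ns in count) print ns, count[ns] }
  ' | sort
}
```

It could be fed from the command above, for example `kubectl exec crdbtampere-cockroachdb-0 -ntampere -- ./cockroach node status --insecure | count_nodes_per_namespace`.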

The same happens with the third locality (“warsaw”).

The join command looks like the following, e.g. in crdboslo-cockroachdb-0:

+ exec /cockroach/cockroach start --logtostderr --insecure --advertise-host crdboslo-cockroachdb-0.crdboslo-cockroachdb.oslo.svc.cluster.local --http-host 0.0.0.0 --http-port 8080 --port 26257 --cache 25% --max-sql-memory 25% --locality=datacenter=oslo --join crdbtampere-cockroachdb-0-svc.tampere.svc.cluster.local:26257,crdboslo-cockroachdb-0-svc.oslo.svc.cluster.local:26257,crdbwarsaw-cockroachdb-0-svc.warsaw.svc.cluster.local:26257
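For reference, the `--join` value above follows one naming pattern per locality, so it can be generated instead of typed by hand. A minimal sketch, assuming that pattern holds for every locality; `build_join_list` is a hypothetical helper name, not something from the chart.

```shell
# Build a --join list with one address per locality.
# Naming pattern assumed from the start command in this post:
#   crdb<dc>-cockroachdb-0-svc.<dc>.svc.cluster.local:26257
build_join_list() {
  join=""
  for dc in "$@"; do
    addr="crdb${dc}-cockroachdb-0-svc.${dc}.svc.cluster.local:26257"
    # Prepend a comma only when the list already has an entry.
    join="${join:+${join},}${addr}"
  done
  printf '%s\n' "$join"
}
```

Calling `build_join_list tampere oslo warsaw` should reproduce the join string used in the start command.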

Version: 19.1.5

Hi @roachman

You mentioned that:

I just want to confirm that this is for your Oslo DC? I see that the hostnames contain crdbwarsaw-cockroachdb, so I want to clarify what issue you are seeing. Please let me know if this is correct, or if there's some other information I may be missing.

Cheers,
Ricardo

Sorry, yes.
The “oslo” init container reports “oslo”-related errors,
kubectl logs crdboslo-cockroachdb-init-fh867 -noslo :

...
+ sleep 5
+ /cockroach/cockroach init --insecure --host=crdboslo-cockroachdb-0.crdboslo-cockroachdb --port 26257
*
* ERROR: rpc error: code = Unknown desc = already connected to cluster
*
E191008 13:36:50.388509 1 cli/error.go:229  rpc error: code = Unknown desc = already connected to cluster
Error: rpc error: code = Unknown desc = already connected to cluster
Failed running "init"
+ sleep 5
...

For “warsaw” there are similar errors, but warsaw-related,
kubectl logs crdbwarsaw-cockroachdb-init-lsmmc -nwarsaw :

Failed running "init"
+ sleep 5
+ /cockroach/cockroach init --insecure --host=crdbwarsaw-cockroachdb-0.crdbwarsaw-cockroachdb --port 26257
*
* ERROR: rpc error: code = Unknown desc = already connected to cluster
...

Hey @roachman

Did you run the cockroach init command on other nodes in the cluster? This command initializes the entire cluster from one node, provided that everything is configured correctly. Please see the documentation page on cockroach init for further details.
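As a rough illustration of "initialize once, from one node", the retry loop seen in the init container logs could stop as soon as the cluster is known to be initialized. This is a sketch under assumptions, not the chart's actual script: `classify_init_attempt`, `init_once`, `INIT_CMD` and `RETRY_SECONDS` are hypothetical names, and it assumes that the "already connected to cluster" message means the target node already belongs to an initialized cluster.

```shell
# Classify one `cockroach init` attempt from its exit status and output.
# The "already connected to cluster" text is copied from the logs in this
# thread; treating it as "nothing left to do" is an assumption.
classify_init_attempt() {
  status=$1
  output=$2
  if [ "$status" -eq 0 ]; then
    echo "initialized"
  elif printf '%s' "$output" | grep -q "already connected to cluster"; then
    echo "already-initialized"
  else
    echo "retry"
  fi
}

# Hypothetical wrapper: run init against ONE node and stop retrying once
# the cluster is initialized. INIT_CMD can be overridden for dry runs.
init_once() {
  host=$1
  while :; do
    if output=$(${INIT_CMD:-/cockroach/cockroach} init --insecure --host="$host" --port 26257 2>&1); then
      status=0
    else
      status=$?
    fi
    case $(classify_init_attempt "$status" "$output") in
      initialized|already-initialized) return 0 ;;
      *) sleep "${RETRY_SECONDS:-5}" ;;
    esac
  done
}
```

With this shape, a call such as `init_once crdbtampere-cockroachdb-0.crdbtampere-cockroachdb` would be made from a single locality only.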

Hi!

I used the Helm charts.
I installed the CockroachDB localities / “DCs” in the following way:

helm install --name crdbtampere --namespace tampere ./cockroachdb
helm install --name crdboslo --namespace oslo ./cockroachdb
helm install --name crdbwarsaw --namespace warsaw ./cockroachdb

I modified the join part of the “exec /cockroach/cockroach start” phase.
I don't think I modified the “cockroach init” part, unless the “exec /cock…” change mentioned above affects it.

Perhaps /cockroach/cockroach init should be executed only once, in a single locality / data center,
and should not be executed in the other two localities?
Target: all crdb nodes in all three localities are joined together.