Node does not start up - v1.0.5


Hi all,

I’m having an issue with my cluster. We’re currently developing our application, so we’re running the nodes in insecure mode. However, when I upgraded from v1.0.4 to v1.0.5, the original node no longer starts properly, nor does it print the usual startup information (version number, etc.). This is on AWS EC2. Here’s the output:

[ec2-user@ip-xxx-xx-xx-xxx ~]$ cockroach start --insecure
*
* WARNING: RUNNING IN INSECURE MODE!
*
* - Your cluster is open for any client that can access <all your IP addresses>.
* - Any user, even root, can log in without providing a password.
* - Any user, connecting as root, can read or write any data in your cluster.
* - There is no network encryption nor authentication, and thus no confidentiality.
*
* Check out how to secure your cluster: https://www.cockroachlabs.com/docs/secure-a-cluster.html
*

After this, it hangs as if waiting for input. The only option is Ctrl+C, which initiates a graceful shutdown, but that doesn’t finish properly either.

Any help would be appreciated.

Edit:

I am also getting this error in the logs:

W170906 18:56:15.607019 88 storage/store.go:1339  [n1,s1,r5/1:/Table/{0-11}] could not gossip system config: [NotLeaseHolderError] r5: replica (n1,s1):1 not lease holder; lease holder unknown
I170906 18:56:16.074473 118043 vendor/google.golang.org/grpc/server.go:752  grpc: Server.processUnaryRPC failed to write status: stream error: code = DeadlineExceeded desc = "context deadline exceeded"
E170906 18:56:16.186659 118182 server/admin.go:763  context deadline exceeded
E170906 18:56:16.186675 118182 server/admin.go:843  rpc error: code = Internal desc = An internal server error has occurred. Please check your CockroachDB logs for more details.
I170906 18:56:16.186695 118182 vendor/google.golang.org/grpc/server.go:752  grpc: Server.processUnaryRPC failed to write status: stream error: code = DeadlineExceeded desc = "context deadline exceeded"
W170906 18:56:16.538568 88 storage/store.go:1339  [n1,s1,r5/1:/Table/{0-11}] could not gossip system config: [NotLeaseHolderError] r5: replica (n1,s1):1 not lease holder; lease holder unknown

Hi Matthew,

Thanks for reporting your issue. Are the other nodes in your cluster still running and working as expected?

No, unfortunately they are not. I am getting a similar situation on all other nodes.

That’d be because you didn’t start it in the background. If you run the start command with the --background flag, it should behave as you’re expecting. Then you can stop it by running cockroach quit --insecure.
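
For example, something along these lines (a rough sketch, using only the flags already mentioned above):

cockroach start --insecure --background    # start the node and return control to the shell
cockroach quit --insecure                  # later: ask the node to shut down gracefully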

Are you trying to start all the nodes at the same time? One node can’t finish initializing unless a sufficient number of the other nodes are also running (initializing counts as running in this case).

If you are trying to start all the nodes at the same time and none of them are coming up, I’d love to see the logs from the rest of the cluster.
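
For reference, bringing a cluster back up normally means starting the nodes roughly in parallel, something like the sketch below (the host addresses and the --join list are placeholders, not taken from your setup):

# Run on each machine at about the same time:
cockroach start --insecure --host=<this node's address> --join=<node1 address>:26257,<node2 address>:26257 --background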

That’s not quite right. When run without --background, a node will still print an initial message to stdout indicating that it started up successfully, of the form:

CockroachDB node starting at 2017-09-07 00:30:17.044800559 -0400 EDT
build:      CCL v1.0.5 @ 2017/08/24 17:42:00 (go1.8.3)
admin:      https://alex-laptop.local:8080
sql:        postgresql://root@alex-laptop.local:26257?sslcert=%2FUsers%2Falex%2F.cockroach-certs%2Fclient.root.crt&sslkey=%2FUsers%2Falex%2F.cockroach-certs%2Fclient.root.key&sslmode=verify-full&sslrootcert=%2FUsers%2Falex%2F.cockroach-certs%2Fca.crt
logs:       /Users/alex/Downloads/cockroach-v1.0.5.darwin-10.9-amd64/cockroach-data/logs
store[0]:   path=/Users/alex/Downloads/cockroach-v1.0.5.darwin-10.9-amd64/cockroach-data
status:     restarted pre-existing node
clusterID:  4ecc24f9-a65c-40ab-8ffd-2c50e7b95482
nodeID:     1

Ah yes, that worked perfectly. Thanks so much! Maybe you guys should modify the documentation to mention that more clearly.

I’m glad to hear that solved it for you! In the 1.1 release, the cockroach process will print out some help text in situations like this, but we’d love to make sure that it’s covered appropriately in the docs as well. Do you remember which docs page(s) you were using?

cc @jesse

Specifically, Upgrade a Cluster’s Version, which caused the issue since I shut down all the nodes at once to upgrade them to v1.0.5, and Deploy CockroachDB on AWS EC2 (Insecure), though I realize now that the deploy docs only apply when you’re creating a new cluster.

Unfortunately, because the technology is so new, there was very little troubleshooting information out there when I searched around. Hopefully this post will help somewhat, as will the new help text.

Thanks again!

It looks like Upgrade a Cluster’s Version does say to bring down only one node at a time, but it’s clearly not visible enough (it’s the fourth bullet toward the top of the page). We’ll work on improving that.
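
For anyone hitting this later, the one-node-at-a-time flow that page describes is roughly the following (a sketch only; <node address> is a placeholder, and you’d keep whatever other flags you normally start your nodes with):

# Repeat for each node, one at a time:
cockroach quit --insecure --host=<node address>      # gracefully stop the node
# replace the cockroach binary on that machine with the new version
cockroach start --insecure --host=<node address> --background    # restart and let it rejoin before moving on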