How to login with the root user on a cluster with certs

Hi All,

We’ve setup a cluster with certificates (followed by the docs) and now I’m trying to access “sql”, but I’m getting the following error:

Welcome to the cockroach SQL interface.
All statements must be terminated by a semicolon.
To exit: CTRL + D.
Error: pq: user root must use certificate authentication instead of password authentication
Failed running "sql"

And in the cockroachdb log file stands: (note: I’ve replaced the ipv4 and port)

server/server.go:1537  [n1,client=<ipv4>:<port>] user root must use certificate authentication instead of password authentication

What am I doing wrong in this case?

How cockroach is running:

/usr/local/bin/cockroach start --certs-dir=/etc/cockroachdb/certs --store=/data/cockroach --host=cro00.test.ownhost.local --port=26257 --http-host=localhost --http-port=7005 --logtostderr=ERROR --insecure=false --cache=.25 --max-sql-memory=.25

How I try to access the “sql”:

/usr/local/bin/cockroach sql --certs-dir=/etc/cockroachdb/certs --host=cro00.test.ownhost.local


cockroach sql --certs-dir=/etc/cockroachdb/certs --host=cro00.test.ownhost.local

I’ve checked the certs, made them, just lik the docs said, 0700 but that didn’t solve the issue.
After that I’ve tried with “sudo -user=” in front just in order to run the command with the cockroach user, but that resulted in a “panic: could not find absolute path for path cockroach-data: stat .: permission denied”. When I executed the “sudo -user”-command, the cockroach was not able to read the key / certs even when the group and users are the same as the system user of cockroach.

Build information:

Build Tag:    v2.0.3
Build Time:   2018/06/18 16:11:33
Distribution: CCL
Platform:     linux amd64 (x86_64-unknown-linux-gnu)
Go Version:   go1.10
C Compiler:   gcc 6.3.0
Build SHA-1:  91715a9a95edbe716912173204fa4c0fc6724457
Build Type:   release

What do you get when you run cockroach cert list --certs-dir=/etc/cockroachdb/certs on the machine where you’re trying to run cockroach sql? (And for completeness, let’s do that on the machine running the node if they’re different).

One of the entries should be for the root client certificate. eg: I get:

$ cockroach cert list
Certificate directory: ${HOME}/.cockroach-certs
|         Usage         | Certificate File |    Key File     |  Expires   |                  Notes                   | Error |
| Certificate Authority | ca.crt           |                 | 2027/08/19 | num certs: 1                             |       |
| Node                  | node.crt         | node.key        | 2022/08/15 | addresses: localhost,stationer, |       |
| Client                | client.marc.crt  | client.marc.key | 2023/05/20 | user: marc                               |       |
| Client                | client.root.crt  | client.root.key | 2022/08/15 | user: root                               |       |
(4 rows)

If a client.<username>.crt file is not found in the --certs-dir, cockroach will attempt password communication. This is disallowed for the root user, so a client.root.crt certificate (and associated key client.root.key) is required.

When I execute cockroach cert list --certs-dir=/etc/cockroachdb/certs I don’t have a client ‘marc’, but further it’s the same…

|         Usage         | Certificate File |    Key File     |  Expires   |                                 Notes                                 | Error |
| Certificate Authority | ca.crt           |                 | 2035/11/15 | num certs: 1                                                          |       |
| Node                  | node.crt         | node.key        | 2023/06/29 | addresses: cro00.test.ownhost.local,localhost,<ipv4>, |       |
| Client                | client.root.crt  | client.root.key | 2023/06/29 | user: root                                                            |       |

Could it have something todo with the fix for release-2.0: cli: don’t prompt for a password when a cert is provided #26232 ?

It was fixed in 2.0.3 which you seem to be running.

You can confirm a few things:

  1. are you running cockroach start and cockroach sql on the same host?
  2. verify cockroach version (or /usr/local/bin/cockroach version if you run that one). Even better: both. If running start and sql on different hosts, please run it on all of them.
  3. are you using any sort of proxy?
  4. can you try removing the --host flag from cockroach start and use cockroach sql --host=localhost
  1. I’m running cockroach start and cockroach sql on the same host, same machine.
  2. I’ve checked the version of /usr/local/bin/cockroach and cockroach and those are the same.
  3. I’m accessing the server directly on the machine, no proxy is involved.
  4. Host is afaik required for joining other servers or am I wrong? (Docs)

--host on the start command specifies which interface the node is listening on. If you leave it empty, it listens on all interface, including loopback. This allows connections through localhost.

As a matter of fact, can you try starting the node with:

/usr/local/bin/cockroach start --certs-dir=/etc/cockroachdb/certs --store=/data/cockroach --http-host=localhost --http-port=7005 --logtostderr=ERROR

Most of the other flags were just using the default value. The --host flag being absent now means it also listens on the loopback device. If you could try connecting to that node with: /usr/local/bin/cockroach sql --certs-dir=/etc/cockroachdb/certs, that’d be great.

In addition, a few more things to check:

  1. which unix users are cockroach start and cockroach sql being run as?
  2. what does ls -l /etc/cockroachdb/certs output?

Hi Marc,

I’m able to login without --host, but now we don’t have a cluster anymore…

Answers on your questions:

  1. start and stop are both executed as root.
  2. ls -lisa returned: (note: *_org are the original files, only copied)
401548 4 drwxr-xr-x 3 cockroach cockroach 4096 Jul 13 11:57 .
396095 4 drwxr-xr-x 3 root      root      4096 Jun 25 20:10 ..
401553 4 -rw-r--r-- 1 cockroach cockroach 1411 Jun 25 20:07 ca.crt
401554 4 -rwx------ 1 cockroach cockroach 1099 Jun 25 16:48 client.root.crt
401567 4 -rw-r--r-- 1 cockroach cockroach 1099 Jul 13 11:35 client.root.crt_org
401555 4 -rwx------ 1 cockroach cockroach 1679 Jun 25 16:48 client.root.key
401568 4 -rw------- 1 cockroach cockroach 1679 Jul 13 11:36 client.root.key_org
395634 4 -rwx------ 1 cockroach cockroach 1375 Jun 25 20:08 node.crt
401569 4 -rw-r--r-- 1 root      root      1375 Jul 13 11:57 node.crt_org
401556 4 -rwx------ 1 cockroach cockroach 1675 Jun 25 20:08 node.key
401570 4 -rw------- 1 root      root      1675 Jul 13 11:57 node.key_org
401550 4 drwxr-xr-x 2 cockroach cockroach 4096 Jun 25 16:48 private

Again, the --host is not what makes a cluster, it just tells the node which interface to listen on. Without it, it listens on all interfaces.
To have a cluster, you need --join=comma-separated-list-of-other-nodes. Unless you have the --join flag on the other nodes, you never had a cluster to begin with.

We’ve got the --join=cro00.test.ownhost.local on the other notes…

Ok, so the problem isn’t with the certificates.

Does cro00.test.ownhost.local resolve to an interface address? (eg: listed in ifconfig on the local machine), or is it an alias or other DNS name?

Before changing the --host flag on the start command, was the admin UI showing all nodes as up and all ranges as fully replicated? After removing it, what does the admin UI show?

cro00.test.ownhost.local points to the system where I’m on.

Netstat is also telling me that the server is listening (xxx.x.xx.x = ip of the machine)

$ netstat -tulpn
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0*               LISTEN      31353/cockroach 
tcp        0      0 xxx.x.xx.x:26257*               LISTEN      31353/cockroach 

I’m not able to see the admin UI because of the infrastructure.

You can check the list of nodes known to the cluster with cockroach node status --ranges. You should see one entry per node in the cluster. Furthermore, the ranges column shows how many ranges are held by each node. In a three-node cluster, all nodes should have the same number of ranges pretty quickly. With more than 3 nodes, they should still hold some ranges.

It’s a good idea to first experiment with --insecure and check this or the admin UI to make sure the networking setup is working correctly, and only then move on to creating a secure cluster.

Ok, I was able to terminate all the cluster nodes. I’ve retried the start without --host and so on, but I’ve got the same issue. I’m not able to access “sql” or “user” with the --cert option while I’m root. And I’m not able to create another “user” because I need access for the root-user…

With some help I’m able to see the admin UI, every node is green. The system is seeing 3 live nodes.

We’ve played with --insecure AND with certs on older versions than 2.0.3 for benchmarking purposes and load-tests.

Ok, it feels like we’re going in circles.

From a clean data directory and certs directory (or just point them to other directories), could you show all the commands (including paths and system users) to:

  • create CA/node/client certs
  • start the first node
  • run cockroach sql against the first node

Let’s leave the other nodes out for now.
It feels like we’re missing something.

At this moment I have to go, I will reply later on :slight_smile:

Hi @marc,

I’ve found the issue. The client.root.*-files where created with a different CA :blush:
They created the certs, changed the ca-file because of some other issues with ssl but didn’t update the keys, so they they created the keys again and now I’m able to login.

We’ve created the client.root.*-files again and now everything is fine!

Thnx for your help anyway :slight_smile: