First node behavior - Does CockroachDB have a SPOF?

If the first node dies, all queries hang until the first node starts again.
Is that correct? Here is what I tried:

  1. Start Node1 (cockroach start --insecure --store=… --host=… --port=… --http-port=…).
  2. Start Node2 with the --join option.
  3. Terminate Node1 (“Connection to CockroachDB node lost” shows up on the admin site).
  4. Connect to Node2 with psql and run a query. (The query hangs.)
  5. Start Node1 again with no --join option (same as step 1). The query returns as soon as Node1 is up.

If this is correct, it looks like a single point of failure (SPOF). I confirmed that when all nodes are down I can change which node is the first (main) node, but is it possible to change the main node at runtime, or to have it change automatically if the main node runs into trouble?

In CockroachDB, a majority of a range’s replicas need to be available for the range to make progress. By default, ranges have a replication factor of three, which means that at least two replicas, on two different nodes, must be available. Since you started a cluster with only two nodes, it can’t tolerate a failure: you’ll lose the majority as soon as one node dies.
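
The majority rule can be sketched as simple arithmetic. This is only an illustration, not CockroachDB code: the function names `quorum` and `can_make_progress` are made up for this example, and it assumes each node holds at most one replica of a given range, so the number of live replicas can never exceed the number of live nodes.

```python
def quorum(replication_factor: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replication_factor // 2 + 1


def can_make_progress(replication_factor: int, live_nodes: int) -> bool:
    """Best case for a range: every live node holds one of its replicas.

    Assumption (for illustration): at most one replica per node, so the
    live replica count is capped by the number of live nodes.
    """
    live_replicas = min(replication_factor, live_nodes)
    return live_replicas >= quorum(replication_factor)


if __name__ == "__main__":
    # Compare a two-node and a three-node cluster after losing one node,
    # using the default replication factor of three.
    for total_nodes in (2, 3):
        live_nodes = total_nodes - 1
        ok = can_make_progress(replication_factor=3, live_nodes=live_nodes)
        print(f"{total_nodes}-node cluster, 1 node down: can make progress = {ok}")
```

With a replication factor of three the quorum is two, so a single surviving node is never enough, while two surviving nodes out of three are.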

Add a third node to the cluster, though, and you’ll see that taking one node down does not impact availability. See our fault tolerance and recovery guide for a complete walkthrough.

Thank you so much for the quick and accurate response!