Cannot run cockroachdb on Centos server

I’m trying to run these instructions on two CentOS virtual servers from DigitalOcean:
https://www.cockroachlabs.com/docs/manual-deployment.html#deploy-a-secure-cluster

the second node freezes when I try to run it.

when I go back to the first server and type “cockroach node status”, I get this:
[root@cdb1 ~]# cockroach node status
Error: rpc error: code = 13 desc = transport is closing
Failed running “node”

Thanks for the report.

Could you provide us with the logs for both nodes? Assuming you didn’t change the --store flag, you should find them in $PWD/cockroach-data/logs/

“transport is closing” is the terrible error message that some of our commands give when you forget the certificate-related flags with a secure cluster (https://github.com/cockroachdb/cockroach/issues/9153). Try adding --ca-cert=ca.crt --cert=root.crt --key=root.key to the node status command.

1 Like

that fixed the “node status” issue, thanks

the issue where the first node cannot be connected to by the second node is still there.

I’ve cleared the node directories completely and started from scratch. Here are the INFO logs.

Node 1:

[root@cdb1 ~]# cat  cockroach-data/logs/cockroach.INFO 
I160908 20:42:28.382811 util/log/clog.go:1003  [config] file created at: 2016/09/08 20:42:28
I160908 20:42:28.382811 util/log/clog.go:1003  [config] running on machine: cdb1
I160908 20:42:28.382811 util/log/clog.go:1003  [config] binary: CockroachDB beta-20160829 (linux amd64, built 2016/08/29 23:12:47, go1.7)
I160908 20:42:28.382811 util/log/clog.go:1003  [config] arguments: [cockroach start --host=cdb1.fieldmotion.com --http-addr=139.59.179.118 --ca-cert=ca.cert --cert=node1.cert --key=node1.key]
I160908 20:42:28.382811 util/log/clog.go:1003  line format: [IWEF]yymmdd hh:mm:ss.uuuuuu file:line msg
I160908 20:42:28.382810 cli/start.go:342  CockroachDB beta-20160829 (linux amd64, built 2016/08/29 23:12:47, go1.7)
I160908 20:42:28.483614 cli/backtrace.go:40  backtrace disabled: stat /opt/backtrace/bin/ptrace: no such file or directory
W160908 20:42:28.483690 server/context.go:302  soft open file descriptor limit 4096 is under the recommended limit 10000; this may decrease performance
please see https://www.cockroachlabs.com/docs/recommended-production-settings.html for more details
I160908 20:42:28.485926 server/context.go:407  1 storage engine initialized
I160908 20:42:28.485978 cli/start.go:362  starting cockroach node
W160908 20:42:28.488251 gossip/gossip.go:1070  not connected to cluster; use --join to specify a connected node
I160908 20:42:28.489494 storage/engine/rocksdb.go:340  opening rocksdb instance at "cockroach-data"
I160908 20:42:28.553509 server/node.go:407  store store=0:0 not bootstrapped
I160908 20:42:28.556126 storage/replica_trigger.go:308  store=1:1 range=1 [/Min-/Max): new range lease replica {1 1 1} 1970-01-01 00:00:00 +0000 UTC 409268h42m37.555305778s following replica {0 0 0} 1970-01-01 00:00:00 +0000 UTC 0s [physicalTime=2016-09-08 20:42:28.556034359 +0000 UTC]
I160908 20:42:28.556738 server/node.go:339  **** cluster {1dea7d3a-719b-41dc-b45d-77d316ba1b80} has been created
I160908 20:42:28.556757 server/node.go:340  **** add additional nodes by specifying --join=cdb1.fieldmotion.com:26257
I160908 20:42:28.560515 storage/store.go:982  store=1:1: failed initial metrics computation: store=1:1: system config not yet available
I160908 20:42:28.560541 server/node.go:420  initialized store store=1:1: {Capacity:21005025280 Available:18489274368 RangeCount:0}
I160908 20:42:28.560572 server/node.go:312  node ID 1 initialized
I160908 20:42:28.560727 storage/stores.go:302  read 0 node addresses from persistent storage
I160908 20:42:28.560845 server/node.go:549  connecting to gossip network to verify cluster ID...
I160908 20:42:28.561596 server/node.go:570  node connected via gossip and verified as part of cluster {"1dea7d3a-719b-41dc-b45d-77d316ba1b80"}
I160908 20:42:28.561629 server/node.go:358  node=1: started with [[]=cockroach-data] engine(s) and attributes []
I160908 20:42:28.561691 server/server.go:459  starting https server at 139.59.179.118:8080
I160908 20:42:28.561709 server/server.go:460  starting grpc/postgres server at cdb1.fieldmotion.com:26257
I160908 20:42:28.563750 storage/split_queue.go:99  splitting store=1:1 range=1 [/Min-/Max) at keys [/Table/11/0 /Table/12/0 /Table/13/0 /Table/14/0]
I160908 20:42:28.565602 sql/event_log.go:109  Event: "node_join", target: 1, info: {Descriptor:{NodeID:1 Address:{NetworkField:tcp AddressField:cdb1.fieldmotion.com:26257} Attrs:} ClusterID:{UUID:1dea7d3a-719b-41dc-b45d-77d316ba1b80} StartedAt:1473367348561602852}
I160908 20:42:28.580027 server/updates.go:150  No previous updates check time.
I160908 20:42:28.580517 storage/replica_command.go:2213  store=1:1 range=1 [/Min-/Max): initiating a split of this range at key /Table/11
E160908 20:42:28.592266 storage/queue.go:510  [replicate] (purgatory) on store=1:1 range=1 [/Min-/Table/11): 0 of 1 store with an attribute matching []; likely not enough nodes in cluster
E160908 20:42:28.592624 storage/queue.go:510  [replicate] (purgatory) on store=1:1 range=1 [/Min-/Table/11): 0 of 1 store with an attribute matching []; likely not enough nodes in cluster
I160908 20:42:28.593020 storage/replica_command.go:2213  store=1:1 range=2 [/Table/11-/Max): initiating a split of this range at key /Table/12
E160908 20:42:28.597155 storage/queue.go:510  [replicate] (purgatory) on store=1:1 range=2 [/Table/11-/Table/12): 0 of 1 store with an attribute matching []; likely not enough nodes in cluster
I160908 20:42:28.598040 storage/replica_command.go:2213  store=1:1 range=3 [/Table/12-/Max): initiating a split of this range at key /Table/13
W160908 20:42:28.598985 storage/intent_resolver.go:344  could not GC completed transaction: end key /Local/Range/"\x93"/"rdsc\x00" must be greater than start /Local/Range/"\x93"/RangeDescriptor
E160908 20:42:28.602896 storage/queue.go:510  [replicate] (purgatory) on store=1:1 range=3 [/Table/12-/Table/13): 0 of 1 store with an attribute matching []; likely not enough nodes in cluster
I160908 20:42:28.603752 storage/replica_command.go:2213  store=1:1 range=4 [/Table/13-/Max): initiating a split of this range at key /Table/14
W160908 20:42:28.604247 storage/intent_resolver.go:344  could not GC completed transaction: end key /Local/Range/"\x94"/"rdsc\x00" must be greater than start /Local/Range/"\x94"/RangeDescriptor
E160908 20:42:28.608243 storage/queue.go:510  [replicate] (purgatory) on store=1:1 range=4 [/Table/13-/Table/14): 0 of 1 store with an attribute matching []; likely not enough nodes in cluster
W160908 20:42:28.609029 storage/intent_resolver.go:344  could not GC completed transaction: end key /Local/Range/"\x95"/"rdsc\x00" must be greater than start /Local/Range/"\x95"/RangeDescriptor
E160908 20:42:28.609189 storage/queue.go:510  [replicate] (purgatory) on store=1:1 range=5 [/Table/14-/Max): 0 of 1 store with an attribute matching []; likely not enough nodes in cluster
I160908 20:42:29.044480 server/updates.go:205  A new version is available: beta-20160908, details: https://www.cockroachlabs.com/docs/beta-20160908.html
I160908 20:42:38.565781 storage/store.go:2816  store=1:1: sstables (read amplification = 0):
I160908 20:42:38.568256 server/status/runtime.go:207  runtime stats: 25 MiB RSS, 58 goroutines, 6.8 MiB/512 KiB/11 MiB GO alloc/idle/total, 3.6 MiB/5.0 MiB CGO alloc/total, 0.00cgo/sec, 0.00/0.00 %(u/s)time, 0.00 %gc (4x)
I160908 20:42:48.568101 server/status/runtime.go:207  runtime stats: 28 MiB RSS, 58 goroutines, 9.2 MiB/152 KiB/14 MiB GO alloc/idle/total, 4.0 MiB/5.5 MiB CGO alloc/total, 11.70cgo/sec, 0.00/0.00 %(u/s)time, 0.00 %gc (0x)

Node 2:

[root@cdb2 ~]# cat cockroach-data/logs/cockroach.INFO 
I160908 20:42:31.717240 util/log/clog.go:1003  [config] file created at: 2016/09/08 20:42:31
I160908 20:42:31.717240 util/log/clog.go:1003  [config] running on machine: cdb2
I160908 20:42:31.717240 util/log/clog.go:1003  [config] binary: CockroachDB beta-20160829 (linux amd64, built 2016/08/29 23:12:47, go1.7)
I160908 20:42:31.717240 util/log/clog.go:1003  [config] arguments: [cockroach start --host=cdb2.fieldmotion.com --http-addr=146.185.181.252 --join=cdb1.fieldmotion.com --ca-cert=ca.cert --cert=node2.cert --key=node2.key]
I160908 20:42:31.717240 util/log/clog.go:1003  line format: [IWEF]yymmdd hh:mm:ss.uuuuuu file:line msg
I160908 20:42:31.717239 cli/start.go:342  CockroachDB beta-20160829 (linux amd64, built 2016/08/29 23:12:47, go1.7)
I160908 20:42:31.818758 cli/backtrace.go:40  backtrace disabled: stat /opt/backtrace/bin/ptrace: no such file or directory
W160908 20:42:31.818986 server/context.go:302  soft open file descriptor limit 4096 is under the recommended limit 10000; this may decrease performance
please see https://www.cockroachlabs.com/docs/recommended-production-settings.html for more details
I160908 20:42:31.823731 server/context.go:407  1 storage engine initialized
I160908 20:42:31.823897 cli/start.go:362  starting cockroach node
W160908 20:42:31.831018 gossip/gossip.go:1070  not connected to cluster; use --join to specify a connected node
I160908 20:42:31.837152 storage/engine/rocksdb.go:340  opening rocksdb instance at "cockroach-data"
I160908 20:42:31.853829 server/node.go:407  store store=0:0 not bootstrapped
I160908 20:42:31.853942 storage/stores.go:302  read 0 node addresses from persistent storage
I160908 20:42:31.854235 server/node.go:549  connecting to gossip network to verify cluster ID...
I160908 20:42:31.866825 /go/src/google.golang.org/grpc/clientconn.go:667  grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 139.59.179.118:26257: getsockopt: connection refused"; Reconnecting to {"cdb1.fieldmotion.com:26257" <nil>}
W160908 20:42:31.867042 gossip/client.go:105  node 0: failed to start gossip client: rpc error: code = 14 desc = grpc: the connection is unavailable
I160908 20:42:32.864870 /go/src/google.golang.org/grpc/clientconn.go:667  grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 139.59.179.118:26257: getsockopt: connection refused"; Reconnecting to {"cdb1.fieldmotion.com:26257" <nil>}
I160908 20:42:34.763330 /go/src/google.golang.org/grpc/clientconn.go:667  grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 139.59.179.118:26257: getsockopt: connection refused"; Reconnecting to {"cdb1.fieldmotion.com:26257" <nil>}
I160908 20:42:37.238208 /go/src/google.golang.org/grpc/clientconn.go:667  grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 139.59.179.118:26257: getsockopt: connection refused"; Reconnecting to {"cdb1.fieldmotion.com:26257" <nil>}
I160908 20:42:40.639974 /go/src/google.golang.org/grpc/clientconn.go:667  grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 139.59.179.118:26257: getsockopt: connection refused"; Reconnecting to {"cdb1.fieldmotion.com:26257" <nil>}
I160908 20:42:47.831370 /go/src/google.golang.org/grpc/clientconn.go:667  grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 139.59.179.118:26257: getsockopt: connection refused"; Reconnecting to {"cdb1.fieldmotion.com:26257" <nil>}
I160908 20:42:59.907068 /go/src/google.golang.org/grpc/clientconn.go:667  grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 139.59.179.118:26257: getsockopt: connection refused"; Reconnecting to {"cdb1.fieldmotion.com:26257" <nil>}
I160908 20:43:13.955498 /go/src/google.golang.org/grpc/clientconn.go:667  grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 139.59.179.118:26257: getsockopt: connection refused"; Reconnecting to {"cdb1.fieldmotion.com:26257" <nil>}
I160908 20:43:40.624854 /go/src/google.golang.org/grpc/clientconn.go:667  grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 139.59.179.118:26257: getsockopt: connection refused"; Reconnecting to {"cdb1.fieldmotion.com:26257" <nil>}
I160908 20:44:26.716003 /go/src/google.golang.org/grpc/clientconn.go:667  grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 139.59.179.118:26257: getsockopt: connection refused"; Reconnecting to {"cdb1.fieldmotion.com:26257" <nil>}
I160908 20:45:43.800707 /go/src/google.golang.org/grpc/clientconn.go:667  grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 139.59.179.118:26257: getsockopt: connection refused"; Reconnecting to {"cdb1.fieldmotion.com:26257" <nil>}

The commands I used to start both:

cockroach start --host=cdb1.fieldmotion.com --http-addr=139.59.179.118 --ca-cert=ca.cert --cert=node1.cert --key=node1.key --background

and

cockroach start --host=cdb2.fieldmotion.com --http-addr=146.185.181.252 --join=cdb1.fieldmotion.com --ca-cert=ca.cert --cert=node2.cert --key=node2.key --background

another thing to node - the second command has “–background” set, but it’s still stuck in the foreground on the command line

could you try in insecure mode (take out the cert/key commands and add --insecure). This should allow us to narrow it down a bit.

I’ve started from scratch again (removing directories, etc), downloaded the new release, and run again, in insecure mode. Same issue happening - node1 starts up fine (it appears), but node2 doesn’t connect.

node1’s info log:

I160909 05:52:00.454819 1 util/log/clog.go:1019  [config] file created at: 2016/09/09 05:52:00
I160909 05:52:00.454819 1 util/log/clog.go:1019  [config] running on machine: cdb1
I160909 05:52:00.454819 1 util/log/clog.go:1019  [config] binary: CockroachDB beta-20160908 (linux amd64, built 2016/09/08 15:33:58, go1.7)
I160909 05:52:00.454819 1 util/log/clog.go:1019  [config] arguments: [cockroach start --host=cdb1.fieldmotion.com --http-addr=139.59.179.118 --insecure]
I160909 05:52:00.454819 1 util/log/clog.go:1019  line format: [IWEF]yymmdd hh:mm:ss.uuuuuu goid file:line msg
I160909 05:52:00.454817 1 cli/start.go:342  CockroachDB beta-20160908 (linux amd64, built 2016/09/08 15:33:58, go1.7)
I160909 05:52:00.555543 1 cli/backtrace.go:40  backtrace disabled: stat /opt/backtrace/bin/ptrace: no such file or directory
W160909 05:52:00.555660 1 server/context.go:302  soft open file descriptor limit 4096 is under the recommended limit 10000; this may decrease performance
please see https://www.cockroachlabs.com/docs/recommended-production-settings.html for more details
I160909 05:52:00.558052 1 server/context.go:407  1 storage engine initialized
I160909 05:52:00.558150 1 cli/start.go:362  starting cockroach node
W160909 05:52:00.558606 1 server/server.go:125  running in insecure mode, this is strongly discouraged. See --insecure.
I160909 05:52:00.558791 1 gossip/gossip.go:230  [n?] initial resolvers: []
W160909 05:52:00.558814 1 gossip/gossip.go:1041  [n?] no resolvers found; use --join to specify a connected node
I160909 05:52:00.560450 1 storage/engine/rocksdb.go:340  opening rocksdb instance at "cockroach-data"
I160909 05:52:00.575683 1 server/node.go:413  [n?] store [n0,s0] not bootstrapped
I160909 05:52:00.578655 40 storage/replica_trigger.go:308  new range lease replica {1 1 1} 1970-01-01 00:00:00 +0000 UTC 409277h52m9.577553125s following replica {0 0 0} 1970-01-01 00:00:00 +0000 UTC 0s [physicalTime=2016-09-09 05:52:00.578584565 +0000 UTC]
I160909 05:52:00.579357 1 server/node.go:345  [n?] **** cluster {377adc75-28c3-4822-b089-47eda1e7a4c2} has been created
I160909 05:52:00.579376 1 server/node.go:346  [n?] **** add additional nodes by specifying --join=cdb1.fieldmotion.com:26257
I160909 05:52:00.579620 1 gossip/gossip.go:264  [n?] NodeID set to 1
I160909 05:52:00.582772 1 storage/store.go:1003  [n?] [n1,s1]: failed initial metrics computation: [n1,s1]: system config not yet available
I160909 05:52:00.582800 1 server/node.go:426  [n?] initialized store [n1,s1]: {Capacity:21005025280 Available:18591358976 RangeCount:0}
I160909 05:52:00.582826 1 server/node.go:318  [n?] node ID 1 initialized
I160909 05:52:00.582903 1 gossip/gossip.go:278  [n?] NodeDescriptor set to node_id:1 address:<network_field:"tcp" address_field:"cdb1.fieldmotion.com:26257" > attrs:<> 
I160909 05:52:00.582967 1 storage/stores.go:313  read 0 node addresses from persistent storage
I160909 05:52:00.583064 1 server/node.go:554  [n?] connecting to gossip network to verify cluster ID...
I160909 05:52:00.583907 1 server/node.go:575  [n?] node connected via gossip and verified as part of cluster {"377adc75-28c3-4822-b089-47eda1e7a4c2"}
I160909 05:52:00.583940 1 server/node.go:364  [n?] node=1: started with [[]=cockroach-data] engine(s) and attributes []
I160909 05:52:00.583970 1 server/server.go:488  [n1] starting http server at 139.59.179.118:8080
I160909 05:52:00.583980 1 server/server.go:489  [n1] starting grpc/postgres server at cdb1.fieldmotion.com:26257
I160909 05:52:00.586507 57 storage/split_queue.go:99  splitting [n1,s1,r1{/Min-/Max}] at keys [/Table/11/0 /Table/12/0 /Table/13/0 /Table/14/0]
I160909 05:52:00.588135 67 sql/event_log.go:109  [n1] Event: "node_join", target: 1, info: {Descriptor:{NodeID:1 Address:{NetworkField:tcp AddressField:cdb1.fieldmotion.com:26257} Attrs:} ClusterID:{UUID:377adc75-28c3-4822-b089-47eda1e7a4c2} StartedAt:1473400320583916245}
I160909 05:52:00.600447 76 server/updates.go:150  No previous updates check time.
I160909 05:52:00.600867 57 storage/replica_command.go:2213  [n1,s1,r1{/Min-/Max}] initiating a split of this range at key /Table/11
E160909 05:52:00.607006 59 storage/queue.go:528  [replicate] (purgatory) on [n1,s1,r1{/Min-/Table/11}]: 0 of 1 store with an attribute matching []; likely not enough nodes in cluster
E160909 05:52:00.611006 88 storage/queue.go:528  [replicate] (purgatory) on [n1,s1,r1{/Min-/Table/11}]: 0 of 1 store with an attribute matching []; likely not enough nodes in cluster
I160909 05:52:00.611037 57 storage/replica_command.go:2213  [n1,s1,r2{/Table/11-/Max}] initiating a split of this range at key /Table/12
E160909 05:52:00.615218 59 storage/queue.go:528  [replicate] (purgatory) on [n1,s1,r2{/Table/11-/Table/12}]: 0 of 1 store with an attribute matching []; likely not enough nodes in cluster
I160909 05:52:00.616457 57 storage/replica_command.go:2213  [n1,s1,r3{/Table/12-/Max}] initiating a split of this range at key /Table/13
E160909 05:52:00.622556 59 storage/queue.go:528  [replicate] (purgatory) on [n1,s1,r3{/Table/12-/Table/13}]: 0 of 1 store with an attribute matching []; likely not enough nodes in cluster
I160909 05:52:00.623467 57 storage/replica_command.go:2213  [n1,s1,r4{/Table/13-/Max}] initiating a split of this range at key /Table/14
E160909 05:52:00.627874 59 storage/queue.go:528  [replicate] (purgatory) on [n1,s1,r4{/Table/13-/Table/14}]: 0 of 1 store with an attribute matching []; likely not enough nodes in cluster
E160909 05:52:00.628975 59 storage/queue.go:528  [replicate] (purgatory) on [n1,s1,r5{/Table/14-/Max}]: 0 of 1 store with an attribute matching []; likely not enough nodes in cluster
I160909 05:52:10.588735 65 storage/store.go:2982  [n1,s1] sstables (read amplification = 0):
I160909 05:52:10.590585 68 server/status/runtime.go:207  runtime stats: 25 MiB RSS, 56 goroutines, 6.3 MiB/816 KiB/12 MiB GO alloc/idle/total, 3.6 MiB/5.1 MiB CGO alloc/total, 0.00cgo/sec, 0.00/0.00 %(u/s)time, 0.00 %gc (4x)
I160909 05:52:20.590429 68 server/status/runtime.go:207  runtime stats: 27 MiB RSS, 56 goroutines, 8.7 MiB/632 KiB/14 MiB GO alloc/idle/total, 4.1 MiB/5.6 MiB CGO alloc/total, 11.70cgo/sec, 0.00/0.00 %(u/s)time, 0.00 %gc (0x)
I160909 05:52:30.590458 68 server/status/runtime.go:207  runtime stats: 28 MiB RSS, 56 goroutines, 6.4 MiB/3.6 MiB/16 MiB GO alloc/idle/total, 4.1 MiB/5.6 MiB CGO alloc/total, 11.70cgo/sec, 0.00/0.00 %(u/s)time, 0.00 %gc (1x)
I160909 05:52:40.590331 68 server/status/runtime.go:207  runtime stats: 29 MiB RSS, 56 goroutines, 8.9 MiB/1.5 MiB/16 MiB GO alloc/idle/total, 4.1 MiB/5.6 MiB CGO alloc/total, 11.70cgo/sec, 0.00/0.00 %(u/s)time, 0.00 %gc (0x)
I160909 05:52:50.590377 68 server/status/runtime.go:207  runtime stats: 30 MiB RSS, 56 goroutines, 6.4 MiB/4.4 MiB/17 MiB GO alloc/idle/total, 4.1 MiB/5.6 MiB CGO alloc/total, 11.70cgo/sec, 0.00/0.00 %(u/s)time, 0.00 %gc (1x)
I160909 05:53:00.588155 66 gossip/gossip.go:422  [n1] gossip status (ok, 1 node)
gossip client (0/3 cur/max conns)
gossip server (0/3 cur/max conns, infos 0/0 sent/received, bytes 0B/0B sent/received)
E160909 05:53:00.588855 88 storage/queue.go:528  [replicate] (purgatory) on [n1,s1,r5{/Table/14-/Max}]: 0 of 1 store with an attribute matching []; likely not enough nodes in cluster
E160909 05:53:00.588930 88 storage/queue.go:528  [replicate] (purgatory) on [n1,s1,r1{/Min-/Table/11}]: 0 of 1 store with an attribute matching []; likely not enough nodes in cluster
E160909 05:53:00.589330 88 storage/queue.go:528  [replicate] (purgatory) on [n1,s1,r2{/Table/11-/Table/12}]: 0 of 1 store with an attribute matching []; likely not enough nodes in cluster
E160909 05:53:00.589699 88 storage/queue.go:528  [replicate] (purgatory) on [n1,s1,r3{/Table/12-/Table/13}]: 0 of 1 store with an attribute matching []; likely not enough nodes in cluster
E160909 05:53:00.590099 88 storage/queue.go:528  [replicate] (purgatory) on [n1,s1,r4{/Table/13-/Table/14}]: 0 of 1 store with an attribute matching []; likely not enough nodes in cluster
I160909 05:53:00.590365 68 server/status/runtime.go:207  runtime stats: 30 MiB RSS, 56 goroutines, 8.9 MiB/2.4 MiB/17 MiB GO alloc/idle/total, 4.1 MiB/5.6 MiB CGO alloc/total, 34.50cgo/sec, 0.00/0.00 %(u/s)time, 0.00 %gc (0x)
I160909 05:53:10.590360 68 server/status/runtime.go:207  runtime stats: 31 MiB RSS, 54 goroutines, 6.4 MiB/4.3 MiB/17 MiB GO alloc/idle/total, 4.1 MiB/5.6 MiB CGO alloc/total, 6.70cgo/sec, 0.00/0.00 %(u/s)time, 0.00 %gc (1x)
I160909 05:53:20.590351 68 server/status/runtime.go:207  runtime stats: 31 MiB RSS, 54 goroutines, 8.8 MiB/2.5 MiB/17 MiB GO alloc/idle/total, 4.1 MiB/5.6 MiB CGO alloc/total, 11.70cgo/sec, 0.00/0.00 %(u/s)time, 0.00 %gc (0x)
I160909 05:53:30.590638 68 server/status/runtime.go:207  runtime stats: 31 MiB RSS, 54 goroutines, 6.3 MiB/4.4 MiB/17 MiB GO alloc/idle/total, 4.1 MiB/5.6 MiB CGO alloc/total, 11.70cgo/sec, 0.00/0.00 %(u/s)time, 0.00 %gc (1x)
I160909 05:53:40.590422 68 server/status/runtime.go:207  runtime stats: 31 MiB RSS, 54 goroutines, 8.7 MiB/2.5 MiB/17 MiB GO alloc/idle/total, 4.1 MiB/5.6 MiB CGO alloc/total, 11.10cgo/sec, 0.00/0.00 %(u/s)time, 0.00 %gc (0x)
I160909 05:53:50.590414 68 server/status/runtime.go:207  runtime stats: 31 MiB RSS, 54 goroutines, 11 MiB/440 KiB/17 MiB GO alloc/idle/total, 4.1 MiB/5.6 MiB CGO alloc/total, 11.70cgo/sec, 0.00/0.00 %(u/s)time, 0.00 %gc (0x)
I160909 05:54:00.588118 66 gossip/gossip.go:422  [n1] gossip status (ok, 1 node)
gossip client (0/3 cur/max conns)
gossip server (0/3 cur/max conns, infos 0/0 sent/received, bytes 0B/0B sent/received)
E160909 05:54:00.588812 88 storage/queue.go:528  [replicate] (purgatory) on [n1,s1,r5{/Table/14-/Max}]: 0 of 1 store with an attribute matching []; likely not enough nodes in cluster
E160909 05:54:00.588879 88 storage/queue.go:528  [replicate] (purgatory) on [n1,s1,r1{/Min-/Table/11}]: 0 of 1 store with an attribute matching []; likely not enough nodes in cluster
E160909 05:54:00.589325 88 storage/queue.go:528  [replicate] (purgatory) on [n1,s1,r2{/Table/11-/Table/12}]: 0 of 1 store with an attribute matching []; likely not enough nodes in cluster
E160909 05:54:00.589681 88 storage/queue.go:528  [replicate] (purgatory) on [n1,s1,r3{/Table/12-/Table/13}]: 0 of 1 store with an attribute matching []; likely not enough nodes in cluster
E160909 05:54:00.590061 88 storage/queue.go:528  [replicate] (purgatory) on [n1,s1,r4{/Table/13-/Table/14}]: 0 of 1 store with an attribute matching []; likely not enough nodes in cluster
I160909 05:54:00.590391 68 server/status/runtime.go:207  runtime stats: 32 MiB RSS, 54 goroutines, 7.6 MiB/4.3 MiB/18 MiB GO alloc/idle/total, 4.1 MiB/5.6 MiB CGO alloc/total, 34.50cgo/sec, 0.00/0.00 %(u/s)time, 0.00 %gc (1x)
I160909 05:54:10.590444 68 server/status/runtime.go:207  runtime stats: 32 MiB RSS, 54 goroutines, 10 MiB/2.5 MiB/18 MiB GO alloc/idle/total, 4.1 MiB/5.6 MiB CGO alloc/total, 6.70cgo/sec, 0.00/0.00 %(u/s)time, 0.00 %gc (0x)
I160909 05:54:20.590399 68 server/status/runtime.go:207  runtime stats: 32 MiB RSS, 54 goroutines, 7.3 MiB/4.5 MiB/18 MiB GO alloc/idle/total, 5.1 MiB/6.6 MiB CGO alloc/total, 11.70cgo/sec, 0.00/0.00 %(u/s)time, 0.00 %gc (1x)

node2’s info log

I160909 05:53:22.376998 util/log/clog.go:1003  [config] file created at: 2016/09/09 05:53:22
I160909 05:53:22.376998 util/log/clog.go:1003  [config] running on machine: cdb2
I160909 05:53:22.376998 util/log/clog.go:1003  [config] binary: CockroachDB beta-20160829 (linux amd64, built 2016/08/29 23:12:47, go1.7)
I160909 05:53:22.376998 util/log/clog.go:1003  [config] arguments: [cockroach start --host=cdb2.fieldmotion.com --join=cdb1.fieldmotion.com --http-addr=146.185.181.252 --insecure]
I160909 05:53:22.376998 util/log/clog.go:1003  line format: [IWEF]yymmdd hh:mm:ss.uuuuuu file:line msg
I160909 05:53:22.376996 cli/start.go:342  CockroachDB beta-20160829 (linux amd64, built 2016/08/29 23:12:47, go1.7)
I160909 05:53:22.478102 cli/backtrace.go:40  backtrace disabled: stat /opt/backtrace/bin/ptrace: no such file or directory
W160909 05:53:22.478349 server/context.go:302  soft open file descriptor limit 4096 is under the recommended limit 10000; this may decrease performance
please see https://www.cockroachlabs.com/docs/recommended-production-settings.html for more details
I160909 05:53:22.482729 server/context.go:407  1 storage engine initialized
I160909 05:53:22.482863 cli/start.go:362  starting cockroach node
W160909 05:53:22.484012 server/server.go:124  running in insecure mode, this is strongly discouraged. See --insecure.
W160909 05:53:22.484474 gossip/gossip.go:1070  not connected to cluster; use --join to specify a connected node
I160909 05:53:22.487438 storage/engine/rocksdb.go:340  opening rocksdb instance at "cockroach-data"
I160909 05:53:22.529632 server/node.go:407  store store=0:0 not bootstrapped
I160909 05:53:22.529754 storage/stores.go:302  read 0 node addresses from persistent storage
I160909 05:53:22.530076 server/node.go:549  connecting to gossip network to verify cluster ID...
I160909 05:53:22.530553 /go/src/google.golang.org/grpc/clientconn.go:667  grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 139.59.179.118:26257: getsockopt: connection refused"; Reconnecting to {"cdb1.fieldmotion.com:26257" <nil>}
W160909 05:53:22.530861 gossip/client.go:105  node 0: failed to start gossip client: rpc error: code = 14 desc = grpc: the connection is unavailable
I160909 05:53:23.528224 /go/src/google.golang.org/grpc/clientconn.go:667  grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 139.59.179.118:26257: getsockopt: connection refused"; Reconnecting to {"cdb1.fieldmotion.com:26257" <nil>}
I160909 05:53:24.948432 /go/src/google.golang.org/grpc/clientconn.go:667  grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 139.59.179.118:26257: getsockopt: connection refused"; Reconnecting to {"cdb1.fieldmotion.com:26257" <nil>}
I160909 05:53:27.631439 /go/src/google.golang.org/grpc/clientconn.go:667  grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 139.59.179.118:26257: getsockopt: connection refused"; Reconnecting to {"cdb1.fieldmotion.com:26257" <nil>}
I160909 05:53:31.729454 /go/src/google.golang.org/grpc/clientconn.go:667  grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 139.59.179.118:26257: getsockopt: connection refused"; Reconnecting to {"cdb1.fieldmotion.com:26257" <nil>}
I160909 05:53:37.990606 /go/src/google.golang.org/grpc/clientconn.go:667  grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 139.59.179.118:26257: getsockopt: connection refused"; Reconnecting to {"cdb1.fieldmotion.com:26257" <nil>}
I160909 05:53:48.090138 /go/src/google.golang.org/grpc/clientconn.go:667  grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 139.59.179.118:26257: getsockopt: connection refused"; Reconnecting to {"cdb1.fieldmotion.com:26257" <nil>}
I160909 05:54:03.420214 /go/src/google.golang.org/grpc/clientconn.go:667  grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 139.59.179.118:26257: getsockopt: connection refused"; Reconnecting to {"cdb1.fieldmotion.com:26257" <nil>}
I160909 05:54:30.717143 /go/src/google.golang.org/grpc/clientconn.go:667  grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 139.59.179.118:26257: getsockopt: connection refused"; Reconnecting to {"cdb1.fieldmotion.com:26257" <nil>}
I160909 05:55:16.603051 /go/src/google.golang.org/grpc/clientconn.go:667  grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 139.59.179.118:26257: getsockopt: connection refused"; Reconnecting to {"cdb1.fieldmotion.com:26257" <nil>}
I160909 05:56:27.336610 /go/src/google.golang.org/grpc/clientconn.go:667  grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 139.59.179.118:26257: getsockopt: connection refused"; Reconnecting to {"cdb1.fieldmotion.com:26257" <nil>}

What do the hostnames resolve to on each machine? Does centos map the machine’s hostname to 127.0.1.1 like some debian installations do? Try using IP addresses instead of hostnames everywhere.

The IP addresses resolve correctly. Even when I was creating the servers in the first place, I made sure to put the respective IP addresses into the /etc/hosts files.

I’ve started again, using just the IP addresses. node2 doesn’t like this at all:

[root@cdb2 ~]# cockroach start --join=139.59.179.118 --http-addr=146.185.181.252 --insecure --background
Error: exit status 255
Failed running "start"

info log on node2

[root@cdb2 ~]# cat cockroach-data/logs/cockroach.INFO 
I160909 09:31:02.017885 util/log/clog.go:1003  [config] file created at: 2016/09/09 09:31:02
I160909 09:31:02.017885 util/log/clog.go:1003  [config] running on machine: cdb2
I160909 09:31:02.017885 util/log/clog.go:1003  [config] binary: CockroachDB beta-20160829 (linux amd64, built 2016/08/29 23:12:47, go1.7)
I160909 09:31:02.017885 util/log/clog.go:1003  [config] arguments: [cockroach start --join=139.59.179.118 --http-addr=146.185.181.252 --insecure]
I160909 09:31:02.017885 util/log/clog.go:1003  line format: [IWEF]yymmdd hh:mm:ss.uuuuuu file:line msg
I160909 09:31:02.017883 cli/start.go:342  CockroachDB beta-20160829 (linux amd64, built 2016/08/29 23:12:47, go1.7)
I160909 09:31:02.119555 cli/backtrace.go:40  backtrace disabled: stat /opt/backtrace/bin/ptrace: no such file or directory
W160909 09:31:02.119875 server/context.go:302  soft open file descriptor limit 4096 is under the recommended limit 10000; this may decrease performance
please see https://www.cockroachlabs.com/docs/recommended-production-settings.html for more details
I160909 09:31:02.124488 server/context.go:407  1 storage engine initialized
I160909 09:31:02.124629 cli/start.go:362  starting cockroach node
W160909 09:31:02.124748 server/server.go:124  running in insecure mode, this is strongly discouraged. See --insecure.
W160909 09:31:02.125331 gossip/gossip.go:1070  not connected to cluster; use --join to specify a connected node
I160909 09:31:02.129059 storage/engine/rocksdb.go:340  opening rocksdb instance at "cockroach-data"
I160909 09:31:02.149156 server/node.go:407  store store=0:0 not bootstrapped
I160909 09:31:02.149264 storage/stores.go:302  read 0 node addresses from persistent storage
I160909 09:31:02.149526 server/node.go:549  connecting to gossip network to verify cluster ID...
I160909 09:31:02.152796 gossip/client.go:111  node 0: started gossip client to 139.59.179.118:26257
I160909 09:31:02.161742 server/node.go:570  node connected via gossip and verified as part of cluster {"81ab08fe-ac55-45d4-8c1b-78b9b1e70a3c"}
I160909 09:31:02.163699 storage/stores.go:317  wrote 1 node addresses to persistent storage
F160909 09:31:02.175584 server/server.go:145  fewer than half the known nodes are within the maximum offset of 250ms (0 of 1)
goroutine 34 [running]:
github.com/cockroachdb/cockroach/util/log.getStacks(0xecf647800, 0xa7735f9, 0x23ae440, 0x23fd440)
	/go/src/github.com/cockroachdb/cockroach/util/log/clog.go:865 +0xa5
github.com/cockroachdb/cockroach/util/log.(*loggingT).outputLogEntry(0x23af5e0, 0xc400000003, 0x1d4db7b, 0x10, 0x91, 0xc4203769b0, 0x4f)
	/go/src/github.com/cockroachdb/cockroach/util/log/clog.go:728 +0x79e
github.com/cockroachdb/cockroach/util/log.addStructured(0x7f3184dc3888, 0xc4200152c0, 0x3, 0x2, 0x0, 0x0, 0xc42055fd18, 0x1, 0x1)
	/go/src/github.com/cockroachdb/cockroach/util/log/structured.go:63 +0x15a
github.com/cockroachdb/cockroach/util/log.logDepth(0x7f3184dc3888, 0xc4200152c0, 0x1, 0x3, 0x0, 0x0, 0xc42055fd18, 0x1, 0x1)
	/go/src/github.com/cockroachdb/cockroach/util/log/log.go:70 +0x8c
github.com/cockroachdb/cockroach/util/log.Fatal(0x7f3184dc3888, 0xc4200152c0, 0xc42055fd18, 0x1, 0x1)
	/go/src/github.com/cockroachdb/cockroach/util/log/log.go:163 +0x76
github.com/cockroachdb/cockroach/server.NewServer.func1()
	/go/src/github.com/cockroachdb/cockroach/server/server.go:145 +0xdc
github.com/cockroachdb/cockroach/rpc.(*Context).runHeartbeat(0xc4203c0080, 0xc42021ad00, 0xc4203aa0e0, 0x14, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/rpc/context.go:313 +0x4a8
github.com/cockroachdb/cockroach/rpc.(*Context).GRPCDial.func1.1.1()
	/go/src/github.com/cockroachdb/cockroach/rpc/context.go:205 +0x69
github.com/cockroachdb/cockroach/util/stop.(*Stopper).RunWorker.func1(0xc4203b3b00, 0xc42039e450)
	/go/src/github.com/cockroachdb/cockroach/util/stop/stopper.go:187 +0x7d
created by github.com/cockroachdb/cockroach/util/stop.(*Stopper).RunWorker
	/go/src/github.com/cockroachdb/cockroach/util/stop/stopper.go:188 +0x66

if it helps, these two servers are basically throwaway nodes for me, so I can give someone access to login and see if they can spot the issue at first hand.

Node 2 is unhappy, but it’s getting further this time. The IP addresses are working better than the hostnames did. This error means that your clocks aren’t synchronized. Make sure you’re running ntpd on all of them.

1 Like

nice spot. I enabled ntpd on both servers and start it up. this time, node2 gets a bit further and spits out a result on the console:

[root@cdb2 ~]# cockroach start --join=139.59.179.118 --http-addr=146.185.181.252 --insecure --background
build:     beta-20160829 @ 2016/08/29 23:12:47 (go1.7)
admin:     http://146.185.181.252:8080
sql:       postgresql://root@cdb2.fieldmotion.com:26257?sslmode=disable
logs:      cockroach-data/logs
store[0]:  path=cockroach-data
join[0]:   139.59.179.118

but the logs on node1 show no sign that node2 tried to connect, node2 does not have cockroach running in its process list (ps aux | grep cockroach), and node2’s logs show a Go error:

[root@cdb2 ~]# cat cockroach-data/logs/cockroach.INFO 
I160909 11:25:12.334642 util/log/clog.go:1003  [config] file created at: 2016/09/09 11:25:12
I160909 11:25:12.334642 util/log/clog.go:1003  [config] running on machine: cdb2
I160909 11:25:12.334642 util/log/clog.go:1003  [config] binary: CockroachDB beta-20160829 (linux amd64, built 2016/08/29 23:12:47, go1.7)
I160909 11:25:12.334642 util/log/clog.go:1003  [config] arguments: [cockroach start --join=139.59.179.118 --http-addr=146.185.181.252 --insecure]
I160909 11:25:12.334642 util/log/clog.go:1003  line format: [IWEF]yymmdd hh:mm:ss.uuuuuu file:line msg
I160909 11:25:12.334640 cli/start.go:342  CockroachDB beta-20160829 (linux amd64, built 2016/08/29 23:12:47, go1.7)
I160909 11:25:12.435859 cli/backtrace.go:40  backtrace disabled: stat /opt/backtrace/bin/ptrace: no such file or directory
W160909 11:25:12.436143 server/context.go:302  soft open file descriptor limit 4096 is under the recommended limit 10000; this may decrease performance
please see https://www.cockroachlabs.com/docs/recommended-production-settings.html for more details
I160909 11:25:12.440733 server/context.go:407  1 storage engine initialized
I160909 11:25:12.440884 cli/start.go:362  starting cockroach node
W160909 11:25:12.441014 server/server.go:124  running in insecure mode, this is strongly discouraged. See --insecure.
W160909 11:25:12.441657 gossip/gossip.go:1070  not connected to cluster; use --join to specify a connected node
I160909 11:25:12.445583 storage/engine/rocksdb.go:340  opening rocksdb instance at "cockroach-data"
I160909 11:25:12.467109 server/node.go:407  store store=0:0 not bootstrapped
I160909 11:25:12.467200 storage/stores.go:302  read 0 node addresses from persistent storage
I160909 11:25:12.467472 server/node.go:549  connecting to gossip network to verify cluster ID...
I160909 11:25:12.470450 gossip/client.go:111  node 0: started gossip client to 139.59.179.118:26257
I160909 11:25:12.480371 server/node.go:570  node connected via gossip and verified as part of cluster {"140a2be0-13dc-4d9b-903c-9d27a53a8e78"}
I160909 11:25:12.482053 storage/stores.go:317  wrote 1 node addresses to persistent storage
I160909 11:25:12.510994 kv/dist_sender.go:329  unable to determine this node's attributes for replica selection; node is most likely bootstrapping
I160909 11:25:12.520737 server/node.go:306  new node allocated ID 2
I160909 11:25:12.521144 server/node.go:358  node=2: started with [[]=cockroach-data] engine(s) and attributes []
I160909 11:25:12.521265 server/server.go:459  starting http server at 146.185.181.252:8080
I160909 11:25:12.521289 server/server.go:460  starting grpc/postgres server at cdb2.fieldmotion.com:26257
I160909 11:25:12.526088 sql/event_log.go:109  Event: "node_join", target: 2, info: {Descriptor:{NodeID:2 Address:{NetworkField:tcp AddressField:cdb2.fieldmotion.com:26257} Attrs:} ClusterID:{UUID:140a2be0-13dc-4d9b-903c-9d27a53a8e78} StartedAt:1473420312521107100}
I160909 11:25:12.536347 server/node.go:530  bootstrapped store store=2:2
I160909 11:25:12.611149 storage/replica_raftstorage.go:631  store=2:2 range=1 [/Min-/Min): with replicaID [?], applying preemptive snapshot at index 43 (encoded size=206034, 420 KV pairs, 33 log entries)
I160909 11:25:12.616015 storage/replica_raftstorage.go:634  store=2:2 range=1 [/Min-/Table/11): with replicaID [?], applied preemptive snapshot in 0.005s
I160909 11:25:12.633112 storage/raft_transport.go:378  raft transport stream to node 1 established
F160909 11:25:12.641315 storage/replica_raftstorage.go:788  unknown command encoding version 128
goroutine 99 [running]:
github.com/cockroachdb/cockroach/util/log.getStacks(0xecf649300, 0x2639b374, 0x23ae440, 0x23fd440)
	/go/src/github.com/cockroachdb/cockroach/util/log/clog.go:865 +0xa5
github.com/cockroachdb/cockroach/util/log.(*loggingT).outputLogEntry(0x23af5e0, 0xc400000003, 0x1d4d449, 0x1e, 0x314, 0xc4202dfaa0, 0x24)
	/go/src/github.com/cockroachdb/cockroach/util/log/clog.go:728 +0x79e
github.com/cockroachdb/cockroach/util/log.addStructured(0x7fedd990c848, 0xc4200102c8, 0x1b00000003, 0x2, 0x16fab55, 0x23, 0xc420557580, 0x1, 0x1)
	/go/src/github.com/cockroachdb/cockroach/util/log/structured.go:63 +0x15a
github.com/cockroachdb/cockroach/util/log.logDepth(0x7fedd990c848, 0xc4200102c8, 0x1, 0xc400000003, 0x16fab55, 0x23, 0xc420557580, 0x1, 0x1)
	/go/src/github.com/cockroachdb/cockroach/util/log/log.go:70 +0x8c
github.com/cockroachdb/cockroach/util/log.Fatalf(0x7fedd990c848, 0xc4200102c8, 0x16fab55, 0x23, 0xc420557580, 0x1, 0x1)
	/go/src/github.com/cockroachdb/cockroach/util/log/log.go:154 +0x7e
github.com/cockroachdb/cockroach/storage.DecodeRaftCommand(0xc420166fc0, 0x1b2, 0x1c0, 0x2, 0x7, 0x0, 0x2e, 0x0)
	/go/src/github.com/cockroachdb/cockroach/storage/replica_raftstorage.go:788 +0x1df
github.com/cockroachdb/cockroach/storage.(*Replica).handleRaftReady(0xc4203786c0, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/storage/replica.go:1708 +0x813
github.com/cockroachdb/cockroach/storage.(*Store).processRaft.func1.4()
	/go/src/github.com/cockroachdb/cockroach/storage/store.go:2533 +0x8c
github.com/cockroachdb/cockroach/storage.(*Store).processRaft.func1.1(0xc42026aae0)
	/go/src/github.com/cockroachdb/cockroach/storage/store.go:2461 +0x66
created by github.com/cockroachdb/cockroach/storage.(*Store).processRaft.func1
	/go/src/github.com/cockroachdb/cockroach/storage/store.go:2463 +0xa7

the problem appears to be this line:

> F160909 11:25:12.641315 storage/replica_raftstorage.go:788 unknown command encoding version 128

Are both nodes running the same version? That error looks like what you’d get if one node was running the new 20160908 beta and one was running something older, but you’ve shown logs from both nodes running 20160829.

the first logs were from 0829. then 0908 was released. I downloaded the binary again on each. They are both now running 0908

also, I completely removed the cockroach-data directories each time before testing, so it’s from a clean install

The log with the “unknown command encoding version” error was from 20160829 (seen at the start of the log).

so it was! I’ll download again. sorry

I downloaded and extracted again, but the version has not updated to 0908. it works if I download from my laptop, but for some reason, I get 0825 from the node2 server.

are you using a CDN which caches differently for different locations?

I’ll copy the 0908 binary from node1 to node2. just notifying that there seems to be a CDN issue.


this forum software does not allow me to write another reply, so I’m editing this:

looks like the mismatch of 0825 and 0908 was one cause of my problems.

node2 still hasn’t returned console control to me (it’s still running in the foreground), but I see it in node1’s logs, and node2’s logs look a lot more healthy.

But - I’m getting another issue, where node2 seems to think that node1 is still booting?

[root@cdb2 logs]# cat cockroach.INFO
I160909 13:38:31.002276 1 util/log/clog.go:1019  [config] file created at: 2016/09/09 13:38:31
I160909 13:38:31.002276 1 util/log/clog.go:1019  [config] running on machine: cdb2
I160909 13:38:31.002276 1 util/log/clog.go:1019  [config] binary: CockroachDB beta-20160908 (linux amd64, built 2016/09/08 15:33:58, go1.7)
I160909 13:38:31.002276 1 util/log/clog.go:1019  [config] arguments: [cockroach start --join=139.59.179.118 --http-addr=146.185.181.252 --insecure]
I160909 13:38:31.002276 1 util/log/clog.go:1019  line format: [IWEF]yymmdd hh:mm:ss.uuuuuu goid file:line msg
I160909 13:38:31.002271 1 cli/start.go:342  CockroachDB beta-20160908 (linux amd64, built 2016/09/08 15:33:58, go1.7)
I160909 13:38:31.103570 1 cli/backtrace.go:40  backtrace disabled: stat /opt/backtrace/bin/ptrace: no such file or directory
W160909 13:38:31.103802 1 server/context.go:302  soft open file descriptor limit 4096 is under the recommended limit 10000; this may decrease performance
please see https://www.cockroachlabs.com/docs/recommended-production-settings.html for more details
I160909 13:38:31.108446 1 server/context.go:407  1 storage engine initialized
I160909 13:38:31.108598 1 cli/start.go:362  starting cockroach node
W160909 13:38:31.108700 1 server/server.go:125  running in insecure mode, this is strongly discouraged. See --insecure.
I160909 13:38:31.109206 1 gossip/gossip.go:230  [n?] initial resolvers: [%!s(*resolver.socketResolver=&{tcp 139.59.179.118:26257})]
W160909 13:38:31.109265 1 gossip/gossip.go:1043  [n?] no incoming or outgoing connections
I160909 13:38:31.112770 1 storage/engine/rocksdb.go:340  opening rocksdb instance at "cockroach-data"
I160909 13:38:31.131968 1 server/node.go:413  [n?] store [n0,s0] not bootstrapped
I160909 13:38:31.132046 1 storage/stores.go:313  read 0 node addresses from persistent storage
I160909 13:38:31.132267 1 server/node.go:554  [n?] connecting to gossip network to verify cluster ID...
I160909 13:38:31.136014 31 gossip/client.go:125  [n?] node 0: started gossip client to 139.59.179.118:26257
I160909 13:38:31.166443 1 server/node.go:575  [n?] node connected via gossip and verified as part of cluster {"140a2be0-13dc-4d9b-903c-9d27a53a8e78"}
I160909 13:38:31.167784 41 storage/stores.go:328  wrote 1 node addresses to persistent storage
I160909 13:38:31.200952 1 kv/dist_sender.go:344  [n?] unable to determine this node's attributes for replica selection; node is most likely bootstrapping
I160909 13:38:32.127489 57 gossip/client.go:125  [n?] node 0: started gossip client to cdb1.fieldmotion.com:26257
I160909 13:38:34.275017 1 kv/dist_sender.go:344  [n?] unable to determine this node's attributes for replica selection; node is most likely bootstrapping
I160909 13:38:37.406799 1 kv/dist_sender.go:344  [n?] unable to determine this node's attributes for replica selection; node is most likely bootstrapping
I160909 13:38:40.615466 1 kv/dist_sender.go:344  [n?] unable to determine this node's attributes for replica selection; node is most likely bootstrapping
I160909 13:38:44.043175 1 kv/dist_sender.go:344  [n?] unable to determine this node's attributes for replica selection; node is most likely bootstrapping
I160909 13:38:47.822586 1 kv/dist_sender.go:344  [n?] unable to determine this node's attributes for replica selection; node is most likely bootstrapping
I160909 13:38:52.480703 1 kv/dist_sender.go:344  [n?] unable to determine this node's attributes for replica selection; node is most likely bootstrapping
I160909 13:38:58.907154 1 kv/dist_sender.go:344  [n?] unable to determine this node's attributes for replica selection; node is most likely bootstrapping
I160909 13:39:06.242488 1 kv/dist_sender.go:344  [n?] unable to determine this node's attributes for replica selection; node is most likely bootstrapping
I160909 13:39:13.695223 1 kv/dist_sender.go:344  [n?] unable to determine this node's attributes for replica selection; node is most likely bootstrapping
I160909 13:39:21.139927 1 kv/dist_sender.go:344  [n?] unable to determine this node's attributes for replica selection; node is most likely bootstrapping
I160909 13:39:28.594784 1 kv/dist_sender.go:344  [n?] unable to determine this node's attributes for replica selection; node is most likely bootstrapping
I160909 13:39:36.890104 1 kv/dist_sender.go:344  [n?] unable to determine this node's attributes for replica selection; node is most likely bootstrapping
I160909 13:39:44.395643 1 kv/dist_sender.go:344  [n?] unable to determine this node's attributes for replica selection; node is most likely bootstrapping
I160909 13:39:52.014982 1 kv/dist_sender.go:344  [n?] unable to determine this node's attributes for replica selection; node is most likely bootstrapping
I160909 13:40:00.124284 1 kv/dist_sender.go:344  [n?] unable to determine this node's attributes for replica selection; node is most likely bootstrapping
I160909 13:40:07.546408 1 kv/dist_sender.go:344  [n?] unable to determine this node's attributes for replica selection; node is most likely bootstrapping
I160909 13:40:15.175090 1 kv/dist_sender.go:344  [n?] unable to determine this node's attributes for replica selection; node is most likely bootstrapping
I160909 13:40:23.000955 1 kv/dist_sender.go:344  [n?] unable to determine this node's attributes for replica selection; node is most likely bootstrapping
W160909 13:40:31.399289 1 kv/dist_sender.go:705  20 retries for an RPC at sequence 0, last error was: failed to send RPC: sending to all 1 replicas failed; last error: rpc error: code = 4 desc = context deadline exceeded, remaining key ranges {"/System/\"node-idgen\"" "/System/\"node-idgen\\x00\""}: Increment [/System/"node-idgen",/Min)
I160909 13:40:31.417454 1 kv/dist_sender.go:344  [n?] unable to determine this node's attributes for replica selection; node is most likely bootstrapping
I160909 13:40:39.903703 1 kv/dist_sender.go:344  [n?] unable to determine this node's attributes for replica selection; node is most likely bootstrapping
I160909 13:40:48.067661 1 kv/dist_sender.go:344  [n?] unable to determine this node's attributes for replica selection; node is most likely bootstrapping
I160909 13:40:56.102724 1 kv/dist_sender.go:344  [n?] unable to determine this node's attributes for replica selection; node is most likely bootstrapping
I160909 13:41:03.742743 1 kv/dist_sender.go:344  [n?] unable to determine this node's attributes for replica selection; node is most likely bootstrapping
I160909 13:41:11.543607 1 kv/dist_sender.go:344  [n?] unable to determine this node's attributes for replica selection; node is most likely bootstrapping
I160909 13:41:19.352972 1 kv/dist_sender.go:344  [n?] unable to determine this node's attributes for replica selection; node is most likely bootstrapping
I160909 13:41:27.886848 1 kv/dist_sender.go:344  [n?] unable to determine this node's attributes for replica selection; node is most likely bootstrapping
I160909 13:41:35.590974 1 kv/dist_sender.go:344  [n?] unable to determine this node's attributes for replica selection; node is most likely bootstrapping
I160909 13:41:43.431376 1 kv/dist_sender.go:344  [n?] unable to determine this node's attributes for replica selection; node is most likely bootstrapping
I160909 13:41:52.190830 1 kv/dist_sender.go:344  [n?] unable to determine this node's attributes for replica selection; node is most likely bootstrapping
I160909 13:41:59.984222 1 kv/dist_sender.go:344  [n?] unable to determine this node's attributes for replica selection; node is most likely bootstrapping

Hi, given the rather technical nature of this investigation I would recommend following up on Github: http://github.com/cockroachdb/cockroach
This will enable more interaction than the forum software.

1 Like

I’ve managed to get this up and running. I rebuilt the servers completely, and set up the servers and certificates using only IP addresses. I know I tried this earlier in the thread and failed, but now that the system is up and running, I’m not going to break it again just to find the problem :wink:

Thanks