I have just installed CockroachDB using the helm chart and added about 27k rows to it. I then removed the helm release (helm del --purge name) and started it again (helm install ....). Now I see “Invalid leases” on the Problem Ranges report, but I don’t know what to do about it.
The cluster has the default 3 nodes and all the default settings in the helm chart other than the storage class (hostpath) and storage size (10Gi).
As far as I can tell, all of the data is reachable. At least, I can run SELECT SUM(balance) FROM accounts and get what I believe is the correct value and no errors (I’ll be able to verify this once the lease errors are gone). I can also insert data successfully.
Is this a normal situation that I can ignore? Can I assume that it will repair itself? If not, what steps do I need to follow to a) identify the exact nature of the problem and b) resolve the problem?
FWIW, I saw this in the docs:
but it’s been over 10 minutes (33 minutes) and the leases are still expired. Should the leaseholders have worked this out amongst themselves by now?
I also saw this thread, but it looks like it died without resolution: Problem Ranges Report -- What can be done about it?
On the Replication Dashboard I see that I have 21 ranges available, 21 leases, 13 lease holders, 0 leaders-without-leases/unavailable/under-replicated/over-replicated. There are indeed 8 ranges with “Invalid leases”.
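As a quick sanity check on those dashboard numbers, here is a minimal sketch. It assumes that every range should have exactly one valid lease holder, so the gap between the range count and the lease-holder count should equal the number of ranges reported with invalid leases. The function name is my own and not part of any CockroachDB tooling.

```python
def ranges_without_valid_lease(total_ranges, lease_holders):
    """Return how many ranges have no valid lease holder.

    Assumes each range has at most one valid lease holder, so the
    difference is the number of ranges whose lease is missing or expired.
    """
    if lease_holders > total_ranges:
        raise ValueError("more lease holders than ranges")
    return total_ranges - lease_holders


# Figures taken from the Replication Dashboard described above.
missing = ranges_without_valid_lease(total_ranges=21, lease_holders=13)
print(missing)  # 8
```

The result agrees with the 8 ranges listed under “Invalid leases”, so the dashboard and the Problem Ranges report are at least consistent with each other.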
My wild, mostly uneducated guess is that this happened because the cluster nodes started in a different order the second time around. The events log shows that the initial “Node Joined” order was 1 2 3, while the subsequent “Node Rejoined” order was 1 3 2. Would that have caused some of the ranges to get new leaders? That is, when 1 and 3 were present but 2 was not yet there, would the ranges whose leader was node 2 have been moved to 1 or 3? And if so, would that have caused these invalid leases to exist and persist?
For what it is worth, only nodes 1 and 3 have problem ranges, 2 (the last to start) has none. However, according to /_status/nodes, node 2 is the leader of no ranges, so that makes some sense.
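For anyone who wants to repeat the per-node check, this is roughly how the lease-holder counts can be pulled out of the /_status/nodes output. It is only a sketch: the field names (nodes, desc.nodeId, storeStatuses, metrics, replicas.leaseholders) are my assumptions about the shape of the JSON and should be checked against the actual payload.

```python
def leaseholders_per_node(status):
    """Map node ID -> total lease-holder count across that node's stores.

    `status` is the decoded JSON body of /_status/nodes. All field names
    used here are assumptions about the payload, not a documented contract.
    """
    counts = {}
    for node in status.get("nodes", []):
        node_id = node.get("desc", {}).get("nodeId")
        stores = node.get("storeStatuses", [])
        # Sum the metric over every store; a missing metric counts as 0.
        counts[node_id] = sum(
            int(store.get("metrics", {}).get("replicas.leaseholders", 0))
            for store in stores
        )
    return counts
```

Feeding it the parsed response (for example via json.load on a saved copy of the output) gives one number per node. A node that shows 0 holds no leases, which is what I see for node 2.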
Build Tag: v19.1.2
Build Time: 2019/06/07 17:32:15
Distribution: CCL
Platform: linux amd64 (x86_64-unknown-linux-gnu)
Go Version: go1.11.6
C Compiler: gcc 6.3.0
Build SHA-1: cbd571c7bf2ffad3514334a410fa7a728b1f5bf0
Build Type: release
Here are screenshots from the Range Reports for R2 (valid lease) and R15 (invalid lease): https://imgur.com/a/lT1PoMN . The values you see here are consistent with the rest of the ranges – all of the “invalid leases” are associated with “expired”.