How to enable leaseholder load balancing

I’ve deployed a 3-node CockroachDB cluster on GCP with all nodes in the same availability zone (hence I haven’t specified the --locality flag). When I run load against this cluster, one of the three nodes shows much higher CPU load than the others because all the leaseholders are on that node. This is what the ranges look like:

+-------------------------------------------------+-------------------------------------------------+----------+----------+--------------+
|                    Start Key                    |                     End Key                     | Range ID | Replicas | Lease Holder |
+-------------------------------------------------+-------------------------------------------------+----------+----------+--------------+
| NULL                                            | /"e05e6ff1-c543-4489-a78e-730a1a58d290:1046728" |     1774 | {1,3,5}  |            3 |
| /"e05e6ff1-c543-4489-a78e-730a1a58d290:1046728" | /"e05e6ff1-c543-4489-a78e-730a1a58d290:109346"  |     1811 | {1,3,5}  |            3 |
| /"e05e6ff1-c543-4489-a78e-730a1a58d290:109346"  | /"e05e6ff1-c543-4489-a78e-730a1a58d290:1140190" |     1813 | {1,3,5}  |            3 |
| /"e05e6ff1-c543-4489-a78e-730a1a58d290:1140190" | /"e05e6ff1-c543-4489-a78e-730a1a58d290:1186923" |     1815 | {1,3,5}  |            3 |
| /"e05e6ff1-c543-4489-a78e-730a1a58d290:1186923" | /"e05e6ff1-c543-4489-a78e-730a1a58d290:1233654" |     1817 | {1,3,5}  |            3 |
| /"e05e6ff1-c543-4489-a78e-730a1a58d290:1233654" | /"e05e6ff1-c543-4489-a78e-730a1a58d290:1280386" |     1819 | {1,3,5}  |            3 |
| /"e05e6ff1-c543-4489-a78e-730a1a58d290:1280386" | /"e05e6ff1-c543-4489-a78e-730a1a58d290:1327117" |     1821 | {1,3,5}  |            3 |
| /"e05e6ff1-c543-4489-a78e-730a1a58d290:1327117" | /"e05e6ff1-c543-4489-a78e-730a1a58d290:137385"  |     1823 | {1,3,5}  |            3 |
| /"e05e6ff1-c543-4489-a78e-730a1a58d290:137385"  | /"e05e6ff1-c543-4489-a78e-730a1a58d290:1420580" |     1825 | {1,3,5}  |            3 |
| /"e05e6ff1-c543-4489-a78e-730a1a58d290:1420580" | /"e05e6ff1-c543-4489-a78e-730a1a58d290:147915"  |     1827 | {1,3,5}  |            3 |
| /"e05e6ff1-c543-4489-a78e-730a1a58d290:147915"  | /"e05e6ff1-c543-4489-a78e-730a1a58d290:1525881" |     1778 | {1,3,5}  |            3 |
| /"e05e6ff1-c543-4489-a78e-730a1a58d290:1525881" | /"e05e6ff1-c543-4489-a78e-730a1a58d290:1572612" |     1829 | {1,3,5}  |            3 |
| /"e05e6ff1-c543-4489-a78e-730a1a58d290:1572612" | /"e05e6ff1-c543-4489-a78e-730a1a58d290:1619344" |     1831 | {1,3,5}  |            3 |
| /"e05e6ff1-c543-4489-a78e-730a1a58d290:1619344" | /"e05e6ff1-c543-4489-a78e-730a1a58d290:1666076" |     1833 | {1,3,5}  |            3 |
| /"e05e6ff1-c543-4489-a78e-730a1a58d290:1666076" | /"e05e6ff1-c543-4489-a78e-730a1a58d290:1712807" |     1835 | {1,3,5}  |            3 |
| /"e05e6ff1-c543-4489-a78e-730a1a58d290:1712807" | /"e05e6ff1-c543-4489-a78e-730a1a58d290:175954"  |     1837 | {1,3,5}  |            3 |
| /"e05e6ff1-c543-4489-a78e-730a1a58d290:175954"  | /"e05e6ff1-c543-4489-a78e-730a1a58d290:1806270" |     1839 | {1,3,5}  |            3 |
| /"e05e6ff1-c543-4489-a78e-730a1a58d290:1806270" | /"e05e6ff1-c543-4489-a78e-730a1a58d290:1853001" |     1841 | {1,3,5}  |            3 |
| /"e05e6ff1-c543-4489-a78e-730a1a58d290:1853001" | /"e05e6ff1-c543-4489-a78e-730a1a58d290:1899735" |     1843 | {1,3,5}  |            3 |
| /"e05e6ff1-c543-4489-a78e-730a1a58d290:1899735" | /"e05e6ff1-c543-4489-a78e-730a1a58d290:195833"  |     1845 | {1,3,5}  |            3 |
| /"e05e6ff1-c543-4489-a78e-730a1a58d290:195833"  | /"e05e6ff1-c543-4489-a78e-730a1a58d290:200506"  |     1780 | {1,3,5}  |            3 |
| /"e05e6ff1-c543-4489-a78e-730a1a58d290:200506"  | /"e05e6ff1-c543-4489-a78e-730a1a58d290:2051792" |     1847 | {1,3,5}  |            3 |
| /"e05e6ff1-c543-4489-a78e-730a1a58d290:2051792" | /"e05e6ff1-c543-4489-a78e-730a1a58d290:2098524" |     1849 | {1,3,5}  |            3 |
| /"e05e6ff1-c543-4489-a78e-730a1a58d290:2098524" | /"e05e6ff1-c543-4489-a78e-730a1a58d290:2145255" |     1851 | {1,3,5}  |            3 |
| /"e05e6ff1-c543-4489-a78e-730a1a58d290:2145255" | /"e05e6ff1-c543-4489-a78e-730a1a58d290:2191988" |     1853 | {1,3,5}  |            3 |
| /"e05e6ff1-c543-4489-a78e-730a1a58d290:2191988" | /"e05e6ff1-c543-4489-a78e-730a1a58d290:2238719" |     1855 | {1,3,5}  |            3 |
| /"e05e6ff1-c543-4489-a78e-730a1a58d290:2238719" | /"e05e6ff1-c543-4489-a78e-730a1a58d290:243750"  |     1857 | {1,3,5}  |            3 |
| /"e05e6ff1-c543-4489-a78e-730a1a58d290:243750"  | /"e05e6ff1-c543-4489-a78e-730a1a58d290:291669"  |     1782 | {1,3,5}  |            3 |
| /"e05e6ff1-c543-4489-a78e-730a1a58d290:291669"  | /"e05e6ff1-c543-4489-a78e-730a1a58d290:339587"  |     1784 | {1,3,5}  |            3 |
| /"e05e6ff1-c543-4489-a78e-730a1a58d290:339587"  | /"e05e6ff1-c543-4489-a78e-730a1a58d290:387504"  |     1786 | {1,3,5}  |            3 |
| /"e05e6ff1-c543-4489-a78e-730a1a58d290:387504"  | /"e05e6ff1-c543-4489-a78e-730a1a58d290:435421"  |     1788 | {1,3,5}  |            3 |
| /"e05e6ff1-c543-4489-a78e-730a1a58d290:435421"  | /"e05e6ff1-c543-4489-a78e-730a1a58d290:48334"   |     1790 | {1,3,5}  |            3 |
| /"e05e6ff1-c543-4489-a78e-730a1a58d290:48334"   | /"e05e6ff1-c543-4489-a78e-730a1a58d290:54064"   |     1792 | {1,3,5}  |            3 |
| /"e05e6ff1-c543-4489-a78e-730a1a58d290:54064"   | /"e05e6ff1-c543-4489-a78e-730a1a58d290:588559"  |     1776 | {1,3,5}  |            3 |
| /"e05e6ff1-c543-4489-a78e-730a1a58d290:588559"  | /"e05e6ff1-c543-4489-a78e-730a1a58d290:636476"  |     1794 | {1,3,5}  |            3 |
| /"e05e6ff1-c543-4489-a78e-730a1a58d290:636476"  | /"e05e6ff1-c543-4489-a78e-730a1a58d290:684394"  |     1796 | {1,3,5}  |            3 |
| /"e05e6ff1-c543-4489-a78e-730a1a58d290:684394"  | /"e05e6ff1-c543-4489-a78e-730a1a58d290:732310"  |     1798 | {1,3,5}  |            3 |
| /"e05e6ff1-c543-4489-a78e-730a1a58d290:732310"  | /"e05e6ff1-c543-4489-a78e-730a1a58d290:780229"  |     1800 | {1,3,5}  |            3 |
| /"e05e6ff1-c543-4489-a78e-730a1a58d290:780229"  | /"e05e6ff1-c543-4489-a78e-730a1a58d290:828147"  |     1802 | {1,3,5}  |            3 |
| /"e05e6ff1-c543-4489-a78e-730a1a58d290:828147"  | /"e05e6ff1-c543-4489-a78e-730a1a58d290:876065"  |     1804 | {1,3,5}  |            3 |
| /"e05e6ff1-c543-4489-a78e-730a1a58d290:876065"  | /"e05e6ff1-c543-4489-a78e-730a1a58d290:923983"  |     1806 | {1,3,5}  |            3 |
| /"e05e6ff1-c543-4489-a78e-730a1a58d290:923983"  | /"e05e6ff1-c543-4489-a78e-730a1a58d290:971900"  |     1808 | {1,3,5}  |            3 |
| /"e05e6ff1-c543-4489-a78e-730a1a58d290:971900"  | NULL                                            |     1810 | {1,3,5}  |            3 |
+-------------------------------------------------+-------------------------------------------------+----------+----------+--------------+
(43 rows)
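For reference, a range listing like the one above can be produced from the SQL shell; the statement below is a sketch (the table name `kv` is a placeholder, and the exact `SHOW` syntax varies by CockroachDB version):

```shell
# List ranges, replicas, and leaseholders for a table.
# "kv" is a placeholder table name; syntax may differ across versions.
cockroach sql --insecure -e 'SHOW EXPERIMENTAL_RANGES FROM TABLE kv;'
```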

Is there a way to ensure leaseholders are balanced across all three nodes so that I get uniform load and, as a result, better read and write throughput? I tried setting kv.allocator.lease_rebalancing_aggressiveness to higher values like 2 and 5, but that doesn’t seem to help.

EDIT: I changed my code to prefix my primary key with a hash so that load is spread more evenly across ranges (previously my keys were sequential). However, the problem persists: all these ranges still have their leaseholder on a single node, so that node becomes a hotspot.
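As a sketch of the hash-prefixing approach described above (all table and column names here are made up, and the hash bucket is assumed to be computed client-side):

```shell
# Hypothetical sketch: prefix the primary key with a hash bucket so that
# sequential inserts fan out across ranges. Names are illustrative only.
cockroach sql --insecure -e "
CREATE TABLE events (
    bucket  INT NOT NULL,     -- e.g. hash(id) % 16, computed by the client
    id      STRING NOT NULL,  -- the original sequential key
    payload BYTES,
    PRIMARY KEY (bucket, id)
);"
```

Reads by `id` then need to fan out across all buckets, which is the usual trade-off of this technique.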

Thanks!

The “leaseholders per store” metric also shows all leaseholders accumulating on a single machine as load increases.

Wow, that large of a leaseholder imbalance is not at all expected, and is something you shouldn’t ever see on a cluster without --locality flags specified.

In existing versions of cockroach, you should be able to fix the problem by running ALTER TABLE <tablename> SCATTER, which randomly redistributes that table’s leases across the cluster. Starting in 2.1, we’ll automatically consider the QPS load on each node and transfer leases appropriately.
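For example (the table name `kv` is a placeholder):

```shell
# Randomly redistribute the replicas and leases of this table's ranges.
cockroach sql --insecure -e 'ALTER TABLE kv SCATTER;'
```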

But before you run the SCATTER command, I’d be interested to see what the /_status/allocator/node/3 endpoint shows to see why the leases aren’t being moved. Would you mind checking that path on the HTTP port and reporting back its contents?
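Something along these lines against the admin HTTP port (8080 by default; the host is a placeholder) should dump the allocator’s reasoning for node 3:

```shell
# Query the allocator debug endpoint on the admin UI port (default 8080).
curl http://<node-address>:8080/_status/allocator/node/3
```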

And for future reference, the kv.allocator.lease_rebalancing_aggressiveness setting only applies when you are using localities, and it will often have the effect of making leases less balanced because it will more aggressively move leases toward the localities where the most load is coming from.

Thanks for the response Alex, I’ll try out the ALTER TABLE command to see if it fixes the issue. Also, what is the leaseholder load balancing algorithm in current versions of cockroachdb?

Following is the output of the /_status/allocator/node/3 endpoint: https://gist.github.com/blackpearl13/edd6ef482773bcc8f236cb2bf4e7370e

I’m sorry for the slow response, @blackpearl, for some reason I didn’t get an email about your message.

It looks like you’re using a cluster with multiple stores attached to each node. Unfortunately, this is a situation that we don’t handle well when there are only 3 nodes because we don’t allow the same replica to be stored twice on the same node, but without doing so we can’t rebalance from one store on a given node to the other store on the same node. Would you be able to add a fourth node to the cluster in order to unstick rebalancing?

Also, is this an important deployment scenario for you? We can put work into fixing it, but we haven’t heard of anyone really wanting to run in this configuration (exactly 3 nodes, multiple stores per node).

And as for the leaseholder load balancing algorithm, the current algorithm depends on whether you’ve specified the --locality flag on your nodes. If you haven’t, the system just tries to balance the number of leases per store. The load on each range isn’t taken into account. That will most likely be changing in the 2.1 release this October.
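One way to watch the per-store lease counts the allocator is trying to balance is a query like the following (assuming your version exposes the `crdb_internal.ranges` virtual table, which is internal and version-dependent):

```shell
# Count ranges grouped by their current leaseholder store.
cockroach sql --insecure -e \
  'SELECT lease_holder, count(*) FROM crdb_internal.ranges GROUP BY lease_holder;'
```

With lease-count balancing working correctly on a 3-node cluster, the three counts should end up roughly equal.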

If you have specified the --locality flag and your different localities are very far apart on the network (10s or 100s of milliseconds of RTT), then something called “follow the workload” comes into play where we try to move leases towards where most of their load is originating. See our docs for a description.