DistSQL: flows aggregate on two nodes

In a recent test I found that DistSQL flows aggregate on a small number of nodes. This puts those nodes under high pressure and thus limits the total throughput.



I want to know: is there any good way to avoid this condition? And what causes it?

I tested with sysbench 1.0.15 and the crdb-alpha-730 build.

Here is my sysbench command:

./sysbench --db-driver=pgsql --pgsql-host= --pgsql-port=6257 --pgsql-user=root --pgsql-password= --time=120 --report-interval=1 --threads=160 /home/fee/sysbench/share/sysbench/select_random_ranges.lua --auto-inc=off --tables=1 --table-size=10000000 run

The SQL query looks like this:

SELECT count(k) FROM sbtest1 WHERE (((((((((k BETWEEN $1 AND $2) OR (k BETWEEN $3 AND $4)) OR (k BETWEEN $5 AND $6)) OR (k BETWEEN $7 AND $8)) OR (k BETWEEN $9 AND $10)) OR (k BETWEEN $11 AND $12)) OR (k BETWEEN $13 AND $14)) OR (k BETWEEN $15 AND $16)) OR (k BETWEEN $17 AND $18)) OR (k BETWEEN $19 AND $20);

Hey @floatsliang - it looks like your sysbench invocation talks to cockroach through some sort of load balancer. Is that one perhaps not spreading the queries evenly? You can check the sql statements per node counters and see if those look equal.
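One way to sanity-check this from SQL, assuming this build exposes the crdb_internal.node_statement_statistics virtual table (it is node-local, so you would run it against each node in turn and compare the totals):

-- Per-gateway statement counts on the node you are connected to;
-- roughly equal totals across nodes suggest the load balancer is fair.
SELECT key, count
FROM crdb_internal.node_statement_statistics
ORDER BY count DESC
LIMIT 10;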

It’s also worth noting that distributed queries will only involve the node you issue the query to and nodes with leaseholders for the relevant data. Based on your “Leaseholders per Store” graph it looks like your leaseholders are well distributed in general, but you can verify that’s the case for this table in particular by running SHOW EXPERIMENTAL_RANGES FROM TABLE sbtest1.
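For example:

SHOW EXPERIMENTAL_RANGES FROM TABLE sbtest1;

Each output row lists a range's start and end keys, its replica set, and the node currently holding its lease.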

Hi @andrei, @solon

Yes, I used HAProxy as the load balancer. It works well and spreads queries evenly across the cluster.

This is the connections per node:
[screenshot: SQL connections per node]

And this is the queries per node:
[screenshot: queries per node]

Leaseholders and flows:
[screenshot: leaseholders per store]
[screenshot: active flows]

As for the ranges the query touches: it should scan all the ranges of the sbtest1 table, according to the select_random_ranges.lua script, because:

To prevent overlapping of our range queries we need to partition the whole table into ‘threads’ segments and then make each thread work with its own segment.


Each of the 160 threads draws its random ranges from its own slice of the 10,000,000-row key space, so together the threads cover the whole table. Thus all ranges of sbtest1 should get involved during the test, and in theory the flows should be spread evenly across the cluster.

If the connections from the client to the nodes are balanced, then the next step is determining if the table’s data is randomly distributed among the nodes (as @solon was suggesting). For example, if this table is small relative to the total amount of data in the cluster (i.e. other databases/tables and internal timeseries data), then it might not be.
Can you please run SHOW EXPERIMENTAL_RANGES FROM TABLE sbtest1 to see where the data from that table lives? If it turns out that it lives predominantly on a few nodes, can you please try ALTER TABLE sbtest1 SCATTER, which should randomize it somewhat, and see if that helps?
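Concretely, that is (a sketch; SCATTER's effect on leaseholders is described at the end of the thread):

-- Where do the table's ranges and leaseholders currently live?
SHOW EXPERIMENTAL_RANGES FROM TABLE sbtest1;

-- If they are concentrated on a few nodes, shuffle them:
ALTER TABLE sbtest1 SCATTER;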

Sorry, I forgot to show you the distribution of the table data. :sweat_smile:

Here is the result of SHOW EXPERIMENTAL_RANGES FROM TABLE sbtest1; the cluster has only one table and no timeseries data:

root@:26257/sbtest> show experimental_ranges from table sbtest1;
+-----------+----------+----------+----------+--------------+
| start_key | end_key | range_id | replicas | lease_holder |
+-----------+----------+----------+----------+--------------+
| NULL | /158587 | 32 | {1,4,5} | 4 |
| /158587 | /316863 | 33 | {1,2,3} | 1 |
| /316863 | /475139 | 34 | {2,3,4} | 2 |
| /475139 | /633415 | 35 | {2,3,5} | 3 |
| /633415 | /791691 | 36 | {2,4,5} | 2 |
| /791691 | /949967 | 37 | {2,3,5} | 5 |
| /949967 | /1108243 | 38 | {1,3,4} | 4 |
| /1108243 | /1266519 | 39 | {2,3,5} | 5 |
| /1266519 | /1424795 | 40 | {1,2,5} | 1 |
| /1424795 | /1583071 | 41 | {1,2,3} | 3 |
| /1583071 | /1741347 | 42 | {2,3,4} | 3 |
| /1741347 | /1899623 | 43 | {2,3,4} | 2 |
| /1899623 | /2057899 | 44 | {2,3,5} | 5 |
| /2057899 | /2216175 | 45 | {2,3,4} | 3 |
| /2216175 | /2374451 | 46 | {2,3,5} | 5 |
| /2374451 | /2532727 | 47 | {3,4,5} | 4 |
| /2532727 | /2691003 | 48 | {1,4,5} | 5 |
| /2691003 | /2849279 | 49 | {2,3,5} | 5 |
| /2849279 | /3007555 | 50 | {1,2,3} | 3 |
| /3007555 | /3165831 | 51 | {2,3,5} | 5 |
| /3165831 | /3324107 | 52 | {3,4,5} | 4 |
| /3324107 | /3482383 | 53 | {1,2,3} | 3 |
| /3482383 | /3640659 | 54 | {1,2,4} | 1 |
| /3640659 | /3798935 | 55 | {2,3,5} | 5 |
| /3798935 | /3957211 | 56 | {3,4,5} | 4 |
| /3957211 | /4115487 | 57 | {1,2,4} | 1 |
| /4115487 | /4273763 | 58 | {1,4,5} | 1 |
| /4273763 | /4432039 | 59 | {3,4,5} | 4 |
| /4432039 | /4590315 | 60 | {1,2,3} | 3 |
| /4590315 | /4748591 | 61 | {1,2,3} | 2 |
| /4748591 | /4906867 | 62 | {1,3,4} | 1 |
| /4906867 | /5065143 | 63 | {2,3,5} | 5 |
| /5065143 | /5223419 | 64 | {1,2,3} | 3 |
| /5223419 | /5381695 | 65 | {3,4,5} | 3 |
| /5381695 | /5539971 | 66 | {1,2,3} | 1 |
| /5539971 | /5698247 | 67 | {2,4,5} | 2 |
| /5698247 | /5856523 | 68 | {1,3,4} | 3 |
| /5856523 | /6014799 | 69 | {1,4,5} | 5 |
| /6014799 | /6173075 | 70 | {1,2,5} | 1 |
| /6173075 | /6331351 | 71 | {1,4,5} | 4 |
| /6331351 | /6489627 | 72 | {1,4,5} | 1 |
| /6489627 | /6647903 | 73 | {1,4,5} | 4 |
| /6647903 | /6806179 | 74 | {3,4,5} | 5 |
| /6806179 | /6964455 | 75 | {1,4,5} | 1 |
| /6964455 | /7122731 | 76 | {1,4,5} | 5 |
| /7122731 | /7281007 | 77 | {1,4,5} | 5 |
| /7281007 | /7439283 | 78 | {1,4,5} | 4 |
| /7439283 | /7597559 | 79 | {1,4,5} | 4 |
| /7597559 | /7755835 | 80 | {1,2,4} | 2 |
| /7755835 | /7914111 | 81 | {1,4,5} | 4 |
| /7914111 | /8072387 | 82 | {1,4,5} | 5 |
| /8072387 | /8230663 | 83 | {1,4,5} | 1 |
| /8230663 | /8388939 | 84 | {1,4,5} | 4 |
| /8388939 | /8547215 | 85 | {1,2,3} | 3 |
| /8547215 | /8705491 | 86 | {1,4,5} | 5 |
| /8705491 | /8863767 | 87 | {1,4,5} | 1 |
| /8863767 | /9022043 | 88 | {1,4,5} | 1 |
| /9022043 | /9180319 | 89 | {1,4,5} | 4 |
| /9180319 | /9338595 | 90 | {1,2,5} | 1 |
| /9338595 | /9496871 | 91 | {1,4,5} | 5 |
| /9496871 | /9655147 | 92 | {1,4,5} | 4 |
| /9655147 | /9813423 | 93 | {1,4,5} | 4 |
| /9813423 | /9971699 | 94 | {1,4,5} | 5 |
| /9971699 | NULL | 95 | {1,4,5} | 4 |
+-----------+----------+----------+----------+--------------+
(64 rows)

Time: 187.200115ms
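As an aside, the leaseholder distribution above can be summarized directly, assuming this build supports using a SHOW statement as a subquery (the square-bracket syntax; a hedged sketch, not verified on crdb-alpha-730):

-- Count how many of the table's range leases each node holds.
SELECT lease_holder, count(*) AS ranges
FROM [SHOW EXPERIMENTAL_RANGES FROM TABLE sbtest1]
GROUP BY lease_holder
ORDER BY lease_holder;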

I tried ALTER TABLE sbtest1 SCATTER, and it helped a little (can you tell me what ALTER TABLE ... SCATTER does?), but the flows still do not spread across all nodes.

What is the table’s schema? Is k the primary key?
Could you please run EXPLAIN (DISTSQL) SELECT count(k) ... (put your query in there) and paste the results here? That will show us how this query is planned; we expect to see something for each node.
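For reference, the invocation would look something like this, shortened to two of the ten BETWEEN clauses and with literal bounds standing in for sysbench's $1..$20 parameters:

EXPLAIN (DISTSQL)
  SELECT count(k)
  FROM sbtest1
  WHERE (k BETWEEN 1 AND 100) OR (k BETWEEN 200 AND 300);

The output contains a link to a diagram of the physical plan, showing which nodes run table readers and aggregators.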
ALTER TABLE ... SCATTER randomizes the leaseholders for every range somewhat, in case they were imbalanced.