How does hash join work in CockroachDB if memory is not big enough to build the hash map?

There are two tables, orders and lineitem.

orders is the smaller table with 70,000,000 rows; lineitem is the bigger table with 3,000,000,000 rows.

The cluster runs on 5 machines (with enough CPU/memory/disk/network resources) using the default cluster settings.

The query is:

SELECT l_orderkey, o_orderdate, o_shippriority
FROM orders, lineitem
WHERE l_orderkey = o_orderkey
  AND o_orderdate < DATE '1995-03-10'
  AND l_shipdate > DATE '1995-03-10'
GROUP BY l_orderkey, o_orderdate, o_shippriority;

I ran EXPLAIN ANALYZE xxxxx and found that there is not enough memory, so the join spills to disk.

I wonder how CockroachDB handles this situation.

I found some information on the Internet: some databases partition both tables with the same hash function, keep one part in memory and the other on disk, and then merge the results from the two parts (or they use one of several related algorithms such as 'grace hash join' or 'hybrid hash join'); a sketch of the partitioning idea follows below.
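
To make sure I understand that idea, here is a minimal Go sketch of the grace-hash-join partitioning scheme I described, not CockroachDB's implementation. All names (row, partition, numPartitions) are hypothetical, and the partitions are kept in memory for brevity where a real grace hash join would write them to disk:

package main

import (
	"fmt"
	"hash/fnv"
)

type row struct {
	key     int
	payload string
}

const numPartitions = 4

// partition splits rows into buckets using one shared hash function,
// so matching keys from both inputs always land in the same bucket.
func partition(rows []row) [][]row {
	parts := make([][]row, numPartitions)
	for _, r := range rows {
		h := fnv.New32a()
		fmt.Fprintf(h, "%d", r.key)
		p := h.Sum32() % numPartitions
		// A real grace hash join would write each partition to its own
		// file on disk; we keep them in memory to keep the sketch short.
		parts[p] = append(parts[p], r)
	}
	return parts
}

func main() {
	build := []row{{1, "o1"}, {2, "o2"}, {3, "o3"}}
	probe := []row{{1, "l1"}, {3, "l3"}, {3, "l3b"}, {4, "l4"}}

	buildParts, probeParts := partition(build), partition(probe)
	for p := 0; p < numPartitions; p++ {
		// Only one partition's hash table has to fit in memory at a time.
		ht := make(map[int][]row)
		for _, r := range buildParts[p] {
			ht[r.key] = append(ht[r.key], r)
		}
		// Probe with the matching partition of the other input.
		for _, r := range probeParts[p] {
			for _, m := range ht[r.key] {
				fmt.Println("joined:", m.payload, r.payload)
			}
		}
	}
}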

I found the function func maybeSpillToDisk(err) (in pkg/sql/distsqlrun/hashjoiner.go, version 2.1.2), which spills h.rows[h.storedSide] to disk and uses a hashDiskRowContainer as h.storedRows.
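
From reading that function, my guess at the overall pattern is the sketch below: buffer the stored side in memory, and if the memory accounting reports an out-of-budget error, move everything into a disk-backed container and continue there. This is only my hedged reconstruction, not the actual CockroachDB code; all types and names are hypothetical stand-ins:

package main

import (
	"errors"
	"fmt"
)

var errMemoryBudgetExceeded = errors.New("memory budget exceeded")

type rowContainer interface {
	AddRow(row []string) error
}

type memRowContainer struct {
	rows  [][]string
	limit int // maximum number of rows; stands in for a byte budget
}

func (m *memRowContainer) AddRow(row []string) error {
	if len(m.rows) >= m.limit {
		return errMemoryBudgetExceeded
	}
	m.rows = append(m.rows, row)
	return nil
}

type diskRowContainer struct {
	rows [][]string // a real implementation would write these to disk
}

func (d *diskRowContainer) AddRow(row []string) error {
	d.rows = append(d.rows, row)
	return nil
}

type hashJoiner struct {
	storedRows rowContainer
}

// maybeSpillToDisk mirrors the shape of the function named above: on a
// memory error, copy the buffered rows into a disk container and swap
// it in as storedRows; any other error is propagated unchanged.
func (h *hashJoiner) maybeSpillToDisk(err error, mem *memRowContainer) error {
	if !errors.Is(err, errMemoryBudgetExceeded) {
		return err
	}
	disk := &diskRowContainer{}
	for _, r := range mem.rows {
		if err := disk.AddRow(r); err != nil {
			return err
		}
	}
	h.storedRows = disk
	return nil
}

func main() {
	mem := &memRowContainer{limit: 2}
	h := &hashJoiner{storedRows: mem}
	for _, r := range [][]string{{"a"}, {"b"}, {"c"}} {
		if err := h.storedRows.AddRow(r); err != nil {
			if err := h.maybeSpillToDisk(err, mem); err != nil {
				panic(err)
			}
			// Retry the row that failed against the disk container.
			if err := h.storedRows.AddRow(r); err != nil {
				panic(err)
			}
		}
	}
	fmt.Printf("stored rows container: %T\n", h.storedRows)
}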

What confuses me is: will CockroachDB split the hash map (or hashContainer?) into several parts, keeping some in memory and the rest on disk? Or does it use a single hash map that lives either entirely in memory or entirely on disk?

I have found a related issue: https://github.com/cockroachdb/cockroach/issues/24580

There is another issue discussing hash join: https://github.com/cockroachdb/cockroach/issues/24582

Does version 2.1.2 still use the simple hash join algorithm, or 'grace hash join', or some other algorithm?

We use a distributed hash join. When spilling to disk, we use a simple schema in a temporary RocksDB instance to form an on-disk hash table.
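
Roughly, the on-disk layout can be pictured like the Go sketch below: each stored row is written under a key of the form <encoded join key>/<row id>, so all rows with the same join key are adjacent and a probe becomes a prefix scan. The encoding and names here are illustrative only, not the actual schema, and an ordered in-memory map stands in for the temporary RocksDB instance:

package main

import (
	"fmt"
	"sort"
	"strings"
)

// kvStore stands in for the temporary RocksDB instance: a key-value
// store that supports iteration over a key prefix in sorted order.
type kvStore map[string]string

func (s kvStore) prefixScan(prefix string) []string {
	var keys []string
	for k := range s {
		if strings.HasPrefix(k, prefix) {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys) // RocksDB iterates keys in sorted order
	var vals []string
	for _, k := range keys {
		vals = append(vals, s[k])
	}
	return vals
}

func main() {
	store := kvStore{}

	// Build side: write each stored row under joinkey/rowid. Zero-padding
	// keeps lexicographic key order consistent with numeric key order.
	orders := map[int]string{1: "orderdate=1995-01-02", 3: "orderdate=1995-02-20"}
	rowID := 0
	for orderKey, payload := range orders {
		store[fmt.Sprintf("%010d/%d", orderKey, rowID)] = payload
		rowID++
	}

	// Probe side: each lineitem row finds its matches with a prefix scan,
	// which plays the role of a hash-table lookup.
	for _, lOrderKey := range []int{1, 2, 3} {
		for _, m := range store.prefixScan(fmt.Sprintf("%010d/", lOrderKey)) {
			fmt.Printf("l_orderkey=%d joins stored row {%s}\n", lOrderKey, m)
		}
	}
}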