Join and OOM Killer

I have deployed a 3 node cluster by using the last version of CockroachDB (2.1.6), but I have some problems with the OOM Killer when I try to execute join operations without indexes. The scenario is:

  • 10 connections
  • AWS Load Balancer
  • Each node has 16 GiB RAM
  • Cockroach run with the suggested options --cache=.25 --max-sql-memory=.25
  • Three tables: TAB1 with 200 rows (total size 25.9 MiB), TAB2 with 300K rows (total size 38.3 GiB) and TAB3 with 10K rows (total size 1.3 GiB)

When I send 10 queries on the 10 connections, one node (or more) is killed by OOM killer. The queries perform an inner join between the three tables, and select only 2 rows from the final table. The queries have this form:
SELECT * FROM DEVICE INNER JOIN MEASURE ON TAB1.d_field1 = TAB2.m_field2 INNER JOIN TAB3 ON TAB1.d_field1 = TAB3.v_field2 WHERE m_field1 >= '0004_0000000' AND m_field1 <= '0004_0000002' AND v_field3 = 'true' LIMIT 300;
With an index on the field v_field3 the queries works very well, but without index the OOM Killer kill one or more nodes. I expected higher latency but not this behaviour. I don’t understand how to approach this problem. Have I to modify some configurations to have that type of query works correctly (without index)?


For clarification on where the different fields are coming from in the join and filter clauses, is this the correct rewrite of the query?

INNER JOIN TAB3 ON DEVICE.d_field1 = TAB3.v_field2
MEASURE.m_field1 >= ‘0004_0000000’
AND MEASURE.m_field1 <= ‘0004_0000002’
AND TAB3.v_field3 = ‘true’
LIMIT 300;

Yes, it is. Sorry if I write the query in a messy way.

Is this a heap table for a reason? Is it a staging area that gets ETLd for some data warehouse? The solution is to really have the index.

I wanted to test the different perfomance with and without index. But If the only way to make the query work is with index, it’s ok. I’m only a bit surprise that without index there is this behaviour. I expected, for example, that the intermediate results are temporaly spilled on disk, with the rise of the average latency.

Hey @Ramalaca94,

Could you provide the EXPLAIN for both the indexed and non-indexed queries?



Fist of all I want to thank you for the support.

This is the EXPLAIN of not indexed query.

This is the EXPLAIN of indexed query.

In these case validation_period = TAB3.
The non-indexed query works if I execute it from SQL shell, but it doesn’t work when 10 clients try to execute it on 10 connections. The indexed query, as I say before, works very well.

Hey @Ramalaca94,

No problem! Could you also send us over your DDL, you can email it to me here. Also, could you clarify, are you executing the two queries against the same tables, or are they two different sets of tables where one has an index and the other does not?