Weird benchmarks

Hi all,

I’ve been interested to cockroach for a while and I started playing with it recently.
I am now trying to see how it compares to my other SQL DBs for my usual use cases.

I started a github repo to compare insert, update, select limit1 and select by pk.

Here it is:

It is easily reproducible by simply launching the ./launch_all.sh in the subfolder:
https://github.com/vincentserpoul/playwithsql/tree/master/status/islatest
all you need is docker and go.

What am I doing wrong?

What’s wrong? Did you observe an error message or unexpected results? If so can you detail what you saw?

We can see you have a join here: https://github.com/vincentserpoul/playwithsql/blob/6f62aec3fec57f22de0f1a721ab890fac4033736/status/islatest/entityone.go#L164

Despite the fact our joins are not (yet) optimized this query is not too far from reasonable. A minor possible optimization is this:

        SELECT
            *
        FROM (SELECT e.entityone_id, e.time_created FROM entityone AS e) AS e
        INNER JOIN (SELECT es.entityone_id AS status_entityone_id, es.action_id, es.status_id, es.time_created  FROM entityone_status AS es WHERE es.is_latest = 1) AS es ON es.entityone_id = e.entityone_id
        WHERE 0 = 0

Mind that you may need the next beta to see my proposed tweak above make your query run faster.

Hi knz,

Sorry, I should have given more precise details instead of asking you to look into the repo.

I’m using a containerized cockroachdb on coreos, 8 cores, 16GB ram and SSD.

Here is the schema used:

CREATE TABLE IF NOT EXISTS entityone ( entityone_id BIGSERIAL NOT NULL, time_created DATE NOT NULL DEFAULT CURRENT_DATE, PRIMARY KEY (entityone_id) )
CREATE TABLE IF NOT EXISTS entityone_status ( entityone_id BIGSERIAL NOT NULL, action_id BIGINT NOT NULL DEFAULT 1, status_id INT NOT NULL DEFAULT 1, time_created DATE NOT NULL DEFAULT CURRENT_DATE, is_latest INT NULL DEFAULT 1, UNIQUE (is_latest, entityone_id), INDEX (status_id, is_latest), CONSTRAINT es_fk_e FOREIGN KEY (entityone_id) REFERENCES entityone (entityone_id), INDEX (entityone_id) )

Here is the query:
SELECT e.entityone_id, e.time_created, es.entityone_id as status_entityone_id, es.action_id, es.status_id, es.time_created as status_time_created FROM entityone e INNER JOIN entityone_status es ON es.entityone_id = e.entityone_id AND es.is_latest = 1 WHERE 0 = 0 AND e.entityone_id = $1

Here is the problem:
On the benchmark, inserting, updating is behind other DBs, which is fine and probably usable in production for now (and it will improve for sure).
The select is a lot more problematic. My bench can only do 3 selects in 60s where SQLite can do 500000!
Then order of magnitude is not at all the same here.

Is it something already known or am I doing smtg wrong? (same query is used on all DBs)

If it is known, will it be fixed (you mentioned and I already saw many times optimization of joins)? When should I test it again?

Hi vincent,

Given that a new beta has been released on December 1, you should try to give that one a go and see how it goes. I would be great if you could over time, publish your benchmark results using a github wiki page.

Thanks for your interest in CRDB!

The current JOIN implementation is very basic and runs in quadratic time in many cases, which means it can be slower than a mature implementation by large factors.

In this case we have the es.entityone_id = e.entityone_id AND es.is_latest = 1 ON condition. This currently leads to quadratic runtime. If the es.is_latest = 1 check would be moved to the WHERE clause, I believe that (with the latest beta) this would be linear instead of quadratic.

The other problem is that we don’t yet push down filters so e.entityone_id = $1 is evaluated after we scan and join the entire tables; the correct way will be to scan only the relevant rows from each table (via the primary key of entityone and the entityone_id index on in entityone_status table).

Of course, we plan to address all these cases before shipping version 1, but I don’t have a more specific timeline.

I have published it 6 days ago, here: https://github.com/vincentserpoul/playwithsql/tree/master/status/islatest

It’s pretty uly for now, but I’m planning to get it better.
I’m definitely interested to see it getting better. I love the concept :).

thanks for replying. I will follow closely on the improvements as, as I said above, I’m definitely thrilled by the concept.
I will also try to deepen my knowledge of the query engine over time.

We have recently done some improvements for JOINs (pushing down filters). Would be interesting to see what results you get with a build off the current master or with the next beta.