[Performance] same request, takes 4 min on CockroachDB, 70 ms on SQL Server

,

Hello everybody,

This is my first post. I’m Tech Lead at lemonway.com. We are considering CockroachDB to replace the SQL Server as our main Database.

The first impression is not too promising, but maybe I did something wrong…

Infrastructure:

CockroachDB 19.2.6 is installed in a cluster 3 nodes (Ubuntu)

  • 2vCPU each
  • 8GB RAM each
  • Docker swarm container

versus

SQL Server 2019 (Windows Server 2019)

  • 4vCPU
  • 16GB RAM

The database contains only 2 tables: wallet (1 Mi wallets) and transaction (15 Mi transactions):

Here our request

select * from transaction t
join wallet s on t.sender_id=s.id
join wallet r on t.receiver_id=r.id
limit 10;

We only want 10 first results (any result)

  • CockroachDB takes 3-5min
  • SQL Server takes 35ms

We will make the test more fair by adding RAM + CPU to the 3 nodes (so that each will have the same power as the SQL Server host machine). But evens though the gaps here is still very big…

Did we do something wrong? Or CockroachDB speed is really so far behind the SQL Server for this particular query?

What do you get run you run an EXPLAIN with that query?

Something you could try as well since you don’t care which 10 transactions you get is this:

select * from
(select * from transaction limit 10) t
join wallet s on t.sender_id=s.id
join wallet r on t.receiver_id=r.id

Hi @duongphuhiep,

Sorry to hear that you’re running into these issues. As @mrloud suggested, EXPLAIN will give us more insight into what’s happening here. I’d be especially interested in seeing the output of EXPLAIN (opt, verbose) on this query, since that will give us information about the statistics used by the optimizer.

I’d also be interested in whether running this query on CockroachDB 20.1 (just released today!) improves the performance at all. One of the new features added in 20.1 is support for limit hints, so it’s possible that the optimizer could use a limit hint with your query to improve performance slightly. Unfortunately we don’t yet push limits or limit hints through joins in most cases, so if my suspicions are correct you probably won’t see a large improvement. We have a couple of issues open to improve this functionality further: #48674 and #21171.

In the mean time, manually rewriting the query as suggested by @mrloud may be a good option to explore.

1 Like

hello, I used cockroach 19.2.6 and here is the explain which i’m not sure how to read

> explain select * from transaction t
            -> join wallet s on t.sender_id=s.id
            -> join wallet r on t.receiver_id=r.id
            -> limit 10;
        tree         |       field        |     description
+---------------------+--------------------+----------------------+
                    | distributed        | true
                    | vectorized         | false
limit               |                    |
│                  | count              | 10
└── hash-join      |                    |
        │             | type               | inner
        │             | equality           | (receiver_id) = (id)
        │             | right cols are key |
        ├── hash-join |                    |
        │    │        | type               | inner
        │    │        | equality           | (sender_id) = (id)
        │    │        | right cols are key |
        │    ├── scan |                    |
        │    │        | table              | transaction@primary
        │    │        | spans              | ALL
        │    └── scan |                    |
        │             | table              | wallet@primary
        │             | spans              | ALL
        └── scan      |                    |
                    | table              | wallet@primary
                    | spans              | ALL

@mrloud

ofcourse I can optimize the request as you said, but in this POC, I just want to know how well CockroachDB handle a basic jointure.

and I also intended to try out other more complex jointure / condition to compare with SQL server, before deciding to change the camp. I did expect that the distributed cockroachDB should be slower than the monolithic SQL Server, but not this much…

I have not tried my similar test query on 20.1 in which the optimizer may short circuit this query, I used 19.2.6 like you are. What the explain is telling you is that it’s taking the whole tables of transaction and wallet, joining them, and then limiting the results to 10.

My suggested change will quickly select from the transaction table limiting to 10 results and THEN do the joins against those 10 records and it’s much faster. My execution plan for what I have(in my example incomingshipment would be your transaction, tbldealer your wallet) that’s a similar type dataset as yours is thus:

tree            |field                |description             |
----------------|---------------------|------------------------|
                |distributed          |true                    |
                |vectorized           |false                   |
lookup-join     |                     |                        |
 │              |table                |tbldealer@primary       |
 │              |type                 |inner                   |
 │              |equality             |(destinationcode) = (id)|
 │              |equality cols are key|                        |
 │              |parallel             |                        |
 └── lookup-join|                     |                        |
      │         |table                |tbldealer@primary       |
      │         |type                 |inner                   |
      │         |equality             |(shiptocode) = (id)     |
      │         |equality cols are key|                        |
      │         |parallel             |                        |
      └── scan  |                     |                        |
                |table                |incomingshipment@primary|
                |spans                |ALL                     |
                |limit                |10                      |

I have a 9 node cluster running in 3 different datacenters(all in the US) and this returns within 31ms.

Thanks @duongphuhiep and @mrloud for posting these plans.

It’s great that the rewritten query may serve as a workaround, but @duongphuhiep is right that the optimizer should be able to choose the correct plan without manual rewriting. We’ve discussed this issue a bit internally, and we think it may be a simple bug that can be fixed and back ported to the 20.1 branch: https://github.com/cockroachdb/cockroach/issues/48791.

We’ll work on fixing this issue, and I suspect that if you retry this query when we release 20.1.1 with the fix in a couple of weeks, you should see much better performance.

thanks @becca. I’ll look forward for the 20.1.1…

Other question:

I usually observe the CPU of all my 3 nodes go up to 100% for heavy requests…

  1. I suppose that it is the work of distributed SQL execution. Does it mean that if I throw in more nodes to the cluster (horizontally scale) then the query execution will speed up?
    or is it depend on the type of queries? => what kind of queries will most beneficial from horizontal scaling then?

  2. if I add more RAM / CPU to only one or 2 nodes in a 10 node cluster (partial vertical scaling) then did the execution will go up or it will be random (for example the query will get faster only by chance because they are executed by my 2 betters nodes otherwise nothing will change if the query execution fall to 8 other nodes…)?

Hey @duongphuhiep, I’m curious what you mean by “heavy requests”, is it analytical read-only queries? Is it writes of many rows? I’m working on SQL execution team, and in my experience CockroachDB usually hits RAM constraints sooner and not the CPU limits. It might be the case that the cluster is not provisioned with enough CPU, but I’d be interested to hear some more details from you.

  1. Indeed, since we observe that the nodes are at 100% CPU utilization, it might mean that distributed SQL engine is actually distributing work well, so adding more nodes would be very helpful. And your second point is also correct - not all queries can be distributed efficiently, but the most common can, and we try to do that as best as we can. For example, aggregations and hash joins are distributed well, but window functions can be distributed only if you have PARTITION BY clause in the window definition.

  2. I think the partial vertical scaling (without other changes) will provide only random speed ups. There are other factors in play (like on which nodes does the data live, which node is the gateway for the connection, etc) when deciding on how to distribute the query execution, and we apply various heuristics in that process, but I don’t think we have any that will look at the beefiness of the machines and would load them more or less based on that.

thanks @yuzefovich

“heavy request” means a request which will takes time and we should try to avoid them…

  • a request which require table full-scan (for example the request on my original message)
  • or a request which will return 10 million lines, so most of time will be spent on constructing the response (which will require CPU-power)

Now I wish to make a feature request: But I’m not sure where to post it (or maybe it is already here but I didn’t know)…

in RavenDb there is a notion of auto-index. It analyze the “where” in all requests and creates missing indexes on-fly… (+remove overlap indexes etc…), it can also transfer what it learn on sandbox datatabse to the production database… (so that the production database got all the optimized indexes definitions on the D-day, and continue to tunne further from it)

CockroachDb doesn’t need to meet this level of automation… But

  • I wish that CockroachDB could also study all the “where” requests it saw, and give us some hints (or proposals) about what kind of indexes we should add to (or remove from) our database…
  • Or if it was too complex then cockroachDB should just give us the stas of all the kind of “Where” it executed so that we can study and then deciding if it is worth to add more indexs or to optimize these “Where” clauses…

In a general maner, I wish that CockroachDB can give out some hints or warning about performance tunning depend on the way that the database is used and configured. But it is vague so I take the “auto-index” features as a more specific example.

Thanks for your reading.

Oh I forgot to ask

Is there somewhere a “strict” mode which requires that all the “Where” clauses in the request should meet an index, to avoid stupid “Where” such as:

select * from person where years(now() - Birthday) > 18

it should crash on “strict” mode and require us to fix with, for example:

select * from person where Birthday < 2002

is something like this exists or it will be a feature request again? (same category with the previous one)

thanks…

Hi! I’m a PM at Cockroach Labs and we are tracking ideas like that in this github issue https://github.com/cockroachdb/cockroach/issues/41259.

I haven’t seen the RavenDB feature before so I will take a look at that issue. Thanks for sharing this idea on our forum!