Is CockroachDB fast on SELECTs with aggregate functions on millions of rows?

Hello there!

Need some advice…

Trying to decide if CockroachDB is the right foundation for us. It’s for an app (a bit like a hit tracker plus analytics) where the traffic and data flowing in from clients are unpredictable.

A client’s website could get millions of hits a day.

Plus, I want to be able to query the DB and aggregate data and stats for the client in real time.

Is CockroachDB fast at select queries with window functions on HUGE datasets?

For example: I’m thinking of ingesting pageview events one per row into a table, with roughly 30 columns per row. Is CockroachDB fast at aggregation queries using aggregate functions like SUM() on huge datasets with many millions of rows (provided indexes exist on the queried columns)?

Example:

“Get the total count of all visitors to a site who performed more than 5 clicks between two dates”

===

-- Visitors of site 12345 with more than 5 clicks between the two dates; the outer query counts them.
SELECT count(*) AS aggregate
FROM (
    SELECT 1
    FROM events
    WHERE site_id = 12345 AND datenr BETWEEN 20210301 AND 20210515
    GROUP BY visitor_id
    HAVING SUM(clicks) > 5
) AS x;

===
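
(For reference, here’s a minimal sketch of what I have in mind for the events table and a supporting index in CockroachDB. Only the columns used by the query above are included; the real table would have ~30 columns, and the types, table name, and index layout here are just assumptions on my side.)

===

-- Hypothetical minimal schema: only the columns referenced by the example query.
CREATE TABLE events (
    id         UUID NOT NULL DEFAULT gen_random_uuid() PRIMARY KEY,
    site_id    INT NOT NULL,
    visitor_id UUID NOT NULL,
    datenr     INT NOT NULL,  -- date encoded as YYYYMMDD, as in the example query
    clicks     INT NOT NULL DEFAULT 0
);

-- Covering secondary index so the example query can be served from the index
-- alone: filter on (site_id, datenr), then group on the stored visitor_id and
-- sum the stored clicks without touching the primary index.
CREATE INDEX events_site_datenr_idx
    ON events (site_id, datenr)
    STORING (visitor_id, clicks);

===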

I’ve read that DBs like SingleStore (previously MemSQL) can be very fast at such queries, so there’s no need to aggregate data in advance, which is convenient. A client can just get different stats from their data in real time with the right query, even if they have millions of records in the DB.

Can CockroachDB be used the same way for fast SELECT queries? Or is CockroachDB more about scaling horizontally with data in a stable way, and less about real-time aggregation and analytics?

Thanks a lot!

It’s likely we won’t be nearly as fast as SingleStore for such queries. Analytical queries are not something we’ve focused on heavily, although performance on large queries has been improving steadily.
