Percentile aggregates/window functions

sql
(Anthony Hogg) #1

Hi! I’m a newcomer to CRDB, and loving it so far.

The application I’m working on would benefit from calculating percentile values as group aggregates or window functions.

I understand that PG’s percentile_disc is not supported (yet), does this mean that these operations aren’t available at all, or is there some workaround?

(How are the latency and other metric percentiles calculated in the cockroach web ui?)

Thanks for your insights!

(Ron Arévalo) #2

Hey @ynohat,

Thanks for reaching out! When your asking:

Are you referring specifically to percentile_disc or window functions in general? We do support window function, and you could read more about them here.

Thanks,

Ron

(Anthony Hogg) #3

Hi @ronarev,

I saw that window functions were available, but nothing stood out to do the equivalent of percentile_disc. I was reaching out to know if people had alternate methods for doing the same in SQL with CRDB, or if I’d missed some bit in the docs!

Thanks,

Anthony

(Ron Arévalo) #4

Hey @ynohat,

I see, thank you for clarifying, at the moment, we do not have a workaround for the missing percentile_disc function. We are always striving toward parity with Postgres so I’ll bring this up to our product team to see if we have this in a roadmap or if we will add it to the roadmap. Would you mind sharing your use case? I know you mentioned its part of an app that you’re working on, but how would this benefit your app?

Thanks,

Ron

(Anthony Hogg) #5

Hi @ronarev,

Please forgive the latency, I was on holiday!

My use case involves low volume timeseries data, more reads than writes. The application collects metrics at a regular interval. Having collected multiple samples over a given time range, percentile_disc would allow me to extract useful statistics in one query (median/50th, 25th, 75th percentile).

Currently, I need to do this in the application which precludes streaming the results out (all must be retrieved before calculating the aggregates). This incurs RAM/CPU overhead, along with specialized logic that I’d rather offload to the DB layer.

Best regards,

1 Like