I have recently been experimenting with migrating one of our workloads from Cassandra to CRDB and have run into a confusing issue with the metrics in the Admin UI.
TL;DR: all requests to /ts/query return a 504 Gateway Timeout after 30 seconds (context deadline exceeded), so the Admin UI metric graphs never load. The logs don’t seem to say anything useful about it. Is there some way to reset the timeseries data? This happens even if I try to load just one metric in the custom time series chart.
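In case it helps anyone reproduce this outside the browser, here is a minimal sketch of timing a single request to /ts/query with Python's standard library. The JSON field names (start_nanos, end_nanos, queries, name) and the example metric cr.node.sql.select.count are my assumptions about what the endpoint accepts, not something confirmed above, and the helper names build_ts_query and time_ts_query are made up for illustration. It also assumes the Admin UI port is reachable without a login session; a secure cluster would need TLS and an auth cookie on top of this.

```python
import json
import time
import urllib.error
import urllib.request


def build_ts_query(metric, window_seconds=600, now_nanos=None):
    """Build a JSON body for a single-metric query (field names are assumed)."""
    end = now_nanos if now_nanos is not None else time.time_ns()
    start = end - window_seconds * 1_000_000_000
    return {
        "start_nanos": start,
        "end_nanos": end,
        "queries": [{"name": metric}],
    }


def time_ts_query(base_url, metric, timeout=35.0):
    """POST one query to <base_url>/ts/query; return (HTTP status, seconds)."""
    body = json.dumps(build_ts_query(metric)).encode("utf-8")
    req = urllib.request.Request(
        base_url.rstrip("/") + "/ts/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    started = time.monotonic()
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as err:
        # A 504 from the gateway is raised as HTTPError; keep its status code.
        status = err.code
    return status, time.monotonic() - started
```

Calling time_ts_query("http://localhost:8080", "cr.node.sql.select.count") against each node in turn (host and port are placeholders) should show whether every node hits the 30-second deadline or only some of them. The client timeout is set slightly above 30 seconds so that the server-side deadline is what gets observed.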
Things I’ve tried that didn’t work:
- Rolling restart of the entire cluster
- Upgrading from 20.1.1 (where I started) to 20.1.3
- Disabling and then re-enabling timeseries data per https://www.cockroachlabs.com/docs/v19.2/operational-faqs.html#can-i-reduce-or-disable-the-storage-of-timeseries-data (including setting resolution to 0)
My cluster seems healthy otherwise; I can insert and query normal data.
This problem came up when I started adding many nodes. Initially I tested on a single node (which happened to be a larger node with quite a few SSDs), then I quickly added 20 smaller nodes that have HDDs and let the cluster rebalance. One of those nodes had a disk error and died, so I had to decommission it. According to the Admin UI, the decommission completed fine.
Any help would be appreciated.