Metrics are not loading in the Admin UI dashboard after adding new nodes and decommissioning old ones

We host our cluster in GCP with two data centers, 5 nodes per DC, and a replication factor of 6, with a constraint of 3 replicas in each region.
We started hitting OOM errors, so we decided to increase the RAM and changed the machine type of the nodes from 4 vCPU / 26 GB RAM to 8 vCPU / 54 GB RAM.

We added 10 new nodes and then decommissioned the 10 old nodes so that all nodes in the cluster are uniform,
but then the metrics stopped loading. The Overview dashboard loads fine.

When we expand the time range to the last week, we can see the point at which the metrics stopped loading.

Hi @krishnaswathi,

I noticed your screenshot indicates you have 12 total nodes.

Could you provide me with a screenshot of the overview page and the network report?

The links would be http://<adminurl>/#/overview/list and http://<adminurl>/#/reports/network respectively.


Hey @krishnaswathi,

I suspect the metric endpoints aren’t reporting the data to the time series.

Try taking a look at the bottom of http://<adminurl>/#/databases/tables.

I can suggest performing a rolling restart or even upgrading to a newer version of CockroachDB if that’s possible.

The documentation can be found here:

The idea behind a rolling restart would be the same as the upgrade, just without installing the new binary.
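To make the order of operations concrete, here is a minimal Python sketch of a rolling restart loop. It is not an official procedure: `stop_node`, `start_node`, and `is_ready` are hypothetical callables you would supply for your own deployment (for example, wrappers around your service manager, with `is_ready` polling the node's `/health?ready=1` endpoint). The only point is that each node is restarted and confirmed ready before the next one is touched.

```python
import time
from typing import Callable, Iterable, List


def rolling_restart(
    nodes: Iterable[str],
    stop_node: Callable[[str], None],    # hypothetical: stops one node
    start_node: Callable[[str], None],   # hypothetical: starts one node
    is_ready: Callable[[str], bool],     # hypothetical: True once the node is healthy
    poll_seconds: float = 5.0,
    timeout_seconds: float = 600.0,
) -> List[str]:
    """Restart nodes one at a time, waiting for each to become ready."""
    restarted: List[str] = []
    for node in nodes:
        stop_node(node)
        start_node(node)
        deadline = time.monotonic() + timeout_seconds
        # Do not move on until this node reports ready again.
        while not is_ready(node):
            if time.monotonic() >= deadline:
                raise TimeoutError(
                    f"{node} did not become ready; stopping the rollout"
                )
            time.sleep(poll_seconds)
        restarted.append(node)
    return restarted
```

Raising on timeout is deliberate: if one node fails to come back, the loop stops instead of taking down another node.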

Let me know if that works.


@mattvardi sorry for the delay. We did the rolling restart on the nodes. Only a few metrics on the Storage dashboard appeared for a while, and then they disappeared again.

When you performed the restart, were the storage metrics the only graphs that got populated?

When did they go back to showing nothing?

Did you change any cluster settings or decommission any nodes when you noticed the change?

No other change was made except the rolling restart, and yes, only 2 of the metrics populated on the Storage dashboard; no other metrics loaded.
Those metrics also disappeared after a while. No decommission was done.

Hey @krishnaswathi,

Thanks for the info.

What version of CRDB are you running?

Are you getting any information from the endpoints on <adminURL>/#/debug/ ?

Can you access the admin UI from a different node’s perspective and let me know if metrics are available there?
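If clicking through each node's UI is tedious, a rough Python sketch like the one below can survey the nodes in one go. It assumes each node serves its Prometheus-format metrics at `/_status/vars` on the Admin UI address and that the endpoint is reachable without extra TLS or auth setup. The function names and the base URLs you pass in are placeholders for illustration. Note that this only shows whether a node is exporting metrics at all; it does not prove those metrics are being written to the time series data that the graphs read.

```python
import urllib.request
from typing import Dict, List


def count_samples(exposition: str) -> int:
    """Count sample lines in Prometheus text output, skipping comments and blanks."""
    return sum(
        1
        for line in exposition.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    )


def fetch_vars(base_url: str, timeout: float = 10.0) -> str:
    """Fetch the raw metrics text from one node (assumed endpoint: /_status/vars)."""
    with urllib.request.urlopen(f"{base_url}/_status/vars", timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def survey(base_urls: List[str]) -> Dict[str, str]:
    """Report, per node, how many metric samples it exports or why it failed."""
    results: Dict[str, str] = {}
    for url in base_urls:
        try:
            results[url] = f"{count_samples(fetch_vars(url))} samples"
        except OSError as exc:  # urllib's URLError and HTTPError are OSError subclasses
            results[url] = f"unreachable: {exc}"
    return results
```

Calling `survey([...])` with one base URL per node gives a quick view of which nodes respond; a node that is unreachable or reports zero samples would be the first place to look.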

As we had this discussion a while back, is your cluster otherwise healthy? If you could send me another screenshot of the network page that would be good.