How to profile memory usage

deployment

(Harry Yang) #1

Hi,

We have a 15-node cluster running on GKE (Google Kubernetes Engine), with replicas distributed evenly across the nodes.

In our deployment, we’ve set --cache 25% and --max-sql-memory 25% for memory allocation, but some nodes keep getting OOM-killed, meaning they hit the memory limits defined in the StatefulSet YAML file.
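
For context, the relevant fragment of our StatefulSet looks roughly like this (the memory limit value and image tag are illustrative, and join/cert flags are omitted):

# Sketch of the setup described above; values are illustrative.
containers:
- name: cockroachdb
  image: cockroachdb/cockroach
  resources:
    limits:
      memory: "8Gi"            # the per-pod limit that the OOM kills run into
  command:
    - "/bin/bash"
    - "-ecx"
    - exec /cockroach/cockroach start --logtostderr --cache 25% --max-sql-memory 25%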

We are scraping the Prometheus endpoint on each CockroachDB node. What would be a good set of metrics to track to understand memory usage?

In the logs, we see some runtime memory stats like this:

[n1] runtime stats: 5.7 GiB RSS, 512 goroutines, 301 MiB/271 MiB/768 MiB GO alloc/idle/total, 4.0 GiB/5.1 GiB CGO alloc/total, 720.5 CGO/sec, 16.1/4.8 %(u/s)time, 0.0 %gc (0x), 1.5 MiB/1.3 MiB (r/w)net
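
For reference, we scrape the nodes with a config along these lines (target addresses are placeholders); the metric names in the comments are our guesses at the counterparts of the runtime-stats fields above, so please correct us if there are better ones to watch:

# Illustrative Prometheus scrape config; target addresses are placeholders.
scrape_configs:
  - job_name: 'cockroachdb'
    metrics_path: '/_status/vars'   # CockroachDB's Prometheus endpoint
    # Memory-related series that appear to map to the runtime-stats fields:
    #   sys_rss            -> RSS
    #   sys_go_allocbytes  -> GO alloc
    #   sys_go_totalbytes  -> GO total
    #   sys_cgo_allocbytes -> CGO alloc
    #   sys_cgo_totalbytes -> CGO total
    static_configs:
      - targets:
          - 'cockroachdb-0.cockroachdb:8080'
          - 'cockroachdb-1.cockroachdb:8080'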

Thanks for any recommendations you may have!


(Ron Arévalo) #2

Hey @harryy,

Could you let me know a bit more about your cluster? What machine type are you running? If you’re able, can you share your DDL/DML with me by posting it here, or email it to me at ron@cockroachlabs.com. Also, what does your workload look like?

Thanks,

Ron


(Bob Vawter) #3

Kubernetes doesn’t “fake out” the amount of system memory visible to apps running inside containers based on resource limits; it merely enforces those limits. The --cache and --max-sql-memory percentages are therefore computed relative to the total memory of the underlying node, so on a node with much more physical memory than the container limit they can add up to a budget well above that limit. Without your k8s configuration, it’s hard to know whether this is what’s happening.

You can, however, use the “downward API” to expose the resource limits to the pod and provide CRDB with specific values:
https://kubernetes.io/docs/tasks/inject-data-application/downward-api-volume-expose-pod-information/

An example of exposing resource limits can be found here:
https://docs.openshift.com/container-platform/3.9/dev_guide/application_memory_sizing.html#finding-memory-request-limit-within-pod
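
Roughly speaking, that ends up looking like the sketch below (the 8Gi limit and the divide-by-four split are placeholders; size them to your own pods):

# Sketch: expose the container memory limit via the downward API and pass
# explicit byte values to CockroachDB instead of node-relative percentages.
containers:
- name: cockroachdb
  image: cockroachdb/cockroach
  resources:
    requests:
      memory: "8Gi"
    limits:
      memory: "8Gi"
  env:
    - name: MEMORY_LIMIT            # container memory limit, in bytes
      valueFrom:
        resourceFieldRef:
          containerName: cockroachdb
          resource: limits.memory
  command:
    - "/bin/bash"
    - "-ecx"
    - exec /cockroach/cockroach start --logtostderr
      --cache $((MEMORY_LIMIT / 4)) --max-sql-memory $((MEMORY_LIMIT / 4))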