Volume node affinity conflict kubernetes gcp

@ronarev, I hope you have checked the logs. I am awaiting your reply.

Hi @vishal,

The logs you sent over aren’t the Kubernetes system logs, but they still indicate that there isn’t sufficient memory:

available memory from cgroups (8.0 EiB) exceeds system memory 1.8 GiB, using system memory
I191014 10:13:45.959806 1 server/config.go:386 system total memory: 1.8 GiB

If this is not the case, you’ll need to find out how much memory the pod thinks it has. Could you provide your resource limits? They should look something like this:

resources:
  requests:
    cpu: 50m
    memory: 50Mi
  limits:
    cpu: 100m
    memory: 100Mi

@ronarev We don’t set resource limits manually on the CockroachDB pod. However, there is a memory-related flag (--max-sql-memory 25%) in the start command of the CockroachDB StatefulSet YAML:

- "exec /cockroach/cockroach start --logtostderr --certs-dir /cockroach/cockroach-certs --advertise-host $(hostname -f) --http-addr 0.0.0.0 --join cockroachdb-0.cockroachdb,cockroachdb-1.cockroachdb,cockroachdb-2.cockroachdb --cache 25% --max-sql-memory 25%"

We are using the default values from the CockroachDB StatefulSet YAML. Here is the YAML file:

https://raw.githubusercontent.com/cockroachdb/cockroach/master/cloud/kubernetes/cockroachdb-statefulset-secure.yaml

Hi @vishal,

Your pods are underprovisioned: you have 1.8 GiB of system memory and you’re allocating 25% of that to --max-sql-memory. Increasing a node’s SQL memory size increases the number of simultaneous client connections it allows (the 128 MiB default allows a maximum of 6200 simultaneous connections) as well as the node’s capacity for in-memory processing of rows when using ORDER BY, GROUP BY, DISTINCT, joins, and window functions.

You are giving --max-sql-memory about 0.45 GiB, which is roughly 460 MiB, or about 3.6 times the 128 MiB default.
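For reference, here is a rough sketch of that arithmetic. It assumes both percentages in your start command (--cache 25% and --max-sql-memory 25%) are taken from the 1.8 GiB of system memory reported in the log. The function name memory_budget is only illustrative and is not part of any CockroachDB tooling.

```python
# Rough memory budget for one node, assuming percentage flags are
# applied to the total system memory reported in the log (1.8 GiB).
MIB_PER_GIB = 1024
DEFAULT_SQL_MEMORY_MIB = 128  # default quoted earlier in this thread


def memory_budget(system_gib, cache_fraction, sql_fraction):
    """Return the cache, SQL, and remaining memory in MiB (hypothetical helper)."""
    total_mib = system_gib * MIB_PER_GIB
    cache_mib = total_mib * cache_fraction
    sql_mib = total_mib * sql_fraction
    return {
        "total_mib": total_mib,
        "cache_mib": cache_mib,
        "sql_mib": sql_mib,
        # Whatever is left has to cover everything else on the node.
        "remaining_mib": total_mib - cache_mib - sql_mib,
    }


budget = memory_budget(system_gib=1.8, cache_fraction=0.25, sql_fraction=0.25)
ratio_to_default = budget["sql_mib"] / DEFAULT_SQL_MEMORY_MIB

print(f"SQL memory:  {budget['sql_mib']:.1f} MiB")
print(f"Cache:       {budget['cache_mib']:.1f} MiB")
print(f"Remaining:   {budget['remaining_mib']:.1f} MiB")
print(f"SQL memory is {ratio_to_default:.1f}x the default")
```

Under those assumptions, the cache and SQL memory together reserve half of the 1.8 GiB, so only about 0.9 GiB is left for everything else running on the node.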

Our suggested minimum for total system memory is 2 GB of RAM, and ideally you want to aim for a ratio of 2 GB per vCPU.

This is most certainly why the cluster is crashing.