We are seeing consistent throttling in the sleeper container #2161

Open

trouphaz opened this issue Dec 16, 2024 · 2 comments

@trouphaz

What happened?
The sleeper container in the pinniped-concierge-kube-cert-agent pod is showing consistent, regular throttling because its CPU limit is set so low. This does not appear to affect the performance of the container itself, since it is just running a sleep loop, but it is triggering our monitoring alerts for throttling of critical platform workloads.

What did you expect to happen?
It would be good to either make these container resource requests and limits configurable by the end user, or set the CPU limit high enough that the workload isn't throttled.

What is the simplest way to reproduce this behavior?
Just look at your workload throttling metrics. Yours is likely throttling too.
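For example, one way to spot-check the throttling counters without a dashboard is to read the cAdvisor metrics straight from a node's kubelet. This is only a sketch: `<node-name>` is a placeholder, and the pod name pattern is an assumption about your install.

```sh
# Read cAdvisor's CFS throttling counters from one node's kubelet.
# Replace <node-name>; the pod name pattern below is an assumption.
kubectl get --raw "/api/v1/nodes/<node-name>/proxy/metrics/cadvisor" \
  | grep container_cpu_cfs_throttled_periods_total \
  | grep pinniped-concierge-kube-cert-agent
```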

In what environment did you see this bug?

  • Pinniped client version: v0.32.0
  • Pinniped container image (if using a public container image): pinniped-server:v0.32.0, but hosted in our own image registry.
  • Pinniped configuration (what IDP(s) are you using? what downstream credential minting mechanisms are you using?):
  • Kubernetes version (use kubectl version): v1.28.2
  • Kubernetes installer & version (e.g., kubeadm version): v1.26.15
  • Cloud provider or hardware configuration: Spectrocloud
  • OS (e.g: cat /etc/os-release):
  • Kernel (e.g. uname -a):
  • Others:

What else is there to know about this bug?

@trouphaz
Author

I tried uploading an image of my Grafana dashboard showing the throttling, but the image is not appearing.

@cfryanr
Member

cfryanr commented Dec 16, 2024

Hi @trouphaz, thanks for reporting this!

The cpu request and limit for that pod are both hardcoded here. As you've noticed, the cpu request is 0 and the limit is a very low 20m.
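For reference, you can confirm what is set on the running agent pod with something like the following. The namespace is an assumption; adjust it to wherever your Concierge is installed.

```sh
# Print the resources set on the kube-cert-agent pod's container
# (namespace is an assumption; adjust to your Concierge's namespace).
kubectl get pods -n pinniped-concierge -o name | grep kube-cert-agent \
  | xargs -I{} kubectl get {} -n pinniped-concierge \
      -o jsonpath='{.spec.containers[0].resources}{"\n"}'
```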

What do your metrics show as the actual CPU usage of the pod?
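If you have metrics-server installed, a quick spot check could look like this (same namespace assumption as above):

```sh
# Spot-check actual CPU usage per container (requires metrics-server).
kubectl top pod -n pinniped-concierge --containers | grep kube-cert-agent
```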

That pod is typically running sleep, which should consume very little CPU. In my experiments, it's usually around 0m-2m. Occasionally the Concierge pods automatically wake up and use the Kubernetes Exec API to exec into the kube cert agent pod to run the pinniped-concierge-kube-cert-agent print command. This command should also use very little CPU, but more than sleeping.

The only way that I can get the pod's CPU metric to rise close to 20m is to manually run kubectl exec to invoke the print command in that pod a few hundred times within a short period. That would not be expected in normal operation.
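A rough sketch of that load test, in case it's useful. The namespace is an assumption, and this deliberately hammers the agent pod, so don't run it against a cluster you care about.

```sh
# Repeatedly exec the print command into the agent pod to drive up its CPU usage.
NS=pinniped-concierge   # assumption: your Concierge's namespace
POD=$(kubectl get pods -n "$NS" -o name | grep kube-cert-agent | head -n 1)
for i in $(seq 1 300); do
  kubectl exec -n "$NS" "$POD" -- pinniped-concierge-kube-cert-agent print > /dev/null
done
```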

Looking at the controller which will exec into the pod, I think it will run again whenever one of the following changes, or every ~3 minutes when nothing is changing:

  • The Kubernetes kube-controller-manager pod.
  • The cluster-info configmap in the kube-public namespace.
  • The CredentialIssuer.

I wonder if this is happening more often than expected on your cluster? Do you have Kubernetes audit logging enabled? Can you see how often that Exec API call is being made by the service account of the pods in the Concierge's namespace? Or if you don't have audit logging enabled, could you perhaps turn up the Concierge's log level to info and then see how often the word kubecertagent appears in your Concierge's pod logs?
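For the log-based approach, something along these lines should give a rough count. The deployment name and namespace are assumptions about your install, and it assumes the log level is already turned up to info.

```sh
# Count kubecertagent controller log lines from the last hour
# (reads from one Concierge pod; names are assumptions).
kubectl logs -n pinniped-concierge deployment/pinniped-concierge --since=1h \
  | grep -c kubecertagent
```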
