We are seeing consistent throttling in the sleeper container #2161

Open

trouphaz opened this issue Dec 16, 2024 · 2 comments

@trouphaz

What happened?
The sleeper container in the pinniped-concierge-kube-cert-agent pod is showing consistent, regular throttling because its CPU limit is set so low. This does not appear to affect the performance of the container itself, since it is just running a sleep loop, but it is triggering our monitoring alerts for throttling of critical platform workloads.

What did you expect to happen?
It would be good to either make these container resource requests and limits configurable by the end user, or set the CPU limit high enough that the workload isn't throttled.

What is the simplest way to reproduce this behavior?
Just look at your workload throttling metrics. Yours is likely throttling too.
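For example, one way to spot-check the throttling counters without a dashboard is to read the cAdvisor metrics straight from a node's kubelet. This is only a sketch: `<node-name>` is a placeholder, and the pod name pattern is an assumption about your install.

```sh
# Read cAdvisor's CFS throttling counters from one node's kubelet.
# Replace <node-name>; the pod name pattern below is an assumption.
kubectl get --raw "/api/v1/nodes/<node-name>/proxy/metrics/cadvisor" \
  | grep container_cpu_cfs_throttled_periods_total \
  | grep pinniped-concierge-kube-cert-agent
```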

In what environment did you see this bug?

  • Pinniped client version: v0.32.0
  • Pinniped container image (if using a public container image): pinniped-server:v0.32.0, but hosted in our own image registry.
  • Pinniped configuration (what IDP(s) are you using? what downstream credential minting mechanisms are you using?):
  • Kubernetes version (use kubectl version): v1.28.2
  • Kubernetes installer & version (e.g., kubeadm version): v1.26.15
  • Cloud provider or hardware configuration: Spectrocloud
  • OS (e.g: cat /etc/os-release):
  • Kernel (e.g. uname -a):
  • Others:

What else is there to know about this bug?

@trouphaz
Author

I tried uploading an image of my Grafana dashboard showing the throttling, but the image is not appearing.

@cfryanr
Member

cfryanr commented Dec 16, 2024

Hi @trouphaz, thanks for reporting this!

The cpu request and limit for that pod are both hardcoded here. As you've noticed, the cpu request is 0 and the limit is a very low 20m.
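For reference, you can confirm what is set on the running agent pod with something like the following. The namespace is an assumption; adjust it to wherever your Concierge is installed.

```sh
# Print the resources set on the kube-cert-agent pod's container
# (namespace is an assumption; adjust to your Concierge's namespace).
kubectl get pods -n pinniped-concierge -o name | grep kube-cert-agent \
  | xargs -I{} kubectl get {} -n pinniped-concierge \
      -o jsonpath='{.spec.containers[0].resources}{"\n"}'
```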

What do your metrics show as the actual CPU usage of the pod?
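If you have metrics-server installed, a quick spot check could look like this (same namespace assumption as above):

```sh
# Spot-check actual CPU usage per container (requires metrics-server).
kubectl top pod -n pinniped-concierge --containers | grep kube-cert-agent
```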

That pod is typically running sleep, which should consume very little CPU. In my experiments, it's usually around 0m-2m. Occasionally the Concierge pods automatically wake up and use the Kubernetes Exec API to exec into the kube cert agent pod to run the pinniped-concierge-kube-cert-agent print command. This command should also use very little CPU, but more than sleeping.

The only way that I can get the pod's CPU metric to rise close to 20m is to manually run kubectl exec to invoke the print command in that pod a few hundred times within a short period. That would not be expected in normal operation.
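A rough sketch of that load test, in case it's useful. The namespace is an assumption, and this deliberately hammers the agent pod, so don't run it against a cluster you care about.

```sh
# Repeatedly exec the print command into the agent pod to drive up its CPU usage.
NS=pinniped-concierge   # assumption: your Concierge's namespace
POD=$(kubectl get pods -n "$NS" -o name | grep kube-cert-agent | head -n 1)
for i in $(seq 1 300); do
  kubectl exec -n "$NS" "$POD" -- pinniped-concierge-kube-cert-agent print > /dev/null
done
```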

Looking at the controller which will exec into the pod, I think it will run again whenever one of the following changes, or every ~3 minutes when nothing is changing:

  • The Kubernetes kube-controller-manager pod.
  • The cluster-info configmap in the kube-public namespace.
  • The CredentialIssuer.

I wonder if this is happening more often than expected on your cluster? Do you have Kubernetes audit logging enabled? Can you see how often that Exec API call is being made by the service account of the pods in the Concierge's namespace? Or if you don't have audit logging enabled, could you perhaps turn up the Concierge's log level to info and then see how often the word kubecertagent appears in your Concierge's pod logs?
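For the log-based approach, something along these lines should give a rough count. The deployment name and namespace are assumptions about your install, and it assumes the log level is already turned up to info.

```sh
# Count kubecertagent controller log lines from the last hour
# (reads from one Concierge pod; names are assumptions).
kubectl logs -n pinniped-concierge deployment/pinniped-concierge --since=1h \
  | grep -c kubecertagent
```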
