-
Notifications
You must be signed in to change notification settings - Fork 486
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(prometheus.operator): retry GetInformer
when running prometheus operator components
#6415
base: main
Are you sure you want to change the base?
Conversation
return err | ||
} | ||
|
||
opts.Mapper, err = apiutil.NewDynamicRESTMapper(restConfig, opts.HTTPClient) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change in mapper is required as the dynamic rest mapper lazy loads mappings. Hence, we can retry calls to GetInformer
below rather than just failing when calling cache.New
here. This is also the default going forward so rather than having to redo this work/risk introducing a regression (if I were to retry here instead), I have implemented this so that it will continue to work when upgrading controller-runtime
.
@@ -266,6 +275,20 @@ func (c *crdManager) runInformers(restConfig *rest.Config, ctx context.Context) | |||
if ls != labels.Nothing() { | |||
opts.DefaultLabelSelector = ls | |||
} | |||
|
|||
// TODO: Remove custom opts.Mapper when sigs.k8s.io/controller-runtime >= 0.17.0 as `NewDynamicRESTMapper` is the default in that version |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See kubernetes-sigs/controller-runtime#2611 and https://github.com/kubernetes-sigs/controller-runtime/releases/tag/v0.17.0. FWIW, I had a quick go (sorry, not sorry) at upgrading this but ran into also sorts of dependency issue due to the overrides in this repo (e.g. here).
@@ -232,7 +232,7 @@ require ( | |||
k8s.io/component-base v0.28.1 | |||
k8s.io/klog/v2 v2.100.1 | |||
k8s.io/utils v0.0.0-20230726121419-3b25d923346b | |||
sigs.k8s.io/controller-runtime v0.16.2 | |||
sigs.k8s.io/controller-runtime v0.16.2 // TODO: Remove custom rest mapper from component/prometheus/operator/common/crdmanager.go when upgrading past v0.17.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the comment meant to be in the go.mod
? 👀
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I intentionally put it here so that it will be visible in the diff when someone does upgrade this dependency
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, thanks for clarifying!
|
||
// TODO: Remove custom opts.Mapper when sigs.k8s.io/controller-runtime >= 0.17.0 as `NewDynamicRESTMapper` is the default in that version | ||
var err error | ||
opts.HTTPClient, err = rest.HTTPClientFor(restConfig) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this assignment is a bit premature since if there are errors, opts
attribute still keep the erroneous client
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code returns on error just below so opts
will be discarded in that scenario regardless.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, good catch. LGTM then
de324cb
to
349eb39
Compare
Rebased to address conflicts introduced by #6552. |
This PR has not had any activity in the past 30 days, so the |
This PR has not had any activity in the past 30 days, so the |
Waiting on maintainers |
This PR has not had any activity in the past 30 days, so the |
PR Description
This PR adds a retry when attempting to run
prometheus.operator
components which ensures that component do not compltetely fail to start up if there is any form of delay in the pod being able to communicate with the Kubernetes API.At present, the
prometheus.operator
flow components fail to run on our GKE clusters when a new pod starts up. Logs as follows:However, if we force reload the config (remove the component and then add it back in again) then everything works as expected.
With these changes in place, the components successfully start after a retry or two. Logs as follows:
Which issue(s) this PR fixes
Notes to the Reviewer
PR Checklist