Unnecessary access to v1.Pod lead to timeout. #4955

Open
qin-nz opened this issue Dec 16, 2024 · 5 comments
Labels
help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/bug Categorizes issue or PR as related to a bug.

Comments

@qin-nz
qin-nz commented Dec 16, 2024

What happened:

My config:

        --source=service
        --service-type-filter=LoadBalancer

My Kubernetes cluster has so many pods that the API server cannot return the pod list within 60 seconds.

external-dns then fails at startup with:

time="2024-12-16T12:43:02Z" level=info msg="Created Kubernetes client https://10.0.0.1:443"
time="2024-12-16T12:44:02Z" level=fatal msg="failed to sync *v1.Pod: context deadline exceeded"

What you expected to happen:

Because I specified source=service and service-type-filter=LoadBalancer, external-dns should never access the v1.Pod API.

In practice, NewServiceSource calls waitForCacheSync, which waits for the pod informer to list all pods regardless of service-type-filter.

func NewServiceSource(ctx context.Context, kubeClient kubernetes.Interface, namespace, annotationFilter string, fqdnTemplate string, combineFqdnAnnotation bool, compatibility string, publishInternal bool, publishHostIP bool, alwaysPublishNotReadyAddresses bool, serviceTypeFilter []string, ignoreHostnameAnnotation bool, labelSelector labels.Selector, resolveLoadBalancerHostname bool) (Source, error) {
	tmpl, err := parseTemplate(fqdnTemplate)
	if err != nil {
		return nil, err
	}

	// Use shared informers to listen for add/update/delete of services/pods/nodes in the specified namespace.
	// Set resync period to 0, to prevent processing when nothing has changed
	informerFactory := kubeinformers.NewSharedInformerFactoryWithOptions(kubeClient, 0, kubeinformers.WithNamespace(namespace))
	serviceInformer := informerFactory.Core().V1().Services()
	endpointsInformer := informerFactory.Core().V1().Endpoints()
	podInformer := informerFactory.Core().V1().Pods()
	nodeInformer := informerFactory.Core().V1().Nodes()

	// Add default resource event handlers to properly initialize informer.
	serviceInformer.Informer().AddEventHandler(
		cache.ResourceEventHandlerFuncs{
			AddFunc: func(obj interface{}) {
			},
		},
	)
	endpointsInformer.Informer().AddEventHandler(
		cache.ResourceEventHandlerFuncs{
			AddFunc: func(obj interface{}) {
			},
		},
	)
	podInformer.Informer().AddEventHandler(
		cache.ResourceEventHandlerFuncs{
			AddFunc: func(obj interface{}) {
			},
		},
	)
	nodeInformer.Informer().AddEventHandler(
		cache.ResourceEventHandlerFuncs{
			AddFunc: func(obj interface{}) {
			},
		},
	)

	informerFactory.Start(ctx.Done())

	// wait for the local cache to be populated.
	if err := waitForCacheSync(context.Background(), informerFactory); err != nil {
		return nil, err
	}

The sync then times out because the timeout is hard-coded:

ctx, cancel := context.WithTimeout(ctx, 60*time.Second)

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

  • External-DNS version (use external-dns --version): v0.14.2
  • DNS provider: rfc2136
  • Others:

**Related issues**:

@qin-nz qin-nz added the kind/bug Categorizes issue or PR as related to a bug. label Dec 16, 2024
@qin-nz qin-nz changed the title from "Too many pod lead to timeout." to "Unnecessary access to v1.Pod lead to timeout." Dec 16, 2024
@qin-nz
Author

qin-nz commented Dec 17, 2024

I tried deleting this line (and the related lines), and the timeout no longer occurs.

podInformer := informerFactory.Core().V1().Pods()

@dmarkhas
Contributor

I've seen that this can happen due to permissions:
#4960

Make sure the account running external-dns is allowed to list pods.

@qin-nz
Author

qin-nz commented Dec 19, 2024

@dmarkhas

  1. Yes, I am sure external-dns is allowed to list pods: after I deleted thousands of pods, the pod list returned in 50 seconds.
  2. With --service-type-filter=LoadBalancer, listing pods is unnecessary, so the code should not list pods.
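
To make point 2 concrete, here is a rough sketch of deciding which informers to start based on the service type filter. The names `requiredInformers` and `informersFor` are hypothetical and do not exist in external-dns. The mapping from service type to required informers (pods and endpoints for ClusterIP, pods, endpoints and nodes for NodePort) is my assumption and would need to be checked against the service source code.

```go
package main

import (
	"fmt"
	"strings"
)

// requiredInformers records which informers a service source would start.
type requiredInformers struct {
	Services  bool
	Endpoints bool
	Pods      bool
	Nodes     bool
}

// informersFor derives the informer set from a --service-type-filter value.
// ASSUMPTION: only ClusterIP (headless) and NodePort services need more than
// the service informer; verify this against the real source before relying on it.
func informersFor(serviceTypeFilter []string) requiredInformers {
	if len(serviceTypeFilter) == 0 {
		// No filter: any service type may appear, so keep every informer.
		return requiredInformers{Services: true, Endpoints: true, Pods: true, Nodes: true}
	}

	req := requiredInformers{Services: true}
	for _, t := range serviceTypeFilter {
		switch strings.TrimSpace(t) {
		case "ClusterIP":
			req.Endpoints = true
			req.Pods = true
		case "NodePort":
			req.Endpoints = true
			req.Pods = true
			req.Nodes = true
		}
	}
	return req
}

func main() {
	// With only LoadBalancer in the filter, no pod informer is requested,
	// so startup would not wait on a full pod list.
	fmt.Printf("%+v\n", informersFor([]string{"LoadBalancer"}))
	fmt.Printf("%+v\n", informersFor(nil))
}
```

NewServiceSource could then create and register the pod, node and endpoints informers only when the corresponding field is true, so the factory never starts a pod list for a LoadBalancer-only filter.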

@ivankatliarchuk
Contributor

/help

@k8s-ci-robot
Contributor

@ivankatliarchuk:
This request has been marked as needing help from a contributor.

Guidelines

Please ensure that the issue body includes answers to the following questions:

  • Why are we solving this issue?
  • To address this issue, are there any code changes? If there are code changes, what needs to be done in the code and what places can the assignee treat as reference points?
  • Does this issue have zero to low barrier of entry?
  • How can the assignee reach out to you for help?

For more details on the requirements of such an issue, please see here and ensure that they are met.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.

In response to this:

/help

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label Jan 25, 2025
Development

No branches or pull requests

4 participants