Unnecessary access to v1.Pod lead to timeout. #4955

qin-nz · 2024-12-16T12:59:16Z

What happened:

My config:

        --source=service
        --service-type-filter=LoadBalancer

My Kubernetes cluster has so many pods that the API server cannot return the pod list within 60 seconds.

external-dns then fails at startup:

time="2024-12-16T12:43:02Z" level=info msg="Created Kubernetes client https://10.0.0.1:443"
time="2024-12-16T12:44:02Z" level=fatal msg="failed to sync *v1.Pod: context deadline exceeded"

What you expected to happen:

Because I specify --source=service and --service-type-filter=LoadBalancer, external-dns should NEVER access the v1.Pod API.

In practice, however, NewServiceSource calls waitForCacheSync, which waits for the pod informer to list all pods regardless of --service-type-filter.

external-dns/source/service.go

Lines 67 to 113 in 2a45cc8

    
    func NewServiceSource(ctx context.Context, kubeClient kubernetes.Interface, namespace, annotationFilter string, fqdnTemplate string, combineFqdnAnnotation bool, compatibility string, publishInternal bool, publishHostIP bool, alwaysPublishNotReadyAddresses bool, serviceTypeFilter []string, ignoreHostnameAnnotation bool, labelSelector labels.Selector, resolveLoadBalancerHostname bool) (Source, error) {
    	tmpl, err := parseTemplate(fqdnTemplate)
    	if err != nil {
    		return nil, err
    	}

    	// Use shared informers to listen for add/update/delete of services/pods/nodes in the specified namespace.
    	// Set resync period to 0, to prevent processing when nothing has changed
    	informerFactory := kubeinformers.NewSharedInformerFactoryWithOptions(kubeClient, 0, kubeinformers.WithNamespace(namespace))
    	serviceInformer := informerFactory.Core().V1().Services()
    	endpointsInformer := informerFactory.Core().V1().Endpoints()
    	podInformer := informerFactory.Core().V1().Pods()
    	nodeInformer := informerFactory.Core().V1().Nodes()

    	// Add default resource event handlers to properly initialize informer.
    	serviceInformer.Informer().AddEventHandler(
    		cache.ResourceEventHandlerFuncs{
    			AddFunc: func(obj interface{}) {
    			},
    		},
    	)
    	endpointsInformer.Informer().AddEventHandler(
    		cache.ResourceEventHandlerFuncs{
    			AddFunc: func(obj interface{}) {
    			},
    		},
    	)
    	podInformer.Informer().AddEventHandler(
    		cache.ResourceEventHandlerFuncs{
    			AddFunc: func(obj interface{}) {
    			},
    		},
    	)
    	nodeInformer.Informer().AddEventHandler(
    		cache.ResourceEventHandlerFuncs{
    			AddFunc: func(obj interface{}) {
    			},
    		},
    	)

    	informerFactory.Start(ctx.Done())

    	// wait for the local cache to be populated.
    	if err := waitForCacheSync(context.Background(), informerFactory); err != nil {
    		return nil, err
    	}

So the cache sync times out because of the hard-coded timeout:

external-dns/source/source.go

Line 356 in 2a45cc8

ctx, cancel := context.WithTimeout(ctx, 60*time.Second)

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

External-DNS version (use external-dns --version): v0.14.2
DNS provider: rfc2136
Others:

**Related issues**:


qin-nz · 2024-12-17T02:56:06Z

I tried deleting this line (and the related lines). The timeout no longer occurs.

external-dns/source/service.go

Line 78 in 2a45cc8

podInformer := informerFactory.Core().V1().Pods()

dmarkhas · 2024-12-18T18:38:53Z

I've seen that this can happen due to permissions:
#4960

Make sure the account running external-dns is allowed to list pods.

qin-nz · 2024-12-19T02:18:01Z

@dmarkhas

Yes, I am sure external-dns is allowed to list pods: after I deleted thousands of pods, the pod list returned within 50 seconds.
With --service-type-filter=LoadBalancer, listing pods is unnecessary, so the code should NOT list pods.
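
The idea of skipping informers that the filter makes unnecessary could be sketched roughly as below. This is not the project's implementation: informerSet and requiredInformers are hypothetical names, and the mapping rests on the assumption that pods and endpoints are only read for ClusterIP (headless) services, and nodes and pods only for NodePort services. That assumption would need to be checked against service.go.

```go
package main

import "fmt"

// informerSet records which informers a service source would start.
// The type and its fields are hypothetical, for illustration only.
type informerSet struct {
	Services  bool
	Endpoints bool
	Pods      bool
	Nodes     bool
}

// requiredInformers decides which informers are needed for a given
// --service-type-filter value.
// Assumptions (not confirmed in this thread): ClusterIP (headless)
// services need endpoints and pods, NodePort services need nodes and
// pods, and other types such as LoadBalancer need only services.
// An empty filter means every service type is processed.
func requiredInformers(serviceTypeFilter []string) informerSet {
	if len(serviceTypeFilter) == 0 {
		return informerSet{Services: true, Endpoints: true, Pods: true, Nodes: true}
	}
	set := informerSet{Services: true}
	for _, t := range serviceTypeFilter {
		switch t {
		case "ClusterIP":
			set.Endpoints = true
			set.Pods = true
		case "NodePort":
			set.Nodes = true
			set.Pods = true
		}
	}
	return set
}

func main() {
	// With only LoadBalancer in the filter, no pod informer is requested.
	fmt.Printf("%+v\n", requiredInformers([]string{"LoadBalancer"}))
	// With no filter, every informer is requested, as in the current code.
	fmt.Printf("%+v\n", requiredInformers(nil))
}
```

NewServiceSource would then create and register only the informers whose flag is set, so waitForCacheSync would never wait on a pod list it does not need.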

ivankatliarchuk · 2025-01-25T12:16:49Z

/help

k8s-ci-robot · 2025-01-25T12:16:51Z

@ivankatliarchuk:
This request has been marked as needing help from a contributor.

Guidelines

Please ensure that the issue body includes answers to the following questions:

Why are we solving this issue?
To address this issue, are there any code changes? If there are code changes, what needs to be done in the code and what places can the assignee treat as reference points?
Does this issue have zero to low barrier of entry?
How can the assignee reach out to you for help?

For more details on the requirements of such an issue, please see here and ensure that they are met.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.

In response to this:

/help

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

qin-nz added the kind/bug Categorizes issue or PR as related to a bug. label Dec 16, 2024

qin-nz changed the title ~~Too many pod lead to timeout.~~ Unnecessary access to v1.Pod lead to timeout. Dec 16, 2024

k8s-ci-robot added the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label Jan 25, 2025
