Increase RequestTimeout to fix overlappingIP context deadline error #478

@smoshiur1237 smoshiur1237 commented May 30, 2024

What this PR does / why we need it:
The ListOverlappingIPs function fails with the error: failed to list all OverLappingIPs: client rate limiter Wait returned an error: context deadline exceeded.
The DeleteOverlappingIP function is also unable to delete unused overlapping IP reservations after pods are scaled down.

kubectl get ippools.whereabouts.cni.cncf.io -n kube-system  197.0.0.0-8 -o yaml | grep -c podref
143

In our test case, 500 pods run with the overlapping IP feature enabled. When the pods are scaled down from 500 to 1, we get the context deadline exceeded error and see undeleted pod references left in the IP pool. The cause is RequestTimeout, which was set to 10s. The client defaults to 5 QPS, so sending the queries for 500 pods takes about 100s (500 / 5). Because the overlapping IP list and delete operations use the 10s request timeout, they time out and produce the error above. Raising the timeout from 10s to 100s does not change the basic functionality; it only gives the queries and deletions more time to complete.

Which issue(s) this PR fixes:
Fixes #389

@smoshiur1237
Author

/cc @dougbtv @manuelbuil
Please review; this should fix the issue.

@smoshiur1237
Author

/cc @mlguerrero12
Please take a look

@mlguerrero12
Collaborator

We have a customer reporting this issue with 100 nodes and 30k pods. What you're proposing might work for 500 pods but not for 30k.

@smoshiur1237
Author

smoshiur1237 commented May 30, 2024

> We have a customer reporting this issue with 100 nodes and 30k pods. What you're proposing might work for 500 pods but not for 30k.

The error and the leftover podref entries are both caused by the timeout. Yes, a 100s timeout works for 500 pods when listing and deleting overlapping IPs, and yes, it may not work for 30k pods. Do you have any suggestions for how to handle this?

@smoshiur1237
Author

A new fix is up in #480, so I am closing this PR.
