Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Descriptive restore error on timeout due to terminating namespace #7424

Conversation

kaovilai
Copy link
Member

@kaovilai kaovilai commented Feb 13, 2024

Signed-off-by: Tiger Kaovilai [email protected]

Thank you for contributing to Velero!

Please add a summary of your change

Make restore error descriptive when namespace being restored is in terminating state.
Any user seeing this error should know that velero does not force a namespace to disappear by removing finalizers because that could be destructive to some workloads.

User should make namespace be in a state other than terminating for Velero to continue restore process.

Does your change fix a particular issue?

Fixes #5697

Please indicate you've done the following:

  • Accepted the DCO. Commits without the DCO will delay acceptance.
  • Created a changelog file or added /kind changelog-not-required as a comment on this pull request.
  • Updated the corresponding documentation in site/content/docs/main.

@kaovilai kaovilai changed the title Descriptive restore error on terminating namespace Descriptive restore error when restoring into a terminating namespace Feb 13, 2024
@kaovilai kaovilai force-pushed the descriptiveRestoreErrorOnNamespaceTerminating branch from 73b25fd to bbd910c Compare February 13, 2024 19:50
Copy link

codecov bot commented Feb 13, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 61.71%. Comparing base (2518824) to head (e7ffa62).
Report is 403 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #7424   +/-   ##
=======================================
  Coverage   61.71%   61.71%           
=======================================
  Files         263      263           
  Lines       28869    28873    +4     
=======================================
+ Hits        17816    17819    +3     
- Misses       9793     9794    +1     
  Partials     1260     1260           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@kaovilai kaovilai force-pushed the descriptiveRestoreErrorOnNamespaceTerminating branch 2 times, most recently from cf72b8a to fb25a55 Compare February 13, 2024 21:06
@kaovilai kaovilai changed the title Descriptive restore error when restoring into a terminating namespace Prevent restoring namespaces from backup as Terminating, descriptive restore error on timeout due to terminating namespace Feb 13, 2024
@kaovilai kaovilai force-pushed the descriptiveRestoreErrorOnNamespaceTerminating branch from fb25a55 to 6a72c2f Compare February 13, 2024 22:57
@kaovilai kaovilai force-pushed the descriptiveRestoreErrorOnNamespaceTerminating branch 6 times, most recently from dfe96a5 to 18bb3a0 Compare February 14, 2024 00:24
@kaovilai kaovilai force-pushed the descriptiveRestoreErrorOnNamespaceTerminating branch 2 times, most recently from 7177214 to c7b189d Compare February 14, 2024 18:51
@ywk253100 ywk253100 requested a review from reasonerjt February 19, 2024 06:56
pkg/util/kube/utils.go Outdated Show resolved Hide resolved
@kaovilai kaovilai force-pushed the descriptiveRestoreErrorOnNamespaceTerminating branch 3 times, most recently from 98ee21a to 28b5d26 Compare February 20, 2024 06:21
@kaovilai kaovilai changed the title Prevent restoring namespaces from backup as Terminating, descriptive restore error on timeout due to terminating namespace Descriptive restore error on timeout due to terminating namespace Feb 20, 2024
@kaovilai kaovilai requested a review from blackpiglet February 20, 2024 06:22
blackpiglet
blackpiglet previously approved these changes Feb 20, 2024
@kaovilai kaovilai force-pushed the descriptiveRestoreErrorOnNamespaceTerminating branch from 28b5d26 to ad879f3 Compare March 26, 2024 19:57
… Descriptive restore error on terminating namespace.

Signed-off-by: Tiger Kaovilai <[email protected]>

revert utils_test.go

Signed-off-by: Tiger Kaovilai <[email protected]>

address https://github.com/vmware-tanzu/velero/pull/7424/files/c7b189dd6035839c9eb8ce3dab4ead574de77adb#r1494194484

Signed-off-by: Tiger Kaovilai <[email protected]>

Update pkg/util/kube/utils.go

Signed-off-by: Tiger Kaovilai <[email protected]>
Signed-off-by: Tiger Kaovilai <[email protected]>
@kaovilai kaovilai force-pushed the descriptiveRestoreErrorOnNamespaceTerminating branch from ad879f3 to e7ffa62 Compare March 26, 2024 19:58
@mayankagg9722
Copy link
Contributor

mayankagg9722 commented Mar 27, 2024

HI @kaovilai @blackpiglet we are seeing the Issue #7516. Please look into it once in the same method.

EnsureNamespaceExistsAndIsReady

During restore for every resource within the namespace, we are calling the check to await on if namespace exists (wait for 10 min polling). For instance, if we have 100 resources in a namespace that itself is in terminating state for very long then it impacts the restore flow to get stuck/halt and it too increases the time to restore as it does it for each resource in the same namespace.

@@ -97,6 +103,9 @@ func EnsureNamespaceExistsAndIsReady(namespace *corev1api.Namespace, client core

// err will be set if we timed out or encountered issues retrieving the namespace,
if err != nil {
if terminatingNamespace {
return false, nsCreated, errors.Wrapf(err, "timed out waiting for terminating namespace %s to disappear before restoring", namespace.Name)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as this call is used during restore as well, and if such error occur for the resource during restore, can we also add this to restore warnings list?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mention #7516

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is warning preferred over restore error?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also per PR title, the intent is that the error are visible in restore error.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What changes do you see needed to be made to the PR?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry on the late reply, I was awk.
Yes, overall its good to have this error that you added and its visible during restore

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For reducing the time to address the issue #7516 we need to look at other options.

@anshulahuja98
Copy link
Collaborator

@kaovilai , feel free to merge this now, approved it.

@kaovilai
Copy link
Member Author

@anshulahuja98 I don't have access to merge. Not a maintainer.

@anshulahuja98 anshulahuja98 merged commit f5671c7 into vmware-tanzu:main Aug 23, 2024
66 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Restore gets stuck if the destination namespace is in Terminating phase
6 participants