Add support for delete and recreate option to ExistingResourcePolicy
feature
#6142
Comments
Current location of the design is here: https://github.com/vmware-tanzu/velero/blob/main/design/Implemented/existing-resource-policy_design.md
Note that there are two aspects to the second implementation phase; the second one is the important one here. Also, as an implementation comment, I think we should only delete/recreate in cases where update fails. So even if the policy is delete/recreate, resources whose update succeeds would simply be patched.
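As a rough sketch of that fallback, and only under stated assumptions: the restoreExisting function and the patch/delete/create helpers below are hypothetical stand-ins, not Velero's actual restore internals.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// itemOp stands in for a call against the cluster (patch, delete, or create).
type itemOp func(ctx context.Context, name string) error

// restoreExisting sketches the fallback discussed above: try to patch the
// existing resource first, and only delete-and-recreate when the patch fails
// (e.g. because of an immutable field).
func restoreExisting(ctx context.Context, name string, patch, del, create itemOp) error {
	if err := patch(ctx, name); err == nil {
		return nil // same outcome as the existing "update" policy
	}
	if err := del(ctx, name); err != nil {
		return fmt.Errorf("delete of %s failed: %w", name, err)
	}
	if err := create(ctx, name); err != nil {
		return fmt.Errorf("recreate of %s failed: %w", name, err)
	}
	return nil
}

func main() {
	// Simulate a resource whose patch fails due to an immutable field.
	patchFail := func(ctx context.Context, name string) error {
		return errors.New("field is immutable")
	}
	ok := func(ctx context.Context, name string) error { return nil }

	if err := restoreExisting(context.Background(), "pod/my-app", patchFail, ok, ok); err != nil {
		fmt.Println("restore failed:", err)
		return
	}
	fmt.Println("restored pod/my-app by delete and recreate")
}
```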
👋
Per the comments in the PR, it seems there's no consensus on the implementation; we may move this out of v1.12.
So I do not think we have agreement yet on how to approach this. I don't think we'll have a design in by feature freeze, so let's move it out to 1.14?
I can commit to having an implementation before the current feature-complete date if we have a design agreement in by the first week of November.
@kaovilai it would be great if we could have this feature shipped. Really looking forward to it. My suggestion would be to take a namespace (NS) override route for simplicity rather than doing it at the resource level.
So the user would specify which namespace this policy applies to? Does it apply to all resources in that namespace?
@anshulahuja98 I think we need resource-level control, since this is something you'd probably only want for specific resources that fail to patch (for example, because of immutable fields).
Whether you also need namespace-level control is another question. We already have design work done in terms of proposing an API for resource-level control. I'm not sure namespace-scoped is any simpler than resource-scoped from an impl/API point of view; it's just a different filter. Implementing both, however, would be more complicated. Scoping by namespace seems less useful to me than scoping by resource, thinking through the sorts of use cases where you'd need this feature.
The issue with specifying this at the resource level is the complexity of working out the correlation between various resources and the ordering in which to delete them. It is very hard for the end user to figure out which resources will face issues. Even if we recommend an order without customer input, it's hard to say with certainty that it will work even for core API resources, and the investigation and testing required would be complex. Think of this in terms of what the default behavior of delete-and-recreate will be, where a basic customer can just proceed. On the other hand, simply giving a way to delete the namespace is much more likely to succeed IMO. For example, in basic scenarios such as PVCs, a PVC getting deleted along with the namespace will lead to PV deletion automatically.
Or, if we want to do this at the resource level, maybe we should do it in two passes: first trigger deletion for all resources we will restore, then do the actual restore. If we delete and restore one by one, it is likely resources will get stuck in deletion; if we do a full delete pass beforehand, it is likely that all dependent resources will have finished deleting (most likely). If we take an approach such as this, I am okay with going with it as compared to the NS approach. Maybe @kaovilai can evaluate the feasibility of the two-pass approach in the current setup.
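A minimal sketch of that two-pass idea, assuming a placeholder item type and hypothetical del/create helpers (none of this is Velero's restore code):

```go
package main

import (
	"context"
	"fmt"
)

// item is a stand-in for a resource selected for restore.
type item struct{ name string }

// twoPassRestore sketches the proposal above: pass 1 triggers deletion for
// every existing resource that will be restored, pass 2 performs the actual
// restore, giving dependent resources time to finish deleting in between.
func twoPassRestore(ctx context.Context, items []item,
	del, create func(context.Context, item) error) error {

	// Pass 1: trigger deletion for everything up front.
	for _, it := range items {
		if err := del(ctx, it); err != nil {
			return fmt.Errorf("delete %s: %w", it.name, err)
		}
	}
	// (A real implementation would also wait for deletions to complete here.)

	// Pass 2: restore from the backup copies.
	for _, it := range items {
		if err := create(ctx, it); err != nil {
			return fmt.Errorf("create %s: %w", it.name, err)
		}
	}
	return nil
}

func main() {
	items := []item{{"pvc/data"}, {"pod/my-app"}}
	noop := func(ctx context.Context, it item) error { return nil }
	if err := twoPassRestore(context.Background(), items, noop, noop); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("two-pass restore finished")
}
```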
I'll be on vacation for some time; I will get back to this on Wednesday if there are any further discussions.
@anshulahuja98 The problem with the first pass is that we only want to delete resources that fail the patch. For resources that don't have immutable fields, we'd just patch as with update -- delete was really only intended for resources that fail to patch, so that's the biggest problem with the two-pass approach. The second problem is that the two-pass approach would require a significant change to the restore logic. We'd essentially have to run the restore logic twice, calling all RestoreItemActions (RIAs), etc.: once without creating (only deleting), then again with creating. Calling RIAs would be required before the delete pass because it's possible for an RIA to discard a resource -- if we didn't call the RIA, we wouldn't know that, we'd delete the resource, and then the second pass with RIAs would discard it and not restore it.
But even if we're just doing the "every item per namespace" approach, we still have the problem of dependent resources, since again we're only deleting things that we can't patch. That means there's a relatively small set of resources that will be deleted and recreated, and things could get stuck deleting depending on how we implement this -- do we remove finalizers or not, for example. This is where the resource-specific approach could actually reduce these errors, since we can avoid applying this new policy to resource types that are known to be problematic to delete and recreate (due to dependencies, etc.).
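To illustrate that resource-scoped filtering, here is a small hedged sketch; the deleteRecreatePolicy type and its field names are purely illustrative, not a proposed Velero API:

```go
package main

import "fmt"

// deleteRecreatePolicy applies the new behavior only to resource types the
// user has explicitly opted in, so types that are risky to delete
// (dependencies, finalizers, etc.) are never touched.
type deleteRecreatePolicy struct {
	IncludedResources []string // e.g. "pods"
}

func (p deleteRecreatePolicy) appliesTo(resource string) bool {
	for _, r := range p.IncludedResources {
		if r == resource {
			return true
		}
	}
	return false
}

func main() {
	p := deleteRecreatePolicy{IncludedResources: []string{"pods"}}
	for _, r := range []string{"pods", "persistentvolumeclaims"} {
		fmt.Printf("delete/recreate %s: %v\n", r, p.appliesTo(r))
	}
}
```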
I see where you're coming from with this. I may have been thinking of a very small subset of resource kinds with immutable fields, such as pods. The expectation is that you only specify the kinds you know will delete/create cleanly; perhaps a backup used with this restore policy would contain only pods, for instance. But if we're thinking in terms of a backup with multiple resources, then scoping by NS is more likely to succeed. What if this option is only available if you restore a limited number of namespaces and you specify which resources it applies to? We can give an explicit example of pods being one kind we know needs this.
This issue might need re-titling, since our discussions concluded that the original ask may not be that useful. https://hackmd.io/KKWcslVSR3yysufvC76MHg#Solutions-for-usecase
Hello, just wanted to note that we ran into this while developing and testing our backup/restore strategy. We had to delete the persistent volume claims from the target namespace first in order to get them restored with the data we wanted from the backup. If the PVCs still exist and their data is broken, running a velero restore of the backup will not restore your PVC data.
That is expected. Your use case should be covered by #7481.
Describe the problem/challenge you have
The ExistingResourcePolicy feature was implemented in Velero 1.9 with only two options: none and update. We would like to complete phase 2 of this feature's implementation, i.e. also provide the delete-and-recreate option.
Design link: #4613
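For context, a rough sketch of the policy values involved: none and update are the values shipped in Velero 1.9, while the third value and its name here (deleteAndRecreate) are hypothetical placeholders for the phase-2 proposal, not a confirmed API.

```go
package main

import "fmt"

type existingResourcePolicy string

const (
	policyNone           existingResourcePolicy = "none"              // existing: leave the in-cluster resource alone
	policyUpdate         existingResourcePolicy = "update"            // existing: patch the in-cluster resource
	policyDeleteRecreate existingResourcePolicy = "deleteAndRecreate" // hypothetical phase-2 value
)

func main() {
	for _, p := range []existingResourcePolicy{policyNone, policyUpdate, policyDeleteRecreate} {
		fmt.Println("existingResourcePolicy:", p)
	}
}
```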
Vote on this issue!
This is an invitation to the Velero community to vote on issues; you can see the project's top-voted issues listed here.
Use the "reaction smiley face" up to the right of this comment to vote.