
Add support for delete and recreate option to ExistingResourcePolicy feature #6142

Open
shubham-pampattiwar opened this issue Apr 12, 2023 · 17 comments

Comments

@shubham-pampattiwar
Collaborator

Describe the problem/challenge you have

The ExistingResourcePolicy feature was implemented in Velero 1.9 with only two options, none and update. We would like to complete phase 2 of this feature's implementation, i.e. also provide the delete and recreate option.

Design link: #4613
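
For context, a Restore using the policy as it exists today looks roughly like this (field names per the current Restore API; only none and update are accepted, and the backup name is just an example value):

```yaml
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: restore-with-update-policy
  namespace: velero
spec:
  backupName: my-backup
  # "none": resources that already exist in the cluster are left untouched.
  # "update": Velero attempts to patch existing resources to match the backup.
  existingResourcePolicy: update
```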

Vote on this issue!

This is an invitation to the Velero community to vote on issues; you can see the project's top-voted issues listed here.
Use the "reaction smiley face" up to the right of this comment to vote.

  • 👍 for "The project would be better with this feature added"
  • 👎 for "This feature will not enhance the project in a meaningful way"
@sseago
Collaborator

sseago commented Apr 12, 2023

Current location of the design is here: https://github.com/vmware-tanzu/velero/blob/main/design/Implemented/existing-resource-policy_design.md

Note that there are two aspects of the second implementation phase:

  1. add the recreate option (in addition to none and update)
  2. implement the "approach 3" from the design -- add a second spec field, existingResourcePolicyOverrides, to allow per-resource-type overrides of the otherwise-defined existingResourcePolicy

The second one is important here -- since recreate is potentially destructive (it's possible that Velero will delete a cluster resource and be unable to recreate it), users may want to limit its use to resource types they know are safe. They may, for example, set existingResourcePolicy to update but set a recreate override for pods.
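
A minimal sketch of what that could look like on a Restore spec, assuming the existingResourcePolicyOverrides field name from the design and an illustrative (not finalized) schema for the override entries:

```yaml
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: restore-with-recreate-override
  namespace: velero
spec:
  backupName: my-backup
  # Default policy for resources that already exist in the cluster.
  existingResourcePolicy: update
  # Hypothetical per-resource-type overrides ("approach 3" in the design);
  # recreate is limited to types the user knows are safe to delete and recreate.
  existingResourcePolicyOverrides:
    - resourceType: pods
      policy: recreate
```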

Also, as an implementation comment, I think we should only delete/recreate in cases where update fails. So even if policy is recreate, we first attempt update, and if that fails, we recreate.

@kaovilai
Contributor

kaovilai commented May 4, 2023

👋

@reasonerjt
Contributor

Per the comments in the PR, it seems there's no consensus on the implementation; we may move this out of v1.12.

@reasonerjt removed this from the v1.12 milestone on Aug 2, 2023
@reasonerjt added the 1.13-candidate label and removed the defer-candidate label on Aug 2, 2023
@reasonerjt added the Needs triage, Reviewed Q3 2023, and Needs Design labels and removed the Needs triage label on Aug 23, 2023
@kaovilai
Contributor

So I do not think we have agreement yet on how to approach this. I don't think we'll have a design in by feature freeze, so let's move it out to 1.14?

@kaovilai
Contributor

I can commit to having an implementation in before the current feature-complete date if we have design agreement by the first week of November.

@anshulahuja98
Collaborator

@kaovilai it would be great if we could have this feature shipped. Really looking forward to it.

My suggestion would be to do it via a namespace (NS) override route, for simplicity, rather than at a resource level.

@kaovilai
Contributor

do it in a NS override route

So the user would specify which namespace this policy applies to? Does it apply to all resources in that namespace?

@sseago
Collaborator

sseago commented Oct 19, 2023

@anshulahuja98 I think we need resource-level control, since this is something that you'd probably only want for specific resources that:

  1. you expect update/patch to be problematic (immutable fields, etc.)
  2. you know deleting and recreating won't break anything

Whether you also need namespace-level control is another question. We already have design work done in terms of proposing an API for resource-level control. I'm not sure namespace-scoped is any simpler than resource-scoped from an implementation/API point of view; it's just a different filter. Implementing both, however, would be more complicated.

Thinking through the sorts of use cases where you'd need this feature, scoping by namespace seems less useful to me than scoping by resource.

@anshulahuja98
Collaborator

The issue with specifying this at the resource level is the complexity of specifying the correlation between various resources and the order in which to delete them. It is very hard for the end user to figure out which resources will face issues.

Think of it this way: even if we recommend an order without customer input, it's pretty hard to say with certainty that it will work even for core API resources. The investigation and testing required would be complex. Think of this in terms of what the default option for delete and recreate will be, where a basic customer can just proceed.

On the other hand, simply giving a way to delete the namespace is much more likely to succeed, IMO. For example, in a basic scenario such as PVCs, a PVC getting deleted along with the NS will lead to the PV being deleted automatically.

@anshulahuja98
Collaborator

Or, if we want to do this at the resource level, maybe we should do it in two passes: first trigger deletion for all resources we will restore, and then do the actual restore.

If we delete and restore one by one, it is likely resources will get stuck in deletion.

If we do a full delete pass beforehand, it is likely that all dependent resources will have finished deleting.

If we take an approach such as this, I am okay with going with it as compared to the NS approach.

Maybe @kaovilai can evaluate the feasibility of the two-pass approach in the current setup.

@anshulahuja98
Collaborator

I'll be on vacation for some time; I will follow up on Wednesday if there are any further discussions.

@sseago
Collaborator

sseago commented Oct 19, 2023

@anshulahuja98 The problem with the first pass is that we only want to delete resources that fail the patch. So for resources that don't have immutable fields, we'd just patch as with Update -- Delete was really just intended for resources that fail to patch, so that's the biggest problem with the two-pass approach. The second problem is that the two-pass approach would require a significant change to restore logic. We'd essentially have to run the restore logic twice, calling all RIAs (RestoreItemActions), etc., once without creating (only deleting), then again with creating. Calling RIAs would be required before the delete pass, because it's possible for an RIA to discard a resource -- if we didn't call the RIA, we wouldn't know that; we'd delete the resource, but then the second pass with RIAs would discard it and not restore it.

@sseago
Collaborator

sseago commented Oct 19, 2023

But even if we're just doing "every item per namespace," we still have the problem of dependent resources, since again we're only deleting things that we can't patch. That means there's a relatively small set of resources that will be deleted and recreated, and things could get stuck on deletion depending on how we implement this -- do we remove finalizers or not, for example. But this is where the resource-specific approach could actually reduce these errors, since we can avoid applying this new policy to resource types that are known to be problematic to delete and recreate (due to dependencies, etc.).

@kaovilai
Contributor

kaovilai commented Oct 31, 2023

If we delete and restore one by one, it is likely resources will get stuck in deletion.

If we do a full delete pass beforehand, it is likely that all dependent resources will have finished deleting.

If we take an approach such as this, I am okay with going with it as compared to the NS approach.

Maybe @kaovilai can evaluate the feasibility of the two-pass approach in the current setup.

I see where you're coming from with this. I may have been thinking of a very small subset of resource kinds with immutable fields, such as pods. The expectation is that you only specify the kinds you know will delete/create cleanly. Perhaps a backup used with this restore policy would contain only pods, for instance.

But thinking in terms of a backup with multiple resource types, scoping by NS is more likely to succeed.

What if this option is only available when you restore a limited number of namespaces and you specify which resources it applies to? We can give an explicit example of pods being a kind that we know needs this, as sketched below.
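
For illustration, that combination might look something like this, reusing the hypothetical override field sketched earlier (includedNamespaces and existingResourcePolicy are existing Restore fields; existingResourcePolicyOverrides and the recreate value are not yet implemented):

```yaml
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: restore-scoped-recreate
  namespace: velero
spec:
  backupName: my-backup
  # Restrict the restore to a small, known set of namespaces...
  includedNamespaces:
    - my-app
  existingResourcePolicy: update
  # ...and only allow recreate for kinds known to delete/create cleanly, e.g. pods.
  existingResourcePolicyOverrides:
    - resourceType: pods
      policy: recreate
```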

@reasonerjt added the 2024 Q1 reviewed label and removed the 1.13-candidate label on Jan 24, 2024
@reasonerjt added this to the v1.14 milestone on Feb 28, 2024
@kaovilai
Contributor

kaovilai commented Mar 4, 2024

This issue might need re-titling, since our discussions concluded that the original ask may not be that useful.

https://kubernetes.slack.com/archives/C021GPR1L3S/p1709576872952979?thread_ts=1709329227.490779&cid=C021GPR1L3S

https://hackmd.io/KKWcslVSR3yysufvC76MHg#Solutions-for-usecase

@Elyytscha

Hello, just wanted to note that we ran into this while developing and testing our backup/restore strategy.

We had to first delete the persistent volume claims from the namespace where we wanted to restore them in order to get them restored with the data we wanted from the backup.

So when the PVCs still exist and their data is broken, and you run a velero restore from the backup, you will not get your PVC data restored; we had to delete the PVCs first to achieve this.

@kaovilai
Contributor

you will not get your PVC data restored; we had to delete the PVCs first to achieve this

Expected. Your use case should be covered by #7481.
