Implement recreate ExistingResourcePolicy to restore API #6354
Conversation
Codecov Report. Attention: Patch coverage is
Additional details and impacted files:
@@ Coverage Diff @@
##             main    #6354      +/-   ##
==========================================
- Coverage   61.66%   61.54%   -0.12%
==========================================
  Files         263      263
  Lines       28634    28700      +66
==========================================
+ Hits        17657    17664       +7
- Misses       9734     9794      +60
+ Partials     1243     1242       -1
☔ View full report in Codecov by Sentry.
@@ -55,10 +55,11 @@ skip restoration.
- Changed resources: Velero will first try to patch the changed resource. If the patch:
  - succeeds: the in-cluster resource gets updated with the labels as well as the resource diff.
  - fails: Velero adds a restore warning and tries to update only the backup/restore labels on the resource; if the label patch also fails, we add a restore error.
3. `recreate`: If the resource already exists, then Velero will delete it and recreate the resource.
3. `recreate`: Similar to `update`, but if the resource already exists and is immutable (such as Pods), then Velero attempts to recreate the resource by deleting it first.
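For reference, a minimal sketch of how the three policy values could branch when a resource already exists in the cluster. This is illustrative only: the local type, constants, and callback parameters below are hypothetical, not Velero's actual implementation, and the warning/error bookkeeping on the Restore object is omitted.

```go
package restore

import "fmt"

// Policy mirrors the three documented ExistingResourcePolicy values
// (hypothetical local type, not Velero's API type).
type Policy string

const (
	PolicyNone     Policy = "none"
	PolicyUpdate   Policy = "update"
	PolicyRecreate Policy = "recreate"
)

// handleExistingResource sketches the documented behavior: "none" skips the
// in-cluster resource, "update" patches it (falling back to a label-only patch
// when the full patch fails), and "recreate" instead falls back to
// delete-and-create when the patch fails.
func handleExistingResource(policy Policy, patch, labelPatch, deleteAndCreate func() error) error {
	switch policy {
	case PolicyNone:
		return nil // leave the existing resource untouched
	case PolicyUpdate, PolicyRecreate:
		if err := patch(); err == nil {
			return nil // in-cluster resource updated with labels and resource diff
		}
		if policy == PolicyRecreate {
			// Patch failed (e.g. an immutable field or resource): delete and re-create.
			return deleteAndCreate()
		}
		// update: record a warning and try to patch only the backup/restore labels.
		if err := labelPatch(); err != nil {
			return fmt.Errorf("label patch failed: %w", err)
		}
		return nil
	default:
		return fmt.Errorf("unknown ExistingResourcePolicy %q", policy)
	}
}
```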
Did you try this out with CSI PVC restore?
Will it make sure that the original PVC is deleted and a new PVC with a CSI Snapshot source is created?
I believe this has a lot of value in the context of doing an actual volume restore, which is not supported by the current none/patch scenarios.
@anshulahuja98 without any special code for specific resource types (which I don't think we have right now), we would only delete and recreate something if:
- the kubernetes metadata in the backup is different from what's in-cluster, and
- the attempt to patch the resource failed.
Also, at the point where the delete-and-recreate happens, all plugins, etc. have already run. However, it's possible that this will actually work if there are immutable fields that we're attempting to patch here. Otherwise, there may be value in making an exception for the PVC itself in the code. I think that may be better handled as a follow-on change for the specific PVC use case, since there may be a number of edge cases to deal with -- CSI-with-datamover, CSI-without-datamover, native snapshots, etc.
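To make that ordering concrete (patch first, delete-and-recreate only when the patch is rejected), here is a hedged sketch using the standard dynamic client. Velero's own client wrapper and helpers differ; the function and parameter names here are assumptions for illustration.

```go
package restore

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/dynamic"
)

// patchOrRecreate tries a merge patch first; only if the API server rejects it
// (for example because it touches an immutable field) does it fall back to
// deleting the existing object and re-creating it from the backup copy.
func patchOrRecreate(ctx context.Context, client dynamic.ResourceInterface, desired *unstructured.Unstructured, patch []byte) error {
	if _, err := client.Patch(ctx, desired.GetName(), types.MergePatchType, patch, metav1.PatchOptions{}); err == nil {
		return nil // patch succeeded; nothing to recreate
	}
	// Patch failed: delete the existing object, then re-create it. In practice
	// the restore also has to wait for the deletion to complete (see the
	// polling discussion further down) and record warnings/errors.
	if err := client.Delete(ctx, desired.GetName(), metav1.DeleteOptions{}); err != nil {
		return fmt.Errorf("delete before recreate failed: %w", err)
	}
	if _, err := client.Create(ctx, desired, metav1.CreateOptions{}); err != nil {
		return fmt.Errorf("recreate failed: %w", err)
	}
	return nil
}
```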
I thought further on this. I am now trying to think from the angle of what recreate is exactly solving for the end user. My worry is that, with the current design approach of deleting when the patch fails, it might leave the cluster in worse shape than the expected restore.
For example, suppose we have a deployment and a PVC attached to it. As per the restore order Velero uses today, it will first go to the PVC -> the patch will most likely fail (let's assume it does) -> Velero will try to delete the PVC, but the deletion will never complete since the PVC is mounted by the deployment -> Velero will move forward after just marking the PVC for deletion -> Velero will now try to restore the deployment (where, let's assume, recreate kicks in) -> the deployment gets deleted -> the terminating PVC gets cleaned up -> the deployment is created but the PVC is gone; or, if recreate didn't kick in for the deployment, the PVC stays stuck in a terminating state.
As a result, we caused user data loss by marking the current PVC for deletion while not restoring the data back. The cluster would be in a bad state. IMO, this is kind of a P0: we must avoid users shooting themselves in the foot.
I believe that, from the k8s core API side, this is a very basic use case, and we should think about how it can be supported as part of the design, or whether a follow-on PR can solve it properly if we special-case PVCs. The question is: is the current Velero design extensible enough to support it properly?
I have a similar concern with problems caused by complex relationships between objects:
- An object may belong to an application which consists of multiple objects.
- If one object exists at the time of restore, it may indicate that the entire application is running with many other objects.
- We don't know the relationships between these objects. One possibility: another object is monitoring the kind of the current object, and if it goes missing, a new one with the same kind & name will be created.
- Then, even though we successfully delete the object as part of the recreate logic, there may still be further problems:
  - Either the creation of the object, as the other half of the recreate logic, will fail because the application recreates the object first;
  - Or the existing application itself falls into chaos or modifies the newly created object erroneously, which renders the object unusable.
- Finally, it looks like we have successfully created all the objects due for the restore, but the restored application cannot work properly.
Overall, this indicates that it is safer to delete the application/namespace as a whole instead of deleting single objects one by one for this recreate logic, because the latter is not done in a consistent manner.
@anshulahuja98 "I thought further on this. I am now trying to think from the angle of, what is recreate exactly solving for the end user. I have a worry that it might lead to the cluster in a bad shape than expected restore with the current design approach of deleting if patch fails."
The main thing this is supposed to resolve is if the backup contains different metadata for immutable fields (or for immutable objects). If the patch attempts to modify an immutable field or an immutable resource, it will fail.
As for the concern that this will break things in some cases (PVCs, etc.), I had similar concerns, which is why my original recommendation was to implement this after we had implemented the per-resource-type overrides from the original design document. Right now, policy is all-or-none. We either do nothing across the board, or we attempt to patch resources across the board, or (with this PR), attempt to delete-and-recreate for everything that fails patching. I'm starting to think we may want to first implement the per-resource override functionality and then the recreate policy. That would allow us, for example, to set the update/patch policy for (almost) everything, but then recreate for just the specific resource type that the user has hit patch failures with. This will limit the potential breakage from the recreate policy.
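One way to picture that per-resource override (purely hypothetical: no such configuration exists in Velero today) is a global default policy plus per-GroupResource exceptions:

```go
package restore

import "k8s.io/apimachinery/pkg/runtime/schema"

// resourcePolicies sketches a hypothetical per-resource-type override table:
// a default policy ("none", "update", or "recreate") plus exceptions keyed by
// group/resource. Today the policy is a single value applied to everything.
type resourcePolicies struct {
	defaultPolicy string
	overrides     map[schema.GroupResource]string
}

// policyFor returns the override for a resource type if one is set, otherwise
// the global default.
func (r resourcePolicies) policyFor(gr schema.GroupResource) string {
	if p, ok := r.overrides[gr]; ok {
		return p
	}
	return r.defaultPolicy
}
```

For example, a user could keep defaultPolicy set to "update" everywhere, with a single override of "recreate" for the one resource type whose patches keep failing.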
This might work for backups of purely core k8s objects. If there are CRDs from other controllers, the order of creation of objects may matter, so there is no guarantee that everything works. So perhaps we introduce it as an alpha feature?
Your point is valid for other controllers as well.
But a mass deletion before creation of resources / deleting entire namespaces is still more likely to succeed, since at least resources mostly do not stay stuck in deletion.
I believe we need to go back to the design phase for this. Introducing the current implementation as an alpha feature is not useful for any real customer scenario and will introduce more complications in understanding.
But that is just my opinion on this particular feature. Others, please share your thoughts.
It might be useful to check how competing offerings tackle this issue to help come up with the design.
"a mass deletion before creation of resources / deleting entire namespaces is still more likely to succeed."
I disagree on this part. I've seen resources unable to be created/applied because they were being deleted (blocked by a finalizer).
unless we also mass remove finalizers
pkg/restore/restore.go (outdated):
		time.Sleep(10 * time.Second)
		continue
	}
	break
It might be a very common scenario that resources have finalizers which prevent deletion.
Can we perhaps remove those finalizers if deletion doesn't happen within 2 minutes and try the deletion again?
Or otherwise, can we try to think through these finalizer cases?
How often are finalizer workloads going to restore/delete cleanly with just a simple finalizer removal?
I think most controllers with finalizers just need extra time to process deletion and go away on their own; others are intentional deletion blocks that need more attention for deletion.
Hmm, I get your point. I think we can skip it as part of the current work, then.
My thought around finalizers stemmed from the points I put in the comment above.
Resolving this for now.
Similarly, I think there are more cases where the deletion of objects never succeeds in the recreate situation because of various unpredicted mechanisms, e.g., deletion order. Here is one example:
- In the current restore order, VolumeSnapshot objects are restored before PVC objects.
- Suppose there is a PVC with its DataSource referring to a VolumeSnapshot object.
- If we delete the VolumeSnapshot object first, it will always fail, complaining that the operation is blocked because a PVC refers to it.
- As far as I remember, the VS object is not marked for deletion in this case; that is, even though we delete the PVC later, the VS will not be deleted.
@kaovilai "others are intentional deletion blocks that need more attention for deletion." -- this could be a problem for restore. Following my comment above about implementing this after we have per-resource-type overrides, maybe we could have 2 different policies:
- recreate-if-no-finalizers -- this would attempt to patch, and if patch failed, would delete-and-recreate only if there are no finalizers on the resource.
- recreate-and-delete-finalizers -- this would attempt to patch and if patch failed, would remove finalizers before attempting to delete
The latter is potentially dangerous, so we would not want to allow it as an overall policy, but might make sense for some specific named resource/crd that the user knows is safe to delete-and-recreate in this way.
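A hedged sketch of how those two variants might differ, using the standard dynamic client; the function name and the boolean switch are hypothetical, chosen to mirror the two policies described above.

```go
package restore

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/client-go/dynamic"
)

// deleteForRecreate sketches the finalizer handling for the two proposed
// policies. removeFinalizers=false ("recreate-if-no-finalizers") refuses to
// delete anything that still carries finalizers; removeFinalizers=true
// ("recreate-and-delete-finalizers") strips them first so deletion can finish.
func deleteForRecreate(ctx context.Context, client dynamic.ResourceInterface, obj *unstructured.Unstructured, removeFinalizers bool) error {
	if len(obj.GetFinalizers()) > 0 {
		if !removeFinalizers {
			return fmt.Errorf("refusing to delete %s: finalizers present", obj.GetName())
		}
		// Dangerous path: clearing finalizers bypasses whatever cleanup the
		// owning controller intended, so this should be opt-in per resource.
		obj.SetFinalizers(nil)
		if _, err := client.Update(ctx, obj, metav1.UpdateOptions{}); err != nil {
			return fmt.Errorf("removing finalizers from %s: %w", obj.GetName(), err)
		}
	}
	return client.Delete(ctx, obj.GetName(), metav1.DeleteOptions{})
}
```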
pkg/restore/restore.go (outdated):
	for timeStarted := time.Now(); apierrors.IsNotFound(err) || time.Now().After(timeStarted.Add(2*time.Minute)); {
		_, err = resourceClient.Get(obj.GetName(), metav1.GetOptions{})
		if !apierrors.IsNotFound(err) {
			ctx.log.Warnf("get attempt to check object is gone failed for %s %s: %v", fromCluster.GroupVersionKind(), kube.NamespaceAndName(fromCluster), err)
Maybe update this warning to add a little context on why we are fetching the object
@kaovilai The PR needs a rebase.
*Note:* The `recreate` option is a non-goal for this enhancement proposal, but it is considered as future scope.
Another thing to highlight is that Velero will not be deleting any resources in any of the policy options proposed in
*Note:* The `recreate` option is being implemented in [#6354](https://github.com/vmware-tanzu/velero/pull/6354), separate from the original proposal.
Rather than just clarifying this, would it be better to modify this design in the current PR to elaborate on the detailed design of the recreate option?
If we're going through with the recreate option, then I can add more details here if desired.
	}
	// Create object from latest backup/restore
	obj.SetNamespace(namespace)
	_, err = resourceClient.Create(obj)
If the above polling times out, it means the object still exists, so resourceClient.Create will definitely fail. Maybe we don't need to attempt it at all; instead, break out by adding an error that explicitly says the deletion of the existing object failed.
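A sketch of the suggested shape: wait for the old object to disappear and return an explicit error on timeout rather than attempting a Create that is bound to fail. It uses wait.PollImmediate, the helper the later commits mention adopting (deprecated in newer client-go in favor of PollUntilContextTimeout); the client type and surrounding names are illustrative, not the PR's actual code.

```go
package restore

import (
	"context"
	"fmt"
	"time"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/dynamic"
)

// waitForDeletionThenCreate polls until the previously deleted object is gone
// and only then re-creates it. On timeout it reports that the deletion of the
// existing object failed instead of issuing a doomed Create call.
func waitForDeletionThenCreate(ctx context.Context, client dynamic.ResourceInterface, obj *unstructured.Unstructured) error {
	pollErr := wait.PollImmediate(10*time.Second, 2*time.Minute, func() (bool, error) {
		_, err := client.Get(ctx, obj.GetName(), metav1.GetOptions{})
		if apierrors.IsNotFound(err) {
			return true, nil // old object is gone; safe to create
		}
		return false, nil // still present (or transient error); keep polling
	})
	if pollErr != nil {
		return fmt.Errorf("existing %s was not deleted within the timeout, skipping recreate: %w", obj.GetName(), pollErr)
	}
	_, err := client.Create(ctx, obj, metav1.CreateOptions{})
	return err
}
```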
Thanks for the feedback!
Per the community call, this PR is bumped out of the 1.12.0 release while we discuss and update designs.
Since this is a longstanding issue with many old comments, it would be nice if someone could highlight the overall changes being implemented. In particular, is there any resource-type-specific logic, or is delete+recreate applied to all resource types? One particular scenario comes to mind: if a pod is deleted, the deployment may try to start the pod, racing with Velero, which will also try to create it. I remember discussions about allowing "replace" for only specific resource types. I am guessing that is not being considered in this PR?
I'm just going to go with the simplest approach first and see what reviewers think. Progress over perfection. I'm not sure there's any reasonable logic for recreate that would apply to all resource types; it would be specific to items for which the update ExistingResourcePolicy is not practical or possible.
If it's universally applicable across k8s, such as pods/deployments, then I will try to make it work for that.
For a pod owned by a deployment, we probably shouldn't bother with delete+recreate if possible, given that you can just update the deployment to achieve the same effect. The case where recreate could make sense is a lone pod without a deployment/replicaset.
Agreed. That is why I wanted to know whether the replace logic uniformly tries to delete and recreate for all resource types or whether it incorporates some specific logic depending on the resource type.
I wanted it to be per resource type, but in the issue there were suggestions by @anshulahuja98 that it may need two passes for deletion to prevent getting stuck.
I will try to make it delete only specific types that are enabled, and perhaps only do so on resources without ownerReferences, along the lines of the sketch below.
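As a rough illustration of that guard (a hypothetical helper, not code from this PR): only recreate kinds that were explicitly enabled, and skip anything that has ownerReferences, since its controller (e.g. a ReplicaSet behind a Deployment) will recreate it anyway and would race with Velero.

```go
package restore

import "k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"

// eligibleForRecreate reports whether an existing object is a candidate for
// delete-and-recreate: its kind must be explicitly opted in, and it must not
// be owned by another object (a lone Pod qualifies; a Pod managed by a
// ReplicaSet does not).
func eligibleForRecreate(obj *unstructured.Unstructured, enabledKinds map[string]bool) bool {
	if !enabledKinds[obj.GetKind()] {
		return false
	}
	return len(obj.GetOwnerReferences()) == 0
}
```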
Closing; consolidating PVC data restore in-place into #7481. For single immutable item recreate, the use case is currently not being considered.
HackMD where we discussed use cases/solutions: https://hackmd.io/KKWcslVSR3yysufvC76MHg#Solutions-for-usecase
Signed-off-by: Tiger Kaovilai [email protected]
Thank you for contributing to Velero!
Please add a summary of your change
Does your change fix a particular issue?
Fixes #6142
Please indicate you've done the following:
- Added `/kind changelog-not-required` as a comment on this pull request.
- Updated the corresponding documentation in `site/content/docs/main`.