Implement recreate ExistingResourcePolicy to restore API #6354

Closed
1 change: 1 addition & 0 deletions changelogs/unreleased/6354-kaovilai
@@ -0,0 +1 @@
Implement recreate ExistingResourcePolicy to restore API to enable restore for immutable resources such as Pods.
23 changes: 22 additions & 1 deletion config/crd/v1/bases/velero.io_restores.yaml
@@ -55,9 +55,30 @@
type: array
existingResourcePolicy:
description: ExistingResourcePolicy specifies the restore behavior
for the Kubernetes resource to be restored
for the kubernetes resource to be restored
enum:
- none
- update
- recreate
nullable: true
type: string
existingResourcePolicyRecreateResources:
description: ExistingResourcePolicyRecreateResources is a slice of
resource names where recreate ExistingResourcePolicy will apply.
Resource names can be in the format of "group/version/resource"
or "resource" If empty, a default list of resources (Pod, PV, Secret,
ConfigMap) will be used. Resources with finalizers without velero.io/recreate-remove-finalizer=true
annotation will not be recreated. Recreate will not recreate resources
with ownerReferences.
example:
- pods
- persistentvolumes
- secrets
- configmaps
items:
type: string
nullable: true
type: array
hooks:
description: Hooks represent custom behaviors that should be executed
during or post restore.
2 changes: 1 addition & 1 deletion config/crd/v1/crds/crds.go

Large diffs are not rendered by default.

44 changes: 37 additions & 7 deletions design/Implemented/existing-resource-policy_design.md
@@ -20,7 +20,6 @@

## Non Goals
- Change existing restore workflow for `ServiceAccount` objects
- Add support for `ExistingResourcePolicy` as `recreate` for Kubernetes resources. (Future scope feature)

## Unrelated Proposals (Completely different functionalities than the one proposed in the design)
- Add support for `ExistingResourcePolicy` to restore API for Non-Kubernetes resources.
@@ -55,10 +54,9 @@
- Changed resources: Velero will first try to patch the changed resource, Now if the patch:
- succeeds: Then the in-cluster resource gets updated with the labels as well as the resource diff
- fails: Velero adds a restore warning and tries to just update the backup/restore labels on the resource, if the labels patch also fails then we add restore error.
3. `recreate`: If resource already exists, then Velero will delete it and recreate the resource.
3. `recreate`: Similar to `update`, but if the resource already exists and is immutable (such as Pods), Velero will attempt to recreate the resource by deleting it first. More details below.

*Note:* The `recreate` option is a non-goal for this enhancement proposal, but it is considered as a future scope.
Another thing to highlight is that Velero will not be deleting any resources in any of the policy options proposed in
Velero will not be deleting any resources for the `update` and `none` policies proposed in
this design, but Velero will patch the resources in the `update` policy option.

Example:
@@ -140,22 +138,38 @@
The `existingResourcePolicy` spec field will be a `PolicyType` field.

Restore API:
```
```go
type RestoreSpec struct {
.
.
.
// ExistingResourcePolicy specifies the restore behaviour for the Kubernetes resource to be restored
// +kubebuilder:validation:Enum=none;update;recreate
// +optional
// +nullable
ExistingResourcePolicy PolicyType

}
```
PolicyType:
```
```go
type PolicyType string
const PolicyTypeNone PolicyType = "none"
const PolicyTypePatch PolicyType = "update"
const PolicyTypePatch PolicyType = "recreate"
```

existingResourcePolicyRecreateResources will be a new field in the Restore API: a slice of resource names to which the recreate ExistingResourcePolicy will apply. Resource names can be in the format of "group/version/resource" or "resource". If empty, a default list of resources (Pod, PV, Secret, ConfigMap) will be used. Recreate will not recreate resources with ownerReferences, or resources whose finalizers are not accompanied by the `velero.io/recreate-remove-finalizer=true` annotation.

```go
// ExistingResourcePolicyRecreateResources is a slice of resource names where recreate ExistingResourcePolicy will apply.
// Resource names can be in the format of "group/version/resource" or "resource"
// If empty, a default list of resources (Pod, PV, Secret, ConfigMap) will be used.
// Resources with finalizers without velero.io/recreate-remove-finalizer=true annotation will not be recreated.
// Recreate will not recreate resources with ownerReferences.
// +kubebuilder:example:={"pods", "persistentvolumes", "secrets", "configmaps"}
// +optional
// +nullable
ExistingResourcePolicyRecreateResources []string `json:"existingResourcePolicyRecreateResources,omitempty"`
```
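
For illustration, a minimal sketch of a `Restore` object that opts into the `recreate` policy for a subset of resources, built with the API types above. The namespace, object and backup names, and the helper function are hypothetical; the `ExistingResourcePolicyRecreateResources` field only exists once this change is merged.

```go
package main

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

	velerov1api "github.com/vmware-tanzu/velero/pkg/apis/velero/v1"
)

// newRecreateRestore builds a Restore that patches changed resources and, for
// the listed resource types only, falls back to delete-and-recreate when the
// patch fails (for example, immutable Pod spec fields).
func newRecreateRestore(backupName string) *velerov1api.Restore {
	return &velerov1api.Restore{
		ObjectMeta: metav1.ObjectMeta{
			Namespace: "velero",           // hypothetical install namespace
			Name:      "restore-recreate", // hypothetical restore name
		},
		Spec: velerov1api.RestoreSpec{
			BackupName:             backupName,
			ExistingResourcePolicy: velerov1api.PolicyTypeRecreate,
			// An empty slice would fall back to the documented default list
			// (Pod, PV, Secret, ConfigMap).
			ExistingResourcePolicyRecreateResources: []string{"pods", "persistentvolumes"},
		},
	}
}
```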

### Approach 2: Add a new spec field `existingResourcePolicyConfig` to the Restore API
@@ -259,4 +273,20 @@
We have decided to go ahead with the implementation of Approach 1 as:
- It is easier to implement
- It is also easier to scale and leaves room for improvement and the door open to expanding to approach 3
- It also provides an option to preserve the existing velero restore workflow

### Recreate policy details
The purpose of the existing resource restore policy *Recreate* is to help users restore resources that already exist in the cluster but differ from the backed-up resource, are immutable, and therefore fail to be updated.

The policy will be used to delete the existing resource and recreate it from the backup resource.

It is intended for resources that are immutable and therefore cannot first be updated.

#### Restrictions and Limitations
Resources with OwnerReferences will not be supported by the recreate policy, at least for the initial implementation.
The policy will only be applied to resources listed in `Restore.Spec.ExistingResourcePolicyRecreateResources`, such as Pods, PVCs, and PVs.

If an existing resource has finalizers, it must carry the annotation `velero.io/recreate-remove-finalizer=true` to be considered for recreation.
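
The annotation gate described above is evaluated roughly as in the condensed sketch below, which mirrors the check added to `pkg/restore/restore.go` in this PR; the standalone helper is hypothetical.

```go
package restore

import (
	"strconv"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
)

// shouldForceRemoveFinalizers reports whether an existing in-cluster object
// has opted in to having its finalizers cleared before it is recreated.
func shouldForceRemoveFinalizers(fromCluster *unstructured.Unstructured) bool {
	// A nil annotation map simply yields "", which ParseBool rejects.
	val := fromCluster.GetAnnotations()["velero.io/recreate-remove-finalizer"]
	removeFinalizer, err := strconv.ParseBool(val)
	return err == nil && removeFinalizer && len(fromCluster.GetFinalizers()) > 0
}
```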


*Note:* The `recreate` option is implemented in [#6354](https://github.com/vmware-tanzu/velero/pull/6354) separate from original update policy proposal.
16 changes: 15 additions & 1 deletion pkg/apis/velero/v1/restore_types.go
@@ -109,11 +109,22 @@
// +optional
Hooks RestoreHooks `json:"hooks,omitempty"`

// ExistingResourcePolicy specifies the restore behavior for the Kubernetes resource to be restored
// ExistingResourcePolicy specifies the restore behavior for the kubernetes resource to be restored
// +kubebuilder:validation:Enum=none;update;recreate
// +optional
// +nullable
ExistingResourcePolicy PolicyType `json:"existingResourcePolicy,omitempty"`

// ExistingResourcePolicyRecreateResources is a slice of resource names where recreate ExistingResourcePolicy will apply.
// Resource names can be in the format of "group/version/resource" or "resource"
// If empty, a default list of resources (Pod, PV, Secret, ConfigMap) will be used.
// Resources with finalizers without velero.io/recreate-remove-finalizer=true annotation will not be recreated.
// Recreate will not recreate resources with ownerReferences.
// +kubebuilder:example:={"pods", "persistentvolumes", "secrets", "configmaps"}
// +optional
// +nullable
ExistingResourcePolicyRecreateResources []string `json:"existingResourcePolicyRecreateResources,omitempty"`

// ItemOperationTimeout specifies the time used to wait for RestoreItemAction operations
// The default value is 4 hour.
// +optional
@@ -309,6 +320,9 @@
// PolicyTypeUpdate means velero will try to attempt a patch on
// the changed resources.
PolicyTypeUpdate PolicyType = "update"

// PolicyTypeRecreate means velero will attempt a patch on the changed resource and fall back to recreating the resource when the patch fails.
PolicyTypeRecreate PolicyType = "recreate"
)

// RestoreStatus captures the current status of a Velero restore
5 changes: 5 additions & 0 deletions pkg/apis/velero/v1/zz_generated.deepcopy.go

Some generated files are not rendered by default.

6 changes: 3 additions & 3 deletions pkg/cmd/cli/restore/create.go
@@ -123,7 +123,7 @@
flags.Var(&o.Labels, "labels", "Labels to apply to the restore.")
flags.Var(&o.IncludeResources, "include-resources", "Resources to include in the restore, formatted as resource.group, such as storageclasses.storage.k8s.io (use '*' for all resources).")
flags.Var(&o.ExcludeResources, "exclude-resources", "Resources to exclude from the restore, formatted as resource.group, such as storageclasses.storage.k8s.io.")
flags.StringVar(&o.ExistingResourcePolicy, "existing-resource-policy", "", "Restore Policy to be used during the restore workflow, can be - none or update")
flags.StringVar(&o.ExistingResourcePolicy, "existing-resource-policy", "", "Restore Policy to be used during the restore workflow, can be - none, update, or recreate")
flags.Var(&o.StatusIncludeResources, "status-include-resources", "Resources to include in the restore status, formatted as resource.group, such as storageclasses.storage.k8s.io.")
flags.Var(&o.StatusExcludeResources, "status-exclude-resources", "Resources to exclude from the restore status, formatted as resource.group, such as storageclasses.storage.k8s.io.")
flags.VarP(&o.Selector, "selector", "l", "Only restore resources matching this label selector.")
@@ -197,7 +197,7 @@
}

if len(o.ExistingResourcePolicy) > 0 && !isResourcePolicyValid(o.ExistingResourcePolicy) {
return errors.New("existing-resource-policy has invalid value, it accepts only none, update as value")
return errors.New("existing-resource-policy has invalid value, it accepts only none, update, or recreate as value")

Check warning on line 200 in pkg/cmd/cli/restore/create.go

View check run for this annotation

Codecov / codecov/patch

pkg/cmd/cli/restore/create.go#L200

Added line #L200 was not covered by tests
}

switch {
@@ -422,7 +422,7 @@
}

func isResourcePolicyValid(resourcePolicy string) bool {
if resourcePolicy == string(api.PolicyTypeNone) || resourcePolicy == string(api.PolicyTypeUpdate) {
if resourcePolicy == string(api.PolicyTypeNone) || resourcePolicy == string(api.PolicyTypeUpdate) || resourcePolicy == string(api.PolicyTypeRecreate) {
return true
}
return false
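
The extended validator above is a natural spot for a small regression test covering the newly accepted `recreate` value; a hypothetical table-driven sketch in the same CLI `restore` package:

```go
package restore

import "testing"

// Hypothetical test exercising the extended validator, including the new
// "recreate" value accepted by this PR.
func TestIsResourcePolicyValid(t *testing.T) {
	cases := map[string]bool{
		"none":     true,
		"update":   true,
		"recreate": true,
		"replace":  false,
		"":         false,
	}
	for policy, want := range cases {
		if got := isResourcePolicyValid(policy); got != want {
			t.Errorf("isResourcePolicyValid(%q) = %v, want %v", policy, got, want)
		}
	}
}
```
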
4 changes: 3 additions & 1 deletion pkg/controller/restore_controller.go
@@ -90,7 +90,9 @@ var nonRestorableResources = []string{
"backuprepositories.velero.io",
}

var ExternalResourcesFinalizer = "restores.velero.io/external-resources-finalizer"
const (
ExternalResourcesFinalizer = "restores.velero.io/external-resources-finalizer"
)

type restoreReconciler struct {
ctx context.Context
99 changes: 91 additions & 8 deletions pkg/restore/restore.go
@@ -25,6 +25,7 @@
"os/signal"
"path/filepath"
"sort"
"strconv"
"strings"
"sync"
"time"
@@ -1621,20 +1622,49 @@
ctx.log.Infof("restore API has resource policy defined %s , executing restore workflow accordingly for changed resource %s %s", resourcePolicy, fromCluster.GroupVersionKind().Kind, kube.NamespaceAndName(fromCluster))

// existingResourcePolicy is set as none, add warning
if resourcePolicy == velerov1api.PolicyTypeNone {
e := errors.Errorf("could not restore, %s %q already exists. Warning: the in-cluster version is different than the backed-up version",
obj.GetKind(), obj.GetName())
warnings.Add(namespace, e)
// existingResourcePolicy is set as update, attempt patch on the resource and add warning if it fails
} else if resourcePolicy == velerov1api.PolicyTypeUpdate {
couldNotRestoreAlreadyExistsError := errors.Errorf("could not restore, %s %q already exists. Warning: the in-cluster version is different than the backed-up version",
obj.GetKind(), obj.GetName())
switch resourcePolicy {
case velerov1api.PolicyTypeNone:
warnings.Add(namespace, couldNotRestoreAlreadyExistsError)
case velerov1api.PolicyTypeUpdate, velerov1api.PolicyTypeRecreate:
// existingResourcePolicy is set as update or recreate, attempt patch on the resource and add warning if it fails
// processing update as existingResourcePolicy
warningsFromUpdateRP, errsFromUpdateRP := ctx.processUpdateResourcePolicy(fromCluster, fromClusterWithLabels, obj, namespace, resourceClient)
if warningsFromUpdateRP.IsEmpty() && errsFromUpdateRP.IsEmpty() {
itemStatus.action = itemRestoreResultUpdated
ctx.restoredItems[itemKey] = itemStatus
}
warnings.Merge(&warningsFromUpdateRP)
errs.Merge(&errsFromUpdateRP)
if !errsFromUpdateRP.IsEmpty() && resourcePolicy == velerov1api.PolicyTypeRecreate {
restoreHasRecreateResourceName := false
fromClusterHasOwnerRef := len(fromCluster.GetOwnerReferences()) > 0
// Only process objects that has no owner references
if !fromClusterHasOwnerRef {
for i, recreateResourceName := range ctx.restore.Spec.ExistingResourcePolicyRecreateResources {
if recreateResourceName == groupResource.String() {
restoreHasRecreateResourceName = true
break
}
if i == len(ctx.restore.Spec.ExistingResourcePolicyRecreateResources)-1 {
warnings.Add(namespace, couldNotRestoreAlreadyExistsError)
}
}
if restoreHasRecreateResourceName {
// if resourcePolicy is recreate, attempt to recreate if patch had errors
ctx.log.Infof("patch attempt had errors, falling back to recreate due to recreate existingResourcePolicy for %s %s", fromCluster.GroupVersionKind().Kind, kube.NamespaceAndName(fromCluster))
warningsFromRecreateRP, errsFromRecreateRP := ctx.processRecreateResourcePolicy(fromCluster, fromClusterWithLabels, obj, namespace, resourceClient)
if warningsFromRecreateRP.IsEmpty() && errsFromRecreateRP.IsEmpty() {
itemStatus.action = itemRestoreResultUpdated
ctx.restoredItems[itemKey] = itemStatus
}
warnings.Merge(&warningsFromRecreateRP)
errs.Merge(&errsFromRecreateRP)
}
}
} else {
warnings.Merge(&warningsFromUpdateRP)
errs.Merge(&errsFromUpdateRP)
}
}
} else {
// Preserved Velero behavior when existingResourcePolicy is not specified by the user
@@ -2514,3 +2544,56 @@

return updatedObj, nil
}

const RecreateRemoveFinalizerAnnotation = "velero.io/recreate-remove-finalizer"

// function to process existingResourcePolicy as recreate, tries to delete object first before restoring obj
func (ctx *restoreContext) processRecreateResourcePolicy(fromCluster, fromClusterWithLabels, obj *unstructured.Unstructured, namespace string, resourceClient client.Dynamic) (warnings, errs results.Result) {
ctx.log.Infof("restore API has existingResourcePolicy defined as recreate , executing restore workflow accordingly for changed resource %s %s ", obj.GroupVersionKind().Kind, kube.NamespaceAndName(fromCluster))
ctx.log.Infof("attempting recreate on %s %q", fromCluster.GetKind(), fromCluster.GetName())
// TODO: check for fromCluster for finalizers, if present, check for annotation to force deletion.
if removeFinalizer, err := strconv.ParseBool(fromCluster.GetAnnotations()[RecreateRemoveFinalizerAnnotation]); err == nil &&
removeFinalizer &&
len(fromCluster.GetFinalizers()) > 0 {
ctx.log.Infof("force deletion of finalizers for %s %q from annotation", fromCluster.GetKind(), fromCluster.GetName())
// patch remove finalizers
_, err := resourceClient.Patch(fromCluster.GetName(), []byte((`{"metadata": {"finalizers": []}}`)))
if err != nil {
ctx.log.Warnf("force deletion of finalizers attempt failed for %s %s: %v", fromCluster.GroupVersionKind(), kube.NamespaceAndName(fromCluster), err)
warnings.Add(namespace, err)
} else {
ctx.log.Infof("force deletion of finalizers successfully updated for %s %s", fromCluster.GroupVersionKind().Kind, kube.NamespaceAndName(fromCluster))
}
}
// try to delete the object in cluster
err := resourceClient.Delete(obj.GetName(), metav1.DeleteOptions{})
if err != nil {
if !apierrors.IsNotFound(err) {
ctx.log.Errorf("delete attempt failed for %s %s: %v", fromCluster.GroupVersionKind(), kube.NamespaceAndName(fromCluster), err)
}
}
// wait up to 2 minutes until object does not exists in cluster
wait.PollImmediate(5*time.Second, 2*time.Minute, func() (bool, error) {
_, err = resourceClient.Get(obj.GetName(), metav1.GetOptions{})
if !apierrors.IsNotFound(err) {
ctx.log.Infof("waiting until %s %s IsNotFound for recreate existingResourcePolicy. current err from get: %v", fromCluster.GroupVersionKind(), kube.NamespaceAndName(fromCluster), err)
return false, nil
}
return true, nil
})

// Create object from latest backup/restore)
obj.SetNamespace(namespace)
_, err = resourceClient.Create(obj)
Contributor commented: If the above polling times out, the object still exists and `resourceClient.Create` will definitely fail, so maybe we don't need to go through with this attempt, but should instead break by adding the error and explicitly saying that the deletion of the existing object failed.

Contributor Author replied: Thanks for the feedback!
if err != nil {
ctx.log.Warnf("create attempt failed for %s %s: %v", fromCluster.GroupVersionKind(), kube.NamespaceAndName(fromCluster), err)
warnings.Add(namespace, err)
// try just patching the labels
warningsFromUpdate, errsFromUpdate := ctx.updateBackupRestoreLabels(fromCluster, fromClusterWithLabels, namespace, resourceClient)
warnings.Merge(&warningsFromUpdate)
errs.Merge(&errsFromUpdate)
} else {
ctx.log.Infof("%s %s successfully recreated", obj.GroupVersionKind().Kind, kube.NamespaceAndName(obj))
}
return warnings, errs

Check warning on line 2598 in pkg/restore/restore.go

View check run for this annotation

Codecov / codecov/patch

pkg/restore/restore.go#L2586-L2598

Added lines #L2586 - L2598 were not covered by tests
}
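
The review thread above suggests not attempting the `Create` when deletion never completes. A minimal sketch of that alternative, reusing the variables and helpers already in scope in the function above; the exact error wording is hypothetical.

```go
// Sketch: bail out with an explicit error if the existing object is still
// present after the deletion poll, instead of issuing a Create that is
// bound to fail.
if pollErr := wait.PollImmediate(5*time.Second, 2*time.Minute, func() (bool, error) {
	_, getErr := resourceClient.Get(obj.GetName(), metav1.GetOptions{})
	return apierrors.IsNotFound(getErr), nil
}); pollErr != nil {
	errs.Add(namespace, errors.Errorf("deletion of existing %s %q did not complete; skipping recreate",
		fromCluster.GetKind(), fromCluster.GetName()))
	return warnings, errs
}
```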