Skip to content

Commit

Permalink
Implement recreate ExistingResourcePolicy to restore API
Browse files Browse the repository at this point in the history
Signed-off-by: Tiger Kaovilai <[email protected]>

use wait.PollImmediate

addresses vmware-tanzu#6354 (review)

Signed-off-by: Tiger Kaovilai <[email protected]>
  • Loading branch information
kaovilai committed Mar 1, 2024
1 parent 1579842 commit f4a20fc
Show file tree
Hide file tree
Showing 10 changed files with 149 additions and 21 deletions.
1 change: 1 addition & 0 deletions changelogs/unreleased/6354-kaovilai
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Implement recreate ExistingResourcePolicy to restore API to enable restore for immutable resources such as Pods.
23 changes: 22 additions & 1 deletion config/crd/v1/bases/velero.io_restores.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -55,9 +55,30 @@ spec:
type: array
existingResourcePolicy:
description: ExistingResourcePolicy specifies the restore behavior
for the Kubernetes resource to be restored
for the kubernetes resource to be restored
enum:
- none
- update
- recreate
nullable: true
type: string
existingResourcePolicyRecreateResources:
description: ExistingResourcePolicyRecreateResources is a slice of
resource names where recreate ExistingResourcePolicy will apply.
Resoruce names can be in the format of "group/version/resource"
or "resource" If empty, a default list of resources (Pod, PV, Secret,
ConfigMap) will be used. Resources with finalizers without velero.io/recreate-remove-finalizer=true
annotation will not be recreated. Recreate will not recreate resources
with ownershipReferences.
example:
- pods
- persistentvolumes
- secrets
- configmaps
items:
type: string
nullable: true
type: array
hooks:
description: Hooks represent custom behaviors that should be executed
during or post restore.
Expand Down
2 changes: 1 addition & 1 deletion config/crd/v1/crds/crds.go

Large diffs are not rendered by default.

44 changes: 37 additions & 7 deletions design/Implemented/existing-resource-policy_design.md
Original file line number Diff line number Diff line change
Expand Up @@ -20,7 +20,6 @@ Related issue: https://github.com/vmware-tanzu/velero/issues/4066

## Non Goals
- Change existing restore workflow for `ServiceAccount` objects
- Add support for `ExistingResourcePolicy` as `recreate` for Kubernetes resources. (Future scope feature)

## Unrelated Proposals (Completely different functionalities than the one proposed in the design)
- Add support for `ExistingResourcePolicy` to restore API for Non-Kubernetes resources.
Expand Down Expand Up @@ -55,10 +54,9 @@ skip restoration.
- Changed resources: Velero will first try to patch the changed resource, Now if the patch:
- succeeds: Then the in-cluster resource gets updated with the labels as well as the resource diff
- fails: Velero adds a restore warning and tries to just update the backup/restore labels on the resource, if the labels patch also fails then we add restore error.
3. `recreate`: If resource already exists, then Velero will delete it and recreate the resource.
3. `recreate`: Similar to `update` but if resource already exists and is immutable (such as Pods), then Velero attempt to recreate the resource by deleting it first. More details below.

*Note:* The `recreate` option is a non-goal for this enhancement proposal, but it is considered as a future scope.
Another thing to highlight is that Velero will not be deleting any resources in any of the policy options proposed in
Velero will not be deleting any resources for `update` and `none` policy proposed in
this design but Velero will patch the resources in `update` policy option.

Example:
Expand Down Expand Up @@ -140,22 +138,38 @@ existingResourcePolicyOverrides:
The `existingResourcePolicy` spec field will be an `PolicyType` type field.

Restore API:
```
```go
type RestoreSpec struct {
.
.
.
// ExistingResourcePolicy specifies the restore behaviour for the Kubernetes resource to be restored
// +kubebuilder:validation:Enum=none;update;recreate
// +optional
// +nullable
ExistingResourcePolicy PolicyType

}
```
PolicyType:
```
```go
type PolicyType string
const PolicyTypeNone PolicyType = "none"
const PolicyTypePatch PolicyType = "update"
const PolicyTypePatch PolicyType = "recreate"
```

existingResourcePolicyRecreateResources will be a new field in the Restore API which will be a slice of resource names where recreate ExistingResourcePolicy will apply. Resoruce names can be in the format of "group/version/resource" or "resource". If empty, a default list of resources (Pod, PV, Secret, ConfigMap) will be used. Recreate will not recreate resources with ownershipReferences or finalizers.
```go
// ExistingResourcePolicyRecreateResources is a slice of resource names where recreate ExistingResourcePolicy will apply.
// Resoruce names can be in the format of "group/version/resource" or "resource"
// If empty, a default list of resources (Pod, PV, Secret, ConfigMap) will be used.
// Resources with finalizers without velero.io/recreate-remove-finalizer=true annotation will not be recreated.
// Recreate will not recreate resources with ownershipReferences.
// +kubebuilder:example:={"pods", "persistentvolumes", "secrets", "configmaps"}
// +optional
// +nullable
ExistingResourcePolicyRecreateResources []string `json:"existingResourcePolicyRecreateResources,omitempty"`
```

### Approach 2: Add a new spec field `existingResourcePolicyConfig` to the Restore API
Expand Down Expand Up @@ -259,4 +273,20 @@ Restore describer will also be updated to reflect the policy `pkg/cmd/util/outpu
We have decided to go ahead with the implementation of Approach 1 as:
- It is easier to implement
- It is also easier to scale and leaves room for improvement and the door open to expanding to approach 3
- It also provides an option to preserve the existing velero restore workflow
- It also provides an option to preserve the existing velero restore workflow

### Recreate policy details
Existing resource restore policy *Recreate* purpose is to help users restore existing resources that exists in cluster but differ from backup resouce, are immutable and failed to be updated due to immutability.

The policy will be used to delete the existing resource and recreate it with the backup resource.

The policy will be used for resources that are immutable and cannot first be updated.

#### Restrictions and Limitations
Resources with OwnerReferences will not be supported by the recreate policy, at least for the initial implementation.
The policy will only be used on resources specified at Restore.Spec. like Pods, PVCs, PVs, etc.

If an existing resource have finalizers, it would need to have annotations `velero.io/recreate-remove-finalizer=true` to be considered for recreation.


*Note:* The `recreate` option is implemented in [#6354](https://github.com/vmware-tanzu/velero/pull/6354) separate from original update policy proposal.
16 changes: 15 additions & 1 deletion pkg/apis/velero/v1/restore_types.go
Original file line number Diff line number Diff line change
Expand Up @@ -109,11 +109,22 @@ type RestoreSpec struct {
// +optional
Hooks RestoreHooks `json:"hooks,omitempty"`

// ExistingResourcePolicy specifies the restore behavior for the Kubernetes resource to be restored
// ExistingResourcePolicy specifies the restore behavior for the kubernetes resource to be restored
// +kubebuilder:validation:Enum=none;update;recreate
// +optional
// +nullable
ExistingResourcePolicy PolicyType `json:"existingResourcePolicy,omitempty"`

// ExistingResourcePolicyRecreateResources is a slice of resource names where recreate ExistingResourcePolicy will apply.
// Resoruce names can be in the format of "group/version/resource" or "resource"
// If empty, a default list of resources (Pod, PV, Secret, ConfigMap) will be used.
// Resources with finalizers without velero.io/recreate-remove-finalizer=true annotation will not be recreated.
// Recreate will not recreate resources with ownershipReferences.
// +kubebuilder:example:={"pods", "persistentvolumes", "secrets", "configmaps"}
// +optional
// +nullable
ExistingResourcePolicyRecreateResources []string `json:"existingResourcePolicyRecreateResources,omitempty"`

// ItemOperationTimeout specifies the time used to wait for RestoreItemAction operations
// The default value is 4 hour.
// +optional
Expand Down Expand Up @@ -309,6 +320,9 @@ const (
// PolicyTypeUpdate means velero will try to attempt a patch on
// the changed resources.
PolicyTypeUpdate PolicyType = "update"

// PolicyTypeRecreate means velero attempt a patch on changed resource and fall back to recreating the resource when patch fails.
PolicyTypeRecreate PolicyType = "recreate"
)

// RestoreStatus captures the current status of a Velero restore
Expand Down
5 changes: 5 additions & 0 deletions pkg/apis/velero/v1/zz_generated.deepcopy.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

6 changes: 3 additions & 3 deletions pkg/cmd/cli/restore/create.go
Original file line number Diff line number Diff line change
Expand Up @@ -123,7 +123,7 @@ func (o *CreateOptions) BindFlags(flags *pflag.FlagSet) {
flags.Var(&o.Labels, "labels", "Labels to apply to the restore.")
flags.Var(&o.IncludeResources, "include-resources", "Resources to include in the restore, formatted as resource.group, such as storageclasses.storage.k8s.io (use '*' for all resources).")
flags.Var(&o.ExcludeResources, "exclude-resources", "Resources to exclude from the restore, formatted as resource.group, such as storageclasses.storage.k8s.io.")
flags.StringVar(&o.ExistingResourcePolicy, "existing-resource-policy", "", "Restore Policy to be used during the restore workflow, can be - none or update")
flags.StringVar(&o.ExistingResourcePolicy, "existing-resource-policy", "", "Restore Policy to be used during the restore workflow, can be - none, update, or recreate")
flags.Var(&o.StatusIncludeResources, "status-include-resources", "Resources to include in the restore status, formatted as resource.group, such as storageclasses.storage.k8s.io.")
flags.Var(&o.StatusExcludeResources, "status-exclude-resources", "Resources to exclude from the restore status, formatted as resource.group, such as storageclasses.storage.k8s.io.")
flags.VarP(&o.Selector, "selector", "l", "Only restore resources matching this label selector.")
Expand Down Expand Up @@ -197,7 +197,7 @@ func (o *CreateOptions) Validate(c *cobra.Command, args []string, f client.Facto
}

if len(o.ExistingResourcePolicy) > 0 && !isResourcePolicyValid(o.ExistingResourcePolicy) {
return errors.New("existing-resource-policy has invalid value, it accepts only none, update as value")
return errors.New("existing-resource-policy has invalid value, it accepts only none, update, or recreate as value")
}

switch {
Expand Down Expand Up @@ -422,7 +422,7 @@ func (o *CreateOptions) Run(c *cobra.Command, f client.Factory) error {
}

func isResourcePolicyValid(resourcePolicy string) bool {
if resourcePolicy == string(api.PolicyTypeNone) || resourcePolicy == string(api.PolicyTypeUpdate) {
if resourcePolicy == string(api.PolicyTypeNone) || resourcePolicy == string(api.PolicyTypeUpdate) || resourcePolicy == string(api.PolicyTypeRecreate) {
return true
}
return false
Expand Down
5 changes: 4 additions & 1 deletion pkg/controller/restore_controller.go
Original file line number Diff line number Diff line change
Expand Up @@ -90,7 +90,10 @@ var nonRestorableResources = []string{
"backuprepositories.velero.io",
}

var ExternalResourcesFinalizer = "restores.velero.io/external-resources-finalizer"
const (
ExternalResourcesFinalizer = "restores.velero.io/external-resources-finalizer"
RecreateRemoveFinalizerAnnotation = "velero.io/recreate-remove-finalizer"
)

type restoreReconciler struct {
ctx context.Context
Expand Down
60 changes: 55 additions & 5 deletions pkg/restore/restore.go
Original file line number Diff line number Diff line change
Expand Up @@ -1621,20 +1621,33 @@ func (ctx *restoreContext) restoreItem(obj *unstructured.Unstructured, groupReso
ctx.log.Infof("restore API has resource policy defined %s , executing restore workflow accordingly for changed resource %s %s", resourcePolicy, fromCluster.GroupVersionKind().Kind, kube.NamespaceAndName(fromCluster))

// existingResourcePolicy is set as none, add warning
if resourcePolicy == velerov1api.PolicyTypeNone {
switch resourcePolicy {
case velerov1api.PolicyTypeNone:
e := errors.Errorf("could not restore, %s %q already exists. Warning: the in-cluster version is different than the backed-up version",
obj.GetKind(), obj.GetName())
warnings.Add(namespace, e)
// existingResourcePolicy is set as update, attempt patch on the resource and add warning if it fails
} else if resourcePolicy == velerov1api.PolicyTypeUpdate {
case velerov1api.PolicyTypeUpdate, velerov1api.PolicyTypeRecreate:
// existingResourcePolicy is set as update or recreate, attempt patch on the resource and add warning if it fails
// processing update as existingResourcePolicy
warningsFromUpdateRP, errsFromUpdateRP := ctx.processUpdateResourcePolicy(fromCluster, fromClusterWithLabels, obj, namespace, resourceClient)
if warningsFromUpdateRP.IsEmpty() && errsFromUpdateRP.IsEmpty() {
itemStatus.action = itemRestoreResultUpdated
ctx.restoredItems[itemKey] = itemStatus
}
warnings.Merge(&warningsFromUpdateRP)
errs.Merge(&errsFromUpdateRP)
// if resourcePolicy is recreate, attempt to recreate if patch had errors
if !errsFromUpdateRP.IsEmpty() && resourcePolicy == velerov1api.PolicyTypeRecreate {
ctx.log.Infof("patch attempt had errors, falling back to recreate due to recreate existingResourcePolicy for %s %s", fromCluster.GroupVersionKind().Kind, kube.NamespaceAndName(fromCluster))
warningsFromRecreateRP, errsFromRecreateRP := ctx.processRecreateResourcePolicy(fromCluster, fromClusterWithLabels, obj, namespace, resourceClient)
if warningsFromRecreateRP.IsEmpty() && errsFromRecreateRP.IsEmpty() {
itemStatus.action = itemRestoreResultUpdated
ctx.restoredItems[itemKey] = itemStatus
}
warnings.Merge(&warningsFromRecreateRP)
errs.Merge(&errsFromRecreateRP)
} else {
warnings.Merge(&warningsFromUpdateRP)
errs.Merge(&errsFromUpdateRP)
}
}
} else {
// Preserved Velero behavior when existingResourcePolicy is not specified by the user
Expand Down Expand Up @@ -2514,3 +2527,40 @@ func (ctx *restoreContext) handleSkippedPVHasRetainPolicy(

return updatedObj, nil
}

// function to process existingResourcePolicy as recreate, tries to delete object first before restoring obj
func (ctx *restoreContext) processRecreateResourcePolicy(fromCluster, fromClusterWithLabels, obj *unstructured.Unstructured, namespace string, resourceClient client.Dynamic) (warnings, errs results.Result) {
ctx.log.Infof("restore API has existingResourcePolicy defined as recreate , executing restore workflow accordingly for changed resource %s %s ", obj.GroupVersionKind().Kind, kube.NamespaceAndName(fromCluster))
ctx.log.Infof("attempting recreate on %s %q", fromCluster.GetKind(), fromCluster.GetName())
// try to delete the object in cluster
err := resourceClient.Delete(obj.GetName(), metav1.DeleteOptions{})
if err != nil {
if !apierrors.IsNotFound(err) {
ctx.log.Errorf("delete attempt failed for %s %s: %v", fromCluster.GroupVersionKind(), kube.NamespaceAndName(fromCluster), err)
}
}
// wait up to 2 minutes until object does not exists in cluster
wait.PollImmediate(5*time.Second, 2*time.Minute, func() (bool, error) {
_, err = resourceClient.Get(obj.GetName(), metav1.GetOptions{})
if !apierrors.IsNotFound(err) {
ctx.log.Infof("waiting until %s %s IsNotFound for recreate existingResourcePolicy. current err from get: %v", fromCluster.GroupVersionKind(), kube.NamespaceAndName(fromCluster), err)
return false, nil
}
return true, nil
})

// Create object from latest backup/restore)
obj.SetNamespace(namespace)
_, err = resourceClient.Create(obj)
if err != nil {
ctx.log.Warnf("create attempt failed for %s %s: %v", fromCluster.GroupVersionKind(), kube.NamespaceAndName(fromCluster), err)
warnings.Add(namespace, err)
// try just patching the labels
warningsFromUpdate, errsFromUpdate := ctx.updateBackupRestoreLabels(fromCluster, fromClusterWithLabels, namespace, resourceClient)
warnings.Merge(&warningsFromUpdate)
errs.Merge(&errsFromUpdate)
} else {
ctx.log.Infof("%s %s successfully recreated", obj.GroupVersionKind().Kind, kube.NamespaceAndName(obj))
}
return warnings, errs
}
Loading

0 comments on commit f4a20fc

Please sign in to comment.