
Add disabled PV re-provisioning by StorageClasses option on restore #8287

Open · wants to merge 2 commits into main
Conversation

@clcondorcet commented Oct 11, 2024

Thank you for contributing to Velero!

Please add a summary of your change

This PR addresses a specific disaster recovery use case:

When restoring a cluster after a disaster, PVs may or may not have associated snapshots.
However, the underlying volumes from the CSI driver remain intact. I want to relink the PVs to these existing volumes instead of creating new ones. This PR proposes a way to reuse existing volumes without recreating them.

In my case, the PVs have a Delete reclaim policy enforced by the StorageClass.
Velero currently does not restore PVs that have a Delete policy and no snapshot, which makes sense for regular backup/restore scenarios but is limiting in disaster recovery situations.

Proposed solution:

This PR introduces a new feature that allows Velero to restore PVs as-is when they meet the following conditions:

  • No snapshot is available.
  • The PV's StorageClass is listed in a newly introduced field in the CRD specifications called disabledPVReprovisioningStorageClasses.

This ensures that Velero can relink the PV to its existing volume and rebind the associated PVC, similar to how it restores PVs with a Retain reclaim policy.
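
To make the proposed field concrete, here is a minimal Go sketch of the spec addition, assuming it lands on Velero's Restore spec. Only the field name comes from this PR; the struct name and surrounding API details are illustrative, and the real RestoreSpec carries many more fields that are omitted here.

```go
package v1

// RestoreSpecSketch illustrates only the field proposed in this PR; Velero's
// real velero.io/v1 RestoreSpec contains many more fields, omitted here.
type RestoreSpecSketch struct {
	// DisabledPVReprovisioningStorageClasses lists StorageClass names whose
	// PersistentVolumes are restored as-is (relinked to their existing
	// volumes) instead of being re-provisioned, when no snapshot is
	// available for them.
	// +optional
	DisabledPVReprovisioningStorageClasses []string `json:"disabledPVReprovisioningStorageClasses,omitempty"`
}
```

A Restore object would then simply list the StorageClasses whose PVs should be kept rather than re-provisioned.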

Implementation Details:

This solution uses StorageClass names rather than PV names, as some CSI drivers generate random PV names.
Using StorageClasses also makes sense in this context because PVs sharing the same StorageClass generally have similar configurations.

This solution does not bypass snapshots. If snapshots are available, restoring from them is preferred as it is generally more reliable. However, this is open for discussion.
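
As a hedged sketch of that decision order (snapshot first, then the opt-in list), the helper below receives the PV and the configured list; the function and parameter names are illustrative only and do not reflect Velero's actual internals.

```go
package restore

import (
	corev1 "k8s.io/api/core/v1"
)

// shouldRestorePVAsIs sketches the decision described above: snapshots stay
// the preferred path, and only PVs whose StorageClass was explicitly opted
// in via disabledPVReprovisioningStorageClasses are kept as-is.
func shouldRestorePVAsIs(pv *corev1.PersistentVolume, disabledReprovisioningSCs []string, hasSnapshot bool) bool {
	if hasSnapshot {
		// Restoring from the snapshot is preferred whenever one exists.
		return false
	}
	for _, sc := range disabledReprovisioningSCs {
		if pv.Spec.StorageClassName == sc {
			// Keep the PV object so it can be relinked to the existing
			// volume and rebound to its PVC, as with a Retain policy.
			return true
		}
	}
	// Otherwise, fall back to Velero's current behavior for this PV.
	return false
}
```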

Does your change fix a particular issue?

There is no direct issue, but I found two existing issues that are close to what this PR adds.

Please indicate you've done the following:

Signed-off-by: Clément Rostagni <[email protected]>
@BarthV commented Oct 11, 2024

Awesome! We are really interested in this annotation in order to recover all our existing volumes.

@Lyndon-Li (Contributor) commented

However, the underlying volumes from the CSI driver remain intact

Does this describe a coincidental and unstable situation, where the CSI driver simply didn't have a chance to reclaim the PVs' storage volumes because of the disaster?
If so, the state of the volumes is determined by the specific storage backend, and it is unsafe to reuse them.

@Depyume
Copy link

Depyume commented Oct 12, 2024

However, the underlying volumes from the CSI driver remain intact

Does this describe a coincidental and unstable situation, where the CSI driver simply didn't have a chance to reclaim the PVs' storage volumes because of the disaster? If so, the state of the volumes is determined by the specific storage backend, and it is unsafe to reuse them.

I think the goal here is to support cases where the entire Kubernetes cluster is lost (e.g. burnt bare-metal servers) and you still want to recover volumes that may still exist on the storage backend, regardless of the reclaim policies of the backed-up volume objects.

Currently, Velero doesn't allow "recovering" / "reattaching" such volumes; it either ignores them or forces the recreation of empty volumes (expecting them to be restored from a snapshot).

@reasonerjt added the Needs triage label on Oct 14, 2024
Labels: has-changelog, has-unit-tests, Needs triage

5 participants