[BUG] Longhorn Snapshots are not deleted after expired Backups (Velero) #5802
Comments
I have also opened an issue in velero: vmware-tanzu/velero#6179 |
Would you mind providing the support bundle that includes the logs of the CSI sidecars? It may contain clues about whether the CSI external-snapshotter was triggered when Velero deleted the backups. |
@weizhe0422 here is the support bundle. Thanks for any help.
supportbundle_a3236774-99ca-4ab5-a2a5-74c925273bb4_2023-05-01T07-20-00Z.zip |
Related to #5797 ? |
@tcoupin thanks but this is not a solution because if I use |
I am also facing the same issue: deleting a Velero backup cleans up most of the resources related to that backup, such as backups.longhorn.io, volumesnapshotcontent, and volumesnapshot, but snapshot.longhorn.io is still present in the system. Backups started failing once the number of snapshot objects grew beyond ~250. Sharing both the Longhorn support bundle and the snapshot controller logs: longhorn-support-bundle_8afefff1-085e-4f4e-97ae-3f0a518555ab_2023-05-09T05-19-35Z.zip |
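To see how close each volume is to the ~250-snapshot point mentioned above, the snapshot objects can be counted per volume. The following is a minimal sketch, not part of Longhorn or Velero: the helper names are hypothetical, and it assumes that each Snapshot object records its volume name under `spec.volume`.

```python
import json
from collections import Counter


def count_snapshots_per_volume(snapshot_list_json: str) -> Counter:
    """Count Longhorn Snapshot objects per volume from a JSON list dump."""
    items = json.loads(snapshot_list_json).get("items", [])
    # Assumption: each Snapshot object names its volume in spec.volume.
    return Counter(
        item.get("spec", {}).get("volume", "<unknown>") for item in items
    )


def volumes_over_threshold(counts: Counter, threshold: int = 250) -> list:
    """Return the volumes whose snapshot count exceeds the threshold, sorted by name."""
    return sorted(volume for volume, n in counts.items() if n > threshold)
```

The input would be the output of `kubectl -n longhorn-system get snapshots.longhorn.io -o json`, read from a file or a pipe and passed in as a string.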
I do not use
|
@tcoupin, Thanks for your help, but this is just a workaround for me. |
When I create a backup via the Longhorn GUI and then delete this backup, no snapshots remain (everything works as expected). |
I reproduced the issue again by creating a VolumeSnapshotClass:

    apiVersion: snapshot.storage.k8s.io/v1
    kind: VolumeSnapshotClass
    metadata:
      name: longhorn
      namespace: longhorn-system
      labels:
        velero.io/csi-volumesnapshot-class: 'true'
    driver: driver.longhorn.io
    deletionPolicy: Delete
    parameters:
      type: bak

VolumeSnapshot:

    apiVersion: snapshot.storage.k8s.io/v1
    kind: VolumeSnapshot
    metadata:
      name: new-snapshot-test
      namespace: harbor
    spec:
      volumeSnapshotClassName: longhorn
      source:
        persistentVolumeClaimName: harbor-jobservice

Remove the
|
Today I upgraded Longhorn from v1.4.1 to v1.4.2 and the issue still occurs. 😔😔 |
Today I noticed that I have deployed a snapshot controller as described in this documentation.

VolumeSnapshotClass:

    apiVersion: snapshot.storage.k8s.io/v1
    kind: VolumeSnapshotClass
    metadata:
      name: longhorn
      namespace: longhorn-system
      labels:
        velero.io/csi-volumesnapshot-class: 'true'
    driver: driver.longhorn.io
    deletionPolicy: Delete
    parameters:
      type: bak

VolumeSnapshot:

    apiVersion: snapshot.storage.k8s.io/v1
    kind: VolumeSnapshot
    metadata:
      name: new-snapshot-test
      namespace: harbor
    spec:
      volumeSnapshotClassName: longhorn
      source:
        persistentVolumeClaimName: harbor-jobservice

Remove the

The same issue occurs, but there are some interesting logs: |
After I upgraded to the newest RKE2 Helm charts, the "finalizers" error logs mentioned above no longer appear, but the issue still occurs. I upgraded the following Helm releases:
Here the log messages:
Following error looks interesting:
|
@vineetsingh5 did you find a solution for this issue? |
@weizhe0422 did you find anything interesting in the support bundle? |
I'm having almost the same setup and versions and the same issue!
|
I updated to Velero v1.12.2, using velero-plugin-for-csi v0.6.2 and velero-plugin-for-aws v1.8.2, but the issue is still the same. |
I believe what Velero triggered was the deletion of the corresponding Longhorn backup (out-of-cluster), not the Longhorn snapshot (in-cluster, immutable COW layers for the volume). In the current design, deleting a backup is independent of deleting the snapshot that was created for that backup. What is your expectation here, or is there a feature you are looking for? Please check whether https://longhorn.io/docs/1.5.3/snapshots-and-backups/scheduling-backups-and-snapshots/ works for you, though this is our built-in mechanism and is unrelated to Velero. @mantissahz please follow up. |
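Since snapshot cleanup is decoupled from backup deletion in this design, leftover snapshots have to be pruned by a separate retention rule. The following is a minimal sketch of such a "keep the newest N per volume" rule; it is an illustration only, not Longhorn's implementation, and the function name and the `name`, `volume`, and `created` keys are hypothetical. Timestamps are assumed to be ISO-8601 UTC strings in one consistent format, so they sort correctly as plain strings.

```python
from collections import defaultdict


def snapshots_to_prune(snapshots, retain):
    """Return the names of snapshots beyond the newest `retain` per volume."""
    if retain < 0:
        raise ValueError("retain must be non-negative")
    by_volume = defaultdict(list)
    for snap in snapshots:
        by_volume[snap["volume"]].append(snap)
    prune = []
    for volume in sorted(by_volume):
        # Newest first; same-format ISO-8601 UTC strings order chronologically.
        ordered = sorted(by_volume[volume], key=lambda s: s["created"], reverse=True)
        prune.extend(s["name"] for s in ordered[retain:])
    return prune
```

The returned names are only candidates for deletion; whether a given snapshot can actually be removed is up to Longhorn.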
@innobead thanks for your reply.
|
I think this issue can be simplified to completely exclude Velero. At the core the issue here is that Longhorn does not delete snapshots or backups when the backing CSI
As a user of Longhorn that is interfacing with CSI and not native Longhorn resources, I expect the state of Longhorn resources to reflect the state of my CSI resources.
Therefore I think it's fair to state that Longhorn is currently only providing a partial implementation of the CSI interface/spec. Velero is just using this common CSI interface as it is intended to be used and expecting it to have the desired effect. This is not a Velero issue. Perhaps this should be opened as a new issue with a smaller scope (CSI spec conformance). |
Thanks for the valuable info. We will improve this, as it's quite important for space efficiency. |
Kind request for a status update. This issue makes using Velero in combination with Longhorn somewhat quirky, as we seem to have to clean up snapshots ourselves instead of every backup artefact being deleted when a Velero backup is removed by Velero's internal cleanup jobs. |
VolumeSnapshot type: snap
Both creation and deletion work.
Note that Longhorn won't delete the latest snapshot which is just behind the
VolumeSnapshot type: bak
Both creation and deletion work.
As mentioned by @innobead above and @PhanLe1010 in another thread (vmware-tanzu/velero#6179), there are two ways to automatically delete these snapshots.
VolumeSnapshot type: bi
Both creation and deletion work.
cc @R-Studio @julienvincent the csi volumesnapshot creation and deletion functions are implemented in Longhorn. I think we just have some confusion about Snapshot and Backup in Longhorn. |
@ChanYiLin thanks for the great summary. The new setting |
I am going to close the issue for now. |
Describe the bug (🐛 if you encounter this issue)
We are using Velero to create backups of the Kubernetes manifests and the persistent volumes (in our example we back up Harbor).
If we create a backup, Velero saves the K8s manifests to an object storage (MinIO) and creates snapshot resources to trigger Longhorn backups with the velero-plugin-for-csi. Longhorn writes the backups to another MinIO bucket.
If we delete a Velero backup or the backup expires, the snapshots (snapshots.longhorn.io) are not deleted:
We are using Velero v1.9.4 with the EnableCSI feature and the following plugins:
We have the same issue in Velero v1.11.0 with the EnableCSI feature and the following plugins:
To Reproduce
Steps to reproduce the behavior:
Schedule below): velero backup create --from-schedule harbor-daily-0200
velero backup delete <BACKUPNAME>
snapshots.longhorn.io) is not deleted.
Expected behavior
The snapshot is deleted.
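One way to confirm the behavior is to list the snapshot names before creating the backup and again after deleting it, then compare the two lists. The following is a minimal sketch of that comparison; the function name is hypothetical, and how the names are collected (for example from `kubectl get snapshots.longhorn.io`) is left to the reader.

```python
def leftover_snapshots(before, after):
    """Return snapshot names present after cleanup that did not exist before the backup."""
    return sorted(set(after) - set(before))
```

An empty result means the backup deletion left nothing behind; any returned name is a snapshot that was created during the test and survived the backup deletion.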
Environment
Additional context
Velero Backup Schedule for Harbor
VolumeSnapshotClass
VolumeSnapshotClass
In our second cluster, with Velero v1.11.0 installed, we created the following resource (but same issue here):
VolumeSnapshotLocation