[BUG] Longhorn Snapshots are not deleted after expired Backups (Velero) #5802

R-Studio · 2023-04-24T14:07:24Z

Describe the bug (🐛 if you encounter this issue)

We are using Velero to back up the Kubernetes manifests and the persistent volumes (in our example we back up Harbor).
When we create a backup, Velero saves the K8s manifests to an object store (MinIO) and creates snapshot resources that trigger Longhorn backups through the velero-plugin-for-csi. Longhorn writes the backups to another MinIO bucket.
When we delete a Velero backup, or the backup expires, the snapshots (snapshots.longhorn.io) are not deleted.

We are using Velero v1.9.4 with EnableCSI feature and the following plugins:

velero/velero-plugin-for-csi:v0.4.0
velero/velero-plugin-for-aws:v1.6.0

We have the same issue in Velero v1.11.0 with EnableCSI feature and the following plugins:

velero/velero-plugin-for-csi:v0.5.0
velero/velero-plugin-for-aws:v1.6.0

To Reproduce

Steps to reproduce the behavior:

Install the newest version of Velero and Rancher-Longhorn.
In Longhorn, configure an S3 backup target (we are using MinIO for this).
Enable CSI Snapshot Support for Longhorn.
Create a backup (for example with the Schedule below): velero backup create --from-schedule harbor-daily-0200
Delete the backup: velero backup delete <BACKUPNAME>
The snapshot (snapshots.longhorn.io) is not deleted.

Expected behavior

The snapshot is deleted.

Environment

Longhorn version: 102.2.0+up1.4.1
Velero version:
Installation method (e.g. Rancher Catalog App/Helm/Kubectl): Rancher-Longhorn Helm Chart
Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version: RKE2, v1.25.7+rke2r1
- Number of management node in the cluster: 1x
- Number of worker node in the cluster: 3x
Node config
- OS type and version: Ubuntu
Underlying Infrastructure (e.g. on AWS/GCE, EKS/GKE, VMWare/KVM, Baremetal): VMs on Proxmox
Number of Longhorn volumes in the cluster: 17

Additional context

Velero Backup Schedule for Harbor

---
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: harbor-daily-0200
  namespace: velero #Must be the namespace of the Velero server
spec:
  schedule: 0 0 * * *
  template:
    includedNamespaces:
    - 'harbor'
    includedResources:
    - '*'
    snapshotVolumes: true
    storageLocation: minio
    volumeSnapshotLocations:
      - longhorn
    ttl: 168h0m0s #7 Days retention
    defaultVolumesToRestic: false
    hooks:
      resources:
        - name: postgresql
          includedNamespaces:
          - 'harbor'
          includedResources:
          - pods
          excludedResources: []
          labelSelector:
            matchLabels:
              statefulset.kubernetes.io/pod-name: harbor-database-0
          pre:
            - exec:
                container: database
                command:
                  - /bin/bash
                  - -c
                  - "psql -U postgres -c \"CHECKPOINT\";"
                onError: Fail
                timeout: 30s

VolumeSnapshotClass

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: longhorn
  namespace: longhorn-system
  labels:
    velero.io/csi-volumesnapshot-class: "true"
driver: driver.longhorn.io
deletionPolicy: Delete

VolumeSnapshotClass

In our second cluster, with Velero v1.11.0 installed, we created the following resource (but same issue here):

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: longhorn
  namespace: longhorn-system
  labels:
    velero.io/csi-volumesnapshot-class: 'true'
driver: driver.longhorn.io
deletionPolicy: Delete
parameters:
  type: bak

VolumeSnapshotLocation

apiVersion: velero.io/v1
kind: VolumeSnapshotLocation
metadata:
  name: longhorn
  namespace: velero
spec:
  provider: longhorn.io/longhorn

R-Studio · 2023-04-24T14:26:02Z

I have also opened an issue in velero: vmware-tanzu/velero#6179

R-Studio · 2023-04-24T14:35:36Z

Maybe this issue is related to:

Can't delete snapshot after velero backup delete vmware-tanzu/velero#4383
Velero is not deleting CSI snapshots when it removes the backup after retention period vmware-tanzu/velero#3465
[BUG] Scheduled backups didn't complete and leave tons of snapshots behind #2029
[QUESTION] Delete snapshot after successful backup ? #5796

weizhe0422 · 2023-04-27T16:32:41Z

Would you mind providing a support bundle that includes the logs of the CSI sidecars? It may contain clues about whether the CSI external-snapshotter was triggered when Velero deleted the backup.

R-Studio · 2023-05-01T07:24:08Z

@weizhe0422 here is the support bundle. Thanks for any help.
Info

Start Backup: 2023-05-01 09:12:59 +0200 CEST
Complete Backup: 2023-05-01 09:14:29 +0200 CEST
Delete Backup: 2023-05-01 09:17:34 +0200 CEST

supportbundle_a3236774-99ca-4ab5-a2a5-74c925273bb4_2023-05-01T07-20-00Z.zip

tcoupin · 2023-05-06T15:12:35Z

Related to #5797 ?

R-Studio · 2023-05-08T09:53:50Z

@tcoupin thanks, but this is not a solution: if I use --snapshot-volumes=false, Velero does not trigger a backup of the persistent volumes, so it only backs up the manifests (YAML).

vineetsingh5 · 2023-05-09T06:16:55Z

I am also facing the same issue with
velero: v1.9.1
velero-plugin-for-csi: v0.4.1
longhorn version: 1.3.1

Deleting a Velero backup cleans up most of the resources related to that backup, such as backups.longhorn.io, volumesnapshotcontent, and volumesnapshot, but the snapshot.longhorn.io objects remain in the system.

Backups started failing once the number of snapshot objects grew beyond roughly 250.

I am sharing both the Longhorn support bundle and the snapshot controller logs:

longhorn-support-bundle_8afefff1-085e-4f4e-97ae-3f0a518555ab_2023-05-09T05-19-35Z.zip

snapshot-controller.log

tcoupin · 2023-05-09T09:23:08Z

I do not use --snapshot-volumes=false; instead I added a CronJob that deletes the snapshot.longhorn.io objects referred to by backups.longhorn.io.

kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  namespace: longhorn-system
  name: snapshot-cleaner
rules:
- apiGroups:
  - longhorn.io
  resources:
  - backups
  verbs:
  - 'list'
- apiGroups:
  - longhorn.io
  resources:
  - snapshots
  verbs:
  - 'list'
  - 'delete'
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: snapshot-cleaner
  namespace: longhorn-system
subjects:
- kind: ServiceAccount
  name: sa-snapshot-cleaner
  namespace: longhorn-system
roleRef:
  kind: Role
  name: snapshot-cleaner
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: sa-snapshot-cleaner
  namespace: longhorn-system
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: snapshot-cleaner
  namespace: longhorn-system
spec:
  concurrencyPolicy: Forbid
  failedJobsHistoryLimit: 1
  jobTemplate:
    metadata:
      creationTimestamp: null
    spec:
      activeDeadlineSeconds: 30
      template:
        metadata:
          creationTimestamp: null
        spec:
          affinity: {}
          containers:
            - args:
                - '-c'
                - >-
                  cat <(kubectl get backups.longhorn.io -n longhorn-system -o
                  custom-columns=SNAPSHOT:.spec.snapshotName | grep
                  '^snapshot-'|sort|uniq) <(kubectl get snapshot.longhorn.io -n
                  longhorn-system -o custom-columns=SNAPSHOT:.metadata.name |
                  grep '^snapshot-'|sort|uniq)|sort|uniq -c|awk '$1==2 {print
                  $2}'|grep -v '^\n$'|xargs kubectl delete snapshot.longhorn.io
                  -n longhorn-system --wait=false
              command:
                - /bin/bash
              image: bitnami/kubectl:latest
              imagePullPolicy: Always
              name: snapshot-cleaner
              resources: {}
              terminationMessagePath: /dev/termination-log
              terminationMessagePolicy: File
          dnsPolicy: ClusterFirst
          restartPolicy: OnFailure
          schedulerName: default-scheduler
          securityContext: {}
          serviceAccount: sa-snapshot-cleaner
          serviceAccountName: sa-snapshot-cleaner
          terminationGracePeriodSeconds: 30
  schedule: '*/5 * * * *'
  successfulJobsHistoryLimit: 3
  suspend: false
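
The selection step in the CronJob's shell pipeline is easier to check in isolation. The following is a minimal Python sketch of the same idea, not the script from the comment above: it keeps only snapshot names that exist in the cluster and are also referenced by a Longhorn backup. The function name `snapshots_to_delete` and the sample names are hypothetical; in the CronJob the two inputs come from `kubectl get backups.longhorn.io` (`.spec.snapshotName`) and `kubectl get snapshot.longhorn.io` (`.metadata.name`).

```python
def snapshots_to_delete(backup_snapshot_names, existing_snapshot_names,
                        prefix="snapshot-"):
    """Return snapshots that exist and are referenced by at least one backup.

    Mirrors the pipeline above: both lists are filtered to names starting
    with `prefix`, de-duplicated, and intersected (the `uniq -c` / `$1==2` step).
    """
    referenced = {n for n in backup_snapshot_names if n.startswith(prefix)}
    existing = {n for n in existing_snapshot_names if n.startswith(prefix)}
    return sorted(referenced & existing)


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    backups = ["snapshot-aaa", "snapshot-bbb", "snapshot-bbb", "<none>"]
    snapshots = ["snapshot-bbb", "snapshot-ccc", "manual-snap"]
    print(snapshots_to_delete(backups, snapshots))
```

Snapshots that no backup refers to (for example ones taken by hand) are left alone, as in the pipeline.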

R-Studio · 2023-05-09T12:44:46Z

@tcoupin, Thanks for your help, but this is just a workaround for me.

R-Studio · 2023-05-15T08:39:30Z

When I create a backup via the Longhorn GUI and then delete this backup, no snapshots remain (everything works as expected).

R-Studio · 2023-05-15T09:04:12Z

I reproduced the issue again by creating a VolumeSnapshot resource and then deleting it:

VolumeSnapshotClass

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: longhorn
  namespace: longhorn-system
  labels:
    velero.io/csi-volumesnapshot-class: 'true'
driver: driver.longhorn.io
deletionPolicy: Delete
parameters:
  type: bak

VolumeSnapshot

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: new-snapshot-test
  namespace: harbor
spec:
  volumeSnapshotClassName: longhorn
  source:
    persistentVolumeClaimName: harbor-jobservice

Remove the VolumeSnapshot: kubectl delete volumesnapshot new-snapshot-test -n harbor

Why does Longhorn keep the snapshot?
Why does cleanup work through the GUI but not without it?

R-Studio · 2023-05-15T09:50:42Z

I also found something interesting in the snapshot-controller logs:

But the logs show no error when deleting the snapshot:

R-Studio · 2023-06-05T06:39:36Z

Today I upgraded Longhorn from v1.4.1 to v1.4.2 and the issue still occurs. 😔😔

R-Studio · 2023-06-06T12:05:29Z

Today I noticed that I had deployed a snapshot controller as described in this documentation,
although I already had a snapshot controller, "rke2-snapshot-controller", on my cluster. I am not sure whether it came with a Rancher update or something else. Anyway, I removed my own snapshot-controller and tested the issue again.

VolumeSnapshotClass

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: longhorn
  namespace: longhorn-system
  labels:
    velero.io/csi-volumesnapshot-class: 'true'
driver: driver.longhorn.io
deletionPolicy: Delete
parameters:
  type: bak

VolumeSnapshot

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: new-snapshot-test
  namespace: harbor
spec:
  volumeSnapshotClassName: longhorn
  source:
    persistentVolumeClaimName: harbor-jobservice

Remove the VolumeSnapshot: kubectl delete volumesnapshot new-snapshot-test -n harbor

The same issue occurs, but there are some interesting logs:

After I created the VolumeSnapshot, the following logs were written:

R-Studio · 2023-06-06T12:39:56Z

After I upgraded to the newest RKE2 Helm charts, the "finalizers" error logs mentioned above no longer appear, but the issue still occurs.

I upgraded the following Helm releases:

rke2-snapshot-controller: 1.7.201 -> 1.7.202
rke2-snapshot-controller-crd: 1.7.201 -> 1.7.202
rke2-snapshot-validation-webhook: 1.7.200 -> 1.7.201

Here are the log messages (newest first):

2023-06-06T14:36:39+02:00	I0606 12:36:39.815026       1 snapshot_controller_base.go:213] deletion of content "snapcontent-20d81f05-864b-489e-8875-3ea71832a743" was already processed
2023-06-06T14:36:38+02:00	E0606 12:36:38.814935       1 snapshot_controller_base.go:265] could not sync content "snapcontent-20d81f05-864b-489e-8875-3ea71832a743": snapshot controller failed to update snapcontent-20d81f05-864b-489e-8875-3ea71832a743 on API server: Operation cannot be fulfilled on volumesnapshotcontents.snapshot.storage.k8s.io "snapcontent-20d81f05-864b-489e-8875-3ea71832a743": StorageError: invalid object, Code: 4, Key: /registry/snapshot.storage.k8s.io/volumesnapshotcontents/snapcontent-20d81f05-864b-489e-8875-3ea71832a743, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: d376de84-4aa4-426e-8c97-f96df3073b73, UID in object meta: 
2023-06-06T14:36:38+02:00	time="2023-06-06T12:36:38Z" level=info msg="DeleteSnapshot: rsp: {}"
2023-06-06T14:36:38+02:00	time="2023-06-06T12:36:38Z" level=info msg="DeleteSnapshot: req: {\"snapshot_id\":\"bak://pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c/backup-3e65b0ef494940b8\"}"
2023-06-06T14:35:12+02:00	I0606 12:35:12.301474       1 snapshot_controller.go:998] checkandRemovePVCFinalizer[new-snapshot-test]: Remove Finalizer for PVC harbor-jobservice as it is not used by snapshots in creation
2023-06-06T14:35:12+02:00	I0606 12:35:12.296417       1 event.go:285] Event(v1.ObjectReference{Kind:"VolumeSnapshot", Namespace:"harbor", Name:"new-snapshot-test", UID:"20d81f05-864b-489e-8875-3ea71832a743", APIVersion:"snapshot.storage.k8s.io/v1", ResourceVersion:"181825436", FieldPath:""}): type: 'Normal' reason: 'SnapshotReady' Snapshot harbor/new-snapshot-test is ready to use.
2023-06-06T14:35:12+02:00	I0606 12:35:12.296353       1 event.go:285] Event(v1.ObjectReference{Kind:"VolumeSnapshot", Namespace:"harbor", Name:"new-snapshot-test", UID:"20d81f05-864b-489e-8875-3ea71832a743", APIVersion:"snapshot.storage.k8s.io/v1", ResourceVersion:"181825436", FieldPath:""}): type: 'Normal' reason: 'SnapshotCreated' Snapshot harbor/new-snapshot-test was successfully created by the CSI driver.
2023-06-06T14:35:12+02:00	time="2023-06-06T12:35:12Z" level=info msg="CreateSnapshot: rsp: {\"snapshot\":{\"creation_time\":{\"seconds\":1686054902},\"ready_to_use\":true,\"size_bytes\":1073741824,\"snapshot_id\":\"bak://pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c/backup-3e65b0ef494940b8\",\"source_volume_id\":\"pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c\"}}"
2023-06-06T14:35:12+02:00	time="2023-06-06T12:35:12Z" level=debug msg="ControllerServer CreateSnapshot rsp: snapshot:<size_bytes:1073741824 snapshot_id:\"bak://pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c/backup-3e65b0ef494940b8\" source_volume_id:\"pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c\" creation_time:<seconds:1686054902 > ready_to_use:true > "
2023-06-06T14:35:12+02:00	time="2023-06-06T12:35:12Z" level=info msg="createCSISnapshotTypeLonghornBackup: volume pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c backup backup-3e65b0ef494940b8 of snapshot snapshot-20d81f05-864b-489e-8875-3ea71832a743 in progress"
2023-06-06T14:35:12+02:00	time="2023-06-06T12:35:12Z" level=info msg="Backup backup-3e65b0ef494940b8 initiated for volume pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c for snapshot snapshot-20d81f05-864b-489e-8875-3ea71832a743"
2023-06-06T14:35:08+02:00	[pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c-r-62d5874e] time="2023-06-06T12:35:08Z" level=info msg="Done initiating backup creation, received backupID: backup-3e65b0ef494940b8"
2023-06-06T14:35:06+02:00	[pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c-r-62d5874e] time="2023-06-06T12:35:06Z" level=info msg="Loaded driver for s3://t1-longhorn-snapshots@minio/" pkg=s3
2023-06-06T14:35:06+02:00	[pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c-r-62d5874e] time="2023-06-06T12:35:06Z" level=info msg="Start creating backup backup-3e65b0ef494940b8" pkg=backup
2023-06-06T14:35:06+02:00	[pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c-r-62d5874e] time="2023-06-06T12:35:06Z" level=info msg="Initializing backup backup-3e65b0ef494940b8 for volume pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c snapshot snapshot-20d81f05-864b-489e-8875-3ea71832a743" pkg=backup
2023-06-06T14:35:06+02:00	[longhorn-instance-manager] time="2023-06-06T12:35:06Z" level=info msg="Backing up snapshot-20d81f05-864b-489e-8875-3ea71832a743 on tcp://10.42.1.154:10060, to s3://t1-longhorn-snapshots@minio/"
2023-06-06T14:35:06+02:00	[longhorn-instance-manager] time="2023-06-06T12:35:06Z" level=info msg="Backing up snapshot snapshot-20d81f05-864b-489e-8875-3ea71832a743 to backup backup-3e65b0ef494940b8" serviceURL="10.42.2.8:10009"
2023-06-06T14:35:04+02:00	[pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c-r-62d5874e] time="2023-06-06T12:35:04Z" level=info msg="Done initiating backup creation, received backupID: backup-3e65b0ef494940b8"
2023-06-06T14:35:02+02:00	[pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c-r-62d5874e] time="2023-06-06T12:35:02Z" level=info msg="Loaded driver for s3://t1-longhorn-snapshots@minio/" pkg=s3
2023-06-06T14:35:02+02:00	[pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c-r-62d5874e] time="2023-06-06T12:35:02Z" level=info msg="Start creating backup backup-3e65b0ef494940b8" pkg=backup
2023-06-06T14:35:02+02:00	[pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c-r-62d5874e] time="2023-06-06T12:35:02Z" level=info msg="Initializing backup backup-3e65b0ef494940b8 for volume pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c snapshot snapshot-20d81f05-864b-489e-8875-3ea71832a743" pkg=backup
2023-06-06T14:35:02+02:00	[longhorn-instance-manager] time="2023-06-06T12:35:02Z" level=info msg="Backing up snapshot-20d81f05-864b-489e-8875-3ea71832a743 on tcp://10.42.1.154:10060, to s3://t1-longhorn-snapshots@minio/"
2023-06-06T14:35:02+02:00	[longhorn-instance-manager] time="2023-06-06T12:35:02Z" level=info msg="Backing up snapshot snapshot-20d81f05-864b-489e-8875-3ea71832a743 to backup backup-3e65b0ef494940b8" serviceURL="10.42.2.8:10009"
2023-06-06T14:35:02+02:00	time="2023-06-06T12:35:02Z" level=info msg="createCSISnapshotTypeLonghornBackup: volume pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c initiating backup for snapshot snapshot-20d81f05-864b-489e-8875-3ea71832a743"
2023-06-06T14:35:02+02:00	time="2023-06-06T12:35:02Z" level=info msg="Finished snapshot" snapshot=snapshot-20d81f05-864b-489e-8875-3ea71832a743 volume=pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c
2023-06-06T14:35:02+02:00	time="2023-06-06T12:35:02Z" level=info msg="Finished to snapshot: 10.42.3.102:10105 snapshot-20d81f05-864b-489e-8875-3ea71832a743 UserCreated true Created at 2023-06-06T12:35:02Z, Labels map[type:bak]"
2023-06-06T14:35:02+02:00	[pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c-e-2e5a625b] time="2023-06-06T12:35:02Z" level=info msg="Finished to snapshot: 10.42.1.154:10060 snapshot-20d81f05-864b-489e-8875-3ea71832a743 UserCreated true Created at 2023-06-06T12:35:02Z, Labels map[type:bak]"
2023-06-06T14:35:02+02:00	time="2023-06-06T12:35:02Z" level=info msg="Removing disk volume-head-004.img"
2023-06-06T14:35:02+02:00	time="2023-06-06T12:35:02Z" level=info msg="Finished creating disk" disk=snapshot-20d81f05-864b-489e-8875-3ea71832a743
2023-06-06T14:35:02+02:00	time="2023-06-06T12:35:02Z" level=info msg="Cleaning up new disk file /host/var/lib/longhorn/replicas/pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c-99a60236/volume-snap-snapshot-20d81f05-864b-489e-8875-3ea71832a743.img before linking"
2023-06-06T14:35:02+02:00	time="2023-06-06T12:35:02Z" level=info msg="Cleaning up new disk checksum file /host/var/lib/longhorn/replicas/pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c-99a60236/volume-snap-snapshot-20d81f05-864b-489e-8875-3ea71832a743.img.checksum before linking"
2023-06-06T14:35:02+02:00	[pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c-r-46492401] time="2023-06-06T12:35:02Z" level=info msg="Cleaning up new disk metadata file path /host/var/lib/longhorn/replicas/pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c-99a60236/volume-snap-snapshot-20d81f05-864b-489e-8875-3ea71832a743.img.meta before linking"
2023-06-06T14:35:02+02:00	time="2023-06-06T12:35:02Z" level=info msg="Removing disk volume-head-004.img"
2023-06-06T14:35:02+02:00	time="2023-06-06T12:35:02Z" level=info msg="Finished creating disk" disk=snapshot-20d81f05-864b-489e-8875-3ea71832a743
2023-06-06T14:35:02+02:00	time="2023-06-06T12:35:02Z" level=info msg="Cleaning up new disk file /host/var/lib/longhorn/replicas/pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c-5286f5ef/volume-snap-snapshot-20d81f05-864b-489e-8875-3ea71832a743.img before linking"
2023-06-06T14:35:02+02:00	time="2023-06-06T12:35:02Z" level=info msg="Cleaning up new disk checksum file /host/var/lib/longhorn/replicas/pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c-5286f5ef/volume-snap-snapshot-20d81f05-864b-489e-8875-3ea71832a743.img.checksum before linking"
2023-06-06T14:35:02+02:00	[pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c-r-62d5874e] time="2023-06-06T12:35:02Z" level=info msg="Cleaning up new disk metadata file path /host/var/lib/longhorn/replicas/pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c-5286f5ef/volume-snap-snapshot-20d81f05-864b-489e-8875-3ea71832a743.img.meta before linking"
2023-06-06T14:35:02+02:00	time="2023-06-06T12:35:02Z" level=info msg="Starting to create disk" disk=snapshot-20d81f05-864b-489e-8875-3ea71832a743
2023-06-06T14:35:02+02:00	[pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c-e-2e5a625b] time="2023-06-06T12:35:02Z" level=info msg="Finished to snapshot: 10.42.2.93:10045 snapshot-20d81f05-864b-489e-8875-3ea71832a743 UserCreated true Created at 2023-06-06T12:35:02Z, Labels map[type:bak]"
2023-06-06T14:35:02+02:00	[pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c-r-46492401] time="2023-06-06T12:35:02Z" level=info msg="Replica server starts to snapshot [snapshot-20d81f05-864b-489e-8875-3ea71832a743] volume, user created true, created time 2023-06-06T12:35:02Z, labels map[type:bak]"
2023-06-06T14:35:02+02:00	time="2023-06-06T12:35:02Z" level=info msg="Removing disk volume-head-003.img"
2023-06-06T14:35:02+02:00	time="2023-06-06T12:35:02Z" level=info msg="Finished creating disk" disk=snapshot-20d81f05-864b-489e-8875-3ea71832a743
2023-06-06T14:35:02+02:00	time="2023-06-06T12:35:02Z" level=info msg="Cleaning up new disk file /host/var/lib/longhorn/replicas/pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c-90ade5bb/volume-snap-snapshot-20d81f05-864b-489e-8875-3ea71832a743.img before linking"
2023-06-06T14:35:02+02:00	time="2023-06-06T12:35:02Z" level=info msg="Cleaning up new disk checksum file /host/var/lib/longhorn/replicas/pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c-90ade5bb/volume-snap-snapshot-20d81f05-864b-489e-8875-3ea71832a743.img.checksum before linking"
2023-06-06T14:35:02+02:00	[pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c-r-ff57e501] time="2023-06-06T12:35:02Z" level=info msg="Cleaning up new disk metadata file path /host/var/lib/longhorn/replicas/pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c-90ade5bb/volume-snap-snapshot-20d81f05-864b-489e-8875-3ea71832a743.img.meta before linking"
2023-06-06T14:35:02+02:00	time="2023-06-06T12:35:02Z" level=info msg="Starting to create disk" disk=snapshot-20d81f05-864b-489e-8875-3ea71832a743
2023-06-06T14:35:02+02:00	[pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c-r-62d5874e] time="2023-06-06T12:35:02Z" level=info msg="Replica server starts to snapshot [snapshot-20d81f05-864b-489e-8875-3ea71832a743] volume, user created true, created time 2023-06-06T12:35:02Z, labels map[type:bak]"
2023-06-06T14:35:02+02:00	time="2023-06-06T12:35:02Z" level=info msg="Starting to create disk" disk=snapshot-20d81f05-864b-489e-8875-3ea71832a743
2023-06-06T14:35:02+02:00	time="2023-06-06T12:35:02Z" level=info msg="Starting to snapshot: 10.42.1.154:10060 snapshot-20d81f05-864b-489e-8875-3ea71832a743 UserCreated true Created at 2023-06-06T12:35:02Z, Labels map[type:bak]"
2023-06-06T14:35:02+02:00	[pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c-e-2e5a625b] time="2023-06-06T12:35:02Z" level=info msg="Starting to snapshot: 10.42.2.93:10045 snapshot-20d81f05-864b-489e-8875-3ea71832a743 UserCreated true Created at 2023-06-06T12:35:02Z, Labels map[type:bak]"
2023-06-06T14:35:02+02:00	[pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c-r-ff57e501] time="2023-06-06T12:35:02Z" level=info msg="Replica server starts to snapshot [snapshot-20d81f05-864b-489e-8875-3ea71832a743] volume, user created true, created time 2023-06-06T12:35:02Z, labels map[type:bak]"
2023-06-06T14:35:02+02:00	[pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c-e-2e5a625b] time="2023-06-06T12:35:02Z" level=info msg="Starting to snapshot: 10.42.3.102:10105 snapshot-20d81f05-864b-489e-8875-3ea71832a743 UserCreated true Created at 2023-06-06T12:35:02Z, Labels map[type:bak]"
2023-06-06T14:35:02+02:00	[pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c-e-2e5a625b] time="2023-06-06T12:35:02Z" level=info msg="Requesting system sync before snapshot" snapshot=snapshot-20d81f05-864b-489e-8875-3ea71832a743 volume=pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c
2023-06-06T14:35:02+02:00	[pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c-e-2e5a625b] time="2023-06-06T12:35:02Z" level=info msg="Starting snapshot" snapshot=snapshot-20d81f05-864b-489e-8875-3ea71832a743 volume=pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c
2023-06-06T14:35:02+02:00	[longhorn-instance-manager] time="2023-06-06T12:35:02Z" level=info msg="Snapshotting volume: snapshot snapshot-20d81f05-864b-489e-8875-3ea71832a743" serviceURL="10.42.2.8:10009"
2023-06-06T14:35:02+02:00	time="2023-06-06T12:35:02Z" level=info msg="createCSISnapshotTypeLonghornBackup: volume pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c initiating snapshot snapshot-20d81f05-864b-489e-8875-3ea71832a743"
2023-06-06T14:35:02+02:00	time="2023-06-06T12:35:02Z" level=info msg="CreateSnapshot: req: {\"name\":\"snapshot-20d81f05-864b-489e-8875-3ea71832a743\",\"parameters\":{\"type\":\"bak\"},\"source_volume_id\":\"pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c\"}"
2023-06-06T14:35:02+02:00	time="2023-06-06T12:35:02Z" level=info msg="GetPluginInfo: rsp: {\"name\":\"driver.longhorn.io\",\"vendor_version\":\"v1.4.2\"}"
2023-06-06T14:35:02+02:00	time="2023-06-06T12:35:02Z" level=info msg="GetPluginInfo: req: {}"
2023-06-06T14:35:02+02:00	I0606 12:35:02.112599       1 event.go:285] Event(v1.ObjectReference{Kind:"VolumeSnapshot", Namespace:"harbor", Name:"new-snapshot-test", UID:"20d81f05-864b-489e-8875-3ea71832a743", APIVersion:"snapshot.storage.k8s.io/v1", ResourceVersion:"181825428", FieldPath:""}): type: 'Normal' reason: 'CreatingSnapshot' Waiting for a snapshot harbor/new-snapshot-test to be created by the CSI driver.
2023-06-06T14:35:02+02:00	I0606 12:35:02.112195       1 snapshot_controller.go:291] createSnapshotWrapper: Creating snapshot for content snapcontent-20d81f05-864b-489e-8875-3ea71832a743 through the plugin ...
2023-06-06T14:35:02+02:00	I0606 12:35:02.106794       1 snapshot_controller.go:919] Added protection finalizer to persistent volume claim harbor/harbor-jobservice
2023-06-06T14:35:02+02:00	I0606 12:35:02.093901       1 snapshot_controller.go:638] createSnapshotContent: Creating content for snapshot harbor/new-snapshot-test through the plugin ...
2023-06-06T14:35:01+02:00	time="2023-06-06T12:35:01Z" level=debug msg="Setting allow-recurring-job-while-volume-detached is false

The following error looks interesting:

snapshot_controller_base.go:265] could not sync content "snapcontent-20d81f05-864b-489e-8875-3ea71832a743": snapshot controller failed to update snapcontent-20d81f05-864b-489e-8875-3ea71832a743 on API server: Operation cannot be fulfilled on volumesnapshotcontents.snapshot.storage.k8s.io "snapcontent-20d81f05-864b-489e-8875-3ea71832a743": StorageError: invalid object, Code: 4, Key: /registry/snapshot.storage.k8s.io/volumesnapshotcontents/snapcontent-20d81f05-864b-489e-8875-3ea71832a743, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: d376de84-4aa4-426e-8c97-f96df3073b73, UID in object meta:
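
The snapshot_id in the DeleteSnapshot request in the log above hints at what the CSI call targets: it names a backup (backup-3e65b0ef494940b8), not the snapshot-... object. Below is a small Python sketch for pulling such an ID apart, assuming the `<type>://<volume>/<name>` layout visible in these log lines. The layout for other types is not shown in this thread, and `parse_snapshot_id` is a hypothetical helper, not a Longhorn API.

```python
def parse_snapshot_id(snapshot_id):
    """Split '<type>://<volume>/<name>' into its three parts."""
    kind, sep, rest = snapshot_id.partition("://")
    if not sep:
        raise ValueError(f"no '://' in snapshot_id: {snapshot_id!r}")
    volume, sep, name = rest.partition("/")
    if not sep or not volume or not name:
        raise ValueError(f"expected '<volume>/<name>' after the type: {snapshot_id!r}")
    return kind, volume, name


if __name__ == "__main__":
    # snapshot_id copied from the DeleteSnapshot request in the log above.
    sid = "bak://pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c/backup-3e65b0ef494940b8"
    kind, volume, name = parse_snapshot_id(sid)
    print(kind, volume, name)
```

For the logged request the third part starts with `backup-`, so the delete call refers to the Longhorn backup and carries no reference to the Longhorn snapshot it was taken from.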

R-Studio · 2023-07-17T06:31:34Z

@vineetsingh5 did you find a solution for this issue?

R-Studio · 2023-07-17T06:32:46Z

@weizhe0422 did you find anything interesting in the support bundle?

anthony-pastor · 2023-10-06T08:26:09Z

I have almost the same setup and versions, and the same issue!
Some interesting log lines from longhorn-csi-plugin and csi-snapshotter:

longhorn-csi-plugin-5k8lg longhorn-csi-plugin time="2023-10-06T08:12:20Z" level=info msg="DeleteSnapshot: req: {\"snapshot_id\":\"bak://pvc-c57da450-ce82-44c8-ac83-0a039634a334/backup-04db0d0fe4ef49f1\"}"
longhorn-csi-plugin-5k8lg longhorn-csi-plugin time="2023-10-06T08:12:20Z" level=info msg="DeleteSnapshot: rsp: {}"
csi-snapshotter-5d899fdcfc-xv627 csi-snapshotter E1006 08:12:20.143392 1 snapshot_controller_base.go:265] could not sync content "snapcontent-55c4399b-1dec-4cf2-b9bd-a4eff27f315e": snapshot controller failed to update snapcontent-55c4399b-1dec-4cf2-b9bd-a4eff27f315e on API server: Operation cannot be fulfilled on volumesnapshotcontents.snapshot.storage.k8s.io "snapcontent-55c4399b-1dec-4cf2-b9bd-a4eff27f315e": StorageError: invalid object, Code: 4, Key: /registry/snapshot.storage.k8s.io/volumesnapshotcontents/snapcontent-55c4399b-1dec-4cf2-b9bd-a4eff27f315e, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: d4fa9b1d-416e-4df5-ad74-d3ac6bec3b66, UID in object meta:

R-Studio · 2023-12-05T11:42:56Z

I updated to Velero v1.12.2 with velero-plugin-for-csi v0.6.2 and velero-plugin-for-aws v1.8.2, but the issue is still the same.

innobead · 2023-12-05T13:43:00Z

> Delete the backup velero backup delete
> The snapshot (snapshots.longhorn.io) is not deleted.

I believe what Velero triggers is the deletion of the corresponding Longhorn backup (stored out of cluster), not of the Longhorn snapshot (stored in cluster as an immutable COW layer of the volume).

In the current design, deleting a backup is independent of deleting the corresponding snapshot that was created for that backup. What is your expectation here, or what feature are you looking for?

Please check whether https://longhorn.io/docs/1.5.3/snapshots-and-backups/scheduling-backups-and-snapshots/ works for you. Note that this is our built-in mechanism and is not related to Velero.

@mantissahz please follow up.

R-Studio · 2023-12-05T15:44:35Z

@innobead thanks for your reply.
What I want: I have a Velero schedule that triggers backups of my persistent volumes with a retention period of, e.g., 7 days. After this retention period Velero deletes the backups, but the corresponding snapshots are not deleted and consume disk space that I would rather not spend.
As a workaround, I have a recurring job that deletes these snapshots (retain 7), but this has two disadvantages:

I use up disk space for snapshots that I don't want and that are already stored in my object store.
The recurring job retains snapshots by count, not by creation timestamp as Velero does. For example, if I trigger 3 manual backups with Velero, they push older snapshots out of the retained set, so I lose backup data that is older than 4 days.
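
The difference between the two retention styles in the second point can be sketched in a few lines of Python. This is an illustration with made-up timestamps, not Velero or Longhorn code; `keep_by_count` and `keep_by_age` are hypothetical helper names standing in for a "retain N" recurring job and a TTL-based policy respectively.

```python
from datetime import datetime, timedelta


def keep_by_count(created_at, retain):
    """Count-based retention: keep the `retain` newest timestamps."""
    return sorted(created_at, reverse=True)[:retain]


def keep_by_age(created_at, now, ttl):
    """Age-based retention: keep everything younger than `ttl`."""
    return sorted((t for t in created_at if now - t < ttl), reverse=True)


if __name__ == "__main__":
    now = datetime(2023, 12, 5, 12, 0)
    # Hypothetical data: one scheduled snapshot per day for the last 7 days...
    daily = [now - timedelta(days=d, hours=1) for d in range(7)]
    # ...plus 3 manually triggered ones today.
    manual = [now - timedelta(minutes=m) for m in (10, 20, 30)]
    everything = daily + manual

    by_count = keep_by_count(everything, retain=7)
    by_age = keep_by_age(everything, now, ttl=timedelta(days=7))

    print("age of oldest kept (retain 7):", now - min(by_count))
    print("age of oldest kept (TTL 7d):  ", now - min(by_age))
```

With these sample inputs the count-based policy keeps the three manual snapshots plus only the four most recent daily ones, while the age-based policy keeps all ten.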

julienvincent · 2023-12-26T16:16:33Z

@R-Studio @innobead

I think this issue can be simplified to completely exclude Velero.

At the core the issue here is that Longhorn does not delete snapshots or backups when the backing CSI VolumeSnapshot resource is deleted.

As a user of Longhorn that is interfacing with CSI and not native Longhorn resources, I expect the state of Longhorn resources to reflect the state of my CSI resources.

If I create a CSI VolumeSnapshot I expect Longhorn to create a snapshot/backup/bi. This works!
If I delete a CSI VolumeSnapshot I expect Longhorn to delete the backing snapshot/backup/bi that it created. This doesn't work.

Therefore I think it's fair to state that Longhorn is currently only providing a partial implementation of the CSI interface/spec.

Velero is just using this common CSI interface as it is intended to be used and expecting it to have the desired effect. This is not a Velero issue.

Perhaps this should be opened as a new issue with a smaller scope (CSI spec conformance).

innobead · 2023-12-26T16:58:22Z

Thanks for the valuable info.

We will improve this, as it's quite important for space efficiency.

larssb · 2024-04-22T12:59:11Z

Kind request for a status update. This issue makes using Velero in combination with Longhorn somewhat quirky, as we seem to have to clean up snapshots ourselves instead of every backup artefact being deleted when a Velero backup is removed by Velero's internal cleanup jobs.

ChanYiLin · 2024-07-29T09:15:49Z

VolumeSnapshot type: snap

Both creation and deletion work.

Create a CSI VolumeSnapshot => Longhorn creates a Snapshot
Delete the CSI VolumeSnapshot => Longhorn deletes the Snapshot

Note that Longhorn won't delete the latest snapshot, the one directly behind the volume-head; it only marks it as removed.
Reference: https://longhorn.io/docs/1.6.2/concepts/#243-deleting-snapshots

VolumeSnapshot type: bak

Both creation and deletion works

Create a CSI VolumeSnapshot => Longhorn creates a Backup
Delete the CSI VolumeSnapshot => Longhron deletes the Backup

As mentioned by @innobead above and @PhanLe1010 in another thread (vmware-tanzu/velero#6179),
Longhorn first creates the Snapshot (Longhorn Snapshot) and then creates the Backup (Longhorn Backup) based on that snapshot.
After backup creation, the snapshot is no longer bound to the backup.
Thus, when users delete the CSI VolumeSnapshot, Longhorn only deletes the backup (Longhorn Backup) but not the snapshot (Longhorn Snapshot).
That is why the snapshots (Longhorn Snapshots) remain in this issue. This is expected, because one Longhorn Snapshot can correspond to multiple Longhorn Backups.
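For context, the VolumeSnapshot type is selected through the `type` parameter of the VolumeSnapshotClass. A minimal sketch of a class that makes Longhorn create a Backup is shown below; the class name is hypothetical, and the Velero label and `deletionPolicy` are assumptions about a typical Velero CSI setup rather than values taken from this issue.

```yaml
# Sketch only: a VolumeSnapshotClass that maps CSI VolumeSnapshots to Longhorn Backups.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: longhorn-backup-vsc                      # hypothetical name
  labels:
    velero.io/csi-volumesnapshot-class: "true"   # assumed: lets Velero's CSI plugin pick this class
driver: driver.longhorn.io
deletionPolicy: Delete                           # assumed: delete the backing resource with the VolumeSnapshot
parameters:
  type: bak                                      # bak => Longhorn Backup; snap => Snapshot; bi => BackingImage
```

With `type: bak`, deleting the CSI VolumeSnapshot removes the Longhorn Backup, while the intermediate Longhorn Snapshot stays behind as described above.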

There are two ways to delete these snapshots automatically.

Set up a snapshot-delete recurring job to periodically delete the Longhorn Snapshots.
In the upcoming v1.7.0 release, there is a new setting auto-cleanup-when-delete-backup, which can automatically clean up the snapshot when the backup is deleted. Reference: feat: remove related snapshot when removing backup longhorn-manager#2783
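The two options above could be expressed roughly as the manifests below. This is a sketch, not a tested configuration: the job name, cron schedule, `retain` count, `default` group and `longhorn-system` namespace are assumptions, the `snapshot-delete` task needs a Longhorn version that supports it, and the setting only exists from v1.7.0 on.

```yaml
# Option 1 (sketch): recurring job that periodically purges old Longhorn Snapshots.
apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
  name: snapshot-cleanup        # hypothetical name
  namespace: longhorn-system    # assumed install namespace
spec:
  task: snapshot-delete
  cron: "0 3 * * *"             # assumed schedule: daily at 03:00
  retain: 1                     # assumed: number of snapshots to keep per volume
  concurrency: 1
  groups:
    - default                   # assumed: apply to volumes in the default group
---
# Option 2 (sketch, v1.7.0+): remove the related Snapshot when its Backup is deleted.
apiVersion: longhorn.io/v1beta2
kind: Setting
metadata:
  name: auto-cleanup-when-delete-backup
  namespace: longhorn-system    # assumed install namespace
value: "true"
```

The setting can also be changed in the Longhorn UI; the manifest form is shown only because the rest of this issue uses YAML.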

VolumeSnapshot type: bi

Both creation and deletion work

Create a CSI VolumeSnapshot => Longhorn creates a BackingImage
Delete the CSI VolumeSnapshot => Longhorn deletes the BackingImage

cc @R-Studio @julienvincent the CSI VolumeSnapshot creation and deletion functions are implemented in Longhorn. I think there is just some confusion about Snapshot and Backup in Longhorn.
cc @larssb, yes, since a Longhorn Backup and a Longhorn Snapshot are not bound together after creation, you have to delete the Longhorn Snapshot manually or with a snapshot-delete recurring job. From v1.7.0 on, the new setting auto-cleanup-when-delete-backup will clean up the Snapshot automatically.

R-Studio · 2024-08-12T07:36:33Z

@ChanYiLin thanks for the great summary. The new setting auto-cleanup-when-delete-backup sounds really helpful and we will test it after Rancher releases the Helm chart for v1.7.0 (and close this issue if it works).
FYI: @lucatr

ChanYiLin · 2024-09-05T07:05:01Z

I am going to close the issue for now.
Feel free to open an issue if there are any new comments.

R-Studio added the kind/bug label Apr 24, 2023

innobead added the area/upstream Upstream related like tgt upstream library label Dec 5, 2023

innobead added this to the v1.7.0 milestone Dec 26, 2023

innobead added kind/improvement Request for improvement of existing function area/snapshot Volume snapshot (in-cluster snapshot or external backup) labels Dec 26, 2023

github-actions bot mentioned this issue Dec 26, 2023

[TEST][BUG] Longhorn Snapshots are not deleted after expired Backups (Velero) #7466

Open

innobead added area/space-efficiency Space efficiency, especially for volume data usage require/backport Require backport. Only used when the specific versions to backport have not been definied. labels Dec 26, 2023

derekbit assigned ChanYiLin May 17, 2024

derekbit modified the milestones: v1.7.0, v1.8.0 May 17, 2024

ChanYiLin closed this as completed Sep 5, 2024

ChanYiLin added the wontfix label Sep 5, 2024
