
[BUG] Longhorn Snapshots are not deleted after expired Backups (Velero) #6179

Open
R-Studio opened this issue Apr 24, 2023 · 32 comments
Labels: Area/CSI (Related to Container Storage Interface support)

@R-Studio

Describe the bug

We are using Velero to create backups of the Kubernetes manifests and the persistent volumes (in our example we back up Harbor).
When we create a backup, Velero saves the K8s manifests to object storage (MinIO) and creates snapshot resources to trigger Longhorn backups via the velero-plugin-for-csi. Longhorn writes the backups to another MinIO bucket.
If we delete a Velero backup, or the backup expires, the Longhorn snapshot (snapshots.longhorn.io) is not deleted:
[screenshot]

We are using Velero v1.9.4 with EnableCSI feature and the following plugins:

  • velero/velero-plugin-for-csi:v0.4.0
  • velero/velero-plugin-for-aws:v1.6.0

We have the same issue in Velero v1.11.0 with EnableCSI feature and the following plugins:

  • velero/velero-plugin-for-csi:v0.5.0
  • velero/velero-plugin-for-aws:v1.6.0

To Reproduce

Steps to reproduce the behavior:

  1. Install the newest version of Velero and Rancher-Longhorn
  2. In Longhorn, configure an S3 backup target (we are using MinIO for this)
  3. Enable CSI Snapshot Support for Longhorn.
  4. Create a backup (for example with the Schedule below): velero backup create --from-schedule harbor-daily-0200
  5. Delete the backup: velero backup delete <BACKUPNAME>
  6. The snapshot (snapshots.longhorn.io) is not deleted.
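The reproduction steps above can be sketched as shell commands (the schedule name is from this report; adjust names to your environment):

```shell
# Trigger an ad-hoc backup from the existing schedule
velero backup create --from-schedule harbor-daily-0200

# List backups, then delete one once it has completed
velero backup get
velero backup delete <BACKUPNAME> --confirm

# Expected: the Longhorn snapshot CRs are cleaned up; observed: they remain
kubectl -n longhorn-system get snapshots.longhorn.io
```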

Expected behavior

The snapshot is deleted.

Environment

  • Longhorn version: 102.2.0+up1.4.1
  • Velero version:
  • Installation method (e.g. Rancher Catalog App/Helm/Kubectl): Rancher-Longhorn Helm Chart
  • Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version: RKE2, v1.25.7+rke2r1
    • Number of management node in the cluster: 1x
    • Number of worker node in the cluster: 3x
  • Node config
    • OS type and version: Ubuntu
  • Underlying Infrastructure (e.g. on AWS/GCE, EKS/GKE, VMWare/KVM, Baremetal): VMs on Proxmox
  • Number of Longhorn volumes in the cluster: 17
  • Velero features (use velero client config get features):

Additional context

Velero Backup Schedule for Harbor

---
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: harbor-daily-0200
  namespace: velero #Must be the namespace of the Velero server
spec:
  schedule: 0 0 * * *
  template:
    includedNamespaces:
    - 'harbor'
    includedResources:
    - '*'
    snapshotVolumes: true
    storageLocation: minio
    volumeSnapshotLocations:
      - longhorn
    ttl: 168h0m0s #7 Days retention
    defaultVolumesToRestic: false
    hooks:
      resources:
        - name: postgresql
          includedNamespaces:
          - 'harbor'
          includedResources:
          - pods
          excludedResources: []
          labelSelector:
            matchLabels:
              statefulset.kubernetes.io/pod-name: harbor-database-0
          pre:
            - exec:
                container: database
                command:
                  - /bin/bash
                  - -c
                  - "psql -U postgres -c \"CHECKPOINT\";"
                onError: Fail
                timeout: 30s

VolumeSnapshotClass

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: longhorn
  namespace: longhorn-system
  labels:
    velero.io/csi-volumesnapshot-class: "true"
driver: driver.longhorn.io
deletionPolicy: Delete

VolumeSnapshotClass

In our second cluster, with Velero v1.11.0 installed, we created the following resource (but same issue here):

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: longhorn
  namespace: longhorn-system
  labels:
    velero.io/csi-volumesnapshot-class: 'true'
driver: driver.longhorn.io
deletionPolicy: Delete
parameters:
  type: bak

VolumeSnapshotLocation

apiVersion: velero.io/v1
kind: VolumeSnapshotLocation
metadata:
  name: longhorn
  namespace: velero
spec:
  provider: longhorn.io/longhorn
@ywk253100 ywk253100 added the Area/CSI Related to Container Storage Interface support label Apr 26, 2023
@ywk253100
Contributor

Are the VolumeSnapshot and VolumeSnapshotContent removed after the backup is deleted? I suspect this is a bug in the CSI driver rather than in Velero.

@R-Studio
Author

R-Studio commented May 1, 2023

@ywk253100 Yes, the VolumeSnapshot and VolumeSnapshotContent are removed after the backup is deleted.
If it is more likely a bug in the CSI driver, is there a way to address this?

@draghuram
Contributor

draghuram commented May 1, 2023

Firstly, there are a couple of points I would like to highlight about your setup:

  1. I see that the VolumeSnapshotClass's "deletionPolicy" is set to "Delete". This is dangerous because if the namespace is deleted, or if VolumeSnapshot resources are deleted, it will trigger deletion of the VolumeSnapshotContent and of the storage snapshot itself. This is probably not what you intend, so it is advisable to set "deletionPolicy" to "Retain". Note that this will not prevent Velero from cleaning up the snapshots when the backup expires.
  2. I see the following Longhorn specific config in Volume snapshot class:
parameters:
  type: bak

The value "bak" tells the Longhorn driver to perform an actual "backup" when a CSI snapshot is taken. This was the default behavior of the Longhorn CSI driver until version 1.3. Since then, there is a different value you can use, "snap", which causes the CSI driver to take a real "snapshot" without triggering data movement. Just wanted to mention it in case you want to use this feature. See https://longhorn.io/docs/1.4.1/snapshots-and-backups/csi-snapshot-support/csi-volume-snapshot-associated-with-longhorn-snapshot/ for details.
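Combining both points, a sketch of the suggested class (same metadata as in this report; `Retain` and `snap` are the suggested values, not what the reporter currently runs):

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: longhorn
  labels:
    velero.io/csi-volumesnapshot-class: "true"
driver: driver.longhorn.io
deletionPolicy: Retain   # safer than Delete; Velero still cleans up expired backups
parameters:
  type: snap             # Longhorn >= 1.3: take a local snapshot, no data movement
```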

Now, coming to the actual snapshot deletion: if the VolumeSnapshot and VolumeSnapshotContent resources are gone but the storage snapshots remain, the most probable cause is an issue with the CSI driver. You should check the Longhorn CSI driver logs and verify whether there are any messages corresponding to the VolumeSnapshotContent that was deleted. You can also try to reproduce the problem by creating a VolumeSnapshot manually and then deleting it to see what happens. We, at CloudCasa, have seen snapshot deletion issues with Longhorn, but the driver version was pre-1.3. Are you using 1.4.1?

Thanks, Raghu (https://cloudcasa.io).

@ywk253100 ywk253100 self-assigned this May 8, 2023
@R-Studio
Author

R-Studio commented May 15, 2023

@draghuram Thanks for your tips! 👍🏽

The same issue occurs when I create a VolumeSnapshot manually and then delete it. In the logs I can't find any useful information.

VolumeSnapshotClass

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: longhorn
  namespace: longhorn-system
  labels:
    velero.io/csi-volumesnapshot-class: 'true'
driver: driver.longhorn.io
deletionPolicy: Delete
parameters:
  type: bak

VolumeSnapshot

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: new-snapshot-test
  namespace: harbor
spec:
  volumeSnapshotClassName: longhorn
  source:
    persistentVolumeClaimName: harbor-jobservice

[screenshot]

Remove the VolumeSnapshot: kubectl delete volumesnapshot new-snapshot-test -n harbor

[screenshot]

  • Why does Longhorn retain the snapshot?
  • Why does deletion work in the GUI but not via kubectl?

We use Longhorn v1.4.1 and the velero-plugin-for-csi:v0.5.0.
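For reference, the leftover state can be inspected directly via the CRs involved (a sketch using the namespaces from this report):

```shell
# After the delete, the CSI-level objects should be gone...
kubectl -n harbor get volumesnapshots
kubectl get volumesnapshotcontents

# ...but the Longhorn-side snapshot objects remain
kubectl -n longhorn-system get snapshots.longhorn.io
kubectl -n longhorn-system get backups.longhorn.io
```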

@R-Studio
Author

R-Studio commented May 15, 2023

I also found something interesting in the snapshot-controller logs:
[screenshot]
But in the logs there is no error for deleting the snapshot:
[screenshot]

@draghuram
Contributor

Interesting. From the logs, it does seem that the deletion logic is kicking in, and I even see the attempt to remove the finalizer. Can you post the VolumeSnapshot YAML after the deletion? I want to see what finalizers are listed there.

@R-Studio
Author

R-Studio commented May 22, 2023

@draghuram When I create a backup from a Velero schedule I can't see any VolumeSnapshots being created; the VolumeSnapshots are deleted directly after a successful backup:
[screenshots]

Here is my backup schedule:

apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: harbor-daily-0200
  namespace: velero #Must be the namespace of the Velero server
spec:
  schedule: 0 0 * * * #IMPORTANT: Velero Pod has UTC time so CH-Time -2h
  template:
    includedNamespaces:
    - 'harbor'
    includedResources:
    - '*'
    snapshotVolumes: true
    storageLocation: minio
    volumeSnapshotLocations:
      - longhorn
    ttl: 168h0m0s #7 Days retention
    defaultVolumesToRestic: false
    hooks:
      resources:
        - name: postgresql
          includedNamespaces:
          - 'harbor'
          includedResources:
          - pods
          excludedResources: []
          labelSelector:
            matchLabels:
              statefulset.kubernetes.io/pod-name: harbor-database-0
          pre:
            - exec:
                container: database
                command:
                  - /bin/bash
                  - -c
                  - "psql -U postgres -c \"CHECKPOINT\";"
                onError: Fail
                timeout: 30s

@R-Studio
Author

R-Studio commented Jun 5, 2023

Today I upgraded Longhorn from v1.4.1 to v1.4.2 and the issue still occurs. 😔😔

@R-Studio
Author

R-Studio commented Jun 6, 2023

Today I noticed that I had deployed a snapshot controller as described in this documentation,
although I already had the "rke2-snapshot-controller" on my cluster. I am not sure if it came with a Rancher update or something. Anyway, I removed my snapshot controller and tested the issue again.

VolumeSnapshotClass

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: longhorn
  namespace: longhorn-system
  labels:
    velero.io/csi-volumesnapshot-class: 'true'
driver: driver.longhorn.io
deletionPolicy: Delete
parameters:
  type: bak

VolumeSnapshot

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: new-snapshot-test
  namespace: harbor
spec:
  volumeSnapshotClassName: longhorn
  source:
    persistentVolumeClaimName: harbor-jobservice

[screenshots]

Remove the VolumeSnapshot: kubectl delete volumesnapshot new-snapshot-test -n harbor
[screenshot]

The same issue occurs, but there are some interesting logs:

  • After I created the VolumeSnapshot, the following logs were written:
    [screenshots]

@R-Studio
Author

R-Studio commented Jun 6, 2023

After I upgraded to the newest RKE2 Helm charts, the "finalizers" error logs mentioned above no longer appear, but the issue still occurs.

I upgraded the following Helm releases:

  • rke2-snapshot-controller: 1.7.201 -> 1.7.202
  • rke2-snapshot-controller-crd: 1.7.201 -> 1.7.202
  • rke2-snapshot-validation-webhook: 1.7.200 -> 1.7.201

Here are the log messages:

2023-06-06T14:36:39+02:00	I0606 12:36:39.815026       1 snapshot_controller_base.go:213] deletion of content "snapcontent-20d81f05-864b-489e-8875-3ea71832a743" was already processed
2023-06-06T14:36:38+02:00	E0606 12:36:38.814935       1 snapshot_controller_base.go:265] could not sync content "snapcontent-20d81f05-864b-489e-8875-3ea71832a743": snapshot controller failed to update snapcontent-20d81f05-864b-489e-8875-3ea71832a743 on API server: Operation cannot be fulfilled on volumesnapshotcontents.snapshot.storage.k8s.io "snapcontent-20d81f05-864b-489e-8875-3ea71832a743": StorageError: invalid object, Code: 4, Key: /registry/snapshot.storage.k8s.io/volumesnapshotcontents/snapcontent-20d81f05-864b-489e-8875-3ea71832a743, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: d376de84-4aa4-426e-8c97-f96df3073b73, UID in object meta: 
2023-06-06T14:36:38+02:00	time="2023-06-06T12:36:38Z" level=info msg="DeleteSnapshot: rsp: {}"
2023-06-06T14:36:38+02:00	time="2023-06-06T12:36:38Z" level=info msg="DeleteSnapshot: req: {\"snapshot_id\":\"bak://pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c/backup-3e65b0ef494940b8\"}"
2023-06-06T14:35:12+02:00	I0606 12:35:12.301474       1 snapshot_controller.go:998] checkandRemovePVCFinalizer[new-snapshot-test]: Remove Finalizer for PVC harbor-jobservice as it is not used by snapshots in creation
2023-06-06T14:35:12+02:00	I0606 12:35:12.296417       1 event.go:285] Event(v1.ObjectReference{Kind:"VolumeSnapshot", Namespace:"harbor", Name:"new-snapshot-test", UID:"20d81f05-864b-489e-8875-3ea71832a743", APIVersion:"snapshot.storage.k8s.io/v1", ResourceVersion:"181825436", FieldPath:""}): type: 'Normal' reason: 'SnapshotReady' Snapshot harbor/new-snapshot-test is ready to use.
2023-06-06T14:35:12+02:00	I0606 12:35:12.296353       1 event.go:285] Event(v1.ObjectReference{Kind:"VolumeSnapshot", Namespace:"harbor", Name:"new-snapshot-test", UID:"20d81f05-864b-489e-8875-3ea71832a743", APIVersion:"snapshot.storage.k8s.io/v1", ResourceVersion:"181825436", FieldPath:""}): type: 'Normal' reason: 'SnapshotCreated' Snapshot harbor/new-snapshot-test was successfully created by the CSI driver.
2023-06-06T14:35:12+02:00	time="2023-06-06T12:35:12Z" level=info msg="CreateSnapshot: rsp: {\"snapshot\":{\"creation_time\":{\"seconds\":1686054902},\"ready_to_use\":true,\"size_bytes\":1073741824,\"snapshot_id\":\"bak://pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c/backup-3e65b0ef494940b8\",\"source_volume_id\":\"pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c\"}}"
2023-06-06T14:35:12+02:00	time="2023-06-06T12:35:12Z" level=debug msg="ControllerServer CreateSnapshot rsp: snapshot:<size_bytes:1073741824 snapshot_id:\"bak://pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c/backup-3e65b0ef494940b8\" source_volume_id:\"pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c\" creation_time:<seconds:1686054902 > ready_to_use:true > "
2023-06-06T14:35:12+02:00	time="2023-06-06T12:35:12Z" level=info msg="createCSISnapshotTypeLonghornBackup: volume pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c backup backup-3e65b0ef494940b8 of snapshot snapshot-20d81f05-864b-489e-8875-3ea71832a743 in progress"
2023-06-06T14:35:12+02:00	time="2023-06-06T12:35:12Z" level=info msg="Backup backup-3e65b0ef494940b8 initiated for volume pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c for snapshot snapshot-20d81f05-864b-489e-8875-3ea71832a743"
2023-06-06T14:35:08+02:00	[pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c-r-62d5874e] time="2023-06-06T12:35:08Z" level=info msg="Done initiating backup creation, received backupID: backup-3e65b0ef494940b8"
2023-06-06T14:35:06+02:00	[pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c-r-62d5874e] time="2023-06-06T12:35:06Z" level=info msg="Loaded driver for s3://t1-longhorn-snapshots@minio/" pkg=s3
2023-06-06T14:35:06+02:00	[pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c-r-62d5874e] time="2023-06-06T12:35:06Z" level=info msg="Start creating backup backup-3e65b0ef494940b8" pkg=backup
2023-06-06T14:35:06+02:00	[pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c-r-62d5874e] time="2023-06-06T12:35:06Z" level=info msg="Initializing backup backup-3e65b0ef494940b8 for volume pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c snapshot snapshot-20d81f05-864b-489e-8875-3ea71832a743" pkg=backup
2023-06-06T14:35:06+02:00	[longhorn-instance-manager] time="2023-06-06T12:35:06Z" level=info msg="Backing up snapshot-20d81f05-864b-489e-8875-3ea71832a743 on tcp://10.42.1.154:10060, to s3://t1-longhorn-snapshots@minio/"
2023-06-06T14:35:06+02:00	[longhorn-instance-manager] time="2023-06-06T12:35:06Z" level=info msg="Backing up snapshot snapshot-20d81f05-864b-489e-8875-3ea71832a743 to backup backup-3e65b0ef494940b8" serviceURL="10.42.2.8:10009"
2023-06-06T14:35:04+02:00	[pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c-r-62d5874e] time="2023-06-06T12:35:04Z" level=info msg="Done initiating backup creation, received backupID: backup-3e65b0ef494940b8"
2023-06-06T14:35:02+02:00	[pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c-r-62d5874e] time="2023-06-06T12:35:02Z" level=info msg="Loaded driver for s3://t1-longhorn-snapshots@minio/" pkg=s3
2023-06-06T14:35:02+02:00	[pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c-r-62d5874e] time="2023-06-06T12:35:02Z" level=info msg="Start creating backup backup-3e65b0ef494940b8" pkg=backup
2023-06-06T14:35:02+02:00	[pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c-r-62d5874e] time="2023-06-06T12:35:02Z" level=info msg="Initializing backup backup-3e65b0ef494940b8 for volume pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c snapshot snapshot-20d81f05-864b-489e-8875-3ea71832a743" pkg=backup
2023-06-06T14:35:02+02:00	[longhorn-instance-manager] time="2023-06-06T12:35:02Z" level=info msg="Backing up snapshot-20d81f05-864b-489e-8875-3ea71832a743 on tcp://10.42.1.154:10060, to s3://t1-longhorn-snapshots@minio/"
2023-06-06T14:35:02+02:00	[longhorn-instance-manager] time="2023-06-06T12:35:02Z" level=info msg="Backing up snapshot snapshot-20d81f05-864b-489e-8875-3ea71832a743 to backup backup-3e65b0ef494940b8" serviceURL="10.42.2.8:10009"
2023-06-06T14:35:02+02:00	time="2023-06-06T12:35:02Z" level=info msg="createCSISnapshotTypeLonghornBackup: volume pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c initiating backup for snapshot snapshot-20d81f05-864b-489e-8875-3ea71832a743"
2023-06-06T14:35:02+02:00	time="2023-06-06T12:35:02Z" level=info msg="Finished snapshot" snapshot=snapshot-20d81f05-864b-489e-8875-3ea71832a743 volume=pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c
2023-06-06T14:35:02+02:00	time="2023-06-06T12:35:02Z" level=info msg="Finished to snapshot: 10.42.3.102:10105 snapshot-20d81f05-864b-489e-8875-3ea71832a743 UserCreated true Created at 2023-06-06T12:35:02Z, Labels map[type:bak]"
2023-06-06T14:35:02+02:00	[pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c-e-2e5a625b] time="2023-06-06T12:35:02Z" level=info msg="Finished to snapshot: 10.42.1.154:10060 snapshot-20d81f05-864b-489e-8875-3ea71832a743 UserCreated true Created at 2023-06-06T12:35:02Z, Labels map[type:bak]"
2023-06-06T14:35:02+02:00	time="2023-06-06T12:35:02Z" level=info msg="Removing disk volume-head-004.img"
2023-06-06T14:35:02+02:00	time="2023-06-06T12:35:02Z" level=info msg="Finished creating disk" disk=snapshot-20d81f05-864b-489e-8875-3ea71832a743
2023-06-06T14:35:02+02:00	time="2023-06-06T12:35:02Z" level=info msg="Cleaning up new disk file /host/var/lib/longhorn/replicas/pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c-99a60236/volume-snap-snapshot-20d81f05-864b-489e-8875-3ea71832a743.img before linking"
2023-06-06T14:35:02+02:00	time="2023-06-06T12:35:02Z" level=info msg="Cleaning up new disk checksum file /host/var/lib/longhorn/replicas/pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c-99a60236/volume-snap-snapshot-20d81f05-864b-489e-8875-3ea71832a743.img.checksum before linking"
2023-06-06T14:35:02+02:00	[pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c-r-46492401] time="2023-06-06T12:35:02Z" level=info msg="Cleaning up new disk metadata file path /host/var/lib/longhorn/replicas/pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c-99a60236/volume-snap-snapshot-20d81f05-864b-489e-8875-3ea71832a743.img.meta before linking"
2023-06-06T14:35:02+02:00	time="2023-06-06T12:35:02Z" level=info msg="Removing disk volume-head-004.img"
2023-06-06T14:35:02+02:00	time="2023-06-06T12:35:02Z" level=info msg="Finished creating disk" disk=snapshot-20d81f05-864b-489e-8875-3ea71832a743
2023-06-06T14:35:02+02:00	time="2023-06-06T12:35:02Z" level=info msg="Cleaning up new disk file /host/var/lib/longhorn/replicas/pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c-5286f5ef/volume-snap-snapshot-20d81f05-864b-489e-8875-3ea71832a743.img before linking"
2023-06-06T14:35:02+02:00	time="2023-06-06T12:35:02Z" level=info msg="Cleaning up new disk checksum file /host/var/lib/longhorn/replicas/pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c-5286f5ef/volume-snap-snapshot-20d81f05-864b-489e-8875-3ea71832a743.img.checksum before linking"
2023-06-06T14:35:02+02:00	[pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c-r-62d5874e] time="2023-06-06T12:35:02Z" level=info msg="Cleaning up new disk metadata file path /host/var/lib/longhorn/replicas/pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c-5286f5ef/volume-snap-snapshot-20d81f05-864b-489e-8875-3ea71832a743.img.meta before linking"
2023-06-06T14:35:02+02:00	time="2023-06-06T12:35:02Z" level=info msg="Starting to create disk" disk=snapshot-20d81f05-864b-489e-8875-3ea71832a743
2023-06-06T14:35:02+02:00	[pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c-e-2e5a625b] time="2023-06-06T12:35:02Z" level=info msg="Finished to snapshot: 10.42.2.93:10045 snapshot-20d81f05-864b-489e-8875-3ea71832a743 UserCreated true Created at 2023-06-06T12:35:02Z, Labels map[type:bak]"
2023-06-06T14:35:02+02:00	[pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c-r-46492401] time="2023-06-06T12:35:02Z" level=info msg="Replica server starts to snapshot [snapshot-20d81f05-864b-489e-8875-3ea71832a743] volume, user created true, created time 2023-06-06T12:35:02Z, labels map[type:bak]"
2023-06-06T14:35:02+02:00	time="2023-06-06T12:35:02Z" level=info msg="Removing disk volume-head-003.img"
2023-06-06T14:35:02+02:00	time="2023-06-06T12:35:02Z" level=info msg="Finished creating disk" disk=snapshot-20d81f05-864b-489e-8875-3ea71832a743
2023-06-06T14:35:02+02:00	time="2023-06-06T12:35:02Z" level=info msg="Cleaning up new disk file /host/var/lib/longhorn/replicas/pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c-90ade5bb/volume-snap-snapshot-20d81f05-864b-489e-8875-3ea71832a743.img before linking"
2023-06-06T14:35:02+02:00	time="2023-06-06T12:35:02Z" level=info msg="Cleaning up new disk checksum file /host/var/lib/longhorn/replicas/pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c-90ade5bb/volume-snap-snapshot-20d81f05-864b-489e-8875-3ea71832a743.img.checksum before linking"
2023-06-06T14:35:02+02:00	[pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c-r-ff57e501] time="2023-06-06T12:35:02Z" level=info msg="Cleaning up new disk metadata file path /host/var/lib/longhorn/replicas/pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c-90ade5bb/volume-snap-snapshot-20d81f05-864b-489e-8875-3ea71832a743.img.meta before linking"
2023-06-06T14:35:02+02:00	time="2023-06-06T12:35:02Z" level=info msg="Starting to create disk" disk=snapshot-20d81f05-864b-489e-8875-3ea71832a743
2023-06-06T14:35:02+02:00	[pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c-r-62d5874e] time="2023-06-06T12:35:02Z" level=info msg="Replica server starts to snapshot [snapshot-20d81f05-864b-489e-8875-3ea71832a743] volume, user created true, created time 2023-06-06T12:35:02Z, labels map[type:bak]"
2023-06-06T14:35:02+02:00	time="2023-06-06T12:35:02Z" level=info msg="Starting to create disk" disk=snapshot-20d81f05-864b-489e-8875-3ea71832a743
2023-06-06T14:35:02+02:00	time="2023-06-06T12:35:02Z" level=info msg="Starting to snapshot: 10.42.1.154:10060 snapshot-20d81f05-864b-489e-8875-3ea71832a743 UserCreated true Created at 2023-06-06T12:35:02Z, Labels map[type:bak]"
2023-06-06T14:35:02+02:00	[pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c-e-2e5a625b] time="2023-06-06T12:35:02Z" level=info msg="Starting to snapshot: 10.42.2.93:10045 snapshot-20d81f05-864b-489e-8875-3ea71832a743 UserCreated true Created at 2023-06-06T12:35:02Z, Labels map[type:bak]"
2023-06-06T14:35:02+02:00	[pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c-r-ff57e501] time="2023-06-06T12:35:02Z" level=info msg="Replica server starts to snapshot [snapshot-20d81f05-864b-489e-8875-3ea71832a743] volume, user created true, created time 2023-06-06T12:35:02Z, labels map[type:bak]"
2023-06-06T14:35:02+02:00	[pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c-e-2e5a625b] time="2023-06-06T12:35:02Z" level=info msg="Starting to snapshot: 10.42.3.102:10105 snapshot-20d81f05-864b-489e-8875-3ea71832a743 UserCreated true Created at 2023-06-06T12:35:02Z, Labels map[type:bak]"
2023-06-06T14:35:02+02:00	[pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c-e-2e5a625b] time="2023-06-06T12:35:02Z" level=info msg="Requesting system sync before snapshot" snapshot=snapshot-20d81f05-864b-489e-8875-3ea71832a743 volume=pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c
2023-06-06T14:35:02+02:00	[pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c-e-2e5a625b] time="2023-06-06T12:35:02Z" level=info msg="Starting snapshot" snapshot=snapshot-20d81f05-864b-489e-8875-3ea71832a743 volume=pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c
2023-06-06T14:35:02+02:00	[longhorn-instance-manager] time="2023-06-06T12:35:02Z" level=info msg="Snapshotting volume: snapshot snapshot-20d81f05-864b-489e-8875-3ea71832a743" serviceURL="10.42.2.8:10009"
2023-06-06T14:35:02+02:00	time="2023-06-06T12:35:02Z" level=info msg="createCSISnapshotTypeLonghornBackup: volume pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c initiating snapshot snapshot-20d81f05-864b-489e-8875-3ea71832a743"
2023-06-06T14:35:02+02:00	time="2023-06-06T12:35:02Z" level=info msg="CreateSnapshot: req: {\"name\":\"snapshot-20d81f05-864b-489e-8875-3ea71832a743\",\"parameters\":{\"type\":\"bak\"},\"source_volume_id\":\"pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c\"}"
2023-06-06T14:35:02+02:00	time="2023-06-06T12:35:02Z" level=info msg="GetPluginInfo: rsp: {\"name\":\"driver.longhorn.io\",\"vendor_version\":\"v1.4.2\"}"
2023-06-06T14:35:02+02:00	time="2023-06-06T12:35:02Z" level=info msg="GetPluginInfo: req: {}"
2023-06-06T14:35:02+02:00	I0606 12:35:02.112599       1 event.go:285] Event(v1.ObjectReference{Kind:"VolumeSnapshot", Namespace:"harbor", Name:"new-snapshot-test", UID:"20d81f05-864b-489e-8875-3ea71832a743", APIVersion:"snapshot.storage.k8s.io/v1", ResourceVersion:"181825428", FieldPath:""}): type: 'Normal' reason: 'CreatingSnapshot' Waiting for a snapshot harbor/new-snapshot-test to be created by the CSI driver.
2023-06-06T14:35:02+02:00	I0606 12:35:02.112195       1 snapshot_controller.go:291] createSnapshotWrapper: Creating snapshot for content snapcontent-20d81f05-864b-489e-8875-3ea71832a743 through the plugin ...
2023-06-06T14:35:02+02:00	I0606 12:35:02.106794       1 snapshot_controller.go:919] Added protection finalizer to persistent volume claim harbor/harbor-jobservice
2023-06-06T14:35:02+02:00	I0606 12:35:02.093901       1 snapshot_controller.go:638] createSnapshotContent: Creating content for snapshot harbor/new-snapshot-test through the plugin ...
2023-06-06T14:35:01+02:00	time="2023-06-06T12:35:01Z" level=debug msg="Setting allow-recurring-job-while-volume-detached is false

The following error looks interesting:

snapshot_controller_base.go:265] could not sync content "snapcontent-20d81f05-864b-489e-8875-3ea71832a743": snapshot controller failed to update snapcontent-20d81f05-864b-489e-8875-3ea71832a743 on API server: Operation cannot be fulfilled on volumesnapshotcontents.snapshot.storage.k8s.io "snapcontent-20d81f05-864b-489e-8875-3ea71832a743": StorageError: invalid object, Code: 4, Key: /registry/snapshot.storage.k8s.io/volumesnapshotcontents/snapcontent-20d81f05-864b-489e-8875-3ea71832a743, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: d376de84-4aa4-426e-8c97-f96df3073b73, UID in object meta:

[screenshot]

@R-Studio
Author

@draghuram do you have any idea what could be the root cause of the issue?

@draghuram
Contributor

Yes, that error message looks interesting. Following that line are these two lines:

2023-06-06T14:36:38+02:00       time="2023-06-06T12:36:38Z" level=info msg="DeleteSnapshot: rsp: {}"

2023-06-06T14:36:38+02:00       time="2023-06-06T12:36:38Z" level=info msg="DeleteSnapshot: req: {\"snapshot_id\":\"bak://pvc-21bf4b76-ac60-48d9-b5ba-fe15b50dd87c/backup-3e65b0ef494940b8\"}"

So I guess the driver is getting the request to delete the snapshot, though nothing else seems to happen. I am going to do the same test and see what other logs are produced by the driver. In the meantime, can you use "type: snap" in the volume snapshot class and redo the test?

I must also note that the main problem appears to be in either the CSI driver or Longhorn itself. From Velero's point of view, it is issuing the delete request. So you may have better luck pursuing this in the Longhorn forums by describing how you deleted the VolumeSnapshot and how that didn't delete the Longhorn snapshot.

@R-Studio
Author

R-Studio commented Jul 24, 2023

@draghuram Thanks, with type: snap Velero removes the snapshot as expected, but no backup was created (only a snapshot).
Were you able to test this already, and did you find anything interesting?

@draghuram
Contributor

@R-Studio, in our tests, we see that Longhorn snapshots are deleted as expected. You should use "type: snap" and then decide what type of Velero backups you want. The most basic is snapshots, which you are already doing. The second option is to do file system backups, which will transfer the contents of Longhorn PVs to object storage, but they read data from the live PV. A new feature is coming in 1.12 (slated to be released by end of August) that will take a PV snapshot first and then back up the data in the snapshot to object storage.
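As a sketch, the three options map to the following velero CLI invocations (flag names as of Velero 1.10+/1.12; verify against your version's `velero backup create --help`; backup names are hypothetical):

```shell
# 1. CSI snapshot-based backup (what the schedule in this issue does)
velero backup create harbor-snap --include-namespaces harbor --snapshot-volumes

# 2. File system backup: copies PV contents to object storage, reading the live PV
velero backup create harbor-fsb --include-namespaces harbor \
  --default-volumes-to-fs-backup

# 3. Snapshot data movement (Velero >= 1.12): snapshot first, then upload its data
velero backup create harbor-dm --include-namespaces harbor --snapshot-move-data
```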

@R-Studio
Author

R-Studio commented Aug 7, 2023

@draghuram Thanks for your help, but as I said, with type: snap Velero only triggers a Longhorn snapshot and not a Longhorn backup. The difference is that a Longhorn snapshot is only stored locally, while a Longhorn backup is written to an (external) S3 bucket.
If I understand Velero, the CSI plugin, and Longhorn correctly, I would expect Velero to store the Kubernetes manifests (YAMLs) and create CSI snapshots through the plugin, and Longhorn to notice the CSI snapshot and create the Longhorn backup.
Or have I misunderstood something?

@Satsank

Satsank commented Aug 7, 2023 via email

@R-Studio
Author

R-Studio commented Dec 5, 2023

I updated to Velero v1.12.2, using velero-plugin-for-csi v0.6.2 and velero-plugin-for-aws v1.8.2, but still the same issue.

@lucatr

lucatr commented Dec 12, 2023

@Satsank Thanks for your input. Point 5 caught my attention. I'm having the same issue and tested the Velero 1.12.2 data mover as a workaround. The overall goal is to reduce storage consumption on Longhorn by moving snapshot/backup data directly to S3. My tests failed because Velero wasn't able to mount the snapshots; see the error below. My assumption is that this happens because Longhorn snapshots are incremental and can therefore not be mounted directly. Is there a way to make this work with Longhorn?

longhorn-csi-plugin time="2023-12-12T16:39:47Z" level=info msg="NodeStageVolume: req: {\"staging_target_path\":\"/var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/5ba5d59b93a7a2e3a3a87645184e0d0d4faccb58c8efe12e31d46981cc70454c/globalmount\",\"volume_capability\":{\"AccessType\":{\"Mount\":{\"fs_type\":\"xfs\"}},\"access_mode\":{\"mode\":1}},\"volume_context\":{\"dataLocality\":\"disabled\",\"dataSource\":\"snap://pvc-542dfa21-ec13-4aa5-a81a-e772587c6acf/snapshot-3d6fe83a-0350-4dd6-8fd0-0961570bb3b9\",\"fromBackup\":\"\",\"fsType\":\"xfs\",\"numberOfReplicas\":\"3\",\"staleReplicaTimeout\":\"30\",\"storage.kubernetes.io/csiProvisionerIdentity\":\"1702042978770-8081-driver.longhorn.io\"},\"volume_id\":\"pvc-8ef7248c-d20b-4065-b38e-69734e3a629f\"}" func=csi.logGRPC file="server.go:132"
longhorn-csi-plugin time="2023-12-12T16:39:47Z" level=info msg="Volume pvc-8ef7248c-d20b-4065-b38e-69734e3a629f using user and longhorn provided xfs fs creation params: -ssize=4096 -bsize=4096" func="csi.(*NodeServer).getMounter" file="node_server.go:763"
longhorn-csi-plugin time="2023-12-12T16:39:47Z" level=info msg="Volume pvc-8ef7248c-d20b-4065-b38e-69734e3a629f device /dev/longhorn/pvc-8ef7248c-d20b-4065-b38e-69734e3a629f contains filesystem of format xfs" func="csi.(*NodeServer).NodeStageVolume" file="node_server.go:421" component=csi-node-server function=NodeStageVolume
longhorn-csi-plugin time="2023-12-12T16:39:47Z" level=info msg="Trying to ensure mount point /var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/5ba5d59b93a7a2e3a3a87645184e0d0d4faccb58c8efe12e31d46981cc70454c/globalmount" func=csi.ensureMountPoint file="util.go:288"
longhorn-csi-plugin time="2023-12-12T16:39:47Z" level=info msg="Mount point /var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/5ba5d59b93a7a2e3a3a87645184e0d0d4faccb58c8efe12e31d46981cc70454c/globalmount try opening and syncing dir to make sure it's healthy" func=csi.ensureMountPoint file="util.go:296"
longhorn-csi-plugin E1212 16:39:47.578599    6145 mount_linux.go:232] Mount failed: exit status 32                                                                                                                                            
longhorn-csi-plugin Mounting command: mount
longhorn-csi-plugin Mounting arguments: -t xfs -o defaults /dev/longhorn/pvc-8ef7248c-d20b-4065-b38e-69734e3a629f /var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/5ba5d59b93a7a2e3a3a87645184e0d0d4faccb58c8efe12e31d46981cc70454c/globalmount
longhorn-csi-plugin Output: mount: /var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/5ba5d59b93a7a2e3a3a87645184e0d0d4faccb58c8efe12e31d46981cc70454c/globalmount: wrong fs type, bad option, bad superblock on /dev/longhorn/pvc-8ef7248c-d20b-4065-b38e-69734e3a629f, missing codepage or helper program, or other error.
longhorn-csi-plugin time="2023-12-12T16:39:47Z" level=error msg="NodeStageVolume: err: rpc error: code = Internal desc = mount failed: exit status 32\nMounting command: mount\nMounting arguments: -t xfs -o defaults /dev/longhorn/pvc-8ef7248c-d20b-4065-b38e-69734e3a629f /var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/5ba5d59b93a7a2e3a3a87645184e0d0d4faccb58c8efe12e31d46981cc70454c/globalmount\nOutput: mount: /var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/5ba5d59b93a7a2e3a3a87645184e0d0d4faccb58c8efe12e31d46981cc70454c/globalmount: wrong fs type, bad option, bad superblock on /dev/longhorn/pvc-8ef7248c-d20b-4065-b38e-69734e3a629f, missing codepage or helper program, or other error.\n" func=csi.logGRPC file="server.go:138"

@draghuram
Contributor

@lucatr PVCs can be created from Longhorn snapshots and mounted on data mover pods. We do something similar in CloudCasa and it works. However, the point is that when a PVC is created from a Longhorn snapshot, Longhorn creates a new volume and "copies" data from the snapshot, which is very inefficient because the PVC is going to be deleted as soon as the backup is done. The best one can do is to configure the minimum number of replicas (1, really) to minimize copying. CloudCasa does provide this option, but I really hope that Longhorn optimizes the creation of volumes from snapshots by eliminating the copy entirely.
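The "minimum replicas" idea above can be sketched as a dedicated Longhorn StorageClass for the short-lived clone PVCs. This is only a sketch: the class name is hypothetical, while numberOfReplicas and staleReplicaTimeout are standard Longhorn StorageClass parameters.

```yaml
# Sketch: a StorageClass for short-lived PVCs cloned from snapshots.
# numberOfReplicas: "1" avoids replicating data that will be deleted
# right after the backup finishes.
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: longhorn-single-replica   # hypothetical name
provisioner: driver.longhorn.io
reclaimPolicy: Delete
allowVolumeExpansion: true
parameters:
  numberOfReplicas: "1"
  staleReplicaTimeout: "30"
```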

Having said that, the error in your case seems to be different so it may be better to open a separate issue. When exactly are you seeing this error? Attaching Velero backup bundle may be useful.

Feel free to contact us at CloudCasa, as we have a lot of experience with Longhorn PV backups.

@lucatr

lucatr commented Dec 13, 2023

@draghuram thanks for the feedback. When I kick off the Velero backup from the schedule [1], a snapshot is created successfully and its status changes to "ready to use". The PVC is created as well; its events say "successfully provisioned". The PV looks fine too, and the same is true for the volume in the Longhorn GUI (it says healthy, ready). But the snapshot-exposer pods are stuck in ContainerCreating status. The pod events show the same mounting error I also see in the longhorn-csi-plugin pod logs. It stays stuck like this for about 30 minutes before the pods are killed and the backup is marked as PartiallyFailed in Velero.

Not sure what the Velero backup bundle is, or what other logs might be interesting in this case.

As suggested, I'll go ahead and create a separate issue for this later this week.

[1]

apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: harbor-daily-0200
  namespace: velero
spec:
  schedule: 30 0 * * *
  template:
    includedNamespaces:
    - 'harbor'
    includedResources:
    - '*'
    snapshotVolumes: true
    storageLocation: minio
    volumeSnapshotLocations:
      - longhorn
    snapshotMoveData: true
    datamover: velero
    ttl: 168h0m0s

@PhanLe1010

PhanLe1010 commented Dec 22, 2023

Hi everyone! I am from the Longhorn team. It is a great discussion so far in this thread and I would like to join the conversation.

First of all, as others already mentioned, a CSI VolumeSnapshot (a Kubernetes upstream CRD) can be associated with either a Longhorn snapshot (which lives inside the cluster) or a Longhorn backup (which lives outside the cluster, in the S3 endpoint). For example, the CSI VolumeSnapshot created by this VolumeSnapshotClass corresponds to a Longhorn snapshot (link):

kind: VolumeSnapshotClass
apiVersion: snapshot.storage.k8s.io/v1
metadata:
  name: longhorn-snapshot-vsc
driver: driver.longhorn.io
deletionPolicy: Delete
parameters:
  type: snap

and the CSI VolumeSnapshot created by this VolumeSnapshotClass corresponds to a Longhorn backup (link):

kind: VolumeSnapshotClass
apiVersion: snapshot.storage.k8s.io/v1
metadata:
  name: longhorn-backup-vsc
driver: driver.longhorn.io
deletionPolicy: Delete
parameters:
  type: bak
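As a side note on how these classes interact with Velero: the velero-plugin-for-csi selects the VolumeSnapshotClass for a driver via a label. A sketch, assuming the plugin's documented label convention (verify against your plugin version):

```yaml
# Sketch: marking the backup-type class so Velero's CSI plugin selects it
# for volumes provisioned by driver.longhorn.io. The label key is the
# velero-plugin-for-csi convention; the class matches the one above.
kind: VolumeSnapshotClass
apiVersion: snapshot.storage.k8s.io/v1
metadata:
  name: longhorn-backup-vsc
  labels:
    velero.io/csi-volumesnapshot-class: "true"
driver: driver.longhorn.io
deletionPolicy: Delete
parameters:
  type: bak
```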

CSI VolumeSnapshot of type: snap

When you create a CSI VolumeSnapshot of type: snap, Longhorn provisions an in-cluster Longhorn snapshot (the ones shown in this picture https://user-images.githubusercontent.com/7498854/234019298-e7cd2853-199b-4702-b020-24227c0a13bc.png). When you delete this CSI VolumeSnapshot, Longhorn deletes that snapshot. The benefit of this approach is that there are no leftover resources. However, as pointed out by others, this CSI VolumeSnapshot is local to the cluster; it is not backed up to the remote S3 endpoint. If something (like the Velero 1.12.2 data mover) tries to mount (which effectively means clone) this CSI VolumeSnapshot and upload the data to the S3 endpoint, Longhorn first needs to fully copy the data to a new PVC; only then can the data mover upload the data to the S3 endpoint, deleting the newly cloned PVC at the end. This is a costly operation and doesn't fit this backup use case well. This feature (cloning a new PVC from a CSI VolumeSnapshot) was intended for use cases like VM cloning (in Harvester), in which a brand-new VM and its data are cloned from another VM.

CSI VolumeSnapshot of type: bak

On the other hand, when you create a CSI VolumeSnapshot of type: bak, Longhorn will:

  1. Take an in-cluster Longhorn snapshot (the ones shown in this picture https://user-images.githubusercontent.com/7498854/234019298-e7cd2853-199b-4702-b020-24227c0a13bc.png)
  2. Create and upload a backup from that Longhorn snapshot to the S3 endpoint

When you delete a CSI VolumeSnapshot of type: bak, Longhorn will:

  1. Delete the backup from the S3 endpoint
  2. However, Longhorn will NOT delete the in-cluster Longhorn snapshot (the ones shown in this picture https://user-images.githubusercontent.com/7498854/234019298-e7cd2853-199b-4702-b020-24227c0a13bc.png). The reason for this behavior is simply that after the backup finishes, a Longhorn snapshot is no longer linked to the backup. A Longhorn snapshot can correspond to multiple Longhorn backups. Therefore, currently, we don't delete the Longhorn snapshot when a backup is deleted. However, I do see an opportunity for improvement here: maybe Longhorn should offer a setting to specifically find and clean up the Longhorn snapshot when a Longhorn backup is deleted.

The current downside of the CSI VolumeSnapshot of type: bak is that a leftover Longhorn snapshot (the ones shown in this picture https://user-images.githubusercontent.com/7498854/234019298-e7cd2853-199b-4702-b020-24227c0a13bc.png) remains after deleting the CSI VolumeSnapshot. However, the upside is huge: this method is the native way to back up data to the S3 endpoint in Longhorn. It is fast and efficient.

Conclusion:

I would recommend using the CSI VolumeSnapshot of type: bak, as it is the native way to back up data to the S3 endpoint in Longhorn. It is fast and efficient. To overcome its limitation (the leftover Longhorn snapshot), I suggest:

  1. Set up a snapshot-delete recurring job to periodically clean up the leftover Longhorn snapshots
  2. Create an improvement ticket in the Longhorn repo asking for a setting that lets Longhorn find and clean up the Longhorn snapshot when a Longhorn backup is deleted.
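The snapshot-delete recurring job suggested above can be sketched as a Longhorn RecurringJob manifest. This assumes Longhorn v1.4+, where the snapshot-delete task type exists; the job name, cron expression, retain count, and group are hypothetical and should be adapted:

```yaml
# Sketch (Longhorn >= 1.4): periodically delete snapshots beyond the
# `retain` count for all volumes in the `default` group.
apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
  name: snapshot-delete-daily   # hypothetical name
  namespace: longhorn-system
spec:
  task: snapshot-delete   # removes snapshots exceeding `retain`
  cron: "0 3 * * *"       # hypothetical: daily at 03:00
  retain: 1               # keep at most one snapshot per volume
  concurrency: 2
  groups:
    - default
```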

@draghuram
Contributor

Hi @PhanLe1010, Thanks for detailed information. It is very helpful.

I personally think it is better not to use "type: bak" snapshots as a backup mechanism, because this is Longhorn-specific. One may easily have multiple clusters/CSI drivers, and ideally you need a unified backup strategy (such as the one provided by Velero or CloudCasa) that works across different storage types. In that respect, all you need from the storage is an efficient way to snapshot PVs and to create PVs from snapshots.

I think Longhorn already took a step in this direction by implementing "true" snapshots (starting from 1.3). It would be nice if the copy could be avoided when a PVC is created from a snapshot, but it looks like that is not on the roadmap?

@R-Studio
Author

R-Studio commented Jan 8, 2024

@PhanLe1010 Thanks for your great explanation.

  1. We already use a snapshot-delete recurring job as a workaround, but it is a pain because we want to keep as few snapshots as possible, while the retention of the Velero snapshots is defined by the application owner.
  2. How can we create an improvement ticket?

@PhanLe1010

Hi @PhanLe1010, Thanks for detailed information. It is very helpful.

I personally think it is better not to use "type: bak" snapshots as a backup mechanism, because this is Longhorn-specific. One may easily have multiple clusters/CSI drivers, and ideally you need a unified backup strategy (such as the one provided by Velero or CloudCasa) that works across different storage types. In that respect, all you need from the storage is an efficient way to snapshot PVs and to create PVs from snapshots.

Hi @draghuram, I see your point about the unified backup strategy 👍. It becomes a choice for users between a unified solution and a native one, with a tradeoff between convenience and performance.

Btw, can this unified backup strategy currently back up/restore a volume in block mode?

I think Longhorn already took a step in this direction by implementing "true" snapshots (starting from 1.3). It would be nice if the copy could be avoided when a PVC is created from a snapshot, but it looks like that is not on the roadmap?

This would require a big effort on the Longhorn side. Could you create a GitHub ticket at https://github.com/longhorn/longhorn/issues/new/choose so that the Longhorn PM can evaluate whether they want to proceed?

@PhanLe1010

PhanLe1010 commented Jan 9, 2024

Hi @R-Studio

  1. We already use a snapshot-cleanup recurring job as a workaround, but it is a pain because we want to keep as few snapshots as possible, while the retention of the Velero snapshots is defined by the application owner.

I think there is a bit of misunderstanding here. When using type: bak, the number of Velero snapshots (which are the CSI snapshots) shouldn't be affected by the number of Longhorn snapshots. The number of Velero snapshots is equal to the number of Longhorn backups instead. So the flow is:

  1. Velero creates a CSI snapshot
  2. Longhorn creates a Longhorn snapshot
  3. Longhorn creates a Longhorn backup from that Longhorn snapshot
  4. The Longhorn snapshot is no longer needed; you can delete it with the snapshot-delete recurring job.
  5. From now on, the Velero snapshot and the Longhorn backup are bound to each other. Deleting the Velero snapshot will lead to the deletion of the Longhorn backup; restoring the Velero snapshot will lead to the restoration of the Longhorn backup.
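For illustration, the CSI snapshot in step 1 is a standard snapshot.storage.k8s.io/v1 VolumeSnapshot. A sketch with hypothetical names, referencing a type: bak class like the longhorn-backup-vsc example earlier in the thread:

```yaml
# Sketch: the kind of object Velero's CSI plugin creates per PVC.
# Names are hypothetical; longhorn-backup-vsc is the type: bak class
# shown earlier in this thread.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: velero-harbor-registry-xyz   # hypothetical
  namespace: harbor
spec:
  volumeSnapshotClassName: longhorn-backup-vsc
  source:
    persistentVolumeClaimName: harbor-registry   # hypothetical PVC
```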
  2. How can we create an improvement ticket?

You can create a GitHub ticket at https://github.com/longhorn/longhorn/issues/new/choose. The idea for the improvement may be: a setting that lets Longhorn specifically find and clean up the Longhorn snapshot when a Longhorn backup is deleted.

@R-Studio
Author

R-Studio commented Jan 9, 2024

@PhanLe1010
We tested your suggestion and it works, but not with the snapshot-cleanup recurring job; we had to use the snapshot-delete recurring job. It then works as you described, including restore with Velero, but we noticed that in the Longhorn GUI of the PVC the backups are missing (independent of the deletionPolicy of the VolumeSnapshotClass):
image
But in the Backup tab it is there:
image

Anyway, thanks for the hint; now we are able to save more storage space.

@PhanLe1010

PhanLe1010 commented Jan 9, 2024

Hi @R-Studio

We tested your suggestion and it works, but not with the snapshot-cleanup recurring job; we had to use the snapshot-delete recurring job

Sorry for the mistake! Yes, it should be the snapshot-delete recurring job instead of the snapshot-cleanup recurring job.

It then works as you described, including restore with Velero, but we noticed that in the Longhorn GUI of the PVC the backups are missing (independent of the deletionPolicy of the VolumeSnapshotClass):

I see the confusion. Yes, the volume detail page (the first picture) only shows Longhorn backups that still have an existing Longhorn snapshot, while the backup page (the second picture) shows all Longhorn backups.

@draghuram
Contributor

draghuram commented Jan 9, 2024

@PhanLe1010 Yes, Velero does support backups of BLOCK type PVs (CloudCasa contributed code for that recently). I will open a GitHub request for copy-less creation of PVCs from Longhorn snapshots.

@PhanLe1010

@PhanLe1010 Yes, Velero does support backups of BLOCK type PVs (CloudCasa contributed code for that recently). I will open a GitHub request for copy-less creation of PVCs from Longhorn snapshots.

Awesome! Thanks @draghuram !

@PhanLe1010

Hi @draghuram, have you created the ticket in the Longhorn repo yet?

@draghuram
Contributor

Just opened the feature request: longhorn/longhorn#7794.
