Possible pod volume backup failure if velero is installed in multiple namespaces. #6519

Closed
sseago opened this issue Jul 18, 2023 · 0 comments

Comments

@sseago (Collaborator) commented Jul 18, 2023

What steps did you take and what happened:
It looks like we may have a regression in pod volume backup functionality when Velero is installed in more than one namespace; it may have been introduced with the Kopia integration effort. The expectation is that when Velero is installed in multiple namespaces, each Velero instance only looks at Velero CRs in its own namespace and ignores those in other namespaces. We've introduced bugs in this area before during controller refactoring, but the previously observed ones have been fixed.
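
As a point of reference, here is a minimal sketch of the kind of namespace scoping described above, assuming a controller-runtime based controller. The reconciler type, the Namespace field, and the SetupWithManager wiring are illustrative, not Velero's actual code:

package controllers

import (
    "context"

    ctrl "sigs.k8s.io/controller-runtime"
    "sigs.k8s.io/controller-runtime/pkg/client"
    "sigs.k8s.io/controller-runtime/pkg/predicate"

    velerov1 "github.com/vmware-tanzu/velero/pkg/apis/velero/v1"
)

// PodVolumeBackupReconciler is a placeholder reconciler for this sketch.
type PodVolumeBackupReconciler struct {
    client.Client
    // Namespace is the namespace this Velero instance was installed into.
    Namespace string
}

func (r *PodVolumeBackupReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    // Reconcile logic elided; the point is that req should only ever refer
    // to PodVolumeBackups in r.Namespace.
    return ctrl.Result{}, nil
}

// SetupWithManager registers the controller with an event filter so that
// PodVolumeBackups created by a Velero installation in another namespace
// are ignored by this instance.
func (r *PodVolumeBackupReconciler) SetupWithManager(mgr ctrl.Manager) error {
    inOwnNamespace := predicate.NewPredicateFuncs(func(obj client.Object) bool {
        return obj.GetNamespace() == r.Namespace
    })

    return ctrl.NewControllerManagedBy(mgr).
        For(&velerov1.PodVolumeBackup{}).
        WithEventFilter(inOwnNamespace).
        Complete(r)
}

An equivalent approach, depending on how the manager is constructed, is to restrict the manager's cache to the install namespace so that objects from other namespaces are never watched at all.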

From the report in vmware-tanzu/helm-charts#477, it looks like the pod volume backup controller is not properly limiting itself to PVBs in its own namespace:

Actually, it seems that having Velero set up in multiple namespaces does not work. I think the Velero pod cannot distinguish between the node-agent pods of each namespace.

For me, when I create a backup with restic volume backup in namespace A, Velero tries to use the node-agent pods of namespace B.

Error observed:

{
  "backup": "velero-syno/test-12",
  "error.file": "/go/src/github.com/vmware-tanzu/velero/pkg/podvolume/backupper.go:250",
  "error.function": "github.com/vmware-tanzu/velero/pkg/podvolume.(*backupper).BackupPodVolumes",
  "error.message": "pod volume backup failed: error creating uploader: failed to connect repository: error running command=restic snapshots --repo=s3:https://minio.example.com:10000/velero-bucket/restic/test-ceph-rbd --password-file=/tmp/credentials/velero-ovh/velero-repo-credentials-repository-password --cache-dir=/scratch/.cache/restic --latest=1 --insecure-tls=true, stdout=, stderr=Fatal: unable to open config file: Stat: The Access Key Id you provided does not exist in our records.\nIs there a repository at the following location?\ns3:https://minio.example.com:10000/velero-bucket/restic/test-ceph-rbd\n: exit status 1",
  "level": "error",
  "logSource": "pkg/backup/backup.go:435",
  "msg": "Error backing up item",
  "name": "test-585dbb4d95-zh4gb",
  "time": "2023-07-12T08:27:18Z"
}
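
To make the symptom concrete: the backup test-12 belongs to the velero-syno installation, yet the uploader was started with the repository password file from velero-ovh. Below is a minimal sketch of the kind of namespace-scoped lookup one would expect when selecting node-agent pods, assuming a controller-runtime client; the helper name and the name=node-agent label selector are illustrative, not Velero's actual code:

package podvolume

import (
    "context"

    corev1 "k8s.io/api/core/v1"
    "sigs.k8s.io/controller-runtime/pkg/client"
)

// nodeAgentPodsInNamespace lists node-agent pods only in the namespace of
// the Velero installation that owns the backup, so a second installation's
// node agents (and, by extension, its repository credentials) are never
// picked up by mistake.
func nodeAgentPodsInNamespace(ctx context.Context, c client.Client, veleroNamespace string) ([]corev1.Pod, error) {
    podList := &corev1.PodList{}
    if err := c.List(ctx, podList,
        client.InNamespace(veleroNamespace),         // stay within this installation's namespace
        client.MatchingLabels{"name": "node-agent"}, // node-agent daemonset pods (label is an assumption)
    ); err != nil {
        return nil, err
    }
    return podList.Items, nil
}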

What did you expect to happen:
Restic backups should have worked properly in both Velero instances, without credential errors.
The following information will help us better understand what's going on:

If you are using velero v1.7.0+:
Please use velero debug --backup <backupname> --restore <restorename> to generate the support bundle and attach it to this issue. For more options, please refer to velero debug --help.

If you are using earlier versions:
Please provide the output of the following commands (pasting long output into a GitHub gist or other pastebin is fine):

  • kubectl logs deployment/velero -n velero
  • velero backup describe <backupname> or kubectl get backup/<backupname> -n velero -o yaml
  • velero backup logs <backupname>
  • velero restore describe <restorename> or kubectl get restore/<restorename> -n velero -o yaml
  • velero restore logs <restorename>

Anything else you would like to add:

Environment:

  • Velero version (use velero version):
  • Velero features (use velero client config get features):
  • Kubernetes version (use kubectl version):
  • Kubernetes installer & version:
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release):

Vote on this issue!

This is an invitation to the Velero community to vote on issues; you can see the project's top-voted issues listed here.
Use the "reaction smiley face" at the top right of this comment to vote.

  • 👍 for "I would like to see this bug fixed as soon as possible"
  • 👎 for "There are more important bugs to focus on right now"