failed to take snapshot of the volume -- failed with error: error parsing volume id -- should at least contain two # #8122
Comments
I'm a little confused about your scenario. For the error, I think there is a limitation on the format of the shared file ID in the Azure File CSI code.
Seems Azure thinks that if a PV is provisioned by the Azure File CSI driver, its volume ID should contain at least two #.
Thank you for your response. The PV is provisioned statically using a manifest with a specific name, and the PVC then references that PV.
Thank you for the link to the Azure File CSI code which shows the limitation on the format of the volume handle. Could you please let me know your thoughts on what I could do in this case?
Due to the Azure File CSI snapshotter limitation, and since the volume was not created the CSI way, I think we cannot back up the volume the CSI way.
Thank you for the update. Yes, I plan to test the filesystem backup method for the problematic volumes. I plan to annotate the deployments using the said volumes with the below, so that filesystem backup is done for those only:
Please note that I annotated the pods with the below annotation:
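For reference, a minimal sketch of that opt-in annotation on a Deployment's pod template, assuming a placeholder pod volume name "data" (the real names would be the pod volumes backed by the problematic PVs):

spec:
  template:
    metadata:
      annotations:
        # opt the listed pod volumes in to Velero's filesystem backup (kopia/restic);
        # "data" is a placeholder for the actual volume name(s), comma-separated
        backup.velero.io/backup-volumes: data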
On running a fresh backup, I see that it completed with the status "PartiallyFailed" and have the below errors:
Not sure I understand this error. Could you please let me know your thoughts on the same?
@sivarama-p-raju Are you running the node agent? You told velero to use fs-backup for those pods, but if you're using fs-backup with kopia (or restic), then you need to run the node agent daemonset. From the error message, either the node agent isn't running at all, or it's not running on the nodes with your pods for some reason.
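If Velero is installed with the Helm chart, the node agent daemonset can be enabled via the chart values, as in the config shared later in this thread; a minimal sketch:

deployNodeAgent: true   # deploys the node-agent daemonset that fs-backup (kopia/restic) requires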
@sseago Thank you for your reply. We did not have the node agent running on this particular cluster. I enabled deployment of the node agent and triggered a new backup after this. There are new errors this time:
On searching more on this, I found this issue and tried doing the same. The backups now complete successfully. However, I notice that there are still some errors:
I need your help with the below queries:
Thanks a lot in advance.
@sseago @blackpiglet
@sivarama-p-raju Kopia must be able to write to $HOME -- in the usual default configurations this should be possible. Is the root filesystem mounted read-only?
@sivarama-p-raju In any case, here is the fix:
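A minimal sketch of that kind of fix in the Helm chart values, keeping the root filesystem read-only while giving kopia writable emptyDir mounts for its repository config and cache (the paths mirror the config further below):

containerSecurityContext:
  readOnlyRootFilesystem: true
extraVolumes:
  - name: udmrepo
    emptyDir: {}
  - name: cache
    emptyDir: {}
extraVolumeMounts:
  - name: udmrepo
    mountPath: /udmrepo   # writable path for kopia's unified repo config
  - name: cache
    mountPath: /.cache    # writable path for kopia's cache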
et voilà. @sivarama-p-raju I am not sure you understand, but there is a reason why everyone is trying to run your pods with a read-only root filesystem and a restricted security context.
@bernardgut Thank you for the provided details. As @sseago mentioned, it should work in the usual default configurations. However, I did not get a chance to work on fixing this, and I removed the use of Kopia completely in our AKS clusters. We still have a couple of volumes that are provisioned the way I described earlier (not dynamically provisioned), and we are still trying to figure out a way to back those up. I will close this issue for now and try out the fix you mentioned.
@sivarama-p-raju np. Also, here is my current config, in case it helps you or anyone else who needs to run velero without any privileges successfully:
podSecurityContext:
runAsUser: 1000
runAsGroup: 1000
runAsNonRoot: true
seccompProfile:
type: RuntimeDefault
containerSecurityContext:
allowPrivilegeEscalation: false
capabilities:
drop:
- ALL
readOnlyRootFilesystem: true
# add extra volumes and volume mounts for running with
# read-only root filesystem:
extraVolumes:
- emptyDir: {}
name: udmrepo
- emptyDir: {}
name: cache
extraVolumeMounts:
- mountPath: /udmrepo
name: udmrepo
- mountPath: /.cache
name: cache
kubectl:
containerSecurityContext:
allowPrivilegeEscalation: false
capabilities:
drop:
- ALL
readOnlyRootFilesystem: true
initContainers:
- name: velero-plugin-for-aws
image: velero/velero-plugin-for-aws:v1.11.0
volumeMounts:
- mountPath: /target
name: plugins
securityContext:
allowPrivilegeEscalation: false
capabilities:
drop:
- ALL
readOnlyRootFilesystem: true
configuration:
features: EnableCSI,EnableAPIGroupVersions
backupStorageLocation:
- name: s3.REDACTEDDOMAIN
provider: aws
bucket: p0-backup
default: true
config:
region: minio
s3ForcePathStyle: "true"
s3Url: https://s3.REDACTEDDOMAIN
volumeSnapshotLocation:
- name: s3.REDACTEDDOMAIN
provider: aws
config:
region: minio
deployNodeAgent: true
nodeAgent:
priorityClassName: system-node-critical
podSecurityContext:
runAsUser: 1000
runAsGroup: 1000
fsGroup: 0
runAsNonRoot: true
seccompProfile:
type: RuntimeDefault
containerSecurityContext:
allowPrivilegeEscalation: false
capabilities:
drop:
- ALL
extraArgs:
- --node-agent-configmap=concurrency-config
concurrency-config ConfigMap content:
{
  "loadConcurrency": {
    "globalConfig": 2
  }
}
install procedure:
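For example, the concurrency-config ConfigMap referenced by --node-agent-configmap above could be created from that JSON roughly like this (a sketch; the namespace and data key name are assumptions):

apiVersion: v1
kind: ConfigMap
metadata:
  name: concurrency-config
  namespace: velero        # assumes the default velero install namespace
data:
  # the key name is illustrative; the value is the JSON shown above
  node-agent-config.json: |
    {
      "loadConcurrency": {
        "globalConfig": 2
      }
    }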
I haven't found a way to run the node-agent without privileges. Right now they bypass the Kubernetes API, and therefore security (RBAC), entirely by mounting a hostPath volume. If I have time I might create an issue, as I think this is a huge security risk, but right now I am too busy. Cheers.
What steps did you take and what happened:
When a normal scheduled backup is run, the backup completes with state "PartiallyFailed". On reviewing the description of the backup, the below errors were found repeating many times:
The volume in question "-main-dev" uses the storage class "azurefile-csi", but is not a dynamically provisioned volume.
There are other volumes using the same storage class that are dynamically provisioned; the volume handle of those volumes contains at least two #, so the requirement seems to be met.
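For comparison, a dynamically provisioned azurefile-csi volume typically gets a volume handle of the form {resource-group}#{storage-account}#{file-share-name}, which is where the "at least two #" check comes from; a sketch of a statically defined PV with that handle format, using placeholder values:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-azurefile-pv               # placeholder name
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  storageClassName: azurefile-csi
  csi:
    driver: file.csi.azure.com
    # placeholder handle: resource group, storage account and file share,
    # separated by '#' so that it contains at least two '#'
    volumeHandle: my-resource-group#mystorageaccount#myfileshare
    volumeAttributes:
      shareName: myfileshare                # placeholder share name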
Is this really a hard requirement? And is this requirement only applicable to volumes using the "azurefile-csi" storage class?
What did you expect to happen:
Expected the backup to complete successfully without these errors.
Anything else you would like to add:
This is on an AKS cluster running Kubernetes v1.29.4. We have a similar issue on multiple AKS clusters.
Environment:
- Velero version (use velero version): v1.14.0
- Velero features (use velero client config get features):
- Kubernetes version (use kubectl version): v1.29.4
- OS (e.g. from /etc/os-release): NA