Impossible to restore PVC using CSI data mover on OKD cluster #8178

Closed
vincmarz opened this issue Sep 2, 2024 · 14 comments

vincmarz commented Sep 2, 2024

What steps did you take and what happened:

We are using Velero 1.14.1 on OKD with the data mover feature, but our restores partially fail:

$ velero restore describe nginx-dev-2024-09-02-restore --details
Name:         nginx-dev-2024-09-02-restore
Namespace:    openshift-adp
Labels:       <none>
Annotations:  <none>

Phase:                       PartiallyFailed (run 'velero restore logs nginx-dev-2024-09-02-restore' for more information)
Total items to be restored:  43
Items restored:              43

Started:    2024-09-02 10:49:44 +0200 CEST
Completed:  2024-09-02 14:59:49 +0200 CEST

Warnings:
  Velero:     <none>
  Cluster:  could not restore, CustomResourceDefinition "volumesnapshots.snapshot.storage.k8s.io" already exists. Warning: the in-cluster version is different than the backed-up version
  Namespaces:
    nginx-dev:  could not restore, ConfigMap "kube-root-ca.crt" already exists. Warning: the in-cluster version is different than the backed-up version
                could not restore, ConfigMap "openshift-service-ca.crt" already exists. Warning: the in-cluster version is different than the backed-up version
                could not restore, RoleBinding "system:deployers" already exists. Warning: the in-cluster version is different than the backed-up version
                could not restore, RoleBinding "system:image-builders" already exists. Warning: the in-cluster version is different than the backed-up version
                could not restore, RoleBinding "system:image-pullers" already exists. Warning: the in-cluster version is different than the backed-up version
                could not restore, RoleBinding "admin" already exists. Warning: the in-cluster version is different than the backed-up version
                could not restore, RoleBinding "system:deployers" already exists. Warning: the in-cluster version is different than the backed-up version
                could not restore, RoleBinding "system:image-builders" already exists. Warning: the in-cluster version is different than the backed-up version
                could not restore, RoleBinding "system:image-pullers" already exists. Warning: the in-cluster version is different than the backed-up version
                could not restore, RoleBinding "system:openshift:scc:anyuid" already exists. Warning: the in-cluster version is different than the backed-up version

Errors:
  Velero:     <none>
  Cluster:    <none>
  Namespaces:
    nginx-dev:  fail to patch dynamic PV, err: context deadline exceeded, PVC: nginx-pvc, PV: pvc-a0074f98-1d0b-47bd-a794-267b3bc510b9

Backup:  book-dev-test-2024-09-02-datamove

Namespaces:
  Included:  all namespaces found in the backup
  Excluded:  <none>

Resources:
  Included:        *
  Excluded:        nodes, events, events.events.k8s.io, backups.velero.io, restores.velero.io, resticrepositories.velero.io, csinodes.storage.k8s.io, volumeattachments.storage.k8s.io, backuprepositories.velero.io
  Cluster-scoped:  auto

Namespace mappings:  <none>

Label selector:  <none>

Or label selector:  <none>

Restore PVs:  auto

CSI Snapshot Restores:
  nginx-dev/nginx-pvc:
    Data Movement:
      Operation ID: dd-a27e631f-76f0-4761-9834-61d13ea30280.a0074f98-1d0b-47b4fc259
      Data Mover: velero
      Uploader Type: kopia

Existing Resource Policy:   <none>
ItemOperationTimeout:       4h0m0s

Preserve Service NodePorts:  auto

Uploader config:

Restore Item Operations:
  Operation for persistentvolumeclaims nginx-dev/nginx-pvc:
    Restore Item Action Plugin:  velero.io/csi-pvc-restorer
    Operation ID:                dd-a27e631f-76f0-4761-9834-61d13ea30280.a0074f98-1d0b-47b4fc259
    Phase:                       Failed
    Operation Error:             Asynchronous action timed out
    Progress description:        Accepted
    Created:                     2024-09-02 10:49:48 +0200 CEST

HooksAttempted:   0
HooksFailed:      0

What did you expect to happen:

Restore to complete successfully.

The following information will help us better understand what's going on:

If you are using velero v1.7.0+:
Please use velero debug --backup <backupname> --restore <restorename> to generate the support bundle and attach it to this issue; for more options, refer to velero debug --help

If you are using earlier versions:
Please provide the output of the following commands (Pasting long output into a GitHub gist or other pastebin is fine.)

  • kubectl logs deployment/velero -n velero
  • velero backup describe <backupname> or kubectl get backup/<backupname> -n velero -o yaml
  • velero backup logs <backupname>
  • velero restore describe <restorename> or kubectl get restore/<restorename> -n velero -o yaml
  • velero restore logs <restorename>

Anything else you would like to add:

Environment:

  • Velero version (use velero version): 1.14.1
  • Velero features (use velero client config get features): features: EnableCSI
  • Kubernetes version (use kubectl version): v1.28.7+6e2789b
  • Kubernetes installer & version: OpenShift OKD 4.15.0-0.okd-2024-03-10-010116
  • Cloud provider or hardware configuration: Microsoft Hyper-V
  • OS (e.g. from /etc/os-release):
    NAME="Fedora Linux"
    VERSION="39.20240210.3.0 (CoreOS)"
    ID=fedora
    VERSION_ID=39
    VERSION_CODENAME=""
    PLATFORM_ID="platform:f39"
    PRETTY_NAME="Fedora CoreOS 39.20240210.3.0"
    ANSI_COLOR="0;38;2;60;110;180"
    LOGO=fedora-logo-icon
    CPE_NAME="cpe:/o:fedoraproject:fedora:39"
    HOME_URL="https://getfedora.org/coreos/"
    DOCUMENTATION_URL="https://docs.fedoraproject.org/en-US/fedora-coreos/"
    SUPPORT_URL="https://github.com/coreos/fedora-coreos-tracker/"
    BUG_REPORT_URL="https://github.com/coreos/fedora-coreos-tracker/"
    REDHAT_BUGZILLA_PRODUCT="Fedora"
    REDHAT_BUGZILLA_PRODUCT_VERSION=39
    REDHAT_SUPPORT_PRODUCT="Fedora"
    REDHAT_SUPPORT_PRODUCT_VERSION=39
    SUPPORT_END=2024-11-12
    VARIANT="CoreOS"
    VARIANT_ID=coreos
    OSTREE_VERSION='39.20240210.3.0'

Lyndon-Li (Contributor) commented:

Could you describe the restored PVC and PV? It looks like they are not in a Bound state.
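That bind-state check can be sketched as follows; the live commands are shown as comments, and the filter below runs against sample output shaped like `oc get pvc -A` (names taken from this issue):

```shell
# Minimal sketch: flag PVCs that are not Bound. On a live cluster you would run:
#   oc get pvc -A        # then: oc -n <ns> describe pvc <name>
#   oc get pv            # and:  oc describe pv <volume>
# Here the same filter runs on sample output shaped like `oc get pvc -A`.
check_unbound() {
  awk 'NR > 1 && $3 != "Bound" {print $1 "/" $2 " is " $3}'
}

check_unbound <<'EOF'
NAMESPACE       NAME                                          STATUS    VOLUME
nginx-dev       nginx-pvc                                     Pending
openshift-adp   nginx-dev-2024-09-02-datamove-restore-fhjgf   Bound     pvc-8775feb4
EOF
```

On this cluster it would flag nginx-dev/nginx-pvc, matching the Pending state shown later in the thread.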


vincmarz commented Sep 3, 2024

Hi! Thanks for your reply.
I retried, and I'll describe my procedure step by step.
While the restore is in progress, I see:

1. Restore

$ velero restore get
NAME BACKUP STATUS STARTED COMPLETED ERRORS WARNINGS CREATED SELECTOR
nginx-dev-2024-09-02-datamove-restore nginx-dev-2024-09-02-datamove WaitingForPluginOperations 2024-09-03 09:35:24 +0200 CEST 0 10 2024-09-03 09:35:24 +0200 CEST

2. List of PVC

$ oc get pvc -A
NAMESPACE NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
nginx-dev nginx-pvc Pending px-csi-db 87m
openshift-adp nginx-dev-2024-09-02-datamove-restore-fhjgf Bound pvc-8775feb4-61fe-4496-b14d-10f79da07fd4 1Gi RWO px-csi-db 87m

3. Kubernetes events

$ oc -n nginx-dev get ev
LAST SEEN TYPE REASON OBJECT MESSAGE
55m Warning FailedScheduling pod/nginx-deployment-76484dcb9d-2g2cw 0/10 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/10 nodes are available: 10 Preemption is not helpful for scheduling..
4m51s Warning FailedScheduling pod/nginx-deployment-76484dcb9d-2g2cw 0/10 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/10 nodes are available: 10 Preemption is not helpful for scheduling..
55m Warning ProvisioningFailed persistentvolumeclaim/nginx-pvc Error saving claim: Operation cannot be fulfilled on persistentvolumeclaims "nginx-pvc": the object has been modified; please apply your changes to the latest version and try again
2m47s Normal Provisioning persistentvolumeclaim/nginx-pvc External provisioner is provisioning volume for claim "nginx-dev/nginx-pvc"
32m Warning ProvisioningFailed persistentvolumeclaim/nginx-pvc failed to provision volume with StorageClass "px-csi-db": claim Selector is not supported
18s Normal ExternalProvisioning persistentvolumeclaim/nginx-pvc Waiting for a volume to be created either by the external provisioner 'pxd.portworx.com' or manually by the system administrator. If volume creation is delayed, please verify that the provisioner is running and correctly registered.

4. PVC details

$ oc -n nginx-dev describe pvc nginx-pvc
Name: nginx-pvc
Namespace: nginx-dev
StorageClass: px-csi-db
Status: Pending
Volume:
Labels: velero.io/backup-name=nginx-dev-2024-09-02-datamove
velero.io/restore-name=nginx-dev-2024-09-02-datamove-restore
velero.io/volume-snapshot-name=velero-nginx-pvc-4xlv8
Annotations: backup.velero.io/must-include-additional-items: true
velero.io/csi-volumesnapshot-class: vsnapclasspxd
volume.beta.kubernetes.io/storage-provisioner: pxd.portworx.com
volume.kubernetes.io/storage-provisioner: pxd.portworx.com
Finalizers: [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode: Filesystem
Used By: nginx-deployment-76484dcb9d-2g2cw
Events:
Type Reason Age From Message

Warning ProvisioningFailed 56m persistentvolume-controller Error saving claim: Operation cannot be fulfilled on persistentvolumeclaims "nginx-pvc": the object has been modified; please apply your changes to the latest version and try again
Warning ProvisioningFailed 33m (x14 over 56m) pxd.portworx.com_px-csi-ext-5bf5fb4cdb-wb5cj_172f4298-15f8-4dff-9e07-1dbd6fd9e692 failed to provision volume with StorageClass "px-csi-db": claim Selector is not supported
Normal Provisioning 3m30s (x22 over 56m) pxd.portworx.com_px-csi-ext-5bf5fb4cdb-wb5cj_172f4298-15f8-4dff-9e07-1dbd6fd9e692 External provisioner is provisioning volume for claim "nginx-dev/nginx-pvc"
Normal ExternalProvisioning 61s (x227 over 56m) persistentvolume-controller Waiting for a volume to be created either by the external provisioner 'pxd.portworx.com' or manually by the system administrator. If volume creation is delayed, please verify that the provisioner is running and correctly registered.
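The ProvisioningFailed message above ("claim Selector is not supported") suggests the restored PVC carries a spec.selector, which the dynamic provisioner refuses. A hedged way to verify: the live jsonpath command is in the comment, and the manifest below is a hypothetical sample built only to illustrate the check.

```shell
# On a live cluster (assumption: the restored claim keeps a selector):
#   oc -n nginx-dev get pvc nginx-pvc -o jsonpath='{.spec.selector}'
# Offline illustration against a hypothetical manifest:
has_selector() {
  grep -c 'matchLabels' "$1"   # prints the number of matchLabels lines
}

cat > /tmp/nginx-pvc-sample.yaml <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nginx-pvc
spec:
  selector:
    matchLabels:
      velero.io/backup-name: nginx-dev-2024-09-02-datamove
  storageClassName: px-csi-db
EOF

has_selector /tmp/nginx-pvc-sample.yaml   # prints 1 when a selector is present
```

A non-empty jsonpath result on the live claim would explain why the external provisioner keeps rejecting it.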

5. PV details

$ oc describe pv pvc-8775feb4-61fe-4496-b14d-10f79da07fd4
Name: pvc-8775feb4-61fe-4496-b14d-10f79da07fd4
Labels:
Annotations: pv.kubernetes.io/provisioned-by: pxd.portworx.com
volume.kubernetes.io/provisioner-deletion-secret-name:
volume.kubernetes.io/provisioner-deletion-secret-namespace:
Finalizers: [kubernetes.io/pv-protection]
StorageClass: px-csi-db
Status: Bound
Claim: openshift-adp/nginx-dev-2024-09-02-datamove-restore-fhjgf
Reclaim Policy: Delete
Access Modes: RWO
VolumeMode: Filesystem
Capacity: 1Gi
Node Affinity:
Message:
Source:
Type: CSI (a Container Storage Interface (CSI) volume source)
Driver: pxd.portworx.com
FSType: ext4
VolumeHandle: 1075204168189915209
ReadOnly: false
VolumeAttributes: attached=ATTACH_STATE_INTERNAL_SWITCH
error=
parent=
readonly=false
secure=false
shared=false
sharedv4=false
state=VOLUME_STATE_DETACHED
storage.kubernetes.io/csiProvisionerIdentity=1724280818749-2702-pxd.portworx.com
Events:

6. Restore PartiallyFailed

After 4 hours I get:

$ velero restore describe nginx-dev-2024-09-02-datamove-restore --details
Name: nginx-dev-2024-09-02-datamove-restore
Namespace: openshift-adp
Labels:
Annotations:

Phase: PartiallyFailed (run 'velero restore logs nginx-dev-2024-09-02-datamove-restore' for more information)
Total items to be restored: 42
Items restored: 42

Started: 2024-09-03 09:35:24 +0200 CEST
Completed: 2024-09-03 13:45:35 +0200 CEST

Warnings:
Velero:
Cluster:
Namespaces:
nginx-dev: could not restore, ConfigMap "kube-root-ca.crt" already exists. Warning: the in-cluster version is different than the backed-up version
could not restore, ConfigMap "openshift-service-ca.crt" already exists. Warning: the in-cluster version is different than the backed-up version
could not restore, RoleBinding "system:deployers" already exists. Warning: the in-cluster version is different than the backed-up version
could not restore, RoleBinding "system:image-builders" already exists. Warning: the in-cluster version is different than the backed-up version
could not restore, RoleBinding "system:image-pullers" already exists. Warning: the in-cluster version is different than the backed-up version
could not restore, RoleBinding "admin" already exists. Warning: the in-cluster version is different than the backed-up version
could not restore, RoleBinding "system:deployers" already exists. Warning: the in-cluster version is different than the backed-up version
could not restore, RoleBinding "system:image-builders" already exists. Warning: the in-cluster version is different than the backed-up version
could not restore, RoleBinding "system:image-pullers" already exists. Warning: the in-cluster version is different than the backed-up version
could not restore, RoleBinding "system:openshift:scc:anyuid" already exists. Warning: the in-cluster version is different than the backed-up version

Errors:
Velero:
Cluster:
Namespaces:
nginx-dev: fail to patch dynamic PV, err: context deadline exceeded, PVC: nginx-pvc, PV: pvc-ab8333bf-3f92-4685-bf4d-24234abca090

Backup: nginx-dev-2024-09-02-datamove

Namespaces:
Included: all namespaces found in the backup
Excluded:

Resources:
Included: *
Excluded: nodes, events, events.events.k8s.io, backups.velero.io, restores.velero.io, resticrepositories.velero.io, csinodes.storage.k8s.io, volumeattachments.storage.k8s.io, backuprepositories.velero.io
Cluster-scoped: auto

Namespace mappings:

Label selector:

Or label selector:

Restore PVs: auto

CSI Snapshot Restores:
nginx-dev/nginx-pvc:
Data Movement:
Operation ID: dd-7d42f4bd-971e-406d-bfd6-a1159da0a98e.ab8333bf-3f92-4689ca2a1
Data Mover: velero
Uploader Type: kopia

Existing Resource Policy:
ItemOperationTimeout: 4h0m0s

Preserve Service NodePorts: auto

Uploader config:

Restore Item Operations:
Operation for persistentvolumeclaims nginx-dev/nginx-pvc:
Restore Item Action Plugin: velero.io/csi-pvc-restorer
Operation ID: dd-7d42f4bd-971e-406d-bfd6-a1159da0a98e.ab8333bf-3f92-4689ca2a1
Phase: Failed
Operation Error: Asynchronous action timed out
Progress description: Accepted
Created: 2024-09-03 09:35:28 +0200 CEST

HooksAttempted: 0
HooksFailed: 0

Resource List:
apps/v1/Deployment:
- nginx-dev/nginx-deployment(created)
apps/v1/ReplicaSet:
- nginx-dev/nginx-deployment-76484dcb9d(created)
authorization.openshift.io/v1/RoleBinding:
- nginx-dev/admin(created)
- nginx-dev/system:deployers(failed)
- nginx-dev/system:image-builders(failed)
- nginx-dev/system:image-pullers(failed)
- nginx-dev/system:openshift:scc:anyuid(created)
discovery.k8s.io/v1/EndpointSlice:
- nginx-dev/nginx-4pxlb(created)
rbac.authorization.k8s.io/v1/RoleBinding:
- nginx-dev/admin(failed)
- nginx-dev/system:deployers(failed)
- nginx-dev/system:image-builders(failed)
- nginx-dev/system:image-pullers(failed)
- nginx-dev/system:openshift:scc:anyuid(failed)
route.openshift.io/v1/Route:
- nginx-dev/nginx(created)
v1/ConfigMap:
- nginx-dev/kube-root-ca.crt(failed)
- nginx-dev/openshift-service-ca.crt(failed)
v1/Endpoints:
- nginx-dev/nginx(created)
v1/Namespace:
- nginx-dev(created)
v1/PersistentVolume:
- pvc-ab8333bf-3f92-4685-bf4d-24234abca090(skipped)
v1/PersistentVolumeClaim:
- nginx-dev/nginx-pvc(created)
v1/Pod:
- nginx-dev/nginx-deployment-76484dcb9d-2g2cw(created)
v1/Secret:
- nginx-dev/builder-dockercfg-55s2p(created)
- nginx-dev/builder-dockercfg-cqcc5(created)
- nginx-dev/builder-dockercfg-r8gv5(created)
- nginx-dev/builder-dockercfg-rx2qd(created)
- nginx-dev/builder-token-5qq56(skipped)
- nginx-dev/default-dockercfg-2clrp(created)
- nginx-dev/default-dockercfg-44cts(created)
- nginx-dev/default-dockercfg-q9kzq(created)
- nginx-dev/default-token-6h7z5(skipped)
- nginx-dev/deployer-dockercfg-hbrfm(created)
- nginx-dev/deployer-dockercfg-hfg7t(created)
- nginx-dev/deployer-dockercfg-snh9f(created)
- nginx-dev/deployer-dockercfg-zmsbw(created)
- nginx-dev/deployer-token-bd7jl(skipped)
- nginx-dev/nginx-dockercfg-9t92h(created)
v1/Service:
- nginx-dev/nginx(created)
v1/ServiceAccount:
- nginx-dev/builder(updated)
- nginx-dev/default(updated)
- nginx-dev/deployer(updated)
- nginx-dev/nginx(created)
velero.io/v2alpha1/DataUpload:
- openshift-adp/nginx-dev-2024-09-02-datamove-v9rdd(skipped)

The previous PVC in the openshift-adp namespace has disappeared:

$ oc get pvc -A
NAMESPACE NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
nginx-dev nginx-pvc Pending px-csi-db 5h4m

The PV pvc-8775feb4-61fe-4496-b14d-10f79da07fd4 has also disappeared.

The PVC remains in the Pending state:

oc -n nginx-dev describe pvc nginx-pvc
Name: nginx-pvc
Namespace: nginx-dev
StorageClass: px-csi-db
Status: Pending
Volume:
Labels: velero.io/backup-name=nginx-dev-2024-09-02-datamove
velero.io/restore-name=nginx-dev-2024-09-02-datamove-restore
velero.io/volume-snapshot-name=velero-nginx-pvc-4xlv8
Annotations: backup.velero.io/must-include-additional-items: true
velero.io/csi-volumesnapshot-class: vsnapclasspxd
volume.beta.kubernetes.io/storage-provisioner: pxd.portworx.com
volume.kubernetes.io/storage-provisioner: pxd.portworx.com
Finalizers: [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode: Filesystem
Used By: nginx-deployment-76484dcb9d-2g2cw
Events:
Type Reason Age From Message

Normal Provisioning 3m37s (x89 over 5h7m) pxd.portworx.com_px-csi-ext-5bf5fb4cdb-wb5cj_172f4298-15f8-4dff-9e07-1dbd6fd9e692 External provisioner is provisioning volume for claim "nginx-dev/nginx-pvc"
Normal ExternalProvisioning 2m7s (x1252 over 5h7m) persistentvolume-controller Waiting for a volume to be created either by the external provisioner 'pxd.portworx.com' or manually by the system administrator. If volume creation is delayed, please verify that the provisioner is running and correctly registered.

Lyndon-Li (Contributor) commented:

How much data is to be restored?

Lyndon-Li (Contributor) commented:

Please share the Velero log bundle by running velero debug.


vincmarz commented Sep 3, 2024

Hi! This is my backup:
$ velero backup get
NAME STATUS ERRORS WARNINGS CREATED EXPIRES STORAGE LOCATION SELECTOR
nginx-dev-2024-09-02-datamove Completed 0 0 2024-09-03 09:14:13 +0200 CEST 29d default

velero backup describe nginx-dev-2024-09-02-datamove --details
Name: nginx-dev-2024-09-02-datamove
Namespace: openshift-adp
Labels: velero.io/storage-location=default
Annotations: velero.io/resource-timeout=10m0s
velero.io/source-cluster-k8s-gitversion=v1.28.2-3598+6e2789bbd58938-dirty
velero.io/source-cluster-k8s-major-version=1
velero.io/source-cluster-k8s-minor-version=28+

Phase: Completed

Namespaces:
Included: nginx-dev
Excluded:

Resources:
Included: *
Excluded:
Cluster-scoped: auto

Label selector:

Or label selector:

Storage Location: default

Velero-Native Snapshot PVs: true
Snapshot Move Data: true
Data Mover: velero

TTL: 720h0m0s

CSISnapshotTimeout: 10m0s
ItemOperationTimeout: 4h0m0s

Hooks:

Backup Format Version: 1.1.0

Started: 2024-09-03 09:14:13 +0200 CEST
Completed: 2024-09-03 09:14:34 +0200 CEST

Expiration: 2024-10-03 09:14:13 +0200 CEST

Total items to be backed up: 111
Items backed up: 111

Backup Item Operations:
Operation for persistentvolumeclaims nginx-dev/nginx-pvc:
Backup Item Action Plugin: velero.io/csi-pvc-backupper
Operation ID: du-15c8f52f-bf10-4b40-b26b-e01bf1bcdd3a.ab8333bf-3f92-468fe9997
Items to Update:
datauploads.velero.io openshift-adp/nginx-dev-2024-09-02-datamove-v9rdd
Phase: Completed
Progress: 617 of 617 complete (Bytes)
Progress description: Completed
Created: 2024-09-03 09:14:19 +0200 CEST
Started: 2024-09-03 09:14:29 +0200 CEST
Updated: 2024-09-03 09:14:32 +0200 CEST
Resource List:
apps/v1/Deployment:
- nginx-dev/nginx-deployment
apps/v1/ReplicaSet:
- nginx-dev/nginx-deployment-76484dcb9d
authorization.openshift.io/v1/RoleBinding:
- nginx-dev/admin
- nginx-dev/system:deployers
- nginx-dev/system:image-builders
- nginx-dev/system:image-pullers
- nginx-dev/system:openshift:scc:anyuid
discovery.k8s.io/v1/EndpointSlice:
- nginx-dev/nginx-4pxlb
rbac.authorization.k8s.io/v1/RoleBinding:
- nginx-dev/admin
- nginx-dev/system:deployers
- nginx-dev/system:image-builders
- nginx-dev/system:image-pullers
- nginx-dev/system:openshift:scc:anyuid
route.openshift.io/v1/Route:
- nginx-dev/nginx
v1/ConfigMap:
- nginx-dev/kube-root-ca.crt
- nginx-dev/openshift-service-ca.crt
v1/Endpoints:
- nginx-dev/nginx
v1/Event:
- nginx-dev/nginx-deployment-76484dcb9d-2g2cw.17f1612ffdea76a6
- nginx-dev/nginx-deployment-76484dcb9d-2g2cw.17f1a75ea05c047b
- nginx-dev/nginx-deployment-76484dcb9d-2g2cw.17f1a7610d889411
- nginx-dev/nginx-deployment-76484dcb9d-2g2cw.17f1a7610e19d318
- nginx-dev/nginx-deployment-76484dcb9d-2g2cw.17f1a7614abef33e
- nginx-dev/nginx-deployment-76484dcb9d-2g2cw.17f1a7614da5277e
- nginx-dev/nginx-deployment-76484dcb9d-2g2cw.17f1a761d6dba899
- nginx-dev/nginx-deployment-76484dcb9d-2g2cw.17f1a761dd6deb0a
- nginx-dev/nginx-deployment-76484dcb9d-2g2cw.17f1a761df0bdeda
- nginx-dev/nginx-pvc.17f16132268c095a
- nginx-dev/nginx-pvc.17f1613226922609
- nginx-dev/nginx-pvc.17f1a75ea20267e4
- nginx-dev/nginx-pvc.17f1a75ea22c0555
- nginx-dev/nginx-pvc.17f1a75ec2f3f365
- nginx-dev/velero-nginx-pvc-2qmpw.17f1a7f0f5824742
- nginx-dev/velero-nginx-pvc-2qmpw.17f1a7f0f77902ea
- nginx-dev/velero-nginx-pvc-2qmpw.17f1a7f0f7cd23ae
- nginx-dev/velero-nginx-pvc-2qmpw.17f1a7f13953f0c0
- nginx-dev/velero-nginx-pvc-2qmpw.17f1a7f1395410c8
- nginx-dev/velero-nginx-pvc-2qmpw.17f1a7f139b12188
- nginx-dev/velero-nginx-pvc-2qmpw.17f1a7f139b16710
- nginx-dev/velero-nginx-pvc-6jwg6.17f1a7c7a0c0fc56
- nginx-dev/velero-nginx-pvc-6jwg6.17f1a7c7a1ac69f0
- nginx-dev/velero-nginx-pvc-6jwg6.17f1a7c7a5827096
- nginx-dev/velero-nginx-pvc-6jwg6.17f1a7c7e3d4a20f
- nginx-dev/velero-nginx-pvc-6jwg6.17f1a7c7e3d4eb1b
- nginx-dev/velero-nginx-pvc-6jwg6.17f1a7c7e41529a9
- nginx-dev/velero-nginx-pvc-6jwg6.17f1a7c7e4155ec9
- nginx-dev/velero-nginx-pvc-6jwg6.17f1a7c7e6df82c1
- nginx-dev/velero-nginx-pvc-7bfhm.17f1aa23a2afe645
- nginx-dev/velero-nginx-pvc-7bfhm.17f1aa23a3fc29d3
- nginx-dev/velero-nginx-pvc-7bfhm.17f1aa23a47800fa
- nginx-dev/velero-nginx-pvc-7bfhm.17f1aa23e7148c4b
- nginx-dev/velero-nginx-pvc-7bfhm.17f1aa23e714b3bf
- nginx-dev/velero-nginx-pvc-7bfhm.17f1aa23e735086a
- nginx-dev/velero-nginx-pvc-7bfhm.17f1aa23e73548de
- nginx-dev/velero-nginx-pvc-7z4jn.17f1aa70b6e25bd2
- nginx-dev/velero-nginx-pvc-7z4jn.17f1aa70b9e483fc
- nginx-dev/velero-nginx-pvc-7z4jn.17f1aa70bb0ae21f
- nginx-dev/velero-nginx-pvc-7z4jn.17f1aa70fb129f78
- nginx-dev/velero-nginx-pvc-7z4jn.17f1aa70fb12d948
- nginx-dev/velero-nginx-pvc-7z4jn.17f1aa70fe99daa4
- nginx-dev/velero-nginx-pvc-d8znc.17f1a78077541df8
- nginx-dev/velero-nginx-pvc-d8znc.17f1a78079e476ac
- nginx-dev/velero-nginx-pvc-d8znc.17f1a780bc37d807
- nginx-dev/velero-nginx-pvc-d8znc.17f1a780bc380877
- nginx-dev/velero-nginx-pvc-f49rd.17f1a7b2114a6f39
- nginx-dev/velero-nginx-pvc-f49rd.17f1a7b212a2d1da
- nginx-dev/velero-nginx-pvc-f49rd.17f1a7b2acafec24
- nginx-dev/velero-nginx-pvc-f49rd.17f1a7b2acb02b6c
- nginx-dev/velero-nginx-pvc-jmrqw.17f1a7cf74068711
- nginx-dev/velero-nginx-pvc-jmrqw.17f1a7cf75d318c3
- nginx-dev/velero-nginx-pvc-jmrqw.17f1a7cf785abf05
- nginx-dev/velero-nginx-pvc-jmrqw.17f1a7cfba9f1533
- nginx-dev/velero-nginx-pvc-jmrqw.17f1a7cfba9f359f
- nginx-dev/velero-nginx-pvc-jmrqw.17f1a7cfbb2a8961
- nginx-dev/velero-nginx-pvc-jmrqw.17f1a7cfbb2ad849
- nginx-dev/velero-nginx-pvc-txpxl.17f1a82edb737c69
- nginx-dev/velero-nginx-pvc-txpxl.17f1a82edcb4db07
- nginx-dev/velero-nginx-pvc-txpxl.17f1a82f1efdb2bb
- nginx-dev/velero-nginx-pvc-txpxl.17f1a82f1efdea97
- nginx-dev/velero-nginx-pvc-txpxl.17f1a82f1f636905
- nginx-dev/velero-nginx-pvc-txpxl.17f1a82f1f63a2d5
- nginx-dev/velero-nginx-pvc-z6f9t.17f1a7fdbe580d31
- nginx-dev/velero-nginx-pvc-z6f9t.17f1a7fdc22c03aa
- nginx-dev/velero-nginx-pvc-z6f9t.17f1a7fdc3225095
- nginx-dev/velero-nginx-pvc-z6f9t.17f1a7fe074acad3
- nginx-dev/velero-nginx-pvc-z6f9t.17f1a7fe074b0cd7
- nginx-dev/velero-nginx-pvc-z6f9t.17f1a7fe07ac581b
- nginx-dev/velero-nginx-pvc-z6f9t.17f1a7fe07ac8cd7
v1/Namespace:
- nginx-dev
v1/PersistentVolume:
- pvc-ab8333bf-3f92-4685-bf4d-24234abca090
v1/PersistentVolumeClaim:
- nginx-dev/nginx-pvc
v1/Pod:
- nginx-dev/nginx-deployment-76484dcb9d-2g2cw
v1/Secret:
- nginx-dev/builder-dockercfg-55s2p
- nginx-dev/builder-dockercfg-cqcc5
- nginx-dev/builder-dockercfg-r8gv5
- nginx-dev/builder-dockercfg-rx2qd
- nginx-dev/builder-token-5qq56
- nginx-dev/default-dockercfg-2clrp
- nginx-dev/default-dockercfg-44cts
- nginx-dev/default-dockercfg-q9kzq
- nginx-dev/default-token-6h7z5
- nginx-dev/deployer-dockercfg-hbrfm
- nginx-dev/deployer-dockercfg-hfg7t
- nginx-dev/deployer-dockercfg-snh9f
- nginx-dev/deployer-dockercfg-zmsbw
- nginx-dev/deployer-token-bd7jl
- nginx-dev/nginx-dockercfg-9t92h
v1/Service:
- nginx-dev/nginx
v1/ServiceAccount:
- nginx-dev/builder
- nginx-dev/default
- nginx-dev/deployer
- nginx-dev/nginx

Backup Volumes:
Velero-Native Snapshots:

CSI Snapshots:
nginx-dev/nginx-pvc:
Data Movement:
Operation ID: du-15c8f52f-bf10-4b40-b26b-e01bf1bcdd3a.ab8333bf-3f92-468fe9997
Data Mover: velero
Uploader Type: kopia
Moved data Size (bytes): 617

Pod Volume Backups:

HooksAttempted: 0
HooksFailed: 0

Checking the object store contents in MinIO:

$ mc ls --summarize --recursive okdminio/okd-oadp-velero/kopia
[2024-09-03 09:14:30 CEST] 771B STANDARD nginx-dev/_log_20240903071430_6d08_1725347670_1725347670_1_e31b312ab7c3137b04e16d84c40629a2
[2024-09-03 09:14:32 CEST] 1.3KiB STANDARD nginx-dev/_log_20240903071431_9af2_1725347671_1725347672_1_f71e7c8d2880e324544acb29c822592f
[2024-09-03 09:14:30 CEST] 30B STANDARD nginx-dev/kopia.blobcfg
[2024-09-03 09:14:30 CEST] 1.0KiB STANDARD nginx-dev/kopia.repository
[2024-09-03 09:14:32 CEST] 4.2KiB STANDARD nginx-dev/pa594dbabea29edeff9ec798780c48033-s63e0fd2f73f3729d12c
[2024-09-03 09:14:32 CEST] 4.2KiB STANDARD nginx-dev/q31cacb7fe126bbb989b60d4bc01f1c1f-s63e0fd2f73f3729d12c
[2024-09-03 09:14:31 CEST] 4.2KiB STANDARD nginx-dev/q83600f71258357d3583a50ed10b0a053-s9fccfdd5cd56691312c
[2024-09-03 09:14:30 CEST] 4.2KiB STANDARD nginx-dev/qb967aba306e0236894cff8e5c9508e78-s02a22606d6d2c56412c
[2024-09-03 09:14:31 CEST] 143B STANDARD nginx-dev/xn0_2fab18c0367eee8c99e5efbfc10578e2-s9fccfdd5cd56691312c-c1
[2024-09-03 09:14:30 CEST] 143B STANDARD nginx-dev/xn0_324dcc1bfea5de9569bc56e6757cce53-s02a22606d6d2c56412c-c1
[2024-09-03 09:14:32 CEST] 311B STANDARD nginx-dev/xn0_d25cf318d416145514270921f9bce4f4-s63e0fd2f73f3729d12c-c1

Total Size: 21 KiB
Total Objects: 11

$ mc ls --summarize --recursive okdminio/okd-oadp-velero/backups/nginx-dev-2024-09-02-datamove/
[2024-09-03 09:14:19 CEST] 29B STANDARD nginx-dev-2024-09-02-datamove-csi-volumesnapshotclasses.json.gz
[2024-09-03 09:14:19 CEST] 29B STANDARD nginx-dev-2024-09-02-datamove-csi-volumesnapshotcontents.json.gz
[2024-09-03 09:14:19 CEST] 29B STANDARD nginx-dev-2024-09-02-datamove-csi-volumesnapshots.json.gz
[2024-09-03 09:14:33 CEST] 386B STANDARD nginx-dev-2024-09-02-datamove-itemoperations.json.gz
[2024-09-03 09:14:19 CEST] 13KiB STANDARD nginx-dev-2024-09-02-datamove-logs.gz
[2024-09-03 09:14:19 CEST] 29B STANDARD nginx-dev-2024-09-02-datamove-podvolumebackups.json.gz
[2024-09-03 09:14:19 CEST] 1.1KiB STANDARD nginx-dev-2024-09-02-datamove-resource-list.json.gz
[2024-09-03 09:14:19 CEST] 49B STANDARD nginx-dev-2024-09-02-datamove-results.gz
[2024-09-03 09:14:34 CEST] 425B STANDARD nginx-dev-2024-09-02-datamove-volumeinfo.json.gz
[2024-09-03 09:14:19 CEST] 29B STANDARD nginx-dev-2024-09-02-datamove-volumesnapshots.json.gz
[2024-09-03 09:14:34 CEST] 111KiB STANDARD nginx-dev-2024-09-02-datamove.tar.gz
[2024-09-03 09:14:34 CEST] 3.4KiB STANDARD velero-backup.json

Total Size: 129 KiB
Total Objects: 12

$ velero debug --backup nginx-dev-2024-09-02-datamove --restore nginx-dev-2024-09-02-datamove-restore
2024/09/03 16:37:04 Collecting velero resources in namespace: openshift-adp
2024/09/03 16:37:05 Collecting velero deployment logs in namespace: openshift-adp
2024/09/03 16:37:06 Collecting log and information for backup: nginx-dev-2024-09-02-datamove
2024/09/03 16:37:07 Collecting log and information for restore: nginx-dev-2024-09-02-datamove-restore
2024/09/03 16:37:07 Generated debug information bundle: /home/okdadmin/bundle-2024-09-03-16-37-04.tar.gz
bundle-2024-09-03-16-37-04.tar.gz

Lyndon-Li (Contributor) commented:

From the log, I see a DataDownload (DD) created at 2024-09-03T07:35:28Z that was not handled by any node-agent within 4 hours, so it was cancelled:

            "metadata": {
                "creationTimestamp": "2024-09-03T07:35:28Z",
                "generateName": "nginx-dev-2024-09-02-datamove-restore-",
            "status": {
                "completionTimestamp": "2024-09-03T11:35:35Z",
                "phase": "Canceled",
                "progress": {},
                "startTimestamp": "2024-09-03T11:35:35Z"
            }

It looks like this DD was never handled by any controller and finally timed out.
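A quick arithmetic check on the timestamps in the status above confirms the gap matches the configured 4h0m0s ItemOperationTimeout (a sketch; GNU date is assumed):

```shell
# Creation vs. completion time of the cancelled DataDownload (from the status above)
created="2024-09-03T07:35:28Z"
completed="2024-09-03T11:35:35Z"
gap=$(( $(date -u -d "$completed" +%s) - $(date -u -d "$created" +%s) ))
echo "$((gap / 3600))h $(( (gap % 3600) / 60 ))m $((gap % 60))s elapsed"
```

The elapsed time lands just past the 4h0m0s timeout, consistent with the Canceled phase.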

Lyndon-Li (Contributor) commented:

From the node-agent log, the restorePod never reached Running status, so the data movement never started:

time="2024-09-03T07:35:28Z" level=info msg="Accepting data download nginx-dev-2024-09-02-datamove-restore-fhjgf" controller=DataDownload logSource="pkg/controller/data_download_controller.go:667"
time="2024-09-03T07:35:28Z" level=info msg="This datadownload has been accepted by infra-03.ocp4.policlinico.org" DataDownload=nginx-dev-2024-09-02-datamove-restore-fhjgf controller=DataDownload logSource="pkg/controller/data_download_controller.go:692"
time="2024-09-03T07:35:28Z" level=info msg="Data download is accepted" controller=datadownload datadownload=openshift-adp/nginx-dev-2024-09-02-datamove-restore-fhjgf logSource="pkg/controller/data_download_controller.go:167"
time="2024-09-03T07:35:28Z" level=info msg="Target PVC is consumed" logSource="pkg/exposer/generic_restore.go:84" owner=nginx-dev-2024-09-02-datamove-restore-fhjgf selected node= source namespace=nginx-dev target PVC=nginx-pvc
time="2024-09-03T07:35:28Z" level=info msg="Restore pod is created" logSource="pkg/exposer/generic_restore.go:95" owner=nginx-dev-2024-09-02-datamove-restore-fhjgf pod name=nginx-dev-2024-09-02-datamove-restore-fhjgf source namespace=nginx-dev target PVC=nginx-pvc
time="2024-09-03T07:35:28Z" level=info msg="Restore PVC is created" logSource="pkg/exposer/generic_restore.go:108" owner=nginx-dev-2024-09-02-datamove-restore-fhjgf pvc name=nginx-dev-2024-09-02-datamove-restore-fhjgf source namespace=nginx-dev target PVC=nginx-pvc
time="2024-09-03T07:35:28Z" level=info msg="Restore is exposed" controller=datadownload datadownload=openshift-adp/nginx-dev-2024-09-02-datamove-restore-fhjgf logSource="pkg/controller/data_download_controller.go:195"

@vincmarz Could you check the status of the restorePod created during the data movement? If it is not running, describe it and see what is blocking it.
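A sketch of that check: the pod name and the openshift-adp namespace are taken from the node-agent log above, and the offline filter illustrates spotting a stuck pod in `oc get pods` style output (the second pod name is a made-up sample).

```shell
# On the live cluster (names from the node-agent log above):
#   oc -n openshift-adp get pod nginx-dev-2024-09-02-datamove-restore-fhjgf
#   oc -n openshift-adp describe pod nginx-dev-2024-09-02-datamove-restore-fhjgf
# Offline: print any pod stuck in Pending from `oc get pods` style output.
pending_pods() {
  awk '$3 == "Pending" {print $1}'
}

pending_pods <<'EOF'
nginx-dev-2024-09-02-datamove-restore-fhjgf   0/1   Pending   0   4h
node-agent-xyz12                              1/1   Running   0   9d
EOF
```

Describing a Pending restorePod typically surfaces the scheduling or PVC-binding event that blocks it.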

Lyndon-Li self-assigned this Sep 4, 2024

vincmarz commented Sep 4, 2024

Hi! This is the first time we're using CSI storage, so we're exploring the possibilities of CSI data movement. I retried and got the same result after the 4-hour timeout:

1. Pod is pending

$ oc get all -n nginx-dev
W0904 15:41:34.167338 575746 warnings.go:70] apps.openshift.io/v1 DeploymentConfig is deprecated in v4.14+, unavailable in v4.10000+
NAME READY STATUS RESTARTS AGE
pod/nginx-deployment-7754db9f48-zw2h5 0/1 Pending 0 4h13m

NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/nginx ClusterIP 172.30.252.244 80/TCP 4h13m

NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/nginx-deployment 0/1 1 0 4h13m

NAME DESIRED CURRENT READY AGE
replicaset.apps/nginx-deployment-76484dcb9d 0 0 0 4h13m
replicaset.apps/nginx-deployment-7754db9f48 1 1 0 4h13m

NAME HOST/PORT PATH SERVICES PORT TERMINATION WILDCARD
route.route.openshift.io/nginx nginx-nginx-dev.apps.ocp4.policlinico.org nginx 80 None

$ oc describe po nginx-deployment-7754db9f48-zw2h5 -n nginx-dev
Name: nginx-deployment-7754db9f48-zw2h5
Namespace: nginx-dev
Priority: 0
Node:
Labels: app=nginx
pod-template-hash=7754db9f48
velero.io/backup-name=nginx-dev-2024-09-02-datamove
velero.io/restore-name=nginx-dev-2024-09-02-datamove-restore
Annotations: k8s.ovn.org/pod-networks:
{"default":{"ip_addresses":["10.129.2.30/23"],"mac_address":"0a:58:0a:81:02:1e","gateway_ips":["10.129.2.1"],"routes":[{"dest":"10.128.0.0...
k8s.v1.cni.cncf.io/network-status:
[{
"name": "ovn-kubernetes",
"interface": "eth0",
"ips": [
"10.129.2.30"
],
"mac": "0a:58:0a:81:02:1e",
"default": true,
"dns": {}
}]
openshift.io/scc: anyuid
Status: Pending
IP:
IPs:
Controlled By: ReplicaSet/nginx-deployment-7754db9f48
Containers:
container:
Image: nginx
Port: 80/TCP
Host Port: 0/TCP
Environment:
Mounts:
/var/log/nginx from nginx-storage (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-pr2p2 (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
nginx-storage:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: nginx-pvc
ReadOnly: false
kube-api-access-pr2p2:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional:
DownwardAPI: true
ConfigMapName: openshift-service-ca.crt
ConfigMapOptional:
QoS Class: BestEffort
Node-Selectors:
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message

Warning FailedScheduling 4h17m stork 0/10 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/10 nodes are available: 10 Preemption is not helpful for scheduling..

$ oc get ev -n nginx-dev
LAST SEEN TYPE REASON OBJECT MESSAGE
22m Warning FailedScheduling pod/nginx-deployment-7754db9f48-zw2h5 0/10 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/10 nodes are available: 10 Preemption is not helpful for scheduling..
4m23s Normal Provisioning persistentvolumeclaim/nginx-pvc External provisioner is provisioning volume for claim "nginx-dev/nginx-pvc"
2m53s Normal ExternalProvisioning persistentvolumeclaim/nginx-pvc Waiting for a volume to be created either by the external provisioner 'pxd.portworx.com' or manually by the system administrator. If volume creation is delayed, please verify that the provisioner is running and correctly registered.

2. PVC is pending

$ oc get pvc -n nginx-dev
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
nginx-pvc Pending px-csi-db 4h18m

[okdadmin@puidc1vokdbast01 demo-nginx]$ oc describe pvc nginx-pvc -n nginx-dev
Name: nginx-pvc
Namespace: nginx-dev
StorageClass: px-csi-db
Status: Pending
Volume:
Labels: velero.io/backup-name=nginx-dev-2024-09-02-datamove
velero.io/restore-name=nginx-dev-2024-09-02-datamove-restore
velero.io/volume-snapshot-name=velero-nginx-pvc-bq79n
Annotations: backup.velero.io/must-include-additional-items: true
velero.io/csi-volumesnapshot-class: vsnapclasspxd
volume.beta.kubernetes.io/storage-provisioner: pxd.portworx.com
volume.kubernetes.io/storage-provisioner: pxd.portworx.com
Finalizers: [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode: Filesystem
Used By: nginx-deployment-7754db9f48-zw2h5
Events:
Type Reason Age From Message

Normal ExternalProvisioning 4m50s (x1047 over 4h19m) persistentvolume-controller Waiting for a volume to be created either by the external provisioner 'pxd.portworx.com' or manually by the system administrator. If volume creation is delayed, please verify that the provisioner is running and correctly registered.
Normal Provisioning 80s (x77 over 4h19m) pxd.portworx.com_px-csi-ext-5bf5fb4cdb-wb5cj_172f4298-15f8-4dff-9e07-1dbd6fd9e692 External provisioner is provisioning volume for claim "nginx-dev/nginx-pvc"

3. Velero status

$ oc -n openshift-adp get all
W0904 15:48:38.709766 575927 warnings.go:70] apps.openshift.io/v1 DeploymentConfig is deprecated in v4.14+, unavailable in v4.10000+
NAME READY STATUS RESTARTS AGE
pod/nginx-dev-default-kopia-9mdpf-maintain-job-1725448191165-h9ws4 0/1 Completed 0 158m
pod/nginx-dev-default-kopia-9mdpf-maintain-job-1725451791174-5dlxt 0/1 Completed 0 98m
pod/nginx-dev-default-kopia-9mdpf-maintain-job-1725455391182-wn2kv 0/1 Completed 0 38m
pod/node-agent-bcvsf 1/1 Running 0 6h18m
pod/node-agent-c2xp2 1/1 Running 0 6h18m
pod/node-agent-cffjh 1/1 Running 0 6h18m
pod/node-agent-dfph5 1/1 Running 0 6h18m
pod/node-agent-kpm9w 1/1 Running 0 6h18m
pod/node-agent-pdjst 1/1 Running 0 6h18m
pod/node-agent-v42gx 1/1 Running 0 6h18m
pod/velero-86c6c965fd-g8rvw 1/1 Running 0 6h18m

NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/node-agent 7 7 7 7 7 6h18m

NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/velero 1/1 1 1 6h18m

NAME DESIRED CURRENT READY AGE
replicaset.apps/velero-86c6c965fd 1 1 1 6h18m

NAME COMPLETIONS DURATION AGE
job.batch/nginx-dev-default-kopia-9mdpf-maintain-job-1725448191165 1/1 4s 158m
job.batch/nginx-dev-default-kopia-9mdpf-maintain-job-1725451791174 1/1 4s 98m
job.batch/nginx-dev-default-kopia-9mdpf-maintain-job-1725455391182 1/1 5s 38m

$ oc -n openshift-adp get ev
LAST SEEN TYPE REASON OBJECT MESSAGE
26m Warning FailedMount pod/nginx-dev-2024-09-02-datamove-restore-blwm8 MountVolume.MountDevice failed for volume "pvc-18f94adb-76f6-40b3-a114-e107f85bace7" : kubernetes.io/csi: attacher.MountDevice failed to create newCsiDriverClient: driver name pxd.portworx.com not found in the list of registered CSI drivers
160m Normal Scheduled pod/nginx-dev-default-kopia-9mdpf-maintain-job-1725448191165-h9ws4 Successfully assigned openshift-adp/nginx-dev-default-kopia-9mdpf-maintain-job-1725448191165-h9ws4 to worker-02.ocp4.policlinico.org
160m Normal AddedInterface pod/nginx-dev-default-kopia-9mdpf-maintain-job-1725448191165-h9ws4 Add eth0 [10.131.0.127/23] from ovn-kubernetes
160m Normal Pulled pod/nginx-dev-default-kopia-9mdpf-maintain-job-1725448191165-h9ws4 Container image "velero/velero:v1.14.1" already present on machine
160m Normal Created pod/nginx-dev-default-kopia-9mdpf-maintain-job-1725448191165-h9ws4 Created container velero-repo-maintenance-container
160m Normal Started pod/nginx-dev-default-kopia-9mdpf-maintain-job-1725448191165-h9ws4 Started container velero-repo-maintenance-container
160m Normal SuccessfulCreate job/nginx-dev-default-kopia-9mdpf-maintain-job-1725448191165 Created pod: nginx-dev-default-kopia-9mdpf-maintain-job-1725448191165-h9ws4
160m Normal Completed job/nginx-dev-default-kopia-9mdpf-maintain-job-1725448191165 Job completed
100m Normal Scheduled pod/nginx-dev-default-kopia-9mdpf-maintain-job-1725451791174-5dlxt Successfully assigned openshift-adp/nginx-dev-default-kopia-9mdpf-maintain-job-1725451791174-5dlxt to worker-02.ocp4.policlinico.org
100m Normal AddedInterface pod/nginx-dev-default-kopia-9mdpf-maintain-job-1725451791174-5dlxt Add eth0 [10.131.0.128/23] from ovn-kubernetes
100m Normal Pulled pod/nginx-dev-default-kopia-9mdpf-maintain-job-1725451791174-5dlxt Container image "velero/velero:v1.14.1" already present on machine
100m Normal Created pod/nginx-dev-default-kopia-9mdpf-maintain-job-1725451791174-5dlxt Created container velero-repo-maintenance-container
100m Normal Started pod/nginx-dev-default-kopia-9mdpf-maintain-job-1725451791174-5dlxt Started container velero-repo-maintenance-container
100m Normal SuccessfulCreate job/nginx-dev-default-kopia-9mdpf-maintain-job-1725451791174 Created pod: nginx-dev-default-kopia-9mdpf-maintain-job-1725451791174-5dlxt
100m Normal Completed job/nginx-dev-default-kopia-9mdpf-maintain-job-1725451791174 Job completed
40m Normal Scheduled pod/nginx-dev-default-kopia-9mdpf-maintain-job-1725455391182-wn2kv Successfully assigned openshift-adp/nginx-dev-default-kopia-9mdpf-maintain-job-1725455391182-wn2kv to worker-04.ocp4.policlinico.org
40m Normal AddedInterface pod/nginx-dev-default-kopia-9mdpf-maintain-job-1725455391182-wn2kv Add eth0 [10.128.4.236/23] from ovn-kubernetes
40m Normal Pulled pod/nginx-dev-default-kopia-9mdpf-maintain-job-1725455391182-wn2kv Container image "velero/velero:v1.14.1" already present on machine
40m Normal Created pod/nginx-dev-default-kopia-9mdpf-maintain-job-1725455391182-wn2kv Created container velero-repo-maintenance-container
40m Normal Started pod/nginx-dev-default-kopia-9mdpf-maintain-job-1725455391182-wn2kv Started container velero-repo-maintenance-container
40m Normal SuccessfulCreate job/nginx-dev-default-kopia-9mdpf-maintain-job-1725455391182 Created pod: nginx-dev-default-kopia-9mdpf-maintain-job-1725455391182-wn2kv
40m Normal Completed job/nginx-dev-default-kopia-9mdpf-maintain-job-1725455391182 Job completed
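The FailedMount event above ("driver name pxd.portworx.com not found in the list of registered CSI drivers") usually means the intermediate restore pod landed on a node where the Portworx kubelet plugin is not registered. Which CSI drivers each node has registered can be read from its CSINode object; below is a hedged sketch of what a storage node's CSINode looks like when the driver is present (field values are illustrative, only the presence of the entry under spec.drivers matters):

```yaml
# Illustrative output of `oc get csinode infra-03.ocp4.policlinico.org -o yaml`.
# A node whose spec.drivers list lacks pxd.portworx.com cannot mount Portworx volumes.
apiVersion: storage.k8s.io/v1
kind: CSINode
metadata:
  name: infra-03.ocp4.policlinico.org
spec:
  drivers:
  - name: pxd.portworx.com                  # present only where the Portworx plugin runs
    nodeID: infra-03.ocp4.policlinico.org   # illustrative value
    topologyKeys: null
```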

4. Velero log

bundle-2024-09-04-15-52-57.tar.gz

@Lyndon-Li
Contributor

@vincmarz
By restorePod, I do not mean the pod to be restored, but the intermediate pod created in the Velero namespace during the data mover restore. I suspect that pod never reaches the running state before the timeout. So please describe that pod.

@vincmarz
Author

vincmarz commented Sep 4, 2024

Hi! This is what happened.
When I launched the restore, a new intermediate pod was created in the openshift-adp namespace:

POD

oc -n openshift-adp get po -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-dev-2024-09-02-datamove-restore-rf4zg 0/1 ContainerCreating 0 33s worker-04.ocp4.policlinico.org

PVC

NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE VOLUMEMODE
persistentvolumeclaim/nginx-dev-2024-09-02-datamove-restore-g6cmn Bound pvc-055de3fa-b711-4b0d-aff7-e93b9b4c80aa 1Gi RWO px-csi-db 43s Filesystem

But the pod remains stuck in ContainerCreating state because the chosen node is not appropriate for the restore: it is a storageless node (in my scenario, only the infra nodes have storage).

So you can close or merge this issue, because I see there is already another issue, #8186, about a node selector that would be useful during restores with data movement.

Thanks for your support.

Best regards!
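For the backup half of data movement, Velero (since v1.13) does let you pin the data mover pods to specific nodes through the node-agent's configuration ConfigMap, typically wired up via the node-agent's `--node-agent-config` flag. A hedged sketch for a cluster like this one is below; the ConfigMap name, the data key, and the `px/enabled` label are illustrative, and at the time of this thread the setting applied to backups only — restore-side node selection is exactly what #8186 tracks:

```yaml
# Hedged sketch, assuming a custom node label marking the Portworx storage nodes.
apiVersion: v1
kind: ConfigMap
metadata:
  name: node-agent-config          # illustrative name, referenced by the node-agent
  namespace: openshift-adp
data:
  node-agent-config.json: |
    {
      "loadAffinity": [
        {
          "nodeSelector": {
            "matchLabels": {
              "px/enabled": "true"
            }
          }
        }
      ]
    }
```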

@Lyndon-Li
Contributor

@vincmarz
Could you describe more of your cluster architecture? How are the infra nodes and storageless nodes organized, and how do they differ in terms of usage? And what is the volume binding mode (i.e., WaitForFirstConsumer or Immediate) of the storage class used for your PVC/PV?

I think this adds a new use case to #8186; we want to gather more input to prioritize our tasks.

@vincmarz
Author

vincmarz commented Sep 5, 2024

Hi @Lyndon-Li, this is my cluster:

$ oc get nodes
NAME STATUS ROLES AGE VERSION
infra-01.ocp4.policlinico.org Ready worker 68d v1.28.7+6e2789b
infra-02.ocp4.policlinico.org Ready worker 68d v1.28.7+6e2789b
infra-03.ocp4.policlinico.org Ready worker 68d v1.28.7+6e2789b
master-01.ocp4.policlinico.org Ready control-plane,master 68d v1.28.7+6e2789b
master-02.ocp4.policlinico.org Ready control-plane,master 68d v1.28.7+6e2789b
master-03.ocp4.policlinico.org Ready control-plane,master 68d v1.28.7+6e2789b
worker-01.ocp4.policlinico.org Ready worker 68d v1.28.7+6e2789b
worker-02.ocp4.policlinico.org Ready worker 68d v1.28.7+6e2789b
worker-03.ocp4.policlinico.org Ready worker 68d v1.28.7+6e2789b
worker-04.ocp4.policlinico.org Ready worker 36d v1.28.7+6e2789b

We have 3 nodes in the Portworx cluster:
infra-01.ocp4.policlinico.org Ready worker 68d v1.28.7+6e2789b
infra-02.ocp4.policlinico.org Ready worker 68d v1.28.7+6e2789b
infra-03.ocp4.policlinico.org Ready worker 68d v1.28.7+6e2789b

And 4 storageless nodes:
worker-01.ocp4.policlinico.org Ready worker 68d v1.28.7+6e2789b
worker-02.ocp4.policlinico.org Ready worker 68d v1.28.7+6e2789b
worker-03.ocp4.policlinico.org Ready worker 68d v1.28.7+6e2789b
worker-04.ocp4.policlinico.org Ready worker 36d v1.28.7+6e2789b

For these nodes we use an NFS server for any storage needs.

For Portworx nodes, we use the following CSI storage class:

$ oc get sc px-csi-db -o yaml
allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    params/aggregation_level: Specifies the number of replication sets the volume
      can be aggregated from
    params/block_size: Block size
    params/docs: https://docs.portworx.com/scheduler/kubernetes/dynamic-provisioning.html
    params/fs: 'Filesystem to be laid out: none|xfs|ext4'
    params/io_profile: 'IO Profile can be used to override the I/O algorithm Portworx
      uses for the volumes: db|sequential|random|cms'
    params/journal: Flag to indicate if you want to use journal device for the volume's
      metadata. This will use the journal device that you used when installing Portworx.
      It is recommended to use a journal device to absorb PX metadata writes
    params/priority_io: 'IO Priority: low|medium|high'
    params/repl: 'Replication factor for the volume: 1|2|3'
    params/secure: 'Flag to create an encrypted volume: true|false'
    params/shared: 'Flag to create a globally shared namespace volume which can be
      used by multiple pods: true|false'
    params/sticky: Flag to create sticky volumes that cannot be deleted until the
      flag is disabled
    storageclass.kubernetes.io/is-default-class: "true"
  creationTimestamp: "2024-07-11T10:25:41Z"
  name: px-csi-db
  resourceVersion: "29055571"
  uid: 2977239a-793d-4765-b3c9-abe305f500a3
parameters:
  io_profile: db_remote
  repl: "3"
provisioner: pxd.portworx.com
reclaimPolicy: Delete
volumeBindingMode: Immediate
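For reference, `volumeBindingMode: Immediate` above means the PV is provisioned and bound as soon as the PVC is created, before the scheduler has picked a node for the consuming pod. A hedged alternative (class name illustrative) defers binding so scheduling and provisioning happen together; whether this actually steers pods onto the storage nodes depends on whether the CSI driver reports node topology:

```yaml
# Hedged sketch of the same class with deferred binding. With
# WaitForFirstConsumer the volume is not provisioned until a pod using the
# PVC is scheduled, letting the scheduler consider the driver's topology.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: px-csi-db-wffc             # illustrative name
provisioner: pxd.portworx.com
parameters:
  io_profile: db_remote
  repl: "3"
reclaimPolicy: Delete
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
```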


github-actions bot commented Nov 9, 2024

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days. If a Velero team member has requested log or more information, please provide the output of the shared commands.

@github-actions github-actions bot added the staled label Nov 9, 2024

This issue was closed because it has been stalled for 14 days with no activity.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Nov 23, 2024