Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

csiVolumeSnapshotsAttempted/Completed not correctly populated in backup status #7163

Closed
rnarenpujari opened this issue Nov 29, 2023 · 10 comments · Fixed by #7184
Closed

csiVolumeSnapshotsAttempted/Completed not correctly populated in backup status #7163

rnarenpujari opened this issue Nov 29, 2023 · 10 comments · Fixed by #7184
Assignees
Milestone

Comments

@rnarenpujari
Copy link

What steps did you take and what happened:

The backup CR's fields .status.csiVolumeSnapshotsAttempted & .status.csiVolumeSnapshotsCompleted do not get populated when the backup completes even when csi snapshots are taken (not using data mover). Seems to be working on 1.12.1 so probably a regression somewhere?

ime="2023-11-29T07:54:54Z" level=info msg="Starting Velero server main (7320bb76744bc7052d839644fcbe34eb746a0f20-dirty)" logSource="pkg/cmd/server/server.go:189"

What did you expect to happen:

Fields to have non zero values.

The following information will help us better understand what's going on:

bk-1.txt
details.txt
logs.txt

Anything else you would like to add:

NA

Environment:

  • Velero version (use velero version): commit 7320bb7
  • Velero features (use velero client config get features): EnableCSI
  • Kubernetes version (use kubectl version):
  • Kubernetes installer & version:
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release):

Vote on this issue!

This is an invitation to the Velero community to vote on issues, you can see the project's top voted issues listed here.
Use the "reaction smiley face" up to the right of this comment to vote.

  • 👍 for "I would like to see this bug fixed as soon as possible"
  • 👎 for "There are more important bugs to focus on right now"
@kaovilai

This comment was marked as outdated.

@kaovilai

This comment was marked as outdated.

@kaovilai

This comment was marked as outdated.

@kaovilai
Copy link
Contributor

kaovilai commented Dec 2, 2023

Tried to reproduce.. looks like the CSI status were added but removed in finalizing phase.

@kaovilai
Copy link
Contributor

kaovilai commented Dec 2, 2023

Regression sideeffect of #7111 which attempt to make another status update during finalize.

During finalize, there is code in CSI plugin to delete finished snapshots

	if backup.Status.Phase == velerov1api.BackupPhaseFinalizing || backup.Status.Phase == velerov1api.BackupPhaseFinalizingPartiallyFailed {
		p.Log.WithField("Backup", fmt.Sprintf("%s/%s", backup.Namespace, backup.Name)).
			WithField("BackupPhase", backup.Status.Phase).Debugf("Clean VolumeSnapshots.")
		util.DeleteVolumeSnapshot(vs, *vsc, backup, snapshotClient.SnapshotV1(), p.Log)
		return item, nil, "", nil, nil
	}

https://github.com/vmware-tanzu/velero-plugin-for-csi/blob/main/internal/backup/volumesnapshot_action.go#L104-L109

So it seems like to get reliable update of CSI snapshot status would be to track the snapshots in memory vs listing in cluster. Looking into if snapshots are only deleted upon successful CSI snapshot.

@kaovilai
Copy link
Contributor

kaovilai commented Dec 2, 2023

Might be wise to retrieve info from CSI plugin via OperationProgress in order to fix this and #7047 permanently.

One rough idea #7172

@blackpiglet
Copy link
Contributor

Agree, because the CSI plugin will delete the VolumeSnapshot after backing up the volume, depending on VolumeSnapshot to count the metric is not right.

@kaovilai
Copy link
Contributor

kaovilai commented Dec 6, 2023

If any plugin could add to the backup status that would also be cool.

blackpiglet pushed a commit to blackpiglet/velero that referenced this issue Dec 6, 2023
Update CSIVolumeSnapshotsCompleted in backup's status and the metric
during backup finlize stage according to async operations content.

Signed-off-by: Xun Jiang <[email protected]>
blackpiglet pushed a commit to blackpiglet/velero that referenced this issue Dec 6, 2023
Update CSIVolumeSnapshotsCompleted in backup's status and the metric
during backup finalize stage according to async operations content.

Signed-off-by: Xun Jiang <[email protected]>
blackpiglet pushed a commit to blackpiglet/velero that referenced this issue Dec 6, 2023
Update CSIVolumeSnapshotsCompleted in backup's status and the metric
during backup finalize stage according to async operations content.

Signed-off-by: Xun Jiang <[email protected]>
blackpiglet pushed a commit to blackpiglet/velero that referenced this issue Dec 7, 2023
Update CSIVolumeSnapshotsCompleted in backup's status and the metric
during backup finalize stage according to async operations content.

Signed-off-by: Xun Jiang <[email protected]>
@blackpiglet
Copy link
Contributor

Reopen this issue to trace the v1.12.3 back port task.

@blackpiglet blackpiglet reopened this Dec 7, 2023
blackpiglet pushed a commit to blackpiglet/velero that referenced this issue Dec 12, 2023
Update CSIVolumeSnapshotsCompleted in backup's status and the metric
during backup finalize stage according to async operations content.

Signed-off-by: Xun Jiang <[email protected]>
blackpiglet pushed a commit to blackpiglet/velero that referenced this issue Dec 12, 2023
Update CSIVolumeSnapshotsCompleted in backup's status and the metric
during backup finalize stage according to async operations content.

Signed-off-by: Xun Jiang <[email protected]>
blackpiglet pushed a commit to blackpiglet/velero that referenced this issue Dec 12, 2023
Update CSIVolumeSnapshotsCompleted in backup's status and the metric
during backup finalize stage according to async operations content.

Signed-off-by: Xun Jiang <[email protected]>
blackpiglet pushed a commit to blackpiglet/velero that referenced this issue Dec 18, 2023
Update CSIVolumeSnapshotsCompleted in backup's status and the metric
during backup finalize stage according to async operations content.

Signed-off-by: Xun Jiang <[email protected]>
@blackpiglet
Copy link
Contributor

Close because the release-1.12 cherry-pick PR already merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants