
Silence "deletion in progress" errors when deleting VRGs #1689

Draft: nirs wants to merge 3 commits into main from vrg-delete-order
Conversation

@nirs nirs commented Dec 1, 2024

Fixing the VRG delete order added error logs about "VRG deletion in progress". This change converts the errors (with stacktrace) to INFO messages.

During DRPC delete we expect to see these logs one or more times:

2024-12-02T00:29:23.243Z	INFO	controllers.DRPlacementControl	controller/drplacementcontrol_controller.go:191	Deleting DRPC in progress	{"DRPC": {"name":"subscr-deploy-rbd-busybox","namespace":"subscr-deploy-rbd-busybox"}, "rid": "d944591e-fcf5-4882-8143-2e2a8f0351fa", "reason": "secondary VRG manifestwork deletion in progress"}
2024-12-02T00:29:23.358Z	INFO	controllers.DRPlacementControl	controller/drplacementcontrol_controller.go:191	Deleting DRPC in progress	{"DRPC": {"name":"subscr-deploy-rbd-busybox","namespace":"subscr-deploy-rbd-busybox"}, "rid": "364798af-15c0-4bba-829e-ae9928fe2383", "reason": "secondary VRG manifestwork deletion in progress"}
...
2024-12-02T00:30:05.230Z	INFO	controllers.DRPlacementControl	controller/drplacementcontrol_controller.go:191	Deleting DRPC in progress	{"DRPC": {"name":"subscr-deploy-rbd-busybox","namespace":"subscr-deploy-rbd-busybox"}, "rid": "3353357f-3cf8-4a6f-bb16-8aa95f24aec1", "reason": "primary VRG manifestwork deletion in progress"}
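
For readers following along, here is a minimal sketch (not the PR's exact diff) of how the top-level handler can turn this expected condition into an INFO log plus a requeue. finalizeDRPC and rmnutil.IsOperationInProgress follow the fragments quoted in the review threads below; the surrounding function shape and the isBeingDeleted-style context are illustrative assumptions:

// Sketch only, assuming a finalizeDRPC helper that returns an
// OperationInProgress error while VRG manifestwork deletion is ongoing.
if err := r.finalizeDRPC(ctx, drpc, log); err != nil {
    // Expected condition: deletion is progressing, not failing.
    if rmnutil.IsOperationInProgress(err) {
        log.Info("Deleting DRPC in progress", "reason", err.Error())

        return ctrl.Result{Requeue: true}, nil
    }

    // Unexpected error: controller-runtime logs it with a stacktrace.
    return ctrl.Result{}, err
}

return ctrl.Result{}, nil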

@nirs nirs requested a review from BenamarMk December 1, 2024 18:26
@nirs nirs force-pushed the vrg-delete-order branch 2 times, most recently from efa29e0 to a684fe8 Compare December 1, 2024 18:34
nirs commented Dec 1, 2024

Bogus errors in delete flow

We have 18 errors: 15 for the secondary VRG:

2024-12-01T19:37:37.543Z ERROR controller/controller.go:316 Reconciler error {"controller": "drplacementcontrol", "controllerGroup": "ramendr.openshift.io", "controllerKind": "DRPlacementControl", "DRPlacementControl": {"name":"subscr-deploy-rbd-busybox","namespace":"subscr-deploy-rbd-busybox"}, "namespace": "subscr-deploy-rbd-busybox", "name": "subscr-deploy-rbd-busybox", "reconcileID": "3041f0da-e705-450c-bdd1-e159ffcecef7", "error": "secondary VRG manifestwork deletion in progress"}

And 3 errors for the primary VRG:

2024-12-01T19:38:32.151Z ERROR controller/controller.go:316 Reconciler error {"controller": "drplacementcontrol", "controllerGroup": "ramendr.openshift.io", "controllerKind": "DRPlacementControl", "DRPlacementControl": {"name":"subscr-deploy-rbd-busybox","namespace":"subscr-deploy-rbd-busybox"}, "namespace": "subscr-deploy-rbd-busybox", "name": "subscr-deploy-rbd-busybox", "reconcileID": "0f352732-07e2-40d5-bd3f-a9a97181d94a", "error": "primary VRG manifestwork deletion in progress"}

Example errors for a single workload during the e2e flow:

% grep -E 'ERROR.+subscr-deploy-rbd-busybox' hub.log | grep 'in progress'
2024-12-01T19:37:37.543Z	ERROR	controller/controller.go:316	Reconciler error	{"controller": "drplacementcontrol", "controllerGroup": "ramendr.openshift.io", "controllerKind": "DRPlacementControl", "DRPlacementControl": {"name":"subscr-deploy-rbd-busybox","namespace":"subscr-deploy-rbd-busybox"}, "namespace": "subscr-deploy-rbd-busybox", "name": "subscr-deploy-rbd-busybox", "reconcileID": "3041f0da-e705-450c-bdd1-e159ffcecef7", "error": "secondary VRG manifestwork deletion in progress"}
2024-12-01T19:37:37.563Z	ERROR	controller/controller.go:316	Reconciler error	{"controller": "drplacementcontrol", "controllerGroup": "ramendr.openshift.io", "controllerKind": "DRPlacementControl", "DRPlacementControl": {"name":"subscr-deploy-rbd-busybox","namespace":"subscr-deploy-rbd-busybox"}, "namespace": "subscr-deploy-rbd-busybox", "name": "subscr-deploy-rbd-busybox", "reconcileID": "54d47c7a-8cf9-4728-aa47-2f00c1fb8397", "error": "secondary VRG manifestwork deletion in progress"}
2024-12-01T19:37:37.577Z	ERROR	controller/controller.go:316	Reconciler error	{"controller": "drplacementcontrol", "controllerGroup": "ramendr.openshift.io", "controllerKind": "DRPlacementControl", "DRPlacementControl": {"name":"subscr-deploy-rbd-busybox","namespace":"subscr-deploy-rbd-busybox"}, "namespace": "subscr-deploy-rbd-busybox", "name": "subscr-deploy-rbd-busybox", "reconcileID": "48d03d0f-6659-4a02-81ca-0fc52b80d298", "error": "secondary VRG manifestwork deletion in progress"}
2024-12-01T19:37:37.623Z	ERROR	controller/controller.go:316	Reconciler error	{"controller": "drplacementcontrol", "controllerGroup": "ramendr.openshift.io", "controllerKind": "DRPlacementControl", "DRPlacementControl": {"name":"subscr-deploy-rbd-busybox","namespace":"subscr-deploy-rbd-busybox"}, "namespace": "subscr-deploy-rbd-busybox", "name": "subscr-deploy-rbd-busybox", "reconcileID": "22188374-06f7-4b6c-96c9-f108acd86632", "error": "secondary VRG manifestwork deletion in progress"}
2024-12-01T19:37:37.637Z	ERROR	controller/controller.go:316	Reconciler error	{"controller": "drplacementcontrol", "controllerGroup": "ramendr.openshift.io", "controllerKind": "DRPlacementControl", "DRPlacementControl": {"name":"subscr-deploy-rbd-busybox","namespace":"subscr-deploy-rbd-busybox"}, "namespace": "subscr-deploy-rbd-busybox", "name": "subscr-deploy-rbd-busybox", "reconcileID": "84a59ea7-62a2-40d5-98c5-e25e9d56279a", "error": "secondary VRG manifestwork deletion in progress"}
2024-12-01T19:37:37.681Z	ERROR	controller/controller.go:316	Reconciler error	{"controller": "drplacementcontrol", "controllerGroup": "ramendr.openshift.io", "controllerKind": "DRPlacementControl", "DRPlacementControl": {"name":"subscr-deploy-rbd-busybox","namespace":"subscr-deploy-rbd-busybox"}, "namespace": "subscr-deploy-rbd-busybox", "name": "subscr-deploy-rbd-busybox", "reconcileID": "3cca19b4-0747-4544-a3bb-ea34c9b727e5", "error": "secondary VRG manifestwork deletion in progress"}
2024-12-01T19:37:37.852Z	ERROR	controller/controller.go:316	Reconciler error	{"controller": "drplacementcontrol", "controllerGroup": "ramendr.openshift.io", "controllerKind": "DRPlacementControl", "DRPlacementControl": {"name":"subscr-deploy-rbd-busybox","namespace":"subscr-deploy-rbd-busybox"}, "namespace": "subscr-deploy-rbd-busybox", "name": "subscr-deploy-rbd-busybox", "reconcileID": "76b0b07b-69aa-40a9-a94c-f3acd574945f", "error": "secondary VRG manifestwork deletion in progress"}
2024-12-01T19:37:38.184Z	ERROR	controller/controller.go:316	Reconciler error	{"controller": "drplacementcontrol", "controllerGroup": "ramendr.openshift.io", "controllerKind": "DRPlacementControl", "DRPlacementControl": {"name":"subscr-deploy-rbd-busybox","namespace":"subscr-deploy-rbd-busybox"}, "namespace": "subscr-deploy-rbd-busybox", "name": "subscr-deploy-rbd-busybox", "reconcileID": "5e8cb110-d849-4664-9a5e-b002ed75433f", "error": "secondary VRG manifestwork deletion in progress"}
2024-12-01T19:37:38.837Z	ERROR	controller/controller.go:316	Reconciler error	{"controller": "drplacementcontrol", "controllerGroup": "ramendr.openshift.io", "controllerKind": "DRPlacementControl", "DRPlacementControl": {"name":"subscr-deploy-rbd-busybox","namespace":"subscr-deploy-rbd-busybox"}, "namespace": "subscr-deploy-rbd-busybox", "name": "subscr-deploy-rbd-busybox", "reconcileID": "c789aa29-75ad-48c5-9914-6ec6d5e93e5a", "error": "secondary VRG manifestwork deletion in progress"}
2024-12-01T19:37:40.133Z	ERROR	controller/controller.go:316	Reconciler error	{"controller": "drplacementcontrol", "controllerGroup": "ramendr.openshift.io", "controllerKind": "DRPlacementControl", "DRPlacementControl": {"name":"subscr-deploy-rbd-busybox","namespace":"subscr-deploy-rbd-busybox"}, "namespace": "subscr-deploy-rbd-busybox", "name": "subscr-deploy-rbd-busybox", "reconcileID": "60124950-d22f-4078-a0e2-385b6697e8bb", "error": "secondary VRG manifestwork deletion in progress"}
2024-12-01T19:37:42.712Z	ERROR	controller/controller.go:316	Reconciler error	{"controller": "drplacementcontrol", "controllerGroup": "ramendr.openshift.io", "controllerKind": "DRPlacementControl", "DRPlacementControl": {"name":"subscr-deploy-rbd-busybox","namespace":"subscr-deploy-rbd-busybox"}, "namespace": "subscr-deploy-rbd-busybox", "name": "subscr-deploy-rbd-busybox", "reconcileID": "ed43d9d3-25b0-4839-b6b9-773dbf0a3361", "error": "secondary VRG manifestwork deletion in progress"}
2024-12-01T19:37:47.845Z	ERROR	controller/controller.go:316	Reconciler error	{"controller": "drplacementcontrol", "controllerGroup": "ramendr.openshift.io", "controllerKind": "DRPlacementControl", "DRPlacementControl": {"name":"subscr-deploy-rbd-busybox","namespace":"subscr-deploy-rbd-busybox"}, "namespace": "subscr-deploy-rbd-busybox", "name": "subscr-deploy-rbd-busybox", "reconcileID": "f77bcf91-6d06-44af-8677-84c65ffcab94", "error": "secondary VRG manifestwork deletion in progress"}
2024-12-01T19:37:58.101Z	ERROR	controller/controller.go:316	Reconciler error	{"controller": "drplacementcontrol", "controllerGroup": "ramendr.openshift.io", "controllerKind": "DRPlacementControl", "DRPlacementControl": {"name":"subscr-deploy-rbd-busybox","namespace":"subscr-deploy-rbd-busybox"}, "namespace": "subscr-deploy-rbd-busybox", "name": "subscr-deploy-rbd-busybox", "reconcileID": "b6baa97d-a743-4eed-9a4b-4f9e44241011", "error": "secondary VRG manifestwork deletion in progress"}
2024-12-01T19:38:02.134Z	ERROR	controller/controller.go:316	Reconciler error	{"controller": "drplacementcontrol", "controllerGroup": "ramendr.openshift.io", "controllerKind": "DRPlacementControl", "DRPlacementControl": {"name":"subscr-deploy-rbd-busybox","namespace":"subscr-deploy-rbd-busybox"}, "namespace": "subscr-deploy-rbd-busybox", "name": "subscr-deploy-rbd-busybox", "reconcileID": "951f3169-cc3d-4719-9194-902e8aaf10c4", "error": "secondary VRG manifestwork deletion in progress"}
2024-12-01T19:38:18.591Z	ERROR	controller/controller.go:316	Reconciler error	{"controller": "drplacementcontrol", "controllerGroup": "ramendr.openshift.io", "controllerKind": "DRPlacementControl", "DRPlacementControl": {"name":"subscr-deploy-rbd-busybox","namespace":"subscr-deploy-rbd-busybox"}, "namespace": "subscr-deploy-rbd-busybox", "name": "subscr-deploy-rbd-busybox", "reconcileID": "d9fd5fdb-3bab-45c4-9b10-8df9421f9f6e", "error": "secondary VRG manifestwork deletion in progress"}
2024-12-01T19:38:32.151Z	ERROR	controller/controller.go:316	Reconciler error	{"controller": "drplacementcontrol", "controllerGroup": "ramendr.openshift.io", "controllerKind": "DRPlacementControl", "DRPlacementControl": {"name":"subscr-deploy-rbd-busybox","namespace":"subscr-deploy-rbd-busybox"}, "namespace": "subscr-deploy-rbd-busybox", "name": "subscr-deploy-rbd-busybox", "reconcileID": "0f352732-07e2-40d5-bd3f-a9a97181d94a", "error": "primary VRG manifestwork deletion in progress"}
2024-12-01T19:38:32.235Z	ERROR	controller/controller.go:316	Reconciler error	{"controller": "drplacementcontrol", "controllerGroup": "ramendr.openshift.io", "controllerKind": "DRPlacementControl", "DRPlacementControl": {"name":"subscr-deploy-rbd-busybox","namespace":"subscr-deploy-rbd-busybox"}, "namespace": "subscr-deploy-rbd-busybox", "name": "subscr-deploy-rbd-busybox", "reconcileID": "f331b6f2-7d28-45b0-b535-55f9c7de4c7d", "error": "primary VRG manifestwork deletion in progress"}
2024-12-01T19:38:32.531Z	ERROR	controller/controller.go:316	Reconciler error	{"controller": "drplacementcontrol", "controllerGroup": "ramendr.openshift.io", "controllerKind": "DRPlacementControl", "DRPlacementControl": {"name":"subscr-deploy-rbd-busybox","namespace":"subscr-deploy-rbd-busybox"}, "namespace": "subscr-deploy-rbd-busybox", "name": "subscr-deploy-rbd-busybox", "reconcileID": "44c5037e-1924-4d32-b052-3f87fc333520", "error": "primary VRG manifestwork deletion in progress"}

@nirs nirs requested a review from Madhu-1 December 1, 2024 20:03
@nirs nirs force-pushed the vrg-delete-order branch 2 times, most recently from 82b60e2 to ffc3b3b Compare December 1, 2024 22:57

@nirs nirs force-pushed the vrg-delete-order branch from ffc3b3b to 4bfc77f Compare December 2, 2024 00:19

nirs commented Dec 2, 2024

Example "Deleting DRPC in progress" logs:

% grep -E 'Deleting DRPC in progress.+subscr-deploy-rbd-' hub.log 
2024-12-02T00:29:23.243Z	INFO	controllers.DRPlacementControl	controller/drplacementcontrol_controller.go:191	Deleting DRPC in progress	{"DRPC": {"name":"subscr-deploy-rbd-busybox","namespace":"subscr-deploy-rbd-busybox"}, "rid": "d944591e-fcf5-4882-8143-2e2a8f0351fa", "reason": "secondary VRG manifestwork deletion in progress"}
2024-12-02T00:29:23.358Z	INFO	controllers.DRPlacementControl	controller/drplacementcontrol_controller.go:191	Deleting DRPC in progress	{"DRPC": {"name":"subscr-deploy-rbd-busybox","namespace":"subscr-deploy-rbd-busybox"}, "rid": "364798af-15c0-4bba-829e-ae9928fe2383", "reason": "secondary VRG manifestwork deletion in progress"}
2024-12-02T00:29:23.542Z	INFO	controllers.DRPlacementControl	controller/drplacementcontrol_controller.go:191	Deleting DRPC in progress	{"DRPC": {"name":"subscr-deploy-rbd-busybox","namespace":"subscr-deploy-rbd-busybox"}, "rid": "54cce07c-3416-4935-b3e0-dc9b2015d1e1", "reason": "secondary VRG manifestwork deletion in progress"}
2024-12-02T00:29:23.641Z	INFO	controllers.DRPlacementControl	controller/drplacementcontrol_controller.go:191	Deleting DRPC in progress	{"DRPC": {"name":"subscr-deploy-rbd-busybox","namespace":"subscr-deploy-rbd-busybox"}, "rid": "c6854a8d-3668-4bf2-ae3f-0afdf04a444d", "reason": "secondary VRG manifestwork deletion in progress"}
2024-12-02T00:29:23.667Z	INFO	controllers.DRPlacementControl	controller/drplacementcontrol_controller.go:191	Deleting DRPC in progress	{"DRPC": {"name":"subscr-deploy-rbd-busybox","namespace":"subscr-deploy-rbd-busybox"}, "rid": "a822bf0a-d9fa-472e-b4df-54e0a69bf605", "reason": "secondary VRG manifestwork deletion in progress"}
2024-12-02T00:29:23.848Z	INFO	controllers.DRPlacementControl	controller/drplacementcontrol_controller.go:191	Deleting DRPC in progress	{"DRPC": {"name":"subscr-deploy-rbd-busybox","namespace":"subscr-deploy-rbd-busybox"}, "rid": "94d38462-f725-407b-8045-5943af05f423", "reason": "secondary VRG manifestwork deletion in progress"}
2024-12-02T00:29:24.026Z	INFO	controllers.DRPlacementControl	controller/drplacementcontrol_controller.go:191	Deleting DRPC in progress	{"DRPC": {"name":"subscr-deploy-rbd-busybox","namespace":"subscr-deploy-rbd-busybox"}, "rid": "52ab9cdc-a3e6-4cd6-b43b-1159e9280fcf", "reason": "secondary VRG manifestwork deletion in progress"}
2024-12-02T00:29:24.369Z	INFO	controllers.DRPlacementControl	controller/drplacementcontrol_controller.go:191	Deleting DRPC in progress	{"DRPC": {"name":"subscr-deploy-rbd-busybox","namespace":"subscr-deploy-rbd-busybox"}, "rid": "08283771-99bb-4a97-8aa4-136eab0edf1c", "reason": "secondary VRG manifestwork deletion in progress"}
2024-12-02T00:29:25.149Z	INFO	controllers.DRPlacementControl	controller/drplacementcontrol_controller.go:191	Deleting DRPC in progress	{"DRPC": {"name":"subscr-deploy-rbd-busybox","namespace":"subscr-deploy-rbd-busybox"}, "rid": "7d3bc103-246d-4d09-a98a-fc9bf77c35bf", "reason": "secondary VRG manifestwork deletion in progress"}
2024-12-02T00:29:26.560Z	INFO	controllers.DRPlacementControl	controller/drplacementcontrol_controller.go:191	Deleting DRPC in progress	{"DRPC": {"name":"subscr-deploy-rbd-busybox","namespace":"subscr-deploy-rbd-busybox"}, "rid": "f6bec8be-13c7-45f4-8d8b-59e626fc2631", "reason": "secondary VRG manifestwork deletion in progress"}
2024-12-02T00:29:29.341Z	INFO	controllers.DRPlacementControl	controller/drplacementcontrol_controller.go:191	Deleting DRPC in progress	{"DRPC": {"name":"subscr-deploy-rbd-busybox","namespace":"subscr-deploy-rbd-busybox"}, "rid": "a4345a4d-47c2-4321-a2c7-e722691109a1", "reason": "secondary VRG manifestwork deletion in progress"}
2024-12-02T00:29:34.480Z	INFO	controllers.DRPlacementControl	controller/drplacementcontrol_controller.go:191	Deleting DRPC in progress	{"DRPC": {"name":"subscr-deploy-rbd-busybox","namespace":"subscr-deploy-rbd-busybox"}, "rid": "3ab27ce7-fb93-47c3-a0c0-cb939447729f", "reason": "secondary VRG manifestwork deletion in progress"}
2024-12-02T00:29:44.736Z	INFO	controllers.DRPlacementControl	controller/drplacementcontrol_controller.go:191	Deleting DRPC in progress	{"DRPC": {"name":"subscr-deploy-rbd-busybox","namespace":"subscr-deploy-rbd-busybox"}, "rid": "bfb20d7c-2de2-4588-b425-7e0da9483d5f", "reason": "secondary VRG manifestwork deletion in progress"}
2024-12-02T00:29:47.470Z	INFO	controllers.DRPlacementControl	controller/drplacementcontrol_controller.go:191	Deleting DRPC in progress	{"DRPC": {"name":"subscr-deploy-rbd-busybox","namespace":"subscr-deploy-rbd-busybox"}, "rid": "2d33dca8-11e6-4979-adfe-8486247f7b36", "reason": "primary VRG manifestwork deletion in progress"}
2024-12-02T00:29:47.764Z	INFO	controllers.DRPlacementControl	controller/drplacementcontrol_controller.go:191	Deleting DRPC in progress	{"DRPC": {"name":"subscr-deploy-rbd-busybox","namespace":"subscr-deploy-rbd-busybox"}, "rid": "667ff490-205c-4f0f-aff3-6bfa8f2c18a1", "reason": "primary VRG manifestwork deletion in progress"}
2024-12-02T00:30:05.230Z	INFO	controllers.DRPlacementControl	controller/drplacementcontrol_controller.go:191	Deleting DRPC in progress	{"DRPC": {"name":"subscr-deploy-rbd-busybox","namespace":"subscr-deploy-rbd-busybox"}, "rid": "3353357f-3cf8-4a6f-bb16-8aa95f24aec1", "reason": "primary VRG manifestwork deletion in progress"}

@nirs nirs force-pushed the vrg-delete-order branch from 4bfc77f to 9487e4e Compare December 2, 2024 01:17
internal/controller/drplacementcontrol_controller.go (review thread, outdated)
// DeleteInProgress error is returned during DRPC deletion when we need to wait and try later.
type deleteInProgress struct{ string }

func (e deleteInProgress) Error() string { return e.string }
Member:

This approach works and looks elegant, but why introduce the complexity of defining a new type? Why not simply use errors.New() similar to these? One might argue it’s for errors.Is() support, but is it necessary to check for this specific error type at line 189? Instead, consider returning a custom error directly. The reconciler will eventually retry, and the error will still be logged.

Member Author:

This approach works and looks elegant, but why introduce the complexity of defining a new type? Why not simply use errors.New() similar to these?

Using a sentinel error works when there is one special error, like io.EOF. But we have multiple reasons to requeue the request during delete:

  • adopting a vrg
  • waiting until secondary is deleted
  • waiting until primary is deleted
  • vrg count > 0 (maybe not possible now)

Using a type that matches any error of the same type, we can return multiple errors that share the same handling.

We already use this pattern in:

func RequestProcessingErrorCreate(s string) RequestProcessingError { return RequestProcessingError{s} }

I started with a local error in this package, but this can move to a shared package and be used in all controllers.
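
To make the idea concrete, here is a hedged sketch of such a type (the names and exact shape are illustrative; the PR later renames it to OperationInProgress and simplifies it to a plain string). The Is method matches on the type rather than the message, so every "wait and retry" reason can be handled by a single errors.Is check:

// Sketch only: an error type whose Is method matches any error of the
// same type, regardless of the message it carries.
type deleteInProgress struct{ reason string }

func (e deleteInProgress) Error() string { return e.reason }

// Is reports a match for any deleteInProgress target, so
// errors.Is(err, deleteInProgress{}) covers all the reasons above.
func (e deleteInProgress) Is(target error) bool {
    _, ok := target.(deleteInProgress)
    return ok
}

A caller can then return deleteInProgress{"secondary VRG manifestwork deletion in progress"} or any other reason, and all of them hit the same handling path.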

One might argue it’s for errors.Is() support, but is it necessary to check for this specific error type at line 189?

Yes, see the comment below.

Instead, consider returning a custom error directly. The reconciler will eventually retry, and the error will still be logged.

This is what the first commit is doing, using fmt.Errorf(): the result is 18 errors with stacktraces for every disable-dr operation. This is a huge amount of noise and it is incorrect to log a normal condition as an error.

Member Author:

See the errors generated by the first commit; each one has a 10-line stacktrace that I did not copy:
#1689 (comment)

Member:

This is a huge amount of noise and it is incorrect to log a normal condition as an error

Since deleteInProgress implements the error interface, it is inherently treated as an error object. This creates a bias in how it is perceived. While it may not represent an error condition for the reconcile server, within the context of Ramen, it is treated as one. Therefore, the classification as an error is intentional.

Member Author:

Using an error is just a limitation of the current code, which has a single error return value for the entire finalizeDRPC code path. We can change this, but it is a big change that will be hard to backport, and I don't think it would be better to return multiple values like we have in other places. Using a specific error type for the common case of "not ready yet" seems better.

Member Author:

This approach works and looks elegant, but why introduce the complexity of defining a new type?

Using an existing type now.

return ctrl.Result{Requeue: true}, nil
}

// Unexpected error.
return ctrl.Result{}, err
Member:

I mean just let this be returned and remove the block starting from line 189 to 195 inclusive.

Member Author:

The result will be an ERROR log with a stacktrace; we see 18 of these for every delete operation. Since this is an expected condition, it is wrong to treat this as an error.

Member:

Since this is an expected condition, it is wrong to treat this as an error.

Same as my comment above

internal/controller/drplacementcontrol_controller.go (review thread, outdated)
@nirs nirs force-pushed the vrg-delete-order branch from 9487e4e to 4ffd7ee Compare December 2, 2024 14:34
@nirs nirs marked this pull request as ready for review December 2, 2024 15:24
@nirs nirs mentioned this pull request Dec 2, 2024
@nirs nirs force-pushed the vrg-delete-order branch from 4ffd7ee to 299ec4f Compare December 2, 2024 18:38
@nirs nirs force-pushed the vrg-delete-order branch from 299ec4f to e66f588 Compare December 2, 2024 19:31
@nirs nirs requested a review from asn1809 December 2, 2024 19:33
BenamarMk (Member) left a comment:

I like the change.


// IsOperationInProgress returns true if err, or an error wrapped by it, is an OperationInProgress error.
func IsOperationInProgress(err error) bool {
    return errors.Is(err, OperationInProgress(""))
}
BenamarMk (Member) commented Dec 2, 2024:

One question, though: is this call going to propagate back to line 15? I doubt it. errors.Is(...) is a free function.


Member:

I think it does here:

if x, ok := err.(interface{ Is(error) bool }); ok && x.Is(target) {
    return true
}

All good
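
For completeness, a small hedged example (not taken from the PR's tests) of how this plays out with wrapping: errors.Is unwraps the error and calls the type's Is method, so IsOperationInProgress also matches wrapped errors, which the second commit's tests exercise. It assumes util.OperationInProgress is a string-based error type with an Is(error) bool method that matches on type, as in the earlier sketch:

base := util.OperationInProgress("secondary VRG manifestwork deletion in progress")
wrapped := fmt.Errorf("deleting DRPC: %w", base)

fmt.Println(errors.Is(wrapped, util.OperationInProgress(""))) // true
fmt.Println(util.IsOperationInProgress(wrapped))              // true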

}

// Is this an expected condition?
if rmnutil.IsOperationInProgress(err) {
Member:

Instead of passing back errors, which seems to be of limited use, and defining an error internally that needs to be honored in other error cases and also existing cases (which adds to code awareness across the functions when returning errors), I would just not return an error and instead log it for troubleshooting.

That way we are not defining an error that other parts of the code need to return. The current code requests a reconcile using a requeue or an error, which for us is the same. So this change would impact future work.

When the code was initially written there was some confusion in this regard, and errors were returned. I suggest we do not return errors, but request a requeue via the result.

Member Author:

The current solution is a local change in some functions inside the flow, and a local change in Reconcile. This should be easy to backport (and we must backport it to 4.18 since this is a regression added recently).

If we don't use the error, we need to change all the function signatures in the delete flow. We can return (requeue, error), but this does not allow code to specify a RequeueAfter, and even if we don't have code that needs it right now in this flow, we may have code that needs it in other flows or in this flow in the future, so this signature is not future proof. So it looks like the new signature should be (ctrl.Result, error).

But this does not match other controllers, and we probably want a consistent way to return results from controller functions. Do you suggest refactoring the entire ramen codebase to return (ctrl.Result, error)?
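
For reference, a generic controller-runtime sketch (not code from this PR) of ctrl.Result's two requeue options; only the first can be expressed with a plain (requeue bool, error) return:

// Requeue immediately, subject to the controller's rate limiter.
return ctrl.Result{Requeue: true}, nil

// Or requeue after a specific delay (requires importing "time").
return ctrl.Result{RequeueAfter: 10 * time.Second}, nil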

Member Author:

For reference, searching for (ctrl.Result, error) shows that only a few functions use this return value.

Results (without Reconcile functions):

  • (r DRClusterReconciler) processCreateOrUpdate
  • (r DRClusterReconciler) processDeletion
  • (r *DRClusterConfigReconciler) processDeletion
  • (r *DRClusterConfigReconciler) processCreateOrUpdate
  • (r *DRPlacementControlReconciler) reconcileDRPCInstance
  • (r *DRPolicyReconciler) reconcile

Using this signature for the deletion flow seems common.

Member:

This is a well-known and long-standing debate in software design. It's similar to classic discussions around handling conditions like "file not found": do you throw an exception or not? Is it an error/exception or simply an expected state? Similarly, in the context of a reconciler, when attempting to fetch a resource, receiving a NotFound isn't necessarily a failure but rather an expected condition. But it is still treated as an error.

In our case, the processDeletion function returns an error. If it doesn't complete fully, there's clearly a reason for it. This reason holds valuable context that can help the calling logic make informed decisions. Simply logging it without passing it back discards this context.

By returning the error, we let the caller decide whether to log, requeue, or take any other corrective action. I also believe that Result{} should not be propagated from function to function. Only the reconciler function should fill in that structure with the information returned from the helper functions.

Member:

I am stating simply that we update the result to requeue as true on errors and not propagate the Result to other functions. The reconciler is just a loop that ends with a call requesting a requeue or not; the error does not hold much value.

If we add reasons for errors and start making smart decisions we will need to use this pattern consistently, which is a burden, IMHO. If that will not be the case, then in this narrow use-case we can do what is done here, but I worry that the pattern will be enforced in the future for no particular gain.

Member Author:

I am stating simply that we update the result to requeue as true on errors and not propagate the Result to other functions.

So you suggest changing functions in the processDeletion flow to behave like this?

func deleteExample(...) bool {
    // Expected to succeed during deletion
    if err := trySomething(...); err != nil {
        log.Error(err, "failed to ...")
        return false
    }

    // Expected to return false for a while during deletion
    if !readyForNextStep(...) {
        return false
    }

    // Completed
    return true
}

So we log an INFO message in ensureDeletedVRGs if the VRG was not deleted yet, and return false.

If we add reasons for errors and start making smart decisions we will need to use this pattern consistently.

This pattern of trying to do the next step on the way to the desired state is pretty common in ramen. It could be very useful if we used it everywhere. It would make the code much clearer, and avoid all the bogus errors that we have today. This change makes it easy to adopt the pattern everywhere gradually, without changing a lot of code.

If that will not be the case then in this narrow use-case we can do what is done here, but I worry that the pattern will be enforced in the future for no particular gain.

How about using this pattern only in this controller for now? We need a way to fix the regression caused by adding the VRG on the secondary cluster. This should be backported to 4.18 soon, before it breaks QE tests in the same way it breaks our tests.

I can also split out the fix (the first commit) so we can merge it now. It will cause a lot of noise in the logs during deletion, but we can fix this in the next step, once we converge on the right way to handle the "not ready" state during reconcile.

@nirs nirs marked this pull request as draft December 9, 2024 16:50
@nirs nirs changed the title Delete secondary VRG before primary Silence "deletion in progress" errors when deleting VRGs Dec 9, 2024
nirs commented Dec 9, 2024

@BenamarMk @ShyamsundarR I split the fix out to #1709, so we can fix the failing builds quickly, and have more time to think about the best way to handle operation in progress.

nirs added 3 commits December 12, 2024 13:17
This error is used to signal that an operation is in progress, and the
caller should log the error at normal log level and try again later.
This is a common concept for reconciling, and we can use the same error
in other packages.

Changes:
- Rename the error to OperationInProgress
- Simplify to a string instead of embedding a string
- Update the tests to reflect the common usage with errors.Is()

Signed-off-by: Nir Soffer <[email protected]>
We want to use it more in ramen, to remove bogus errors from the logs.
To encourage use of this pattern and make it easier to use correctly,
add a util.IsOperationInProgress() helper.

Improve the test to use the new IsOperationInProgress() and also test a
wrapped OperationInProgress error.

Signed-off-by: Nir Soffer <[email protected]>
When deleting the DRPC we may need to adopt the VRG, delete the
secondary VRG, wait until the secondary VRG is deleted, delete the
primary VRG, and wait until the primary VRG is deleted. This takes 60-90
seconds and many reconciles (18 seen in an e2e test), and creates a huge
amount of noise in the log.

Suppress the noise using the util.OperationInProgress error. When the
reconcile is successful but deletion is still in progress, we return a
util.OperationInProgress error describing the current progress. The
top level error handler logs an INFO message and requeues the request.

With this change we will see multiple logs for the secondary VRG:

    INFO    Deleting DRPC in progress {"reason": "secondary VRG deletion in progress"}
    ...

And finally more logs for the primary VRG:

    INFO    Deleting DRPC in progress {"reason": "primary VRG deletion in progress"}
    ...

Notes:

- We logged errors during finalizeDRPC twice; once as an INFO log, and
  once as an ERROR with a stacktrace when we return an error from the
  reconcile. Remove the duplicate INFO log.

- The linter is not happy about the new nested if. We can avoid this by
  extracting a helper to handle finalize errors, but I want to keep the
  change minimal for easy backport. We can improve this later upstream.

Signed-off-by: Nir Soffer <[email protected]>