Deprecate retrying of failed nodes by default. #13692
Comments
A little confused by the title vs description here -- do you mean all failed nodes or only ones using the aforementioned features? The "all" scenario raises the question of "then what should the default behavior of retry be?"
This is already possible with |
The current retry code attempts to retry all failed nodes.
If that's the main thing you were referencing, this sounds like a duplicate of #12543. You actually reviewed the PR for it there back in Feb when I asked someone with more knowledge of the retry logic to take a look, particularly because of potential breakage.
Ah yes, I remember this issue now. Looking at the code, I don't think the existing behaviour is a bug anyway. I rewrote the retry logic from scratch (because it had the wrong approach) and kept this behaviour. Whether or not it's a bug doesn't really matter: this retry behaviour has existed since 2017-2018 and is effectively a feature now. I suggest we keep the existing behaviour in $NEW_VERSION and deprecate it, then remove it entirely in $NEW_VERSION + 1.
The author pointed this out to me in #12553 (comment) too and asked about a refactor in #12553 (comment). Well, that and the other bugs with it made me take a look and get really confused by the approach too.
I agree with other folks that the current behavior feels pretty confusing, though.
Yea, that's more or less what I suggested in #12553 (comment) but minus a deprecation; i.e. a breaking fix in the same spirit as #11005.
I would say we should close this out as duplicate of #12543 and make a breaking fix to |
I disagree with a breaking fix here; we should do a deprecation. I am sure there are some users that have now grown used to the retry button as a way to restart all failed nodes.
To be clear, the breaking fix I'm advocating for would only impact |
Summary
Currently a user might have continueOn: failed or depends: $TASKNAME.Failed on a task, and they might consider the failed node to have succeeded from their point of view (in the sense that the failure is no big deal or was expected). The retry logic nevertheless attempts to reset these nodes, which introduces a policy that is forced upon the user.
We should at the very least be able to opt out of this behaviour.
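To make the proposal concrete, here is a minimal sketch of the node selection being discussed. It is not the controller's actual retry code: the `Node` type, the `tolerated` field (standing in for "this failure was covered by continueOn or a depends on .Failed") and the `retry_tolerated_failures` flag are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical, simplified stand-in for a workflow node."""
    name: str
    phase: str                # "Succeeded" or "Failed" in this sketch
    tolerated: bool = False   # assumption: failure was expected/handled by the user


def nodes_to_reset(nodes, retry_tolerated_failures=True):
    """Return the names of the nodes a retry would reset.

    With retry_tolerated_failures=True (the behaviour described in this
    issue), every failed node is selected. With False (the proposed
    opt-out), failed nodes the user already tolerates are left alone.
    """
    selected = []
    for node in nodes:
        if node.phase != "Failed":
            continue  # only failed nodes are candidates for a reset
        if node.tolerated and not retry_tolerated_failures:
            continue  # opt-out: keep expected failures as they are
        selected.append(node.name)
    return selected


nodes = [
    Node("build", "Succeeded"),
    Node("lint", "Failed", tolerated=True),
    Node("deploy", "Failed"),
]

current_behaviour = nodes_to_reset(nodes)
with_opt_out = nodes_to_reset(nodes, retry_tolerated_failures=False)
```

In this sketch the current behaviour selects both `lint` and `deploy`, while the opt-out selects only `deploy`, which is the "precisely retry a single node" outcome the use case below asks for.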
Use Cases
This allows one to precisely retry a single node without worrying about a failed node also retrying.