support afterexc dependency scheme #6566
Problem: The job manager keeps a copy of the event that caused a job to transition to CLEANUP, but this is not shared with jobtap plugins. Add job->end_event, if set, to the jobtap_call() plugin args.
bc210ea to 4758c93
Heh, I read Or I could clean my glasses :-)
Problem: The availability of `end_event` in jobtap args is not documented. Document it in `flux-jobtap-plugins(7)`.
Problem: For an unknown reason, if FLUX_JOBTAP_CURRENT_JOB is passed to jobtap_lookup_jobid(), or the id argument matches the current jobid, errno is set to EINVAL before returning the current job. Possibly this is just a cut-and-paste error from the original implementation of the function, since it doesn't make sense. Do not set errno in this case of successful return from jobtap_lookup_jobid().
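The intended behavior can be illustrated with a small, self-contained sketch. This is not the flux-core implementation: `struct job_table`, `lookup_job()`, and `CURRENT_JOB_SENTINEL` are hypothetical stand-ins for the job manager's internals and for `FLUX_JOBTAP_CURRENT_JOB`. The point is only that the success path returns the current job without touching `errno`, while the failure path sets it.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for FLUX_JOBTAP_CURRENT_JOB (assumption). */
#define CURRENT_JOB_SENTINEL UINT64_MAX

/* Hypothetical, simplified job record. */
struct job {
    uint64_t id;
};

/* Hypothetical lookup context: the job the callback is running for,
 * plus an array of the other known jobs. */
struct job_table {
    struct job *current;
    struct job *jobs;
    size_t count;
};

/* Return the job matching id, or NULL with errno set to ENOENT.
 * On success errno is left untouched, including when the caller
 * asks for the current job. */
struct job *lookup_job (struct job_table *t, uint64_t id)
{
    if (t->current
        && (id == CURRENT_JOB_SENTINEL || id == t->current->id))
        return t->current; /* success: do not set errno here */
    for (size_t i = 0; i < t->count; i++) {
        if (t->jobs[i].id == id)
            return &t->jobs[i];
    }
    errno = ENOENT;
    return NULL;
}
```

A caller that clears `errno` before the call can then rely on it staying zero whenever a job is returned.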
4758c93 to 2b8d848
Ok, switched to
LGTM! Did a quick manual test too, and it worked as advertised.
Just one nit.
 * was terminated by a fatal exception:
 */
rc = streq (name, "exception");
fprintf (stderr, "end_event=%s, rc=%d\n", name, rc);
Leftover debug?
Oh sheeesh, good catch!
Removed that leftover debug line and forced a push. Thanks! Will set MWP.
Problem: The dependency-after plugin supports `afternotok`, which runs a job after the antecedent fails for any reason. However, in many cases it may be more practical to run a job only after the antecedent fails with a fatal job exception, such as a node failure or timeout condition. Add support for the `afterexcept` dependency scheme, which is only satisfied when the antecedent job fails with a fatal exception.
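The difference between the two schemes can be sketched as a pure decision function. This is a simplified model, not the dependency-after plugin code: `enum after_type` and `dependency_satisfied()` are hypothetical names, and the use of `"finish"` as the end event of a job that ran to completion is an assumption. Only the comparison of the end event name against `"exception"` comes from the diff under review.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical enumeration of the two schemes being compared. */
enum after_type {
    AFTER_NOTOK,   /* satisfied when the antecedent fails for any reason */
    AFTER_EXCEPT,  /* satisfied only when a fatal exception ended the job */
};

/* job_failed:     true if the antecedent did not complete successfully
 * end_event_name: name of the event that moved the antecedent into
 *                 CLEANUP, or NULL if unknown */
bool dependency_satisfied (enum after_type type,
                           bool job_failed,
                           const char *end_event_name)
{
    switch (type) {
    case AFTER_NOTOK:
        return job_failed;
    case AFTER_EXCEPT:
        /* Mirrors the streq (name, "exception") check in the diff. */
        return end_event_name != NULL
            && strcmp (end_event_name, "exception") == 0;
    }
    return false;
}
```

Under this model, a job that runs and exits nonzero satisfies `afternotok` but not `afterexcept`, while a job ended by a fatal exception satisfies both.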
Problem: There are no tests of the `afterexcept` dependency scheme. Add some tests to t2271-job-dependency-after.t.
Problem: The `afterexcept` dependency scheme is undocumented. Add it to common/job-dependencies.rst.
2b8d848 to e9178b1
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@           Coverage Diff           @@
##           master    #6566   +/-   ##
=======================================
  Coverage   79.44%   79.45%
=======================================
  Files         531      531
  Lines       88254    88280   +26
=======================================
+ Hits        70112    70141   +29
+ Misses      18142    18139    -3
Darn. Forgot to update the PR title. Will try to remember to at least get it right in the release notes. 🤦
This PR adds support for an `afterexc` dependency scheme as suggested in #6564. This is similar to `afternotok`, but the dependency is only satisfied if a fatal job exception was the event that caused the job to transition to the CLEANUP state.