Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support afterexc dependency scheme #6566

Merged
merged 6 commits into from
Jan 21, 2025

Conversation

grondo
Copy link
Contributor

@grondo grondo commented Jan 21, 2025

This PR adds support for an afterexc dependency scheme as suggested in #6564.
This is similar to afternotok, but the dependency is only satisfied if a fatal job exception was the event the caused the job to transition to the CLENAUP state.

Problem: The job manager keeps a copy of the event that caused a
job to transition to CLEANUP, but this is not shared with jobtap
plugins.

Add job->end_event, if set, to the jobtap_call() plugin args.
@grondo grondo force-pushed the dependency-afterexc branch from bc210ea to 4758c93 Compare January 21, 2025 00:26
@garlick
Copy link
Member

garlick commented Jan 21, 2025

Heh, I read afterexc as afterexec at first. Would afterexception or afterexcept be slightly less easy to misread?

Or I could clean my glasses :-)

Problem: The availability of `end_event` in jobtap args is not
documented.

Document it in `flux-jobtap-plugins(7)`.
Problem: For an unknown reason, if FLUX_JOBTAP_CURRENT_JOB is passed to
jobtap_lookup_jobid(), or the id argument matches the current jobid,
errno is set to EINVAL before returning the current job. Possibly
this is just a cut-and-paste error from the original implementation
of the function, since it doesn't make sense.

Do not set errno in this case of successful return from
jobtap_lookup_jobid().
@grondo grondo force-pushed the dependency-afterexc branch from 4758c93 to 2b8d848 Compare January 21, 2025 15:00
@grondo
Copy link
Contributor Author

grondo commented Jan 21, 2025

Ok, switched to afterexcept (forcing users to type afterexception seemed just mean).

Copy link
Member

@garlick garlick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Did a quick manual test too, and it worked as advertised.

Just one nit.

* was terminated by a fatal exception:
*/
rc = streq (name, "exception");
fprintf (stderr, "end_event=%s, rc=%d\n", name, rc);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leftover debug?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh sheeesh, good catch!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed that leftover debug line and forced a push. Thanks! Will set MWP.

Problem: The dependency-after plugin supports `afternotok`, which runs
a job after the antecedent fails for any reason. However, in many cases
it may be more practical to only run a job after a job fails with a
fatal job exception, such as a node failure or timeout condition.

Add support for the `afterexcept` dependency scheme, which is only
satisfied when the antecedent jobs fails with a fatal exception.
Problem: There are no test of the dependency `afterexcept` scheme.

Add some tests to t2271-job-dependency-after.t.
Problem: The `afterexcept` dependency scheme is undocumented.

Add it to common/job-dependencies.rst.
@grondo grondo force-pushed the dependency-afterexc branch from 2b8d848 to e9178b1 Compare January 21, 2025 15:11
@mergify mergify bot merged commit 482028e into flux-framework:master Jan 21, 2025
35 checks passed
Copy link

codecov bot commented Jan 21, 2025

Codecov Report

Attention: Patch coverage is 90.90909% with 3 lines in your changes missing coverage. Please review.

Project coverage is 79.45%. Comparing base (8a18be7) to head (e9178b1).
Report is 7 commits behind head on master.

Files with missing lines Patch % Lines
src/modules/job-manager/plugins/dependency-after.c 93.10% 2 Missing ⚠️
src/modules/job-manager/jobtap.c 75.00% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##           master    #6566   +/-   ##
=======================================
  Coverage   79.44%   79.45%           
=======================================
  Files         531      531           
  Lines       88254    88280   +26     
=======================================
+ Hits        70112    70141   +29     
+ Misses      18142    18139    -3     
Files with missing lines Coverage Δ
src/modules/job-manager/jobtap.c 84.64% <75.00%> (-0.07%) ⬇️
src/modules/job-manager/plugins/dependency-after.c 83.05% <93.10%> (+0.82%) ⬆️

... and 5 files with indirect coverage changes

@grondo grondo deleted the dependency-afterexc branch January 21, 2025 16:22
@grondo
Copy link
Contributor Author

grondo commented Jan 21, 2025

Darn. Forgot to update the PR title. Will try to remember to at least get it right in the release notes. 🤦

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants