Pipeline runner improvements #258

Merged
merged 4 commits into from
May 24, 2024

ljgray
Contributor

@ljgray ljgray commented Dec 21, 2023

This is a pretty involved PR, so it may take some time to review.

Features

This PR addresses most of the issues in #181 and #152. In particular:

  • The pipeline runner uses a priority system to choose the next task to run, based on
    • Number of times a task can run and its net consumption per iteration
    • Base priority as set by the user
    • Pipeline config order
  • A task which is a pure producer (i.e. has no input keys) is never considered available, so it only runs when no other task is able to run
  • A task cannot be selected to run next if its setup did not run successfully
  • Separate the selection of the next task to run from the main run loop, which makes it easier to implement different task selection methods
  • Include a legacy option which will mimic the current pipeline behaviour
  • Add two extra task config features:
    • limit_outputs: restrict the number of times a task can produce output in the next state. Once this limit is reached, the task will immediately advance its state.
    • base_priority: priority modifier which is added to the calculated priority. Can be negative.
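As a rough illustration of the selection rules listed above, here is a minimal sketch. It is not the code in caput/pipeline.py: `TaskInfo`, `select_next`, the field names and the exact priority formula (runs available times net consumption, plus `base_priority`) are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class TaskInfo:
    """Hypothetical stand-in for a pipeline task; not caput's actual class."""

    name: str
    config_order: int  # position in the pipeline config
    queued_inputs: dict = field(default_factory=dict)  # input key -> items waiting
    outputs_per_run: int = 1  # items produced per iteration
    base_priority: int = 0  # user-set modifier, may be negative
    setup_ok: bool = True  # False if setup did not run successfully

    def runs_available(self) -> int:
        """Number of times the task could run with the inputs queued now."""
        if not self.queued_inputs:
            return 0  # pure producer: never counted as available
        return min(self.queued_inputs.values())

    def priority(self) -> int:
        """Assumed formula: runs available * net consumption + base priority."""
        # One item is consumed per input key on each iteration (assumption).
        net_consumption = len(self.queued_inputs) - self.outputs_per_run
        return self.runs_available() * net_consumption + self.base_priority


def select_next(tasks):
    """Pick the next task to run, or None if no task can run."""
    ready = [t for t in tasks if t.setup_ok]
    available = [t for t in ready if t.runs_available() > 0]
    if available:
        # Highest priority first; earlier config order breaks ties.
        return min(available, key=lambda t: (-t.priority(), t.config_order))
    # Nothing can consume: fall back to pure producers in config order.
    producers = [t for t in ready if not t.queued_inputs]
    return min(producers, key=lambda t: t.config_order, default=None)
```

Keeping `select_next` separate from the run loop mirrors the point above about swapping in other selection methods: a legacy-style selector would only need to replace this one function.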

Two other features are added which are helpful for memory analysis:

  1. A function to find the total memory footprint of an object, including any objects it references.
  2. A monitoring function in PSUtilProfiler which checks memory usage every 0.5 seconds in an attempt to log peak memory used by a task
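For readers unfamiliar with these two ideas, the sketch below shows one way they could look using only the standard library. It is not the implementation added in this PR: `deep_getsizeof` and `PeakMonitor` are hypothetical names, the traversal only follows common containers and `__dict__`, and the sampler is left pluggable rather than tied to psutil.

```python
import sys
import threading


def deep_getsizeof(obj, _seen=None):
    """Approximate size in bytes of obj plus the objects it references."""
    seen = set() if _seen is None else _seen
    if id(obj) in seen:
        return 0  # already counted (shared reference or cycle)
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        for key, value in obj.items():
            size += deep_getsizeof(key, seen) + deep_getsizeof(value, seen)
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(deep_getsizeof(item, seen) for item in obj)
    if hasattr(obj, "__dict__"):
        # Instance attributes; objects using __slots__ are not followed here.
        size += deep_getsizeof(vars(obj), seen)
    return size


class PeakMonitor:
    """Poll a sampler on a background thread and remember the largest value."""

    def __init__(self, sample, interval=0.5):
        # sample: callable returning current usage in bytes. With psutil
        # installed this could be `lambda: psutil.Process().memory_info().rss`.
        self._sample = sample
        self._interval = interval  # seconds between polls
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self.peak = 0

    def _poll(self):
        self.peak = max(self.peak, self._sample())

    def _run(self):
        # Event.wait returns False on timeout and True once stop is requested.
        while not self._stop.wait(self._interval):
            self._poll()

    def __enter__(self):
        self._poll()  # sample once before the monitored work starts
        self._thread.start()
        return self

    def __exit__(self, *exc):
        self._stop.set()
        self._thread.join()
        self._poll()  # final sample so very short tasks are still recorded
        return False
```

Wrapping a task call in `with PeakMonitor(sampler) as mon:` and reading `mon.peak` afterwards gives an estimate only: a spike shorter than the polling interval can still be missed, which is why the description above says "in an attempt to log peak memory".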

Refactoring

  • Additional refactoring of the parts of the pipeline.Manager class related to initial task loading and pipeline creation. This does not change any behaviour, but slightly improves the distribution of responsibility between methods
  • Update to use f-strings

Remaining:

  • Input broadcasting. This is tricky as the task would likely have to know how many total inputs to expect for each queue (i.e. how many total combinations of inputs), so it would have to wait for all inputs to accumulate instead of clearing them as soon as possible

Note: this re-implements the multi-output functionality of #263, but #263 has some extra tests

@ljgray ljgray force-pushed the pipeline-run-order branch 3 times, most recently from 41508a1 to 42e8cfb Compare December 21, 2023 22:51
@ljgray ljgray force-pushed the pipeline-run-order branch 10 times, most recently from 48f50cd to bbb48dc Compare January 10, 2024 19:34
@ljgray ljgray force-pushed the pipeline-run-order branch 12 times, most recently from 2a822b3 to c6bdf71 Compare January 15, 2024 23:20
@ljgray ljgray changed the title Pipeline run order Pipeline runner priority updates Jan 15, 2024
@ljgray ljgray force-pushed the pipeline-run-order branch from c6bdf71 to 2f7ef4e Compare January 16, 2024 00:14
@ljgray
Contributor Author

ljgray commented Jan 16, 2024

I think this is ready for a review and some thoughts on the priority implementation. I've been testing the daily pipeline config with good results, but I'm still trying to think of edge cases where this might fail/behave worse than the old implementation. Input is welcome.

The memory tools seem to be working as expected.

@ljgray ljgray marked this pull request as ready for review January 16, 2024 00:53
@ljgray ljgray requested review from jrs65 and rikvl January 16, 2024 00:53
@ljgray ljgray force-pushed the pipeline-run-order branch from 2f7ef4e to c33948f Compare February 13, 2024 18:01
@ljgray ljgray force-pushed the pipeline-run-order branch from c33948f to b32ccff Compare May 1, 2024 23:19
@ljgray ljgray requested a review from ketiltrout May 1, 2024 23:33
@ljgray ljgray changed the title Pipeline runner priority updates Pipeline runner improvements May 1, 2024
@ljgray ljgray force-pushed the pipeline-run-order branch from 90d5f95 to 81b7f74 Compare May 1, 2024 23:44
@ljgray ljgray force-pushed the pipeline-run-order branch from 81b7f74 to 361800f Compare May 10, 2024 16:33
Member

@ketiltrout ketiltrout left a comment

I don't see any real show-stoppers here. I think the new priority system is well designed: it's both simple (to explain/implement) and flexible enough to give users the tools to adjust things when necessary.

Some extra documentation on how it works may help.

You're right that there may be some edge cases that come up in production, but I don't see anything so obvious just looking at the PR.

The legacy ordering at least means we can return to the previous behaviour until things are fixed, if everything goes pear-shaped. And I don't think we're locking ourselves into anything that would get us into trouble and that we couldn't fix if needed.

I have left some comments which I think are mostly minor in scale.

caput/pipeline.py (10 resolved review threads, 9 of them outdated)
@ljgray ljgray force-pushed the pipeline-run-order branch from 361800f to 6c2a65e Compare May 13, 2024 19:21
@ljgray ljgray requested a review from ketiltrout May 13, 2024 19:21
@ljgray ljgray force-pushed the pipeline-run-order branch 4 times, most recently from c55d7f5 to f1afa3d Compare May 16, 2024 20:03
@ljgray ljgray force-pushed the pipeline-run-order branch 2 times, most recently from 4a738f9 to 6e7f028 Compare May 24, 2024 02:40
Member

@ketiltrout ketiltrout left a comment

This looks reasonable. It's difficult, with static analysis, to anticipate what sort of corner cases (if any) we might get ourselves into with this change, but I think this is implemented well enough that it's a good place to start.

I found a couple of typos:

caput/pipeline.py (2 resolved review threads, both outdated)
@ljgray ljgray force-pushed the pipeline-run-order branch 3 times, most recently from 6128269 to 41e4a13 Compare May 24, 2024 22:20
@ljgray ljgray force-pushed the pipeline-run-order branch from 41e4a13 to 38a9a2c Compare May 24, 2024 22:22
@ljgray ljgray merged commit fe5b8bb into master May 24, 2024
5 checks passed
@ljgray ljgray deleted the pipeline-run-order branch May 24, 2024 22:30
2 participants