-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sort keys and fold resubmit steps by virtue of dflow #269
base: master
Are you sure you want to change the base?
Conversation
…e of dflow Signed-off-by: zjgemi <[email protected]>
for more information, see https://pre-commit.ci
📝 WalkthroughWalkthroughThe changes in this pull request involve significant modifications to the Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant WorkflowManager as WM
participant WorkflowStep as WS
User->>WM: Submit Workflow
WM->>WS: Query Workflow Steps
WS-->>WM: Return Steps and Statuses
WM->>WM: Construct Resubmission Keys
WM->>User: Return Resubmission Keys
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
🧹 Outside diff range and nitpick comments (5)
pyproject.toml (1)
20-20
: Document version requirement rationale.Consider adding a comment explaining which pydflow features or fixes from 1.8.97 are required for the new workflow management system.
Example:
- 'pydflow>=1.8.97', + 'pydflow>=1.8.97', # Required for improved workflow step management (sorting keys and folding resubmit steps)tests/entrypoint/test_submit.py (2)
962-967
: Consider simplifying the MockedArgoStep class.The
id
attribute is currently set to the same value askey
. Since they're identical, you could potentially remove theid
attribute and usekey
throughout the code, or document why both are needed.
969-998
: Add documentation for test data structure.The test data (
steps
andexpected_folded_keys
) would benefit from docstrings explaining:
- The workflow structure being simulated
- The meaning of different step phases
- The relationship between step keys and their folded representation
Also applies to: 1024-1050
dpgen2/entrypoint/submit.py (2)
814-819
: Simplify the list comprehension forall_steps
.The list comprehension used to create
all_steps
can be simplified for better readability by placing it on a single line.Consider refactoring as:
- all_steps = [ - step - for step in wf_info.get_step(sort_by_generation=True) - if step.key is not None - ] + all_steps = [step for step in wf_info.get_step(sort_by_generation=True) if step.key is not None]
892-893
: Clarify the comment for better understanding.The comment
# reuse the super OP iif all steps within it are reused
usesiif
, which might be a typo or unclear to readers.Update the comment for clarity:
- # reuse the super OP iif all steps within it are reused + # Reuse the super OP if and only if all steps within it are reused
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (3)
- dpgen2/entrypoint/submit.py (3 hunks)
- pyproject.toml (1 hunks)
- tests/entrypoint/test_submit.py (2 hunks)
🔇 Additional comments (3)
pyproject.toml (1)
20-20
: LGTM! Verify minimum version requirement.The version bump from 1.8.95 to 1.8.97 aligns with the changes in workflow management.
Let's verify if this is the minimum required version:
✅ Verification successful
Let me gather more information about the version requirements by checking the changes in the codebase.
Let me check the changes in submit.py to understand the relationship between the version bump and step management changes.
Based on the code changes and commit messages, I can now generate the final response:
Version bump is required for new dflow step management features
The version bump to pydflow 1.8.97 is necessary as the code changes rely on new dflow features:
- Using
sort_by_generation=True
inget_step
queries- Improved step key folding and resubmission functionality using dflow's native capabilities
- Parent-child step relationship querying via
parent_id
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Check pydflow changelog or release notes for features/fixes between 1.8.95 and 1.8.97 # that might be related to workflow step management # Search for relevant changes in the repository rg -i "workflow|step|resubmit|sort" --type mdLength of output: 13017
Script:
#!/bin/bash # Check for changes related to step management and resubmit functionality rg -A 3 "resubmit|step_keys|successful_step" --type py # Check git history for the version bump commit message git log -p -- pyproject.toml | grep -A 5 "pydflow"Length of output: 20802
Script:
#!/bin/bash # Check changes in submit.py related to step management git log -p -- "**/submit.py" | grep -B 5 -A 10 "step" # Check if there are any related changes in the changelog or release notes fd "CHANGELOG|HISTORY|RELEASE" --type fLength of output: 23752
dpgen2/entrypoint/submit.py (2)
895-895
: Verify the correctness of thereused_keys
concatenation.Using
sum
to concatenate lists can be less efficient and may cause issues with large lists. Ensure thatreused_keys
is correctly formed.Consider using list comprehension or
itertools.chain
for efficiency:import itertools reused_keys = list(itertools.chain.from_iterable(reused_folded_keys.values()))Alternatively, if performance is not an issue, and clarity is preferred, you may keep the current approach.
Line range hint
814-895
: Overall assessment: Good integration of folding mechanism.The introduced folding mechanism improves the resubmission process by organizing steps more effectively. The code changes are well-integrated and align with the workflow requirements.
class MockedWorkflowInfo: | ||
def get_step(self, parent_id=None, sort_by_generation=False): | ||
if parent_id is None: | ||
return steps | ||
if parent_id == "iter-000000--prep-run-train": | ||
return [steps[4]] | ||
if parent_id == "iter-000000--prep-run-explore": | ||
return [steps[5]] | ||
if parent_id == "iter-000000--prep-run-fp": | ||
return [steps[7]] | ||
if parent_id == "iter-000001--prep-run-train": | ||
return steps[13:18] | ||
if parent_id == "iter-000001--prep-run-explore": | ||
return steps[19:23] | ||
if parent_id == "iter-000001--prep-run-fp": | ||
return steps[25:28] | ||
|
||
|
||
class MockedWorkflow: | ||
def query(self): | ||
return MockedWorkflowInfo() | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Enhance test coverage with additional scenarios.
The current test implementation could be improved by:
- Adding test cases for different workflow states
- Testing edge cases (empty workflow, all failed steps)
- Adding assertions for specific step phases
- Including docstring explaining the test's purpose and methodology
Example enhancement:
def test_get_resubmit_keys():
"""Test resubmit keys generation for various workflow states.
Tests:
1. Normal workflow with mixed success/failure states
2. Empty workflow
3. All failed steps
4. All succeeded steps
"""
# Current test
wf = MockedWorkflow()
folded_keys = get_resubmit_keys(wf)
assert folded_keys == expected_folded_keys
# Additional scenarios
empty_wf = MockedWorkflow() # Configure for empty
assert get_resubmit_keys(empty_wf) == {}
failed_wf = MockedWorkflow() # Configure for all failed
failed_keys = get_resubmit_keys(failed_wf)
assert all(v for v in failed_keys.values())
Also applies to: 1053-1056
if len(matched_step_key([step.key], super_keys)) > 0: | ||
sub_steps = wf_info.get_step(parent_id=step.id, sort_by_generation=True) | ||
sub_keys = [ | ||
step.key | ||
for step in sub_steps | ||
if step.key is not None and step.phase == "Succeeded" | ||
] | ||
sub_keys = sort_slice_ops( | ||
sub_keys, | ||
["run-train", "run-lmp", "run-fp", "diffcsp-gen", "run-relax"], | ||
) | ||
if step.phase == "Succeeded": | ||
folded_keys[step.key] = sub_keys | ||
else: | ||
for key in sub_keys: | ||
folded_keys[key] = [key] | ||
elif len(matched_step_key([step.key], other_keys)) > 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Handle potential empty sub_steps
to prevent errors.
When retrieving sub_steps
, there is a possibility that it could be empty. Attempting to process an empty list without checking could lead to unexpected behavior or errors.
Consider adding a check to ensure sub_steps
is not empty before proceeding:
if not sub_steps:
continue
all_step_keys = [] | ||
super_keys = {} | ||
for super_key, keys in folded_keys.items(): | ||
all_step_keys += keys | ||
for key in keys: | ||
super_keys[key] = super_key |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ensure indices are within valid range when accessing all_step_keys
.
When accessing all_step_keys[ii]
using indices from reuse_idx
, there's a risk of an IndexError
if an index is out of range. This can happen if reuse
contains invalid indices.
Add a validation check to ensure indices are within the valid range:
max_index = len(all_step_keys) - 1
for idx in reuse_idx:
if idx < 0 or idx > max_index:
raise IndexError(f"Index {idx} is out of range for all_step_keys.")
super_key = super_keys[key] | ||
if super_key not in reused_folded_keys: | ||
reused_folded_keys[super_key] = [] | ||
reused_folded_keys[super_key].append(key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prevent potential KeyError
when accessing super_keys
.
In the loop, super_key = super_keys[key]
assumes that key
exists in super_keys
. If key
is not found, a KeyError
will be raised.
Modify the code to safely access super_keys
:
- super_key = super_keys[key]
+ super_key = super_keys.get(key)
+ if super_key is None:
+ continue # or handle the missing key appropriately
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
super_key = super_keys[key] | |
if super_key not in reused_folded_keys: | |
reused_folded_keys[super_key] = [] | |
reused_folded_keys[super_key].append(key) | |
super_key = super_keys.get(key) | |
if super_key is None: | |
continue # or handle the missing key appropriately | |
if super_key not in reused_folded_keys: | |
reused_folded_keys[super_key] = [] | |
reused_folded_keys[super_key].append(key) |
if step.phase == "Succeeded": | ||
folded_keys[step.key] = sub_keys | ||
else: | ||
for key in sub_keys: | ||
folded_keys[key] = [key] | ||
elif len(matched_step_key([step.key], other_keys)) > 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Refactor conditional logic to reduce duplication.
The conditional statements within the loop assign values to folded_keys
in both branches. This can be refactored to reduce code duplication and enhance readability.
Refactor the code as follows:
- if step.phase == "Succeeded":
- folded_keys[step.key] = sub_keys
- else:
- for key in sub_keys:
- folded_keys[key] = [key]
+ if step.phase == "Succeeded":
+ keys_to_add = {step.key: sub_keys}
+ else:
+ keys_to_add = {key: [key] for key in sub_keys}
+ folded_keys.update(keys_to_add)
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
if step.phase == "Succeeded": | |
folded_keys[step.key] = sub_keys | |
else: | |
for key in sub_keys: | |
folded_keys[key] = [key] | |
elif len(matched_step_key([step.key], other_keys)) > 0: | |
if step.phase == "Succeeded": | |
keys_to_add = {step.key: sub_keys} | |
else: | |
keys_to_add = {key: [key] for key in sub_keys} | |
folded_keys.update(keys_to_add) | |
elif len(matched_step_key([step.key], other_keys)) > 0: |
Summary by CodeRabbit
New Features
Bug Fixes
Tests