
Enable Pydantic I/O types in workflow context #1189

Closed

Conversation


@alicederyn alicederyn commented Sep 4, 2024

Pull Request Checklist

Description of PR
Currently, @script-decorated functions that use Pydantic I/O types must still use pre-Pydantic syntax when creating workflows. This makes it harder to migrate to the new dag and steps decorators, as more code must be migrated in a single step before the YAML can be regenerated and tested.

This PR allows Pydantic input types to be passed in while in an active context, and configures the returned subnode to mimic the Pydantic output type. Tasks using fields from the returned object will automatically have a dependency added if no dependency is explicitly specified.
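
As a concrete illustration, here is a minimal sketch of the call style this enables, assembled from the cut example discussed later in this thread. The imports, the experimental-flag name, and the constructor argument are my assumptions, not lines from this PR's diff:

from hera.shared import global_config
from hera.workflows import DAG, Workflow, script
from hera.workflows.io.v2 import Input, Output

global_config.experimental_features["script_pydantic_io"] = True  # assumed flag

class CutInput(Input):
    strings: list
    cut_after: int

class CutOutput(Output):
    first_strings: list

@script(constructor="runner")
def cut(input: CutInput) -> CutOutput:
    return CutOutput(first_strings=input.strings[: input.cut_after])

with Workflow(generate_name="pydantic-io-", entrypoint="d"):
    with DAG(name="d"):
        # Passing the Pydantic input directly now works in an active context;
        # fields on the returned object resolve to templated strings, and
        # reading them adds an implicit dependency edge for the next task.
        cut_task = cut(CutInput(strings=["hello", "world"], cut_after=1))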

@alicederyn alicederyn added type:enhancement A general enhancement semver:minor A change requiring a minor version bump labels Sep 4, 2024

codecov bot commented Sep 4, 2024

Codecov Report

Attention: Patch coverage is 93.84615% with 4 lines in your changes missing coverage. Please review.

Project coverage is 82.0%. Comparing base (d553546) to head (4d247fd).
Report is 136 commits behind head on main.

Files with missing lines              Patch %   Lines
src/hera/workflows/_meta_mixins.py    81.8%     1 Missing and 1 partial ⚠️
src/hera/workflows/script.py          91.6%     1 Missing and 1 partial ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##            main   #1189     +/-   ##
=======================================
+ Coverage   81.7%   82.0%   +0.2%     
=======================================
  Files         54      61      +7     
  Lines       4208    4778    +570     
  Branches     889    1021    +132     
=======================================
+ Hits        3439    3919    +480     
- Misses       574     625     +51     
- Partials     195     234     +39     


@@ -539,8 +545,7 @@ def _create_subnode(
         assert _context.pieces

         template_ref = None
-        _context.declaring = False
-        if _context.pieces[0] != self and isinstance(self, WorkflowTemplate):
+        if _context.pieces[0] is not self and isinstance(self, WorkflowTemplate):
@alicederyn alicederyn (Collaborator, Author) commented Sep 4, 2024

Note: Pydantic's != implementation raises an "incorrect type" validation error here due to having been returned a string from TemplateInvocatorSubNodeMixin's modified __getattribute__ implementation. We could temporarily disable the active context instead, but switching to is not is simpler (and faster). As a side-benefit, we can remove the temporary assignment of False to _context.declaring, which was previously working around this issue.
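
A toy illustration (plain Python, not hera or Pydantic internals) of the behaviour this relies on: != falls back to __eq__, which reads attribute values and so goes through __getattribute__, while is not is a pure identity check that never touches attributes:

class Tracked:
    def __init__(self, value):
        object.__setattr__(self, "value", value)

    def __getattribute__(self, name):
        # Stand-in for the modified __getattribute__ described above
        raise RuntimeError(f"attribute access intercepted: {name}")

    def __eq__(self, other):
        return self.value == other.value  # explicit reads hit __getattribute__

a, b = Tracked(1), Tracked(2)
print(a is not b)  # True: identity check, no attribute access
# `a != b` would raise RuntimeError: __ne__ falls back to __eq__,
# which reads self.value via the intercepting __getattribute__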

@alicederyn alicederyn force-pushed the use.pydantic.types.in.context branch 2 times, most recently from c6ea26f to a6eab0f on September 4, 2024 at 23:06
@@ -760,6 +755,27 @@ def script_wrapper(func: Callable[FuncIns, FuncR]) -> Callable:
     def task_wrapper(*args, **kwargs) -> Union[FuncR, Step, Task, None]:
         """Invokes a `Script` object's `__call__` method using the given SubNode (Step or Task) args/kwargs."""
         if _context.active:
+            if len(args) == 1 and isinstance(args[0], (InputV1, InputV2)):
@alicederyn alicederyn (Collaborator, Author) commented Sep 4, 2024

Note this new behaviour only affects calls made in an active context to a @script-decorated function when passing a Pydantic I/O object as an argument, which previously would not have functioned meaningfully (the object was ignored by s.__call__).

Comment on lines -44 to +80
-    if _context.declaring:
+    if _context.active:
         # Intercept the declaration to avoid validation on the templated strings
-        # We must then turn off declaring mode to be able to "construct" an instance
+        # We must then disable the active context to be able to "construct" an instance
         # of the InputMixin subclass.
-        _context.declaring = False
-        instance = cls.construct(**kwargs)
-        _context.declaring = True
-        return instance
+        with no_active_context():
+            return cls.construct(**kwargs)
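
For readers unfamiliar with the helper, a no_active_context context manager plausibly follows the standard save/restore pattern. This is a self-contained sketch of that pattern, not hera's actual implementation:

from contextlib import contextmanager

class _Ctx:
    active: bool = True

_context = _Ctx()

@contextmanager
def no_active_context():
    # Save the flag, disable it for the duration of the block, then restore it.
    saved, _context.active = _context.active, False
    try:
        yield
    finally:
        _context.active = saved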
@jeongukjae jeongukjae (Collaborator) commented Sep 5, 2024

I don't understand hera's context management 100% (especially active & declaring), so I'm deferring the mixins' review to others.
To me, it seems to refine the context state checking, and it looks fine.

Comment on lines 375 to 380
# Add dependencies based on context if not explicitly provided
current_task_depends = _context.pieces[-1]._current_task_depends
if current_task_depends and "depends" not in kwargs:
kwargs["depends"] = " && ".join(sorted(current_task_depends))
current_task_depends.clear()
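
A tiny worked example of the string this join produces, with illustrative task names:

tracked = {"cut-b", "cut-a"}         # edges recorded by the context
print(" && ".join(sorted(tracked)))  # -> "cut-a && cut-b"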

Collaborator

Not 100% sure, but it seems this needs more discussion.

e.g. in this PR's example, the following code works as expected:

        cut_task = cut(CutInput(strings=["hello", "world", "it's", "hera"], cut_after=1))
        join_task = join(JoinInput(strings=cut_task.first_strings, joiner=" "))

        cut_task >> join_task

However, this approach can possibly restrict the construction of DAGs. For example, consider the following:

A = task_a_fn(SomePydanticClass1(...))
B = task_b_fn(SomePydanticClass2(arg_1=A.return_1, ...))  # B depends on A's output
C = task_c_fn(SomePydanticClass3(...))  # C doesn't depend on A's output

# But the user wants to construct the DAG as follows:
A >> [B, C]

In the above case, hera will raise an error (e.g. "already in some task's depends"):

elif self.name in other._get_dependency_tasks():
raise ValueError(f"{self.name} already in {other.name}'s depends: {other.depends}")

Collaborator Author

In your example, you could just write A >> C instead of A >> [B, C].

However, I agree that this is an existing limitation of the new Pydantic I/O approach which will need solving separately. My preference would be for user-provided dependency edges to not raise duplication errors (like in your example — it should be fine to be explicit), and to override automatic ones (e.g. if you do A.on_success(B), that should overwrite the existing dependency between A and B). In the meantime, users can:

  • use the old A.get_parameter API to avoid adding dependency edges
  • pass in requires="" to disable the automatic generation of requires from dependency edges
  • do B.requires="" later

Collaborator

Sounds great, but to me it seems better to have the same/expected behavior even with Pydantic I/O.
Users might expect to be able to write A >> [B, C], but this raises an error.
I assume (having checked only a few cases) that other nodes also don't generate the depends attribute based on their returns/arguments.

How about separating this feature into another PR, and supporting it regardless of the Pydantic I/O flag?
Maybe @elliotgunton can give more opinions?

@elliotgunton elliotgunton (Collaborator) commented Sep 5, 2024

User might expect to be able to write A >> [B, C]

I agree - it would be good to write all my dependencies in one place (and as a bonus: in one statement) rather than spread out

user-provided dependency edges to not raise duplication errors

Yes I think it might make sense (not sure about precedent/existing patterns) to make dependency-setting idempotent, and then use the last call (ignoring whether it was automatic or not):

A.on_success(B)
A.on_failure(B)

Will end up only with B's depends: A.Failed.
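
A sketch of what last-call-wins edge setting could look like, using a stand-in task class; this is hypothetical behaviour under discussion, not hera's current API:

from dataclasses import dataclass
from typing import Optional

@dataclass
class FakeTask:
    name: str
    depends: Optional[str] = None

def set_edge(downstream: FakeTask, upstream: FakeTask, condition: Optional[str]) -> None:
    suffix = f".{condition}" if condition else ""
    # Drop any existing clause referencing `upstream`, then append the new one.
    clauses = [c for c in (downstream.depends or "").split(" && ")
               if c and c.split(".")[0] != upstream.name]
    clauses.append(f"{upstream.name}{suffix}")
    downstream.depends = " && ".join(clauses)

A, B = FakeTask("A"), FakeTask("B")
set_edge(B, A, "Succeeded")  # models A.on_success(B)
set_edge(B, A, "Failed")     # models A.on_failure(B)
print(B.depends)             # "A.Failed": the last call wins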

@alicederyn alicederyn force-pushed the use.pydantic.types.in.context branch 3 times, most recently from 9caeb6d to e22dd8d on September 5, 2024 at 11:41
@@ -372,6 +372,12 @@ def __call__(self, *args, **kwargs) -> Union[None, Step, Task]:
             return Step(template=self, **kwargs)

         if isinstance(_context.pieces[-1], DAG):
+            # Add dependencies based on context if not explicitly provided
@alicederyn alicederyn (Collaborator, Author) commented Sep 5, 2024

  • Note this is different from the implementation in the experimental decorator code, which does not allow a depends to be passed at all:

        depends=" && ".join(sorted(_context.pieces[-1]._current_task_depends)) or None,
        **kwargs,

    This is to allow more gradual migration to the new syntax. Would it be reasonable for the existing code to change to match?

  • I think this logic should probably be made common -- perhaps moved into DAG?

Collaborator

Would it be reasonable for the existing code to change to match?

Yes that would be good! 🙏

Collaborator Author

Done.

current_task_depends = _context.pieces[-1]._current_task_depends
if current_task_depends and "depends" not in kwargs:
kwargs["depends"] = " && ".join(sorted(current_task_depends))
current_task_depends.clear()
@alicederyn alicederyn (Collaborator, Author) commented Sep 5, 2024

Pre-existing issue: if the user creates a Task explicitly rather than via a decorated function, the dependencies list won't be cleared. Perhaps this should be moved to the point where a Task is registered?

Collaborator

Yes, makes sense to move it to DAG's _add_sub

def _add_sub(self, node: Any):
if not isinstance(node, Task):
raise InvalidType(type(node))
if node.name in self._node_names:
raise NodeNameConflict(f"Found multiple Task nodes with name: {node.name}")
self._node_names.add(node.name)
self.tasks.append(node)
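
A sketch of what _add_sub might look like with the defaulting and clearing folded in; this is an assumed shape, not the actual commit:

def _add_sub(self, node: Any):
    if not isinstance(node, Task):
        raise InvalidType(type(node))
    if node.name in self._node_names:
        raise NodeNameConflict(f"Found multiple Task nodes with name: {node.name}")
    self._node_names.add(node.name)
    # Default `depends` from the edges tracked while the task's arguments
    # were built, then clear them so they don't leak into the next task.
    if self._current_task_depends and node.depends is None:
        node.depends = " && ".join(sorted(self._current_task_depends))
    self._current_task_depends.clear()
    self.tasks.append(node)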

Collaborator Author

Done.

if not output_class or output_class is NoneType:
return None

if not issubclass(output_class, (OutputV1, OutputV2)):
Collaborator

I'm not sure this should be a requirement, especially for gradual migrations and issues like #1191 (comment)

Collaborator Author

I'm not sure what to return that would match the type signature though. Perhaps this can be addressed in a later PR?

Collaborator

Setting the subnode._build_obj means users can do my_task.parameter_name when passing values between tasks - that behaviour should really only be available for the new decorator contexts as it's very experimental and unproven (see #1162). I think this PR is mixing up the two features which should be kept separate IMO, so I'm struggling to understand the motivation of the PR.

Collaborator Author

Moved the new behaviour into the decorator_syntax experimental feature for now.

@elliotgunton elliotgunton (Collaborator) left a comment

I would like to discuss the motivation behind this PR and how we might tidy it up, as it's mixing up the two experimental features in a way that is non-obvious.

Comment on lines 27 to 29
result = triple(IntInput(field=5)).field

assert result == "{{steps.triple.outputs.parameters.field}}"
Collaborator

This is not what the Pydantic IO feature flag was designed for and shouldn't be possible in a normal Steps context

@alicederyn alicederyn (Collaborator, Author) commented Sep 5, 2024

Moved behind the decorator_syntax flag

@elliotgunton (Collaborator) commented

I think I just have 2 issues to resolve for the general direction of the PR - the main one being that the code implementing the new decorator behaviour, i.e. the result = triple(IntInput(field=5)).field style, should be asserting on the "new decorators" feature flag.

The second is just asking if you could motivate the PR with an issue showing how the change would work with the existing example here:

pydantic_io(
arguments=[
write_step.get_artifact("int-artifact").with_name("artifact-input"),
{
"param_int": 101,
"an_object": MyObject(a_dict={"my-new-key": "my-new-value"}),
},
]
)

I think the feature is a good stepping stone for users to migrate, but shouldn't be backported to the script_pydantic_io flag (as that's been out a while, whereas our funky new syntax is kinda buggy, and I actually wouldn't recommend migrating yet, having tried it out myself).

@alicederyn alicederyn force-pushed the use.pydantic.types.in.context branch 2 times, most recently from 0a7fb95 to d13e057 on September 5, 2024 at 17:47
@alicederyn alicederyn marked this pull request as ready for review September 5, 2024 17:53
@alicederyn (Collaborator, Author) commented

Issue added: #1192

@elliotgunton elliotgunton (Collaborator) left a comment

Updating the blocking status of the PR for discussion in #1192. I don't think we should make such a drastic feature change to the old syntax.

@elliotgunton elliotgunton marked this pull request as draft September 13, 2024 15:01
@elliotgunton (Collaborator) commented

Hey @alicederyn, we discussed this PR and the motivation behind it in the maintainers' meeting (notes here); @samj1912 agreed that we don't want to mix the two syntaxes as that will confuse users, so I've moved the PR into the "draft" state.

It seems the main goal is to avoid a big-bang migration, or at least to have a checkpoint approach as you explained. An alternative would be to work on #711, as the new decorators mean we can create a full Argo Workflow spec as Python stubs in Hera. This should mean we can construct the DAG too, so the only work to do would be to convert old script templates to use Pydantic IO classes, which should match the YAML (because it would be generated from it (super awesome!!)), and then you'd just copy/paste the function body into the stubs.

Let's discuss when you're back, no rush to reply! 😄

@alicederyn alicederyn force-pushed the use.pydantic.types.in.context branch 2 times, most recently from da10075 to f263f56 on October 9, 2024 at 10:27
Code to check if _SCRIPT_PYDANTIC_IO_FLAG is set and error if not occurs
twice; factor out into a shared utility function.

Signed-off-by: Alice Purcell <[email protected]>
_create_subnode temporarily disables _context.declaring to avoid
triggering the altered `__getattribute__` behaviour in
TemplateInvocatorSubNodeMixin in two situations:

 - Pydantic's implementation of `!=`, which is attempting to read field
   values, and will raise a validation error due to the altered field
   names being of incorrect type.
 - Accessing `__class__` on a Pydantic subnode object with no build_obj
   set.

Instead, fix this by:

 - Using `is not` instead of `!=`, which is correct where we use it,
   and also faster.
 - Fixing `__getattribute__` to no longer fail if used on an object with
   no build_obj set.

Signed-off-by: Alice Purcell <[email protected]>
Extend experimental Pydantic I/O support to allow passing Pydantic types
into `@script`-decorated functions when inside a `with` workflow context
block, and using fields on the returned Pydantic output as shorthand for
the associated Hera template in subsequent steps.

Signed-off-by: Alice Purcell <[email protected]>
Currently, TemplateDecoratorFuncMixin._create_subnode unconditionally
adds a depends kwarg, causing a runtime error if the user also supplies
one. Instead, prefer the user-supplied one.

Signed-off-by: Alice Purcell <[email protected]>
Move the logic that creates the right leaf node for Steps, Parallel and
DAG to a _create_leaf_node method on those types. DAG now specifies how
to default the depends parameter to Task based on its
_current_task_depends field. This simplifies the duplicated logic in
_meta_mixins to a simple isinstance check for any of those three types,
followed by a _create_leaf_node call.

Signed-off-by: Alice Purcell <[email protected]>
Add a new experimental flag to gate use of Pydantic types in a workflow context manager

Signed-off-by: Alice Purcell <[email protected]>
@elliotgunton (Collaborator) commented

Closing as agreed in maintainer meeting discussion (I'd rather not leave the PR open indefinitely) - we need to consider the graduation criteria for the new decorators and then how to migrate once they are matured and ready to graduate.

@alicederyn alicederyn deleted the use.pydantic.types.in.context branch November 14, 2024 10:29
Successfully merging this pull request may close these issues:

Allow using Pydantic I/O in with-based code