Runtime: Allow a step to execute only when all upstream steps have completed #850

josephjclark · 2025-01-03T14:05:24Z

Right now, when a step completes, the runtime checks whether it has any next (or downstream) steps. If so, each downstream step is added to the queue to be executed immediately. If a step has multiple upstream edges, it'll run multiple times (once after each upstream step completes).

Basically, if a node has multiple upstream edges, they are treated as logical ORs. Each time an upstream step completes, the step will be re-executed.

This is often useful, but we also want to support a mode where a step will not execute until ALL upstream steps have completed. Like a logical AND.
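
To make the difference concrete, here is a minimal sketch of the two modes on a diamond-shaped workflow (a → b, a → c, b → d, c → d). The `Workflow` type, `JoinMode` and `simulate` are hypothetical names invented for this illustration, not the runtime's actual API, and the sketch ignores edge conditions and state entirely.

```typescript
// Hypothetical types for illustration only; not the OpenFn runtime's API.
type Workflow = Record<string, string[]>; // step id -> downstream step ids
type JoinMode = 'or' | 'and';

// Returns the order in which steps execute under the given join mode.
// Assumes the workflow is acyclic and every edge is followed.
function simulate(workflow: Workflow, start: string, mode: JoinMode): string[] {
  // Count how many upstream edges point at each step.
  const upstreamCount: Record<string, number> = {};
  for (const id of Object.keys(workflow)) {
    for (const target of workflow[id]) {
      upstreamCount[target] = (upstreamCount[target] ?? 0) + 1;
    }
  }

  const completedUpstreams: Record<string, number> = {};
  const queue: string[] = [start];
  const executed: string[] = [];

  while (queue.length > 0) {
    const step = queue.shift() as string;
    executed.push(step);
    for (const next of workflow[step] ?? []) {
      completedUpstreams[next] = (completedUpstreams[next] ?? 0) + 1;
      // OR: queue the step on every upstream completion.
      // AND: queue it only once the last upstream has completed.
      if (mode === 'or' || completedUpstreams[next] === upstreamCount[next]) {
        queue.push(next);
      }
    }
  }
  return executed;
}

const diamond: Workflow = { a: ['b', 'c'], b: ['d'], c: ['d'], d: [] };
console.log(simulate(diamond, 'a', 'or'));  // [ 'a', 'b', 'c', 'd', 'd' ]
console.log(simulate(diamond, 'a', 'and')); // [ 'a', 'b', 'c', 'd' ]
```

In OR mode step d runs twice, once per upstream branch; in AND mode it runs once, after both b and c have completed.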

See https://community.openfn.org/t/allow-a-step-to-run-only-when-all-upstream-ancestor-steps-have-run/738

Things to consider:

The runtime needs to be more aware of the hierarchy of steps. A step cannot be executed unless all upstream edges have been tested (or all upstream branches have been executed)
In other words, a step has dependencies now and cannot run until all dependencies have had a chance to run. Does this mean looking ahead in the queue to see if any upstream (including indirect upstream) steps are waiting? And then defer to the back of the queue? I think so - but it may be more complex than this
Do we toggle this behaviour on the edge, on the node, or globally? Does it make sense that some branches are ORs and some are ANDs? I kind of hope not, because that's overcomplicated and hard to explain visually.
How to reconcile state. Three upstream steps will have three different state objects. What state does the downstream step receive? We should have a shallow first-to-last merge - just squash it all down - by default. But we also need to enable a reconcile function which takes all state objects as arguments and returns a single state.
Don't get blocked if some upstream steps don't execute. The runtime needs to know if all upstream edges have had a chance to run, and when they've all been tried, we can run the downstream step.
In other words, if two upstream steps say "execute x" and one upstream step says "don't execute x", who wins? I'd suggest that as soon as any step allows step x to run, then step x MUST run. It just has to wait for any other ancestors to run first.
Remember that when referring to "upstream" steps, the upstream step may be indirect. Consider the whole branch.
Instead of a reconcile function, should we instead have a reconcile strategy, deep vs shallow? If deep, then we'll recursively traverse all state objects and arrays and merge them. Otherwise we just spread/assign keys at the top level.
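
The points above can be sketched roughly as follows. Everything here is an assumption made for illustration: the names (`EdgeResult`, `decide`, `inputState`, `shallowMerge`, `deepMerge`) are hypothetical, only edges that passed are assumed to contribute state, and the deep strategy is assumed to concatenate arrays, which is just one possible reading of "merge".

```typescript
// All names are hypothetical; this is a sketch, not the runtime's API.
type State = Record<string, unknown>;

// Result of evaluating one upstream edge into the joining step.
// 'pending': the upstream branch has not had its chance to run yet
// 'skipped': the edge was tested and did not allow the step to run
// 'passed':  the edge was tested and allows the step to run
type EdgeResult =
  | { status: 'pending' }
  | { status: 'skipped' }
  | { status: 'passed'; state: State };

type Decision = 'wait' | 'skip' | 'run';

// Wait until every upstream edge has been tried; then run if any edge passed.
function decide(edges: EdgeResult[]): Decision {
  if (edges.some((e) => e.status === 'pending')) return 'wait';
  return edges.some((e) => e.status === 'passed') ? 'run' : 'skip';
}

function isPlainObject(v: unknown): v is State {
  return typeof v === 'object' && v !== null && !Array.isArray(v);
}

// Shallow first-to-last merge: later states overwrite top-level keys.
function shallowMerge(states: State[]): State {
  return states.reduce<State>((acc, s) => ({ ...acc, ...s }), {});
}

// Deep merge: recurse into plain objects, concatenate arrays (an assumption),
// otherwise the later value wins. Nested values are not cloned.
function deepMerge(states: State[]): State {
  const out: State = {};
  for (const s of states) {
    for (const key of Object.keys(s)) {
      const prev = out[key];
      const next = s[key];
      if (Array.isArray(prev) && Array.isArray(next)) {
        out[key] = prev.concat(next);
      } else if (isPlainObject(prev) && isPlainObject(next)) {
        out[key] = deepMerge([prev, next]);
      } else {
        out[key] = next;
      }
    }
  }
  return out;
}

// A reconcile function takes all state objects and returns a single state.
type Reconcile = (...states: State[]) => State;

// Builds the joining step's input state; defaults to the shallow merge.
function inputState(
  edges: EdgeResult[],
  reconcile: Reconcile = (...s) => shallowMerge(s)
): State {
  const states: State[] = [];
  for (const e of edges) {
    if (e.status === 'passed') states.push(e.state);
  }
  return reconcile(...states);
}

const edges: EdgeResult[] = [
  { status: 'passed', state: { data: { a: 1 }, ids: [1] } },
  { status: 'skipped' },
  { status: 'passed', state: { data: { b: 2 }, ids: [2] } },
];

console.log(decide(edges)); // 'run'
console.log(inputState(edges)); // { data: { b: 2 }, ids: [ 2 ] }
console.log(inputState(edges, (...s) => deepMerge(s))); // { data: { a: 1, b: 2 }, ids: [ 1, 2 ] }
```

With a reconcile function, a deep strategy is just one possible function to pass in, so the two options raised above are not mutually exclusive in this sketch.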

josephjclark · 2025-01-03T14:11:18Z

Something is noodling me. Is there a different version of this where you set the state behaviour to be shared or branched?

Branched mode is what we do now: all edges are ORs and each branch creates a unique slice of state which cannot be read by other steps.

In Shared mode, the state object is global - shared by all steps. In other words, when a step runs, it sees the sum/merged state of all the steps that have run before it. Whenever multiple branches converge, a reconcile function is needed.

But shared state doesn't imply that edges should be logical AND or OR. And shared state might give you sequencing problems if a step at the bottom of the workflow assumes that upstream steps have run and modified state, or if one branch assumes another branch has executed. The execution pipeline doesn't give you much control over this stuff, and certainly doesn't tell you what has run. So if your workflow makes assumptions about state, the diagram must be structured correctly. Shared state would allow this to be subtly violated, causing hard-to-debug problems.

So no, this is not the answer.

taylordowns2000 added this to v2 Jan 3, 2025

github-project-automation bot moved this to New Issues in v2 Jan 3, 2025
