Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add documentation about workflow and task lifecycle events #1054

Open
wants to merge 16 commits into
base: main
Choose a base branch
from

Conversation

cdavernas
Copy link
Member

@cdavernas cdavernas commented Jan 9, 2025

Please specify parts of this PR update:

  • Specification
  • Schema
  • Examples
  • Extensions
  • Use Cases
  • Community
  • CTK
  • Other

Discussion or Issue link:
#1024
#1030

What this PR does:

  • Adds documentation about workflow and task lifecycle events

@cdavernas cdavernas added change: documentation Improvements or additions to documentation. It won't impact a version change. change: feature New feature or request. Impacts in a minor version change area: spec Changes in the Specification labels Jan 9, 2025
@cdavernas cdavernas added this to the v1.0.0 milestone Jan 9, 2025
@cdavernas cdavernas self-assigned this Jan 9, 2025
@cdavernas cdavernas linked an issue Jan 9, 2025 that may be closed by this pull request
@cdavernas cdavernas linked an issue Jan 9, 2025 that may be closed by this pull request
- [Task Lifecycle Events](#task-lifecycle-events)
+ [Task Created](#task-created-event)
+ [Task Started](#task-started-event)
+ [Task Suspended](#task-suspended-event)
Copy link
Collaborator

@fjtirado fjtirado Jan 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that I took a closer look, and after my yesterday comment related with workflow cancelation suspension, Im not 100% sure a task can be really suspended.
A cancelation suspension of a workflow can be understood as "let the current task to finish and hold execution of the next one" rather than "Interrupt current task and let it in a limbo"

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a task is a composite one (for, do, try) it can be of course be stopped, but at the end, it will executing another non composite task (basically a call), which is kind of atomic.
I think we can avoid that conundrum by remaining silent about the whole task cycle thing and just define events at workflow level (and task started and completed)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that I took a closer look, and after my yesterday comment related with workflow cancelation, Im not 100% sure a task can be really suspended.

Why not? Let's say you do a wait task, which sleeps for 10secs. You suspend it after 5secs, when resumed it still sleeps for another 5secs. Same can apply to consuming events, foreach enumerations, etc?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can avoid that conundrum by remaining silent about the whole task cycle thing and just define events at workflow level (and task started and completed)

That's an option, but I think it's our loss. Workflow level updates are not enough in most cases, such as long running flows. An other option, though I'd prefer to leave it as is, is to only enforce workflow lifecycle events, while leaving task lifecycle events optional.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About the wait example, if we interrupt the wait (or stop listening to an event), what we should do when resuming, execute the next task after the wait or execute the wait again? I think we can avoid that question by not letting suspeding a workflow that is already waiting.
In the case of a For, what I suggested is to let the call that the loop is currently exeucting finish (the definition of finish might vary for a task, if a call, let the call finish, if a set, let the assignment finish, if a do, hold after the current task being executed finish, and so on )

Copy link
Collaborator

@fjtirado fjtirado Jan 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that I edited my first comment, Im not talking about cancelation, but about suspension. Sorry for the confusion. Cancellation is clear, you cancel the workflow execution, interrupting if you can and not allowing resuming. But suspending-resuming is trickier.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About the wait example, if we interrupt the wait (or stop listening to an event), what we should do when resuming

I think you missed my point there:

Let's say you do a wait task, which sleeps for 10secs. You suspend it after 5secs, when resumed it still sleeps for another 5secs.

In the case of a For, what I suggested is to let the call that the loop is currently executing finish

If what you mean is that the iteration should be completed before suspending, I think it's a possibility, but not my personal preference, as the iteration can be a succession of perfectly suspendable tasks.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should not allow suspending a workflow that is waiting (so implementors do not need that resuming logic for event/wait, which might be pretty tricky in some cases)
And for other task, I think suspending should let the operation finish (the call or the jq expression) and then freeze, so you resume in the next task.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should not allow suspending a workflow that is waiting

Your opinion makes sense, it can be tricky indeed! Maybe we should let that up to the implementers, and add a couple of lines about those specifics?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resuming tasks can be challenging. Ideally, as @cdavernas suggested, allowing the runtime to directly resume a suspended task would be the best approach. However, this is often not feasible. For a passive task, such as a wait operation, it might work. But for active tasks, like an HTTP call request, the situation becomes more complex. Even if the runtime cancels the request, the server may already be processing it, and retrying the request upon resuming could lead to unintended behaviors.

As a result, there isn't a universal solution. The suspension mechanism must depend on the nature of the task. Passive tasks, such as wait or listen, can typically be suspended and resumed. Active tasks, like call or run, would need to complete before the suspension can occur, deferring it to the subsequent task.

+ [Task Cancelled](#task-cancelled-event)
+ [Task Faulted](#task-faulted-event)
+ [Task Completed](#task-completed-event)
+ [Task Status Changed](#task-status-changed-event)
Copy link
Collaborator

@fjtirado fjtirado Jan 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Status, as I understood it, only applies to the worklow as a whole
https://github.com/serverlessworkflow/specification/blob/main/dsl.md#status-phases

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not in my opinion. Task status is extremely important in most, if not all workflow execution scenarios. Think of a long running flow which you represent in a UI. Just having flow events will let your users know the flow has started or ended, but whatever is in the middle will be unknown to them, potentially for hours or even days at a time.

Copy link
Collaborator

@fjtirado fjtirado Jan 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, then we also need to edit that section to make it clear phase status applies both for workflow and task (I implemented it just for workflows because there was not hint on that section)
Note that I also implemented task created and task completed, because I concur on the usefulness for users, which is unclert to me is that they need more than task created and task completed ;), specially because we have compoiste task, so in theory you can have a do withint a do withitn a for withint a try and the four task will be running, so when you suspend them, you will have to send 4 events

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, then we also need to edit that section to make it clear phase status applies both for workflow and task

I edited it in the PR, because as you said it only applied to workflows beforehand. Have you checked the updated section in the PR? I think it's clear, but I'd be happy to update it as you see fit!

Copy link
Member Author

@cdavernas cdavernas Jan 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is unclear to me is that they need more than task created and task completed

Well, I see what you mean, but having at least canceled and faulted seemed reasonable. Let's discuss it in today's daily!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cdavernas @JBBianchi as dicussed in Slack, I think that if we have specific event for every phase change (both for workflow an task) we do not really need the status change one (or the other way around, if we have the status change, we do not need the specific ones)
Having only the specific ones (created, started, faulted, completed) covers all the status phase changes, so a user just interested in the fact that the status has changed, can consume them and ignore the cloud event data (thats the beauty of the CloudEvent structure, the same event can be used in different ways depending on the consumer needs)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The status change event is a generic event which is in my personal experience critically useful. As a matter of fact, users who want to have lifecycle notifications but that are not interested in event-specific data are usually going for those kind of events. While not necessary per say, they are an absolute plus for many users. They are usually fired after any other events, to communicate in a generic fashion with the outside world.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think users interested just in status change can subscribe to all event types and just check the type CE attribute to find out which phase changed. They can completely ignore (and with CE they do not need to deserialize) the data field (thats why I mention the beauty of CE)
That way we avoid publishing two different types of events for the same ocurrence (that the task changed the status from one state to another, with the current approach, we will be sending two events: the specific and the generic one, Im claiming that the specific can be consumed generically)

+ [Workflow Completed](#workflow-completed-event)
+ [Workflow Status Changed](#workflow-status-changed-event)
- [Task Lifecycle Events](#task-lifecycle-events)
+ [Task Created](#task-created-event)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not see how a task can be created and not started

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, a task can be instantiated, meaning initialized and provisioned with its input, and not have yet started. That's what we do in Synapse. Unlike workflows, tasks do not exist beforehand, you therefore need to tell the user about them being created: you cannot start something that has not been created.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets use an example
do

  • firstTask
  • secondTask
  • thirdTask

I guess the sequence of events will be firstTask created, first task started, ..., first task completed, second task created, second task started, ..., second task completed

so inmediatealy after firstTask instance has been created with some input, it will start execution. (obviously you have to create something before executing it, which Im discussing is the fact that a task can be created without starting it just inmediately)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What Im trying to say is the same we do not have an event called "setting input to task instance before actually starting it" we do not need an event "taks created before starting it"

Copy link
Collaborator

@fjtirado fjtirado Jan 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Im trying to save a redundant event, thats it, but if you feel the distinction between task created and task started is relevant, as far as it not fordibben for them to be simoulteneous in a particular implementation, Im fine

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JBBianchi Thats it. For me the moment in which is a task instance is created is an implementation detail the user is not really interested on, specially when, as far as I know, we do not really have the concept of an schedule task and users cannot arbitrarily exeucte task out of order (the task are executed according to the workflow definition) or do not explicitly intervene in the transitions (they might indirectly intervene by sending events for listen task or in custom ones, but there is not a task phase transition API the user might invoke. In other words, task are not created waiting for the user to start them manually. If that was the situation, the event will make perferct sense, but that being not the case, this state creates some ambiguity that I will prefer to avoid)

Copy link
Member Author

@cdavernas cdavernas Jan 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fjtirado The problem that I see is the data carried by the created event, which I do not want to carry over to the started one, for semantic reasons. While I do not care of getting rid of the created event, I am strongly opposed to moving its payload to the started one. However, not doing so, we are forbidding the external world to know intrinsics about the task being run.

Copy link
Member Author

@cdavernas cdavernas Jan 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In other words, task are not created waiting for the user to start them manually.

@fjtirado No, but as @JBBianchi mentionned, you could have provisionned them beforehand, before running them in the order defined by the workflow, in a warmup procedure or whatever.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we define the required ones in the spec and allow implementations to emit others as they find specific use cases? Later, if the proposal is compelling and the community finds it useful, we can add it to the spec.

This discussion is endless.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, that's exactly what @fjtirado and I concluded hours ago in Slack PMs 😉

Copy link
Collaborator

@fjtirado fjtirado left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have several doubts about the concept of task life cycle.
Its doable, buy pretty tricky and Im not sure we have to force implementors to keep track of every task status.
Also, for some task, for example, wait, which should be the status? suspended?

dsl-reference.md Outdated Show resolved Hide resolved
dsl-reference.md Outdated Show resolved Hide resolved
dsl-reference.md Outdated Show resolved Hide resolved
dsl-reference.md Outdated Show resolved Hide resolved
dsl-reference.md Outdated Show resolved Hide resolved
dsl-reference.md Outdated Show resolved Hide resolved
dsl-reference.md Outdated Show resolved Hide resolved
dsl-reference.md Outdated Show resolved Hide resolved
dsl-reference.md Outdated Show resolved Hide resolved
dsl.md Show resolved Hide resolved
@cdavernas
Copy link
Member Author

cdavernas commented Jan 9, 2025

Also, for some task, for example, wait, which should be the status? suspended?

@fjtirado That's specified in the dsl.md file:

| waiting | The workflow/task execution is temporarily paused, awaiting either inbound event(s) or a specified time interval as defined by a wait task. |

cdavernas and others added 9 commits January 9, 2025 16:26
Co-authored-by: Ricardo Zanini <[email protected]>
Co-authored-by: Ricardo Zanini <[email protected]>
Co-authored-by: Ricardo Zanini <[email protected]>
Co-authored-by: Ricardo Zanini <[email protected]>
Co-authored-by: Ricardo Zanini <[email protected]>
Co-authored-by: Ricardo Zanini <[email protected]>
Co-authored-by: Ricardo Zanini <[email protected]>
Co-authored-by: Ricardo Zanini <[email protected]>
Co-authored-by: Ricardo Zanini <[email protected]>
@cdavernas
Copy link
Member Author

I have several doubts about the concept of task life cycle.
Its doable, buy pretty tricky

I must be missing something, because I do not see the problem with it. As a matter of fact, we are already doing it in Synapse, as you can see here, for example.

Copy link
Collaborator

@fjtirado fjtirado left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets perform the agreed changed and move forward
@cdavernas Thanks a lot!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: spec Changes in the Specification change: documentation Improvements or additions to documentation. It won't impact a version change. change: feature New feature or request. Impacts in a minor version change
Projects
Status: No status
Development

Successfully merging this pull request may close these issues.

Cloud events to be publlished when workflow status change Add a new suspended workflow status phase
4 participants