-
Notifications
You must be signed in to change notification settings - Fork 149
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add documentation about workflow and task lifecycle events #1054
base: main
Are you sure you want to change the base?
Add documentation about workflow and task lifecycle events #1054
Conversation
Closes serverlessworkflow#1030 Signed-off-by: Charles d'Avernas <[email protected]>
Signed-off-by: Charles d'Avernas <[email protected]>
- [Task Lifecycle Events](#task-lifecycle-events) | ||
+ [Task Created](#task-created-event) | ||
+ [Task Started](#task-started-event) | ||
+ [Task Suspended](#task-suspended-event) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that I took a closer look, and after my yesterday comment related with workflow cancelation suspension, Im not 100% sure a task can be really suspended.
A cancelation suspension of a workflow can be understood as "let the current task to finish and hold execution of the next one" rather than "Interrupt current task and let it in a limbo"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If a task is a composite
one (for, do, try) it can be of course be stopped, but at the end, it will executing another non composite
task (basically a call), which is kind of atomic.
I think we can avoid that conundrum by remaining silent about the whole task cycle thing and just define events at workflow level (and task started and completed)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that I took a closer look, and after my yesterday comment related with workflow cancelation, Im not 100% sure a task can be really suspended.
Why not? Let's say you do a wait
task, which sleeps for 10secs. You suspend it after 5secs, when resumed it still sleeps for another 5secs. Same can apply to consuming events, foreach enumerations, etc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can avoid that conundrum by remaining silent about the whole task cycle thing and just define events at workflow level (and task started and completed)
That's an option, but I think it's our loss. Workflow level updates are not enough in most cases, such as long running flows. An other option, though I'd prefer to leave it as is, is to only enforce workflow lifecycle events, while leaving task lifecycle events optional.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
About the wait example, if we interrupt the wait (or stop listening to an event), what we should do when resuming, execute the next task after the wait or execute the wait again? I think we can avoid that question by not letting suspeding a workflow that is already waiting.
In the case of a For, what I suggested is to let the call that the loop is currently exeucting finish (the definition of finish might vary for a task, if a call, let the call finish, if a set, let the assignment finish, if a do, hold after the current task being executed finish, and so on )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that I edited my first comment, Im not talking about cancelation, but about suspension. Sorry for the confusion. Cancellation is clear, you cancel the workflow execution, interrupting if you can and not allowing resuming. But suspending-resuming is trickier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
About the wait example, if we interrupt the wait (or stop listening to an event), what we should do when resuming
I think you missed my point there:
Let's say you do a wait task, which sleeps for 10secs. You suspend it after 5secs, when resumed it still sleeps for another 5secs.
In the case of a For, what I suggested is to let the call that the loop is currently executing finish
If what you mean is that the iteration should be completed before suspending, I think it's a possibility, but not my personal preference, as the iteration can be a succession of perfectly suspendable tasks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should not allow suspending a workflow that is waiting (so implementors do not need that resuming logic for event/wait, which might be pretty tricky in some cases)
And for other task, I think suspending should let the operation finish (the call or the jq expression) and then freeze, so you resume in the next task.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should not allow suspending a workflow that is waiting
Your opinion makes sense, it can be tricky indeed! Maybe we should let that up to the implementers, and add a couple of lines about those specifics?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resuming tasks can be challenging. Ideally, as @cdavernas suggested, allowing the runtime to directly resume a suspended task would be the best approach. However, this is often not feasible. For a passive task, such as a wait
operation, it might work. But for active tasks, like an HTTP call
request, the situation becomes more complex. Even if the runtime cancels the request, the server may already be processing it, and retrying the request upon resuming could lead to unintended behaviors.
As a result, there isn't a universal solution. The suspension mechanism must depend on the nature of the task. Passive tasks, such as wait
or listen
, can typically be suspended and resumed. Active tasks, like call
or run
, would need to complete before the suspension can occur, deferring it to the subsequent task.
+ [Task Cancelled](#task-cancelled-event) | ||
+ [Task Faulted](#task-faulted-event) | ||
+ [Task Completed](#task-completed-event) | ||
+ [Task Status Changed](#task-status-changed-event) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Status, as I understood it, only applies to the worklow as a whole
https://github.com/serverlessworkflow/specification/blob/main/dsl.md#status-phases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not in my opinion. Task status is extremely important in most, if not all workflow execution scenarios. Think of a long running flow which you represent in a UI. Just having flow events will let your users know the flow has started or ended, but whatever is in the middle will be unknown to them, potentially for hours or even days at a time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, then we also need to edit that section to make it clear phase status applies both for workflow and task (I implemented it just for workflows because there was not hint on that section)
Note that I also implemented task created and task completed, because I concur on the usefulness for users, which is unclert to me is that they need more than task created and task completed ;), specially because we have compoiste task, so in theory you can have a do withint a do withitn a for withint a try and the four task will be running, so when you suspend them, you will have to send 4 events
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, then we also need to edit that section to make it clear phase status applies both for workflow and task
I edited it in the PR, because as you said it only applied to workflows beforehand. Have you checked the updated section in the PR? I think it's clear, but I'd be happy to update it as you see fit!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is unclear to me is that they need more than task created and task completed
Well, I see what you mean, but having at least canceled
and faulted
seemed reasonable. Let's discuss it in today's daily!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cdavernas @JBBianchi as dicussed in Slack, I think that if we have specific event for every phase change (both for workflow an task) we do not really need the status change one (or the other way around, if we have the status change, we do not need the specific ones)
Having only the specific ones (created, started, faulted, completed) covers all the status phase changes, so a user just interested in the fact that the status has changed, can consume them and ignore the cloud event data (thats the beauty of the CloudEvent structure, the same event can be used in different ways depending on the consumer needs)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The status change event is a generic event which is in my personal experience critically useful. As a matter of fact, users who want to have lifecycle notifications but that are not interested in event-specific data are usually going for those kind of events. While not necessary per say, they are an absolute plus for many users. They are usually fired after any other events, to communicate in a generic fashion with the outside world.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think users interested just in status change can subscribe to all event types and just check the type CE attribute to find out which phase changed. They can completely ignore (and with CE they do not need to deserialize) the data field (thats why I mention the beauty of CE)
That way we avoid publishing two different types of events for the same ocurrence (that the task changed the status from one state to another, with the current approach, we will be sending two events: the specific and the generic one, Im claiming that the specific can be consumed generically)
+ [Workflow Completed](#workflow-completed-event) | ||
+ [Workflow Status Changed](#workflow-status-changed-event) | ||
- [Task Lifecycle Events](#task-lifecycle-events) | ||
+ [Task Created](#task-created-event) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not see how a task can be created and not started
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, a task can be instantiated, meaning initialized and provisioned with its input, and not have yet started. That's what we do in Synapse. Unlike workflows, tasks do not exist beforehand, you therefore need to tell the user about them being created: you cannot start something that has not been created.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets use an example
do
- firstTask
- secondTask
- thirdTask
I guess the sequence of events will be firstTask created, first task started, ..., first task completed, second task created, second task started, ..., second task completed
so inmediatealy after firstTask instance has been created with some input, it will start execution. (obviously you have to create something before executing it, which Im discussing is the fact that a task can be created without starting it just inmediately)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What Im trying to say is the same we do not have an event called "setting input to task instance before actually starting it" we do not need an event "taks created before starting it"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Im trying to save a redundant event, thats it, but if you feel the distinction between task created and task started is relevant, as far as it not fordibben for them to be simoulteneous in a particular implementation, Im fine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JBBianchi Thats it. For me the moment in which is a task instance is created is an implementation detail the user is not really interested on, specially when, as far as I know, we do not really have the concept of an schedule task and users cannot arbitrarily exeucte task out of order (the task are executed according to the workflow definition) or do not explicitly intervene in the transitions (they might indirectly intervene by sending events for listen task or in custom ones, but there is not a task phase transition API the user might invoke. In other words, task are not created waiting for the user to start them manually. If that was the situation, the event will make perferct sense, but that being not the case, this state creates some ambiguity that I will prefer to avoid)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fjtirado The problem that I see is the data carried by the created event, which I do not want to carry over to the started one, for semantic reasons. While I do not care of getting rid of the created event, I am strongly opposed to moving its payload to the started one. However, not doing so, we are forbidding the external world to know intrinsics about the task being run.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In other words, task are not created waiting for the user to start them manually.
@fjtirado No, but as @JBBianchi mentionned, you could have provisionned them beforehand, before running them in the order defined by the workflow, in a warmup procedure or whatever.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we define the required ones in the spec and allow implementations to emit others as they find specific use cases? Later, if the proposal is compelling and the community finds it useful, we can add it to the spec.
This discussion is endless.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, that's exactly what @fjtirado and I concluded hours ago in Slack PMs 😉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have several doubts about the concept of task life cycle.
Its doable, buy pretty tricky and Im not sure we have to force implementors to keep track of every task status.
Also, for some task, for example, wait, which should be the status? suspended?
Co-authored-by: Ricardo Zanini <[email protected]>
Co-authored-by: Ricardo Zanini <[email protected]>
Co-authored-by: Ricardo Zanini <[email protected]>
Co-authored-by: Ricardo Zanini <[email protected]>
Co-authored-by: Ricardo Zanini <[email protected]>
Co-authored-by: Ricardo Zanini <[email protected]>
Co-authored-by: Ricardo Zanini <[email protected]>
Co-authored-by: Ricardo Zanini <[email protected]>
Co-authored-by: Ricardo Zanini <[email protected]>
I must be missing something, because I do not see the problem with it. As a matter of fact, we are already doing it in Synapse, as you can see here, for example. |
Signed-off-by: Charles d'Avernas <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets perform the agreed changed and move forward
@cdavernas Thanks a lot!
Please specify parts of this PR update:
Discussion or Issue link:
#1024
#1030
What this PR does: