-
Notifications
You must be signed in to change notification settings - Fork 34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove catchall-rethrow indirections in the Device state wrappers and as a consequence also the ERROR transition #372
Comments
I think to remove it completely we should at least have something equivalent in the SDK that would notify of the termination. Something on the controller side or in between should switch the device state from whatever it was when error happened to an Error State or similar. I think leaving the ErrorState here is the simplest, even if the device doesn't automatically go into it (which we anyway cannot guarantee for crashes/terminations). Even if only for a reason that some controller may already have it in use. But perhaps it needs to be extended with further infos, like exit code. |
True, then let's revisit a possible removal of the ERROR state once we have proper DDS APIs to monitor the task termination from the SDK. (cc: @AnarManafov, @AndreyLebedev just FYI) |
@dennisklein , DDS ToolAPI now supports subscription on onTaskDone events. I have finished the implementation and pushed the code to the master last week.
|
for reference: AliceO2Group/AliceO2#6499 |
In an offline discussion we (@rbx, @davidrohr, and myself) agreed (correct me if I am wrong) that having a cooperative signaling about an imminent terminate inside the about-to-terminate process is redundant, because a controller has to monitor and handle terminate events anyways from outside. Removing the internal signaling logic (catchall-transition-to-ERROR-rethrow) will improve debuggability.
This likely opens the possibility to remove the FairMQ ERROR state completely AFAICS. If we do not find any surprises, I suggest to remove it together with the change above.
Once we have a PR ready, let us collect approval by all Alice O2 subsystems, that might potentially be affected, since I really lost overview who could be relying on the ERROR state): EPN, DD, DPL, QC, FLP. Anyone else?
The text was updated successfully, but these errors were encountered: