-
Notifications
You must be signed in to change notification settings - Fork 163
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Goal status corruption when goal takes longer to finishes than ActionServer's result_timeout #970
Comments
Yes, I think the correct thing to do is to expire goal results based on the time the result is ready (not the goal creation time). I think one way we could implement this would be to add a member to rcl_action_goal_handle_impl_s to record the result timestamp. We could then note the time when we are notified that a goal has reached a terminal state, e.g. change the signature of this function to include a goal UUID: rcl/rcl_action/include/rcl_action/action_server.h Lines 648 to 650 in 392f0d3
Changing the signature of |
At least the workaround is simple enough: pass a custom |
@jacobperron Without changing the timing behavior or function signatures, do you think it's possible to create and backport a fix for the memory issue that causes the invalid returned goal state? I was hoping that the goal handle could be properly cleaned up before sending the result value. This would result in a I'm asking because it took me quite some time to debug this issue. It's not trivial for unaware users that observing invalid goal states has something to do with the |
Aside from the timing bug, there is still the race you pointed out. E.g. if we set the expiry time to zero (expire goals immediately), then I guess we'll run into the same invalid memory issue. IMO, this is a bug that should be fixed and backported. I suspect it is something that needs to be fixed at the |
Bug report
Required Info:
Steps to reproduce issue
ActionServer
with a certainresult_timeout
(default: 900s) and anexecute_callback
that callsgoal_handle.succeed()
after a period greater thanresult_timeout
.ActionClient
, send a goal and wait for the result, e.g.,action_client.send_goal_async(goal).add_done_callback(_on_goal_response)
with:Expected behavior
All goal result status values are equal to
action_msgs.msg.GoalStatus.STATUS_SUCCEEDED
.Actual behavior
Some goal status values are equal to
action_msgs.msg.GoalStatus.STATUS_SUCCEEDED
and some have an invalid value, e.g.,-59
or-91
.Additional information
In
rcl_action_expire_goals()
(action_server.c
), inactive goals are checked against their creation timestamp and considered expired if they are created longer thanresult_timeout
ago. If this function is called directly after the goal succeeds, but before sending the result to theActionClient
, the goal might be considered as expired, its handle deallocated and bogus result data sent back to theActionClient
. Whether the function is called at this point or not seems to depend on the exact course of events and their timing.I believe the intended behavior is to cache the goal result after the goal finishes (
succeed()
,abort()
,canceled()
) to give the client some time to retrieve the result.Here is some debug log data of one of our production nodes - an
ActionServer
- to illustrate the course of events. I did not include all lines to avoid clutter, but I'd be happy to send the full log if so desired. Several times,STATUS_SUCCEEDED
is returned (4
) and sometimes invalid values (-59
,-91
):Whenever I observe
Updated timer period from '0ns' to '0ns'
after the goal is reached, an invalid state is returned.The text was updated successfully, but these errors were encountered: