Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

context.on_shutdown is never called #1287

Open
firesurfer opened this issue May 22, 2024 · 8 comments
Open

context.on_shutdown is never called #1287

firesurfer opened this issue May 22, 2024 · 8 comments
Labels
help wanted Extra attention is needed

Comments

@firesurfer
Copy link

Bug report

It looks like the callback that can be registered via context.on_shutdown is never called.

Required Info:

  • Operating System:
    • Ubuntu 22.04
  • Installation type:
    • binary
  • Version or commit hash:
    • 4.1.5
  • Client library (if applicable):
    • rclpy

Steps to reproduce issue

Register a callback to on_shutdown and then try various methods of exiting the node

def shutdown_handler():
    print("exit")

def main(args=None):
     rclpy.init(args=args) 
     
     node = Node("TestNode")
   
     node.context.on_shutdown(shutdown_handler)
     rclpy.get_default_context().on_shutdown(shutdown_handler)

     rclpy.spin(node)

I tried sending various signals: SIGINT, SIGTERM, SIGQUIT

Expected behavior

exit should be printed twice

Actual behavior

Prints nothing

Additional information

I also tried calling try_shutdown:

try:  
     rclpy.spin(node)
except (KeyboardInterrupt, rclpy.executors.ExternalShutdownException):
    pass
finally:
    rclpy.try_shutdown()

But this didn't result into on_shutdown being called either.

If one tries to explicitly shutdown the context:

finally:
    node.context.shutdown()

The following error is thrown:
failed to shutdown: rcl_shutdown already called on the given context

Related Issues

This issue is probably related to. But according to the issues I found on_shutdown should be at least called if the node is properly exited (or if try_shutdown is called?)

#532
#1077 <- This is actually what I am looking for as I do want to do some cleanup

fujitatomoya added a commit to fujitatomoya/ros2_test_prover that referenced this issue May 23, 2024
@fujitatomoya
Copy link
Collaborator

This issue is probably related to. But according to the issues I found on_shutdown should be at least called if the node is properly exited (or if try_shutdown is called?)

if node is properly shutdown, on_shutdown callbacks are called. we have those test cases.

def test_on_shutdown_method():
context = Context()
context.init()
assert context.ok()
callback_called = False
class SomeClass:
def on_shutdown(self):
nonlocal callback_called
callback_called = True
instance = SomeClass()
context.on_shutdown(instance.on_shutdown)
context.shutdown()
assert not context.ok()
assert callback_called

the problem is with signal. once signal comes in e.g SIGINT, it deferred signal handler shuts down with calling rcl_shutdown in rcl layer, that means all contexts in this process space will be reset (invalid).

i think there are a couple of problems here, (i did not get deeper, i may be missing something here.)

after all, this is duplication of #532, i think we can close this an track #532 instead.

@firesurfer
Copy link
Author

Ok I read #532 that its only related to SIGINT (Ctrl+C) .

But this means there is currently no possibility to perform any cleanup action during shutdown when the program is closed via a signal (what is the default case during development) ?

@fujitatomoya
Copy link
Collaborator

As far as i played around, i could not find the solution to for the signal cases. But i might be missing here, i would like to hear from other maintainers and developers.

@firesurfer
Copy link
Author

firesurfer commented May 23, 2024

I found a very hacky workaround for performing cleanup calls at shutdown:

But it basically means one needs to recreate all publishers/clients etc. if one needs to perform any cleanup ros calls

    try:  
        while rclpy.ok():
            time.sleep(1.0)
    except KeyboardInterrupt:
        pass
    # This part is executed after the keyboard interrupt was caught. The default context is now destroyed and can not be recreated. 

    new_context = rclpy.Context()
    rclpy.init(context=new_context)
    new_node = rclpy.create_node("test", context=new_context)
    
    # Use the new new_node to perform cleanup ros calls

EDIT: Just to clarify: This does not solve on_shutdown not being called in signals.

@audrow
Copy link
Member

audrow commented Jun 6, 2024

I'm going to mark this as help wanted for now.

I think we need a much bigger discussion on signal handling across the ROS 2 core repos.

@audrow audrow added the help wanted Extra attention is needed label Jun 6, 2024
@firesurfer
Copy link
Author

@audrow Sounds good. Is there already an open issue for that I could subscribe to ?

@fujitatomoya
Copy link
Collaborator

@firesurfer not that i am aware of...

there are several issues and PRs are opened,

besides those,

rclcpp, rclpy (this issue) and rosbag2 have the signal handler support by their own implementation.

I think we need a much bigger discussion on signal handling across the ROS 2 core repos.

So i this this means that we need to consider current implementation and problems, and discussion for redesign for ROS 2.

@lks-dev
Copy link

lks-dev commented Sep 13, 2024

Hi there,

I am doing something else quite hacky, to be able to log messages to rousout after the SIGINT rclpy humble. The idea cam from recognizing, that the installed signalhandlers in rclpy's implementation are shutting down the current context before I was able to handle the sigint/keyboardInterrupt myself:

"""Implementation of a Minimal Example Node."""

import traceback

import rclpy
import rclpy.executors
import rclpy.logging
import rclpy.node
from example_interfaces.srv import AddTwoInts
from rclpy.signals import SignalHandlerOptions
from std_msgs.msg import String

LOGGER = rclpy.logging.get_logger(__file__)


class MinimalExampleNode(rclpy.node.Node):
    """
    Implement a Minimal Example Node, that showcases how to log after SIGINT.

    Based on example tutorial code from ros2 doc. MinimalPublisher and
    MinimalService.
    """

    def __init__(self) -> None:
        super().__init__('minimal_publisher')
        self.publisher_ = self.create_publisher(String, 'topic', 10)
        self.i = 0
        self.get_logger().info('init completed')
        # commenting the timer here showcases the fact, that the SIGINT will
        # only be processed, when a service request send to the node,
        # triggering the python callback.
        self.timer = self.create_timer(
            0.5,  # seconds
            self.timer_callback,
        )
        self.create_service(
            AddTwoInts,
            'add',
            self.add
        )

    def add(self, request, response):  # noqa: ANN001, ANN201, D102
        self.i = request.a + request.b
        response.sum = self.i
        return response

    def timer_callback(self) -> None:  # noqa: D102
        msg = String()
        msg.data = 'Hello World: %d' % self.i
        self.publisher_.publish(msg)
        self.get_logger().info('Publishing: "%s"' % msg.data)
        self.i += 1
        self.get_logger().info('self.i: %i' % self.i)
        if self.i > 2:
            # This will trigger the catch all exceptions block.
            msg = 'smthing bad could happen if i is above 2!'
            raise RuntimeError(msg)


def main() -> None:
    """Run Node in the ROS2 Network and log errors."""
    try:
        ros_context = rclpy.get_default_context()
        rclpy.init(
            context=ros_context,
            signal_handler_options=SignalHandlerOptions.NO
        )

        executor = rclpy.executors.SingleThreadedExecutor(
            context=ros_context
        )

        node = MinimalExampleNode()

        executor.add_node(node)
        executor.spin()

    except KeyboardInterrupt:
        node.get_logger().fatal(
            'got interrupted by a KeyboardInterrupt and will'
            ' shutdown now...'
        )
    except Exception as error:  # pragma: no cover
        # catch all exceptions to log error and traceback before exiting...
        err_msg = f'experienced an exception:\n{traceback.format_exc()}\n'
        'will shutdown now.'
        try:
            node.get_logger().fatal(err_msg)
        except Exception:  # incase the node is no longer there.
            rclpy.logging.get_logger(__file__).fatal(err_msg)
            # this is not published, useful fallback to get it in the
            # roslogging file.
        raise error
    finally:
        rclpy.try_shutdown()


if __name__ == '__main__':  # pragma: no cover
    main()

A caveat with this implementation is, that the signal is only processed in the client code, when a python callback is processed. Hence it only works with nodes, that regularly process python callbacks, e.g. from a timer. In one node, that does not rely on a timer i added an 'empty' callback in the following fashion:
self.create_timer(.1, (lambda: None)) # used to get a raised KeyboardInterrupt recognized and handled

This pattern of not letting the rclpy's handle the signals is probably not the best idea, but I am not entirely sure how bad it is or what bad things could happen, as I make sure to manually call the rclpy.try_shutdown() in the final statement.

So any improvement suggests or information on why its a bad idea or how stupid the idea is, are highly appreciated.

From my perspective a on_shutdown callback, which is guaranteed to run, before the current context is invalided, for what ever reason, would be the 'best' solution for the task of gracefully execute cleanup code that involves communicating over the ROS2 network.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

4 participants