Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lifecyle node failed to make transition #1166

Open
kjjpc opened this issue Jul 8, 2024 · 1 comment
Open

Lifecyle node failed to make transition #1166

kjjpc opened this issue Jul 8, 2024 · 1 comment

Comments

@kjjpc
Copy link

kjjpc commented Jul 8, 2024

Bug report

  • Operating System:
    • Ubuntu 22.04
  • Installation type:
    • binaries
  • Version or commit hash:
    • ros-jazzy-ros-base=0.11.0-1 of apt package
  • DDS implementation:
    • Fast DDS (Default DDS of jazzy)
  • Client library (if applicable):
    • rclpy

Steps to reproduce issue

When I launch multiple lifecycle nodes using LaunchService, some nodes failed to be activated.
This is a reproduction script.

import os
import signal

from launch import LaunchDescription
from launch.actions import EmitEvent
from launch.actions import RegisterEventHandler
from launch.event_handlers import OnProcessStart
import launch.events
from launch.launch_description_entity import LaunchDescriptionEntity
from launch.launch_service import LaunchService
import launch_ros
from launch_ros.actions import LifecycleNode
from launch_ros.events.lifecycle import ChangeState
from lifecycle_msgs.msg import Transition
from multiprocessing import Process
from typing import List


def up_launch(launch_description_list: List[LaunchDescriptionEntity]):
    ls = LaunchService()
    ls.include_launch_description(LaunchDescription(launch_description_list))
    ls.run()


def generate_launch_description(ns):
    lc_node = LifecycleNode(
        package='lifecycle_py',
        executable='lifecycle_talker',
        name='lc_talker',
        namespace=ns,
        output='screen',
    )
    emit_configure = RegisterEventHandler(
        OnProcessStart(
            target_action=lc_node,
            on_start=[
                EmitEvent(event=ChangeState(
                    lifecycle_node_matcher=launch.events.matches_action(lc_node),
                    transition_id=Transition.TRANSITION_CONFIGURE,
                )),
            ]))

    #  send activate when on_configure finished
    emit_active = RegisterEventHandler(
        launch_ros.event_handlers.OnStateTransition(
            target_lifecycle_node=lc_node,
            goal_state='inactive',
            # start_state='configuring',
            entities=[
                EmitEvent(event=ChangeState(
                    lifecycle_node_matcher=launch.events.matches_action(lc_node),
                    transition_id=Transition.TRANSITION_ACTIVATE,
                )),
            ],
        ))

    return [lc_node, emit_configure, emit_active]


if __name__ == '__main__':
    os.setpgrp()
    try:
        for i in range(12):
            ls = generate_launch_description('room'+str(i))
            launch_proc = Process(target=up_launch,
                                  args=(ls,),
                                  )
            launch_proc.start()
    except KeyboardInterrupt:
        print('group kill')
        os.killpg(0, signal.SIGINT)

Expected behavior

All lifecycle nodes become active.

Actual behavior

Some nodes remains configured occasionally.

Additional information

I investigated the behavior and found that the lifecycle_node action of launch_ros failed to catch transition_event topic.
The lifecycle_node action create subscriber just before calling change_state and the topic subscription registration is not completed before transition_event publish.

One solution is to make transition_event topic reliable and transient local.
Following changes resolve the issue.

Add following code to com_interface.c.

   rcl_publisher_options_t publisher_options = rcl_publisher_get_default_options();
+  publisher_options.qos.reliability = RMW_QOS_POLICY_RELIABILITY_RELIABLE;
+  publisher_options.qos.durability = RMW_QOS_POLICY_DURABILITY_TRANSIENT_LOCAL;
+  publisher_options.qos.depth = 1;
   rcl_ret_t ret = rcl_publisher_init(

Change qos setting in lifecycle_node.py of launch_ros.

         self.__rclpy_subscription = node.create_subscription(
             lifecycle_msgs.msg.TransitionEvent,
             '{}/transition_event'.format(self.node_name),
             functools.partial(self._on_transition_event, context),
-            10)
+            QoSProfile(depth=10, reliability=ReliabilityPolicy.RELIABLE,
+                       durability=DurabilityPolicy.TRANSIENT_LOCAL))
@fujitatomoya
Copy link
Collaborator

@kjjpc thanks for posting the issue and detailed report.

return [lc_node, emit_configure, emit_active]

as a possible work-around, changing this line into return [lc_node, emit_active, emit_configure] can mitigate the issue in my local environment. (register OnStateTransition event handler before changing state to configure.)

i am not sure if this LaunchDescription order is requirement for the user application, but i can see the examples are considered with this order. e.g https://github.com/ros2/launch_ros/blob/dbc2bbc80d31ae932bdd7060669f58d0b6e39305/launch_ros/examples/lifecycle_pub_sub_launch.py#L79-L84

if this is the design for LaunchDescription, probably we would want to add the doc section for LaunchDescription. @wjwwood @adityapande-1995 what do you think?

Change qos setting in lifecycle_node.py of launch_ros.

we have to be careful on this. if this change is applied alone, it can break the downstream launch application with incompatible QoS.

fujitatomoya added a commit to fujitatomoya/ros2_test_prover that referenced this issue Jul 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants