Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The test_record.TestRecordAfterShutdown.test_exit_code is tend to be flaky #1380

Closed
MichaelOrlov opened this issue Jun 9, 2023 · 2 comments
Assignees
Labels
bug Something isn't working

Comments

@MichaelOrlov
Copy link
Contributor

MichaelOrlov commented Jun 9, 2023

Description

The test_record.TestRecordAfterShutdown.test_exit_code from ros2bag is tend to be flaky since it appears to have failed multiple times recently on CI jobs:
http://ci.ros2.org/job/ci_linux/18847/
http://ci.ros2.org/job/ci_linux-aarch64/13388/

Expected Behavior

The test_record.TestRecordAfterShutdown.test_exit_code shall not be flaky and shall not fail randomly on the CI.

Actual Behavior

The test_record.TestRecordAfterShutdown.test_exit_code randomly fails with the error message:

FAIL: test_record.TestRecordAfterShutdown.test_exit_code[sqlite3]
2023-06-03T19:45:38.7299964Z ----------------------------------------------------------------------------------------------------------------------------------
2023-06-03T19:45:38.7300321Z Traceback (most recent call last):
2023-06-03T19:45:38.7300670Z   File "/__w/rosbag2/rosbag2/ros_ws/src/s1oktrgw0j/rosbag2/ros2bag/test/test_record.py", line 73, in test_exit_code
2023-06-03T19:45:38.7301027Z     launch_testing.asserts.assertExitCodes(
2023-06-03T19:45:38.7301467Z   File "/__w/rosbag2/rosbag2/ros_ws/build/launch_testing/launch_testing/asserts/assert_exit_codes.py", line 62, in assertExitCodes
2023-06-03T19:45:38.7301967Z     assert info.returncode in allowable_exit_codes, 'Proc {} exited with code {}'.format(
2023-06-03T19:45:38.7302418Z AssertionError: Proc ros2bag-cli-7 exited with code -9
2023-06-03T19:45:38.7302839Z ----------------------------- Captured stdout call -----------------------------
2023-06-03T19:45:38.7303047Z 
2023-06-03T19:45:38.7303143Z Starting test run test_record[sqlite3]
2023-06-03T19:45:38.7303591Z [INFO] [launch]: All log files can be found below /github/home/.ros/log/2023-06-03-19-45-17-529104-0dd7204ee0ae-126136
2023-06-03T19:45:38.7303940Z [INFO] [launch]: Default logging verbosity is set to INFO
2023-06-03T19:45:38.7304291Z [INFO] [ros2bag-cli-7]: process started with pid [126220]
2023-06-03T19:45:38.7304840Z [ros2bag-cli-7] stdin is not a terminal device. Keyboard handling disabled.[INFO] [1685821526.257381155] [rosbag2_recorder]: Press SPACE for pausing/resuming
2023-06-03T19:45:38.7305542Z [ros2bag-cli-7] [INFO] [1685821526.260584248] [rosbag2_storage]: Opened database '/tmp/tmpcx5wk9g_/ros2bag_test_record/ros2bag_test_record_0.db3' for READ_WRITE.
2023-06-03T19:45:38.7306069Z [ros2bag-cli-7] [INFO] [1685821526.264542138] [rosbag2_recorder]: Event publisher thread: Starting
2023-06-03T19:45:38.7306561Z [ros2bag-cli-7] [INFO] [1685821526.265532736] [rosbag2_recorder]: Listening for topics...
2023-06-03T19:45:38.7307011Z [ros2bag-cli-7] [INFO] [1685821526.270718724] [rosbag2_recorder]: Subscribed to topic '/rosout'
2023-06-03T19:45:38.7307436Z [INFO] [ros2bag-cli-7]: sending signal 'SIGINT' to process[ros2bag-cli-7]
2023-06-03T19:45:38.7307965Z [ERROR] [ros2bag-cli-7]: process[ros2bag-cli-7] failed to terminate '5' seconds after receiving 'SIGINT', escalating to 'SIGTERM'
2023-06-03T19:45:38.7308438Z [INFO] [ros2bag-cli-7]: sending signal 'SIGTERM' to process[ros2bag-cli-7]
2023-06-03T19:45:38.7308974Z [ERROR] [ros2bag-cli-7]: process[ros2bag-cli-7] failed to terminate '10.0' seconds after receiving 'SIGTERM', escalating to 'SIGKILL'
2023-06-03T19:45:38.7309436Z [INFO] [ros2bag-cli-7]: sending signal 'SIGKILL' to process[ros2bag-cli-7]
2023-06-03T19:45:38.7310002Z [ERROR] [ros2bag-cli-7]: process has died [pid 126220, exit code -9, cmd 'ros2 bag record -a -s sqlite3 --output /tmp/tmpcx5wk9g_/ros2bag_test_record'].

To Reproduce

Run CI on the morlov/properly_handle_ctrl-c_signal_on_windows branch or try to reproduce it locally with stress -m 100 -c 100 -i 100 command

  1. Go to local rosbag2 folder
  2. Checkout morlov/properly_handle_ctrl-c_signal_on_windows branch
  3. colcon build --packages-up-to ros2bag
  4. Run stress -m 100 -c 100 -i 100 command in another terminal window
  5. Run colcon test --packages-select ros2bag --event-handlers console_cohesion+ --retest-until-fail 150 in the terminal windows where colcon build was performed.

System (please complete the following information)

Additional context

@MichaelOrlov
Copy link
Contributor Author

RCA (Root Cause Analysis):

There are two major issues that could cause test_record.TestRecordAfterShutdown.test_exit_code from ros2bag to be flaky:

  1. A freeze in the recorder due to the race condition when calling Recorder::stop() while the event publisher thread hasn't been fully started yet.
    Will be resolved in the Fix for possible freeze in Recorder::stop() #1381.
  2. Unhandled exceptions from the underlying node when we are invalidating context by calling rclcpp::shutdown().
    Will be resolved in the Fix for rosbag2_transport::Recorder failures due to the unhandled exceptions #1382.

@MichaelOrlov
Copy link
Contributor Author

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant