Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TimeSequencer.simple test hanging in CI #21

Open
pbaughman opened this issue Apr 11, 2019 · 3 comments
Open

TimeSequencer.simple test hanging in CI #21

pbaughman opened this issue Apr 11, 2019 · 3 comments

Comments

@pbaughman
Copy link

pbaughman commented Apr 11, 2019

One of our CI jobs just failed because the TimeSequencer.simple unit test hung:

      Start  4: message_filters-test_time_sequencer

4: Test command: /usr/bin/python3 "-u" "opt_path/share/ament_cmake_test/cmake/run_test.py" "ws_path/build/message_filters/test_results/message_filters/message_filters-test_time_sequencer.gtest.xml" "--package-name" "message_filters" "--output-file" "ws_path/build/message_filters/ament_cmake_gtest/message_filters-test_time_sequencer.txt" "--command" "ws_path/build/message_filters/message_filters-test_time_sequencer" "--gtest_output=xml:ws_path/build/message_filters/test_results/message_filters/message_filters-test_time_sequencer.gtest.xml"
4: Test timeout computed to be: 60
4: -- run_test.py: invoking following command in 'ws_path/src/path/message_filters':
4:  - ws_path/build/message_filters/message_filters-test_time_sequencer --gtest_output=xml:ws_path/build/message_filters/test_results/message_filters/message_filters-test_time_sequencer.gtest.xml
4: [==========] Running 3 tests from 1 test case.
4: [----------] Global test environment set-up.
4: [----------] 3 tests from TimeSequencer
4: [ RUN      ] TimeSequencer.simple
 4/13 Test  #4: message_filters-test_time_sequencer ............***Timeout  60.01 sec

A passing run in our CI takes about half a second to run, so this isn't a case of the test just taking a little longer than normal.

It's almost certainly getting stuck in rclcpp::spin_some(node) on line 97 or a few lines later. The queue of things to handle is either building up faster than spin_some can process, or some work item in the queue is adding a work item to the queue so spin_some never finishes.

I need to dive into this a little more to see what's being tested, but maybe we could spin until something specific happens, or spin until future complete? If someone reads this and has a suggestion, please chime in

@pbaughman
Copy link
Author

It looks like we've got message_filters.message_filters-test_fuzz marked as problematic too, possibly for the same reason

@ZhenshengLee
Copy link

happens in rolling, output in in below.

zs@zs-vm-2204:~/ga_ws/ga_ros/colcon/build/message_filters$ ./message_filters-test_fuzz 
[==========] Running 3 tests from 3 test suites.
[----------] Global test environment set-up.
[----------] 1 test from TimeSequencer
[ RUN      ] TimeSequencer.fuzz_sequencer
unknown file: Failure
C++ exception with description "can't compare times with different time sources" thrown in the test body.
[  FAILED  ] TimeSequencer.fuzz_sequencer (26 ms)
[----------] 1 test from TimeSequencer (26 ms total)

[----------] 1 test from TimeSynchronizer
[ RUN      ] TimeSynchronizer.fuzz_synchronizer
[       OK ] TimeSynchronizer.fuzz_synchronizer (5013 ms)
[----------] 1 test from TimeSynchronizer (5013 ms total)

[----------] 1 test from Subscriber
[ RUN      ] Subscriber.fuzz_subscriber
[       OK ] Subscriber.fuzz_subscriber (5022 ms)
[----------] 1 test from Subscriber (5022 ms total)

[----------] Global test environment tear-down
[==========] 3 tests from 3 test suites ran. (10061 ms total)
[  PASSED  ] 2 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] TimeSequencer.fuzz_sequencer

 1 FAILED TEST

@wedenikf
Copy link

After migrating from galactic to humble I encountered the same exception when using TimeSequencer. "can't compare times with different time sources"

There are two problematic comparisons, a first here when adding any message

if (mt::TimeStamp<M>::value(*evt.getMessage()) < last_time_)

and a second here during dispatch.

if ((stamp + delay_) <= rclcpp::Clock().now())

It seems that the problem was introduced by fixing issue #72. This changes the rcl_clock_type_t returned by all message_filters::message_traits::TimeStamp<M>::value() for Messages with a Header to RCL_ROS_TIME.

template<typename M>
struct TimeStamp<M, typename std::enable_if<HasHeader<M>::value>::type >
{
  static rclcpp::Time value(const M& m) {
    return rclcpp::Time(m.header.stamp, RCL_ROS_TIME);
  }
};

TimeSequencer internally uses a member last_time_ which is never explicitly initialized. Based on time.hpp (checked for humble), this should be the relevant constructor in that case, which sets rcl_clock_type_t to RCL_SYSTEM_TIME.

explicit Time(int64_t nanoseconds = 0, rcl_clock_type_t clock_type = RCL_SYSTEM_TIME);

The comparison between last_time_ and the result of message_filters::message_traits::TimeStamp<M>::value() leads to the first exception.

In dispatch the result of message_filters::message_traits::TimeStamp<M>::value() is compared with rclcpp::Clock().now(). Based on clock.hpp (checked for humble) this should be the relevant constructor that also sets rcl_clock_type_t to RCL_SYSTEM_TIME.

explicit Clock(rcl_clock_type_t clock_type = RCL_SYSTEM_TIME);

Sorry for the long comment, but here are some more thoughts:

In message_traits.h there is the template definition for getting the value of the messages header stamp (changed in #72), but the SFINAE that's there as fallback for messages without header would return time with a different rcl_clock_type_t. IMHO both these functions should return the same rcl_clock_type_t.

TimeSequencer already holds a rclcpp::Node, which could be used to get the clock and clock type. (I.e. node_->get_clock()->now() over rclcpp::Clock().now())

The unit tests in "time_sequencer_unittest.cpp" aren't affected by this problem as the test defines a custom struct Msg and provides a new template definition for getting the stamp, which uses RCL_SYSTEM_TIME. Thus within these unit tests everything currently uses RCL_SYSTEM_TIME.

I'm happy to follow up on this if my descriptions aren't clear enough. Is it fine to handle these problems here or should I open a new issue for that?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants