Cyclone DDS fails nightly CI tests #2043
It would be great if a more "minimal" example could be provided. My wild guess is that this may be related to ros2/rmw_cyclonedds#74, ros2/rmw_cyclonedds#191. If a service request is "lost" (because of service discovery issues), that could cause the error output seen in CircleCI. Relevant part of the CircleCI output:
It'd be nice to get all 4 of the failing tests working, but as you've noted, beginning with
So far I'm not having much luck reproducing this locally: my "docker build" step fails because of errors building the gazebo_ros test_plugins. I'll try to get past that, because now that I am back from vacation I really want to know the root cause.

Anyway, while I agree with @ivanpauno's observation, by design there should be only a single way for that particular failure to occur if the application code is well-behaved (i.e., the service doesn't get destroyed while it still has a request to service), and that is a time-out in the check on the service side. If someone happens to know whether the service's attempt at publishing the response reports an error or not, that would be helpful. Otherwise, I'm sure I'll eventually reproduce it and will be able to see for myself.

I think it is quite likely that the Cyclone trace files contain enough data to determine exactly what happened (regardless of whether the service ran into an error). Once I can reproduce it, I'll be able to get those. In principle, anyone who can reproduce the issue can get them; all it takes is configuring the tracing, but with many tests and many executables it gets messy quickly, and I don't want to ask that of others.
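For anyone who does want to collect those trace files themselves, here is a minimal sketch of enabling Cyclone DDS tracing through the `CYCLONEDDS_URI` environment variable; the verbosity level and output path are illustrative choices, not taken from this thread:

```bash
# Point CYCLONEDDS_URI at an inline configuration that turns on tracing.
# "finest" is very verbose; the output path is an arbitrary choice.
# Cyclone DDS itself expands ${CYCLONEDDS_PID}, hence the single quotes.
export CYCLONEDDS_URI='<CycloneDDS><Domain><Tracing>
  <Verbosity>finest</Verbosity>
  <OutputFile>/tmp/cdds.log.${CYCLONEDDS_PID}</OutputFile>
</Tracing></Domain></CycloneDDS>'
```

With this set before launching the tests, each process writes its own trace file, which is what makes multi-executable runs messy but still analyzable.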
Does anyone know what version of gtest I am supposed to use? With an up-to-date Ubuntu 20.04, gtest-dev 1.10.0-2, and an up-to-date Foxy installed on it, following the reproduction process described here, as well as with last night's https://raw.githubusercontent.com/ros2/ros2/master/ros2.repos and a full source build, I'm running into build failures in the tests. (The gazebo_ros ones I have simply skipped over, but that, it seems to me, won't do for the system tests of nav2 itself):
(P.S. Does anyone think it is reasonable for a ROS 2 build to take 25 GB!? Not everyone has tons of space available in the VMs on their laptops ... and a native macOS build appears to be "non-trivial" as well.)
From the logs, it seems that the gtest version being used is the one shipped with ROS 2, but that's 1.10 too (it was recently updated). We were using 1.8 before, and some things were deprecated between the two gtest versions.
Yeah, 25 GB sounds like a lot, but I'm not sure whether that can be improved.
I'm not a macOS user, but I agree that we require some "non-trivial" steps (like disabling SIP).
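For reference, a quick way to check which gtest is actually being picked up; this is a sketch assuming an Ubuntu/Debian host with a sourced ROS 2 workspace, and the package names are the usual ones rather than anything confirmed in this thread:

```bash
# Version of the system gtest, if one was installed from apt.
dpkg -s libgtest-dev | grep -i '^Version'
# Install prefix of the googletest copy vendored by ROS 2
# (assumes the gtest_vendor package is present in the sourced workspace).
ros2 pkg prefix gtest_vendor
```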
It looks like there is an upstream build issue related to gtest: ros-simulation/gazebo_ros_pkgs#1183
ROS has a rather large set of build dependencies if you are building from scratch. It's not so much of an issue if you are building downstream packages from released binaries/libs. You could try building the project using Foxy instead of the ROS 2 nightly:

```bash
docker build --pull -t nav2:latest \
  --build-arg=FROM_IMAGE=ros:foxy .
```
The test failure in
Keep in mind our PR builder uses Fast-RTPS, where this is an occasional but rare flaky test. On Cyclone, however, the failure appears deterministically. So from an external user's perspective, Cyclone has a problem, since the behavior isn't consistent across the two where it should be. The other possibility is that Fast-RTPS doesn't handle something properly that Cyclone does; either way, some notes on why that occurs and how to resolve it would be appreciated. For the costmap downsampler you laid out pretty clearly how to do it, but for the lifecycle manager I don't see a clear solution. Any PRs to resolve these would also be appreciated, especially since most nav2 users are on Cyclone.
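One way to compare the two RMW implementations locally is to run the same test package under each of them. This is only a sketch, and `nav2_lifecycle_manager` is a guess at where the flaky test lives based on the discussion above, not something stated in the CI logs:

```bash
# Run the suspect tests under Cyclone DDS, then under Fast-RTPS, and compare.
# The package name is an assumption; substitute the package owning the failing test.
export RMW_IMPLEMENTATION=rmw_cyclonedds_cpp
colcon test --packages-select nav2_lifecycle_manager --event-handlers console_direct+
export RMW_IMPLEMENTATION=rmw_fastrtps_cpp
colcon test --packages-select nav2_lifecycle_manager --event-handlers console_direct+
colcon test-result --verbose
```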
First ever successful Cyclone CI job. Thanks @dennis-adlink and @eboasson for looking into it for us.
Nightly CI tests are failing for Cyclone DDS: the `release_test-rmw_cyclonedds_cpp` job has never passed since its addition to the nightly CI workflow. It would be nice to get this fixed so the navigation2 project could firewatch and support more alternative RMW implementations.

Bug report
Required Info:
- `rmw_cyclonedds_cpp`
Steps to reproduce issue
Expected behavior
https://app.circleci.com/pipelines/github/ros-planning/navigation2/4004/workflows/8ebf1205-a43d-4fbe-bdc3-f15c6c605e05/jobs/15392
Actual behavior
https://app.circleci.com/pipelines/github/ros-planning/navigation2/4004/workflows/8ebf1205-a43d-4fbe-bdc3-f15c6c605e05/jobs/15391
Additional information
Batman signal to `rmw_cyclonedds_cpp` maintainers: 📢 CC @eboasson @rotu @hidmic @ivanpauno