
terminate called after throwing an instance of 'std::bad_alloc' #214

Open
Ryanf55 opened this issue Dec 13, 2023 · 8 comments

@Ryanf55
Ryanf55 commented Dec 13, 2023

Describe the bug
When running the MicroROS agent, it periodically crashes with std::bad_alloc.

To Reproduce
Steps to reproduce the behavior:

  1. Clone ArduPilot on my branch: https://github.com/Ryanf55/ardupilot/tree/dds-plane-goal-interface
  2. Set up the ArduPilot build environment: https://ardupilot.org/dev/docs/building-the-code.html
  3. Follow the PR instructions to run ardupilot and the micro ROS agent
  4. Wait 2-3 seconds after initialization and observe the runtime crash

Expected behaviour

The agent runs reliably without an allocation error.

System information:

  • OS: Ubuntu 22.04
  • ROS 2: humble binaries
  • Version: 3.0.5

Additional context

Here are the debug logs at verbosity level 6, captured while running under gdb:

[1702444844.833881] debug    | UDPv4AgentLinux.cpp | recv_message             | [==>> UDP <<==]        | client_key: 0xAAAABBBB, len: 36, data: 
0000: 81 80 17 00 07 01 0C 00 00 50 00 05 01 00 00 00 20 5D 04 33 07 01 0C 00 00 51 00 75 01 00 00 00
0020: 20 5D 04 33
[1702444844.833918] debug    | DataWriter.cpp     | write                    | [** <<DDS>> **]        | client_key: 0x00000000, len: 8, data: 
0000: 01 00 00 00 20 5D 04 33
[1702444844.833930] debug    | DataWriter.cpp     | write                    | [** <<DDS>> **]        | client_key: 0x00000007, len: 8, data: 
0000: 01 00 00 00 20 5D 04 33
[1702444844.833981] debug    | UDPv4AgentLinux.cpp | send_message             | [** <<UDP>> **]        | client_key: 0xAAAABBBB, len: 13, data: 
0000: 81 00 00 00 0A 01 05 00 18 00 00 00 80

Thread 19 "micro_ros_agent" received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffdcff9640 (LWP 130694)]
__pthread_kill_implementation (no_tid=0, signo=6, threadid=140736901125696) at ./nptl/pthread_kill.c:44
44	./nptl/pthread_kill.c: No such file or directory.
(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140736901125696) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=140736901125696) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=140736901125696, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00007ffff6c42476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007ffff6c287f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x00007ffff70a2b9e in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00007ffff70ae20c in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#7  0x00007ffff70ae277 in std::terminate() () from /lib/x86_64-linux-gnu/libstdc++.so.6
#8  0x00007ffff70ae4d8 in __cxa_throw () from /lib/x86_64-linux-gnu/libstdc++.so.6
#9  0x00007ffff70a27ac in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#10 0x00007ffff7ec8915 in ?? () from /opt/ros/humble/lib/librmw_dds_common__rosidl_typesupport_fastrtps_cpp.so
#11 0x00007ffff7ec8eb7 in rmw_dds_common::msg::typesupport_fastrtps_cpp::cdr_deserialize(eprosima::fastcdr::Cdr&, rmw_dds_common::msg::NodeEntitiesInfo_<std::allocator<void> >&) ()
   from /opt/ros/humble/lib/librmw_dds_common__rosidl_typesupport_fastrtps_cpp.so
#12 0x00007ffff7ec91f7 in rmw_dds_common::msg::typesupport_fastrtps_cpp::cdr_deserialize(eprosima::fastcdr::Cdr&, rmw_dds_common::msg::ParticipantEntitiesInfo_<std::allocator<void> >&) ()
   from /opt/ros/humble/lib/librmw_dds_common__rosidl_typesupport_fastrtps_cpp.so
#13 0x00005555555a02ed in uros::agent::graph_manager::ParticipantEntitiesInfoTypeSupport::deserialize(eprosima::fastrtps::rtps::SerializedPayload_t*, void*) ()
#14 0x00007ffff7a2f42a in ?? () from /opt/ros/humble/lib/libfastrtps.so.2.6
#15 0x00007ffff76ed47b in eprosima::fastdds::dds::DataReaderImpl::read_or_take_next_sample(void*, eprosima::fastdds::dds::SampleInfo*, bool) () from /opt/ros/humble/lib/libfastrtps.so.2.6
#16 0x0000555555585888 in uros::agent::graph_manager::GraphManager::update_node_entities_info() ()
#17 0x00005555555868c4 in uros::agent::graph_manager::GraphManager::DatareaderListener::on_data_available(eprosima::fastdds::dds::DataReader*) ()
#18 0x00007ffff76ee84d in eprosima::fastdds::dds::DataReaderImpl::InnerDataReaderListener::onNewCacheChangeAdded(eprosima::fastrtps::rtps::RTPSReader*, eprosima::fastrtps::rtps::CacheChange_t const*) ()
   from /opt/ros/humble/lib/libfastrtps.so.2.6
#19 0x00007ffff76716b4 in eprosima::fastrtps::rtps::StatefulReader::NotifyChanges(eprosima::fastrtps::rtps::WriterProxy*) () from /opt/ros/humble/lib/libfastrtps.so.2.6
#20 0x00007ffff7671e5b in eprosima::fastrtps::rtps::StatefulReader::change_received(eprosima::fastrtps::rtps::CacheChange_t*, eprosima::fastrtps::rtps::WriterProxy*, unsigned long) () from /opt/ros/humble/lib/libfastrtps.so.2.6
#21 0x00007ffff7672381 in eprosima::fastrtps::rtps::StatefulReader::processDataMsg(eprosima::fastrtps::rtps::CacheChange_t*) () from /opt/ros/humble/lib/libfastrtps.so.2.6
#22 0x00007ffff767ff20 in eprosima::fastrtps::rtps::MessageReceiver::process_data_message_without_security(eprosima::fastrtps::rtps::EntityId_t const&, eprosima::fastrtps::rtps::CacheChange_t&) ()
   from /opt/ros/humble/lib/libfastrtps.so.2.6
#23 0x00007ffff7689a7b in eprosima::fastrtps::rtps::MessageReceiver::proc_Submsg_Data(eprosima::fastrtps::rtps::CDRMessage_t*, eprosima::fastrtps::rtps::SubmessageHeader_t*) const () from /opt/ros/humble/lib/libfastrtps.so.2.6
#24 0x00007ffff768b6c8 in eprosima::fastrtps::rtps::MessageReceiver::processCDRMsg(eprosima::fastrtps::rtps::Locator_t const&, eprosima::fastrtps::rtps::Locator_t const&, eprosima::fastrtps::rtps::CDRMessage_t*) ()
   from /opt/ros/humble/lib/libfastrtps.so.2.6
#25 0x00007ffff769134f in eprosima::fastrtps::rtps::ReceiverResource::OnDataReceived(unsigned char const*, unsigned int, eprosima::fastrtps::rtps::Locator_t const&, eprosima::fastrtps::rtps::Locator_t const&) ()
   from /opt/ros/humble/lib/libfastrtps.so.2.6
#26 0x00007ffff78c7618 in eprosima::fastdds::rtps::SharedMemChannelResource::perform_listen_operation(eprosima::fastrtps::rtps::Locator_t) () from /opt/ros/humble/lib/libfastrtps.so.2.6
#27 0x00007ffff78c22fb in ?? () from /opt/ros/humble/lib/libfastrtps.so.2.6
#28 0x00007ffff70dc253 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#29 0x00007ffff6c94ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#30 0x00007ffff6d26660 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
(gdb) Quit
(gdb) 
quit

@pablogs9
Member

Hello @Ryanf55, this is a well-known problem and IMO not really a micro-ROS issue.

The key is that the type of the ROS 2 ros_discovery_info topic, rmw_dds_common::msg::dds_::ParticipantEntitiesInfo_, has changed significantly: the size of an array in this type was reduced from 24 to 16, which makes the topic incompatible between ROS 2 distros.

Specifically, if your Humble installation receives ros_discovery_info data from Iron, the data representation will not match what the Humble deserializer expects, making deserialization unpredictable and very likely to throw an exception.
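To illustrate how a width mismatch like the 24 to 16 array change can turn into a garbage length during deserialization, here is a minimal Python sketch. The wire layout, the serialize and read_node_count helpers, and the node name are hypothetical simplifications, not the real rmw_dds_common CDR encoding; the only idea taken from the discussion above is that writer and reader disagree on the width of one fixed-size field.

```python
import struct

# Hypothetical, simplified wire layout (NOT the real rmw_dds_common CDR encoding):
#   gid[gid_size] | uint32 node_count | uint32 name_len | name bytes
# Only the GID width differs between the writer and the reader below.


def serialize(gid_size: int, node_name: str) -> bytes:
    """Encode one participant with one node, using a GID of gid_size bytes."""
    name = node_name.encode("ascii")
    return (
        b"\xab" * gid_size              # fixed-size GID bytes
        + struct.pack("<I", 1)          # node_count (little-endian uint32)
        + struct.pack("<I", len(name))  # length prefix of the node name
        + name
    )


def read_node_count(payload: bytes, gid_size: int) -> int:
    """Skip a GID of gid_size bytes and decode the following uint32 count."""
    (count,) = struct.unpack_from("<I", payload, gid_size)
    return count


wire = serialize(16, "node_a")     # writer uses 16-byte GIDs
print(read_node_count(wire, 16))   # matching reader: 1
print(read_node_count(wire, 24))   # 24-byte reader: 1701080942
print(len(wire))                   # total payload: 30 bytes
```

With matching widths the reader decodes a count of 1. A 24-byte reader instead skips past the real count and the name length, and decodes the first four bytes of the node name as the count, yielding about 1.7 billion entries from a 30-byte payload. If a C++ deserializer resizes a container to such a count before checking the remaining buffer, and each element takes at least a few dozen bytes, the allocation alone would need tens of gigabytes, which is one plausible route to std::bad_alloc.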

In summary, this is a ROS 2 distro incompatibility issue, and it should be solved by ensuring that your Humble environment does not interact with any Iron environment (local or remote).

@Ryanf55
Author

Ryanf55 commented Dec 13, 2023

Hi Pablo,

Thanks for the info. Just FYI, I do not have Iron installed, and there are no other ROS 2 developers on my home network, so I don't think that's the issue. Everything is on Humble.

ArduPilot targets ROS 2 Humble only.

@pablogs9
Member

You have the very same error that we found some weeks ago.

How are you building the micro-ROS Agent?
Are there any Docker containers running on your system?

Judging by this frame, #26 0x00007ffff78c7618 in eprosima::fastdds::rtps::SharedMemChannelResource::perform_listen_operation(eprosima::fastrtps::rtps::Locator_t) () from /opt/ros/humble/lib/libfastrtps.so.2.6, the message that raises the error seems to come from a process on your own computer, delivered via shared memory.

@Ryanf55
Author

Ryanf55 commented Dec 13, 2023

We are building micro-ROS with colcon, using the humble branch in our ROS workspace. This is all on the host, no Docker.

https://github.com/ArduPilot/ardupilot/blob/6515df72f0473b8982f3d25bdade74f5d9df8be3/Tools/ros2/ros2.repos#L9

Fast DDS is installed from the Humble binaries.

@pablogs9
Member

Can you provide a Dockerfile with a reproducer that doesn't include the ArduPilot part?

@Ryanf55
Author

Ryanf55 commented Dec 13, 2023

Can you provide a Dockerfile with a replicator without the Ardupilot part?

I can try. The MicroXRCE DDS Agent is heavily tied to ArduPilot right now; it may be hard to build a standalone example to reproduce.

Would it be acceptable to provide a Dockerfile with ArduPilot already built and running? You could then run it against a debug build of micro-ROS on your host OS under GDB.

@pablogs9
Member

That would be acceptable as long as everything runs inside Docker.

@Ryanf55
Author

Ryanf55 commented Dec 13, 2023

Thanks. Can you assign this ticket to me? I can get you the info in a few days.
