Add rosout logging to rclrs #422

geoff-imdex · 2024-10-21T03:44:59Z

The purpose of this PR is to add logging macros that replicate the logging interface provided by the rclcpp and rclpy interfaces, while aiming to provide an idiomatic rust interface.

This work is based on the initial work from here and here.

Examples of the interfaces provided:

log_debug!(&node.name(), "Simple message");
log_debug!(&node.name(), "Simple message {some_variable}");
log_fatal!(&node.name(), "Simple message from {}", node.name());
log_warn!(&node.name(), LogConditions::once(), "Only log this the first time");
log_error!(&node.name(),
           LogConditions::skip_first_throttle(Duration::from_millis(1000)),
           "Noisy error that we expect the first time");

log_info!(&node.name(), LogConditions { occurs: LoggingOccurrence::Always,
                                        publish_interval: Duration::from_millis(1000),
                                        log_if_true: count % 10 == 0, },
          "Manually constructed LogConditions");

The macros make use of the LogConditions struct which provides the control over the logging, e.g. throttling, occurrence (once, skip first), etc.
The most common control constructs can be instantiated using the provided builder functions, e.g.

/// Only log the message on the first occurrence
LogConditions::once() 

/// Only log the message at the specified duration
LogConditions::throttle(Duration)

/// Only log the message if the condition is true.
LogConditions::log_if_true(<some expression>)

Known Issues

1. The is currently an issue on node closure (as we process the rcl_node_fini call) where the "impl" pointer appears to be "null" when we try too fini the rosout publisher (in rcl_logging_rosout_fini_publisher_for_node). The rest of the node closure occurs successfully. Note: the rosout usage can be disabled with --ros-args --disable-rosout-logs and the error will not occur.
This issue has been addressed in the latest commit. I believe that the issue was caused by the rcl_node variable in builder.rs going out of scope during the "drop" sequence. The fix is to allocate the rcl_node on the heap (using Box).
(As a side note: technically, we may have the same issue with the Context object).
2. We use static variables to facilitate the "once", "skip_first", and throttling functionality, however, I believe it would be preferred
to use the lazy_static. At this moment, our rust compiler is 1.79 and therefore the lazy_static is not regarded as stable.

esteve · 2024-10-30T12:30:23Z

@geoff-imdex thank you so much for your PR. Given that the code is based on the work done at r2r and at rules_ros2, it'd be good to have some feedback from the authors of r2r, and maybe we could a shared crate with this code so as to not duplicate work. Legally speaking we can copy that code (both r2r and ros2-rust are license as Apache 2.0), but it's always nice to ask.

@m-dahl @kristoferB is it ok with you guys if we copy this code from r2r into ros2-rust? We'd of course attribute credit. Moreover, would you be interested in spinning off the logging code of r2r into a shared crate that both projects can use?

mxgrey · 2024-10-30T12:54:19Z

rclrs/src/node.rs

 ///
 /// [1]: <https://doc.rust-lang.org/reference/destructors.html>
 pub(crate) struct NodeHandle {
-    pub(crate) rcl_node: Mutex<rcl_node_t>,
+    pub(crate) rcl_node: Mutex<Box<rcl_node_t>>,


Can I check why this was put into a Box? The contents of rcl_node_t are just pointers so there shouldn't be any issues with moving the address of the rcl_node_t (and its parent struct is managed by an Arc, so its address shouldn't change anyway).

Good question - without this change it appears as though the node handle goes out of scope before we get chance to clear the logging publisher (in humble this occurs in the rcl_node_fini() call). The behaviour without the change is that you will get a segfault on node closure because the object pointed to by the rcl_node_t is no longer valid. Took a bit of debugging to get to the bottom of it - and honestly, I don't particularly like the change with the extra de-refs.
Looking at Jazzy, I believe that the issue will no longer occur due the logging publisher being managed outside the rcl_node_init/fini calls.
For reference the failure occurs here

Thanks for the explanation. I think we do want to keep supporting humble for now, so I'll try to reproduce the seg fault in humble and see if there may be alternative solution.

I'm still pretty lost on how boxing the rcl_node_t would affect its lifetime, but if I can reproduce the problem then I should be able to make sense of it.

Oh actually if rcl itself is storing the rcl_node_t* value somewhere and using it without consideration for its lifetime then I can see how this would happen. I'll check if that's the case, and maybe we can set things up so that we don't initialize the rcl_node_t until it's already inside the NodeHandle.

Yep, that's it in a nutshell - the rosout code keeps a record of the node used to create it (this line specifically) and this pointed to the stack variable in NodeHandle (here). During the rcl_node_fini we call rcl_logging_rosout_fini_publisher_for_node which tries to reference the stack pointer, however, this area of memory it points to has now been mangled, i.e. the pointer still points to the same memory location but the values at that spot are garbage.

maspe36 · 2024-10-27T15:53:01Z

rclrs/src/logging.rs

+use std::ffi::CString;
+use std::sync::Mutex;
+use std::time::Duration;


Suggested change

use std::ffi::CString;

use std::sync::Mutex;

use std::time::Duration;

use std::{ffi::CString, sync::Mutex, time::Duration};

maspe36 · 2024-10-27T16:08:29Z

rclrs/src/logging.rs

+/// use rclrs::{log_debug, log_error, log_fatal, log_info, log_warn, LogConditions, LoggingOccurrence};
+/// use std::sync::Mutex;
+/// use std::time::Duration;
+/// use std::env;


Condense the use declarations here, and hide them from the output docs

Suggested change

/// use rclrs::{log_debug, log_error, log_fatal, log_info, log_warn, LogConditions, LoggingOccurrence};

/// use std::sync::Mutex;

/// use std::time::Duration;

/// use std::env;

/// # use std::{env, sync::Mutex, time::Duration};

/// # use rclrs::*;

///

/// # let context = Context::new(env::args()).unwrap();

/// # let node = Node::new(&context, "test_node").unwrap();

The # is used to hide lines from the output documentation but also allows the lines to be executed so the doc test will pass.

Nice! Thanks for the tip.

maspe36 · 2024-10-27T16:13:09Z

rclrs/src/node.rs

 ///
 /// [1]: <https://doc.rust-lang.org/reference/destructors.html>
 pub(crate) struct NodeHandle {
-    pub(crate) rcl_node: Mutex<rcl_node_t>,
+    pub(crate) rcl_node: Mutex<Box<rcl_node_t>>,


I checked this code out locally, reverted this change, and I was able to pass all the tests and run the examples. What problem specifically were you seeing?

I've replied above but as a minimal example of how I was able to cause the segfault:

use rclrs::{ log_debug, log_error, log_fatal, log_info, log_warn, LogConditions, LoggingOccurrence, Node, }; static NODE_NAME: &str = "test_node"; // Minimal test code use std::env; use std::thread::sleep; fn main() -> Result<(), rclrs::RclrsError> { log_warn!(NODE_NAME, "Starting"); let context = rclrs::Context::new(env::args()).unwrap(); let node = rclrs::Node::new(&context, NODE_NAME).unwrap(); sleep(Duration::from_secs(1)); log_warn!(NODE_NAME, "Finishing"); Ok(()) }

Note: some of the use statements are unnecessary in the code - I've just hacked up an existing test app. We also had a similar test in our CI that showed the same behaviour.
Happy to go in to more detail if needed.

mxgrey · 2024-10-30T14:52:39Z

In rclcpp there's a Logger class that bundles up parameters like the name of the logger and severity filter. It also allows hierarchical structures where you can create a Logger which is a child of another Logger and it will inherit settings like a name prefix and severity level (which can be overridden).

In this PR the name and severity seem to be loose values. In fact I don't see where severity gets filtered at all (e.g. we wouldn't want DEBUG severity being printed out unless the user asks for DEBUG level logging).

Would we be interested in having a Logger struct similar to rclcpp's? I could draft that up.

I do like the LogConditions of this implementation, though. That's a very nice mechanism.

mxgrey · 2024-10-30T15:00:03Z

rclrs/src/logging.rs

+        .expect("Valid c style string required, e.g. check for extraneous null characters");
+    let severity = severity.to_native();
+
+    let _guard = LOG_GUARD.lock().unwrap();


Since this is rclrs should use ENTITY_LIFECYCLE_MUTEX, which is the mutex that rclrs uses to ensure that global variables within the rcl layer and lower are protected.

rcl logging is notoriously not thread safe, so it's important to make sure we're locking appropriately.

Good comment. The thought had crossed my mind on this one - I thought it more akin to the mutex in rclpp logging, but I'm happy to change it to the ENTITY_LIFECYCLE_MUTEX

mxgrey · 2024-10-30T15:25:45Z

rclrs/src/context.rs

+                // sequence is unnecessarily complex
+                let allocator = rcutils_get_default_allocator();
+
+                rcl_logging_configure(&rcl_context.global_arguments, &allocator).ok()


This will get called every time a Context is created, but it's meant to be called only once. It's possible for users to create multiple Contexts in one application (we do this a lot in tests), so we'd be making redundant calls.

Similar to ros2/rclcpp#998 we should guard this from being called multiple times. We also need to make sure that rcl_logging_fini is only called after all Contexts are destructed.

This is a tricky problem to resolve in a sound way. I might suggest something like this:

struct LoggingConfiguration { lifecycle: Mutex<Weak<LoggingLifecycle>> } impl LoggingConfiguration { fn configure(args: &rcl_arguments_t) -> Result<Arc<LoggingLifecycle>, RclError> { static CONFIGURATION: LazyLock<Self> = LazyLock::new(|| Self { lifecycle: Mutex::new(Weak::new()) }); let mut lifecycle = CONFIGURATION.lifecycle.lock().unwrap(); if let Some(arc_lifecycle) = lifecycle.upgrade() { return Ok(arc_lifecycle); } let arc_lifecycle = Arc::new(LoggingLifecycle::new(args)?); *lifecycle = arc_lifecycle.downgrade(); Ok(arc_lifecycle); } } struct LoggingLifecycle; impl LoggingLifecycle { fn new(args: &rcl_arguments_t) -> Result<Self, RclError> { let _lifecycle_lock = ENTITY_LIFECYCLE_MUTEX.lock().unwrap(); let allocator = rcutils_get_default_allocator(); rcl_logging_configure(args, &allocator).ok()?; Self } } impl Drop for LoggingLifecycle { fn drop(&mut self) { let _lifecycle_lock = ENTITY_LIFECYCLE_MUTEX.lock().unwrap(); rcl_logging_fini(); } }

Then we would add an Arc<LoggingLifecycle> to the ContextHandle. Once all ContextHandles are gone, the LoggingLifecycle will drop and call rcl_logging_fini. If a new context is made after that, rcl_logging_configure will be called again.

Really good pick-up and suggestion @mxgrey - thank you. Will update the code to make use of your suggestion.

m-dahl · 2024-10-30T18:12:33Z

@geoff-imdex thank you so much for your PR. Given that the code is based on the work done at r2r and at rules_ros2, it'd be good to have some feedback from the authors of r2r, and maybe we could a shared crate with this code so as to not duplicate work. Legally speaking we can copy that code (both r2r and ros2-rust are license as Apache 2.0), but it's always nice to ask.

@m-dahl @kristoferB is it ok with you guys if we copy this code from r2r into ros2-rust? We'd of course attribute credit. Moreover, would you be interested in spinning off the logging code of r2r into a shared crate that both projects can use?

Thanks for the mention, happy to see it used and improved. In the long term I would be in favor of a shared crate based on this work (perhaps it could be a starting point to for more code sharing), but right now I don't have much time to work on ros stuff so for now I think you should just go ahead and do what makes most sense for this crate.

geoff-imdex · 2024-10-31T13:45:19Z

In rclcpp there's a Logger class that bundles up parameters like the name of the logger and severity filter. It also allows hierarchical structures where you can create a Logger which is a child of another Logger and it will inherit settings like a name prefix and severity level (which can be overridden).

As you can tell, I avoided this extra complexity with this implementation as I wasn't sure (naivety/ignorance) what benefit/use it would bring and was a little outside the scope of what I needed.

In this PR the name and severity seem to be loose values. In fact I don't see where severity gets filtered at all (e.g. we wouldn't want DEBUG severity being printed out unless the user asks for DEBUG level logging).

It's not obvious, but the filtering occurs in the rcutils_log function. I'm sure you're across this already, but just to be sure: the rcutils logging code basically has a look-up table that tracks the logger name and what logging level is enabled for it. It does mean that we do a certain amount of function calls for levels that aren't enabled, however, they're never published. The rclcpp implementation uses some very smart macro construction to include logs at compile time, but I'm not sure we would (or could?) want to do the same thing.

Would we be interested in having a Logger struct similar to rclcpp's? I could draft that up.

I do like the LogConditions of this implementation, though. That's a very nice mechanism.
Thanks!

geoff-imdex · 2024-10-31T13:54:24Z

@mxgrey , @maspe36 , really appreciate you taking the time to review the code and provide some helpful feedback. I'll try and incorporate the suggestions over the next few days or so.
As to the segfault we observed locally, happy to provide more details as needed - would be interesting to note if you still the same behaviour with a minimal test app (create node + context, and shutdown).

mxgrey · 2024-11-01T06:56:56Z

the rcutils logging code basically has a look-up table that tracks the logger name and what logging level is enabled for it.

I've never dug into how logging works at the rcl layer, so I actually didn't know this detail of the implementation. Thanks for highlighting this.

That actually makes me feel much more strongly that we should have a firm Logger struct rather than using &str. The current API in the PR would allow rclrs users to pass arbitrary names into the macros which might not be names of actual loggers that rcl recognizes, and it sounds like that could lead to misbehavior in the rcl layer. If we have the macros require a Logger object to be passed in then we can enforce that the user is always using a valid logger that rcl will know about.

I have some ideas for how to give rclrs a Logger struct that aligns with rclcpp's while also covering the capabilities of LogCondition, but implementing the ideas will take about the same amount of time as explaining them so I'll open a PR for that which targets your branch if you don't mind @geoff-imdex .

geoff-imdex · 2024-11-01T11:33:26Z

That actually makes me feel much more strongly that we should have a firm Logger struct rather than using &str. The current API in the PR would allow rclrs users to pass arbitrary names into the macros which might not be names of actual loggers that rcl recognizes, and it sounds like that could lead to misbehavior in the rcl layer. If we have the macros require a Logger object to be passed in then we can enforce that the user is always using a valid logger that rcl will know about.

Not disagreeing with the suggestion to create a logging struct, but it is worth highlighting that the rcl logging interface explicitly allows passing in arbitrary log names, e.g. the logging macros throughout the rcl code pass in a simple string. Perhaps a better way to phrase - the logging system supports existing without requiring a node to be associated - in this scenario the rcl layer uses the default log values (INFO level and logging to stdout/stderr). It is only when we want to publish to /rosout that the logging is associated with a node (via the implementation in logging_rosout.c).
To be clear, I think the suggestion to create a logging struct will add a lot of value for the nodes that use the logging interface.

I have some ideas for how to give rclrs a Logger struct that aligns with rclcpp's while also covering the capabilities of LogCondition, but implementing the ideas will take about the same amount of time as explaining them so I'll open a PR for that which targets your branch if you don't mind @geoff-imdex .
Please feel free! And thank you!

geoff-imdex added 2 commits October 11, 2024 09:12

Initial logging implementation

239dab3

* Moved logging_fini to follow correct call sequence based on rclcpp

6ad25da

geoff-imdex mentioned this pull request Oct 21, 2024

Add support for logging #184

Open

Primary change to fix builder's node variable going out of scope

d5a6bc4

jhdcs requested a review from maspe36 October 23, 2024 11:45

geoff-imdex added 2 commits October 28, 2024 06:56

Merge branch 'ros2-rust:main' into main

19e8d3d

Fix linting issue with use statement

673d478

mxgrey reviewed Oct 30, 2024

View reviewed changes

maspe36 reviewed Oct 30, 2024

View reviewed changes

mxgrey reviewed Oct 30, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add rosout logging to rclrs #422

Add rosout logging to rclrs #422

geoff-imdex commented Oct 21, 2024 •

edited

Loading

esteve commented Oct 30, 2024

mxgrey Oct 30, 2024 •

edited

Loading

geoff-imdex Oct 31, 2024

mxgrey Oct 31, 2024

mxgrey Oct 31, 2024

geoff-imdex Nov 1, 2024

maspe36 Oct 27, 2024

maspe36 Oct 27, 2024

geoff-imdex Oct 31, 2024

maspe36 Oct 27, 2024

geoff-imdex Oct 31, 2024

mxgrey commented Oct 30, 2024 •

edited

Loading

mxgrey Oct 30, 2024

geoff-imdex Oct 31, 2024

mxgrey Oct 30, 2024 •

edited

Loading

geoff-imdex Oct 31, 2024

m-dahl commented Oct 30, 2024

geoff-imdex commented Oct 31, 2024

geoff-imdex commented Oct 31, 2024

mxgrey commented Nov 1, 2024 •

edited

Loading

geoff-imdex commented Nov 1, 2024

Add rosout logging to rclrs #422

Are you sure you want to change the base?

Add rosout logging to rclrs #422

Conversation

geoff-imdex commented Oct 21, 2024 • edited Loading

Known Issues

esteve commented Oct 30, 2024

mxgrey Oct 30, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mxgrey commented Oct 30, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mxgrey Oct 30, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

m-dahl commented Oct 30, 2024

geoff-imdex commented Oct 31, 2024

geoff-imdex commented Oct 31, 2024

mxgrey commented Nov 1, 2024 • edited Loading

geoff-imdex commented Nov 1, 2024

geoff-imdex commented Oct 21, 2024 •

edited

Loading

mxgrey Oct 30, 2024 •

edited

Loading

mxgrey commented Oct 30, 2024 •

edited

Loading

mxgrey Oct 30, 2024 •

edited

Loading

mxgrey commented Nov 1, 2024 •

edited

Loading