Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add rosout logging to rclrs #422

Open
wants to merge 5 commits into
base: main
Choose a base branch
from
Open

Conversation

geoff-imdex
Copy link

@geoff-imdex geoff-imdex commented Oct 21, 2024

The purpose of this PR is to add logging macros that replicate the logging interface provided by the rclcpp and rclpy interfaces, while aiming to provide an idiomatic rust interface.

This work is based on the initial work from here and here.

Examples of the interfaces provided:

log_debug!(&node.name(), "Simple message");
log_debug!(&node.name(), "Simple message {some_variable}");
log_fatal!(&node.name(), "Simple message from {}", node.name());
log_warn!(&node.name(), LogConditions::once(), "Only log this the first time");
log_error!(&node.name(),
           LogConditions::skip_first_throttle(Duration::from_millis(1000)),
           "Noisy error that we expect the first time");

log_info!(&node.name(), LogConditions { occurs: LoggingOccurrence::Always,
                                        publish_interval: Duration::from_millis(1000),
                                        log_if_true: count % 10 == 0, },
          "Manually constructed LogConditions");

The macros make use of the LogConditions struct which provides the control over the logging, e.g. throttling, occurrence (once, skip first), etc.
The most common control constructs can be instantiated using the provided builder functions, e.g.

/// Only log the message on the first occurrence
LogConditions::once() 

/// Only log the message at the specified duration
LogConditions::throttle(Duration)

/// Only log the message if the condition is true.
LogConditions::log_if_true(<some expression>)

Known Issues

1. The is currently an issue on node closure (as we process the rcl_node_fini call) where the "impl" pointer appears to be "null" when we try too fini the rosout publisher (in rcl_logging_rosout_fini_publisher_for_node). The rest of the node closure occurs successfully. Note: the rosout usage can be disabled with --ros-args --disable-rosout-logs and the error will not occur.
This issue has been addressed in the latest commit. I believe that the issue was caused by the rcl_node variable in builder.rs going out of scope during the "drop" sequence. The fix is to allocate the rcl_node on the heap (using Box).
(As a side note: technically, we may have the same issue with the Context object).
2. We use static variables to facilitate the "once", "skip_first", and throttling functionality, however, I believe it would be preferred
to use the lazy_static. At this moment, our rust compiler is 1.79 and therefore the lazy_static is not regarded as stable.

@jhdcs jhdcs requested a review from maspe36 October 23, 2024 11:45
@esteve
Copy link
Collaborator

esteve commented Oct 30, 2024

@geoff-imdex thank you so much for your PR. Given that the code is based on the work done at r2r and at rules_ros2, it'd be good to have some feedback from the authors of r2r, and maybe we could a shared crate with this code so as to not duplicate work. Legally speaking we can copy that code (both r2r and ros2-rust are license as Apache 2.0), but it's always nice to ask.

@m-dahl @kristoferB is it ok with you guys if we copy this code from r2r into ros2-rust? We'd of course attribute credit. Moreover, would you be interested in spinning off the logging code of r2r into a shared crate that both projects can use?

///
/// [1]: <https://doc.rust-lang.org/reference/destructors.html>
pub(crate) struct NodeHandle {
pub(crate) rcl_node: Mutex<rcl_node_t>,
pub(crate) rcl_node: Mutex<Box<rcl_node_t>>,
Copy link
Collaborator

@mxgrey mxgrey Oct 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can I check why this was put into a Box? The contents of rcl_node_t are just pointers so there shouldn't be any issues with moving the address of the rcl_node_t (and its parent struct is managed by an Arc, so its address shouldn't change anyway).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question - without this change it appears as though the node handle goes out of scope before we get chance to clear the logging publisher (in humble this occurs in the rcl_node_fini() call). The behaviour without the change is that you will get a segfault on node closure because the object pointed to by the rcl_node_t is no longer valid. Took a bit of debugging to get to the bottom of it - and honestly, I don't particularly like the change with the extra de-refs.
Looking at Jazzy, I believe that the issue will no longer occur due the logging publisher being managed outside the rcl_node_init/fini calls.
For reference the failure occurs here

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation. I think we do want to keep supporting humble for now, so I'll try to reproduce the seg fault in humble and see if there may be alternative solution.

I'm still pretty lost on how boxing the rcl_node_t would affect its lifetime, but if I can reproduce the problem then I should be able to make sense of it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh actually if rcl itself is storing the rcl_node_t* value somewhere and using it without consideration for its lifetime then I can see how this would happen. I'll check if that's the case, and maybe we can set things up so that we don't initialize the rcl_node_t until it's already inside the NodeHandle.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, that's it in a nutshell - the rosout code keeps a record of the node used to create it (this line specifically) and this pointed to the stack variable in NodeHandle (here). During the rcl_node_fini we call rcl_logging_rosout_fini_publisher_for_node which tries to reference the stack pointer, however, this area of memory it points to has now been mangled, i.e. the pointer still points to the same memory location but the values at that spot are garbage.

Comment on lines 6 to 8
use std::ffi::CString;
use std::sync::Mutex;
use std::time::Duration;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
use std::ffi::CString;
use std::sync::Mutex;
use std::time::Duration;
use std::{ffi::CString, sync::Mutex, time::Duration};

Comment on lines +199 to +202
/// use rclrs::{log_debug, log_error, log_fatal, log_info, log_warn, LogConditions, LoggingOccurrence};
/// use std::sync::Mutex;
/// use std::time::Duration;
/// use std::env;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Condense the use declarations here, and hide them from the output docs

Suggested change
/// use rclrs::{log_debug, log_error, log_fatal, log_info, log_warn, LogConditions, LoggingOccurrence};
/// use std::sync::Mutex;
/// use std::time::Duration;
/// use std::env;
/// # use std::{env, sync::Mutex, time::Duration};
/// # use rclrs::*;
///
/// # let context = Context::new(env::args()).unwrap();
/// # let node = Node::new(&context, "test_node").unwrap();

The # is used to hide lines from the output documentation but also allows the lines to be executed so the doc test will pass.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Thanks for the tip.

///
/// [1]: <https://doc.rust-lang.org/reference/destructors.html>
pub(crate) struct NodeHandle {
pub(crate) rcl_node: Mutex<rcl_node_t>,
pub(crate) rcl_node: Mutex<Box<rcl_node_t>>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked this code out locally, reverted this change, and I was able to pass all the tests and run the examples. What problem specifically were you seeing?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've replied above but as a minimal example of how I was able to cause the segfault:

use rclrs::{
    log_debug, log_error, log_fatal, log_info, log_warn, LogConditions, LoggingOccurrence, Node,
};
static NODE_NAME: &str = "test_node";

// Minimal test code
use std::env;
use std::thread::sleep;
fn main() -> Result<(), rclrs::RclrsError> {
    log_warn!(NODE_NAME, "Starting");
    let context = rclrs::Context::new(env::args()).unwrap();
    let node = rclrs::Node::new(&context, NODE_NAME).unwrap();
    sleep(Duration::from_secs(1));
 
    log_warn!(NODE_NAME, "Finishing");
    Ok(())
}

Note: some of the use statements are unnecessary in the code - I've just hacked up an existing test app. We also had a similar test in our CI that showed the same behaviour.
Happy to go in to more detail if needed.

@mxgrey
Copy link
Collaborator

mxgrey commented Oct 30, 2024

In rclcpp there's a Logger class that bundles up parameters like the name of the logger and severity filter. It also allows hierarchical structures where you can create a Logger which is a child of another Logger and it will inherit settings like a name prefix and severity level (which can be overridden).

In this PR the name and severity seem to be loose values. In fact I don't see where severity gets filtered at all (e.g. we wouldn't want DEBUG severity being printed out unless the user asks for DEBUG level logging).

Would we be interested in having a Logger struct similar to rclcpp's? I could draft that up.

I do like the LogConditions of this implementation, though. That's a very nice mechanism.

.expect("Valid c style string required, e.g. check for extraneous null characters");
let severity = severity.to_native();

let _guard = LOG_GUARD.lock().unwrap();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is rclrs should use ENTITY_LIFECYCLE_MUTEX, which is the mutex that rclrs uses to ensure that global variables within the rcl layer and lower are protected.

rcl logging is notoriously not thread safe, so it's important to make sure we're locking appropriately.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good comment. The thought had crossed my mind on this one - I thought it more akin to the mutex in rclpp logging, but I'm happy to change it to the ENTITY_LIFECYCLE_MUTEX

// sequence is unnecessarily complex
let allocator = rcutils_get_default_allocator();

rcl_logging_configure(&rcl_context.global_arguments, &allocator).ok()
Copy link
Collaborator

@mxgrey mxgrey Oct 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will get called every time a Context is created, but it's meant to be called only once. It's possible for users to create multiple Contexts in one application (we do this a lot in tests), so we'd be making redundant calls.

Similar to ros2/rclcpp#998 we should guard this from being called multiple times. We also need to make sure that rcl_logging_fini is only called after all Contexts are destructed.

This is a tricky problem to resolve in a sound way. I might suggest something like this:

struct LoggingConfiguration {
    lifecycle: Mutex<Weak<LoggingLifecycle>>
}

impl LoggingConfiguration {
    fn configure(args: &rcl_arguments_t) -> Result<Arc<LoggingLifecycle>, RclError> {
        static CONFIGURATION: LazyLock<Self> = LazyLock::new(|| Self { lifecycle: Mutex::new(Weak::new()) });
        let mut lifecycle = CONFIGURATION.lifecycle.lock().unwrap();
        if let Some(arc_lifecycle) = lifecycle.upgrade() {
            return Ok(arc_lifecycle);
        }
        let arc_lifecycle = Arc::new(LoggingLifecycle::new(args)?);
        *lifecycle = arc_lifecycle.downgrade();
        Ok(arc_lifecycle);
    }
}

struct LoggingLifecycle;
impl LoggingLifecycle {
    fn new(args: &rcl_arguments_t) -> Result<Self, RclError> {
        let _lifecycle_lock = ENTITY_LIFECYCLE_MUTEX.lock().unwrap();
        let allocator = rcutils_get_default_allocator();
        rcl_logging_configure(args, &allocator).ok()?;
        Self
    }
}

impl Drop for LoggingLifecycle {
    fn drop(&mut self) {
        let _lifecycle_lock = ENTITY_LIFECYCLE_MUTEX.lock().unwrap();
        rcl_logging_fini();
    }
}

Then we would add an Arc<LoggingLifecycle> to the ContextHandle. Once all ContextHandles are gone, the LoggingLifecycle will drop and call rcl_logging_fini. If a new context is made after that, rcl_logging_configure will be called again.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really good pick-up and suggestion @mxgrey - thank you. Will update the code to make use of your suggestion.

@m-dahl
Copy link
Collaborator

m-dahl commented Oct 30, 2024

@geoff-imdex thank you so much for your PR. Given that the code is based on the work done at r2r and at rules_ros2, it'd be good to have some feedback from the authors of r2r, and maybe we could a shared crate with this code so as to not duplicate work. Legally speaking we can copy that code (both r2r and ros2-rust are license as Apache 2.0), but it's always nice to ask.

@m-dahl @kristoferB is it ok with you guys if we copy this code from r2r into ros2-rust? We'd of course attribute credit. Moreover, would you be interested in spinning off the logging code of r2r into a shared crate that both projects can use?

Thanks for the mention, happy to see it used and improved. In the long term I would be in favor of a shared crate based on this work (perhaps it could be a starting point to for more code sharing), but right now I don't have much time to work on ros stuff so for now I think you should just go ahead and do what makes most sense for this crate.

@geoff-imdex
Copy link
Author

In rclcpp there's a Logger class that bundles up parameters like the name of the logger and severity filter. It also allows hierarchical structures where you can create a Logger which is a child of another Logger and it will inherit settings like a name prefix and severity level (which can be overridden).

As you can tell, I avoided this extra complexity with this implementation as I wasn't sure (naivety/ignorance) what benefit/use it would bring and was a little outside the scope of what I needed.

In this PR the name and severity seem to be loose values. In fact I don't see where severity gets filtered at all (e.g. we wouldn't want DEBUG severity being printed out unless the user asks for DEBUG level logging).

It's not obvious, but the filtering occurs in the rcutils_log function. I'm sure you're across this already, but just to be sure: the rcutils logging code basically has a look-up table that tracks the logger name and what logging level is enabled for it. It does mean that we do a certain amount of function calls for levels that aren't enabled, however, they're never published. The rclcpp implementation uses some very smart macro construction to include logs at compile time, but I'm not sure we would (or could?) want to do the same thing.

Would we be interested in having a Logger struct similar to rclcpp's? I could draft that up.

I do like the LogConditions of this implementation, though. That's a very nice mechanism.
Thanks!

@geoff-imdex
Copy link
Author

@mxgrey , @maspe36 , really appreciate you taking the time to review the code and provide some helpful feedback. I'll try and incorporate the suggestions over the next few days or so.
As to the segfault we observed locally, happy to provide more details as needed - would be interesting to note if you still the same behaviour with a minimal test app (create node + context, and shutdown).

@mxgrey
Copy link
Collaborator

mxgrey commented Nov 1, 2024

the rcutils logging code basically has a look-up table that tracks the logger name and what logging level is enabled for it.

I've never dug into how logging works at the rcl layer, so I actually didn't know this detail of the implementation. Thanks for highlighting this.

That actually makes me feel much more strongly that we should have a firm Logger struct rather than using &str. The current API in the PR would allow rclrs users to pass arbitrary names into the macros which might not be names of actual loggers that rcl recognizes, and it sounds like that could lead to misbehavior in the rcl layer. If we have the macros require a Logger object to be passed in then we can enforce that the user is always using a valid logger that rcl will know about.

I have some ideas for how to give rclrs a Logger struct that aligns with rclcpp's while also covering the capabilities of LogCondition, but implementing the ideas will take about the same amount of time as explaining them so I'll open a PR for that which targets your branch if you don't mind @geoff-imdex .

@geoff-imdex
Copy link
Author

That actually makes me feel much more strongly that we should have a firm Logger struct rather than using &str. The current API in the PR would allow rclrs users to pass arbitrary names into the macros which might not be names of actual loggers that rcl recognizes, and it sounds like that could lead to misbehavior in the rcl layer. If we have the macros require a Logger object to be passed in then we can enforce that the user is always using a valid logger that rcl will know about.

Not disagreeing with the suggestion to create a logging struct, but it is worth highlighting that the rcl logging interface explicitly allows passing in arbitrary log names, e.g. the logging macros throughout the rcl code pass in a simple string. Perhaps a better way to phrase - the logging system supports existing without requiring a node to be associated - in this scenario the rcl layer uses the default log values (INFO level and logging to stdout/stderr). It is only when we want to publish to /rosout that the logging is associated with a node (via the implementation in logging_rosout.c).
To be clear, I think the suggestion to create a logging struct will add a lot of value for the nodes that use the logging interface.

I have some ideas for how to give rclrs a Logger struct that aligns with rclcpp's while also covering the capabilities of LogCondition, but implementing the ideas will take about the same amount of time as explaining them so I'll open a PR for that which targets your branch if you don't mind @geoff-imdex .
Please feel free! And thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants