Add rosout logging to rclrs #422
base: main
Conversation
@geoff-imdex thank you so much for your PR. Given that the code is based on the work done by @m-dahl and @kristoferB, is it ok with you guys if we copy this code from
```diff
 ///
 /// [1]: <https://doc.rust-lang.org/reference/destructors.html>
 pub(crate) struct NodeHandle {
-    pub(crate) rcl_node: Mutex<rcl_node_t>,
+    pub(crate) rcl_node: Mutex<Box<rcl_node_t>>,
```
Can I check why this was put into a Box? The contents of `rcl_node_t` are just pointers, so there shouldn't be any issues with moving the address of the `rcl_node_t` (and its parent struct is managed by an Arc, so its address shouldn't change anyway).
Good question - without this change it appears as though the node handle goes out of scope before we get a chance to clear the logging publisher (in Humble this occurs in the `rcl_node_fini()` call). The behaviour without the change is that you get a segfault on node closure, because the object pointed to by the `rcl_node_t` is no longer valid. It took a bit of debugging to get to the bottom of it - and honestly, I don't particularly like the change with the extra de-refs.
Looking at Jazzy, I believe the issue will no longer occur because the logging publisher is managed outside the `rcl_node_init`/`fini` calls.
For reference, the failure occurs here.
Thanks for the explanation. I think we do want to keep supporting Humble for now, so I'll try to reproduce the segfault in Humble and see if there may be an alternative solution.
I'm still pretty lost on how boxing the `rcl_node_t` would affect its lifetime, but if I can reproduce the problem then I should be able to make sense of it.
Oh, actually, if rcl itself is storing the `rcl_node_t*` value somewhere and using it without consideration for its lifetime, then I can see how this would happen. I'll check if that's the case, and maybe we can set things up so that we don't initialize the `rcl_node_t` until it's already inside the `NodeHandle`.
Yep, that's it in a nutshell - the rosout code keeps a record of the node used to create it (this line specifically), and this pointed to the stack variable in `NodeHandle` (here). During `rcl_node_fini` we call `rcl_logging_rosout_fini_publisher_for_node`, which tries to dereference that stored pointer; however, the area of memory it points to has since been mangled, i.e. the pointer still points to the same memory location but the values at that spot are garbage.
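To make the failure mode concrete, here is a minimal, self-contained sketch (plain Rust, no rcl involved; `FakeNode` and `REGISTERED` are illustrative stand-ins, not rclrs types): bookkeeping that records a raw pointer to a value is left dangling once that value is moved out of the frame it was initialized in, whereas boxing gives the value a stable heap address.

```rust
// Illustrative sketch only: FakeNode and REGISTERED stand in for rcl_node_t and
// rosout's internal bookkeeping; they are not part of rclrs or rcl.
use std::sync::Mutex;

struct FakeNode {
    _id: u64,
}

// Stand-in for rosout recording a raw node pointer at init time.
static REGISTERED: Mutex<usize> = Mutex::new(0);

fn init_node() -> FakeNode {
    let node = FakeNode { _id: 7 };
    // Imitates rcl_node_init registering &node with the rosout map.
    *REGISTERED.lock().unwrap() = &node as *const FakeNode as usize;
    node // the value is moved out of this frame, so the registered address goes stale
}

fn init_node_boxed() -> Box<FakeNode> {
    let node = Box::new(FakeNode { _id: 7 });
    // The heap allocation's address stays valid for as long as the Box lives,
    // even though the Box itself moves into the caller (e.g. into NodeHandle).
    *REGISTERED.lock().unwrap() = &*node as *const FakeNode as usize;
    node
}

fn main() {
    let stack_node = init_node();
    let registered = *REGISTERED.lock().unwrap();
    // With the plain value, the registered address may no longer match: the node
    // was moved out of init_node's frame, which is the dangling-pointer hazard
    // hit in Humble when rcl_node_fini walks the rosout map.
    println!(
        "registered = {:#x}, current address = {:#x}",
        registered,
        &stack_node as *const FakeNode as usize
    );

    let boxed_node = init_node_boxed();
    // With the Box, the registered heap address is still the node's real address.
    assert_eq!(
        *REGISTERED.lock().unwrap(),
        &*boxed_node as *const FakeNode as usize
    );
}
```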
rclrs/src/logging.rs
```rust
use std::ffi::CString;
use std::sync::Mutex;
use std::time::Duration;
```
Suggested change:

```diff
-use std::ffi::CString;
-use std::sync::Mutex;
-use std::time::Duration;
+use std::{ffi::CString, sync::Mutex, time::Duration};
```
```rust
/// use rclrs::{log_debug, log_error, log_fatal, log_info, log_warn, LogConditions, LoggingOccurrence};
/// use std::sync::Mutex;
/// use std::time::Duration;
/// use std::env;
```
Condense the use declarations here, and hide them from the output docs
Suggested change:

```diff
-/// use rclrs::{log_debug, log_error, log_fatal, log_info, log_warn, LogConditions, LoggingOccurrence};
-/// use std::sync::Mutex;
-/// use std::time::Duration;
-/// use std::env;
+/// # use std::{env, sync::Mutex, time::Duration};
+/// # use rclrs::*;
+///
+/// # let context = Context::new(env::args()).unwrap();
+/// # let node = Node::new(&context, "test_node").unwrap();
```
The `#` is used to hide lines from the output documentation, but it still allows those lines to be executed, so the doc test will pass.
Nice! Thanks for the tip.
```diff
 ///
 /// [1]: <https://doc.rust-lang.org/reference/destructors.html>
 pub(crate) struct NodeHandle {
-    pub(crate) rcl_node: Mutex<rcl_node_t>,
+    pub(crate) rcl_node: Mutex<Box<rcl_node_t>>,
```
I checked this code out locally, reverted this change, and I was able to pass all the tests and run the examples. What problem specifically were you seeing?
I've replied above but as a minimal example of how I was able to cause the segfault:
```rust
use rclrs::{
    log_debug, log_error, log_fatal, log_info, log_warn, LogConditions, LoggingOccurrence, Node,
};

static NODE_NAME: &str = "test_node";

// Minimal test code
use std::env;
use std::thread::sleep;
use std::time::Duration;

fn main() -> Result<(), rclrs::RclrsError> {
    log_warn!(NODE_NAME, "Starting");

    let context = rclrs::Context::new(env::args()).unwrap();
    let node = rclrs::Node::new(&context, NODE_NAME).unwrap();

    sleep(Duration::from_secs(1));

    log_warn!(NODE_NAME, "Finishing");
    Ok(())
}
```
Note: some of the use statements are unnecessary in the code - I've just hacked up an existing test app. We also had a similar test in our CI that showed the same behaviour.
Happy to go into more detail if needed.
In rclcpp there's a Logger class that bundles up parameters like the name of the logger and the severity filter. It also allows hierarchical structures where you can create a Logger which is a child of another Logger, and it will inherit settings like a name prefix and severity level (which can be overridden). In this PR the name and severity seem to be loose values. In fact I don't see where severity gets filtered at all (e.g. we wouldn't want DEBUG severity being printed out unless the user asks for DEBUG level logging). Would we be interested in having a `Logger` struct in rclrs?

I do like the …
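For illustration, a rough sketch of the kind of `Logger` struct being discussed; none of these names or methods exist in rclrs at this point, they are purely hypothetical:

```rust
// Hypothetical sketch only: rclrs has no Logger type at the time of this discussion.
#[allow(dead_code)]
#[derive(Clone, Copy, PartialEq, PartialOrd)]
enum Severity {
    Debug,
    Info,
    Warn,
    Error,
}

#[derive(Clone)]
struct Logger {
    name: String,       // full logger name, e.g. "my_node.sensors"
    severity: Severity, // messages below this level are dropped
}

impl Logger {
    fn new(name: &str, severity: Severity) -> Self {
        Self { name: name.into(), severity }
    }

    /// Child loggers inherit the parent's severity and get a prefixed name,
    /// mirroring the behaviour described for rclcpp's Logger.
    fn child(&self, suffix: &str) -> Self {
        Self {
            name: format!("{}.{}", self.name, suffix),
            severity: self.severity,
        }
    }

    fn log(&self, severity: Severity, message: &str) {
        // Severity filtering happens here instead of being left to the caller.
        if severity >= self.severity {
            println!("[{}] {}", self.name, message);
        }
    }
}

fn main() {
    let node_logger = Logger::new("my_node", Severity::Info);
    let sensor_logger = node_logger.child("sensors");
    sensor_logger.log(Severity::Debug, "dropped: below the severity filter");
    sensor_logger.log(Severity::Warn, "printed: at or above the severity filter");
}
```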
```rust
    .expect("Valid c style string required, e.g. check for extraneous null characters");
let severity = severity.to_native();

let _guard = LOG_GUARD.lock().unwrap();
```
Since this is rclrs, this should use `ENTITY_LIFECYCLE_MUTEX`, which is the mutex that rclrs uses to ensure that global variables within the rcl layer and lower are protected.
rcl logging is notoriously not thread safe, so it's important to make sure we're locking appropriately.
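A small self-contained sketch of the locking pattern being requested, using a stand-in mutex (the real `ENTITY_LIFECYCLE_MUTEX` lives inside rclrs and its exact definition is assumed here):

```rust
use std::sync::Mutex;

// Stand-in for rclrs's internal ENTITY_LIFECYCLE_MUTEX (assumed to be a global
// std::sync::Mutex guarding rcl-level globals); the real one lives inside rclrs.
static ENTITY_LIFECYCLE_MUTEX: Mutex<()> = Mutex::new(());

fn set_logger_severity() {
    // Hold the lifecycle lock for the full duration of any call into rcutils/rcl
    // logging, since those globals are not thread safe.
    let _lifecycle_lock = ENTITY_LIFECYCLE_MUTEX.lock().unwrap();
    // unsafe calls into the rcl/rcutils logging functions would go here in rclrs.
}

fn main() {
    set_logger_severity();
}
```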
Good comment. The thought had crossed my mind on this one - I thought it more akin to the mutex in rclcpp logging, but I'm happy to change it to the `ENTITY_LIFECYCLE_MUTEX`.
```rust
// sequence is unnecessarily complex
let allocator = rcutils_get_default_allocator();

rcl_logging_configure(&rcl_context.global_arguments, &allocator).ok()
```
This will get called every time a Context is created, but it's meant to be called only once. It's possible for users to create multiple Contexts in one application (we do this a lot in tests), so we'd be making redundant calls.
Similar to ros2/rclcpp#998 we should guard this from being called multiple times. We also need to make sure that `rcl_logging_fini` is only called after all Contexts are destructed.
This is a tricky problem to resolve in a sound way. I might suggest something like this:
```rust
use std::sync::{Arc, LazyLock, Mutex, Weak};

struct LoggingConfiguration {
    lifecycle: Mutex<Weak<LoggingLifecycle>>,
}

impl LoggingConfiguration {
    fn configure(args: &rcl_arguments_t) -> Result<Arc<LoggingLifecycle>, RclrsError> {
        static CONFIGURATION: LazyLock<LoggingConfiguration> =
            LazyLock::new(|| LoggingConfiguration {
                lifecycle: Mutex::new(Weak::new()),
            });

        let mut lifecycle = CONFIGURATION.lifecycle.lock().unwrap();
        if let Some(arc_lifecycle) = lifecycle.upgrade() {
            // Logging is already configured; share the existing guard.
            return Ok(arc_lifecycle);
        }
        let arc_lifecycle = Arc::new(LoggingLifecycle::new(args)?);
        *lifecycle = Arc::downgrade(&arc_lifecycle);
        Ok(arc_lifecycle)
    }
}

struct LoggingLifecycle;

impl LoggingLifecycle {
    fn new(args: &rcl_arguments_t) -> Result<Self, RclrsError> {
        let _lifecycle_lock = ENTITY_LIFECYCLE_MUTEX.lock().unwrap();
        unsafe {
            let allocator = rcutils_get_default_allocator();
            rcl_logging_configure(args, &allocator).ok()?;
        }
        Ok(Self)
    }
}

impl Drop for LoggingLifecycle {
    fn drop(&mut self) {
        let _lifecycle_lock = ENTITY_LIFECYCLE_MUTEX.lock().unwrap();
        unsafe { rcl_logging_fini() };
    }
}
```
Then we would add an `Arc<LoggingLifecycle>` to the `ContextHandle`. Once all `ContextHandle`s are gone, the `LoggingLifecycle` will drop and call `rcl_logging_fini`. If a new context is made after that, `rcl_logging_configure` will be called again.
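For illustration, a compilable stand-in for how that might look on the `ContextHandle` side; the field name and the surrounding types are assumptions, not the real rclrs definitions:

```rust
use std::sync::{Arc, Mutex};

// Stand-ins so the sketch compiles on its own; in rclrs these would be the real
// rcl_context_t binding and the LoggingLifecycle type from the snippet above.
#[allow(non_camel_case_types)]
struct rcl_context_t;
struct LoggingLifecycle;

// Sketch only: the field name `logging_lifecycle` is an assumption, and the real
// ContextHandle may hold additional state.
#[allow(dead_code)]
struct ContextHandle {
    rcl_context: Mutex<rcl_context_t>,
    // Keeps rcl logging configured for as long as any Context is alive; once the
    // last ContextHandle drops, LoggingLifecycle::drop would call rcl_logging_fini.
    logging_lifecycle: Arc<LoggingLifecycle>,
}

fn main() {
    let _handle = ContextHandle {
        rcl_context: Mutex::new(rcl_context_t),
        logging_lifecycle: Arc::new(LoggingLifecycle),
    };
}
```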
Really good pick-up and suggestion @mxgrey - thank you. Will update the code to make use of your suggestion.
Thanks for the mention, happy to see it used and improved. In the long term I would be in favor of a shared crate based on this work (perhaps it could be a starting point for more code sharing), but right now I don't have much time to work on ROS stuff, so for now I think you should just go ahead and do what makes most sense for this crate.
As you can tell, I avoided this extra complexity in this implementation, as I wasn't sure (naivety/ignorance) what benefit/use it would bring, and it was a little outside the scope of what I needed.
It's not obvious, but the filtering occurs in the `rcutils_log` function. I'm sure you're across this already, but just to be sure: the rcutils logging code basically has a look-up table that tracks each logger name and what logging level is enabled for it. It does mean that we make a certain number of function calls for levels that aren't enabled; however, those messages are never published. The rclcpp implementation uses some very smart macro construction to decide at compile time which logs are included, but I'm not sure we would (or could?) want to do the same thing.
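For illustration, a rough sketch of the look-up-table filtering described above (not the actual rcutils code; the names and level values are made up):

```rust
use std::collections::HashMap;

// Numeric severities, smallest = most verbose (values here are illustrative).
const DEBUG: u8 = 10;
const INFO: u8 = 20;
const WARN: u8 = 30;

// Stand-in for the rcutils per-logger-name level table with a global default.
fn is_enabled(levels: &HashMap<&str, u8>, logger_name: &str, severity: u8) -> bool {
    let threshold = *levels.get(logger_name).unwrap_or(&INFO);
    // The log call is still made by the caller, but anything below the threshold
    // is dropped here instead of being published.
    severity >= threshold
}

fn main() {
    let mut levels = HashMap::new();
    levels.insert("test_node", WARN);

    assert!(!is_enabled(&levels, "test_node", INFO)); // filtered out
    assert!(is_enabled(&levels, "test_node", WARN)); // published
    assert!(!is_enabled(&levels, "other_logger", DEBUG)); // default INFO applies
    assert!(is_enabled(&levels, "other_logger", INFO)); // default INFO applies
}
```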
@mxgrey, @maspe36, really appreciate you taking the time to review the code and provide some helpful feedback. I'll try and incorporate the suggestions over the next few days or so.
I've never dug into how logging works at the rcl layer, so I actually didn't know this detail of the implementation. Thanks for highlighting this. That actually makes me feel much more strongly that we should have a firm `Logger` struct. I have some ideas for how to give rclrs a `Logger`.
Not disagreeing with the suggestion to create a logging struct, but it is worth highlighting that the rcl logging interface explicitly allows passing in arbitrary log names, e.g. the logging macros throughout the rcl code pass in a simple string. Perhaps a better way to phrase it: the logging system can be used without an associated node - in that scenario the rcl layer uses the default log values (INFO level and logging to stdout/stderr). It is only when we want to publish to /rosout that a node needs to be involved.
The purpose of this PR is to add logging macros that replicate the logging interface provided by the rclcpp and rclpy interfaces, while aiming to provide an idiomatic Rust interface. This work is based on the initial work from here and here.
Examples of the interfaces provided:
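The rendered examples aren't reproduced in this extract; the following is a sketch based on the two-argument `log_warn!` usage shown elsewhere in this conversation, assuming the other severities behave the same way:

```rust
use rclrs::{log_debug, log_error, log_info, log_warn};

fn main() {
    // Each macro takes a logger name plus a message, mirroring the minimal
    // reproduction posted earlier in this conversation; anything beyond that
    // two-argument form is not shown here.
    let logger_name = "my_node";
    log_debug!(logger_name, "Detailed diagnostic information");
    log_info!(logger_name, "Node started");
    log_warn!(logger_name, "Battery level is getting low");
    log_error!(logger_name, "Failed to reach the goal position");
}
```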
The macros make use of the `LogConditions` struct, which provides control over the logging, e.g. throttling, occurrence (once, skip first), etc. The most common control constructs can be instantiated using the provided builder functions, e.g.:
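The concrete example isn't reproduced in this extract; the following hypothetical sketch shows the intent, but the builder names (`throttle`, `once`) and passing `LogConditions` as an extra macro argument are assumptions rather than the PR's confirmed API:

```rust
// Hypothetical sketch only: the builder method names and the way LogConditions is
// passed to the macro are assumptions, not the PR's confirmed API.
use rclrs::{log_info, log_warn, LogConditions};
use std::time::Duration;

fn main() {
    let logger_name = "my_node";

    // Log at most once every 500 ms, no matter how often this line is reached.
    log_warn!(
        logger_name,
        "Sensor data is stale",
        LogConditions::throttle(Duration::from_millis(500))
    );

    // Log only the first time this call site is hit.
    log_info!(
        logger_name,
        "Starting the control loop",
        LogConditions::once()
    );
}
```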
Known Issues
1. There is currently an issue on node closure (as we process the `rcl_node_fini` call) where the "impl" pointer appears to be null when we try to fini the rosout publisher (in `rcl_logging_rosout_fini_publisher_for_node`). The rest of the node closure completes successfully. Note: rosout usage can be disabled with `--ros-args --disable-rosout-logs`, and the error will not occur.
and the error will not occur.This issue has been addressed in the latest commit. I believe that the issue was caused by the
rcl_node
variable in builder.rs going out of scope during the "drop" sequence. The fix is to allocate thercl_node
on the heap (using Box).(As a side note: technically, we may have the same issue with the Context object).
2. We use static variables to facilitate the "once", "skip_first", and throttling functionality; it would arguably be preferable to use lazy initialization (e.g. `std::sync::LazyLock`), but at the moment our Rust compiler is 1.79, where `LazyLock` is not yet stable.
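For illustration, a rough sketch (not the PR's actual macro expansion) of how a per-call-site static can implement the "once" behaviour described above:

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Each expansion of the macro gets its own static flag, so "once" applies per
// call site rather than globally. This mirrors the static-variable approach
// described above, but it is an illustration, not the code in the PR.
macro_rules! log_once {
    ($($arg:tt)*) => {{
        static ALREADY_LOGGED: AtomicBool = AtomicBool::new(false);
        if !ALREADY_LOGGED.swap(true, Ordering::Relaxed) {
            println!($($arg)*); // stand-in for the real logging call
        }
    }};
}

fn main() {
    for i in 0..3 {
        log_once!("only printed on the first iteration (i = {})", i);
    }
}
```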