ROX-26025: Sanitize UTF-8 strings in process signals #1857
This method replaces every invalid UTF-8 character with a '?'
collector/lib/Utility.h
Outdated
@@ -108,6 +108,36 @@ ScopedLock<Mutex> Lock(Mutex& mutex) {
#define ssizeof(x) static_cast<ssize_t>(sizeof(x))

std::optional<std::string_view> ExtractContainerIDFromCgroup(std::string_view cgroup);

// A Holder refers to an object that it can own, or not.
This sounds like a lifetime nightmare. Can't we do something similar to what protobuf does and simply use pointers? Or maybe make SanitizeUTF8 return a std::optional<std::string>? That way, if the return value is empty it would mean the original string is valid and, if it is not empty, it will hold the sanitized string.
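To make the std::optional<std::string> suggestion concrete, here is a minimal sketch. It is not the code from this PR: ValidSequenceLength is a hypothetical hand-written validator standing in for the protobuf helper, and this SanitizeUTF8 signature is only assumed from the comment above. The input is copied at most once, and only when an invalid byte is found.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper (not from the PR): returns the length (1-4) of the
// well-formed UTF-8 sequence starting at s[i], or 0 if there is none.
inline std::size_t ValidSequenceLength(std::string_view s, std::size_t i) {
  auto at = [&](std::size_t k) { return static_cast<unsigned char>(s[k]); };
  auto is_cont = [&](std::size_t k) {
    return k < s.size() && (at(k) & 0xC0) == 0x80;
  };

  const unsigned char b0 = at(i);
  if (b0 < 0x80) return 1;  // ASCII
  if (b0 >= 0xC2 && b0 <= 0xDF) return is_cont(i + 1) ? 2 : 0;
  if (b0 >= 0xE0 && b0 <= 0xEF) {
    if (!is_cont(i + 1) || !is_cont(i + 2)) return 0;
    const unsigned char b1 = at(i + 1);
    if (b0 == 0xE0 && b1 < 0xA0) return 0;  // overlong encoding
    if (b0 == 0xED && b1 > 0x9F) return 0;  // UTF-16 surrogate range
    return 3;
  }
  if (b0 >= 0xF0 && b0 <= 0xF4) {
    if (!is_cont(i + 1) || !is_cont(i + 2) || !is_cont(i + 3)) return 0;
    const unsigned char b1 = at(i + 1);
    if (b0 == 0xF0 && b1 < 0x90) return 0;  // overlong encoding
    if (b0 == 0xF4 && b1 > 0x8F) return 0;  // above U+10FFFF
    return 4;
  }
  return 0;  // stray continuation byte or invalid lead byte
}

// Assumed signature: returns std::nullopt when the input is already valid,
// otherwise a copy with every offending byte replaced by '?'.
inline std::optional<std::string> SanitizeUTF8(std::string_view input) {
  std::optional<std::string> sanitized;
  for (std::size_t i = 0; i < input.size();) {
    const std::size_t len = ValidSequenceLength(input, i);
    if (len == 0) {
      if (!sanitized) sanitized.emplace(input);  // lazy, single copy
      (*sanitized)[i] = '?';
      ++i;
    } else {
      i += len;
    }
  }
  return sanitized;
}
```

A caller would keep using its original string when the result is empty and switch to the returned copy otherwise, so no object has to track who owns what. The exact replacement behavior of the protobuf function may differ from this byte-by-byte stand-in.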
I see your point, I wasn't convinced myself.
The problem is with the immutability of std::string, which makes you copy and copy again. I will figure out a more maintainable way.
4cb0181 to 558f363
558f363 to 6226563
Co-authored-by: Mauro Ezequiel Moltrasio <[email protected]>
LGTM!
Replace every byte of an invalid UTF-8 sequence with '?'
Co-authored-by: Mauro Ezequiel Moltrasio <[email protected]>
* ROX-26025: Sanitize UTF-8 strings in process signals (#1857)
  Replace every byte of an invalid UTF-8 sequence with '?'
* Pin ansible version to <2.17 to avoid loop bug (#1837)
---------
Co-authored-by: Mauro Ezequiel Moltrasio <[email protected]>
Co-authored-by: Giles Hutton <[email protected]>
Description
Containers can produce non-UTF8 process names or paths. We want to prevent this from causing issues in protobuf, which doesn't like it at all.
As a quick workaround, we reuse google::protobuf::internal::UTF8CoerceToStructurallyValid to replace any invalid sequence with '?'.
In order to minimize string copies in the "no modifications required" case, we introduce a Holder<> class that either points to the original unmodified string or holds an altered copy of it.
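For readers who have not seen the diff, the own-or-borrow idea behind Holder<> can be sketched roughly as follows. This is a hypothetical reconstruction based only on the description above; the member names (get, owns) and the constructors are assumptions, and the class in the PR may look quite different.

```cpp
#include <optional>
#include <string>
#include <utility>

// Hypothetical sketch of an "own it, or just refer to it" wrapper.
// Not the PR's implementation; names and layout are assumptions.
template <typename T>
class Holder {
 public:
  // Non-owning: the referenced object must outlive this Holder.
  explicit Holder(const T& borrowed) : borrowed_(&borrowed) {}

  // Owning: the Holder stores the (modified) value itself.
  explicit Holder(T&& owned) : owned_(std::move(owned)) {}

  // Returns the owned value if there is one, else the borrowed object.
  const T& get() const { return owned_ ? *owned_ : *borrowed_; }

  bool owns() const { return owned_.has_value(); }

 private:
  std::optional<T> owned_;       // set only in the owning case
  const T* borrowed_ = nullptr;  // set only in the non-owning case
};
```

In the non-owning case nothing prevents the referenced string from being destroyed while the Holder is still alive, which is the kind of lifetime hazard raised in the review.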
Checklist