Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ROX-26025: Sanitize UTF-8 strings in process signals #1857

Merged
merged 9 commits into from
Sep 23, 2024

Conversation

ovalenti
Copy link
Contributor

Description

Containers can produce non-UTF8 process names or paths. We want to prevent this from causing issues in protobuf, which doesn't like it at all.

As a quick workaround, we reuse google::protobuf::internal::UTF8CoerceToStructurallyValid to replace any invalid sequence with '?'.

In order to minimize string copies in the "no modifications required" case, we introduce a Holder<> class to either point to the original unmodified string, or to an altered copy of it.

Checklist

  • Investigated and inspected CI test results

This method replaces every invalid UTF-8 character with a '?'
@@ -108,6 +108,36 @@ ScopedLock<Mutex> Lock(Mutex& mutex) {
#define ssizeof(x) static_cast<ssize_t>(sizeof(x))

std::optional<std::string_view> ExtractContainerIDFromCgroup(std::string_view cgroup);

// A Holder refers to an object that it can own, or not.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds like a lifetime nightmare. Can't we do something similar to what protobuf does and simply use pointers? Or maybe make SanitizeUTF8 return a std::optional<std::string>? That way if the return value is empty it would mean the original string is valid and, if it is not empty, it will hold the sanitized string.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point, I wasn't convinced myself.

The problem is with the immutability of std::string, which makes you copy and copy again. I will figure out a more maintainable way.

@ovalenti ovalenti force-pushed the ovalenti/coerse_utf8_in_process_signals branch 2 times, most recently from 4cb0181 to 558f363 Compare September 23, 2024 13:47
@ovalenti ovalenti force-pushed the ovalenti/coerse_utf8_in_process_signals branch from 558f363 to 6226563 Compare September 23, 2024 13:52
collector/test/UtilityTest.cpp Outdated Show resolved Hide resolved
collector/test/UtilityTest.cpp Show resolved Hide resolved
ovalenti and others added 2 commits September 23, 2024 16:25
Co-authored-by: Mauro Ezequiel Moltrasio <[email protected]>
Co-authored-by: Mauro Ezequiel Moltrasio <[email protected]>
@ovalenti ovalenti marked this pull request as ready for review September 23, 2024 14:26
@ovalenti ovalenti requested a review from a team as a code owner September 23, 2024 14:26
@ovalenti ovalenti requested a review from Molter73 September 23, 2024 14:26
Copy link
Collaborator

@Molter73 Molter73 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@ovalenti ovalenti merged commit d95a279 into master Sep 23, 2024
66 checks passed
@ovalenti ovalenti deleted the ovalenti/coerse_utf8_in_process_signals branch September 23, 2024 16:20
ovalenti added a commit that referenced this pull request Sep 25, 2024
Replace every byte of an invalid UTF-8 sequence with '?'

Co-authored-by: Mauro Ezequiel Moltrasio <[email protected]>
ovalenti added a commit that referenced this pull request Sep 26, 2024
* ROX-26025: Sanitize UTF-8 strings in process signals (#1857)

Replace every byte of an invalid UTF-8 sequence with '?'

* Pin ansible version to <2.17 to avoid loop bug (#1837)

---------

Co-authored-by: Mauro Ezequiel Moltrasio <[email protected]>
Co-authored-by: Giles Hutton <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants