Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RKOTLIN-1089] Guard decoding issues when core logs invalid utf-8 messages #1792

Merged
merged 1 commit into from
Jul 5, 2024

Conversation

rorbech
Copy link
Contributor

@rorbech rorbech commented Jul 3, 2024

Closes #1760

@cla-bot cla-bot bot added the cla: yes label Jul 3, 2024
@rorbech rorbech marked this pull request as ready for review July 3, 2024 10:03
for (std::string::size_type i = 0; i < str.size(); ++i)
ret << " 0x" << std::hex << std::setfill('0') << std::setw(2) << (int)s[i];
ret << "; ";
ret << "in_begin = " << in_begin << "; ";
ret << "in_end = " << in_end << "; ";
ret << "in_begin = " << (void*) in_begin << "; ";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why casting to opaque pointer?

Copy link
Contributor Author

@rorbech rorbech Jul 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If treating in_begin and in_end as char * they will be considered as 0-terminated strings and printed to the stream ... in which case we end up including the invalid UTF-8 bytes in the message - Just as for ret << "StringData.data = " << str << "; ";. Want to avoid having invalid unicode in the exception message as spelled out in #1792 (comment). Just including the pointers to give some insights if we were to try to understand and reproduce an issue based on one of these exceptions.

@@ -82,13 +82,12 @@ static std::string string_to_hex(const std::string& message, realm::StringData&
ret << "error_code = " << error_code << "; ";
ret << "retcode = " << retcode << "; ";
ret << "StringData.size = " << str.size() << "; ";
ret << "StringData.data = " << str << "; ";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're not printing data?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This string_to_hex is only used to make a hex-dump of the data when we fail to decode it from UTF-8. So don't think it makes sense to include the non-decodable string itself. The result of this method is used as the message of an exception and we cannot pass that on to the JVM if the message itself is not valid unicode.

jstring j_message = NULL;
try {
j_message = to_jstring(jenv, message);
} catch (RuntimeError exception) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you observe this on Android & JVM?

Copy link
Contributor Author

@rorbech rorbech Jul 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is on both platforms

@rorbech rorbech force-pushed the cr/fix-logger-encoding-issues branch from db2e957 to 7af0a91 Compare July 5, 2024 06:37
Copy link
Collaborator

@nhachicha nhachicha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should skip printing memory address of begin/end if there's not plan how to use them

@rorbech
Copy link
Contributor Author

rorbech commented Jul 5, 2024

I think we should skip printing memory address of begin/end if there's not plan how to use them

We can actually derive some context from it. The in_begin will point to the byte that we cannot decode, so we can derive the index of the buffer from in_begin and in_end. Haven't traced it but I assume there is a similar correspondence with the out-arguments around the encoding to UTF-16. So will leave them for now.

@rorbech rorbech changed the title [RKOTLIN-1089] Guard encoding issues when core logs invalid utf-8 messages [RKOTLIN-1089] Guard decoding issues when core logs invalid utf-8 messages Jul 5, 2024
@rorbech rorbech merged commit 2eaaf92 into releases Jul 5, 2024
56 checks passed
@rorbech rorbech deleted the cr/fix-logger-encoding-issues branch July 5, 2024 12:42
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 4, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants