Message consumer garbles subjects with non-ascii characters #1143
Comments
Understood. We recognize that the server actually will support utf-8 subjects, but for cross client compatibility, at one point we decided the clients would not and suggest using ascii only as noted in our docs: https://docs.nats.io/nats-concepts/subjects That being said, we discussed this topic today, and decided that it is okay to allow the clients to opt-in for this behavior, so I will add this to my todo list. |
Ah, I see. I thought I read somewhere that subjects were allowed to use "any printable character", but I can't find the reference. Happy to hear it will be fixed 👍 |
Probably in ADR-6 |
I clarified the "ASCII only" rule for subjects and other names in NATS. https://docs.nats.io/nats-concepts/subjects We don't support UTF-8 for the same reasons it's a bad idea to use Unicode for names on the web. Names need to be shared in all kinds of documents, written to log files and sometimes typed by people. Once you allow anything outside ASCII you open the floodgates and you can no longer work with people's systems in other countries. If you want to know where this can lead I suggest some computer archaeology into the 90s when Microsoft "localized" Visual Basic by actually translating the language keywords. |
If this is how you want to do it, that's fine. I will add: The developer experience would be much smoother if both the server and the clients did the same validation on this. We ran into this problem because of inconsistent behavior between the server, CLI and Java SDK. Had non-ASCII characters been outright disallowed this wouldn't have been an issue. If you don't want to tighten validations to avoid breaking backwards compatibility then I completely agree. But in that case, making the Java SDK behave the same way as the server and CLI (i.e. more permissive, by reading subject names as UTF-8) would have given fewer surprises. It could even issue a warning when non-ASCII characters are detected at creation time. |
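A minimal sketch of the creation-time warning suggested above. `SubjectAsciiCheck`, `firstNonAsciiIndex` and `warningFor` are hypothetical names for illustration and are not part of jnats or nats-server; the "printable ASCII" range used here (0x21 to 0x7E) is also an assumption, and NATS-specific rules such as wildcards and token separators are not checked.

```java
import java.util.Optional;

// Hypothetical helper for illustration; not part of jnats or nats-server.
final class SubjectAsciiCheck {
    private SubjectAsciiCheck() {}

    /** Index of the first char outside printable ASCII (0x21..0x7E), or -1 if there is none. */
    static int firstNonAsciiIndex(String subject) {
        for (int i = 0; i < subject.length(); i++) {
            char c = subject.charAt(i);
            if (c < 0x21 || c > 0x7E) {
                return i;
            }
        }
        return -1;
    }

    /** A warning message when the subject leaves printable ASCII, otherwise empty. */
    static Optional<String> warningFor(String subject) {
        int i = firstNonAsciiIndex(subject);
        if (i < 0) {
            return Optional.empty();
        }
        return Optional.of(String.format(
                "subject contains non-ASCII or non-printable char U+%04X at index %d",
                (int) subject.charAt(i), i));
    }

    public static void main(String[] args) {
        // \u00f8 is the letter o with stroke from the report, escaped so the
        // source file compiles regardless of its own encoding.
        System.out.println(warningFor("test.lop").orElse("ok"));
        System.out.println(warningFor("test.l\u00f8p").orElse("ok"));
    }
}
```

A check like this only warns; whether to reject such subjects outright is the compatibility question discussed in this thread.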
Correction: There was a bit of a mixup between recommendations for subject usage and support for UTF-8. We decided to bring back the UTF-8 support for most clients. In general UTF-8 should work, but it may be optional for some client implementations. We recommend NOT using non-ASCII characters in subjects, as this can cause all kinds of issues in configuration files and command line tools, and with people simply being able to read and configure the subjects. I'm speaking as somebody who has worked with the Unicode standard in IT systems since 1995(!!!). If you use non-ASCII in "names", it's at your own risk. |
Thanks for clarifying, this sounds good to me 👍😊 |
I have found the issue (the subject is not UTF-8 decoded for incoming messages) and will suggest a fix. PS: It's a feature. We are planning to optionally allow UTF-8 again in a future release. The code exists but is disabled for the current release. |
@cjohansen I have merged the PR #1169 This allows you to turn on UTF-8 support via the connect options with the You can publish with a UTF-8 subject whether or not this flag is set, since all that happens outgoing is we convert strings to byte arrays using UTF-8 character encoding anyway. But on incoming messages, it's a different code path to process messages that might expect a UTF-8 subject, so the option is required. |
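The asymmetry described above can be sketched with the Java standard library alone. This is an illustration under assumptions, not the jnats code path: `SubjectDecodingSketch` and its methods are hypothetical names, and ISO-8859-1 stands in for "some single-byte decoding" since the thread does not show how the client decoded incoming subjects before the fix.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; this is not how jnats is structured internally.
final class SubjectDecodingSketch {
    private SubjectDecodingSketch() {}

    /** Outgoing: a Java string is turned into UTF-8 bytes for the wire. */
    static byte[] encodeSubject(String subject) {
        return subject.getBytes(StandardCharsets.UTF_8);
    }

    /** Incoming, UTF-8 aware: multi-byte sequences are reassembled into one char. */
    static String decodeUtf8(byte[] wire) {
        return new String(wire, StandardCharsets.UTF_8);
    }

    /** Incoming, single-byte (assumed ISO-8859-1): every byte becomes its own char. */
    static String decodeSingleByte(byte[] wire) {
        return new String(wire, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String subject = "test.l\u00f8p";        // \u00f8 is the o with stroke from the report
        byte[] wire = encodeSubject(subject);     // 0xC3 0xB8 for that one character

        System.out.println(wire.length);                              // 9 bytes for 8 chars
        System.out.println(decodeUtf8(wire).equals(subject));         // true
        System.out.println(decodeSingleByte(wire).equals(subject));   // false
        System.out.println(decodeSingleByte(wire).length());          // 9 chars, two of them garbage
    }
}
```

Publishing works either way because encoding is unambiguous; only the receiving side has to choose how to interpret the bytes, which is why the option matters for incoming messages.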
Awesome, thanks a lot 😊👍 |
Observed behavior
I set up a consumer for a topic containing the UTF-8 character ø, e.g. test.løp. nats subscribe 'test.løp' picks up messages published with nats publish test.løp "Hello". The jnats consumer receives a message with the subject "test.lᅢᄌp". jnats is able to publish messages with non-ASCII characters correctly.
Expected behavior
I expect the subject received by jnats to be read as the UTF-8 string "test.løp", not "test.lᅢᄌp"
Server and client version
jnats 2.17.7
nats-server 2.10.12
Host environment
Mac OSX
Steps to reproduce