Commit
Change wording from UTF-16 to Unicode
I believe the wording of the Protocol Guide confuses Unicode with character encodings such as UTF-16.

Citing ECMA-404, chapter 1 "Scope":
>JSON syntax describes a sequence of Unicode code points.

Citing ECMA-404, chapter 9 "String":
>A string is a sequence of Unicode code points wrapped with quotation marks (U+0022).

JSON is by definition a sequence of Unicode code points. Its fields have no character encoding associated with them at the conceptual level. Only when the JSON is serialized, e.g. for transport over the wire, is this sequence of Unicode characters encoded using a specific character encoding.

Talking about a specific UTF encoding of a JSON field and then referring to string length in code points is confusing. The wording seems to imply that this specific field is serialized differently from the rest of the JSON sequence, which is impossible. Moreover, the fact that this JSON is then encoded using UTF-16 is irrelevant to the remark about the length of this field, and it is already covered by this sentence:
>The canonical format is defined by the ECMA-262 6th Edition section
>JSON.stringify. For an example, see how the above message is formatted.

I decided to replace the phrase "UTF-16" with "Unicode" instead of removing it, to make sure that the phrase "code units" is explicit.
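To illustrate the distinction, here is a minimal Python sketch. The helper `length_report`, the key name `field`, and the sample string are hypothetical and not taken from the Protocol Guide. It shows that the same string has different lengths depending on whether you count Unicode code points, UTF-16 code units, or UTF-8 bytes, and that the JSON value is identical no matter which encoding carries the document on the wire.

```python
import json


def length_report(text: str) -> dict:
    """Count the same string three ways (hypothetical helper for illustration)."""
    return {
        # Python strings are sequences of Unicode code points.
        "code_points": len(text),
        # Each UTF-16 code unit is 2 bytes; little-endian variant avoids a BOM.
        "utf16_code_units": len(text.encode("utf-16-le")) // 2,
        "utf8_bytes": len(text.encode("utf-8")),
    }


# 'a' followed by U+1F600, which lies outside the Basic Multilingual Plane
# and therefore needs a surrogate pair in UTF-16.
sample = "a\U0001F600"
report = length_report(sample)

# The encoding applies to the whole serialized document, not to one field.
document = json.dumps({"field": sample}, ensure_ascii=False)
wire_utf8 = document.encode("utf-8")
wire_utf16 = document.encode("utf-16-le")

decoded_utf8 = json.loads(wire_utf8.decode("utf-8"))
decoded_utf16 = json.loads(wire_utf16.decode("utf-16-le"))

print(report)
print(decoded_utf8 == decoded_utf16)
```

Under these assumptions the three counts differ for the same string, while both wire encodings decode to the same JSON value. That is why a length limit has to name its unit explicitly instead of naming a transport encoding.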