Change wording from UTF-16 to Unicode

I believe the wording of the Protocol Guide confuses Unicode with character encodings such as UTF-16.

Citing ECMA-404, chapter 1 "Scope":

> JSON syntax describes a sequence of Unicode code points.

Citing ECMA-404, chapter 9 "String":

> A string is a sequence of Unicode code points wrapped with quotation marks (U+0022).

JSON is by definition a sequence of Unicode code points. Its fields have no character encoding associated with them at the conceptual level. Only when the sequence is serialized, e.g. for transport over the wire, is it encoded using a specific character encoding.

Talking about a specific UTF encoding of a JSON field and then referring to string length in code points is confusing. The wording seems to imply that this one field is serialized differently from the rest of the JSON text, which is impossible. Moreover, the fact that this JSON is then encoded using UTF-16 is irrelevant to the remark about the length of this field, and it is already covered by this sentence:

> The canonical format is defined by the ECMA-262 6th Edition section JSON.stringify. For an example, see how the above message is formatted.

I decided to replace the phrase "UTF-16" with "Unicode" instead of removing it, to make sure that the phrase "code units" stays explicit.
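A short sketch makes the distinction concrete: the same JSON string has a different "length" depending on whether you count code points, UTF-16 code units or UTF-8 bytes. It is written in TypeScript because the canonical format is tied to ECMA-262's `JSON.stringify`, and in ECMAScript `String.prototype.length` counts UTF-16 code units. The names `StringLengths`, `measure` and `typeLengthOk` are hypothetical and not part of any Scuttlebutt library. The 3 to 52 bound comes from the `type` rule in the diff; the unit is a parameter because which unit that rule counts is exactly what the wording leaves open.

```typescript
// Hypothetical illustration, not taken from any SSB implementation.
// ECMA-404 only defines a sequence of code points; code units and bytes
// exist only once an encoding (UTF-16 in memory, UTF-8 on the wire...) is chosen.

interface StringLengths {
  codePoints: number;     // Unicode code points (what ECMA-404 describes)
  utf16CodeUnits: number; // what JavaScript's String.prototype.length returns
  utf8Bytes: number;      // encoded size if the JSON text is sent as UTF-8
}

function measure(s: string): StringLengths {
  let codePoints = 0;
  let utf8Bytes = 0;
  // Walk the string by code point: a surrogate pair is one code point
  // but two UTF-16 code units, so step by 2 when we meet one.
  for (let i = 0; i < s.length; ) {
    const cp = s.codePointAt(i)!;
    codePoints += 1;
    utf8Bytes += cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
    i += cp > 0xffff ? 2 : 1;
  }
  return { codePoints, utf16CodeUnits: s.length, utf8Bytes };
}

// Hypothetical check of the "between 3 and 52 (inclusive)" rule,
// parameterised by the unit being counted.
function typeLengthOk(type: string, unit: keyof StringLengths): boolean {
  const n = measure(type)[unit];
  return n >= 3 && n <= 52;
}
```

For example, `"post"` measures 4 in every unit, while `"a\uD83D\uDC1A"` (the letter a followed by U+1F41A) is 2 code points, 3 UTF-16 code units and 5 UTF-8 bytes, so it fails a code-point reading of the rule but passes a UTF-16 code-unit reading.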
ssbc · Apr 13, 2022 · 7ba2a0c
1 parent c04d813
Showing 1 changed file with 1 addition and 1 deletion.
diff --git a/index.html b/index.html
@@ -859,7 +859,7 @@ <h3 id="message-format">Message format</h3>
             </tr>
             <tr>
                 <td>content</td>
-                <td>If the message is not encrypted, This is a dictionary containing free-form data for applications to interpret, plus a mandatory <em>type</em> field. The <em>type</em> field allows applications to filter out message types they don’t understand and must be a UTF-16 string between 3 and 52 code units long (inclusive). If the message is encrypted, then this is a base64 encoded string, followed by a suffix of <code>.box</code>; we will describe private messages later in this document.</td>
+                <td>If the message is not encrypted, This is a dictionary containing free-form data for applications to interpret, plus a mandatory <em>type</em> field. The <em>type</em> field allows applications to filter out message types they don’t understand and must be a Unicode string between 3 and 52 code units long (inclusive). If the message is encrypted, then this is a base64 encoded string, followed by a suffix of <code>.box</code>; we will describe private messages later in this document.</td>
             </tr>
         </table>
         <aside style="align-self: start; position: relative; top: 19px;">