keysyms: Add sharp S upper case mapping exception
The case mapping `ssharp` ß ↔ `U1E9E` ẞ was added in
13b30f4 but was broken:
- For the lower case mapping it returned the keysym `0x10000df`, which
  is an invalid Unicode keysym.
- For the upper case mapping it returned the upper Unicode code point
  rather than the corresponding keysym.
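
To illustrate why `0x10000df` is not a valid keysym, here is a minimal sketch of the usual code point to keysym rule. The helper name `code_point_to_keysym` is hypothetical, and the sketch deliberately ignores legacy named keysyms and control characters; it is not the libxkbcommon implementation.

```python
# Offset used by Unicode keysyms (e.g. U1E9E is 0x1001e9e).
UNICODE_KEYSYM_OFFSET = 0x01000000


def code_point_to_keysym(cp: int) -> int:
    """Hypothetical helper: map a Unicode code point to a keysym.

    Simplification: legacy named keysyms and control characters are ignored.
    """
    if 0x20 <= cp <= 0x7E or 0xA0 <= cp <= 0xFF:
        # Printable Latin-1: the keysym equals the code point (ssharp is 0x00df).
        return cp
    if 0x100 <= cp <= 0x10FFFF:
        # Everything else in this sketch uses the Unicode keysym range.
        return UNICODE_KEYSYM_OFFSET + cp
    raise ValueError(f"unsupported code point: U+{cp:04X}")


ssharp = code_point_to_keysym(0x00DF)  # keysym `ssharp`
capital_sharp_s = code_point_to_keysym(0x1E9E)  # keysym `U1E9E`
# Blindly adding the offset to a Latin-1 code point gives the broken value:
broken = UNICODE_KEYSYM_OFFSET + 0x00DF
```

Here `broken` is `0x10000df`, the value described in the first bullet above, while the expected result is the plain `ssharp` keysym `0x00df`.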

It did accidentally enable the detection of the alphabetic key type for the
pair (ß, ẞ), though. However, this detection was inadvertently removed in
5c7c799 (v1.7) in an attempt to fix
the wrong keysym case mapping. Finally, both the *lower* case mapping
and the key type detection were fixed for good when we implemented the
complete Unicode simple case mappings and corresponding tests in
e83d08d.

However, the *upper* case mapping `ssharp` → `U1E9E` remained disabled.
Indeed, ẞ is a relatively recent addition to Unicode (2008) and had no
official recommendation, until recently. So while the lower mapping ẞ→ß
exists in Unicode, its converse upper mapping does not. Yet since 2017
the Council for German Orthography (Rat für deutsche Rechtschreibung)
recommends[^1] ẞ as the capitalization of ß.
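
The asymmetry can be observed with Python's built-in string methods. This is only an illustration: `str.lower()` and `str.upper()` apply the *full* Unicode case mappings rather than the simple ones used by libxkbcommon, but for this pair they show the same one-way relationship.

```python
capital_sharp_s = "\u1E9E"  # ẞ
small_sharp_s = "\u00DF"  # ß

# The lower mapping ẞ → ß exists in Unicode.
lowered = capital_sharp_s.lower()

# The converse does not: the full upper mapping of ß is the two-letter "SS",
# and there is no one-to-one (simple) upper mapping to ẞ.
uppered = small_sharp_s.upper()

round_trips = uppered == capital_sharp_s
```

After running this, `lowered` is ß, `uppered` is the two-character string `SS`, and `round_trips` is false.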

Due to its stability policies, the Unicode Character Database (UCD)
that we use to generate our keysym case mappings (via ICU) cannot update
the simple case mapping of ß. Discussions are currently ongoing on the
Unicode mailing list[^2] and in CLDR[^3] about how to deal with the newly
recommended case mapping. However, these discussions focus on
text processing and compatibility mappings, whereas libxkbcommon
operates at a lower level.

It seems that the slow adoption of ẞ is partly due to the difficulty
of typing it. Since ẞ is used only for ALL CAPS casing, the expectation
is to type it using CapsLock. While our detection of alphabetic key
types works well[^4] for the pair (ß, ẞ), the *internal capitalization*
currently does not work; this commit fixes it.

Added the ß → ẞ upper mapping:
- Added an exception in the generation script
- Fixed tests
- Added documentation of the exceptions in `xkbcommon.h`
- Added/updated log entries

[^1]: https://www.rechtschreibrat.com/regeln-und-woerterverzeichnis/
[^2]: https://corp.unicode.org/pipermail/unicode/2024-November/011162.html
[^3]: https://unicode-org.atlassian.net/browse/CLDR-17624
[^4]: Except libxkbcommon 1.7, see the second paragraph.
wismill committed Dec 15, 2024
1 parent 0ebdc4d commit b9b4ab4
Showing 11 changed files with 456 additions and 338 deletions.
2 changes: 2 additions & 0 deletions changes/api/+großes-ẞ.breaking.md
@@ -0,0 +1,2 @@
+Added the upper case mapping ß → ẞ (`ssharp` → `U1E9E`). This enables typing
+ẞ using CapsLock thanks to the internal capitalization rules.
2 changes: 2 additions & 0 deletions changes/api/+großes-ẞ.bugfix.md
@@ -0,0 +1,2 @@
+Fixed the lower case mapping ẞ → ß (`U1E9E` → `ssharp`). This re-enables the detection
+of alphabetic key types for the pair (ß, ẞ).
9 changes: 6 additions & 3 deletions changes/api/+unicode-16.breaking.md
@@ -5,13 +5,16 @@ the following:
 - `xkb_keysym_to_lower()` and `xkb_keysym_to_upper()` give different output
 for keysyms not covered previously and handle *title*-cased keysyms.
 
-Example of title-cased keysym: `0x10001f2` (`U+01F2` “Dz”):
-- `xkb_keysym_to_lower(0x10001f2) == 0x10001f3` (`U+01F3` “dz”)
-- `xkb_keysym_to_upper(0x10001f2) == 0x10001f1` (`U+01F1` “DZ”)
+Example of title-cased keysym: `U01F2` “Dz”:
+- `xkb_keysym_to_lower(U01F2) == U01F3` “Dz” → “dz”
+- `xkb_keysym_to_upper(U01F2) == U01F1` “Dz” → “DZ”
 - *Implicit* alphabetic key types are better detected, because they use the
 latest Unicode case mappings and now handle the *title*-cased keysyms the
 same way as upper-case ones.
 
+Note: There is a single *exception* that does not follow the Unicode mappings:
+- `xkb_keysym_to_upper(ssharp) == U1E9E` “ß” → “ẞ”
+
 Note: As before, only *simple* case mappings (i.e. one-to-one) are supported.
 For example, the full upper case of `U+01F0` “ǰ” is “J̌” (2 characters: `U+004A`
 and `U+030C`), which would require 2 keysyms, which is not supported by the
1 change: 1 addition & 0 deletions data/keysyms.yaml
@@ -560,6 +560,7 @@
 0x00df:
   name: ssharp
   code point: 0x00DF
+  upper: 0x1001e9e # U1E9E
 0x00e0:
   name: agrave
   code point: 0x00E0
15 changes: 12 additions & 3 deletions include/xkbcommon/xkbcommon.h
@@ -552,9 +552,18 @@ xkb_utf32_to_keysym(uint32_t ucs);
  * If there is no such form, the keysym is returned unchanged.
  *
  * The conversion rules are the *simple* (i.e. one-to-one) Unicode case
- * mappings and do not depend on the locale. If you need the special
- * case mappings (i.e. not one-to-one or locale-dependent), prefer to
- * work with the Unicode representation instead, when possible.
+ * mappings (with some exceptions, see hereinafter) and do not depend
+ * on the locale. If you need the special case mappings (i.e. not
+ * one-to-one or locale-dependent), prefer to work with the Unicode
+ * representation instead, when possible.
+ *
+ * Exceptions to the Unicode mappings:
+ *
+ * | Lower keysym | Lower letter | Upper keysym | Upper letter | Comment |
+ * | ------------ | ------------ | ------------ | ------------ | ------- |
+ * | `ssharp`     | `U+00DF`: ß  | `U1E9E`      | `U+1E9E`: ẞ  | [Council for German Orthography] |
+ *
+ * [Council for German Orthography]: https://www.rechtschreibrat.com/regeln-und-woerterverzeichnis/
  *
  * @since 0.8.0: Initial implementation, based on `libX11`.
  * @since 1.8.0: Use Unicode 16.0 mappings for complete Unicode coverage.
8 changes: 6 additions & 2 deletions meson.build
@@ -742,8 +742,12 @@ test(
 )
 test(
     'keymap',
-    executable('test-keymap', 'test/keymap.c', 'test/keysym.h',
-               dependencies: test_dep),
+    executable(
+        'test-keymap',
+        'test/keymap.c',
+        'test/keysym.h',
+        'test/keysym-case-mapping.h',
+        dependencies: test_dep),
     env: test_env,
 )
 test(
50 changes: 46 additions & 4 deletions scripts/update-unicode.py
@@ -90,6 +90,7 @@
 from pathlib import Path
 from typing import (
     Any,
+    ClassVar,
     Generator,
     Generic,
     Iterable,
@@ -101,15 +102,18 @@
TypeVar,
cast,
)
import unicodedata

import icu
import jinja2
import yaml

assert sys.version_info >= (3, 12)

c = icu.Locale.createFromName("C")
icu.Locale.setDefault(c)

SCRIPT = Path(__file__)
CodePoint = NewType("CodePoint", int)
Keysym = NewType("Keysym", int)
KeysymName = NewType("KeysymName", str)
@@ -294,6 +298,9 @@ class Entry:
     upper: int
     is_lower: bool
     is_upper: bool
+    # [NOTE] Exceptions must be documented in `xkbcommon.h`.
+    to_upper_exceptions: ClassVar[dict[str, str]] = {"ß": "ẞ"}
+    "Upper mappings exceptions"
 
     @classmethod
     def zeros(cls) -> Self:
@@ -326,16 +333,20 @@ def lower_delta(cls, cp: CodePoint) -> int:
     def upper_delta(cls, cp: CodePoint) -> int:
         return cp - cls.to_upper_cp(cp)
 
-    @staticmethod
-    def to_upper_cp(cp: CodePoint) -> CodePoint:
+    @classmethod
+    def to_upper_cp(cls, cp: CodePoint) -> CodePoint:
+        if upper := cls.to_upper_exceptions.get(chr(cp)):
+            return ord(upper)
         return icu.Char.toupper(cp)
 
     @staticmethod
     def to_lower_cp(cp: CodePoint) -> CodePoint:
         return icu.Char.tolower(cp)
 
-    @staticmethod
-    def to_upper_char(char: str) -> str:
+    @classmethod
+    def to_upper_char(cls, char: str) -> str:
+        if upper := cls.to_upper_exceptions.get(char):
+            return upper
         return icu.Char.toupper(char)
 
     @staticmethod
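
The override pattern in this hunk (consult an exception table first, then fall back to the default mapping) can be sketched without ICU. In the sketch below, `default_upper_cp` is a hypothetical stand-in for `icu.Char.toupper`: it approximates the simple upper mapping by using `str.upper()` and keeping the code point unchanged when the result is not a single character. That stand-in is an assumption made for the sake of a self-contained example, not what the script does.

```python
TO_UPPER_EXCEPTIONS: dict[str, str] = {"ß": "ẞ"}


def default_upper_cp(cp: int) -> int:
    """Hypothetical stand-in for `icu.Char.toupper` (simple mapping only)."""
    upper = chr(cp).upper()
    # Multi-character results (e.g. ß → "SS") have no simple mapping:
    # return the code point unchanged in that case.
    return ord(upper) if len(upper) == 1 else cp


def to_upper_cp(cp: int) -> int:
    """Look up the exception table first, then use the default mapping."""
    if upper := TO_UPPER_EXCEPTIONS.get(chr(cp)):
        return ord(upper)
    return default_upper_cp(cp)


without_exception = default_upper_cp(0x00DF)  # stays U+00DF
with_exception = to_upper_cp(0x00DF)  # becomes U+1E9E
```

Keeping the exceptions in a single table, as the commit does with `to_upper_exceptions`, means the generated mappings and the generated tests can both be driven from the same data.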
@@ -1954,6 +1965,37 @@ def run(
best_solution.test(config)
if write:
best_solution.write(root)
cls.write_tests(root)

+    @classmethod
+    def write_tests(cls, root: Path) -> None:
+        # Configure Jinja
+        template_loader = jinja2.FileSystemLoader(root, encoding="utf-8")
+        jinja_env = jinja2.Environment(
+            loader=template_loader,
+            keep_trailing_newline=True,
+            trim_blocks=True,
+            lstrip_blocks=True,
+        )
+
+        def code_point_name_constant(c: str, padding: int = 0) -> str:
+            if not (name := unicodedata.name(c)):
+                raise ValueError(f"No Unicode name for code point: U+{ord(c):0>4X}")
+            name = name.replace("-", "_").replace(" ", "_").upper()
+            return name.ljust(padding)
+
+        jinja_env.filters["code_point"] = lambda c: f"0x{ord(c):0>4x}"
+        jinja_env.filters["code_point_name_constant"] = code_point_name_constant
+        path = root / "test/keysym-case-mapping.h"
+        template_path = path.with_suffix(f"{path.suffix}.jinja")
+        template = jinja_env.get_template(str(template_path.relative_to(root)))
+        with path.open("wt", encoding="utf-8") as fd:
+            fd.writelines(
+                template.generate(
+                    upper_exceptions=Entry.to_upper_exceptions,
+                    script=SCRIPT.relative_to(root),
+                )
+            )
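
To see what the two Jinja filters registered above would produce for the exception pair, here is a standalone sketch using only the standard library (no Jinja, no template). The function bodies mirror the hunk, with one deliberate difference that is an assumption of this sketch: `unicodedata.name()` is called with a default value of `""`, because without a default it raises `ValueError` on its own for unnamed code points instead of returning an empty value.

```python
import unicodedata


def code_point(c: str) -> str:
    """Format a character as a zero-padded hexadecimal code point."""
    return f"0x{ord(c):0>4x}"


def code_point_name_constant(c: str, padding: int = 0) -> str:
    """Turn a Unicode character name into an upper-case C-style identifier."""
    # Sketch-only change: pass "" as default so the check below is reachable.
    if not (name := unicodedata.name(c, "")):
        raise ValueError(f"No Unicode name for code point: U+{ord(c):0>4X}")
    name = name.replace("-", "_").replace(" ", "_").upper()
    return name.ljust(padding)


upper_exceptions = {"ß": "ẞ"}
rendered = [
    (code_point_name_constant(lower), code_point(lower),
     code_point_name_constant(upper), code_point(upper))
    for lower, upper in upper_exceptions.items()
]
```

For the single exception, `rendered` holds the names `LATIN_SMALL_LETTER_SHARP_S` and `LATIN_CAPITAL_LETTER_SHARP_S` with the code points `0x00df` and `0x1e9e`. How the real template lays these out in `test/keysym-case-mapping.h` is not shown in this diff.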


################################################################################