Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor to improve the user experience with non-ASCII characters #3206

Merged
merged 14 commits into from
Jun 29, 2024

Conversation

seisman
Copy link
Member

@seisman seisman commented Apr 27, 2024

See #2204 (comment) for background and the issue.

This PR solves the issue by:

Preview:
The tables of supported encodings are at
https://pygmt-dev--3206.org.readthedocs.build/en/3206/techref/encodings.html.

This page is in the "Technical Reference" section, as discussed in #2834.

Note to maintainers

Below is the script used to generate the Markdown tables in the documentation

from pygmt.encodings import charset

undefined = "\ufffd"
markdown = ""
for charset_name, mappings in charset.items():
    markdown += f"## Adobe {charset_name} Encoding\n\n"
    markdown += "| octal | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |\n"
    markdown += "|---|---|---|---|---|---|---|---|---|\n"
    for i in range(0o00, 0o400, 8):
        chars = [mappings.get(j, undefined) for j in range(i, i + 8)]
        if chars == [undefined] * 8:
            continue
        chars = [f"&#x{ord(char):04x};" for char in chars]
        row = f"\\{i:03o}"[:-1] + "x"
        markdown += f"| **{row}** | {' | '.join(chars)} |\n"
    markdown += "\n"

print(markdown)

@seisman seisman force-pushed the encodings branch 3 times, most recently from 7d14bfc to 807ed18 Compare April 29, 2024 00:45
@seisman seisman force-pushed the encodings branch 2 times, most recently from 3d325e9 to 7f95045 Compare June 19, 2024 13:21
@seisman seisman changed the title WIP + POC: Improved support of non-ASCII characters Refactor the mapping dictionary to improve the user experience with non-ASCII characters Jun 19, 2024
The translated string.

Examples
--------
>>> non_ascii_to_octal("•‰“”±°ÿ")
'\\31\\214\\216\\217\\261\\260\\377'
>>> non_ascii_to_octal("αζΔΩ∑π∇")
>>> non_ascii_to_octal("αζ∆Ω∑π∇")
Copy link
Member Author

@seisman seisman Jun 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In [19]: "Ω" == "Ω"
Out[19]: False

In [20]: "Δ" == "∆"
Out[20]: False

In [21]: import unicodedata

In [22]: unicodedata.name("Ω")
Out[22]: 'OHM SIGN'

In [23]: unicodedata.name("Ω")
Out[23]: 'GREEK CAPITAL LETTER OMEGA'

In [24]: unicodedata.name("Δ")
Out[24]: 'GREEK CAPITAL LETTER DELTA'

In [25]: unicodedata.name("∆")
Out[25]: 'INCREMENT'

We were using incorrect characters previously!

@@ -417,7 +417,7 @@ def test_text_nonascii():
fig.basemap(region=[0, 10, 0, 10], projection="X10c", frame=True)
fig.text(position="TL", text="position-text:°α") # noqa: RUF001
fig.text(x=1, y=1, text="xytext:°α") # noqa: RUF001
fig.text(x=[5, 5], y=[3, 5], text=["xytext1:αζΔ❡", "xytext2:∑π∇✉"])
fig.text(x=[5, 5], y=[3, 5], text=["xytext1:αζ∆❡", "xytext2:∑π∇✉"])
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Δ" != "∆".

@seisman seisman added the enhancement Improving an existing feature label Jun 19, 2024
@seisman seisman added this to the 0.13.0 milestone Jun 19, 2024
@seisman seisman marked this pull request as ready for review June 19, 2024 13:32
@seisman seisman added the needs review This PR has higher priority and needs review. label Jun 19, 2024
charset["ZapfDingbats"] = dict(
zip(
[*range(0o040, 0o220), *range(0o240, 0o400)],
"\u0020\u2701\u2702\u2703\u2704\u260e\u2706\u2707"
Copy link
Member Author

@seisman seisman Jun 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In previous version, we maintained a big dictionary like:

 ✁✂✃✄☎✆✇✈✉☛☞✌✍✎✏

but here I decide to use:

\u0020\u2701\u2702\u2703\u2704\u260e\u2706\u2707

The main reasons are:

  • Avoid having too many non-ASCII characters in the source code
  • Some non-ASCII characters can't be displayed correctly in text editors or GitHub web UI

For similar reasons, in the documentation, I use HTML syntax like

  | ✁ | ✂ | ✃ | ✄ | ☎ | ✆ | ✇ |

@seisman seisman changed the title Refactor the mapping dictionary to improve the user experience with non-ASCII characters Refactor to improve the user experience with non-ASCII characters Jun 19, 2024
pygmt/encodings.py Outdated Show resolved Hide resolved
@seisman seisman requested a review from a team June 22, 2024 05:08
pygmt/encodings.py Outdated Show resolved Hide resolved
pygmt/encodings.py Outdated Show resolved Hide resolved
Co-authored-by: Yvonne Fröhlich <[email protected]>
@@ -0,0 +1,7 @@
# Technical Reference
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed in #2834, the documentation group is called "Technical Reference". Here the directory name is techref. Please let me know if you're OK with the name.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, ok with the title name. We could maybe add a short paragraph under this title to mention what this page is about? Also would be good to link to https://docs.generic-mapping-tools.org/6.5/reference.html.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added in f2b3dcb.

@seisman
Copy link
Member Author

seisman commented Jun 25, 2024

Ping @GenericMappingTools/pygmt-maintainers for reviews and will self-approve and merge in one week if no further comments.

@seisman seisman added final review call This PR requires final review and approval from a second reviewer and removed needs review This PR has higher priority and needs review. labels Jun 25, 2024
Copy link
Member

@weiji14 weiji14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some minor typos and one suggestion on adding a short descriptive paragraph in the Technical Reference page. Haven't checked every single character in the tables, but I trust that they're ok 🙂

@@ -0,0 +1,7 @@
# Technical Reference
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, ok with the title name. We could maybe add a short paragraph under this title to mention what this page is about? Also would be good to link to https://docs.generic-mapping-tools.org/6.5/reference.html.

doc/techref/encodings.md Outdated Show resolved Hide resolved
doc/techref/encodings.md Outdated Show resolved Hide resolved
doc/techref/encodings.md Outdated Show resolved Hide resolved
doc/techref/index.md Outdated Show resolved Hide resolved
@seisman seisman merged commit b907b4a into main Jun 29, 2024
19 checks passed
@seisman seisman deleted the encodings branch June 29, 2024 01:41
@seisman seisman removed the final review call This PR requires final review and approval from a second reviewer label Jun 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Improving an existing feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants