Support non-ASCII characters in function arguments #2584

seisman · 2023-06-24T08:06:18Z

Description of proposed changes

Support non-ASCII characters in PyGMT arguments, so that users don't have to use octal codes.
See #2204 for context.

Working example

import pygmt

fig = pygmt.Figure()
fig.basemap(region=[0, 10, 0, 5], projection="x1c", frame="WSEN+tABC¥±°″≥×≈∇⇔∑DEF")
fig.show()

it produces:

Notes to maintainers
The Symbol and ZapfDingbats charsets can be obtained from https://unicode.org/Public/MAPPINGS/VENDORS/ADOBE/zdingbat.txt and https://unicode.org/Public/MAPPINGS/VENDORS/ADOBE/symbol.txt. The following script was used to generate the list of characters, but manual editing is still necessary.

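i = 0  # number of characters printed so far
with open("zdingbat.txt", "r") as fin:
    for line in fin:
        # Skip comment lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        # The first column is the Unicode code point in hexadecimal.
        hexval = line.split()[0].strip()
        print(chr(int(hexval, base=16)), end="")
        i += 1
        # Start a new row after every 16 characters.
        if i % 16 == 0:
            print()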
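
A variant of the script that keeps the Adobe code next to each character may make the manual checking easier. This is only a sketch: it assumes each data line has the Unicode code point in the first whitespace-separated column and the Adobe (ZapfDingbats or Symbol) code in the second, both in hexadecimal. The function name `parse_adobe_mapping` and the hand-written sample lines are illustrative and not part of this PR.

```python
def parse_adobe_mapping(lines):
    """Return {adobe_code: character} from lines in the assumed vendor format."""
    table = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        unicode_hex, adobe_hex = line.split()[:2]
        # Keep the first character listed for a code, in case a code is repeated.
        table.setdefault(int(adobe_hex, 16), chr(int(unicode_hex, 16)))
    return table


# Hand-written sample in the assumed layout; the real file would be read with open().
sample = [
    "# comment line",
    "2701\t21\t# UPPER BLADE SCISSORS\t# a1",
    "2702\t22\t# BLACK SCISSORS\t# a2",
]
for code, char in parse_adobe_mapping(sample).items():
    # Print the octal code as GMT would expect it, followed by the character.
    print(f"\\{code:o} -> {char}")
```

Printing the octal code together with the character makes it straightforward to compare each row against the octal-code tables in the GMT documentation.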

TODO

Support all non-ASCII characters in the ISOLatin1+ character set
Support non-ASCII characters in the Symbols character set
Support non-ASCII characters in the ZapfDingbats character set
Check PS_CHAR_ENCODING so it also works with Standard+ character set?
Check arguments that are not processed by build_arg_string (e.g., text in Figure.text)
Add a gallery example or a tutorial?
update any existing examples

Closes #2204 Closes #2000

Reminders

Run make format and make check to make sure the code follows the style guide.
Add tests for new features or tests that would have caught the bug that you're fixing.
Add new public functions/methods/classes to doc/api/index.rst.
Write detailed docstrings for all functions/methods.
If wrapping a new module, open a 'Wrap new GMT module' issue and submit reasonably-sized PRs.
If adding new functionality, add an example to docstrings or tutorials.
Use underscores (not hyphens) in names of Python files and directories.

Slash Commands

You can write slash commands (/command) in the first line of a comment to perform
specific operations. Supported slash commands are:

/format: automatically format and lint the code
/test-gmt-dev: run full tests on the latest GMT development version

pygmt/helpers/utils.py

Co-authored-by: Yvonne Fröhlich <[email protected]>

seisman · 2023-06-26T13:43:47Z

This new feature is almost done and is ready for more testing. Please let me know what you think.

weiji14 · 2023-06-26T20:44:21Z

Looking good! Could you also change this line in the colorbar gallery example to use degrees perhaps?

pygmt/examples/gallery/embellishments/colorbar.py

Line 43 in dde23f1

frame=["x+lTemperature", r"y+l\260C"],

Also, do we need to test different PS_CHAR_ENCODING settings like Standard, Standard+, ISOLatin1, ISOLatin1+, and ISO-8859-x? Currently the mapping seems to assume ISOLatin1, but if a user sets PS_CHAR_ENCODING to another encoding, the output might be different.

seisman · 2023-06-27T00:24:51Z

Looking good! Could you also change this line in the colorbar gallery example to use degrees perhaps?

pygmt/examples/gallery/embellishments/colorbar.py

Line 43 in dde23f1

frame=["x+lTemperature", r"y+l\260C"],

Done in e1b43b2.

Also, do we need to test different PS_CHAR_ENCODING settings like Standard, Standard+, ISOLatin1, ISOLatin1+, and ISO-8859-x? Currently the mapping seems to assume ISOLatin1, but if a user sets PS_CHAR_ENCODING to another encoding, the output might be different.

ISOLatin1+ is an extension of ISOLatin1, which means ISOLatin1+ supports more characters than ISOLatin1 (https://docs.generic-mapping-tools.org/latest/cookbook/octal-codes.html). Thus, it does no harm to support characters that exist only in ISOLatin1+, even if users set the ISOLatin1 encoding.

For Standard+ and other ISO-8859-x encodings, yes, we can definitely support them. But it also means we have to carefully check their differences and maintain the corresponding mapping dictionary from character to octal codes.

seisman · 2023-06-27T00:26:32Z

pygmt/helpers/utils.py

+    # ISOLatin1+ charset: \240-\377
+    mapping.update({chr(i): "\\" + format(i, "o") for i in range(160, 256)})


Actually, Python supports many different encodings, so it should be possible to implement this feature without manually maintaining the big dictionary, just as I already do at line 166 for the ISOLatin1+ characters \240 to \377. But I don't know enough about Python encodings yet.
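
For reference, here is a minimal sketch of how the generated part of the mapping from the diff above can be applied to a string. The helper name `to_octal` is hypothetical, and the sketch covers only the \240-\377 range, not the Symbol or ZapfDingbats characters.

```python
# Mapping from the diff above: ISOLatin1+ characters \240-\377 to octal escapes.
mapping = {chr(i): "\\" + format(i, "o") for i in range(160, 256)}


def to_octal(text):
    """Replace mapped characters by octal escapes; leave everything else as is."""
    return "".join(mapping.get(char, char) for char in text)


print(to_octal("y+l°C"))  # y+l\260C
```

The degree sign is U+00B0, i.e. 176 in decimal and 260 in octal, which is why the generated entry matches the `\260` used in the colorbar example.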

Update:

Can be done using:

for i in range(160, 256): print(i, chr(i), chr(i).encode("iso-8859-1").decode("iso-8859-5"))

Here is the improved version:

for code in [*range(0o040, 0o200), *range(0o240, 0o400)]: char = codecs.decode(bytes([code]), "iso8859-5", errors="replace")
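
The improved version decodes each byte value directly, so the round trip through ISO-8859-1 is no longer needed, and `errors="replace"` turns a byte that the encoding leaves undefined into U+FFFD instead of raising an exception. A minimal sketch that wraps the loop into a function follows. The name `charset_table` and the decision to drop undefined codes are assumptions made for illustration.

```python
import codecs


def charset_table(encoding):
    """Return {code: character} for codes \\040-\\177 and \\240-\\377."""
    table = {}
    for code in [*range(0o040, 0o200), *range(0o240, 0o400)]:
        char = codecs.decode(bytes([code]), encoding, errors="replace")
        if char != "\ufffd":  # U+FFFD marks a byte the encoding does not define
            table[code] = char
    return table


cyrillic = charset_table("iso8859-5")
print(cyrillic[0o260])  # А (CYRILLIC CAPITAL LETTER A)
```

Inverting such a table gives the character-to-octal dictionary for that encoding, so the per-encoding tables would not need to be typed by hand.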

pygmt/helpers/utils.py

examples/gallery/embellishments/colorbar.py

Co-authored-by: Yvonne Fröhlich <[email protected]>

seisman · 2023-08-02T08:08:25Z

pygmt/helpers/utils.py

+                + "αβχδεφγηιϕκλμνο"  # \14x-15x
+                + "πθρστυϖωξψζ{|}∼"  # \16x-17x. \177 is undefined
+                + "€ϒ′≤⁄∞ƒ♣♦♥♠↔←↑→↓"  # \24x-\25x
+                + "°±″≥×∝∂•÷≠≡≈…↵"  # \26x-27x
+                + "ℵℑℜ℘⊗⊕∅∩∪⊃⊇⊄⊂⊆∈∉"  # \30x-31x
+                + "∠∇®©™∏√⋅¬∧∨⇔⇐⇑⇒⇓"  # \32x-33x
+                + "◊〈®©™∑"  # \34x-35x
+                + "〉∫⌠⌡",  # \36x-37x. \360 and \377 are undefined


Symbols \140, \275 and \276 don't exist in the Unicode table, so they're shown incorrectly.

Some symbols of \36x and \37x are shown as boxes in the GitHub web GUI, but they're shown correctly in Vim, so these symbols should be good.

seisman · 2023-08-02T09:03:27Z

Check PS_CHAR_ENCODING so it also works with Standard+ character set?

Already answered in #2584 (comment).

For other ISO-8859-x encodings, it's possible to generate the big mapping dictionary automatically using the following code:
{chr(i).encode("iso-8859-1").decode("iso-8859-5"): "\\" + format(i, "o") for i in range(160, 256)}
but we have to check the PS_CHAR_ENCODING parameter to decide which dictionary to use. However, adding from pygmt.clib.session import Session to pygmt.helpers.utils results in circular imports.
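
One way to sidestep the circular import would be to keep the helper free of any `Session` dependency and let the caller pass the encoding in. The sketch below only illustrates that idea: the function names, the default value, and the use of Python codec names in place of GMT's PS_CHAR_ENCODING values are hypothetical and not taken from this PR.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def octal_mapping(codec):
    """Build {character: octal escape} for codes \\240-\\377 of a Python codec."""
    mapping = {}
    for i in range(160, 256):
        char = bytes([i]).decode(codec, errors="replace")
        if char != "\ufffd":  # skip bytes the codec leaves undefined
            mapping[char] = "\\" + format(i, "o")
    return mapping


def non_ascii_to_octal_sketch(text, codec="iso-8859-1"):
    """Translate text with the mapping that belongs to the given codec."""
    mapping = octal_mapping(codec)
    return "".join(mapping.get(char, char) for char in text)


# The caller, which may query GMT, decides which encoding applies.
print(non_ascii_to_octal_sketch("10°", codec="iso-8859-1"))  # 10\260
print(non_ascii_to_octal_sketch("Ж", codec="iso-8859-5"))  # \266
```

Caching the mapping per codec means each dictionary is built only once, and the helper module never needs to import anything from `pygmt.clib`.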

Check arguments that are not processed by build_arg_string (e.g., text in Figure.text)

Yes, we can also translate the strings passed by the text parameter, but we can do very little to plaintext files (we can, but it means we have to open and read the whole plaintext file, which is not efficient). So not sure what we should do here.

Add a gallery example or a tutorial?

Better to do it in a separate PR.

seisman · 2023-08-16T04:17:29Z

@GenericMappingTools/pygmt-maintainers Any comments and suggestions?

michaelgrund

Looks good to me!

weiji14

Haven't checked every individual symbol, but I trust that they are correct 🙂

Check arguments that are not processed by build_arg_string (e.g., text in Figure.text)

Yes, we can also translate the strings passed by the text parameter, but we can do very little to plaintext files (we can, but it means we have to open and read the whole plaintext file, which is not efficient). So not sure what we should do here.

OK to handle non-ASCII characters in fig.text in a separate PR.

pygmt/helpers/utils.py

Co-authored-by: Wei Ji <[email protected]>

Co-authored-by: Dongdong Tian <[email protected]>

Tenth minor release of PyGMT. * Add changelog entry to version switcher * Update compatibility table * Update citation * Add draft changelog * Add full names of contributors to changelog * Add two highlight bullet points * Combine non-ASCII character PRs #2638 and #2584 into one highlight point * Swap author positions for Dongdong and Leo * Change release date to 20230902 * Move Yvonne up a few spots --------- Co-authored-by: Yvonne Fröhlich <[email protected]> Co-authored-by: Dongdong Tian <[email protected]> Co-authored-by: Michael Grund <[email protected]>

Support non-ASCII characters

75778bf

seisman added the feature Brand new feature label Jun 24, 2023

seisman added this to the 0.10.0 milestone Jun 24, 2023

seisman added the needs review This PR has higher priority and needs review. label Jun 24, 2023

seisman added 3 commits June 25, 2023 00:50

Support all ISOlatin1 characters

e840c99

Support more ISOLatin1+ characters

1b6d940

fix

1b88cd6

yvonnefroehlich reviewed Jun 25, 2023

pygmt/helpers/utils.py

seisman and others added 7 commits June 26, 2023 10:16

Update pygmt/helpers/utils.py

311b976

Co-authored-by: Yvonne Fröhlich <[email protected]>

Refactor to make it more readable

672413b

Need to remove single quote

2dcc288

[ci skip] Use a better reference for ASCII table

c9b8254

Support Symbols charset

e2947fa

Support ZapfDingbats charset

1a84634

Refactor and add more doctests

636ace0

seisman changed the title ~~POC: Support non-ASCII characters~~ Support non-ASCII characters Jun 26, 2023

seisman added 2 commits June 27, 2023 08:12

Fix a symbol which is incorrectly copied from PDF

ffabaee

Replace octal codes with non-ASCII character in two examples

e1b43b2

seisman commented Jun 27, 2023

pygmt/helpers/utils.py

seisman added 7 commits June 27, 2023 08:30

Fix a typo in doctest

7ea78da

Merge branch 'main' into non-ascii-support

487f2d8

Fix some characters

288486c

Fix symbol characters

97a223d

Add one more reference

4ff2e56

Add two more references

388109f

Update ZapfDingbats charset

2270d9d

Add docstrings

37b0c6a

yvonnefroehlich reviewed Jul 22, 2023

examples/gallery/embellishments/colorbar.py

seisman and others added 3 commits July 22, 2023 21:59

Update examples/gallery/embellishments/colorbar.py

3362b31

Co-authored-by: Yvonne Fröhlich <[email protected]>

Merge branch 'main' into non-ascii-support

695c59b

Remove an unused pylint directive

bbc223b

seisman commented Aug 2, 2023

Fix a typo in doctest

35306bf

seisman marked this pull request as ready for review August 2, 2023 08:41

seisman added 2 commits August 5, 2023 10:39

Merge branch 'main' into non-ascii-support

e56359e

Merge branch 'main' into non-ascii-support

633b2f9

michaelgrund approved these changes Aug 16, 2023

Merge branch 'main' into non-ascii-support

cdb4ab1

weiji14 mentioned this pull request Aug 21, 2023

Add gallery example "Cross-section along a transect" #2515

Merged

7 tasks

weiji14 approved these changes Aug 21, 2023

pygmt/helpers/utils.py

seisman changed the title ~~Support non-ASCII characters~~ Support non-ASCII characters in function arguments Aug 21, 2023

seisman and others added 2 commits August 21, 2023 17:55

Update pygmt/helpers/utils.py

646465c

Co-authored-by: Wei Ji <[email protected]>

Merge branch 'main' into non-ascii-support

9ce0217

seisman merged commit 691f1d4 into main Aug 21, 2023
8 of 14 checks passed

seisman deleted the non-ascii-support branch August 21, 2023 10:54

seisman removed the needs review This PR has higher priority and needs review. label Aug 21, 2023

This was referenced Aug 21, 2023

Support non-ASCII characters in PyGMT arguments and text in Figure.text #2204

Open

Figure.text: Support non-ASCII characters in the 'text' parameter #2638

Merged

yvonnefroehlich added a commit that referenced this pull request Aug 21, 2023

Use '+u°' instead of '+u@.' following PR #2584

765f2fd

Co-authored-by: Dongdong Tian <[email protected]>

weiji14 mentioned this pull request Aug 26, 2023

Add gallery example for plotting an RGB image from an xarray.DataArray #2641

Merged

7 tasks

weiji14 added a commit that referenced this pull request Sep 1, 2023

Combine non-ASCII character PRs #2638 and #2584 into one highlight point

81dfe9d

This was referenced Apr 23, 2024

Support left/right single quotation marks in text and arguments #3192

Merged

non_ascii_to_octal: Return the input string if it only contains printable ASCII characters #3199

Merged
