Support non-ASCII characters in function arguments #2584
Co-authored-by: Yvonne Fröhlich <[email protected]>
This new feature is almost done and is ready for more testing. Please let me know what you think.
Looking good! Could you also change this line in the colorbar gallery example to use degrees perhaps?
Also, do we need to test different PS_CHAR_ENCODING settings such as Standard, Standard+, ISOLatin1, ISOLatin1+, and ISO-8859-x? The mapping currently seems to assume ISOLatin1, so if a user sets PS_CHAR_ENCODING to another encoding, the output might differ.
Done in e1b43b2.
ISOLatin1+ is an extension of ISOLatin1, which means ISOLatin1+ supports more characters than ISOLatin1 (https://docs.generic-mapping-tools.org/latest/cookbook/octal-codes.html). Thus, it does no harm to support the ISOLatin1+-only characters even if users set the ISOLatin1 encoding. For Standard+ and the other ISO-8859-x encodings, yes, we can definitely support them. But it also means we have to carefully check their differences and maintain the corresponding mapping dictionaries from characters to octal codes.
pygmt/helpers/utils.py
# ISOLatin1+ charset: \240-\377
mapping.update({chr(i): "\\" + format(i, "o") for i in range(160, 256)})
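As a minimal sketch of how such a mapping could be applied (the helper name `to_octal` is hypothetical and not taken from this PR), each character in the \240-\377 range is replaced by its backslash-escaped octal code, and every other character passes through unchanged:

```python
# Build the character-to-octal mapping for code points 160-255 (\240-\377),
# mirroring the dictionary comprehension quoted above.
mapping = {chr(i): "\\" + format(i, "o") for i in range(160, 256)}


def to_octal(text, mapping):
    """Replace every character found in `mapping` with its octal escape.

    Hypothetical helper for illustration only.
    """
    return "".join(mapping.get(char, char) for char in text)


converted = to_octal("Fröhlich 30°", mapping)
print(converted)  # Fr\366hlich 30\260
```

Plain ASCII characters are not in the dictionary, so they are left as they are.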
Actually, Python supports many different encodings, so it should be possible to implement this feature without manually maintaining a big dictionary, just like what I already do at line 166 for the ISOLatin1+ characters \240 to \377. But I don't know enough about Python encodings yet.
Update: this can be done using:
for i in range(160, 256):
    print(i, chr(i), chr(i).encode("iso-8859-1").decode("iso-8859-5"))
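Here is the improved version:
import codecs

# Loop over the printable codes \040-\177 and \240-\377.
for code in [*range(0o040, 0o200), *range(0o240, 0o400)]:
    # Bytes that are undefined in the encoding decode to U+FFFD.
    char = codecs.decode(bytes([code]), "iso8859-5", errors="replace")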
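The loop above could be turned into a mapping builder along the following lines. This is only a sketch under stated assumptions: the function name `charset_mapping` is hypothetical, and skipping ASCII characters and undefined bytes is a design choice made here for illustration, not something confirmed by the PR.

```python
import codecs


def charset_mapping(encoding):
    """Map each non-ASCII character of a single-byte encoding to its octal code.

    Hypothetical helper for illustration only.
    """
    mapping = {}
    for code in [*range(0o040, 0o200), *range(0o240, 0o400)]:
        char = codecs.decode(bytes([code]), encoding, errors="replace")
        # Skip undefined bytes (decoded as U+FFFD) and plain ASCII characters,
        # which do not need an octal escape.
        if char == "\ufffd" or ord(char) < 128:
            continue
        mapping[char] = "\\" + format(code, "o")
    return mapping


cyrillic = charset_mapping("iso8859-5")
latin1 = charset_mapping("iso8859-1")
print(cyrillic["\u0410"])  # CYRILLIC CAPITAL LETTER A -> \260
print(latin1["\u00e9"])  # LATIN SMALL LETTER E WITH ACUTE -> \351
```

With this approach the per-encoding tables come from Python's codecs rather than from hand-maintained dictionaries.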
+ "αβχδεφγηιϕκλμνο" # \14x-15x
+ "πθρστυϖωξψζ{|}∼" # \16x-17x. \177 is undefined
+ "€ϒ′≤⁄∞ƒ♣♦♥♠↔←↑→↓" # \24x-\25x
+ "°±″≥×∝∂•÷≠≡≈…↵" # \26x-27x
+ "ℵℑℜ℘⊗⊕∅∩∪⊃⊇⊄⊂⊆∈∉" # \30x-31x
+ "∠∇®©™∏√⋅¬∧∨⇔⇐⇑⇒⇓" # \32x-33x
+ "◊〈®©™∑" # \34x-35x
+ "〉∫⌠⌡", # \36x-37x. \360 and \377 are undefined
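One way a run of characters like the rows above might be paired with consecutive octal codes is sketched below. The helper `mapping_from_run` is hypothetical, and the starting code \141 for α is an assumption based on the Adobe Symbol layout, where α sits at the position of ASCII "a"; rows containing undefined codes would need gaps handled separately.

```python
def mapping_from_run(start_code, chars):
    """Pair each character with consecutive octal codes starting at start_code.

    Hypothetical helper for illustration only.
    """
    return {
        char: "\\" + format(start_code + offset, "o")
        for offset, char in enumerate(chars)
    }


# Assumption: the first row quoted above starts at \141.
symbol_part = mapping_from_run(0o141, "αβχδεφγηιϕκλμνο")
print(symbol_part["α"], symbol_part["ο"])  # \141 \157
```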
Already answered in #2584 (comment).
Yes, we can also translate the strings passed by the
Better to do it in a separate PR.
@GenericMappingTools/pygmt-maintainers Any comments and suggestions?
Looks good to me!
Haven't checked every individual symbol, but I trust that they are correct 🙂
Check arguments that are not processed by build_arg_string (e.g., text in Figure.text)
Yes, we can also translate the strings passed by the text parameter, but we can do very little for plaintext files (we could, but it would mean opening and reading the whole plaintext file, which is not efficient). So I'm not sure what we should do here.
OK, let's handle non-ASCII characters in fig.text in a separate PR.
Co-authored-by: Wei Ji <[email protected]>
Co-authored-by: Dongdong Tian <[email protected]>
Tenth minor release of PyGMT.
* Add changelog entry to version switcher
* Update compatibility table
* Update citation
* Add draft changelog
* Add full names of contributors to changelog
* Add two highlight bullet points
* Combine non-ASCII character PRs #2638 and #2584 into one highlight point
* Swap author positions for Dongdong and Leo
* Change release date to 20230902
* Move Yvonne up a few spots
---------
Co-authored-by: Yvonne Fröhlich <[email protected]>
Co-authored-by: Dongdong Tian <[email protected]>
Co-authored-by: Michael Grund <[email protected]>
Description of proposed changes
Support non-ASCII characters in PyGMT arguments, so that users don't have to use octal codes.
See #2204 for context.
Working example
it produces:
Notes to maintainers
The Symbol and ZapfDingbats charsets can be obtained from https://unicode.org/Public/MAPPINGS/VENDORS/ADOBE/zdingbat.txt and https://unicode.org/Public/MAPPINGS/VENDORS/ADOBE/symbol.txt. The following script is used to generate the list of characters, but manual editing is still necessary.
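For readers who want to reproduce such a list, a rough sketch of parsing these mapping files is given below. It is not the script used for this PR. It assumes that each data line holds the Unicode code point in hex, then the font's character code in hex, followed by `#` comments, and the function name `parse_adobe_mapping` and the sample lines are made up for illustration.

```python
def parse_adobe_mapping(lines):
    """Map characters to octal escapes from Adobe-style mapping lines.

    Assumed line layout: <unicode hex> <font code hex> # name # glyph name.
    Hypothetical helper for illustration only.
    """
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # Skip blank lines and header comments.
        unicode_hex, code_hex = line.split()[:2]
        char = chr(int(unicode_hex, 16))
        # Keep the first code seen for a character if it appears twice.
        mapping.setdefault(char, "\\" + format(int(code_hex, 16), "o"))
    return mapping


# Illustrative sample lines written in the assumed layout.
sample = [
    "# Illustrative excerpt, not copied from the real file",
    "03B1\t61\t# GREEK SMALL LETTER ALPHA\t# alpha",
    "00B0\tB0\t# DEGREE SIGN\t# degree",
]
symbol_mapping = parse_adobe_mapping(sample)
print(symbol_mapping)
```

The output would still need manual review, for example for characters that appear at more than one code.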
TODO
* PS_CHAR_ENCODING so it also works with Standard+ character set?
* Check arguments that are not processed by build_arg_string (e.g., text in Figure.text)
Closes #2204
Closes #2000
Reminders
* make format and make check to make sure the code follows the style guide.
* doc/api/index.rst.
Slash Commands
You can write slash commands (/command) in the first line of a comment to perform specific operations. Supported slash commands are:
* /format: automatically format and lint the code
* /test-gmt-dev: run full tests on the latest GMT development version