From 97fb29e52e1b6c4c3d99d65b21f586d5ea93e2a5 Mon Sep 17 00:00:00 2001 From: Dongdong Tian Date: Fri, 17 May 2024 13:30:46 +0800 Subject: [PATCH 1/9] Improve the support of non-ASCII characters --- doc/index.md | 1 + doc/techref/encodings.md | 108 ++++++++++++++++++++++++++++++++++ doc/techref/index.md | 7 +++ pygmt/encodings.py | 124 +++++++++++++++++++++++++++++++++++++++ pygmt/helpers/utils.py | 105 +++++---------------------------- pygmt/tests/test_text.py | 2 +- 6 files changed, 257 insertions(+), 90 deletions(-) create mode 100644 doc/techref/encodings.md create mode 100644 doc/techref/index.md create mode 100644 pygmt/encodings.py diff --git a/doc/index.md b/doc/index.md index cff9e9eea72..347224cc3d4 100644 --- a/doc/index.md +++ b/doc/index.md @@ -42,6 +42,7 @@ external_resources.md :caption: Reference documentation api/index.rst +techref/index.md changes.md minversions.md ``` diff --git a/doc/techref/encodings.md b/doc/techref/encodings.md new file mode 100644 index 00000000000..ce09b37f96f --- /dev/null +++ b/doc/techref/encodings.md @@ -0,0 +1,108 @@ +# Supported Encodings and Non-ASCII Characters + +GMT supports a number of encodings and each encoding contains a set of ASCII and non-ASCII +characters. Below are a few of the most common encodings and the characters they support. + +In PyGMT, you can use any of these ASCII and non-ASCII characters in arguments and text +strings. When use non-ASCII characters in PyGMT, the easiest way is to copy and paste +the character from the tables below. + +**Note**: The special character � (REPLACEMENT CHARACTER) is used to indicate that +the character is not defined in the encoding. + +## Adobe ISOLatin1+ Encoding + +| octal | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | +|---|---|---|---|---|---|---|---|---| +| **\03x** | � | • | … | ™ | — | – | fi | ž | +| **\04x** | | ! | " | # | $ | % | & | ’ | +| **\05x** | ( | ) | * | + | , | - | . 
| / | +| **\06x** | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | +| **\07x** | 8 | 9 | : | ; | < | = | > | ? | +| **\10x** | @ | A | B | C | D | E | F | G | +| **\11x** | H | I | J | K | L | M | N | O | +| **\12x** | P | Q | R | S | T | U | V | W | +| **\13x** | X | Y | Z | [ | \ | ] | ^ | _ | +| **\14x** | ‘ | a | b | c | d | e | f | g | +| **\15x** | h | i | j | k | l | m | n | o | +| **\16x** | p | q | r | s | t | u | v | w | +| **\17x** | x | y | z | { | | | } | ~ | š | +| **\20x** | Œ | † | ‡ | Ł | ⁄ | ‹ | Š | › | +| **\21x** | œ | Ÿ | Ž | ł | ‰ | „ | “ | ” | +| **\22x** | ı | ` | ´ | ^ | ˜ | ¯ | ˘ | ˙ | +| **\23x** | ¨ | ‚ | ˚ | ¸ | ' | ˝ | ˛ | ˇ | +| **\24x** | � | ¡ | ¢ | £ | ¤ | ¥ | ¦ | § | +| **\25x** | ¨ | © | ª | « | ¬ | ­ | ® | ¯ | +| **\26x** | ° | ± | ² | ³ | ´ | µ | ¶ | · | +| **\27x** | ¸ | ¹ | º | » | ¼ | ½ | ¾ | ¿ | +| **\30x** | À | Á |  | à | Ä | Å | Æ | Ç | +| **\31x** | È | É | Ê | Ë | Ì | Í | Î | Ï | +| **\32x** | Ð | Ñ | Ò | Ó | Ô | Õ | Ö | × | +| **\33x** | Ø | Ù | Ú | Û | Ü | Ý | Þ | ß | +| **\34x** | à | á | â | ã | ä | å | æ | ç | +| **\35x** | è | é | ê | ë | ì | í | î | ï | +| **\36x** | ð | ñ | ò | ó | ô | õ | ö | ÷ | +| **\37x** | ø | ù | ú | û | ü | ý | þ | ÿ | + +## Adobe Symbol Encoding + +| octal | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | +|---|---|---|---|---|---|---|---|---| +| **\04x** | | ! | ∀ | # | ∃ | % | & | ∋ | +| **\05x** | ( | ) | ∗ | + | , | − | . | / | +| **\06x** | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | +| **\07x** | 8 | 9 | : | ; | < | = | > | ? 
| +| **\10x** | ≅ | Α | Β | Χ | ∆ | Ε | Φ | Γ | +| **\11x** | Η | Ι | ϑ | Κ | Λ | Μ | Ν | Ο | +| **\12x** | Π | Θ | Ρ | Σ | Τ | Υ | ς | Ω | +| **\13x** | Ξ | Ψ | Ζ | [ | ∴ | ] | ⊥ | _ | +| **\14x** |  | α | β | χ | δ | ε | φ | γ | +| **\15x** | η | ι | ϕ | κ | λ | μ | ν | ο | +| **\16x** | π | θ | ρ | σ | τ | υ | ϖ | ω | +| **\17x** | ξ | ψ | ζ | { | | | } | ∼ | � | +| **\24x** | € | ϒ | ′ | ≤ | ∕ | ∞ | ƒ | ♣ | +| **\25x** | ♦ | ♥ | ♠ | ↔ | ← | ↑ | → | ↓ | +| **\26x** | ° | ± | ″ | ≥ | × | ∝ | ∂ | • | +| **\27x** | ÷ | ≠ | ≡ | ≈ | … | ⏐ | ⎯ | ↵ | +| **\30x** | ℵ | ℑ | ℜ | ℘ | ⊗ | ⊕ | ∅ | ∩ | +| **\31x** | ∪ | ⊃ | ⊇ | ⊄ | ⊂ | ⊆ | ∈ | ∉ | +| **\32x** | ∠ | ∇ | ® | © | ™ | ∏ | √ | ⋅ | +| **\33x** | ¬ | ∧ | ∨ | ⇔ | ⇐ | ⇑ | ⇒ | ⇓ | +| **\34x** | ◊ | 〈 | ® | © | ™ | ∑ | ⎛ | ⎜ | +| **\35x** | ⎝ | ⎡ | ⎢ | ⎣ | ⎧ | ⎨ | ⎩ | ⎪ | +| **\36x** | � | 〉 | ∫ | ⌠ | ⎮ | ⌡ | ⎞ | ⎟ | +| **\37x** | ⎠ | ⎤ | ⎥ | ⎦ | ⎫ | ⎬ | ⎭ | � | + +**Note**: The octal code `\140` represent the RADICAL EXTENDER character, which is not available in +the Unicode character set. 
+ +## Adobe ZapfDingbats Encoding + +| octal | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | +|---|---|---|---|---|---|---|---|---| +| **\04x** | | ✁ | ✂ | ✃ | ✄ | ☎ | ✆ | ✇ | +| **\05x** | ✈ | ✉ | ☛ | ☞ | ✌ | ✍ | ✎ | ✏ | +| **\06x** | ✐ | ✑ | ✒ | ✓ | ✔ | ✕ | ✖ | ✗ | +| **\07x** | ✘ | ✙ | ✚ | ✛ | ✜ | ✝ | ✞ | ✟ | +| **\10x** | ✠ | ✡ | ✢ | ✣ | ✤ | ✥ | ✦ | ✧ | +| **\11x** | ★ | ✩ | ✪ | ✫ | ✬ | ✭ | ✮ | ✯ | +| **\12x** | ✰ | ✱ | ✲ | ✳ | ✴ | ✵ | ✶ | ✷ | +| **\13x** | ✸ | ✹ | ✺ | ✻ | ✼ | ✽ | ✾ | ✿ | +| **\14x** | ❀ | ❁ | ❂ | ❃ | ❄ | ❅ | ❆ | ❇ | +| **\15x** | ❈ | ❉ | ❊ | ❋ | ● | ❍ | ■ | ❏ | +| **\16x** | ❐ | ❑ | ❒ | ▲ | ▼ | ◆ | ❖ | ◗ | +| **\17x** | ❘ | ❙ | ❚ | ❛ | ❜ | ❝ | ❞ | � | +| **\20x** | ❨ | ❩ | ❪ | ❫ | ❬ | ❭ | ❮ | ❯ | +| **\21x** | ❰ | ❱ | ❲ | ❳ | ❴ | ❵ | � | � | +| **\24x** | � | ❡ | ❢ | ❣ | ❤ | ❥ | ❦ | ❧ | +| **\25x** | ♣ | ♦ | ♥ | ♠ | ① | ② | ③ | ④ | +| **\26x** | ⑤ | ⑥ | ⑦ | ⑧ | ⑨ | ⑩ | ❶ | ❷ | +| **\27x** | ❸ | ❹ | ❺ | ❻ | ❼ | ❽ | ❾ | ❿ | +| **\30x** | ➀ | ➁ | ➂ | ➃ | ➄ | ➅ | ➆ | ➇ | +| **\31x** | ➈ | ➉ | ➊ | ➋ | ➌ | ➍ | ➎ | ➏ | +| **\32x** | ➐ | ➑ | ➒ | ➓ | ➔ | → | ↔ | ↕ | +| **\33x** | ➘ | ➙ | ➚ | ➛ | ➜ | ➝ | ➞ | ➟ | +| **\34x** | ➠ | ➡ | ➢ | ➣ | ➤ | ➥ | ➦ | ➧ | +| **\35x** | ➨ | ➩ | ➪ | ➫ | ➬ | ➭ | ➮ | ➯ | +| **\36x** | � | ➱ | ➲ | ➳ | ➴ | ➵ | ➶ | ➷ | +| **\37x** | ➸ | ➹ | ➺ | ➻ | ➼ | ➽ | ➾ | � | diff --git a/doc/techref/index.md b/doc/techref/index.md new file mode 100644 index 00000000000..69fb22572de --- /dev/null +++ b/doc/techref/index.md @@ -0,0 +1,7 @@ +# Technical Reference + +```{toctree} +:maxdepth: 1 + +encodings.md +``` diff --git a/pygmt/encodings.py b/pygmt/encodings.py new file mode 100644 index 00000000000..401af7830c0 --- /dev/null +++ b/pygmt/encodings.py @@ -0,0 +1,124 @@ +""" +Adobe character encodings supported by GMT. + +Currently, only Adobe Symbol, Adobe ZapfDingbats, and Adobe ISOLatin1+ encodings are +supported. 
+ +The corresponding Unicode characters in each Adobe chararacter encoding are generated +from the mapping table and conversion script in the GMT-octal-codes +(https://github.com/seisman/GMT-octal-codes) repository. Refer to that repository for +details. + +Some code points are undefined and are assigned with the replacement characeter +(``\ufffd``). + +References +---------- + +- GMT-octal-codes: https://github.com/seisman/GMT-octal-codes +- GMT official documentation: https://docs.generic-mapping-tools.org/dev/reference/octal-codes.html +- Adobe Postscript Language Reference: https://www.adobe.com/jp/print/postscript/pdfs/PLRM.pdf +- Adobe Symbol: https://en.wikipedia.org/wiki/Symbol_(typeface) +- Zapf Dingbats: https://en.wikipedia.org/wiki/Zapf_Dingbats +- ISO-8859-1: https://en.wikipedia.org/wiki/ISO/IEC_8859-1 +- ISOLatin1+: https://en.wikipedia.org/wiki/PostScript_Latin_1_Encoding +- Adobe Glyph List: https://github.com/adobe-type-tools/agl-aglfn +""" + +# Dictionary of character mappings for different encodings. +charset: dict = {} + +# Adobe ISOLatin1+ charset. +# Most characters are the same in ISOLatin1+ and ISO-8859-1 encodings. +charset["ISOLatin1+"] = { + i: chr(i) for i in [*range(0o040, 0o177), *range(0o240, 0o400)] +} +# Handle characters that are different in ISOLatin1+ and ISO-8859-1 encodings. +charset["ISOLatin1+"].update( + { + 0o047: "\u2019", # Change "Apostrophe" to "Right Single Quotation Mark" + 0o055: "\u2212", # Change "Hyphen-minus" to "Minus Sign" + 0o140: "\u2018", # Change "Grave Accent" to "Left Single Quotation Mark" + 0o177: "\u0161", # Set to "Latin Small Letter S with Caron" + } +) +# Add extended characters in ISOLatin1+. 
+charset["ISOLatin1+"].update( + dict( + zip( + [*range(0o030, 0o040), *range(0o200, 0o240)], + "\ufffd\u2022\u2026\u2122\u2014\u2013\ufb01\u017e" + "\u0152\u2020\u2021\u0141\u2044\u2039\u0160\u203a" + "\u0153\u0178\u017d\u0142\u2030\u201e\u201c\u201d" + "\u0131\u0060\u00b4\u02c6\u02dc\u00af\u02d8\u02d9" + "\u00a8\u201a\u02da\u00b8\u0027\u02dd\u02db\u02c7", + strict=False, + ) + ) +) + +# Adobe Symbol charset. +charset["Symbol"] = dict( + zip( + [*range(0o040, 0o200), *range(0o240, 0o400)], + "\u0020\u0021\u2200\u0023\u2203\u0025\u0026\u220b" + "\u0028\u0029\u2217\u002b\u002c\u2212\u002e\u002f" + "\u0030\u0031\u0032\u0033\u0034\u0035\u0036\u0037" + "\u0038\u0039\u003a\u003b\u003c\u003d\u003e\u003f" + "\u2245\u0391\u0392\u03a7\u2206\u0395\u03a6\u0393" + "\u0397\u0399\u03d1\u039a\u039b\u039c\u039d\u039f" + "\u03a0\u0398\u03a1\u03a3\u03a4\u03a5\u03c2\u2126" + "\u039e\u03a8\u0396\u005b\u2234\u005d\u22a5\u005f" + "\uf8e5\u03b1\u03b2\u03c7\u03b4\u03b5\u03c6\u03b3" + "\u03b7\u03b9\u03d5\u03ba\u03bb\u03bc\u03bd\u03bf" + "\u03c0\u03b8\u03c1\u03c3\u03c4\u03c5\u03d6\u03c9" + "\u03be\u03c8\u03b6\u007b\u007c\u007d\u223c\ufffd" + "\u20ac\u03d2\u2032\u2264\u2215\u221e\u0192\u2663" + "\u2666\u2665\u2660\u2194\u2190\u2191\u2192\u2193" + "\u00b0\u00b1\u2033\u2265\u00d7\u221d\u2202\u2022" + "\u00f7\u2260\u2261\u2248\u2026\u23d0\u23af\u21b5" + "\u2135\u2111\u211c\u2118\u2297\u2295\u2205\u2229" + "\u222a\u2283\u2287\u2284\u2282\u2286\u2208\u2209" + "\u2220\u2207\u00ae\u00a9\u2122\u220f\u221a\u22c5" + "\u00ac\u2227\u2228\u21d4\u21d0\u21d1\u21d2\u21d3" + "\u25ca\u2329\u00ae\u00a9\u2122\u2211\u239b\u239c" + "\u239d\u23a1\u23a2\u23a3\u23a7\u23a8\u23a9\u23aa" + "\ufffd\u232a\u222b\u2320\u23ae\u2321\u239e\u239f" + "\u23a0\u23a4\u23a5\u23a6\u23ab\u23ac\u23ad\ufffd", + strict=False, + ) +) + +# Adobe ZapfDingbats charset. 
+charset["ZapfDingbats"] = dict( + zip( + [*range(0o040, 0o220), *range(0o240, 0o400)], + "\u0020\u2701\u2702\u2703\u2704\u260e\u2706\u2707" + "\u2708\u2709\u261b\u261e\u270c\u270d\u270e\u270f" + "\u2710\u2711\u2712\u2713\u2714\u2715\u2716\u2717" + "\u2718\u2719\u271a\u271b\u271c\u271d\u271e\u271f" + "\u2720\u2721\u2722\u2723\u2724\u2725\u2726\u2727" + "\u2605\u2729\u272a\u272b\u272c\u272d\u272e\u272f" + "\u2730\u2731\u2732\u2733\u2734\u2735\u2736\u2737" + "\u2738\u2739\u273a\u273b\u273c\u273d\u273e\u273f" + "\u2740\u2741\u2742\u2743\u2744\u2745\u2746\u2747" + "\u2748\u2749\u274a\u274b\u25cf\u274d\u25a0\u274f" + "\u2750\u2751\u2752\u25b2\u25bc\u25c6\u2756\u25d7" + "\u2758\u2759\u275a\u275b\u275c\u275d\u275e\ufffd" + "\u2768\u2769\u276a\u276b\u276c\u276d\u276e\u276f" + "\u2770\u2771\u2772\u2773\u2774\u2775\ufffd\ufffd" + "\ufffd\u2761\u2762\u2763\u2764\u2765\u2766\u2767" + "\u2663\u2666\u2665\u2660\u2460\u2461\u2462\u2463" + "\u2464\u2465\u2466\u2467\u2468\u2469\u2776\u2777" + "\u2778\u2779\u277a\u277b\u277c\u277d\u277e\u277f" + "\u2780\u2781\u2782\u2783\u2784\u2785\u2786\u2787" + "\u2788\u2789\u278a\u278b\u278c\u278d\u278e\u278f" + "\u2790\u2791\u2792\u2793\u2794\u2192\u2194\u2195" + "\u2798\u2799\u279a\u279b\u279c\u279d\u279e\u279f" + "\u27a0\u27a1\u27a2\u27a3\u27a4\u27a5\u27a6\u27a7" + "\u27a8\u27a9\u27aa\u27ab\u27ac\u27ad\u27ae\u27af" + "\ufffd\u27b1\u27b2\u27b3\u27b4\u27b5\u27b6\u27b7" + "\u27b8\u27b9\u27ba\u27bb\u27bc\u27bd\u27be\ufffd", + strict=False, + ) +) diff --git a/pygmt/helpers/utils.py b/pygmt/helpers/utils.py index 781e0e4533f..4f6dcfb7ced 100644 --- a/pygmt/helpers/utils.py +++ b/pygmt/helpers/utils.py @@ -2,7 +2,6 @@ Utilities and common tasks for wrapping the GMT modules. 
""" -# ruff: noqa: RUF001 import os import pathlib import shutil @@ -16,6 +15,7 @@ from typing import Any import xarray as xr +from pygmt.encodings import charset from pygmt.exceptions import GMTInvalidInput @@ -205,31 +205,31 @@ def data_kind(data=None, x=None, y=None, z=None, required_z=False, required_data return kind -def non_ascii_to_octal(argstr): +def non_ascii_to_octal(argstr: str) -> str: r""" Translate non-ASCII characters to their corresponding octal codes. - Currently, only characters in the ISOLatin1+ charset and - Symbol/ZapfDingbats fonts are supported. + Currently, only characters in the ISOLatin1+ charset and Symbol/ZapfDingbats fonts + are supported. Parameters ---------- - argstr : str + argstr The string to be translated. Returns ------- - translated_argstr : str + translated_argstr The translated string. Examples -------- >>> non_ascii_to_octal("•‰“”±°ÿ") - '\\31\\214\\216\\217\\261\\260\\377' - >>> non_ascii_to_octal("αζΔΩ∑π∇") + '\\031\\214\\216\\217\\261\\260\\377' + >>> non_ascii_to_octal("αζ∆Ω∑π∇") '@~\\141@~@~\\172@~@~\\104@~@~\\127@~@~\\345@~@~\\160@~@~\\321@~' >>> non_ascii_to_octal("✁❞❡➾") - '@%34%\\41@%%@%34%\\176@%%@%34%\\241@%%@%34%\\376@%%' + '@%34%\\041@%%@%34%\\176@%%@%34%\\241@%%@%34%\\376@%%' >>> non_ascii_to_octal("ABC ±120° DEF α ♥") 'ABC \\261120\\260 DEF @~\\141@~ @%34%\\252@%%' """ # noqa: RUF002 @@ -238,88 +238,15 @@ def non_ascii_to_octal(argstr): return argstr # Dictionary mapping non-ASCII characters to octal codes - mapping = {} - - # Adobe Symbol charset - # References: - # 1. https://en.wikipedia.org/wiki/Symbol_(typeface) - # 2. https://unicode.org/Public/MAPPINGS/VENDORS/ADOBE/symbol.txt - # Notes: - # 1. \322 and \342 are "REGISTERED SIGN SERIF" and - # "REGISTERED SIGN SANS SERIF" respectively, but only "REGISTERED SIGN" - # is available in the unicode table. So both are mapped to - # "REGISTERED SIGN". \323, \343, \324 and \344 also have the same - # problem. - # 2. 
Characters for \140, \275, \276 are incorrect. + mapping: dict = {} + # Adobe Symbol charset. + mapping.update({c: f"@~\\{i:03o}@~" for i, c in charset["Symbol"].items()}) + # Adobe ZapfDingbats charset. Font number is 34. mapping.update( - { - c: "@~\\" + format(i, "o") + "@~" - for c, i in zip( - " !∀#∃%&∋()∗+,−./" # \04x-05x - "0123456789:;<=>?" # \06x-07x - "≅ΑΒΧΔΕΦΓΗΙϑΚΛΜΝΟ" # \10x-11x - "ΠΘΡΣΤΥςΩΞΨΖ[∴]⊥_" # \12x-13x - "αβχδεφγηιϕκλμνο" # \14x-15x - "πθρστυϖωξψζ{|}∼" # \16x-17x. \177 is undefined - "€ϒ′≤⁄∞ƒ♣♦♥♠↔←↑→↓" # \24x-\25x - "°±″≥×∝∂•÷≠≡≈…↵" # \26x-27x - "ℵℑℜ℘⊗⊕∅∩∪⊃⊇⊄⊂⊆∈∉" # \30x-31x - "∠∇®©™∏√⋅¬∧∨⇔⇐⇑⇒⇓" # \32x-33x - "◊〈®©™∑" # \34x-35x - "〉∫⌠⌡", # \36x-37x. \360 and \377 are undefined - [*range(32, 127), *range(160, 240), *range(241, 255)], - strict=True, - ) - } + {c: f"@%34%\\{i:03o}@%%" for i, c in charset["ZapfDingbats"].items()} ) - - # Adobe ZapfDingbats charset - # References: - # 1. https://en.wikipedia.org/wiki/Zapf_Dingbats - # 2. https://unicode.org/Public/MAPPINGS/VENDORS/ADOBE/zdingbat.txt - mapping.update( - { - c: "@%34%\\" + format(i, "o") + "@%%" - for c, i in zip( - " ✁✂✃✄☎✆✇✈✉☛☞✌✍✎✏" # \04x-\05x - "✐✑✒✓✔✕✖✗✘✙✚✛✜✝✞✟" # \06x-\07x - "✠✡✢✣✤✥✦✧★✩✪✫✬✭✮✯" # \10x-\11x - "✰✱✲✳✴✵✶✷✸✹✺✻✼✽✾✿" # \12x-\13x - "❀❁❂❃❄❅❆❇❈❉❊❋●❍■❏" # \14x-\15x - "❐❑❒▲▼◆❖◗❘❙❚❛❜❝❞" # \16x-\17x. \177 is undefined - "❡❢❣❤❥❦❧♣♦♥♠①②③④" # \24x-\25x. \240 is undefined - "⑤⑥⑦⑧⑨⑩❶❷❸❹❺❻❼❽❾❿" # \26x-\27x - "➀➁➂➃➄➅➆➇➈➉➊➋➌➍➎➏" # \30x-\31x - "➐➑➒➓➔→↔↕➘➙➚➛➜➝➞➟" # \32x-\33x - "➠➡➢➣➤➥➦➧➨➩➪➫➬➭➮➯" # \34x-\35x - "➱➲➳➴➵➶➷➸➹➺➻➼➽➾", # \36x-\37x. \360 and \377 are undefined - [*range(32, 127), *range(161, 240), *range(241, 255)], - strict=True, - ) - } - ) - - # Adobe ISOLatin1+ charset (i.e., ISO-8859-1 with extensions) - # References: - # 1. https://en.wikipedia.org/wiki/ISO/IEC_8859-1 - # 2. https://docs.generic-mapping-tools.org/dev/reference/octal-codes.html - # 3. 
https://www.adobe.com/jp/print/postscript/pdfs/PLRM.pdf - mapping.update( - { - c: "\\" + format(i, "o") - for c, i in zip( - "•…™—–fiž" # \03x. \030 is undefined - "’‘" # \047 and \140 - "š" # \177 - "Œ†‡Ł⁄‹Š›œŸŽł‰„“”" # \20x-\21x - "ı`´ˆ˜¯˘˙¨‚˚¸'˝˛ˇ", # \22x-\23x - [*range(25, 32), 39, 96, *range(127, 160)], - strict=True, - ) - } - ) - # \240-\377 - mapping.update({chr(i): "\\" + format(i, "o") for i in range(160, 256)}) + # Adobe ISOLatin1+ charset. + mapping.update({c: f"\\{i:03o}" for i, c in charset["ISOLatin1+"].items()}) # Remove any printable characters mapping = {k: v for k, v in mapping.items() if k not in string.printable} diff --git a/pygmt/tests/test_text.py b/pygmt/tests/test_text.py index ab07e964954..6bd2c61383e 100644 --- a/pygmt/tests/test_text.py +++ b/pygmt/tests/test_text.py @@ -417,7 +417,7 @@ def test_text_nonascii(): fig.basemap(region=[0, 10, 0, 10], projection="X10c", frame=True) fig.text(position="TL", text="position-text:°α") # noqa: RUF001 fig.text(x=1, y=1, text="xytext:°α") # noqa: RUF001 - fig.text(x=[5, 5], y=[3, 5], text=["xytext1:αζΔ❡", "xytext2:∑π∇✉"]) + fig.text(x=[5, 5], y=[3, 5], text=["xytext1:αζ∆❡", "xytext2:∑π∇✉"]) return fig From f26874345c273770d2e5ee3ca414455ae33b170e Mon Sep 17 00:00:00 2001 From: Dongdong Tian Date: Sat, 22 Jun 2024 13:08:46 +0800 Subject: [PATCH 2/9] Show all characters for ISOLatin1+ encoding --- pygmt/encodings.py | 58 ++++++++++++++++++++++++++-------------------- 1 file changed, 33 insertions(+), 25 deletions(-) diff --git a/pygmt/encodings.py b/pygmt/encodings.py index 401af7830c0..abcaab79db4 100644 --- a/pygmt/encodings.py +++ b/pygmt/encodings.py @@ -29,31 +29,39 @@ charset: dict = {} # Adobe ISOLatin1+ charset. -# Most characters are the same in ISOLatin1+ and ISO-8859-1 encodings. -charset["ISOLatin1+"] = { - i: chr(i) for i in [*range(0o040, 0o177), *range(0o240, 0o400)] -} -# Handle characters that are different in ISOLatin1+ and ISO-8859-1 encodings. 
-charset["ISOLatin1+"].update( - { - 0o047: "\u2019", # Change "Apostrophe" to "Right Single Quotation Mark" - 0o055: "\u2212", # Change "Hyphen-minus" to "Minus Sign" - 0o140: "\u2018", # Change "Grave Accent" to "Left Single Quotation Mark" - 0o177: "\u0161", # Set to "Latin Small Letter S with Caron" - } -) -# Add extended characters in ISOLatin1+. -charset["ISOLatin1+"].update( - dict( - zip( - [*range(0o030, 0o040), *range(0o200, 0o240)], - "\ufffd\u2022\u2026\u2122\u2014\u2013\ufb01\u017e" - "\u0152\u2020\u2021\u0141\u2044\u2039\u0160\u203a" - "\u0153\u0178\u017d\u0142\u2030\u201e\u201c\u201d" - "\u0131\u0060\u00b4\u02c6\u02dc\u00af\u02d8\u02d9" - "\u00a8\u201a\u02da\u00b8\u0027\u02dd\u02db\u02c7", - strict=False, - ) +charset["ISOLatin1+"] = dict( + zip( + range(0o030, 0o400), + "\ufffd\u2022\u2026\u2122\u2014\u2013\ufb01\u017e" + "\u0020\u0021\u0022\u0023\u0024\u0025\u0026\u2019" + "\u0028\u0029\u002a\u002b\u002c\u2212\u002e\u002f" + "\u0030\u0031\u0032\u0033\u0034\u0035\u0036\u0037" + "\u0038\u0039\u003a\u003b\u003c\u003d\u003e\u003f" + "\u0040\u0041\u0042\u0043\u0044\u0045\u0046\u0047" + "\u0048\u0049\u004a\u004b\u004c\u004d\u004e\u004f" + "\u0050\u0051\u0052\u0053\u0054\u0055\u0056\u0057" + "\u0058\u0059\u005a\u005b\u005c\u005d\u005e\u005f" + "\u2018\u0061\u0062\u0063\u0064\u0065\u0066\u0067" + "\u0068\u0069\u006a\u006b\u006c\u006d\u006e\u006f" + "\u0070\u0071\u0072\u0073\u0074\u0075\u0076\u0077" + "\u0078\u0079\u007a\u007b\u007c\u007d\u007e\u0161" + "\u0152\u2020\u2021\u0141\u2044\u2039\u0160\u203a" + "\u0153\u0178\u017d\u0142\u2030\u201e\u201c\u201d" + "\u0131\u0060\u00b4\u02c6\u02dc\u00af\u02d8\u02d9" + "\u00a8\u201a\u02da\u00b8\u0027\u02dd\u02db\u02c7" + "\u0020\u00a1\u00a2\u00a3\u00a4\u00a5\u00a6\u00a7" + "\u00a8\u00a9\u00aa\u00ab\u00ac\u002d\u00ae\u00af" + "\u00b0\u00b1\u00b2\u00b3\u00b4\u00b5\u00b6\u00b7" + "\u00b8\u00b9\u00ba\u00bb\u00bc\u00bd\u00be\u00bf" + "\u00c0\u00c1\u00c2\u00c3\u00c4\u00c5\u00c6\u00c7" + 
"\u00c8\u00c9\u00ca\u00cb\u00cc\u00cd\u00ce\u00cf" + "\u00d0\u00d1\u00d2\u00d3\u00d4\u00d5\u00d6\u00d7" + "\u00d8\u00d9\u00da\u00db\u00dc\u00dd\u00de\u00df" + "\u00e0\u00e1\u00e2\u00e3\u00e4\u00e5\u00e6\u00e7" + "\u00e8\u00e9\u00ea\u00eb\u00ec\u00ed\u00ee\u00ef" + "\u00f0\u00f1\u00f2\u00f3\u00f4\u00f5\u00f6\u00f7" + "\u00f8\u00f9\u00fa\u00fb\u00fc\u00fd\u00fe\u00ff", + strict=False, ) ) From 5093b2c81c91894aa883f59c783fbbae4a487196 Mon Sep 17 00:00:00 2001 From: Dongdong Tian Date: Sun, 23 Jun 2024 12:45:00 +0800 Subject: [PATCH 3/9] Apply suggestions from code review MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-authored-by: Yvonne Fröhlich <94163266+yvonnefroehlich@users.noreply.github.com> --- pygmt/encodings.py | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/pygmt/encodings.py b/pygmt/encodings.py index abcaab79db4..60d47bd79b9 100644 --- a/pygmt/encodings.py +++ b/pygmt/encodings.py @@ -4,12 +4,12 @@ Currently, only Adobe Symbol, Adobe ZapfDingbats, and Adobe ISOLatin1+ encodings are supported. -The corresponding Unicode characters in each Adobe chararacter encoding are generated +The corresponding Unicode characters in each Adobe character encoding are generated from the mapping table and conversion script in the GMT-octal-codes (https://github.com/seisman/GMT-octal-codes) repository. Refer to that repository for details. -Some code points are undefined and are assigned with the replacement characeter +Some code points are undefined and are assigned with the replacement character (``\ufffd``). 
References From b4440a5971c564081a58e34ef15b8d79ef9f75d5 Mon Sep 17 00:00:00 2001 From: Dongdong Tian Date: Sun, 23 Jun 2024 22:56:15 +0800 Subject: [PATCH 4/9] Add a comment --- pygmt/helpers/utils.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/pygmt/helpers/utils.py b/pygmt/helpers/utils.py index 4f6dcfb7ced..8997a2b0df1 100644 --- a/pygmt/helpers/utils.py +++ b/pygmt/helpers/utils.py @@ -245,7 +245,7 @@ def non_ascii_to_octal(argstr: str) -> str: mapping.update( {c: f"@%34%\\{i:03o}@%%" for i, c in charset["ZapfDingbats"].items()} ) - # Adobe ISOLatin1+ charset. + # Adobe ISOLatin1+ charset. Put at the end. mapping.update({c: f"\\{i:03o}" for i, c in charset["ISOLatin1+"].items()}) # Remove any printable characters From 062268f221c8f5aa76172ca9a3873bb6213ddff0 Mon Sep 17 00:00:00 2001 From: Dongdong Tian Date: Tue, 25 Jun 2024 12:57:37 +0800 Subject: [PATCH 5/9] Remove one comment line --- pygmt/encodings.py | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/pygmt/encodings.py b/pygmt/encodings.py index 60d47bd79b9..2cfda9b5728 100644 --- a/pygmt/encodings.py +++ b/pygmt/encodings.py @@ -18,10 +18,9 @@ - GMT-octal-codes: https://github.com/seisman/GMT-octal-codes - GMT official documentation: https://docs.generic-mapping-tools.org/dev/reference/octal-codes.html - Adobe Postscript Language Reference: https://www.adobe.com/jp/print/postscript/pdfs/PLRM.pdf +- ISOLatin1+: https://en.wikipedia.org/wiki/PostScript_Latin_1_Encoding - Adobe Symbol: https://en.wikipedia.org/wiki/Symbol_(typeface) - Zapf Dingbats: https://en.wikipedia.org/wiki/Zapf_Dingbats -- ISO-8859-1: https://en.wikipedia.org/wiki/ISO/IEC_8859-1 -- ISOLatin1+: https://en.wikipedia.org/wiki/PostScript_Latin_1_Encoding - Adobe Glyph List: https://github.com/adobe-type-tools/agl-aglfn """ From 46847d1d62eb86f1f460a7de7226d9611e427927 Mon Sep 17 00:00:00 2001 From: Dongdong Tian Date: Wed, 26 Jun 2024 07:46:22 +0800 Subject: [PATCH 6/9] Apply suggestions from code 
review Co-authored-by: Wei Ji <23487320+weiji14@users.noreply.github.com> --- doc/techref/encodings.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/doc/techref/encodings.md b/doc/techref/encodings.md index ce09b37f96f..638370e9bd3 100644 --- a/doc/techref/encodings.md +++ b/doc/techref/encodings.md @@ -1,10 +1,10 @@ # Supported Encodings and Non-ASCII Characters GMT supports a number of encodings and each encoding contains a set of ASCII and non-ASCII -characters. Below are a few of the most common encodings and the characters they support. +characters. Below are some of the most common encodings and characters that are supported. In PyGMT, you can use any of these ASCII and non-ASCII characters in arguments and text -strings. When use non-ASCII characters in PyGMT, the easiest way is to copy and paste +strings. When using non-ASCII characters in PyGMT, the easiest way is to copy and paste the character from the tables below. **Note**: The special character � (REPLACEMENT CHARACTER) is used to indicate that @@ -73,7 +73,7 @@ the character is not defined in the encoding. | **\36x** | � | 〉 | ∫ | ⌠ | ⎮ | ⌡ | ⎞ | ⎟ | | **\37x** | ⎠ | ⎤ | ⎥ | ⎦ | ⎫ | ⎬ | ⎭ | � | -**Note**: The octal code `\140` represent the RADICAL EXTENDER character, which is not available in +**Note**: The octal code `\140` represents the RADICAL EXTENDER character, which is not available in the Unicode character set. 
## Adobe ZapfDingbats Encoding From f2b3dcb81eebdb402292fef1ad8773ecd905f125 Mon Sep 17 00:00:00 2001 From: Dongdong Tian Date: Wed, 26 Jun 2024 08:08:30 +0800 Subject: [PATCH 7/9] Add a short description for the Technical Reference section --- doc/techref/index.md | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/doc/techref/index.md b/doc/techref/index.md index 69fb22572de..ab9e48c32d9 100644 --- a/doc/techref/index.md +++ b/doc/techref/index.md @@ -1,5 +1,10 @@ # Technical Reference +The Technical Reference section provides detailed information on the technical aspects of +GMT and PyGMT, including supported encodings, fonts, bit and hachure patterns, and other +essential components for creating high-quality visualizations. For additional details, +visit the [GMT Technical Reference](https://docs.generic-mapping-tools.org/dev/reference.html). + ```{toctree} :maxdepth: 1 From b860f2e304577f777cefb0601393747ade606494 Mon Sep 17 00:00:00 2001 From: Dongdong Tian Date: Wed, 26 Jun 2024 08:14:10 +0800 Subject: [PATCH 8/9] Fix styling --- doc/techref/index.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/doc/techref/index.md b/doc/techref/index.md index ab9e48c32d9..76a01760f02 100644 --- a/doc/techref/index.md +++ b/doc/techref/index.md @@ -1,8 +1,8 @@ # Technical Reference The Technical Reference section provides detailed information on the technical aspects of -GMT and PyGMT, including supported encodings, fonts, bit and hachure patterns, and other -essential components for creating high-quality visualizations. For additional details, +GMT and PyGMT, including supported encodings, fonts, bit and hachure patterns, and other +essential components for creating high-quality visualizations. For additional details, visit the [GMT Technical Reference](https://docs.generic-mapping-tools.org/dev/reference.html). 
```{toctree} From e02d23f50e2993fac67450682eb604f8e5606d6b Mon Sep 17 00:00:00 2001 From: Dongdong Tian Date: Sat, 29 Jun 2024 09:12:16 +0800 Subject: [PATCH 9/9] Update doc/techref/index.md Co-authored-by: Wei Ji <23487320+weiji14@users.noreply.github.com> --- doc/techref/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/techref/index.md b/doc/techref/index.md index 76a01760f02..bf22ff1acc5 100644 --- a/doc/techref/index.md +++ b/doc/techref/index.md @@ -3,7 +3,7 @@ The Technical Reference section provides detailed information on the technical aspects of GMT and PyGMT, including supported encodings, fonts, bit and hachure patterns, and other essential components for creating high-quality visualizations. For additional details, -visit the [GMT Technical Reference](https://docs.generic-mapping-tools.org/dev/reference.html). +visit the :gmt-docs:`GMT Technical Reference <reference.html>`. ```{toctree} :maxdepth: 1
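
Note on the series: the lookup that PATCH 1/9 moves into `pygmt/encodings.py` can be sketched in isolation. The snippet below is a minimal illustration under stated assumptions, not the code in `pygmt/helpers/utils.py`. The `charset` dictionary is a small hand-picked subset of the tables in the patch, and the `to_octal` name and the use of `str.translate` are hypothetical choices made for the example. It shows the three update steps from the diff (Symbol, then ZapfDingbats with font number 34, then ISOLatin1+ last) and the removal of printable ASCII characters.

```python
import string

# Hand-picked subsets of the tables in pygmt/encodings.py (illustration only).
charset = {
    "Symbol": {0o141: "\u03b1", 0o160: "\u03c0"},  # alpha, pi
    "ZapfDingbats": {0o252: "\u2665"},  # black heart suit
    "ISOLatin1+": {0o101: "A", 0o260: "\u00b0", 0o261: "\u00b1"},  # A, degree, plus-minus
}


def to_octal(text: str) -> str:
    """Replace each mapped non-ASCII character with its GMT octal escape."""
    mapping: dict[str, str] = {}
    # Symbol characters are wrapped in @~ ... @~ to switch to the Symbol font.
    mapping.update({c: f"@~\\{i:03o}@~" for i, c in charset["Symbol"].items()})
    # ZapfDingbats characters are wrapped in @%34% ... @%% (font number 34).
    mapping.update(
        {c: f"@%34%\\{i:03o}@%%" for i, c in charset["ZapfDingbats"].items()}
    )
    # ISOLatin1+ is applied last, so it wins for characters shared between tables.
    mapping.update({c: f"\\{i:03o}" for i, c in charset["ISOLatin1+"].items()})
    # Printable ASCII characters are left untouched.
    mapping = {k: v for k, v in mapping.items() if k not in string.printable}
    return text.translate(str.maketrans(mapping))


print(to_octal("ABC \u00b1120\u00b0 DEF \u03b1 \u2665"))
```

With this subset, the last doctest input from the patch (`"ABC ±120° DEF α ♥"`) yields the same string as in the docstring. The `:03o` format pads every code to three octal digits, which is why the patch changes `\\31` to `\\031` in the expected doctest output.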