Remove ZEN_GREEK, ZEN_CYRILLIC, and ZEN_LINE #5

johnnyshields · 2023-04-20T18:48:24Z

The following are not "zenkaku" (full-width) characters. Classifying them as zenkaku causes issues when working with Greek or Cyrillic text that is not meant to be treated as Japanese. I think these should be removed.

    ZEN_GREEK => /[Α-Ωα-ω]/,
    ZEN_CYRILLIC => /[А-Яа-я]/,


gimite · 2023-04-21T13:22:31Z

Thanks for the feedback. As you can guess from its name, this library is designed to process text written mainly in Japanese. In the context of Japanese, Greek and Cyrillic letters have traditionally been considered zenkaku. See, e.g., https://ja.wikipedia.org/wiki/%E5%85%A8%E8%A7%92%E3%81%A8%E5%8D%8A%E8%A7%92 (the Japanese Wikipedia article on zenkaku and hankaku) for background. So this is by design.
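
One way to see the traditional classification is through Shift_JIS, the legacy Japanese encoding: Greek and Cyrillic letters are part of the JIS X 0208 double-byte set, just like kana and kanji, while ASCII and halfwidth katakana are single-byte. The sketch below uses only Ruby's built-in transcoding; `sjis_bytes` is a hypothetical helper name introduced for illustration.

```ruby
# Hypothetical helper: how many bytes +ch+ occupies when encoded as Shift_JIS.
# Raises Encoding::UndefinedConversionError for characters outside Shift_JIS.
def sjis_bytes(ch)
  ch.encode(Encoding::Shift_JIS).bytesize
end

# Expected byte counts are noted beside each sample.
samples = {
  "A" => "ASCII letter",        # 1 byte (hankaku)
  "ｱ" => "halfwidth katakana",  # 1 byte (hankaku)
  "ア" => "fullwidth katakana", # 2 bytes (zenkaku)
  "Ω" => "Greek capital omega", # 2 bytes (JIS X 0208)
  "ω" => "Greek small omega",   # 2 bytes (JIS X 0208)
  "Я" => "Cyrillic capital ya", # 2 bytes (JIS X 0208)
  "я" => "Cyrillic small ya",   # 2 bytes (JIS X 0208)
}

samples.each do |ch, name|
  puts format("%s U+%04X %-20s %d byte(s)", ch, ch.ord, name, sjis_bytes(ch))
end
```

Unicode records the same split: UAX #11 classifies these basic Greek and Cyrillic letters as East_Asian_Width=A (Ambiguous), meaning they are wide in East Asian legacy contexts and narrow elsewhere.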

The library also provides finer-grained character types, and its methods can take target character types as a parameter. So you can exclude Greek and Cyrillic letters from the target if your application requires it. Also, if you want a more international solution, not just a Japanese one, I recommend using a library that handles Unicode and using its East_Asian_Width property.
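
A minimal sketch of the "target character types as a parameter" idea, assuming types are bit flags that the caller combines into a mask. The `CharTypes` module, the flag values, `ZEN_KANA`, `JAPANESE_ONLY`, and `type?` are all hypothetical and are not the library's actual API; only the Greek and Cyrillic regexes are copied from this issue.

```ruby
# Hypothetical character-type flags, illustrative only.
module CharTypes
  ZEN_GREEK    = 1 << 0
  ZEN_CYRILLIC = 1 << 1
  ZEN_KANA     = 1 << 2 # hypothetical: hiragana and katakana

  PATTERNS = {
    ZEN_GREEK    => /[Α-Ωα-ω]/, # from the issue
    ZEN_CYRILLIC => /[А-Яа-я]/, # from the issue
    ZEN_KANA     => /[ぁ-んァ-ン]/,
  }.freeze

  # Union of every flag above.
  ZEN = PATTERNS.keys.reduce(0, :|)

  # True if the single character +ch+ belongs to any type enabled in +types+.
  def self.type?(ch, types)
    PATTERNS.any? { |flag, re| (types & flag) != 0 && re.match?(ch) }
  end
end

# Caller-side opt-out: every ZEN type except Greek and Cyrillic.
JAPANESE_ONLY = CharTypes::ZEN & ~(CharTypes::ZEN_GREEK | CharTypes::ZEN_CYRILLIC)

p CharTypes.type?("Ω", CharTypes::ZEN) # => true
p CharTypes.type?("Ω", JAPANESE_ONLY)  # => false
p CharTypes.type?("ア", JAPANESE_ONLY) # => true
```

With this design the default `ZEN` set keeps the traditional Japanese classification, and an application that handles Greek or Cyrillic text narrows the mask instead of the library changing its defaults.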

johnnyshields · 2023-04-21T18:16:14Z

I don't understand. Characters such as Ω are not zenkaku (double-width) characters. At a minimum, these constants should be renamed to HAN_GREEK / HAN_CYRILLIC.

johnnyshields changed the title from "Remove ZEN_GREEK and ZEN_CYRILLIC" to "Remove ZEN_GREEK, ZEN_CYRILLIC, and ZEN_LINE" on Apr 20, 2023

johnnyshields mentioned this issue in "Major refactor -- version 2.0.0" #6 (Open) on Apr 20, 2023

gimite closed this as completed Apr 21, 2023
