Remove ZEN_GREEK, ZEN_CYRILLIC, and ZEN_LINE #5

Closed
johnnyshields opened this issue Apr 20, 2023 · 2 comments · May be fixed by #6

Comments

@johnnyshields

The following are not "zenkaku" characters. Classifying them as zenkaku causes issues when working with Greek or Cyrillic text that is not intended to be treated as Japanese. I think these should be removed.

    ZEN_GREEK => /[Α-Ωα-ω]/,
    ZEN_CYRILLIC => /[А-Яа-я]/,
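
To make the report concrete, here is a minimal sketch in plain Ruby. It does not use the library itself: the two constants below are local copies of the patterns quoted above, written with `\u` escapes so the code points are unambiguous, and `matches_zen_pattern?` is a hypothetical helper introduced only for this illustration.

```ruby
# Local copies of the two patterns quoted in the issue (not loaded from the library).
# U+0391..U+03A9 and U+03B1..U+03C9 are the ordinary Greek capital and small letters.
ZEN_GREEK = /[\u0391-\u03A9\u03B1-\u03C9]/
# U+0410..U+042F and U+0430..U+044F are the ordinary Cyrillic capital and small letters.
ZEN_CYRILLIC = /[\u0410-\u042F\u0430-\u044F]/

# Hypothetical helper: true if the text contains a character matched by either pattern.
def matches_zen_pattern?(text)
  text.match?(ZEN_GREEK) || text.match?(ZEN_CYRILLIC)
end

greek_word   = "\u03A9\u03BC\u03B5\u03B3\u03B1"        # a Greek word starting with capital omega
russian_word = "\u041F\u0440\u0438\u0432\u0435\u0442"  # a Russian word in Cyrillic
puts matches_zen_pattern?(greek_word)
puts matches_zen_pattern?(russian_word)
puts matches_zen_pattern?("hello")
```

Because the ranges cover the standard Greek and Cyrillic blocks rather than any Japanese-specific variants, ordinary Greek or Russian text is matched in full, which is the behavior this issue objects to.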
@johnnyshields johnnyshields changed the title Remove ZEN_GREEK and ZEN_CYRILLIC Remove ZEN_GREEK, ZEN_CYRILLIC, and ZEN_LINE Apr 20, 2023
@gimite
Owner

gimite commented Apr 21, 2023

Thanks for the feedback. As you can guess from its name, this library is designed to process text written mainly in Japanese. In the context of Japanese, Greek and Cyrillic letters have traditionally been considered zenkaku. See, e.g., https://ja.wikipedia.org/wiki/%E5%85%A8%E8%A7%92%E3%81%A8%E5%8D%8A%E8%A7%92 for background. So this is by design.

The library also provides finer-grained character types, and its methods accept the target character types as a parameter, so you can exclude Greek and Cyrillic letters from the target if your application requires it. If you want a more international solution rather than a Japanese-specific one, I recommend using a Unicode-aware library and relying on its East_Asian_Width property.
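
As a rough illustration of restricting the target to a narrower character type, the sketch below converts only full-width digits and Latin letters and leaves Greek and Cyrillic untouched. It is plain Ruby and does not call the library; `FULLWIDTH_ALNUM`, `FULLWIDTH_OFFSET`, and `fullwidth_alnum_to_halfwidth` are hypothetical names standing in for whatever finer-grained type and conversion method an application would actually use.

```ruby
# Hypothetical stand-in for a "full-width alphanumerics only" target type:
# U+FF10..U+FF19 (digits), U+FF21..U+FF3A (capitals), U+FF41..U+FF5A (small letters).
FULLWIDTH_ALNUM = /[\uFF10-\uFF19\uFF21-\uFF3A\uFF41-\uFF5A]/

# Full-width ASCII variants sit exactly 0xFEE0 above their ASCII counterparts.
FULLWIDTH_OFFSET = 0xFEE0

# Convert only the characters in the restricted target; everything else,
# including Greek and Cyrillic letters, passes through unchanged.
def fullwidth_alnum_to_halfwidth(text)
  text.gsub(FULLWIDTH_ALNUM) { |ch| (ch.ord - FULLWIDTH_OFFSET).chr(Encoding::UTF_8) }
end

# Full-width "ABC1", then a Greek capital omega and a Cyrillic capital ya.
mixed = "\uFF21\uFF22\uFF23\uFF11 \u03A9 \u042F"
puts fullwidth_alnum_to_halfwidth(mixed)
```

The point of the design is that the set of characters to convert is an explicit input, so an application handling Greek or Russian text can leave those scripts out without changing the library's own definitions.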

@gimite gimite closed this as completed Apr 21, 2023
@johnnyshields
Author

I don't understand. Characters such as Ω are not zenkaku (double-width) characters. At a minimum, they should be renamed to HAN_GREEK / HAN_CYRILLIC.
