Ack + UTF FAQ #107

Open
n1vux opened this issue Aug 8, 2019 · 0 comments

n1vux commented Aug 8, 2019

Using Ack with UTF data is now a FAQ, since we have had a second query (one about UTF-16, one about UTF-8 multibyte / wide characters). See ack3:153 and ack3:222.

AFAIK,

  • Ack does not (yet) honor $LOCALE etc. for the files it scans; the assumed problem domain is ASCII program source code, not natural language.
  • Ack3 is serendipitously able to process Latin-1-range UTF-8 files -- European accented characters -- which covers most cases of UTF-8 in, e.g., Perl source code.
  • Ack cannot process UTF-8 multibyte / wide character data -- everything non-European -- and cannot process files saved as UCS/UTF-16/UTF-32 (even if they contain only Latin-1 characters).
  • We have a workaround (in the issues cited above) for processing files with multibyte characters as the appropriate UTF, provided all files can be processed the same way (ASCII and UTF-8 intermingle OK, but UTF-16LE and UTF-16BE do not unless they have a BOM). However, it requires Perl $OLD_PERL_VERSION < 5.029000: the use of an encoding layer on sysread is fatally deprecated in 5.30 (5.29+), which defeats the workaround, and it produces warnings in 5.24-5.28.
  • Linux does not accept UTF-16 as a global locale. Weird but true.
  • The reason for not immediately adding UTF decoding after our sysread, driven by the global locale, a command-line flag, or the file's BOM (byte order mark), is the combinatorial explosion of test cases for our test suite. We won't ship the feature unless we know it does not harm the functionality people rely on.
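
To make the BOM point concrete, here is a minimal sketch in Python (not Perl, and not how ack reads files today) of choosing a decoder from a file's byte order mark before matching. The helper names `sniff_encoding` and `grep_bytes` are hypothetical, and falling back to UTF-8 when no BOM is present is an assumption made for the example.

```python
import codecs
import re

# BOM table, longest first: the UTF-32-LE BOM begins with the UTF-16-LE BOM,
# so the 4-byte marks must be tested before the 2-byte ones.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_encoding(raw, default="utf-8"):
    """Return (encoding, bom_length) guessed from a leading BOM.

    Hypothetical helper: without a BOM it simply returns the assumed default.
    """
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name, len(bom)
    return default, 0


def grep_bytes(raw, pattern, default="utf-8"):
    """Decode raw bytes according to their BOM, then return matching lines."""
    encoding, skip = sniff_encoding(raw, default)
    text = raw[skip:].decode(encoding, errors="replace")
    rx = re.compile(pattern)
    return [line for line in text.splitlines() if rx.search(line)]


if __name__ == "__main__":
    sample = codecs.BOM_UTF16_LE + "naïve\nplain\n".encode("utf-16-le")
    # A byte-level search misses: UTF-16 puts a NUL byte between ASCII letters.
    print(re.search(rb"plain", sample))
    # Decoding by BOM first lets an ordinary pattern match.
    print(grep_bytes(sample, r"pl.in"))
```

The byte-level search fails even though the file holds only Latin-1 characters, which is the UTF-16 limitation described above. Because each file carries its own BOM, little-endian and big-endian UTF-16 files can be mixed in one run; files without a BOM still have to share a single assumed encoding.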