New perl_parity:bool argument for MosesPunctNormalizer that fixes differences between the latest Perl implementation and sacremoses. In a future release this will probably become the default and only behaviour. #146
MosesTokenizer speed-up thanks to precompiled regular expressions #133, #139. Same for MosesDetokenizer #143.
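For readers unfamiliar with the technique, here is a minimal sketch of what precompiling regular expressions looks like, using only the standard library. The rules and the clean() helper are hypothetical and chosen for illustration; they are not the patterns or code sacremoses uses.

```python
import re

# Hypothetical substitution rules, for illustration only
# (not the rule set used by sacremoses).
RULES = [
    (r"\s+", " "),          # collapse runs of whitespace into one space
    (r" ([,.!?])", r"\1"),  # drop the space before common punctuation
]

# Compile every pattern once, at import time, instead of passing the
# raw pattern strings to re.sub() on each call.
COMPILED_RULES = [(re.compile(pattern), repl) for pattern, repl in RULES]


def clean(text: str) -> str:
    """Apply each precompiled rule in order and trim the result."""
    for regex, repl in COMPILED_RULES:
        text = regex.sub(repl, text)
    return text.strip()


print(clean("Hello  ,  world !"))  # prints "Hello, world!"
```

Holding on to the compiled pattern objects means each call goes straight to matching, which tends to matter when the same rules run over many sentences.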
A couple of bugfixes: The order of the protected_patterns list passed to MosesTokenizer.tokenize() is no longer significant. Also, use_known now works as expected in MosesTruecaser.truecase() #121. Since this fix changes the output, I've decided to bump the version to 0.1.0 to signal a possibly breaking change.
Finally, long gone but never released: No more Python 2 support code (bye six 👋)
This is the first release of sacremoses under HPLT stewardship 🎉