v0.4.1 #149

Merged 20 commits on Apr 26, 2024
Conversation

thammegowda (Owner) commented on May 1, 2023

  • Performance improvement for gzip: pigz is used, whenever available, to read and write .gz files.
  • Added mtdata cache to download datasets in parallel.
  • Better parallelization: parallel and mono data are scheduled at once (previously they were scheduled one after the other).
  • Added WMT general test sets for 2022 and 2023.
  • mtdata-bcp47: added -p/--pipe to map codes from stdin to stdout.
  • mtdata-bcp47: added --script {suppress-default,suppress-all,express}.
echo -e "en\neng\nfr\nfra\nara\nkan\ntel\neng_Latn\nhin_deva" | mtdata-bcp47 -p -s express
eng_Latn
eng_Latn
fra_Latn
fra_Latn
ara_Arab
kan_Knda
tel_Telu
eng_Latn
hin_Deva
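
For the gzip item in the list above, the "use pigz whenever available" behaviour could look roughly like the following. This is a minimal sketch and not mtdata's actual code: the helper name `open_gz` and its signature are hypothetical, and it assumes that `pigz`, when installed, is on the `PATH` and accepts the gzip-compatible `-d` and `-c` flags.

```python
import gzip
import shutil
import subprocess
from contextlib import contextmanager


@contextmanager
def open_gz(path, mode="rt", encoding="utf-8"):
    """Hypothetical helper: text-mode .gz reader/writer that prefers pigz."""
    if mode not in ("rt", "wt"):
        raise ValueError("this sketch supports only 'rt' and 'wt'")

    pigz = shutil.which("pigz")  # None when pigz is not installed
    if pigz is None:
        # Fallback: Python's built-in (single-threaded) gzip module.
        with gzip.open(path, mode, encoding=encoding) as stream:
            yield stream
        return

    if mode == "rt":
        # `pigz -dc FILE` decompresses FILE to stdout.
        proc = subprocess.Popen(
            [pigz, "-dc", str(path)],
            stdout=subprocess.PIPE, text=True, encoding=encoding,
        )
        try:
            yield proc.stdout
        finally:
            proc.stdout.close()
            proc.wait()
    else:
        # `pigz -c` compresses stdin to stdout, which is redirected to the file.
        with open(path, "wb") as raw:
            proc = subprocess.Popen(
                [pigz, "-c"],
                stdin=subprocess.PIPE, stdout=raw, text=True, encoding=encoding,
            )
            try:
                yield proc.stdin
            finally:
                proc.stdin.close()
                proc.wait()
```

A caller would use it like a regular file, for example `with open_gz("corpus.txt.gz", "wt") as out: out.write(line)`. Either branch produces and consumes ordinary gzip data, so files written with pigz stay readable by the fallback and vice versa.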

thammegowda changed the title from "v0.4.1" to "[WIP] v0.4.1" on May 1, 2023
thammegowda changed the title from "[WIP] v0.4.1" to "v0.4.1" on Apr 26, 2024
thammegowda merged commit e489ae4 into master on Apr 26, 2024
16 checks passed
thammegowda deleted the develop branch on May 24, 2024, 23:27