Add m2m100 as the new default model to support 100 languages
Added
dlt.lang.m2m100 module: Provides variables for over 100 languages and works with editor auto-complete. Example: dlt.lang.m2m100.ENGLISH.
dlt.utils.available_languages, dlt.utils.available_codes: Now accept the argument "m2m100".
Available languages for each model family
Script and template to generate available languages
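As a rough illustration of the additions above, the sketch below collects the new language variable and the two utility calls in one place. The helper name `m2m100_language_info` is hypothetical, and the import sits inside the function so the snippet can be defined without the package installed; calling it assumes `dl-translate` is available.

```python
def m2m100_language_info():
    """Gather the m2m100 language helpers named in these release notes.

    Hypothetical helper for illustration only; requires `pip install dl-translate`.
    """
    import dl_translate as dlt  # imported lazily so defining this function has no dependencies

    return {
        # Language constant from the new dlt.lang.m2m100 module
        "english": dlt.lang.m2m100.ENGLISH,
        # Both utilities now accept "m2m100" as the model family
        "languages": dlt.utils.available_languages("m2m100"),
        "codes": dlt.utils.available_codes("m2m100"),
    }
```

Calling `m2m100_language_info()` in an environment with the package installed should return the constant together with the language and code listings for the m2m100 family.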
Changed
[BREAKING] dlt.TranslationModel: The initialization function has a new parameter, model_family, which is either "mbart50" or "m2m100". By default, it is inferred from model_or_path. It must be set explicitly if model_or_path is a path.
[BREAKING] Default model changed to m2m100
Docs and readme about mbart50 were reframed to take into account the new model
dlt.TranslationModel.translate: Improved docstring to be more general.
Tests pertaining to m2m100
scripts/generate_langs.py: Renamed; the mechanism now loads from JSON files
docs/index.md: Expand the "Usage" and "Advanced" sections
README.md: Add acknowledgement about m2m100, significantly trim "Advanced" section, make "Usage" more concise
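The model_family rule described above (inferred from model_or_path, explicit when a path is given) can be sketched as follows. This is not the library's actual code: the function name `resolve_model_family` and the `KNOWN_FAMILIES` table are assumptions made for illustration, and the checkpoint names listed are only examples of identifiers that could be recognized.

```python
from typing import Optional

VALID_FAMILIES = ("mbart50", "m2m100")

# Assumed lookup table for illustration; the real library may recognize other names.
KNOWN_FAMILIES = {
    "mbart50": "mbart50",
    "m2m100": "m2m100",
    "facebook/mbart-large-50-many-to-many-mmt": "mbart50",
    "facebook/m2m100_418M": "m2m100",
    "facebook/m2m100_1.2B": "m2m100",
}


def resolve_model_family(model_or_path: str, model_family: Optional[str] = None) -> str:
    """Return the model family, preferring an explicit value over inference."""
    if model_family is not None:
        if model_family not in VALID_FAMILIES:
            raise ValueError(f"model_family must be one of {VALID_FAMILIES}")
        return model_family

    # No explicit family: infer it from a recognized model name.
    if model_or_path in KNOWN_FAMILIES:
        return KNOWN_FAMILIES[model_or_path]

    # Unrecognized value (for example a local path): the caller must be explicit.
    raise ValueError(
        f"Cannot infer model_family from {model_or_path!r}; pass model_family explicitly."
    )
```

Under this sketch, a local directory such as `"./my_model"` raises an error unless `model_family` is passed, which mirrors the requirement stated in the breaking change.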
Fixed
dlt.TranslationModel.available_codes() was returning the languages instead of the codes. It now correctly returns the codes.
Removed
Output type hints for TranslationModel.get_transformers_model and TranslationModel.get_tokenizer
[BREAKING] dlt.TranslationModel.bart_model and dlt.TranslationModel.tokenizer are no longer available to be used directly. Please use dlt.TranslationModel.get_transformers_model and dlt.TranslationModel.get_tokenizer instead.
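For code that previously read the removed attributes, a migration might look like the sketch below. The helper `unpack_translation_model` is hypothetical; it relies only on the two accessor methods named above and assumes they take no arguments.

```python
def unpack_translation_model(mt):
    """Return (model, tokenizer) using the accessor methods instead of attributes.

    Before this release: mt.bart_model, mt.tokenizer
    After this release:  mt.get_transformers_model(), mt.get_tokenizer()
    """
    return mt.get_transformers_model(), mt.get_tokenizer()


# Usage, assuming dl-translate is installed (constructing the model downloads weights):
# import dl_translate as dlt
# mt = dlt.TranslationModel()
# model, tokenizer = unpack_translation_model(mt)
```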