[BUG] align_one language error with tokenizer #802

Open
Hocine958 opened this issue May 2, 2024 · 1 comment

Hocine958 commented May 2, 2024

Debugging checklist

[ ] Have you read the troubleshooting page (https://montreal-forced-aligner.readthedocs.io/en/latest/user_guide/troubleshooting.html) and searched the documentation to ensure that your issue is not addressed there?
[X] Have you updated to latest MFA version (check https://montreal-forced-aligner.readthedocs.io/en/latest/changelog/changelog_3.0.html)? What is the output of mfa version?
[ ] Have you tried rerunning the command with the --clean flag?

Describe the issue
When running the "align_one" command on Japanese files, the "language" passed to the "generate_language_tokenizer()" function (align_one.py, line 156) is a string instead of an enum member. As a result, the if check at line 56 of spacy.py is skipped, and the dict access at line 66 raises a "KeyError: 'japanese'" exception.
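A minimal sketch of the mismatch, not MFA's actual code: the `Language` enum, the mapping contents, and the helper names `naive_lookup` and `resolve_model_name` are all hypothetical stand-ins, and the enum is assumed to be a plain `enum.Enum`. It shows why a dict keyed by enum members rejects the raw string, and how coercing the string to an enum member first would avoid the error.

```python
from enum import Enum


class Language(Enum):
    # Hypothetical stand-in for the enum the tokenizer code expects.
    japanese = "japanese"
    chinese = "chinese"


# Hypothetical mapping keyed by enum members; the model names are illustrative.
language_model_mapping = {
    Language.japanese: "ja_core_news_sm",
    Language.chinese: "zh_core_web_sm",
}


def naive_lookup(language):
    # Mirrors the failing pattern: a plain string is not equal to any
    # enum-member key, so the lookup raises KeyError.
    return language_model_mapping[language]


def resolve_model_name(language):
    # Possible fix: coerce a string to the enum member before the lookup.
    # Language("japanese") looks the member up by value and raises
    # ValueError if no member has that value.
    if isinstance(language, str):
        language = Language(language)
    return language_model_mapping[language]
```

With these definitions, `naive_lookup("japanese")` raises `KeyError: 'japanese'`, matching the last frame of the traceback below, while `resolve_model_name("japanese")` and `resolve_model_name(Language.japanese)` both succeed. Where the coercion belongs in the real code (the caller in align_one.py or the tokenizer function itself) is a design decision for the maintainers.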

For Reproducing your issue
Please fill out the following:

  1. Corpus structure
    • What language is the corpus in? -> Japanese
    • How many files/speakers? -> 1
    • Are you using lab files or TextGrid files for input? -> txt file
  2. Dictionary
    • Are you using a dictionary from MFA? If so, which one? -> japanese_mfa v3.0.0
    • If it's a custom dictionary, what is the phoneset?
  3. Acoustic model
    • If you're using an acoustic model, is it one downloaded through MFA? If so, which one? -> japanese_mfa v3.0.0
    • If it's a model you've trained, what data was it trained on?

Log file

(env) mfauser@46587cd4c6e4:/$ mfa align_one data/japanese/japanese.wav data/japanese/japanese.txt japanese_mfa japanese_mfa data/jap_one_err
Exception ignored in atexit callback: <bound method ExitHooks.history_save_handler of <montreal_forced_aligner.command_line.mfa.ExitHooks object at 0x7fd96f485390>>
Traceback (most recent call last):
  File "/env/lib/python3.11/site-packages/montreal_forced_aligner/command_line/mfa.py", line 107, in history_save_handler
    raise self.exception
  File "/env/bin/mfa", line 8, in <module>
    sys.exit(mfa_cli())
             ^^^^^^^^^
  File "/env/lib/python3.11/site-packages/rich_click/rich_command.py", line 360, in __call__
    return super().__call__(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/env/lib/python3.11/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/env/lib/python3.11/site-packages/rich_click/rich_command.py", line 152, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/env/lib/python3.11/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/env/lib/python3.11/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/env/lib/python3.11/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/env/lib/python3.11/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/env/lib/python3.11/site-packages/montreal_forced_aligner/command_line/align_one.py", line 156, in align_one_cli
    tokenizer = generate_language_tokenizer(acoustic_model.meta["language"])
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/env/lib/python3.11/site-packages/montreal_forced_aligner/tokenization/spacy.py", line 66, in generate_language_tokenizer
    name = language_model_mapping[language]
           ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^
KeyError: 'japanese'

Desktop (please complete the following information):

  • OS: [e.g. Windows, OSX, Linux] -> Linux
  • Version [e.g. MacOSX 10.15, Ubuntu 20.04, Windows 10, etc] -> Ubuntu 20.04
  • Any other details about the setup (Cloud, Docker, etc)

Additional context
Add any other context about the problem here.

uasolo commented Jul 23, 2024

I have the same problem (with Mandarin).

pengzhendong added a commit to pengzhendong/Montreal-Forced-Aligner that referenced this issue Nov 11, 2024