
Language and Speaker ID mapping issue #327

Open
tushar-31093 opened this issue Nov 6, 2024 · 5 comments
Labels
bug Something isn't working

Comments

@tushar-31093

[screenshot attached]

I simply wanted to try another speaker model. I configured the settings the way they should have worked, but something under the hood seems broken.

If we leave it at EN-NEWEST, or use other languages, it works fine. If we tweak it to work with other English models, it fails. I can hardcode it to run for a specific speaker, but ideally that should not be necessary.

Any suggestions, or should I simply hardcode it?


dosubot (bot) commented Nov 6, 2024

Hey there, @tushar-31093! 👋 I'm here to help you with any bugs, questions, or contributions. I'm your friendly neighborhood bot, ready to assist while you wait for a human maintainer.

To avoid hardcoding a specific speaker, you can iterate over the speaker IDs that each language model exposes (speaker_ids.keys()) and process every speaker in turn. Here's a relevant code snippet that demonstrates this approach:

import torch
from melo.api import TTS

# Assumes device, texts, src_path, speed, output_dir, tone_color_converter and
# target_se are defined by surrounding setup code that is not shown here.
for language, text in texts.items():
    model = TTS(language=language, device=device)
    speaker_ids = model.hps.data.spk2id

    for speaker_key in speaker_ids.keys():
        speaker_id = speaker_ids[speaker_key]
        # Embedding files use lowercase, hyphenated names, e.g. EN_INDIA -> en-india
        file_key = speaker_key.lower().replace('_', '-')

        source_se = torch.load(f'checkpoints_v2/base_speakers/ses/{file_key}.pth', map_location=device)
        model.tts_to_file(text, speaker_id, src_path, speed=speed)
        save_path = f'{output_dir}/output_v2_{file_key}.wav'

        # Run the tone color converter
        encode_message = "@MyShell"
        tone_color_converter.convert(
            audio_src_path=src_path,
            src_se=source_se,
            tgt_se=target_se,
            output_path=save_path,
            message=encode_message)

This code loads the matching source speaker embedding for each speaker and runs its output through the tone color converter, so different English speaker models can be used without hardcoding one [1].
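The only link between a speaker key and its embedding file is the lowercase/hyphen normalization in that loop, so it helps to see exactly which path each key produces. A minimal sketch, assuming the checkpoints_v2/base_speakers/ses/<name>.pth layout from the snippet; se_filename and se_path are hypothetical helper names, and the keys in the example are illustrative rather than read from a real model:

```python
from pathlib import PurePosixPath

# Directory layout taken from the snippet in this thread (an assumption about your setup).
SES_DIR = PurePosixPath("checkpoints_v2/base_speakers/ses")


def se_filename(speaker_key: str) -> str:
    """Normalize a spk2id key the same way the loop does: lowercase, '_' -> '-'."""
    return speaker_key.lower().replace("_", "-") + ".pth"


def se_path(speaker_key: str) -> str:
    """Build the source speaker-embedding path for one speaker key."""
    return str(SES_DIR / se_filename(speaker_key))


if __name__ == "__main__":
    # Illustrative keys only; print model.hps.data.spk2id to see the real ones.
    for key in ["EN-US", "EN_INDIA", "EN-BR"]:
        print(key, "->", se_path(key))
```

Comparing these paths against the files actually on disk shows quickly whether a given English variant has a matching embedding.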

To continue talking to Dosu, mention @dosu.

dosubot (bot) added the bug (Something isn't working) label on Nov 6, 2024

tushar-31093 commented Nov 6, 2024 via email


dosubot (bot) commented Nov 6, 2024

If specifying a speaker ID doesn't work, there might be an issue with how the speaker IDs are being mapped or used in the code. Here are a few things to check:

  1. Speaker ID Mapping: Ensure that the speaker_ids dictionary is correctly populated and that the keys match the expected format. The keys should be processed by converting them to lowercase and replacing underscores with hyphens, as shown in the code snippet.

  2. Speaker Embeddings: Verify that the speaker embeddings are correctly loaded from the specified path. The file path should match the processed speaker key.

  3. Error Handling: Check if there are any error messages or exceptions when specifying a speaker ID. This might give clues about what is going wrong.

  4. Model Configuration: Ensure that the model configuration supports the specified speaker ID. There might be limitations or specific configurations required for certain speaker models.
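Checks 1 and 2 can be run before any audio is generated. A rough sketch, assuming the same embedding directory as the snippet earlier in this thread; missing_embeddings is a hypothetical helper, and you would pass it model.hps.data.spk2id.keys() from your own loaded model:

```python
import os


def missing_embeddings(speaker_keys, ses_dir="checkpoints_v2/base_speakers/ses"):
    """Return (key, expected_path) pairs for speakers whose .pth embedding is absent."""
    missing = []
    for key in speaker_keys:
        # Same normalization as the loop: lowercase, underscores -> hyphens.
        path = os.path.join(ses_dir, key.lower().replace("_", "-") + ".pth")
        if not os.path.isfile(path):
            missing.append((key, path))
    return missing


if __name__ == "__main__":
    # Hypothetical keys; replace with model.hps.data.spk2id.keys().
    for key, path in missing_embeddings(["EN-US", "EN_INDIA"]):
        print(f"no embedding for {key}: expected {path}")
```

An empty result means every speaker key has a file to load; any entry points directly at the name the loop expects but cannot find.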

If these checks don't resolve the issue, consider logging the values of speaker_ids, speaker_key, and speaker_id to ensure they are as expected. This can help identify where the process might be failing.
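For the logging suggestion, something like the following prints each raw key, its normalized file key and its numeric ID side by side. log_speaker_mapping is a hypothetical name, and the sample mapping in the __main__ block is made up for illustration; in practice you would pass model.hps.data.spk2id:

```python
import logging

log = logging.getLogger("speaker_debug")


def log_speaker_mapping(language, speaker_ids):
    """Log every speaker_key, normalized file key and speaker_id; return them as rows."""
    rows = []
    log.debug("language=%s available speakers=%s", language, list(speaker_ids.keys()))
    for speaker_key in speaker_ids.keys():
        file_key = speaker_key.lower().replace("_", "-")
        speaker_id = speaker_ids[speaker_key]
        log.debug("speaker_key=%s file_key=%s speaker_id=%s", speaker_key, file_key, speaker_id)
        rows.append((speaker_key, file_key, speaker_id))
    return rows


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG, format="%(levelname)s %(message)s")
    # Made-up mapping for illustration only.
    log_speaker_mapping("EN", {"EN-US": 0, "EN_INDIA": 2})
```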



tushar-31093 commented Nov 6, 2024 via email


dosubot (bot) commented Nov 6, 2024

It seems like the issue might be related to how the speaker keys are being processed and mapped in the code. Here are a few steps to help troubleshoot and potentially resolve the issue:

  1. Check Speaker Key Processing: Ensure that the speaker keys are being processed correctly. They should be converted to lowercase and have underscores replaced with hyphens. This is crucial for the mapping to work correctly [1].

  2. Verify Language and Style Support: Make sure that the language and style you are trying to use are supported by the model. The code only supports specific styles for English and Chinese, and any unsupported style will cause the mapping to fail [2].

  3. Inspect the speaker_ids Dictionary: Log the contents of the speaker_ids dictionary to verify that it contains the expected keys and values. This can help identify if there are any discrepancies in the mapping.

  4. Debug the Iteration Logic: Review the iteration logic over the speaker_ids to ensure it correctly handles different language variants like 'en-india' or 'en-br'. If these variants are not explicitly supported, you might need to adjust the logic to accommodate them.
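If the goal is to pick one English variant without hardcoding its exact key, a lookup that ignores case and treats _ and - as the same character avoids most mismatches. A minimal sketch; resolve_speaker is a hypothetical helper that relies only on .keys() and indexing, which is how the snippet uses spk2id, and the example keys are illustrative:

```python
def resolve_speaker(spk2id, requested):
    """Return (original_key, speaker_id) for a requested speaker name.

    Matching ignores case and treats '_' and '-' as equivalent, so 'en-india',
    'EN_INDIA' and 'En-India' all resolve to the same entry.
    """
    def norm(name):
        return name.lower().replace("_", "-")

    wanted = norm(requested)
    for key in spk2id.keys():
        if norm(key) == wanted:
            return key, spk2id[key]
    available = ", ".join(sorted(spk2id.keys()))
    raise KeyError(f"speaker {requested!r} not found; available: {available}")


if __name__ == "__main__":
    # Illustrative mapping; use model.hps.data.spk2id in real code.
    spk2id = {"EN-US": 0, "EN-BR": 1, "EN_INDIA": 2}
    print(resolve_speaker(spk2id, "en-india"))
```

The returned speaker_id can then be passed to model.tts_to_file, and when a variant is missing the error message lists the keys the loaded model actually exposes.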

By following these steps, you should be able to identify where the mapping might be failing and make the necessary adjustments to support additional English variants.

