Add fine-tuning scripts #680

Draft: wants to merge 85 commits into dev

@ain-soph (Contributor) commented Aug 11, 2024

Add fine-tuning scripts. The commands are provided at the top of each file.

There are a few items to note:

  1. I'd like the maintainers' suggestions on the current file structure (e.g., moving the utils directory, or putting the scripts into the examples folder).
  2. The current fine-tuning scripts don't perform very well yet. We need to test different hyperparameters (learning rate, etc.) and provide benchmark results.
  3. For the dataset from Xz乔希 used here, I'm wondering whether we should put it in a separate repo:
    https://github.com/2noise/ChatTTS/blob/0bef943d192cd1dd4067f83e16a93f19889b9a87/ChatTTS/utils/finetune/dataset.py

cc @fumiama
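Item 2 calls for comparing hyperparameters and reporting benchmark results. A minimal sketch of such a sweep, assuming a hypothetical `train_and_eval(lr, batch_size)` callback that runs one fine-tuning job and returns a validation loss (nothing in this PR defines it), could look like:

```python
import itertools
from typing import Callable, List, Tuple

def grid_search(
    train_and_eval: Callable[[float, int], float],
    learning_rates: List[float],
    batch_sizes: List[int],
) -> List[Tuple[float, int, float]]:
    """Try every (lr, batch_size) pair; return (lr, batch_size, loss) sorted by loss."""
    results: List[Tuple[float, int, float]] = []
    for lr, batch_size in itertools.product(learning_rates, batch_sizes):
        # Hypothetical callback: one fine-tuning run, returns a validation loss.
        loss = train_and_eval(lr, batch_size)
        results.append((lr, batch_size, loss))
    # Lowest validation loss first; the full list doubles as a benchmark table.
    results.sort(key=lambda r: r[2])
    return results
```

Keeping every (lr, batch_size, loss) triple, not just the best one, gives the benchmark table directly.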

@ain-soph (Contributor, Author)

The code in utils should be placed separately in the ChatTTS folder.

I've moved it under ChatTTS.utils.finetune now.

I also removed the dummy data. You may also want to review the Xz dataset code: it contains a Google Drive link, and I'm not sure whether it should be there.
https://github.com/2noise/ChatTTS/blob/0bef943d192cd1dd4067f83e16a93f19889b9a87/ChatTTS/utils/finetune/dataset.py

@fumiama (Member) left a comment


I'll give advice on the file arrangement first. As for the code, you should wrap more methods into classes to provide good abstraction. I advise you to use the standard logging module and let the user decide the logger instance, as the ChatTTS class does. If you agree, the logging part of your code should be moved into tools/logger, where you can see an example instance. You can create another instance or modify that simple one.

ChatTTS/utils/finetune/dataset.py (outdated, resolved)
ChatTTS/utils/finetune/logger.py (outdated, resolved)
ChatTTS/utils/finetune/output.py (outdated, resolved)
ChatTTS/utils/finetune/model.py (outdated, resolved)
ChatTTS/utils/finetune/train.py (outdated, resolved)
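The logging advice above amounts to dependency injection: library code accepts a `logging.Logger` and only emits records, while the caller configures handlers and levels. A minimal sketch of that pattern follows; `Trainer` and `train_step` are hypothetical names, not classes from this PR or from ChatTTS.

```python
import logging

class Trainer:
    """Hypothetical fine-tuning wrapper that never configures logging itself."""

    def __init__(self, logger: logging.Logger = logging.getLogger(__name__)) -> None:
        # The caller decides which logger (and therefore which handlers/levels) to use.
        self.logger = logger

    def train_step(self, step: int, loss: float) -> None:
        # Only emit records; formatting and destinations are the caller's concern.
        self.logger.info("step %d: loss=%.4f", step, loss)

if __name__ == "__main__":
    # Application side: configure a logger once and hand it to the library class.
    log = logging.getLogger("finetune")
    log.setLevel(logging.INFO)
    log.addHandler(logging.StreamHandler())
    Trainer(logger=log).train_step(1, 0.5)
```

Because `logging.getLogger` returns the same object for the same name, the default argument is safe here even though it is evaluated only once.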
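@gafield-liu

I'd like to ask: if I want to fine-tune the model for a new voice (timbre), should I train only the spk_emb matrix, or do I need to train spk_emb together with the GPT-related modules?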

@gafield-liu commented Aug 28, 2024

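For a new voice, I tried freezing or training spk_emb, freezing or training the gpt.gpt module, and freezing or training the decoder module, using the MSE loss on the mel spectrogram plus the cross-entropy on the speech logits as the loss. However, I still can't get stable model behavior (similar or consistent timbre).

Could you give me some guidance?
@fumiama @ain-soph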
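The loss described above can be sketched as follows. This is a dependency-free illustration of the two terms (MSE over mel-spectrogram values plus token-level cross-entropy over the speech logits), with a hypothetical `mel_weight` to balance them; in a PyTorch training loop the same terms would normally come from `torch.nn.functional.mse_loss` and `torch.nn.functional.cross_entropy`, and a submodule such as the decoder is typically frozen with `module.requires_grad_(False)`.

```python
import math
from typing import Sequence

def mse_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    # Mean squared error over flattened mel-spectrogram values.
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def cross_entropy(logits: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    # Mean negative log-likelihood of each target token under a softmax over
    # its row of logits; subtracting the row max keeps exp() from overflowing.
    if len(logits) != len(labels) or not labels:
        raise ValueError("need one non-empty logits row per label")
    total = 0.0
    for row, label in zip(logits, labels):
        m = max(row)
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[label]
    return total / len(labels)

def combined_loss(mel_pred, mel_target, logits, labels, mel_weight: float = 1.0) -> float:
    # mel_weight is a hypothetical knob for balancing the two terms.
    return mel_weight * mse_loss(mel_pred, mel_target) + cross_entropy(logits, labels)
```

The relative scale of the two terms matters: if the mel MSE dominates, the token cross-entropy barely moves, so the weight is worth including in any hyperparameter sweep.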

@ain-soph (Contributor, Author)

@gafield-liu The training results are indeed not great; the training parameters probably need tuning. What I have now is just a rough draft.

@gafield-liu

I think a speech embedding extraction module is missing here. With random initialization, the fine-tuned timbre doesn't turn out well.
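The comment attributes the poor timbre to starting spk_emb from random values. One common alternative, sketched here under the assumption that some encoder (not part of this PR) produces one embedding vector per reference clip of the target speaker, is to average those vectors and use the mean as the initial spk_emb. `mean_embedding` is a hypothetical helper.

```python
from typing import List, Sequence

def mean_embedding(utterance_embs: Sequence[Sequence[float]]) -> List[float]:
    # Average per-utterance embeddings into one speaker vector, to serve as the
    # starting point for spk_emb instead of a random vector (assumed approach).
    if not utterance_embs:
        raise ValueError("need at least one reference embedding")
    dim = len(utterance_embs[0])
    if any(len(e) != dim for e in utterance_embs):
        raise ValueError("all embeddings must have the same dimension")
    n = len(utterance_embs)
    return [sum(e[i] for e in utterance_embs) / n for i in range(dim)]
```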

@lpscr commented Oct 6, 2024

Hi @ain-soph and @fumiama,

Thank you so much for your hard work on the fine-tuning scripts. I found this project just a day ago, and I'm happy to say I was able to fine-tune DVAE and GPTSpeakers without any errors.

Today I tried the new update (the "Merge branch '2noise'" commit). Fine-tuning DVAE worked fine, but I got an error when trying to fine-tune GPT. Here's the error message I get:

  ChatTTS\utils\finetune\model.py", line 204, in get_hidden_states_and_labels
    inputs_embeds = chat.gpt.forward(input_ids=input_ids, text_mask=text_mask)
TypeError: _forward_unimplemented() got an unexpected keyword argument 'input_ids'

I really appreciate all your work and would be grateful for any help with this error.

Thanks again for your time!
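One likely explanation, assuming the merged GPT class no longer overrides `forward()`: PyTorch's `nn.Module` then falls back to `_forward_unimplemented`, which accepts only positional `*input`, so a keyword call raises `TypeError` before its `NotImplementedError` is reached. The pure-Python mimic below reproduces that behavior without PyTorch; `GPT.embed` and `overrides_forward` are hypothetical names. With real PyTorch, the equivalent check would be `type(chat.gpt).forward is torch.nn.Module.forward`, and the fix is to call whatever method the updated GPT class now exposes.

```python
from typing import Any, List

def _forward_unimplemented(self, *input: Any) -> None:
    # Same shape as PyTorch's default Module.forward: positional-only *input,
    # so a keyword call fails with TypeError before NotImplementedError is reached.
    raise NotImplementedError(
        f'Module [{type(self).__name__}] is missing the required "forward" function'
    )

class Module:
    # Stand-in for torch.nn.Module; only the forward fallback is mimicked.
    forward = _forward_unimplemented

class GPT(Module):
    # Hypothetical: after the merge, the logic lives under a different name
    # and forward() is no longer overridden.
    def embed(self, input_ids: List[int], text_mask: List[bool]) -> List[int]:
        return [i for i, keep in zip(input_ids, text_mask) if keep]

def overrides_forward(module: Module) -> bool:
    # True only when some class below Module defines its own forward().
    return type(module).forward is not Module.forward

if __name__ == "__main__":
    gpt = GPT()
    print(overrides_forward(gpt))  # False
    try:
        gpt.forward(input_ids=[1, 2], text_mask=[True, False])
    except TypeError as err:
        print(err)
```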

@ain-soph (Contributor, Author) commented Oct 6, 2024

@fumiama Hi, just a status update: I now have plenty of free time to work on this PR and will push updates in the coming days.
It would be nice if you could do a full code review.

I'll continue working on improving the training performance.

@fumiama (Member) commented Oct 9, 2024

Appreciated. I'll do it on your next push, once the test is fixed.

@ain-soph (Contributor, Author) commented Oct 10, 2024

@fumiama The failure occurs because the test file imports Logger from https://github.com/ain-soph/ChatTTS/blob/bd76af734f16b2085c276fc201e47b90095658f2/ChatTTS/utils/log.py#L11.
Meanwhile, my logger class SmoothedValue in the same file uses typing.Self, which is only available in recent Python versions (it was added in Python 3.11).

What do you suggest regarding compatibility? Should we keep supporting python<3.12 and use -> "SmoothedValue" instead of -> typing.Self? Another option is to move my logger classes into other files, so that the test won't import them.

Overall, my code requires python>=3.12, while the existing test file requires support for python<3.12.
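For the first option raised above, a quoted forward reference gives the same chaining-friendly return type as `typing.Self` without requiring a newer interpreter. A minimal sketch; this `SmoothedValue` is a hypothetical stand-in with a sliding window and running totals, not the class from the PR:

```python
from collections import deque
from typing import Deque

class SmoothedValue:
    """Hypothetical windowed tracker; not the PR's implementation."""

    def __init__(self, window_size: int = 20) -> None:
        self.window: Deque[float] = deque(maxlen=window_size)  # recent values only
        self.total = 0.0
        self.count = 0

    # A quoted forward reference works on older Python versions too,
    # unlike typing.Self (added in Python 3.11).
    def update(self, value: float, n: int = 1) -> "SmoothedValue":
        self.window.append(value)
        self.total += value * n
        self.count += n
        return self  # returning self allows call chaining

    @property
    def avg(self) -> float:
        # Mean over the sliding window.
        if not self.window:
            raise ValueError("no values recorded yet")
        return sum(self.window) / len(self.window)

    @property
    def global_avg(self) -> float:
        # Mean over every value ever recorded.
        if self.count == 0:
            raise ValueError("no values recorded yet")
        return self.total / self.count
```

Usage: `SmoothedValue(window_size=2).update(1.0).update(2.0).update(3.0)` keeps only the last two values in the window, so `avg` covers 2.0 and 3.0 while `global_avg` covers all three.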

@fumiama (Member) commented Oct 11, 2024

Well, if nothing strictly requires python>=3.12, compatibility should be kept the same as in the previous version.

@ain-soph (Contributor, Author) commented Nov 5, 2024

@fumiama I suggest dropping support for Python 3.8, which doesn't support subscripting built-in types such as list[int] in type hints.

For reference, PyTorch has required python>=3.9 since version 2.5.

  File "/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/site-packages/ChatTTS/utils/log.py", line 81, in SmoothedValue
    def update_list(self, value_list: list[float]) -> 'SmoothedValue':
TypeError: 'type' object is not subscriptable
Error: tests/#655.py exited with a non-zero status.
Test tests/#655.py success
Error: Process completed with exit code 1.

@fumiama (Member) commented Nov 5, 2024

Maybe you should use List[int] to avoid this problem; it's a compatibility issue that can be solved simply by importing List from typing instead of using list. Also, many devices are stuck on old versions of Python/PyTorch for various reasons, and we should not drop support for a version unless there's a significant reason that forces us to.
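Applied to the failing signature from the CI traceback, the suggested fix looks roughly like this. The class body is a hypothetical stand-in; only the `update_list` signature comes from the traceback.

```python
from typing import List

class SmoothedValue:
    """Hypothetical minimal stand-in; only update_list's signature is from the traceback."""

    def __init__(self) -> None:
        self.values: List[float] = []

    # typing.List[float] is subscriptable on Python 3.8, while the built-in
    # list[float] only became subscriptable in 3.9 (PEP 585). Without a
    # __future__ import, this annotation is evaluated when the def statement
    # runs, which is what raised the TypeError in CI.
    def update_list(self, value_list: List[float]) -> "SmoothedValue":
        self.values.extend(value_list)
        return self
```

An alternative that keeps the built-in spelling is adding `from __future__ import annotations` at the top of log.py (PEP 563), which stops annotations from being evaluated at definition time on Python 3.7+; code that later resolves them with `typing.get_type_hints` would still fail on 3.8, though.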

Labels: algorithm (Algorithm improvements & issues), enhancement (New feature or request)
Projects: None yet
4 participants