do not understand the inference process of xtts 2.0 #3223
Liujingxiu23
started this conversation in
General
Replies: 0 comments
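My understanding of the XTTS 2.0 inference process is:
1. Get cond_latents from the prompt waveforms and a text embedding from the input text.
2. gpt_codes = gpt_inference.generate(text, cond_latents)
3. gpt_latents = gpt(text, gpt_codes, cond_latents)
4. Pass gpt_latents to the HifiDecoder.
Why are both steps 2 and 3 needed? Why not just map the codes straight to code embeddings (codes -> code embeddings) and decode those?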
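To make the question concrete, here is a minimal toy sketch of the two options in NumPy. Everything in it is an assumption for illustration: `toy_generate`, `toy_latents`, `code_embedding`, `w_ctx`, and the sizes are hypothetical stand-ins, not the Coqui TTS API or the real XTTS model. It only shows the general difference between a per-code embedding lookup and hidden states from a forward pass that also sees the text and the conditioning latents.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, DIM = 16, 8  # toy sizes, not the XTTS values

code_embedding = rng.normal(size=(VOCAB, DIM))  # lookup table for audio codes
w_ctx = rng.normal(size=(DIM, DIM)) / DIM       # toy stand-in for the GPT layers


def toy_generate(text_emb, cond_latents, steps=5):
    """Stand-in for step 2: pick discrete codes one at a time (greedy)."""
    state = text_emb.mean(axis=0) + cond_latents.mean(axis=0)
    codes = []
    for _ in range(steps):
        logits = code_embedding @ np.tanh(w_ctx @ state)
        code = int(np.argmax(logits))
        codes.append(code)
        state = state + code_embedding[code]  # feed the chosen code back in
    return np.array(codes)


def toy_latents(text_emb, cond_latents, codes):
    """Stand-in for step 3: one forward pass, one hidden state per code."""
    context = text_emb.mean(axis=0) + cond_latents.mean(axis=0)
    prefix = np.cumsum(code_embedding[codes], axis=0)  # running sum over current and earlier codes
    return np.tanh((prefix + context) @ w_ctx.T)


text_emb = rng.normal(size=(4, DIM))  # hypothetical text embedding
cond_a = rng.normal(size=(3, DIM))    # hypothetical latents for speaker prompt A
cond_b = rng.normal(size=(3, DIM))    # hypothetical latents for speaker prompt B

codes = toy_generate(text_emb, cond_a)

# Option "codes -> code embeddings": a table lookup, identical for any speaker or text.
lookup = code_embedding[codes]

# Option "2+3": hidden states that depend on the codes AND on the text and cond latents.
latents_a = toy_latents(text_emb, cond_a, codes)
latents_b = toy_latents(text_emb, cond_b, codes)

print("same shape:", lookup.shape == latents_a.shape)
print("latents change with the speaker prompt:", not np.allclose(latents_a, latents_b))
```

In this toy setup the lookup gives the same vector for a given code no matter what the prompt or text was, while the step 3 output differs when the conditioning changes. Whether that is the actual motivation in XTTS is not something this sketch can confirm.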