[ISSUE for v2]: Need Help with Audio Quality, 4 Issues Total #1439

Closed
4 of 7 tasks
kiraraenjoyer opened this issue Jan 23, 2025 · 9 comments

Comments

@kiraraenjoyer

Voice Changer Version

vcclient_win_cuda_2.0.73-beta.zip

Operating System

Windows 11

GPU

RTX 4060

CUDA Version

12.4

Read carefully and check the options

  • If you use the win_cuda_torch_cuda edition, did you set up CUDA? See here
  • If you use the win_cuda edition, did you set up CUDA and cuDNN? See here
  • If you use the mac edition, the client is not launched automatically. Did you use Chrome to open the application?
  • I've tried to change the Chunk Size
  • I've tried to set the Index to zero
  • I've read the tutorial
  • I've tried to extract to another folder (or re-extract) the .zip file

Does pre-installed model work?

No

Model Type

RVC

Issue Description

For a while everything was working great, better than I had ever gotten it to work, but then I switched profiles and now it refuses to perform like it did.

I had it sounding normal and avoiding turning lower-pitched sounds into harsh robotic sounds (or maybe RVC just happened to work for a moment and has actually been malfunctioning the entire time). I can't reproduce those results at all.

There are three other issues I have encountered, but they don't bother me much:

A. On initial startup, I have to reload the program once or twice before the RVC delay drops back to its normal amount.
B. Sometimes the program uses ~10% extra GPU regardless of settings, but this is easily fixed by closing and relaunching it once or twice.
C. RVC sounds really choppy and robotic on Discord specifically, but it works fine if I have OBS open. (I have no idea why, but at least I can work around it.)

Some other questions: do I need torch_cuda and/or cuDNN for the voice changer to run properly/optimally, and what are all these warnings in the logs and command prompt? I don't understand computer talk, and I don't know why the input sample rate is 16000 or how to change it.

I reduced my mic input format from 24-bit 96000 Hz to 16-bit 48000 Hz, which seemed to give somewhat better results, but maybe only because something else isn't working?

The logs are also flooded with noise gate checks/reports. Is that supposed to happen?

Application Screenshot

FCPE (but the problems apply to every F0 detector),
Index not used,
7200 or 9600 Chunk,
192000 Extra,
Sio,
1 Output Buffer Size Ratio (I don't know what this does),
0.2 crossfade seconds, 0 trancate

Logs on console

WARNING - Input or output sample rate is not set - vcclient_dev\voice_changer\voice_change_manager\voice_changer.py - 106
WARNING - start_convert_chunk_bulk called - vcclient_dev\voice_changer\voice_change_manager\voice_changer.py - 247
WARNING - data type resampled is short. padded.:(2257,), shape:(2400,) - vcclient_dev\voice_changer\voice_change_manager\voice_changer.py - 514

input_sample_rate=48000, output_sample_rate=48000, monitor_sample_rate=48000, vc_input_sample_rate=16000, vc_output_sample_rate=32000, resample_ratio_in=0.3333333333333333, resample_ratio_out=1.5, resample_ratio_monitor=1.5, resample_ratio_pass_through_in_out=1.0

torch\nn\utils\weight_norm.py:28: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
[INF0] torchfcpe.mel_tools.nv_mel_extractor: Librosa not found, use torchfcpe.mel_tools.mel_fn_librosa instead.
[INFO]: device is not None, use cuda:0
[INFO] > call by:torchfcpe.tools.spawn_infer_cf_naive_mel_pe_from_pt
[WARN] args.model.use_harmonic_emb is None; use default False
[WARN] > call by:torchfcpe.tools.spawn_cf_naive_mel_pe

vcclient - rvc_pipeline - INFO - noise gate -inf < -69.0 - vcclient_dev\voice_changer\voice_change_manager\vc_pipelines\rvc_pipeline.py - 222
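The noise gate line reads as "measured level < threshold". A minimal sketch, assuming the gate compares each chunk's RMS level in dBFS against the -69.0 threshold shown in the log (the exact formula VCClient uses is an assumption, and the function names are hypothetical), shows why digital silence is reported as -inf: the logarithm of zero energy is minus infinity. One such INFO line per quiet chunk would also account for the log filling up with them.

```python
import math


def rms_dbfs(samples: list[float]) -> float:
    """RMS level of a buffer in dBFS (full scale = 1.0). Pure silence gives -inf."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")


def gate_is_closed(samples: list[float], threshold_db: float = -69.0) -> bool:
    """True when the chunk is quieter than the threshold and would be muted."""
    return rms_dbfs(samples) < threshold_db


silence = [0.0] * 2400        # a chunk of digital silence
quiet_voice = [0.001] * 2400  # constant 0.001 amplitude, i.e. about -60 dBFS

print(rms_dbfs(silence), gate_is_closed(silence))                    # -inf True
print(round(rms_dbfs(quiet_voice), 1), gate_is_closed(quiet_voice))  # -60.0 False
```

In this reading, "-inf < -69.0" simply means the mic delivered silence for that chunk and the gate muted it, which is expected behaviour rather than an error.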

kiraraenjoyer changed the title from "[ISSUE for v2]: Need Help with Audio Quality" to "[ISSUE for v2]: Need Help with Audio Quality, 4 Issues Total" on Jan 23, 2025
@Kuuko-fokkusugaru

If you have tweaked settings, chances are you may have messed something up. Try deleting the whole folder and starting over with a fresh extract. Test whether the default models work fine. Use a GPU-friendly F0 detector like rmvpe_onnx with ONNX models (you can convert .pth models to ONNX with RVC).

Since there is no screenshot of the settings, I am not sure what could be wrong. With 0.5 or slightly more, it should already sound fine without a huge delay.

The reason it sounds bad in Discord may be that you are using VB-Cable, which isn't compatible with Discord and some other software. It may work with OBS because OBS captures the sound differently. Uninstall VB-Cable and install Virtual Audio Cable Lite (VAC), which is free and works without any quality or stutter issues.

@kiraraenjoyer
Author

GPU usage isn't the problem; it's just that sometimes on launch it would be higher than usual, and after relaunching it would normalize.

Why is vc_input_sample_rate set to 16000, and how do I change it, unless changing it would break everything?

@Kuuko-fokkusugaru

Try changing that first in your mic settings, as it may be set to 16 kHz instead of 44.1 or 48 kHz.

@kiraraenjoyer
Author

I changed my mic from 48 to 96 kHz (the mic doesn't offer anything lower than 44 kHz), and the logs still read:

"input_sample_rate": 48000,
"output_sample_rate": 48000,
"monitor_sample_rate": 48000,
"vc_input_sample_rate": 16000,
"vc_output_sample_rate": 32000,
"resample_ratio_in": 0.3333333333333333,
"resample_ratio_out": 1.5,

I don't know why the input sample rate stayed at 48 kHz. I do know that vc_output_sample_rate is the sample rate the model was trained at, but I don't know why the model's input is 16 kHz. It looks like it resamples and pads out missing data?
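The numbers in these log lines are plain ratios between sample rates. A hedged sketch of the arithmetic, assuming the pipeline simply converts between the logged rates and that the Chunk setting counts 48 kHz samples (which matches the 2400-sample shape in the earlier "padded" warning); the helper names are hypothetical, not VCClient functions. The fixed 16 kHz vc_input_sample_rate is most likely there because RVC's content encoder (HuBERT/ContentVec) expects 16 kHz audio, which would explain why changing the mic or cable rate does not move it.

```python
# Hypothetical helpers illustrating the arithmetic behind the logged values;
# these are not VCClient functions.


def resample_ratio(src_hz: int, dst_hz: int) -> float:
    """Output samples per input sample when converting from src_hz to dst_hz."""
    return dst_hz / src_hz


def pad_to(samples: list[float], target_len: int) -> list[float]:
    """Zero-pad a short buffer at the end (assumed behaviour, for illustration)."""
    missing = target_len - len(samples)
    return samples + [0.0] * missing if missing > 0 else samples


# Rates copied from the logs in this issue.
input_sample_rate = 48000      # device rate the client captures at
vc_input_sample_rate = 16000   # rate fed into the RVC pipeline
vc_output_sample_rate = 32000  # rate produced by a 32 kHz model
output_sample_rate = 48000     # device rate the client plays back at

ratio_in = resample_ratio(input_sample_rate, vc_input_sample_rate)
ratio_out = resample_ratio(vc_output_sample_rate, output_sample_rate)

chunk = 7200                           # "Chunk" setting, assumed to be 48 kHz samples
expected = round(chunk * ratio_in)     # samples expected after downsampling to 16 kHz
print(ratio_in, ratio_out, expected)   # 0.3333333333333333 1.5 2400
print(chunk / input_sample_rate, "s")  # 0.15 s

# The "short. padded.:(2257,), shape:(2400,)" warning: 2257 samples arrived
# where 2400 were expected, so the buffer is filled up to the expected length.
padded = pad_to([0.0] * 2257, expected)
print(len(padded))                     # 2400
```

Swapping in vc_output_sample_rate = 40000 reproduces the 1.2 output ratio from the default model's log (48000 / 40000). Under these assumptions, padding only fills a small shortfall in one chunk; it does not invent frequency content.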

@Kuuko-fokkusugaru

Are you using VB-Cable?

@Kuuko-fokkusugaru

I am not sure that's the sample rate of the model, because I already suggested trying the included models first to see whether they work. The default included models are not 16 kHz.

That said, conversion quality can depend on many things, but mainly on the model itself. Not so much the sample rate, but the quality of the data used for training. And by quality I don't mean the audio quality itself, but the range of phonemes and pronunciations in the source audio. This part is complex and tricky, because even the way the person speaks can make the resulting model better or worse.

The software using more GPU is fairly normal: there is a startup and buffering period that increases GPU usage, but it should stabilize after a minute. And I already gave you the answer for the stutter and bad quality in Discord: VB-Cable does not work properly with Discord. Sometimes it does, and sometimes it doesn't. It doesn't correctly support certain audio settings, and if those settings are fed to VB-Cable, Discord will sound choppy and bad.

For the voice model to sound as undistorted as possible, you need to set the right tune setting. This setting is not meant for customizing the voice. You have to use it to make your voice sound as close as possible to the person the model was trained on. The closer you match their tone, the less likely you are to get distortions when using different pitches. That said, the software is still limited (and the majority of trained models even more so). You should give the latest v1 a try and see if it performs better for you. For a lot of people, v1 gives superior voice quality.
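Pitch settings in RVC-based tools are conventionally expressed in semitones. Assuming Tune works the same way (a shift applied to the detected F0), the offset needed to move your natural pitch onto the model speaker's can be estimated with the equal-temperament formula below. The function names are hypothetical, and the 110 Hz / 220 Hz pitches are illustrative values, not measurements from this issue.

```python
import math


def shift_f0(f0_hz: float, semitones: float) -> float:
    """Frequency after shifting by a number of equal-tempered semitones."""
    return f0_hz * 2 ** (semitones / 12)


def semitones_between(source_hz: float, target_hz: float) -> float:
    """Semitone offset that moves source_hz onto target_hz."""
    return 12 * math.log2(target_hz / source_hz)


# Illustrative only: your natural speaking pitch vs. the model speaker's pitch.
my_pitch_hz = 110.0
model_pitch_hz = 220.0

offset = semitones_between(my_pitch_hz, model_pitch_hz)
print(offset)                         # 12.0 (one octave up)
print(shift_f0(my_pitch_hz, offset))  # 220.0
```

The point of matching rather than overshooting: a Tune value that lands your natural pitch near the model speaker's keeps the shifted F0 inside the range the model saw during training.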

@kiraraenjoyer
Author

kiraraenjoyer commented Jan 25, 2025

VB-Cable is set to 24-bit 96 kHz, and I tried VAC Lite, but that doesn't change vc_input_sample_rate.

These are the logs for one of the default models:

"input_sample_rate": 48000,
"output_sample_rate": 48000,
"monitor_sample_rate": 48000,
"vc_input_sample_rate": 16000, <- (how to change this? VC sample rate not mic or cable :c )
"vc_output_sample_rate": 40000, (sample rate of the model, i use 32k for my models)
"resample_ratio_in": 0.3333333333333333,
"resample_ratio_out": 1.2,

The voice model mostly sounds pretty good, but natural drops in pitch or big rises in pitch sound unnatural or robotic, which may be because vc_input_sample_rate is missing data on the high and low ends. Is this something I can't change without creating the voice model myself, or can I write code in the terminal to change it safely?
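A sample rate only limits the highest frequency that can be represented (half the rate, the Nyquist frequency); it places no lower limit at all. The sketch below uses an assumed, deliberately generous 1000 Hz upper bound for a voice's fundamental pitch (an illustrative figure, not a measurement) to show that even 16 kHz input leaves headroom up to 8 kHz, so pitch drops and rises are unlikely to be cut off by the 16 kHz rate itself.

```python
def nyquist_hz(sample_rate_hz: float) -> float:
    """Highest frequency representable at a given sample rate (half the rate)."""
    return sample_rate_hz / 2


def fits_below_nyquist(freq_hz: float, sample_rate_hz: float) -> bool:
    """True if a tone at freq_hz can be captured at this rate without aliasing."""
    return freq_hz < nyquist_hz(sample_rate_hz)


# Assumption for illustration: a generous upper bound for a fundamental pitch.
VOICE_F0_UPPER_BOUND_HZ = 1000.0

for rate in (16000, 32000, 40000, 48000):
    print(rate, nyquist_hz(rate), fits_below_nyquist(VOICE_F0_UPPER_BOUND_HZ, rate))
# 16000 8000.0 True
# 32000 16000.0 True
# 40000 20000.0 True
# 48000 24000.0 True
```

What a lower rate does drop is high-frequency detail above the Nyquist limit (some sibilance and "air"), not the fundamental pitch that the F0 detector tracks.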

Also, what do you mean by tune? Is tune the same as pitch in the manual? Either way, I had it set up so that natural low and high pitches didn't cause distortion, and I don't know why it didn't distort like it normally does, so I can't recreate it.

v2 beta is better for me in recent updates

@Kuuko-fokkusugaru

The point of VAC is not to get a higher sample rate; it's to stop the stuttering and choppy sound in Discord.
The sample rate can't be changed, but in any case it has nothing to do with voice quality. The issues you are getting are "normal". The software isn't perfect and has its limitations. Like I said before, using tune to match the original voice as closely as possible fixes most of the distortion and robotic voice. Set it lower or higher than the original voice and the robotic issue won't go away. You have to tune this while speaking in your most natural tone. Also keep in mind that you have to do your part of the performance too. Keep a tone and pitch that are easy for the software: avoid huge changes, a raspy voice, going too low in pitch, etc.
And yes, the voice model's quality changes a lot how it will sound. It depends heavily on the input data used, the audio quality, the length of the training data, the amount of speech in it, etc. Some models simply sound horrible while others sound very clear. It also depends on your voice, so the more clearly you speak, the better.

If you get better results with v2, then it clearly isn't a software or settings issue, just the voice model itself. As long as you've followed what I already suggested, there isn't much you can do other than wait for the software to improve, which I'm not sure will happen at all, since development seems somewhat abandoned.

@kiraraenjoyer
Author

I'll just figure it out on my own. I had it working perfectly, so I just have to keep trying to recreate it.
