[ISSUE for v2]: Need Help with Audio Quality, 4 Issues Total #1439
Comments
If you have tweaked settings, chances are you have broken something. Try deleting the whole folder and starting over with a new file. Test whether the default models work. Use a GPU-friendly F0 detector such as rmvpe_onnx with ONNX files (you can convert them with RVC if they are .pth). Since there is no screenshot of the settings, I am not sure what could be wrong. With 0.5 or slightly more it should already sound fine without a huge delay. It may sound bad in Discord because you are using VB-Cable, which isn't compatible with Discord and some other software. It may work with OBS because OBS takes the sound differently. Uninstall VB-Cable, then download and install Virtual Audio Cable Lite (VAC), which is free and works without quality or stutter issues.
GPU usage isn't the problem; it was just that sometimes on launch it would be higher than usual, and after relaunching it would normalize. Why is vc_input_sample_rate set to 16000, and how do I change it, unless that breaks everything?
Try changing that in your mic settings first, as it may be set to 16 kHz instead of 44.1 or 48 kHz.
I changed my mic from 48 to 96 kHz (the mic doesn't offer anything lower than 44.1 kHz) and the logs still read "input_sample_rate": 48000. I don't know why the input sample rate stayed at 48000. I do know that vc_output_sample_rate is the sample rate the model was trained on, but I don't know why the model's input is 16000, or why it looks like the audio is resampled and padded to fill in missing data.
VB-Cable?
I am not sure that is the sample rate of the model, because I already suggested trying the included models first to see whether they work. The default included models are not 16 kHz. That said, the quality of the conversion depends on many things, but mainly on the model itself: not so much the sample rate as the quality of the data used for training. By quality I do not mean the audio quality itself but the range of phonemes and pronunciations in the source audio. This part is complex and tricky, because even the way the person speaks can make the resulting model better or worse. The software using more GPU at first is fairly normal: startup and buffering increase GPU usage for a moment, but it should stabilize after a minute. I already gave you the answer for the stutter and bad quality in Discord: VB-Cable does not work properly with Discord. Sometimes it does, sometimes it doesn't. It doesn't correctly support certain audio settings, and if those settings are fed to VB-Cable, Discord will sound choppy and bad. For the voice model to sound as undistorted as possible, you need to set the tune setting correctly. This setting is not meant for customizing the voice. Use it to make the output sound as close as possible to the person the model was trained on. The closer you match that tone, the less likely you are to get distortion across different pitches. That said, the software is still limited (and most trained models even more so). You should try the latest v1 and see whether it performs better for you. For a lot of people, v1 gives superior voice quality.
VB-Cable is set to 24-bit 96 kHz, and I tried VAC Lite, but that doesn't change vc_input_sample_rate. These are the logs for one of the default models: "input_sample_rate": 48000. The voice model mostly sounds pretty good, but natural drops or sharp rises in pitch sound unnatural or robotic, which may be because vc_input_sample_rate is missing data at the high and low ends. Is this something I cannot change without creating the voice model myself, or can I write code in the terminal to change it safely? Also, what do you mean by tune? Is tune the pitch setting in the manual? Either way, I had it where natural low and high pitch didn't cause distortion, and I don't know why, so I can't recreate it. v2 beta has been better for me in recent updates.
The point of VAC is not a higher sample rate; it is to stop the stuttering and choppiness in Discord. If you get better results with v2, then it clearly isn't a software or settings issue and is just the voice model itself. As long as you have followed what I already suggested, there isn't much you can do other than wait for the software to improve, which I am not sure will happen, since development seems somewhat abandoned.
I'll just figure it out on my own. I had it working perfectly; I just have to keep trying to recreate it.
Voice Changer Version
vcclient_win_cuda_2.0.73-beta.zip
Operational System
Windows 11
GPU
RTX 4060
CUDA Version
12.4
Read carefully and check the options
Does pre-installed model work?
No
Model Type
RVC
Issue Description
For a moment everything was working great, better than I had ever gotten it to work, but then I switched profiles and it refuses to perform like it did.
I got it to sound normal and to avoid turning lower-pitched sounds into harsh robotic sounds (or maybe RVC just decided to work for a moment and has been malfunctioning the rest of the time). I can't reproduce those results at all.
There are three other issues I have encountered, but they do not bother me.
A. On initial startup, I have to reload the program once or twice to get the RVC delay back down to a normal amount.
B. Sometimes the program uses ~10% extra GPU on any of the settings, but that is easily solved by closing the program and relaunching once or twice.
C. RVC sounds really choppy and robotic on Discord specifically, but it works fine if I have OBS open. (I have no idea why, but at least I can fix it.)
Some other questions: do I need torch_cuda and/or cuDNN for VVC to operate properly or optimally, and what are all these warnings in the logs and command prompt? I don't understand computer talk, why the input sample rate is at 16000, or how to change it. I reduced my mic input sample rate from 24-bit 96000 Hz to 16-bit 48000 Hz, which seemed to give somewhat better results, but maybe only because something else isn't working?
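For reference, the resample ratios printed in the console log appear to follow directly from the four sample rates reported there. A minimal sketch, assuming each ratio is simply the target rate divided by the source rate; the function name `resample_ratio` is hypothetical and is not part of vcclient:

```python
def resample_ratio(source_rate: int, target_rate: int) -> float:
    """Factor by which the number of samples changes when resampling."""
    return target_rate / source_rate


# Sample rates copied from the console log in this issue.
input_sample_rate = 48000      # rate delivered by the audio device
output_sample_rate = 48000     # rate sent to the output device
vc_input_sample_rate = 16000   # rate the conversion stage reads
vc_output_sample_rate = 32000  # rate the conversion stage produces

# Assumed relationship: device -> converter on the way in,
# converter -> device on the way out.
ratio_in = resample_ratio(input_sample_rate, vc_input_sample_rate)
ratio_out = resample_ratio(vc_output_sample_rate, output_sample_rate)

print("resample_ratio_in  =", ratio_in)
print("resample_ratio_out =", ratio_out)
```

If that reading is right, changing the microphone or virtual cable rate would only change `input_sample_rate` and the first ratio. The 16000 figure would belong to the conversion stage itself rather than to the device, which may be why device settings do not affect it.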
The logs are also flooded with noise gate checks/reports. Is that supposed to happen?
Application Screenshot
FCPE (but problems apply to every f0),
index is not used,
7200 or 9600 Chunk,
192000 Extra,
Sio,
1 Output Buffer Size Ratio (idk what this does),
.2 crossfade seconds, 0 trancate
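The chunk and extra values above can be related to time and to the chunk shape that shows up in the console log. A rough sketch, assuming both settings are sample counts at the 48000 Hz input rate (an assumption; the source does not state the unit) and using hypothetical helper names:

```python
def samples_to_ms(samples: int, sample_rate: int) -> float:
    """Duration in milliseconds of a block of samples."""
    return 1000.0 * samples / sample_rate


def resampled_length(samples: int, source_rate: int, target_rate: int) -> int:
    """Expected sample count after resampling (integer math, rounds down)."""
    return samples * target_rate // source_rate


INPUT_RATE = 48000     # input_sample_rate from the log
VC_INPUT_RATE = 16000  # vc_input_sample_rate from the log

for chunk in (7200, 9600):
    print(
        f"chunk {chunk}: {samples_to_ms(chunk, INPUT_RATE)} ms, "
        f"{resampled_length(chunk, INPUT_RATE, VC_INPUT_RATE)} samples at 16 kHz"
    )

print("extra 192000:", samples_to_ms(192000, INPUT_RATE), "ms")
```

Under that assumption a 7200-sample chunk shrinks to 2400 samples at 16000 Hz, which is the same size as the `shape:(2400,)` in the "resampled is short. padded." warning in the logs.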
Logs on console
WARNING - Input or output sample rate is not set - vcclient_dev\voice_changer\voice_change_manager\voice_changer.py - 106
WARNING - start_convert_chunk_bulk called - vcclient_dev\voice_changer\voice_change_manager\voice_changer.py - 247
WARNING - data type resampled is short. padded.:(2257,), shape:(2400,) - vcclient_dev\voice_changer\voice_change_manager\voice_changer.py - 514
input_sample_rate=48000, output_sample_rate=48000, monitor_sample_rate=48000, vc_input_sample_rate=16000, vc_output_sample_rate=32000, resample_ratio_in=0.3333333333333333, resample_ratio_out=1.5, resample_ratio_monitor=1.5, resample_ratio_pass_through_in_out=1.0
torch\nn\utils\weight_norm.py:28: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
[INF0] torchfcpe.mel_tools.nv_mel_extractor: Librosa not found, use torchfcpe.mel_tools.mel_fn_librosa instead.
[INFO]: device is not None, use cuda:0
[INFO] > call by:torchfcpe.tools.spawn_infer_cf_naive_mel_pe_from_pt
[WARN] args.model.use_harmonic_emb is None; use default False
[WARN] > call by:torchfcpe.tools.spawn_cf_naive_mel_pe
vcclient - rvc_pipeline - INFO - noise gate -inf < -69.0 - vcclient_dev\voice_changer\voice_change_manager\vc_pipelines\rvc_pipeline.py - 222
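The last line, `noise gate -inf < -69.0`, is what a level check would report for a chunk of pure digital silence. A minimal sketch, assuming the gate compares a chunk's RMS level in dB against a threshold; the function names and the choice of RMS are assumptions, and only the -69.0 value is taken from the log:

```python
import math


def rms_db(samples):
    """RMS level of a chunk in dB relative to full scale (1.0)."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        # log10(0) is undefined; silence is reported as -inf dB.
        return float("-inf")
    return 10.0 * math.log10(mean_square)


def gate_is_closed(samples, threshold_db=-69.0):
    """True when the chunk is quieter than the threshold."""
    return rms_db(samples) < threshold_db


# A silent chunk and a quiet 440 Hz tone, both 2400 samples at 16 kHz.
silent = [0.0] * 2400
quiet_tone = [0.1 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(2400)]

print("silent:", rms_db(silent), gate_is_closed(silent))
print("tone:  ", rms_db(quiet_tone), gate_is_closed(quiet_tone))
```

On this reading the message would repeat for every chunk in which the microphone delivers silence, so a flood of these lines while not speaking may be expected rather than a fault.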