[ISSUE for v2]: Need Help with Audio Quality, 4 Issues Total #1439

kiraraenjoyer · 2025-01-23T15:00:45Z

Voice Changer Version

vcclient_win_cuda_2.0.73-beta.zip

Operational System

Windows 11

GPU

RTX 4060

CUDA Version

12.4

Read carefully and check the options

If you use win_cuda_torch_cuda edition, setup cuda? see here
If you use win_cuda edition, setup cuda and cudnn? see here
If you use mac edition, client is not launched automatically. Use chrome to open application.?
I've tried to change the Chunk Size
I've tried to set the Index to zero
I've read the tutorial
I've tried to extract to another folder (or re-extract) the .zip file

Does pre-installed model work?

No

Model Type

RVC

Issue Description

For a moment everything was working great, better than I had ever gotten it to be, but then I switched profiles and now it refuses to perform like it did.

I got it to sound normal and got it to avoid turning lower pitch sounds into harsh robotic sounds (or maybe it was just that the RVC just decided to work for a moment and has been malfunctioning the entire time instead). I can't reproduce the results of the machine working at all like it had.

There are three other issues I have encountered, but they do not bother me.

A. On the initial startup, I have to reload the program once or twice to get the RVC to reduce its delay back to normal amounts of delay.
B. Sometimes the program takes ~10% extra GPU on any of the settings, but it is easily solved by closing the program and relaunching once or twice.
C. RVC sounds really choppy and robotic on Discord specifically, but it works fine if I had OBS open. (I have no idea but at least I can fix it).

Some other questions I have: do I need torch_cuda and/or cudnn for VVC to operate properly/optimally, and what are all these warnings in the logs and command prompt? I don't understand computer talk, why the input sample rate is even at 16000, or how to change it. I reduced my mic input sample rate from 24 bit 96000 Hz to 16 bit 48000 Hz, which seemed to give somewhat better results, but maybe only because something else isn't working?

The logs are also flooded with noise gate checks/reports if that is not supposed to happen?

Application Screenshot

FCPE (but problems apply to every f0),
index is not used,
7200 or 9600 Chunk,
192000 Extra,
Sio,
1 Output Buffer Size Ratio (idk what this does),
.2 crossfade seconds, 0 trancate

Logs on console

WARNING - Input or output sample rate is not set - vcclient_dev\voice_changer\voice_change_manager\voice_changer.py - 106
WARNING - start_convert_chunk_bulk called - vcclient_dev\voice_changer\voice_change_manager\voice_changer.py - 247
WARNING - data type resampled is short. padded.:(2257,), shape:(2400,) - vcclient_dev\voice_changer\voice_change_manager\voice_changer.py - 514

input_sample_rate=48000, output_sample_rate=48000, monitor_sample_rate=48000, vc_input_sample_rate=16000, vc_output_sample_rate=32000, resample_ratio_in=0.3333333333333333, resample_ratio_out=1.5, resample_ratio_monitor=1.5, resample_ratio_pass_through_in_out=1.0

torch\nn\utils\weight_norm.py:28: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
[INFO] torchfcpe.mel_tools.nv_mel_extractor: Librosa not found, use torchfcpe.mel_tools.mel_fn_librosa instead.
[INFO]: device is not None, use cuda:0
[INFO] > call by:torchfcpe.tools.spawn_infer_cf_naive_mel_pe_from_pt
[WARN] args.model.use_harmonic_emb is None; use default False
[WARN] > call by:torchfcpe.tools.spawn_cf_naive_mel_pe

vcclient - rvc_pipeline - INFO - noise gate -inf < -69.0 - vcclient_dev\voice_changer\voice_change_manager\vc_pipelines\rvc_pipeline.py - 222
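A note on the noise gate lines: a message like "noise gate -inf < -69.0" reads as "the level of this chunk (-inf dB, i.e. digital silence) is below the gate threshold (-69.0 dB)", so it would be expected to repeat whenever the mic is silent. The sketch below only illustrates that comparison. It is not VCClient's code: the RMS measurement, the function names, and the test signals are assumptions, and only the -69.0 threshold comes from the log.

```python
import math

def rms_dbfs(samples):
    """RMS level in dB relative to full scale (1.0). Pure silence gives -inf."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(rms) if rms > 0.0 else float("-inf")

def gate_closed(samples, threshold_db=-69.0):
    """True when the chunk is quieter than the threshold (hypothetical gate rule)."""
    return rms_dbfs(samples) < threshold_db

# Hypothetical chunks: all zeros, and a 220 Hz tone at 10% amplitude, 16 kHz.
silence = [0.0] * 2400
tone = [0.1 * math.sin(2 * math.pi * 220 * n / 16000) for n in range(2400)]

print(rms_dbfs(silence), gate_closed(silence))
print(round(rms_dbfs(tone), 2), gate_closed(tone))
```

If the gate message appears while you are actually speaking, that would point at the input device delivering silence rather than at the gate itself.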

Kuuko-fokkusugaru · 2025-01-23T20:29:12Z

If you have tweaked settings, chances are that you may have messed something up. Try deleting the whole folder and starting over with a new file. Test whether the default models work fine. Use an F0 detection method that is GPU friendly, like rmvpe_onnx, with onnx files (you can convert them with RVC if they are pth).

Since there is no screenshot of the settings, I am not sure what could possibly be wrong. With 0.5 or slightly more it should sound fine already without a huge delay.

The reason it sounds bad in Discord may be that you are using vb cable, which isn't compatible with Discord and some other software. It may work with OBS because OBS takes the sound differently. Uninstall vb cable, then download and install virtual audio cable lite (VAC), which is free and works without any quality or stutter issues.

kiraraenjoyer · 2025-01-24T20:38:08Z

GPU usage isn't the problem it was just that sometimes on launch it would be more than usual and after relaunching it would normalize.

Why is vc_input_sample_rate set at 16000 and how do I change it unless that breaks everything?

Kuuko-fokkusugaru · 2025-01-24T21:24:49Z

Try changing that first in your mic settings, as it may be set to 16 kHz instead of 44 or 48.

kiraraenjoyer · 2025-01-24T22:04:25Z

i changed my mic from 48 to 96 (mic doesnt have lower than 44 as an option) and the logs still read

"input_sample_rate": 48000,
"output_sample_rate": 48000,
"monitor_sample_rate": 48000,
"vc_input_sample_rate": 16000,
"vc_output_sample_rate": 32000,
"resample_ratio_in": 0.3333333333333333,
"resample_ratio_out": 1.5,

idk why input sample rate stayed 48, but i do know that vc_output_sample_rate is the sample rate the model is trained on. What I don't know is why the input for the model is 16, which it looks like it resamples to and then pads out missing data?
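The logged ratios can be reproduced with plain division, which suggests they are just conversion factors between the device rates and the rates used inside the converter. The sketch below assumes the definitions that match the logged numbers (it is inferred from the log, not taken from the VCClient source), and it assumes the chunk size of 7200 is counted in samples at the 48000 Hz input rate.

```python
# Values copied from the log.
input_sample_rate = 48000
output_sample_rate = 48000
vc_input_sample_rate = 16000
vc_output_sample_rate = 32000

# Assumed definitions, chosen because they reproduce the logged values.
resample_ratio_in = vc_input_sample_rate / input_sample_rate
resample_ratio_out = output_sample_rate / vc_output_sample_rate

print(resample_ratio_in)   # 0.3333333333333333
print(resample_ratio_out)  # 1.5

# Assumption: a chunk of 7200 samples at 48000 Hz, converted to 16000 Hz.
chunk = 7200
chunk_at_vc_rate = chunk * vc_input_sample_rate // input_sample_rate
print(chunk_at_vc_rate)    # 2400

# The padding warning reported (2257,) padded to shape (2400,).
padded_samples = chunk_at_vc_rate - 2257
print(padded_samples)      # 143
```

Under these assumptions, the 2400 in the padding warning is simply the chunk length at 16000 Hz, and the padding fills in a chunk that arrived 143 samples short. It does not indicate that the resampler is inventing missing frequency content.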

Kuuko-fokkusugaru · 2025-01-25T05:06:15Z

vb cable?

Kuuko-fokkusugaru · 2025-01-25T05:19:43Z

I am not sure that's the sample rate of the model, which is why I already suggested trying the included models first to see whether they work or not. The default included models are not 16 kHz.

That said, the quality of the conversion can depend on many things, but mainly on the model itself. Not so much the kHz as the quality of the data used for the training process. And by quality, I do not mean the audio quality itself but the variety of phonemes and different pronunciations in the audio source. This part is complex and tricky because, depending even on how the person speaks, the resulting model can be better or worse.

The software using more GPU is kind of normal. There is a moment of startup and buffering that increases the GPU usage, but it should stabilize after a minute. And I already gave you the answer for the stutter and bad quality in Discord: vb cable does not work properly with Discord. There are times that it does, and times it doesn't. It doesn't correctly support certain audio settings, and if these settings are being fed to vb cable, Discord will sound choppy and bad.

For the voice model to sound as undistorted as possible, you need to set the right tune setting. This setting is not meant to be a way of customizing the voice. You have to use this setting to make it sound as close as possible to the person the model is based on. The closer you get to the same tone, the less likely you are to get distortions when using different pitch tones. That said, the software is still limited (and the majority of the trained models even more so). You should give the latest v1 a try and see if it performs better for you. For a lot of people, v1 gives superior voice quality.
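A rough way to pick a starting value for tune, assuming the setting is a pitch shift in semitones (as the pitch shift in RVC is): one semitone is a frequency ratio of 2^(1/12), so the shift between two average speaking pitches is 12 * log2(target / source). The function names and the example pitches below are hypothetical, and the final value still has to be tuned by ear.

```python
import math

def semitones_between(source_hz, target_hz):
    """Semitone shift that moves source_hz onto target_hz."""
    return 12.0 * math.log2(target_hz / source_hz)

def shifted_pitch(source_hz, semitones):
    """Pitch that results from shifting source_hz by the given semitones."""
    return source_hz * 2.0 ** (semitones / 12.0)

# Hypothetical averages: your speaking pitch and the model voice's pitch.
my_pitch_hz = 120.0
model_pitch_hz = 220.0

suggested_tune = round(semitones_between(my_pitch_hz, model_pitch_hz))
print(suggested_tune)
print(round(shifted_pitch(my_pitch_hz, suggested_tune), 1))
```

One octave up is exactly +12 and one octave down is exactly -12, which gives a quick sanity check on the scale of the setting.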

kiraraenjoyer · 2025-01-25T10:36:15Z

VB cable is set at 24 bit 96khz, and i tried VAC-lite but that doesnt change vc_input_sample_rate.

This is the logs for one of the default models

"input_sample_rate": 48000,
"output_sample_rate": 48000,
"monitor_sample_rate": 48000,
"vc_input_sample_rate": 16000, <- (how to change this? VC sample rate not mic or cable :c )
"vc_output_sample_rate": 40000, (sample rate of the model, i use 32k for my models)
"resample_ratio_in": 0.3333333333333333,
"resample_ratio_out": 1.2,

The voice model mostly sounds pretty good but natural drops in pitch or high raises in pitch sound unnatural or robotic which may be because this vc_input_sample_rate is missing data on the high and low ends. Is this something I cannot change without creating the voice model myself or can I write code in the terminal to change it safely?

Also what do you mean tune, tune is pitch in the manual? Either way, I had it where low and high natural pitch didnt cause distortion and I don't know why it didn't cause distortion like normal so I can't recreate it.

v2 beta is better for me in recent updates
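On whether a 16000 Hz internal rate loses the high and low ends: a sampled signal can represent frequencies up to half its sample rate (the Nyquist limit), so 16000 Hz covers content up to 8000 Hz. Lowering the sample rate removes content above that limit and does not remove low frequencies. The sketch below illustrates this with a hypothetical 100 Hz tone. The every-third-sample decimation is a deliberately naive stand-in for a real resampler and is only safe here because the tone is far below 8000 Hz. It says nothing about how VCClient resamples internally.

```python
import math

def nyquist_hz(sample_rate):
    """Highest frequency a signal at this sample rate can represent."""
    return sample_rate / 2

def estimate_tone_hz(samples, sample_rate):
    """Estimate the frequency of a clean tone by counting rising zero crossings."""
    rising = sum(1 for a, b in zip(samples, samples[1:]) if a < 0.0 <= b)
    return rising * sample_rate / len(samples)

rate_in, rate_vc = 48000, 16000
tone_hz = 100  # hypothetical low-pitched tone

# One second of the tone at 48000 Hz (small phase offset avoids exact zeros).
one_second = [math.sin(2 * math.pi * tone_hz * n / rate_in + 0.1)
              for n in range(rate_in)]

# Naive 48000 -> 16000 conversion: keep every third sample.
decimated = one_second[::3]

print(nyquist_hz(rate_vc))  # 8000.0
print(estimate_tone_hz(one_second, rate_in))
print(estimate_tone_hz(decimated, rate_vc))
```

The tone is measured at the same frequency before and after the rate change, so a low internal rate on its own would not explain distortion on low-pitched sounds.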

Kuuko-fokkusugaru · 2025-01-25T13:19:36Z

The point of using VAC is not to get a higher sample rate; it's to stop the stuttering and choppy sound in Discord.
The sampling rate can't be changed, but anyway it has nothing to do with the quality of the voice. The issues that you are getting are "normal". The software isn't perfect and has its limitations. Like I said before, using tune to match the original voice as closely as possible helps in fixing most of the distortions and robotic voices. Set it lower or higher than the original voice and the robotic issue won't go away. You have to tune this while speaking in your most natural tone. Also keep in mind that you have to work on your own delivery too. Keep an easy tone and pitch for the software, and avoid huge changes, a raspy voice, going too low in pitch, etc.
And yeah, the voice model quality changes a lot about how it will sound. It depends a lot on the input data used, the quality of the audio, the amount of training data, the amount of speech, etc. There are models that simply sound horrible while others sound very clear. It also depends on your voice, so the more clearly you speak, the better.

If you get better results with v2, then it clearly isn't a software or setting issue and is just the voice model itself. As long as you followed what I already suggested, there isn't much that you can do other than wait for the software to improve, which I am not sure will happen at all since development seems a bit abandoned.

kiraraenjoyer · 2025-01-25T13:26:26Z

ill just figure it out on my own, i had it working perfectly, i just have to keep trying to recreate it then

kiraraenjoyer changed the title ~~[ISSUE for v2]: Need Help with Audio Quality~~ [ISSUE for v2]: Need Help with Audio Quality, 4 Issues Total Jan 23, 2025

kiraraenjoyer closed this as completed Jan 25, 2025
