Tinkering with speech enhancement models.
Borrowed code, models and techniques from:
- Improved Speech Enhancement with the Wave-U-Net (arXiv)
- Wave-U-Net: A Multi-Scale Neural Network for End-to-End Audio Source Separation (arXiv)
- Speech Denoising with Deep Feature Losses (arXiv, sound examples, GitHub)
- MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis (arXiv, sound examples, GitHub)
The following datasets are used:
- The University of Edinburgh noisy speech database for the speech enhancement problem
- The TUT Acoustic Scenes 2016 dataset is used to train the scene classifier network, which provides the feature-based loss function (see the sketch after this list). (dataset paper)
- The CHiME-Home (Computational Hearing in Multisource Environments) dataset (2015) is also used for the scene classifier in some experiments.
- The "train-clean-100" dataset from Librispeech, mixed with the TUT acoustic scenes dataset.
At the moment, the algorithm requires 32-bit floating-point audio files at a 16 kHz sampling rate to work correctly. You can use sox to convert your files. To convert audiofile.wav to 32-bit floating-point audio at a 16 kHz sampling rate, run:
sox audiofile.wav -r 16000 -b 32 -e float audiofile.float.wav
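If you prefer to do the conversion from Python (for example as part of a preprocessing script), a rough equivalent using the `soundfile` and `scipy` packages could look like the sketch below. The mono mixdown is an assumption; adjust it if your pipeline handles multi-channel audio.

```python
# Minimal sketch (not part of this repo): convert a WAV file to 32-bit float
# at 16 kHz, assuming `soundfile` and `scipy` are installed.
import math

import numpy as np
import scipy.signal
import soundfile as sf

TARGET_RATE = 16000

def convert(in_path: str, out_path: str) -> None:
    """Read a WAV file and write it as 32-bit float at 16 kHz."""
    audio, rate = sf.read(in_path, dtype="float32")
    if audio.ndim > 1:                       # assumption: mix down to mono
        audio = audio.mean(axis=1)
    if rate != TARGET_RATE:                  # polyphase resampling to 16 kHz
        g = math.gcd(TARGET_RATE, rate)
        audio = scipy.signal.resample_poly(audio, TARGET_RATE // g, rate // g)
    sf.write(out_path, audio.astype(np.float32), TARGET_RATE, subtype="FLOAT")

if __name__ == "__main__":
    convert("audiofile.wav", "audiofile.float.wav")
```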