Music Source Separation Universal Training Code

The wxPython dependency breaks Colab notebooks with a wheel build error. Furthermore, a missing use_amp: true in at least some RoFormer YAML configs causes config errors. (I'm not sure whether you need to add it manually at the bottom of the training section, as jarredou explained, or whether forking an earlier commit of the original repo is enough.)
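If adding the key manually turns out to be the fix you need, the edit can be scripted. The following is a minimal sketch, assuming the config has a top-level training: section whose keys are indented with two spaces. ensure_use_amp is a hypothetical helper, not part of the repository; it edits the raw text so the rest of the YAML file is left untouched.

```python
def ensure_use_amp(config_text: str, indent: str = "  ") -> str:
    """Append `use_amp: true` to the top-level `training:` block if missing.

    Assumption: the block is a plain indented mapping and `indent`
    matches the indentation used by its other keys.
    """
    lines = config_text.splitlines()

    # Locate the top-level `training:` key.
    start = None
    for i, line in enumerate(lines):
        if line.rstrip() == "training:":
            start = i
            break
    if start is None:
        raise ValueError("no top-level 'training:' section found")

    # The block ends at the next non-blank line that is not indented.
    end = len(lines)
    for j in range(start + 1, len(lines)):
        if lines[j].strip() and not lines[j][0].isspace():
            end = j
            break

    # Nothing to do if the key is already present in the block.
    if any(l.strip().startswith("use_amp:") for l in lines[start + 1:end]):
        return config_text

    # Insert after the last non-blank line of the block.
    insert_at = end
    while insert_at > start + 1 and not lines[insert_at - 1].strip():
        insert_at -= 1
    lines.insert(insert_at, f"{indent}use_amp: true")
    return "\n".join(lines) + "\n"
```

To apply it, read a config file as text, pass the text through ensure_use_amp, and write the result back; keep a backup of the original config first.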
[OG info follows]

This repository contains code for training music source separation models. It is based on the kuielab code for the SDX23 challenge. The main goal is training code that is easy to modify for experiments. Brought to you by MVSep.com.

Models

The model is chosen with the --model_type argument.

Available models for training:

MDX23C based on KUIELab TFC TDF v3 architecture. Key: mdx23c.
Demucs4HT [Paper]. Key: htdemucs.
VitLarge23 based on Segmentation Models Pytorch. Key: segm_models.
TorchSeg based on TorchSeg module. Key: torchseg.
Band Split RoFormer [Paper, Repository]. Key: bs_roformer or bs_roformer_low_mem.
Mel-Band RoFormer [Paper, Repository]. Key: mel_band_roformer or mel_band_roformer_low_mem.
Swin Upernet [Paper]. Key: swin_upernet.
BandIt Plus [Paper, Repository]. Key: bandit.
SCNet [Paper, Official Repository, Unofficial Repository]. Key: scnet.
BandIt v2 [Paper, Repository]. Key: bandit_v2.
Apollo [Paper, Repository]. Key: apollo.
TS BSMamba2 [Paper, Repository]. Key: bs_mamba2.

Note 1: Many different encoders are possible for segm_models. Look here.
Note 2: Thanks to @lucidrains for recreating the RoFormer models based on papers.
Note 3: torchseg gives access to more than 800 encoders from the timm module. It's similar to segm_models.
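When wrapping the scripts in your own tooling, it can help to reject unknown keys early. The sketch below is a hypothetical wrapper parser, not the repository's own argument parser; the key list is copied from the model list above and may lag behind the code.

```python
import argparse

# Keys copied from the list of available models above.
MODEL_KEYS = [
    "mdx23c", "htdemucs", "segm_models", "torchseg",
    "bs_roformer", "bs_roformer_low_mem",
    "mel_band_roformer", "mel_band_roformer_low_mem",
    "swin_upernet", "bandit", "scnet", "bandit_v2", "apollo", "bs_mamba2",
]


def make_parser() -> argparse.ArgumentParser:
    """Build a parser that accepts only the documented model keys."""
    parser = argparse.ArgumentParser(description="Hypothetical wrapper CLI")
    parser.add_argument(
        "--model_type",
        choices=MODEL_KEYS,
        required=True,
        help="one of the model keys listed in the README",
    )
    return parser


# Example: a valid key parses, an unknown key makes argparse exit with an error.
args = make_parser().parse_args(["--model_type", "scnet"])
```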

How to: Train

To train a model you need to:

Choose model type with option --model_type, including: mdx23c, htdemucs, segm_models, mel_band_roformer, bs_roformer.
Specify the model config with --config_path <config path>. Example configs are in the configs folder; those prefixed with config_musdb18_ are for the MUSDB18 dataset.
If you have a checkpoint from the same model or from a similar one, you can start from it with --start_check_point <weights path>.
Choose where to store the training results with --results_path <results folder path>.

Training example

python train.py \
    --model_type mel_band_roformer \
    --config_path configs/config_mel_band_roformer_vocals.yaml \
    --start_check_point results/model.ckpt \
    --results_path results/ \
    --data_path 'datasets/dataset1' 'datasets/dataset2' \
    --valid_path datasets/musdb18hq/test \
    --num_workers 4 \
    --device_ids 0

All training parameters are here.
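For scripted experiments, the same command can be assembled in Python. This is a minimal sketch that uses only the flags shown in the training example; build_train_command is a hypothetical helper, and the default values simply mirror that example.

```python
def build_train_command(model_type, config_path, results_path, data_paths,
                        valid_path, start_check_point=None,
                        num_workers=4, device_id=0):
    """Return the argument list for one train.py run.

    Only flags that appear in the training example are used.
    """
    cmd = [
        "python", "train.py",
        "--model_type", model_type,
        "--config_path", config_path,
        "--results_path", results_path,
        "--data_path", *data_paths,      # one or more dataset folders
        "--valid_path", valid_path,
        "--num_workers", str(num_workers),
        "--device_ids", str(device_id),
    ]
    # Starting from existing weights is optional.
    if start_check_point:
        cmd += ["--start_check_point", start_check_point]
    return cmd


cmd = build_train_command(
    model_type="mel_band_roformer",
    config_path="configs/config_mel_band_roformer_vocals.yaml",
    results_path="results/",
    data_paths=["datasets/dataset1", "datasets/dataset2"],
    valid_path="datasets/musdb18hq/test",
    start_check_point="results/model.ckpt",
)
```

Passing the list to subprocess.run(cmd, check=True) from the repository root runs the training; a list avoids shell quoting issues with paths that contain spaces.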

How to: Inference

Inference example

python inference.py \
    --model_type mdx23c \
    --config_path configs/config_mdx23c_musdb18.yaml \
    --start_check_point results/last_mdx23c.ckpt \
    --input_folder input/wavs/ \
    --store_dir separation_results/

All inference parameters are here.
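To compare several models on the same input, the inference command can be generated once per model. The sketch below is hypothetical glue code that uses only the flags from the inference example; storing each model's output in a subfolder named after its key is an assumption made here, not a repository convention.

```python
from pathlib import PurePosixPath


def build_inference_commands(models, input_folder, store_root):
    """Return one inference.py argument list per model.

    models: iterable of (model_type, config_path, checkpoint_path) tuples.
    Each model writes to <store_root>/<model_type> (an assumed layout).
    """
    commands = []
    for model_type, config_path, checkpoint in models:
        store_dir = PurePosixPath(store_root) / model_type
        commands.append([
            "python", "inference.py",
            "--model_type", model_type,
            "--config_path", config_path,
            "--start_check_point", checkpoint,
            "--input_folder", str(input_folder),
            "--store_dir", str(store_dir),
        ])
    return commands


commands = build_inference_commands(
    models=[
        ("mdx23c", "configs/config_mdx23c_musdb18.yaml", "results/last_mdx23c.ckpt"),
        ("mel_band_roformer", "configs/config_mel_band_roformer_vocals.yaml", "results/model.ckpt"),
    ],
    input_folder="input/wavs/",
    store_root="separation_results",
)
```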

Useful notes

All batch sizes in the configs are tuned for a single NVIDIA A6000 with 48 GB of memory. If you have less memory, adjust training.batch_size and training.gradient_accumulation_steps in the model config correspondingly.
It's usually better to start from old weights, even if the shapes don't fully match. The code supports loading weights from a model that is not exactly the same, as long as it has the same architecture. Training will be much faster.
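One common way to load weights when shapes only partly match is to keep just the parameters whose names and shapes agree with the new model. The sketch below illustrates that idea and is not necessarily how the repository implements it; filter_compatible_weights is a hypothetical helper that works on any mapping from names to objects with a .shape attribute, such as a PyTorch state dict.

```python
def filter_compatible_weights(checkpoint_state, model_state):
    """Split checkpoint entries into loadable and skipped ones.

    An entry is loadable when the model has a parameter with the same
    name and the same shape; everything else is reported as skipped.
    """
    loadable, skipped = {}, []
    for name, tensor in checkpoint_state.items():
        target = model_state.get(name)
        if target is not None and tuple(target.shape) == tuple(tensor.shape):
            loadable[name] = tensor
        else:
            skipped.append(name)
    return loadable, skipped


# With PyTorch (not imported here), usage would look like:
#   loadable, skipped = filter_compatible_weights(
#       torch.load(path, map_location="cpu"), model.state_dict())
#   model.load_state_dict(loadable, strict=False)
```

Parameters that are skipped keep their fresh initialization, so it is worth printing the skipped list to see how much of the old model was actually reused.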

Code description

configs/config_*.yaml - configuration files for models
models/* - set of available models for training and inference
dataset.py - dataset which creates new samples for training
inference.py - process folder with music files and separate them
train.py - main training code
utils.py - common functions used by train/valid
valid.py - validation of model with metrics
ensemble.py - useful script to ensemble results of different models to make results better (see docs).
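As an illustration of what ensembling means here, one simple strategy is a weighted average of the waveforms that different models produce for the same stem. The sketch below shows only that idea in plain Python; weighted_average is a hypothetical function, and ensemble.py may offer other strategies and options (see its docs).

```python
def weighted_average(waveforms, weights=None):
    """Average equal-length waveforms of the same stem, sample by sample.

    waveforms: list of sample sequences, one per model.
    weights:   optional per-model weights; defaults to equal weights.
    """
    if not waveforms:
        raise ValueError("need at least one waveform")
    if weights is None:
        weights = [1.0] * len(waveforms)
    if len(weights) != len(waveforms):
        raise ValueError("one weight per waveform is required")
    length = len(waveforms[0])
    if any(len(w) != length for w in waveforms):
        raise ValueError("all waveforms must have the same length")

    total = sum(weights)
    return [
        sum(wt * w[i] for wt, w in zip(weights, waveforms)) / total
        for i in range(length)
    ]


# Two toy "model outputs" for the same stem.
equal = weighted_average([[0.0, 1.0], [1.0, 1.0]])
biased = weighted_average([[0.0, 1.0], [1.0, 1.0]], weights=[3.0, 1.0])
```

Giving a larger weight to the model with the better validation score is a common starting point; real audio would normally be handled as arrays rather than Python lists.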

Pre-trained models

If you have trained some good models, please share them. You can post the config and model weights in this issue.

Vocal models

Model Type	Instruments	Metrics (SDR)	Config	Checkpoint
MDX23C	vocals / other	SDR vocals: 10.17	Config	Weights
HTDemucs4 (MVSep finetuned)	vocals / other	SDR vocals: 8.78	Config	Weights
Segm Models (VitLarge23)	vocals / other	SDR vocals: 9.77	Config	Weights
Swin Upernet	vocals / other	SDR vocals: 7.57	Config	Weights
BS Roformer (viperx edition)	vocals / other	SDR vocals: 10.87	Config	Weights
MelBand Roformer (viperx edition)	vocals / other	SDR vocals: 9.67	Config	Weights
MelBand Roformer (KimberleyJensen edition)	vocals / other	SDR vocals: 10.98	Config	Weights

Note: Metrics measured on Multisong Dataset.
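For readers unfamiliar with the metric, SDR (signal-to-distortion ratio) compares the energy of the reference stem with the energy of the estimation error, in decibels. The sketch below assumes the plain definition SDR = 10 * log10(sum(ref^2) / sum((ref - est)^2)) on a single channel; the exact evaluation protocol behind the tables (chunking, channel and song averaging) is not shown here, and valid.py may differ in detail.

```python
import math


def sdr(reference, estimate, eps=1e-8):
    """Signal-to-distortion ratio in dB for two equal-length signals.

    eps guards against division by zero for silent or perfect inputs.
    """
    if len(reference) != len(estimate):
        raise ValueError("signals must have the same length")
    signal = sum(r * r for r in reference)
    distortion = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10.0 * math.log10((signal + eps) / (distortion + eps))


# An estimate that is 10% too quiet everywhere has an error energy of
# 1% of the signal energy, which corresponds to roughly 20 dB.
value = sdr([1.0, -1.0, 1.0, -1.0], [0.9, -0.9, 0.9, -0.9])
```

Higher is better: each additional 10 dB means the error energy is ten times smaller relative to the reference.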

Single stem models

Model Type	Instruments	Metrics (SDR)	Config	Checkpoint
HTDemucs4 FT Drums	drums	SDR drums: 11.13	Config	Weights
HTDemucs4 FT Bass	bass	SDR bass: 11.96	Config	Weights
HTDemucs4 FT Other	other	SDR other: 5.85	Config	Weights
HTDemucs4 FT Vocals (Official repository)	vocals	SDR vocals: 8.38	Config	Weights
BS Roformer (viperx edition)	other	SDR other: 6.85	Config	Weights
MelBand Roformer (aufr33 and viperx edition)	crowd	SDR crowd: 5.99	Config	Weights
MelBand Roformer (anvuew edition)	dereverb	---	Config	Weights
MelBand Roformer Denoise (by aufr33)	denoise	---	Config	Weights
MelBand Roformer Denoise Aggressive (by aufr33)	denoise	---	Config	Weights
Apollo LQ MP3 restoration (by JusperLee)	restored	---	Config	Weights
MelBand Roformer Aspiration (by SUC-DriverOld)	aspiration	SDR: 9.85	Config	Weights

Note: All HTDemucs4 FT models output 4 stems, but quality is best only on the target stem (all other stems are dummies).

Multi-stem models

Model Type	Instruments	Metrics (SDR)	Config	Checkpoint
MDX23C *	bass / drums / vocals / other	MUSDB test avg: 7.15 (bass: 5.77, drums: 7.93, vocals: 9.23, other: 5.68); Multisong avg: 7.02 (bass: 8.40, drums: 7.73, vocals: 7.36, other: 4.57)	Config	Weights
BandIt Plus	speech / music / effects	DnR test avg: 11.50 (speech: 15.64, music: 9.18, effects: 9.69)	Config	Weights
HTDemucs4	bass / drums / vocals / other	Multisong avg: 9.16 (bass: 11.76, drums: 10.88, vocals: 8.24, other: 5.74)	Config	Weights
HTDemucs4 (6 stems)	bass / drums / vocals / other / piano / guitar	Multisong (bass: 11.22, drums: 10.22, vocals: 8.05, other: ---, piano: ---, guitar: ---)	Config	Weights
Demucs3 mmi	bass / drums / vocals / other	Multisong avg: 8.88 (bass: 11.17, drums: 10.70, vocals: 8.22, other: 5.42)	Config	Weights
DrumSep htdemucs (by inagoy)	kick / snare / cymbals / toms	---	Config	Weights
DrumSep mdx23c (by aufr33 and jarredou)	kick / snare / toms / hh / ride / crash	---	Config	Weights
SCNet (by starrytong) *	bass / drums / vocals / other	Multisong avg: 8.87 (bass: 11.07, drums: 10.79, vocals: 8.27, other: 5.34)	Config	Weights
SCNet Large *	bass / drums / vocals / other	MUSDB test avg: 9.32 (bass: 8.63, drums: 10.89, vocals: 10.69, other: 7.06); Multisong avg: 9.19 (bass: 11.15, drums: 11.04, vocals: 8.94, other: 5.62)	Config	Weights
SCNet Large (by starrytong) *	bass / drums / vocals / other	MUSDB test avg: 9.70 (bass: 9.38, drums: 11.15, vocals: 10.94, other: 7.31); Multisong avg: 9.28 (bass: 11.27, drums: 11.23, vocals: 9.05, other: 5.57)	Config	Weights
TS BS Mamba2 *	bass / drums / vocals / other	MUSDB test avg: 6.87 (bass: 5.82, drums: 8.14, vocals: 8.35, other: 5.16); Multisong avg: 6.66 (bass: 7.87, drums: 7.92, vocals: 7.01, other: 3.85)	Config	Weights

* Note: Model was trained only on the MUSDB18HQ dataset (100 training songs).

Dataset types

Look here: Dataset types

Augmentations

Look here: Augmentations

Graphical user interface

Look here: GUI

Citation

arxiv paper

@misc{solovyev2023benchmarks,
      title={Benchmarks and leaderboards for sound demixing tasks}, 
      author={Roman Solovyev and Alexander Stempkovskiy and Tatiana Habruseva},
      year={2023},
      eprint={2305.07489},
      archivePrefix={arXiv},
      primaryClass={cs.SD}
}
