Repository for training models for music source separation. The repository is based on the kuielab code for the SDX23 challenge. Its main goal is to provide training code that is easy to modify for experiments. Brought to you by MVSep.com.
The model can be chosen with the `--model_type` argument.
Available models for training:
- MDX23C based on KUIELab TFC TDF v3 architecture. Key: `mdx23c`.
- Demucs4HT [Paper]. Key: `htdemucs`.
- VitLarge23 based on Segmentation Models Pytorch. Key: `segm_models`.
- TorchSeg based on TorchSeg module. Key: `torchseg`.
- Band Split RoFormer [Paper, Repository]. Key: `bs_roformer` or `bs_roformer_low_mem`.
- Mel-Band RoFormer [Paper, Repository]. Key: `mel_band_roformer` or `mel_band_roformer_low_mem`.
- Swin Upernet [Paper]. Key: `swin_upernet`.
- BandIt Plus [Paper, Repository]. Key: `bandit`.
- SCNet [Paper, Official Repository, Unofficial Repository]. Key: `scnet`.
- BandIt v2 [Paper, Repository]. Key: `bandit_v2`.
- Apollo [Paper, Repository]. Key: `apollo`.
- TS BSMamba2 [Paper, Repository]. Key: `bs_mamba2`.
- Note 1: For `segm_models`, many different encoders are possible. Look here.
- Note 2: Thanks to @lucidrains for recreating the RoFormer models based on papers.
- Note 3: `torchseg` gives access to more than 800 encoders from the `timm` module. It's similar to `segm_models`.
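The keys listed above can be collected in a small lookup table, for example to validate a `--model_type` value before launching a long job. This is only a sketch: the `MODEL_TYPES` dict and `check_model_type` function are hypothetical helpers written for illustration and are not part of the repository. The keys and names are copied from the list above.

```python
# Keys and architecture names copied from the model list above.
# MODEL_TYPES and check_model_type are hypothetical helpers, not repository code.
MODEL_TYPES = {
    "mdx23c": "MDX23C",
    "htdemucs": "Demucs4HT",
    "segm_models": "VitLarge23",
    "torchseg": "TorchSeg",
    "bs_roformer": "Band Split RoFormer",
    "bs_roformer_low_mem": "Band Split RoFormer",
    "mel_band_roformer": "Mel-Band RoFormer",
    "mel_band_roformer_low_mem": "Mel-Band RoFormer",
    "swin_upernet": "Swin Upernet",
    "bandit": "BandIt Plus",
    "scnet": "SCNet",
    "bandit_v2": "BandIt v2",
    "apollo": "Apollo",
    "bs_mamba2": "TS BSMamba2",
}


def check_model_type(key: str) -> str:
    """Return the architecture name for a key, or raise on an unknown key."""
    try:
        return MODEL_TYPES[key]
    except KeyError:
        known = ", ".join(sorted(MODEL_TYPES))
        raise ValueError(
            f"Unknown model type {key!r}; expected one of: {known}"
        ) from None


print(check_model_type("mel_band_roformer"))
```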
To train a model you need to:
- Choose the model type with the option `--model_type`, including: `mdx23c`, `htdemucs`, `segm_models`, `mel_band_roformer`, `bs_roformer`.
- Choose the location of the model config with `--config_path <config path>`. You can find examples of configs in the configs folder. Configs with the prefix `config_musdb18_` are examples for the MUSDB18 dataset.
- If you have a checkpoint from the same model or from another similar model, you can use it with the option `--start_check_point <weights path>`.
- Choose the path where training results are stored with `--results_path <results folder path>`.
python train.py \
--model_type mel_band_roformer \
--config_path configs/config_mel_band_roformer_vocals.yaml \
--start_check_point results/model.ckpt \
--results_path results/ \
--data_path 'datasets/dataset1' 'datasets/dataset2' \
--valid_path datasets/musdb18hq/test \
--num_workers 4 \
--device_ids 0
All training parameters are here.
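When running several experiments, it can be convenient to assemble the training command from Python instead of editing a shell line by hand. The sketch below only uses the flags that appear in the example above. The function name `build_train_command` and its defaults are assumptions made for illustration, and the argument handling of `train.py` itself is not verified here.

```python
import shlex


def build_train_command(model_type, config_path, results_path, data_paths,
                        valid_path, start_check_point=None,
                        num_workers=4, device_id=0):
    """Assemble an argv list for train.py using the flags shown above.

    Hypothetical helper: it only formats arguments and does not check
    that the paths exist or that train.py accepts them.
    """
    cmd = [
        "python", "train.py",
        "--model_type", model_type,
        "--config_path", config_path,
        "--results_path", results_path,
        "--data_path", *data_paths,  # one or more dataset folders
        "--valid_path", valid_path,
        "--num_workers", str(num_workers),
        "--device_ids", str(device_id),
    ]
    if start_check_point is not None:
        # Optional: continue from existing weights.
        cmd += ["--start_check_point", start_check_point]
    return cmd


cmd = build_train_command(
    model_type="mel_band_roformer",
    config_path="configs/config_mel_band_roformer_vocals.yaml",
    results_path="results/",
    data_paths=["datasets/dataset1", "datasets/dataset2"],
    valid_path="datasets/musdb18hq/test",
    start_check_point="results/model.ckpt",
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root to start training.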
python inference.py \
--model_type mdx23c \
--config_path configs/config_mdx23c_musdb18.yaml \
--start_check_point results/last_mdx23c.ckpt \
--input_folder input/wavs/ \
--store_dir separation_results/
All inference parameters are here.
- All batch sizes in the configs are tuned for a single NVIDIA A6000 48GB. If you have less memory, adjust `training.batch_size` and `training.gradient_accumulation_steps` in the model config accordingly.
- It's usually better to start from old weights, even if the shapes do not fully match. The code supports loading weights from models that are not fully identical (but they must have the same architecture). Training will be much faster.
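Both notes can be illustrated with a short sketch. The first function shrinks the batch size for a smaller GPU and raises gradient accumulation so that the effective batch (`batch_size * gradient_accumulation_steps`) stays roughly the same. It assumes memory use scales linearly with batch size, which is only a rough rule of thumb. The second function keeps only the checkpoint entries whose name and shape match the target model. It shows one common way to load partially matching weights and is not necessarily how this repository implements it. All function names and example numbers are hypothetical.

```python
import math


def rescale_batch(batch_size, grad_accum_steps, memory_ratio):
    """Scale batch_size by memory_ratio (available / reference GPU memory)
    and raise accumulation steps to keep the effective batch close."""
    effective = batch_size * grad_accum_steps
    new_batch = max(1, math.floor(batch_size * memory_ratio))
    new_accum = math.ceil(effective / new_batch)
    return new_batch, new_accum


def filter_matching_weights(checkpoint_state, model_state):
    """Split checkpoint entries into those that match the model by name
    and shape (kept) and those that do not (skipped)."""
    kept, skipped = {}, []
    for name, tensor in checkpoint_state.items():
        target = model_state.get(name)
        if target is not None and tuple(target.shape) == tuple(tensor.shape):
            kept[name] = tensor
        else:
            skipped.append(name)
    return kept, skipped


# Hypothetical config values: batch_size=8, 1 accumulation step,
# moving from a 48 GB card to a 24 GB card.
print(rescale_batch(8, 1, 24 / 48))

# Assumed PyTorch usage (not executed here):
#   state = torch.load(weights_path, map_location="cpu")
#   kept, skipped = filter_matching_weights(state, model.state_dict())
#   model.load_state_dict(kept, strict=False)
```

Filtering before loading matters because PyTorch's `load_state_dict(strict=False)` tolerates missing or unexpected keys but still raises an error on a size mismatch.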
- `configs/config_*.yaml` - configuration files for models
- `models/*` - set of available models for training and inference
- `dataset.py` - dataset which creates new samples for training
- `inference.py` - process folder with music files and separate them
- `train.py` - main training code
- `utils.py` - common functions used by train/valid
- `valid.py` - validation of model with metrics
If you have trained some good models, please share them. You can post the config and model weights in this issue.
Model Type | Instruments | Metrics (SDR) | Config | Checkpoint |
---|---|---|---|---|
MDX23C | vocals / other | SDR vocals: 10.17 | Config | Weights |
HTDemucs4 (MVSep finetuned) | vocals / other | SDR vocals: 8.78 | Config | Weights |
Segm Models (VitLarge23) | vocals / other | SDR vocals: 9.77 | Config | Weights |
Swin Upernet | vocals / other | SDR vocals: 7.57 | Config | Weights |
BS Roformer (viperx edition) | vocals / other | SDR vocals: 10.87 | Config | Weights |
MelBand Roformer (viperx edition) | vocals / other | SDR vocals: 9.67 | Config | Weights |
MelBand Roformer (KimberleyJensen edition) | vocals / other | SDR vocals: 10.98 | Config | Weights |
Note: Metrics measured on Multisong Dataset.
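For readers unfamiliar with the metric, the sketch below computes the basic signal-to-distortion ratio, SDR = 10 * log10(||reference||^2 / ||reference - estimate||^2), in decibels. It is a minimal illustration in pure Python on a toy signal. The exact SDR variant and evaluation protocol behind the numbers in the tables are not specified in this section, so this should not be read as the benchmark's implementation.

```python
import math


def sdr(reference, estimate, eps=1e-9):
    """Basic SDR in dB: 10 * log10(||ref||^2 / ||ref - est||^2).

    eps avoids division by zero for silent or perfectly matched signals.
    """
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    signal = sum(r * r for r in reference)
    noise = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10.0 * math.log10((signal + eps) / (noise + eps))


# Toy example: the estimate is the reference with a 10% amplitude error.
reference = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]
estimate = [0.9 * x for x in reference]
print(f"SDR: {sdr(reference, estimate):.2f} dB")
```

Higher values mean the estimate is closer to the reference. A 10% amplitude error, as in the toy example, corresponds to roughly 20 dB.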
Model Type | Instruments | Metrics (SDR) | Config | Checkpoint |
---|---|---|---|---|
HTDemucs4 FT Drums | drums | SDR drums: 11.13 | Config | Weights |
HTDemucs4 FT Bass | bass | SDR bass: 11.96 | Config | Weights |
HTDemucs4 FT Other | other | SDR other: 5.85 | Config | Weights |
HTDemucs4 FT Vocals (Official repository) | vocals | SDR vocals: 8.38 | Config | Weights |
BS Roformer (viperx edition) | other | SDR other: 6.85 | Config | Weights |
MelBand Roformer (aufr33 and viperx edition) | crowd | SDR crowd: 5.99 | Config | Weights |
MelBand Roformer (anvuew edition) | dereverb | --- | Config | Weights |
MelBand Roformer Denoise (by aufr33) | denoise | --- | Config | Weights |
MelBand Roformer Denoise Aggressive (by aufr33) | denoise | --- | Config | Weights |
Apollo LQ MP3 restoration (by JusperLee) | restored | --- | Config | Weights |
MelBand Roformer Aspiration (by SUC-DriverOld) | aspiration | SDR: 9.85 | Config | Weights |
Note: All HTDemucs4 FT models output 4 stems, but quality is best only on target stem (all other stems are dummy).
Model Type | Instruments | Metrics (SDR) | Config | Checkpoint |
---|---|---|---|---|
MDX23C | bass / drums / vocals / other | MUSDB test avg: 7.15 (bass: 5.77, drums: 7.93 vocals: 9.23 other: 5.68) Multisong avg: 7.02 (bass: 8.40, drums: 7.73 vocals: 7.36 other: 4.57) | Config | Weights |
BandIt Plus | speech / music / effects | DnR test avg: 11.50 (speech: 15.64, music: 9.18 effects: 9.69) | Config | Weights |
HTDemucs4 | bass / drums / vocals / other | Multisong avg: 9.16 (bass: 11.76, drums: 10.88 vocals: 8.24 other: 5.74) | Config | Weights |
HTDemucs4 (6 stems) | bass / drums / vocals / other / piano / guitar | Multisong (bass: 11.22, drums: 10.22 vocals: 8.05 other: --- piano: --- guitar: ---) | Config | Weights |
Demucs3 mmi | bass / drums / vocals / other | Multisong avg: 8.88 (bass: 11.17, drums: 10.70 vocals: 8.22 other: 5.42) | Config | Weights |
DrumSep htdemucs (by inagoy) | kick / snare / cymbals / toms | --- | Config | Weights |
DrumSep mdx23c (by aufr33 and jarredou) | kick / snare / toms / hh / ride / crash | --- | Config | Weights |
SCNet (by starrytong) | bass / drums / vocals / other | Multisong avg: 8.87 (bass: 11.07, drums: 10.79 vocals: 8.27 other: 5.34) | Config | Weights |
SCNet Large | bass / drums / vocals / other | MUSDB test avg: 9.32 (bass: 8.63, drums: 10.89 vocals: 10.69 other: 7.06) Multisong avg: 9.19 (bass: 11.15, drums: 11.04 vocals: 8.94 other: 5.62) | Config | Weights |
SCNet Large (by starrytong) | bass / drums / vocals / other | MUSDB test avg: 9.70 (bass: 9.38, drums: 11.15 vocals: 10.94 other: 7.31) Multisong avg: 9.28 (bass: 11.27, drums: 11.23 vocals: 9.05 other: 5.57) | Config | Weights |
TS BS Mamba2 | bass / drums / vocals / other | MUSDB test avg: 6.87 (bass: 5.82, drums: 8.14 vocals: 8.35 other: 5.16) Multisong avg: 6.66 (bass: 7.87, drums: 7.92 vocals: 7.01 other: 3.85) | Config | Weights |
* Note: Model was trained only on MUSDB18HQ dataset (100 songs train data)
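The "avg" figures in the table above appear to be the arithmetic mean of the per-stem SDR values. That is an assumption, checked here against two rows of the table. Because the listed per-stem values are already rounded, other rows may differ in the last digit.

```python
from statistics import mean

# Per-stem SDR values copied from the MDX23C (MUSDB test) and BandIt Plus rows.
mdx23c_musdb = {"bass": 5.77, "drums": 7.93, "vocals": 9.23, "other": 5.68}
bandit_dnr = {"speech": 15.64, "music": 9.18, "effects": 9.69}


def average_sdr(per_stem):
    """Arithmetic mean of per-stem SDR values, rounded to two decimals."""
    return round(mean(per_stem.values()), 2)


print(average_sdr(mdx23c_musdb))  # the table lists 7.15
print(average_sdr(bandit_dnr))    # the table lists 11.50
```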
Look here: Dataset types
Look here: Augmentations
@misc{solovyev2023benchmarks,
title={Benchmarks and leaderboards for sound demixing tasks},
author={Roman Solovyev and Alexander Stempkovskiy and Tatiana Habruseva},
year={2023},
eprint={2305.07489},
archivePrefix={arXiv},
primaryClass={cs.SD}
}