Spaces:
Running on Zero
Running on Zero
| # Music Source Separation Universal Training Code | |
| Repository for training models for music source separation. Repository is based on [kuielab code](https://github.com/kuielab/sdx23/tree/mdx_AB/my_submission/src) for [SDX23 challenge](https://github.com/kuielab/sdx23/tree/mdx_AB/my_submission/src). The main idea of this repository is to create training code, which is easy to modify for experiments. Brought to you by [MVSep.com](https://mvsep.com). | |
| ## Models | |
| Model can be chosen with `--model_type` arg. | |
| Available models for training: | |
| * MDX23C based on [KUIELab TFC TDF v3 architecture](https://github.com/kuielab/sdx23/). Key: `mdx23c`. | |
| * Demucs4HT [[Paper](https://arxiv.org/abs/2211.08553)]. Key: `htdemucs`. | |
| * VitLarge23 based on [Segmentation Models Pytorch](https://github.com/qubvel/segmentation_models.pytorch). Key: `segm_models`. | |
| * TorchSeg based on [TorchSeg module](https://github.com/qubvel/segmentation_models.pytorch). Key: `torchseg`. | |
| * Band Split RoFormer [[Paper](https://arxiv.org/abs/2309.02612), [Repository](https://github.com/lucidrains/BS-RoFormer)] . Key: `bs_roformer`. | |
| * Mel-Band RoFormer [[Paper](https://arxiv.org/abs/2310.01809), [Repository](https://github.com/lucidrains/BS-RoFormer)]. Key: `mel_band_roformer`. | |
| * Swin Upernet [[Paper](https://arxiv.org/abs/2103.14030)] Key: `swin_upernet`. | |
| * BandIt Plus [[Paper](https://arxiv.org/abs/2309.02539), [Repository](https://github.com/karnwatcharasupat/bandit)] Key: `bandit`. | |
| * SCNet [[Paper](https://arxiv.org/abs/2401.13276), [Official Repository](https://github.com/starrytong/SCNet), [Unofficial Repository](https://github.com/amanteur/SCNet-PyTorch)] Key: `scnet`. | |
| * BandIt v2 [[Paper](https://arxiv.org/abs/2407.07275), [Repository](https://github.com/kwatcharasupat/bandit-v2)] Key: `bandit_v2`. | |
| * Apollo [[Paper](https://arxiv.org/html/2409.08514v1), [Repository](https://github.com/JusperLee/Apollo)] Key: `apollo`. | |
| * BSMamba2 [[Paper](https://arxiv.org/abs/2508.14556), [Repository](https://github.com/EuiYeonKim/BSMamba2)] Key: `bs_mamba2`. | |
| * Conformer [[Paper](https://arxiv.org/abs/2005.08100), [Repository](https://github.com/lucidrains/conformer)] Key: `conformer`. | |
| * BS Conformer Key: `bs_conformer` | |
| * SCNet Tran Key: `scnet_tran`. | |
| * SCNet Masked Key: `scnet_masked`. | |
| 1. **Note 1**: For `segm_models` there are many different encoders is possible. [Look here](https://github.com/qubvel/segmentation_models.pytorch#encoders-). | |
| 2. **Note 2**: Thanks to [@lucidrains](https://github.com/lucidrains) for recreating the RoFormer models based on papers. | |
| 3. **Note 3**: For `torchseg` gives access to more than 800 encoders from `timm` module. It's similar to `segm_models`. | |
| ## How to: Train | |
| To train model you need to: | |
| 1) Choose model type with option `--model_type`, including: `mdx23c`, `htdemucs`, `segm_models`, `mel_band_roformer`, `bs_roformer`. | |
| 2) Choose location of config for model `--config_path` `<config path>`. You can find examples of configs in [configs folder](configs/). Prefixes `config_musdb18_` are examples for [MUSDB18 dataset](https://sigsep.github.io/datasets/musdb.html). | |
| 3) If you have a check-point from the same model or from another similar model you can use it with option: `--start_check_point` `<weights path>` | |
| 4) Choose path where to store results of training `--results_path` `<results folder path>` | |
| ### Training example | |
| ```bash | |
| python train.py \ | |
| --model_type mel_band_roformer \ | |
| --config_path configs/config_mel_band_roformer_vocals.yaml \ | |
| --start_check_point results/model.ckpt \ | |
| --results_path results/ \ | |
| --data_path 'datasets/dataset1' 'datasets/dataset2' \ | |
| --valid_path datasets/musdb18hq/test \ | |
| --num_workers 4 \ | |
| --device_ids 0 | |
| ``` | |
| All training parameters are [here](https://github.com/ZFTurbo/Music-Source-Separation-Training/blob/main/utils/settings.py#L20). | |
| ### Training with LoRA | |
| Look here: [LoRA training](docs/LoRA.md) | |
| ## How to: Inference | |
| ### Inference example | |
| ```bash | |
| python inference.py \ | |
| --model_type mdx23c \ | |
| --config_path configs/config_mdx23c_musdb18.yaml \ | |
| --start_check_point results/last_mdx23c.ckpt \ | |
| --input_folder input/wavs/ \ | |
| --store_dir separation_results/ | |
| ``` | |
| All inference parameters are [here](https://github.com/ZFTurbo/Music-Source-Separation-Training/blob/main/utils/settings.py#L130). | |
| Convert models to ONNX and TensorRT formats [here](https://github.com/ZFTurbo/MSS_ONNX_TensorRT). | |
| ## Useful notes | |
| * All batch sizes in config are adjusted to use with single NVIDIA A6000 48GB. If you have less memory please adjust correspodningly in model config `training.batch_size` and `training.gradient_accumulation_steps`. | |
| * It's usually always better to start with old weights even if shapes not fully match. Code supports loading weights for not fully same models (but it must have the same architecture). Training will be much faster. | |
| ## Code description | |
| * `configs/config_*.yaml` - configuration files for models | |
| * `models/*` - set of available models for training and inference | |
| * `dataset.py` - dataset which creates new samples for training | |
| * `gui-wx.py` - GUI interface for code | |
| * `inference.py` - process folder with music files and separate them | |
| * `train.py` - main training code for single GPU | |
| * `train_ddp.py` - training code for Multi GPU config. Faster than `train.py`. Use it for 2 or more GPUs. | |
| * `utils.py` - common functions used by train/valid | |
| * `valid.py` - validation of model with metrics | |
| * `ensemble.py` - useful script to ensemble results of different models to make results better (see [docs](docs/ensemble.md)). | |
| ## Pre-trained models | |
| Look here: [List of Pre-trained models](docs/pretrained_models.md) | |
| If you trained some good models, please, share them. You can post config and model weights [in this issue](https://github.com/ZFTurbo/Music-Source-Separation-Training/issues/1). | |
| ## Dataset types | |
| Look here: [Dataset types](docs/dataset_types.md) | |
| ## Augmentations | |
| Look here: [Augmentations](docs/augmentations.md) | |
| ## Graphical user interface | |
| Look here: [GUI documentation](docs/gui.md) or see tutorial on [Youtube](https://youtu.be/M8JKFeN7HfU) | |
| ## Citation | |
| * [arxiv paper](https://arxiv.org/abs/2305.07489) | |
| ```text | |
| @misc{solovyev2023benchmarks, | |
| title={Benchmarks and leaderboards for sound demixing tasks}, | |
| author={Roman Solovyev and Alexander Stempkovskiy and Tatiana Habruseva}, | |
| year={2023}, | |
| eprint={2305.07489}, | |
| archivePrefix={arXiv}, | |
| primaryClass={cs.SD} | |
| } | |
| ``` | |