miercolesv committed on
Commit
d12179b
·
verified ·
1 Parent(s): 8a6257e

Add files using upload-large-folder tool

Files changed (50)
  1. Apollo-Enhancement/.gitattributes +35 -0
  2. Apollo-Enhancement/README.md +125 -0
  3. Apollo-Vocal-MSST/.gitattributes +35 -0
  4. Apollo-Vocal-MSST/README.md +10 -0
  5. Apollo-Vocal-MSST/config_apollo_vocals_ep_54.yaml +31 -0
  6. Aspiration-MelBand-Sucial/.gitattributes +40 -0
  7. Aspiration-MelBand-Sucial/README.md +30 -0
  8. Aspiration-MelBand-Sucial/config_aspiration_mel_band_roformer.yaml +77 -0
  9. BS-RoFormer-Anvuew/.gitattributes +35 -0
  10. BS-RoFormer-Anvuew/README.md +5 -0
  11. BS-RoFormer-Anvuew/config.yaml +129 -0
  12. Dereverb-Echo-MelBand-Sucial/.gitattributes +55 -0
  13. Dereverb-Echo-MelBand-Sucial/README.md +85 -0
  14. Dereverb-Echo-MelBand-Sucial/config_dereverb-echo_mel_band_roformer.yaml +77 -0
  15. Dereverb-Echo-MelBand-Sucial/config_dereverb_echo_mbr_v2.yaml +65 -0
  16. Dereverb-MelBand-Anvuew/.gitattributes +35 -0
  17. Dereverb-MelBand-Anvuew/README.md +15 -0
  18. Dereverb-MelBand-Anvuew/dereverb_mel_band_roformer_anvuew.yaml +76 -0
  19. Dereverb-Room-Anvuew/.gitattributes +35 -0
  20. Dereverb-Room-Anvuew/README.md +23 -0
  21. Dereverb-Room-Anvuew/dereverb_room_anvuew.yaml +132 -0
  22. Karaoke-BS-RoFormer-Anvuew/.gitattributes +35 -0
  23. Karaoke-BS-RoFormer-Anvuew/README.md +5 -0
  24. Karaoke-BS-RoFormer-Anvuew/karaoke_bs_roformer_anvuew.yaml +131 -0
  25. MedleyVox-MultiSinger/.gitattributes +55 -0
  26. MedleyVox-MultiSinger/README.md +156 -0
  27. MedleyVox-MultiSinger/multi_singing_librispeech/loss_graph_vocals.png +0 -0
  28. MedleyVox-MultiSinger/multi_singing_librispeech/vocals.json +642 -0
  29. MedleyVox-MultiSinger/multi_singing_librispeech_138/loss_graph_vocals.png +0 -0
  30. MedleyVox-MultiSinger/multi_singing_librispeech_138/vocals.json +812 -0
  31. MedleyVox-MultiSinger/singing_librispeech_ft_iSRNet/loss_graph_vocals.png +0 -0
  32. MedleyVox-MultiSinger/singing_librispeech_ft_iSRNet/vocals.json +1321 -0
  33. MedleyVox-MultiSinger/singing_librispeech_iSRNet/loss_graph_vocals.png +0 -0
  34. MedleyVox-MultiSinger/singing_librispeech_iSRNet/vocals.json +1180 -0
  35. MedleyVox-MultiSinger/vocal 231/loss_graph_vocals.png +0 -0
  36. MelBand-Roformer-Deux-Becruily/.gitattributes +35 -0
  37. MelBand-Roformer-Deux-Becruily/README.md +8 -0
  38. MelBand-Roformer-Deux-Becruily/config_deux_becruily.yaml +64 -0
  39. MelBandRoformer-Original/.gitattributes +35 -0
  40. MelBandRoformer-Original/README.md +3 -0
  41. MelBandRoformers/.gitattributes +35 -0
  42. MelBandRoformers/bsroformers/karaoke_bs_roformer.yaml +129 -0
  43. MelBandRoformers/melbandroformers/instrumental/inst_gabox.yaml +51 -0
  44. MelBandRoformers/melbandroformers/instrumental/v10.yaml +73 -0
  45. MelBandRoformers/melbandroformers/karaoke/karaokegabox_1750911344.yaml +72 -0
  46. MelBandRoformers/melbandroformers/vocals/voc_gabox.yaml +51 -0
  47. Single_Models/ZFTurbo/Vocals/config_vocals_htdemucs.yaml +123 -0
  48. Single_Models/ZFTurbo/Vocals/config_vocals_mdx23c.yaml +54 -0
  49. Stable-Audio-Open-1.0/LICENSE.md +58 -0
  50. Stable-Audio-Open-1.0/README.md +182 -0
Apollo-Enhancement/.gitattributes ADDED
@@ -0,0 +1,35 @@
1
+ *.7z filter=lfs diff=lfs merge=lfs -text
2
+ *.arrow filter=lfs diff=lfs merge=lfs -text
3
+ *.bin filter=lfs diff=lfs merge=lfs -text
4
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
5
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
6
+ *.ftz filter=lfs diff=lfs merge=lfs -text
7
+ *.gz filter=lfs diff=lfs merge=lfs -text
8
+ *.h5 filter=lfs diff=lfs merge=lfs -text
9
+ *.joblib filter=lfs diff=lfs merge=lfs -text
10
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
11
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
12
+ *.model filter=lfs diff=lfs merge=lfs -text
13
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
14
+ *.npy filter=lfs diff=lfs merge=lfs -text
15
+ *.npz filter=lfs diff=lfs merge=lfs -text
16
+ *.onnx filter=lfs diff=lfs merge=lfs -text
17
+ *.ot filter=lfs diff=lfs merge=lfs -text
18
+ *.parquet filter=lfs diff=lfs merge=lfs -text
19
+ *.pb filter=lfs diff=lfs merge=lfs -text
20
+ *.pickle filter=lfs diff=lfs merge=lfs -text
21
+ *.pkl filter=lfs diff=lfs merge=lfs -text
22
+ *.pt filter=lfs diff=lfs merge=lfs -text
23
+ *.pth filter=lfs diff=lfs merge=lfs -text
24
+ *.rar filter=lfs diff=lfs merge=lfs -text
25
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
26
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
27
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
28
+ *.tar filter=lfs diff=lfs merge=lfs -text
29
+ *.tflite filter=lfs diff=lfs merge=lfs -text
30
+ *.tgz filter=lfs diff=lfs merge=lfs -text
31
+ *.wasm filter=lfs diff=lfs merge=lfs -text
32
+ *.xz filter=lfs diff=lfs merge=lfs -text
33
+ *.zip filter=lfs diff=lfs merge=lfs -text
34
+ *.zst filter=lfs diff=lfs merge=lfs -text
35
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
Apollo-Enhancement/README.md ADDED
@@ -0,0 +1,125 @@
1
+ ---
2
+ license: cc-by-sa-4.0
3
+ datasets:
4
+ - sebchw/musdb18
5
+ pipeline_tag: audio-to-audio
6
+ tags:
7
+ - music
8
+ ---
9
+
10
+ <p align="center">
11
+ <img src="https://cslikai.cn/Apollo/asserts/apollo-logo.png" alt="Logo" width="150"/>
12
+ </p>
13
+
14
+ <p align="center">
15
+ <strong>Kai Li<sup>1,2</sup>, Yi Luo<sup>2</sup></strong><br>
16
+ <strong><sup>1</sup>Tsinghua University, Beijing, China</strong><br>
17
+ <strong><sup>2</sup>Tencent AI Lab, Shenzhen, China</strong><br>
18
+ <a href="#">ArXiv</a> | <a href="https://cslikai.cn/Apollo/">Demo</a>
19
+
20
+ <p align="center">
21
+ <img src="https://visitor-badge.laobi.icu/badge?page_id=JusperLee.Apollo" alt="Visitor count" />
22
+ <img src="https://img.shields.io/github/stars/JusperLee/Apollo?style=social" alt="GitHub stars" />
23
+ <img alt="Static Badge" src="https://img.shields.io/badge/license-CC%20BY--SA%204.0-blue">
24
+ </p>
25
+
26
+ <p align="center">
27
+
28
+ # Apollo: Band-sequence Modeling for High-Quality Music Restoration in Compressed Audio
29
+
30
+ ## 📖 Abstract
31
+
32
+ Apollo is a novel music restoration method designed to address distortions and artefacts caused by audio codecs, especially at low bitrates. Operating in the frequency domain, Apollo uses a frequency band-split module, band-sequence modeling, and frequency band reconstruction to restore the audio quality of **MP3-compressed music**. It divides the spectrogram into sub-bands, extracts gain-shape representations, and models both sub-band and temporal information for high-quality audio recovery. Trained with a Generative Adversarial Network (GAN), Apollo outperforms existing SR-GAN models on the **MUSDB18-HQ and MoisesDB** datasets, excelling in complex multi-instrument and vocal scenarios, while maintaining efficiency.
33
+
34
+ ## 🔥 News
35
+
36
+ - [2024.09.10] Apollo is now available on [ArXiv](#) and [Demo](https://cslikai.cn/Apollo/).
37
+ - [2024.09.106] Apollo checkpoints and pre-trained models are available for download.
38
+
39
+ ## ⚡️ Installation
40
+
41
+ Clone the repository and create the conda environment:
42
+
43
+ ```bash
44
+ git clone https://github.com/JusperLee/Apollo.git && cd Apollo
45
+ conda env create --name look2hear --file look2hear.yml
46
+ conda activate look2hear
47
+ ```
48
+
49
+ ## 🖥️ Usage
50
+
51
+ ### 🗂️ Datasets
52
+
53
+ Apollo is trained on the MUSDB18-HQ and MoisesDB datasets. To download the datasets, run the following commands:
54
+
55
+ ```bash
56
+ wget -O musdb18hq.zip "https://zenodo.org/records/3338373/files/musdb18hq.zip?download=1"
57
+ wget https://ds-website-downloads.55c2710389d9da776875002a7d018e59.r2.cloudflarestorage.com/moisesdb.zip
58
+ ```
59
+ During data preprocessing, we drew inspiration from music separation techniques and implemented the following steps:
60
+
61
+ 1. **Source Activity Detection (SAD):**
62
+ We used a Source Activity Detector (SAD) to remove silent regions from the audio tracks, retaining only the significant portions for training.
63
+
64
+ 2. **Data Augmentation:**
65
+ We performed real-time data augmentation by mixing tracks from different songs. For each mix, we randomly selected between 1 and 8 stems from the 11 available tracks, extracting 3-second clips from each selected stem. These clips were scaled in energy by a random factor within the range of [-10, 10] dB relative to their original levels. The selected clips were then summed together to create simulated mixed music.
66
+
67
+ 3. **Simulating Dynamic Bitrate Compression:**
68
+ We simulated various bitrate scenarios by applying MP3 codecs with bitrates of [24000, 32000, 48000, 64000, 96000, 128000] bps.
69
+
70
+ 4. **Rescaling:**
71
+ To ensure consistency across all samples, we rescaled both the target and the encoded audio based on their maximum absolute values.
72
+
73
+ 5. **Saving as HDF5:**
74
+ After preprocessing, all data (including the source stems, mixed tracks, and compressed audio) was saved in HDF5 format, making it easy to load for training and evaluation purposes.
75
+
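The mixing and rescaling steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' preprocessing code: `db_to_gain`, `simulate_mix` and `peak_normalize` are hypothetical helper names, stems are plain lists of mono samples, the demo uses synthetic sine waves at a reduced sample rate, and the SAD, MP3 encoding and HDF5 steps are left out.

```python
import math
import random


def db_to_gain(level_db):
    """Convert a level change in dB to a linear amplitude factor."""
    return 10.0 ** (level_db / 20.0)


def simulate_mix(stems, clip_len, rng):
    """Sum randomly positioned, gain-scaled clips from 1 to 8 random stems.

    stems: list of mono sample lists (hypothetical in-memory format).
    Returns the mixed clip and the number of stems that were used.
    """
    count = rng.randint(1, min(8, len(stems)))
    mix = [0.0] * clip_len
    for stem in rng.sample(stems, count):
        start = rng.randint(0, len(stem) - clip_len)
        gain = db_to_gain(rng.uniform(-10.0, 10.0))  # random level in [-10, 10] dB
        for i in range(clip_len):
            mix[i] += gain * stem[start + i]
    return mix, count


def peak_normalize(signal):
    """Rescale so the maximum absolute value is 1 (silence is returned unchanged)."""
    peak = max(abs(x) for x in signal)
    return list(signal) if peak == 0.0 else [x / peak for x in signal]


# Demo: 11 synthetic sine "stems", 4 s each, at a reduced rate to keep it fast.
rate = 8000
stems = [
    [math.sin(2 * math.pi * 110 * (k + 1) * n / rate) for n in range(4 * rate)]
    for k in range(11)
]
mix, count = simulate_mix(stems, clip_len=3 * rate, rng=random.Random(0))
target = peak_normalize(mix)
```

In a real pipeline the normalized mix would additionally be passed through an MP3 encoder at one of the listed bitrates to obtain the degraded input, and the same peak rescaling would be applied to that encoded signal.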
76
+ ### 🚀 Training
77
+ To train the Apollo model, run the following command:
78
+
79
+ ```bash
80
+ python train.py --conf_dir=configs/apollo.yml
81
+ ```
82
+
83
+ ### 🎨 Evaluation
84
+ To evaluate the Apollo model, run the following command:
85
+
86
+ ```bash
87
+ python inference.py --in_wav=assets/input.wav --out_wav=assets/output.wav
88
+ ```
89
+
90
+ ## 📊 Results
91
+
92
+ *Here, you can include a brief overview of the performance metrics or results that Apollo achieves using different bitrates*
93
+
94
+ ![](https://cslikai.cn/Apollo/asserts/bitrates.png)
95
+
96
+
97
+ *Different methods' SDR/SI-SNR/VISQOL scores for various types of music, as well as the number of model parameters and GPU inference time. For the GPU inference time test, a music signal with a sampling rate of 44.1 kHz and a length of 1 second was used.*
98
+ ![](https://cslikai.cn/Apollo/asserts/types.png)
99
+
100
+ ## License
101
+
102
+ <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">Creative Commons Attribution-ShareAlike 4.0 International License</a>.
103
+
104
+ ## Acknowledgements
105
+
106
+ Apollo is developed by **Look2Hear** at Tsinghua University.
107
+
108
+ ## Citation
109
+
110
+ If you use Apollo in your research or project, please cite the following paper:
111
+
112
+ ```
113
+ @article{li2024apollo,
114
+ title={Apollo: Band-sequence Modeling for High-Quality Music Restoration in Compressed Audio},
115
+ author={Li, Kai and Luo, Yi},
116
+ journal={xxxxxx},
117
+ year={2024}
118
+ }
119
+ ```
120
+
121
+ ## Contact
122
+
123
+ For any questions or feedback regarding Apollo, feel free to reach out to us via email: `tsinghua.kaili@gmail.com`
124
+
125
+
Apollo-Vocal-MSST/.gitattributes ADDED
@@ -0,0 +1,35 @@
1
+ *.7z filter=lfs diff=lfs merge=lfs -text
2
+ *.arrow filter=lfs diff=lfs merge=lfs -text
3
+ *.bin filter=lfs diff=lfs merge=lfs -text
4
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
5
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
6
+ *.ftz filter=lfs diff=lfs merge=lfs -text
7
+ *.gz filter=lfs diff=lfs merge=lfs -text
8
+ *.h5 filter=lfs diff=lfs merge=lfs -text
9
+ *.joblib filter=lfs diff=lfs merge=lfs -text
10
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
11
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
12
+ *.model filter=lfs diff=lfs merge=lfs -text
13
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
14
+ *.npy filter=lfs diff=lfs merge=lfs -text
15
+ *.npz filter=lfs diff=lfs merge=lfs -text
16
+ *.onnx filter=lfs diff=lfs merge=lfs -text
17
+ *.ot filter=lfs diff=lfs merge=lfs -text
18
+ *.parquet filter=lfs diff=lfs merge=lfs -text
19
+ *.pb filter=lfs diff=lfs merge=lfs -text
20
+ *.pickle filter=lfs diff=lfs merge=lfs -text
21
+ *.pkl filter=lfs diff=lfs merge=lfs -text
22
+ *.pt filter=lfs diff=lfs merge=lfs -text
23
+ *.pth filter=lfs diff=lfs merge=lfs -text
24
+ *.rar filter=lfs diff=lfs merge=lfs -text
25
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
26
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
27
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
28
+ *.tar filter=lfs diff=lfs merge=lfs -text
29
+ *.tflite filter=lfs diff=lfs merge=lfs -text
30
+ *.tgz filter=lfs diff=lfs merge=lfs -text
31
+ *.wasm filter=lfs diff=lfs merge=lfs -text
32
+ *.xz filter=lfs diff=lfs merge=lfs -text
33
+ *.zip filter=lfs diff=lfs merge=lfs -text
34
+ *.zst filter=lfs diff=lfs merge=lfs -text
35
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
Apollo-Vocal-MSST/README.md ADDED
@@ -0,0 +1,10 @@
1
+ ---
2
+ license: cc-by-nc-sa-4.0
3
+ ---
4
+ Apollo Official GitHub: https://github.com/JusperLee/Apollo
5
+
6
+ Apollo is a novel music restoration method designed to address distortions and artefacts caused by audio codecs, especially at low bitrates. Operating in the frequency domain, Apollo uses a frequency band-split module, band-sequence modeling, and frequency band reconstruction to restore the audio quality of MP3-compressed music. It divides the spectrogram into sub-bands, extracts gain-shape representations, and models both sub-band and temporal information for high-quality audio recovery. Trained with a Generative Adversarial Network (GAN), Apollo outperforms existing SR-GAN models on the MUSDB18-HQ and MoisesDB datasets, excelling in complex multi-instrument and vocal scenarios, while maintaining efficiency.
7
+
8
+ The open-sourced content includes models for inference at https://github.com/ZFTurbo/Music-Source-Separation-Training and the original weights with fewer training steps. The training was conducted using sucial's project at https://github.com/SUC-DriverOld/Apollo-Training, with a 92-hour high-quality vocal dataset trained for 1 million steps.
9
+
10
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/65ef5331b46c5c72e374a3dd/uRJGmwdu--qhKlkMy5HO6.png)
Apollo-Vocal-MSST/config_apollo_vocals_ep_54.yaml ADDED
@@ -0,0 +1,31 @@
1
+ audio:
2
+   chunk_size: 441000
3
+   min_mean_abs: 0.0
4
+   num_channels: 2
5
+   sample_rate: 44100
6
+ augmentations:
7
+   enable: false
8
+ inference:
9
+   batch_size: 1
10
+   num_overlap: 4
11
+ model:
12
+   feature_dim: 384
13
+   layer: 8
14
+   sr: 44100
15
+   win: 20
16
+ training:
17
+   batch_size: 1
18
+   coarse_loss_clip: true
19
+   grad_clip: 0
20
+   instruments:
21
+   - restored
22
+   - addition
23
+   lr: 1.0
24
+   num_epochs: 1000
25
+   num_steps: 1000
26
+   optimizer: prodigy
27
+   patience: 2
28
+   q: 0.95
29
+   reduce_factor: 0.95
30
+   target_instrument: restored
31
+   use_amp: true
Aspiration-MelBand-Sucial/.gitattributes ADDED
@@ -0,0 +1,40 @@
1
+ *.7z filter=lfs diff=lfs merge=lfs -text
2
+ *.arrow filter=lfs diff=lfs merge=lfs -text
3
+ *.bin filter=lfs diff=lfs merge=lfs -text
4
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
5
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
6
+ *.ftz filter=lfs diff=lfs merge=lfs -text
7
+ *.gz filter=lfs diff=lfs merge=lfs -text
8
+ *.h5 filter=lfs diff=lfs merge=lfs -text
9
+ *.joblib filter=lfs diff=lfs merge=lfs -text
10
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
11
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
12
+ *.model filter=lfs diff=lfs merge=lfs -text
13
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
14
+ *.npy filter=lfs diff=lfs merge=lfs -text
15
+ *.npz filter=lfs diff=lfs merge=lfs -text
16
+ *.onnx filter=lfs diff=lfs merge=lfs -text
17
+ *.ot filter=lfs diff=lfs merge=lfs -text
18
+ *.parquet filter=lfs diff=lfs merge=lfs -text
19
+ *.pb filter=lfs diff=lfs merge=lfs -text
20
+ *.pickle filter=lfs diff=lfs merge=lfs -text
21
+ *.pkl filter=lfs diff=lfs merge=lfs -text
22
+ *.pt filter=lfs diff=lfs merge=lfs -text
23
+ *.pth filter=lfs diff=lfs merge=lfs -text
24
+ *.rar filter=lfs diff=lfs merge=lfs -text
25
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
26
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
27
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
28
+ *.tar filter=lfs diff=lfs merge=lfs -text
29
+ *.tflite filter=lfs diff=lfs merge=lfs -text
30
+ *.tgz filter=lfs diff=lfs merge=lfs -text
31
+ *.wasm filter=lfs diff=lfs merge=lfs -text
32
+ *.xz filter=lfs diff=lfs merge=lfs -text
33
+ *.zip filter=lfs diff=lfs merge=lfs -text
34
+ *.zst filter=lfs diff=lfs merge=lfs -text
35
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ example_audio/example_aspiration_less_aggr.wav filter=lfs diff=lfs merge=lfs -text
37
+ example_audio/example_aspiration.wav filter=lfs diff=lfs merge=lfs -text
38
+ example_audio/example_other_less_aggr.wav filter=lfs diff=lfs merge=lfs -text
39
+ example_audio/example_other.wav filter=lfs diff=lfs merge=lfs -text
40
+ example_audio/example_raw.wav filter=lfs diff=lfs merge=lfs -text
Aspiration-MelBand-Sucial/README.md ADDED
@@ -0,0 +1,30 @@
1
+ ---
2
+ license: cc-by-nc-sa-4.0
3
+ ---
4
+
5
+ You can try listening to the performance of this model [here](https://huggingface.co/Sucial/Aspiration_Mel_Band_Roformer/tree/main/example_audio)
6
+
7
+ How to use the model?<br>
8
+ Try it with [ZFTurbo's Music-Source-Separation-Training](https://github.com/ZFTurbo/Music-Source-Separation-Training)
9
+
10
+ Description: This model separates aspiration sounds, which can be useful to some mixers when mixing.<br>
11
+ Instruments: aspiration, other<br>
12
+ Dataset: My own dataset (171 songs for training and 17 songs for validation).<br>
13
+ Metrics: SDR measured on the 17 validation songs.<br>
14
+ Finetuned from: `model_mel_band_roformer_ep_3005_sdr_11.4360.ckpt`<br>
15
+ Configs: [config_aspiration_mel_band_roformer.yaml](./config_aspiration_mel_band_roformer.yaml)
16
+
17
+ Model: [aspiration_mel_band_roformer_sdr_18.9845.ckpt](./aspiration_mel_band_roformer_sdr_18.9845.ckpt)<br>
18
+ Epoch: 123<br>
19
+ Instr SDR aspiration: 9.8554<br>
20
+ Instr SDR other: 28.1136<br>
21
+ SDR Avg: 18.9845<br>
22
+
23
+ Model: [aspiration_mel_band_roformer_less_aggr_sdr_18.1201.ckpt](./aspiration_mel_band_roformer_less_aggr_sdr_18.1201.ckpt)<br>
24
+ Epoch: 27<br>
25
+ Instr SDR aspiration: 9.0704<br>
26
+ Instr SDR other: 27.1699<br>
27
+ SDR Avg: 18.1201<br>
28
+
29
+ Training logs:
30
+ ![image](./training_logs.png)
Aspiration-MelBand-Sucial/config_aspiration_mel_band_roformer.yaml ADDED
@@ -0,0 +1,77 @@
1
+ audio:
2
+   chunk_size: 352800
3
+   dim_f: 1024
4
+   dim_t: 801 # doesn't work (use in model)
5
+   hop_length: 441 # doesn't work (use in model)
6
+   n_fft: 2048
7
+   num_channels: 2
8
+   sample_rate: 44100
9
+   min_mean_abs: 0.000
10
+
11
+ model:
12
+   dim: 256
13
+   depth: 8
14
+   stereo: true
15
+   num_stems: 2
16
+   time_transformer_depth: 1
17
+   freq_transformer_depth: 1
18
+   linear_transformer_depth: 0
19
+   num_bands: 60
20
+   dim_head: 64
21
+   heads: 8
22
+   attn_dropout: 0.1
23
+   ff_dropout: 0.1
24
+   flash_attn: True
25
+   dim_freqs_in: 1025
26
+   sample_rate: 44100 # needed for mel filter bank from librosa
27
+   stft_n_fft: 2048
28
+   stft_hop_length: 441
29
+   stft_win_length: 2048
30
+   stft_normalized: False
31
+   mask_estimator_depth: 2
32
+   multi_stft_resolution_loss_weight: 1.0
33
+   multi_stft_resolutions_window_sizes: !!python/tuple
34
+     - 4096
35
+     - 2048
36
+     - 1024
37
+     - 512
38
+     - 256
39
+   multi_stft_hop_size: 147
40
+   multi_stft_normalized: False
41
+
42
+ training:
43
+   batch_size: 1
44
+   gradient_accumulation_steps: 8
45
+   grad_clip: 0
46
+   instruments:
47
+     - aspiration
48
+     - other
49
+   lr: 4.0e-05
50
+   patience: 2
51
+   reduce_factor: 0.95
52
+   target_instrument: null
53
+   num_epochs: 1000
54
+   num_steps: 1000
55
+   q: 0.95
56
+   coarse_loss_clip: true
57
+   ema_momentum: 0.999
58
+   optimizer: adam
59
+   other_fix: false # it's needed for checking on multisong dataset if other is actually instrumental
60
+   use_amp: true # enable or disable usage of mixed precision (float16) - usually it must be true
61
+
62
+ augmentations:
63
+   enable: true # enable or disable all augmentations (to quickly disable them if needed)
64
+   loudness: true # randomly change loudness of each stem within the range (loudness_min; loudness_max)
65
+   loudness_min: 0.5
66
+   loudness_max: 1.5
67
+   mixup: false # mix several stems of same type with some probability (only works for dataset types: 1, 2, 3)
68
+   mixup_probs: !!python/tuple # 2 additional stems of the same type (1st with prob 0.2, 2nd with prob 0.02)
69
+     - 0.2
70
+     - 0.02
71
+   mixup_loudness_min: 0.5
72
+   mixup_loudness_max: 1.5
73
+
74
+ inference:
75
+   batch_size: 4
76
+   dim_t: 801
77
+   num_overlap: 2
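The numbers in this config appear to be consistent with one another: a chunk of 352800 samples at 44100 Hz lasts 8 seconds, and with a hop of 441 samples a centred STFT of that chunk has 801 frames, which matches `dim_t`. The sketch below checks this arithmetic. It assumes a centred STFT (one frame per hop plus one); `stft_frames` is a hypothetical helper and not part of Music-Source-Separation-Training.

```python
def stft_frames(num_samples, hop_length):
    """Frame count of a centred STFT: one frame per hop plus the initial frame."""
    return num_samples // hop_length + 1


# Values copied from config_aspiration_mel_band_roformer.yaml
chunk_size = 352800
sample_rate = 44100
stft_hop_length = 441

chunk_seconds = chunk_size / sample_rate
frames = stft_frames(chunk_size, stft_hop_length)

print(chunk_seconds)  # 8.0
print(frames)         # 801
```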
BS-RoFormer-Anvuew/.gitattributes ADDED
@@ -0,0 +1,35 @@
1
+ *.7z filter=lfs diff=lfs merge=lfs -text
2
+ *.arrow filter=lfs diff=lfs merge=lfs -text
3
+ *.bin filter=lfs diff=lfs merge=lfs -text
4
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
5
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
6
+ *.ftz filter=lfs diff=lfs merge=lfs -text
7
+ *.gz filter=lfs diff=lfs merge=lfs -text
8
+ *.h5 filter=lfs diff=lfs merge=lfs -text
9
+ *.joblib filter=lfs diff=lfs merge=lfs -text
10
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
11
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
12
+ *.model filter=lfs diff=lfs merge=lfs -text
13
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
14
+ *.npy filter=lfs diff=lfs merge=lfs -text
15
+ *.npz filter=lfs diff=lfs merge=lfs -text
16
+ *.onnx filter=lfs diff=lfs merge=lfs -text
17
+ *.ot filter=lfs diff=lfs merge=lfs -text
18
+ *.parquet filter=lfs diff=lfs merge=lfs -text
19
+ *.pb filter=lfs diff=lfs merge=lfs -text
20
+ *.pickle filter=lfs diff=lfs merge=lfs -text
21
+ *.pkl filter=lfs diff=lfs merge=lfs -text
22
+ *.pt filter=lfs diff=lfs merge=lfs -text
23
+ *.pth filter=lfs diff=lfs merge=lfs -text
24
+ *.rar filter=lfs diff=lfs merge=lfs -text
25
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
26
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
27
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
28
+ *.tar filter=lfs diff=lfs merge=lfs -text
29
+ *.tflite filter=lfs diff=lfs merge=lfs -text
30
+ *.tgz filter=lfs diff=lfs merge=lfs -text
31
+ *.wasm filter=lfs diff=lfs merge=lfs -text
32
+ *.xz filter=lfs diff=lfs merge=lfs -text
33
+ *.zip filter=lfs diff=lfs merge=lfs -text
34
+ *.zst filter=lfs diff=lfs merge=lfs -text
35
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
BS-RoFormer-Anvuew/README.md ADDED
@@ -0,0 +1,5 @@
1
+ ---
2
+ license: gpl-3.0
3
+ ---
4
+
5
+ dataset by [bascurtiz](https://github.com/bascurtiz)
BS-RoFormer-Anvuew/config.yaml ADDED
@@ -0,0 +1,129 @@
1
+ audio:
2
+   chunk_size: 960000
3
+   dim_f: 1024
4
+   dim_t: 801 # doesn't work (use in model)
5
+   hop_length: 441 # doesn't work (use in model)
6
+   n_fft: 2048
7
+   num_channels: 2
8
+   sample_rate: 44100
9
+   min_mean_abs: 0.0001
10
+
11
+ model:
12
+   dim: 256
13
+   depth: 12
14
+   stereo: true
15
+   num_stems: 1
16
+   time_transformer_depth: 1
17
+   freq_transformer_depth: 1
18
+   linear_transformer_depth: 0
19
+   freqs_per_bands: !!python/tuple
20
+     - 2
21
+     - 2
22
+     - 2
23
+     - 2
24
+     - 2
25
+     - 2
26
+     - 2
27
+     - 2
28
+     - 2
29
+     - 2
30
+     - 2
31
+     - 2
32
+     - 2
33
+     - 2
34
+     - 2
35
+     - 2
36
+     - 2
37
+     - 2
38
+     - 2
39
+     - 2
40
+     - 2
41
+     - 2
42
+     - 2
43
+     - 2
44
+     - 4
45
+     - 4
46
+     - 4
47
+     - 4
48
+     - 4
49
+     - 4
50
+     - 4
51
+     - 4
52
+     - 4
53
+     - 4
54
+     - 4
55
+     - 4
56
+     - 12
57
+     - 12
58
+     - 12
59
+     - 12
60
+     - 12
61
+     - 12
62
+     - 12
63
+     - 12
64
+     - 24
65
+     - 24
66
+     - 24
67
+     - 24
68
+     - 24
69
+     - 24
70
+     - 24
71
+     - 24
72
+     - 48
73
+     - 48
74
+     - 48
75
+     - 48
76
+     - 48
77
+     - 48
78
+     - 48
79
+     - 48
80
+     - 128
81
+     - 129
82
+   dim_head: 64
83
+   heads: 8
84
+   attn_dropout: 0.0
85
+   ff_dropout: 0.0
86
+   flash_attn: true
87
+   dim_freqs_in: 1025
88
+   stft_n_fft: 2048
89
+   stft_hop_length: 512
90
+   stft_win_length: 2048
91
+   stft_normalized: false
92
+   mask_estimator_depth: 2
93
+   multi_stft_resolution_loss_weight: 1.0
94
+   multi_stft_resolutions_window_sizes: !!python/tuple
95
+     - 4096
96
+     - 2048
97
+     - 1024
98
+     - 512
99
+     - 256
100
+   multi_stft_hop_size: 147
101
+   multi_stft_normalized: False
102
+   mlp_expansion_factor: 4
103
+   use_torch_checkpoint: True
104
+   skip_connection: False
105
+
106
+
107
+ training:
108
+   batch_size: 1
109
+   gradient_accumulation_steps: 1
110
+   grad_clip: 0
111
+   instruments: ['vocals', 'instrument']
112
+   lr: 1.0e-5
113
+   patience: 5
114
+   reduce_factor: 0.9
115
+   target_instrument: vocals
116
+   num_epochs: 1000
117
+   num_steps: 1000
118
+   q: 0.95
119
+   coarse_loss_clip: true
120
+   ema_momentum: 0.999
121
+   optimizer: adam
122
+   other_fix: false # it's needed for checking on multisong dataset if other is actually instrumental
123
+   use_amp: true # enable or disable usage of mixed precision (float16) - usually it must be true
124
+
125
+
126
+ inference:
127
+   batch_size: 2
128
+   dim_t: 1876
129
+   num_overlap: 4
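A quick sanity check on this config: the band widths in `freqs_per_bands` should add up to the number of STFT frequency bins, `stft_n_fft // 2 + 1`, which is also the value given for `dim_freqs_in`. The sketch below rebuilds the tuple from the run lengths visible in the diff and verifies the sum, along with the inference `dim_t`. It is an illustrative check only, and the `dim_t` relation assumes a centred STFT (one frame per hop plus one).

```python
# Band widths as listed in config.yaml: 24 x 2, 12 x 4, 8 x 12, 8 x 24, 8 x 48, then 128 and 129.
freqs_per_bands = [2] * 24 + [4] * 12 + [12] * 8 + [24] * 8 + [48] * 8 + [128, 129]

stft_n_fft = 2048
stft_hop_length = 512
chunk_size = 960000

num_bins = stft_n_fft // 2 + 1                    # one-sided spectrum size
num_bands = len(freqs_per_bands)
total_bins = sum(freqs_per_bands)
frames_per_chunk = chunk_size // stft_hop_length + 1

print(num_bands)         # 62
print(total_bins)        # 1025, equal to num_bins and dim_freqs_in
print(frames_per_chunk)  # 1876, equal to inference.dim_t
```

If a custom band layout is tried, the same check catches the common mistake of a tuple that does not cover every bin.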
Dereverb-Echo-MelBand-Sucial/.gitattributes ADDED
@@ -0,0 +1,55 @@
1
+ *.7z filter=lfs diff=lfs merge=lfs -text
2
+ *.arrow filter=lfs diff=lfs merge=lfs -text
3
+ *.bin filter=lfs diff=lfs merge=lfs -text
4
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
5
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
6
+ *.ftz filter=lfs diff=lfs merge=lfs -text
7
+ *.gz filter=lfs diff=lfs merge=lfs -text
8
+ *.h5 filter=lfs diff=lfs merge=lfs -text
9
+ *.joblib filter=lfs diff=lfs merge=lfs -text
10
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
11
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
12
+ *.model filter=lfs diff=lfs merge=lfs -text
13
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
14
+ *.npy filter=lfs diff=lfs merge=lfs -text
15
+ *.npz filter=lfs diff=lfs merge=lfs -text
16
+ *.onnx filter=lfs diff=lfs merge=lfs -text
17
+ *.ot filter=lfs diff=lfs merge=lfs -text
18
+ *.parquet filter=lfs diff=lfs merge=lfs -text
19
+ *.pb filter=lfs diff=lfs merge=lfs -text
20
+ *.pickle filter=lfs diff=lfs merge=lfs -text
21
+ *.pkl filter=lfs diff=lfs merge=lfs -text
22
+ *.pt filter=lfs diff=lfs merge=lfs -text
23
+ *.pth filter=lfs diff=lfs merge=lfs -text
24
+ *.rar filter=lfs diff=lfs merge=lfs -text
25
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
26
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
27
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
28
+ *.tar filter=lfs diff=lfs merge=lfs -text
29
+ *.tflite filter=lfs diff=lfs merge=lfs -text
30
+ *.tgz filter=lfs diff=lfs merge=lfs -text
31
+ *.wasm filter=lfs diff=lfs merge=lfs -text
32
+ *.xz filter=lfs diff=lfs merge=lfs -text
33
+ *.zip filter=lfs diff=lfs merge=lfs -text
34
+ *.zst filter=lfs diff=lfs merge=lfs -text
35
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ examples/example_dry.wav filter=lfs diff=lfs merge=lfs -text
37
+ examples/example_other.wav filter=lfs diff=lfs merge=lfs -text
38
+ examples/example_raw.wav filter=lfs diff=lfs merge=lfs -text
39
+ examples/other_v1.wav filter=lfs diff=lfs merge=lfs -text
40
+ examples/raw.wav filter=lfs diff=lfs merge=lfs -text
41
+ examples/dry_v1.wav filter=lfs diff=lfs merge=lfs -text
42
+ examples/dry_v2.wav filter=lfs diff=lfs merge=lfs -text
43
+ examples/other_v2.wav filter=lfs diff=lfs merge=lfs -text
44
+ example/de_super_big_reverb_mbr_ep_346/model_super_reverb_dry.flac filter=lfs diff=lfs merge=lfs -text
45
+ example/de_super_big_reverb_mbr_ep_346/model_super_reverb_other.flac filter=lfs diff=lfs merge=lfs -text
46
+ example/de_super_big_reverb_mbr_ep_346/raw.flac filter=lfs diff=lfs merge=lfs -text
47
+ example/dereverb_echo_mbr_fused_model/model_fused_reverb_dry.flac filter=lfs diff=lfs merge=lfs -text
48
+ example/dereverb_echo_mbr_fused_model/model_fused_reverb_other.flac filter=lfs diff=lfs merge=lfs -text
49
+ example/dereverb_echo_mbr_fused_model/raw.flac filter=lfs diff=lfs merge=lfs -text
50
+ example/dereverb_echo_mbr_v2_sdr_dry_13.4843/dry_v2.wav filter=lfs diff=lfs merge=lfs -text
51
+ example/dereverb_echo_mbr_v2_sdr_dry_13.4843/other_v2.wav filter=lfs diff=lfs merge=lfs -text
52
+ example/dereverb_echo_mbr_v2_sdr_dry_13.4843/raw.wav filter=lfs diff=lfs merge=lfs -text
53
+ example/dereverb-echo_mbr_v1_sdr_10.0169/dry_v1.wav filter=lfs diff=lfs merge=lfs -text
54
+ example/dereverb-echo_mbr_v1_sdr_10.0169/other_v1.wav filter=lfs diff=lfs merge=lfs -text
55
+ example/dereverb-echo_mbr_v1_sdr_10.0169/raw.wav filter=lfs diff=lfs merge=lfs -text
Dereverb-Echo-MelBand-Sucial/README.md ADDED
@@ -0,0 +1,85 @@
1
+ ---
2
+ license: cc-by-nc-sa-4.0
3
+ ---
4
+
5
+ ## Description
6
+
7
+ These models are used to separate reverb and delay effects in vocals. In addition, **these models also have the ability to remove most of the harmonies.** I added a random high cut after the reverberation and delay effects in the dataset, so these models' handling of high frequencies is not particularly aggressive.<br>
8
+ You can try listening to the performance of these models [here](https://huggingface.co/Sucial/Dereverb-Echo_Mel_Band_Roformer/tree/main/example)!
9
+
10
+ ## How to use the model?
11
+
12
+ Try it with [ZFTurbo's Music-Source-Separation-Training](https://github.com/ZFTurbo/Music-Source-Separation-Training)
13
+
14
+ ## Models
15
+
16
+ ### ===Note: The following models are only effective for vocals!===
17
+
18
+ ### 1. Fused Model (I personally recommend using this model)
19
+
20
+ I used [a model fusion script](https://huggingface.co/Sucial/Dereverb-Echo_Mel_Band_Roformer/blob/main/scripts/model_fusion.py) to fuse three models with the same model structure. The three models and their corresponding fusion ratios are as follows:<br>
21
+ **0.5 * dereverb_echo_mbr_v2_sdr_dry_13.4843.ckpt + 0.25 * de_big_reverb_mbr_ep_362.ckpt + 0.25 * de_super_big_reverb_mbr_ep_346.ckpt**<br>
22
+ Therefore, the fused model has the ability to remove both small and large reverberations simultaneously. However, I did not carefully adjust the fusion ratio of each model. If any experts are willing to help me adjust it carefully, I would be very grateful!
23
+
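The fusion described above is a weighted average of parameters that share the same names and shapes. The sketch below shows the idea on toy data; it is not the linked `model_fusion.py` script. `fuse_state_dicts` is a hypothetical helper, and each "checkpoint" is a plain dict of flat float lists standing in for a real state dict of tensors.

```python
def fuse_state_dicts(state_dicts, weights):
    """Return the element-wise weighted average of parameter dicts.

    state_dicts: list of {parameter_name: flat list of floats} (toy stand-in
    for checkpoints with identical structure).
    weights: one fusion ratio per checkpoint; they are expected to sum to 1.
    """
    if len(state_dicts) != len(weights):
        raise ValueError("need exactly one weight per checkpoint")
    if abs(sum(weights) - 1.0) > 1e-6:
        raise ValueError("fusion ratios should sum to 1")
    keys = state_dicts[0].keys()
    if any(sd.keys() != keys for sd in state_dicts[1:]):
        raise ValueError("checkpoints must share the same parameter names")

    fused = {}
    for key in keys:
        length = len(state_dicts[0][key])
        fused[key] = [
            sum(w * sd[key][i] for sd, w in zip(state_dicts, weights))
            for i in range(length)
        ]
    return fused


# Toy example using the ratios from the README: 0.5 * v2 + 0.25 * big + 0.25 * super
v2 = {"layer.weight": [1.0, 2.0]}
big = {"layer.weight": [3.0, 4.0]}
super_big = {"layer.weight": [5.0, 8.0]}
fused = fuse_state_dicts([v2, big, super_big], [0.5, 0.25, 0.25])
```

Averaging like this only makes sense because all three checkpoints share one architecture and the two large-reverb models were finetuned from the v2 checkpoint.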
24
+ config: the same as v2 models and big reverb models: [config_dereverb_echo_mbr_v2.yaml](./config_dereverb_echo_mbr_v2.yaml)<br>
25
+ fused_model: [dereverb_echo_mbr_fused_0.5_v2_0.25_big_0.25_super.ckpt](./dereverb_echo_mbr_fused_0.5_v2_0.25_big_0.25_super.ckpt)
26
+
27
+ ### 2. Big reverb Models
28
+
29
+ There are two models for removing large reverberation: [de_big_reverb_mbr_ep_362.ckpt](./de_big_reverb_mbr_ep_362.ckpt) and [de_super_big_reverb_mbr_ep_346.ckpt](./de_super_big_reverb_mbr_ep_346.ckpt). In general, for large reverberations, using the `de_big_reverb_mbr` model is sufficient. The `de_super_big_reverb_mbr` model is trained for extremely large reverberations and is less commonly needed. Both models share the same configuration file as the v2 model, and both are finetuned from `dereverb_echo_mbr_v2_sdr_dry_13.4843.ckpt`.
30
+
31
+ config: [config_dereverb_echo_mbr_v2.yaml](./config_dereverb_echo_mbr_v2.yaml)<br>
32
+ Model_de_big_reverb: [de_big_reverb_mbr_ep_362.ckpt](./de_big_reverb_mbr_ep_362.ckpt)<br>
33
+ Model_de_super_big_reverb: [de_super_big_reverb_mbr_ep_346.ckpt](./de_super_big_reverb_mbr_ep_346.ckpt)
34
+
35
+ To better validate model performance, I added two metrics, `f0_fitness` and `uv_fitness`:<br>
36
+ They measure the F0 and voiced/unvoiced (UV) fitness between a reference and an estimated audio signal. These two metrics are only meaningful for vocals.<br>
37
+ The F0 fitness measures how similar the fundamental frequency (F0) contours of the reference and estimated signals are, while the UV fitness evaluates how well their voiced/unvoiced decisions agree. Both are computed by extracting F0 and UV information with pitch analysis and then calculating the Pearson correlation between the corresponding F0 and UV sequences. The F0 fitness can also be used to compare how completely the fundamental frequency of a human voice is preserved in the extracted signal. Both metrics range from -1 to 1, and the closer the value is to 1, the better the fit.
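The two metrics described above can be sketched as Pearson correlations over per-frame sequences. This is an illustration only, not the author's implementation. It assumes the F0 contours have already been extracted by some pitch tracker, that unvoiced frames are marked with an F0 of 0, and that the correlation is taken over all frames. The names `pearson` and `f0_uv_fitness` are hypothetical.

```python
import math
from typing import Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("sequences must be non-empty and of equal length")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined for a constant sequence; this sketch returns 0.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def f0_uv_fitness(ref_f0: Sequence[float],
                  est_f0: Sequence[float]) -> Tuple[float, float]:
    """Return (f0_fitness, uv_fitness) for two per-frame F0 contours in Hz."""
    # Assumption: a frame counts as voiced when its F0 is greater than 0.
    ref_uv = [1.0 if f > 0 else 0.0 for f in ref_f0]
    est_uv = [1.0 if f > 0 else 0.0 for f in est_f0]
    return pearson(ref_f0, est_f0), pearson(ref_uv, est_uv)


reference = [0.0, 220.0, 230.0, 0.0, 440.0]
f0_fit, uv_fit = f0_uv_fitness(reference, reference)
print(f0_fit, uv_fit)
```

Identical contours give values of about 1 for both metrics, while an estimate that is voiced exactly where the reference is unvoiced drives the UV fitness to -1.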
38
+
39
+ I validated these two models on different validation sets (so their SDR values are not directly comparable). The validation results are as follows:
40
+ ```
41
+ de_big_reverb_mbr_ep_362.ckpt
42
+ Num overlap: 2
43
+ Instr dry sdr: 14.0030 (Std: 2.9492)
44
+ Instr dry bleedless: 43.6501 (Std: 10.1362)
45
+ Instr dry fullness: 21.7776 (Std: 5.9445)
46
+ Instr dry f0_fitness: 0.8405 (Std: 0.1520)
47
+ Instr dry uv_fitness: 0.9759 (Std: 0.0162)
48
+
49
+ de_super_big_reverb_mbr_ep_346.ckpt
50
+ Num overlap: 2
51
+ Instr dry sdr: 11.3164 (Std: 2.4877)
52
+ Instr dry bleedless: 43.3989 (Std: 10.7918)
53
+ Instr dry fullness: 17.5554 (Std: 4.0178)
54
+ Instr dry f0_fitness: 0.7845 (Std: 0.1864)
55
+ Instr dry uv_fitness: 0.9662 (Std: 0.0172)
56
+ ```
57
+
58
+ ### 3. V2 Models
59
+
60
+ Config: [config_dereverb_echo_mbr_v2.yaml](./config_dereverb_echo_mbr_v2.yaml)<br>
61
+ Model: [dereverb_echo_mbr_v2_sdr_dry_13.4843.ckpt](./dereverb_echo_mbr_v2_sdr_dry_13.4843.ckpt)<br>
62
+ Instr dry sdr: 13.4843 (Std: 4.8675)
63
+
64
+ Finetuned from: `dereverb-echo_mel_band_roformer_sdr_10.0169.ckpt`<br>
65
+ Fine-tuned on 1000+ songs.
66
+
67
+ ### 4. V1 Models
68
+
69
+ Configs: [config_dereverb-echo_mel_band_roformer.yaml](./config_dereverb-echo_mel_band_roformer.yaml)<br>
70
+ Model: [dereverb-echo_mel_band_roformer_sdr_10.0169.ckpt](./dereverb-echo_mel_band_roformer_sdr_10.0169.ckpt)<br>
71
+ Instr dry sdr: 13.1507, Instr other sdr: 6.8830, Metric avg sdr: 10.0169
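The reported `Metric avg sdr` appears to be the plain mean of the two per-stem SDR values. The short check below only reproduces that arithmetic from the numbers printed above. Because the inputs are already rounded to four decimals, the last digit of the recomputed mean may differ slightly from the reported 10.0169.

```python
# Per-stem SDR values copied from the line above.
sdr = {"dry": 13.1507, "other": 6.8830}

# Unweighted mean over the stems.
avg_sdr = sum(sdr.values()) / len(sdr)
print(f"average SDR: {avg_sdr:.4f}")
```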
72
+
73
+ Instruments: [dry, other]<br>
74
+ Finetuned from: `model_mel_band_roformer_ep_3005_sdr_11.4360.ckpt`<br>
75
+ Datasets:
76
+ - Training datasets: 270 songs from [opencpop](https://github.com/wenet-e2e/opencpop) and [GTSinger](https://github.com/GTSinger/GTSinger)
77
+ - Validation datasets: 30 songs from my own collection
78
+ - All random reverb and delay effects are generated by [this Python script](./scripts/create_reverb_delay.py) and arranged in the MUSDB18 dataset format.
79
+
80
+ ## Thanks
81
+
82
+ - Mel-Band-Roformer [[Paper](https://arxiv.org/abs/2310.01809), [Repository](https://github.com/lucidrains/BS-RoFormer)]
83
+ - [ZFTurbo](https://github.com/ZFTurbo)'s training code [[Music-Source-Separation-Training](https://github.com/ZFTurbo/Music-Source-Separation-Training)]
84
+ - [CN17161](https://github.com/CN17161) provided GPUs.
85
+ - [Glucy-2](https://github.com/Glucy-2) provided technical assistance.
Dereverb-Echo-MelBand-Sucial/config_dereverb-echo_mel_band_roformer.yaml ADDED
@@ -0,0 +1,77 @@
1
+ audio:
2
+ chunk_size: 352800
3
+ dim_f: 1024
4
+ dim_t: 801 # don't work (use in model)
5
+ hop_length: 441 # don't work (use in model)
6
+ n_fft: 2048
7
+ num_channels: 2
8
+ sample_rate: 44100
9
+ min_mean_abs: 0.000
10
+
11
+ model:
12
+ dim: 256
13
+ depth: 8
14
+ stereo: true
15
+ num_stems: 2
16
+ time_transformer_depth: 1
17
+ freq_transformer_depth: 1
18
+ linear_transformer_depth: 0
19
+ num_bands: 60
20
+ dim_head: 64
21
+ heads: 8
22
+ attn_dropout: 0.1
23
+ ff_dropout: 0.1
24
+ flash_attn: True
25
+ dim_freqs_in: 1025
26
+ sample_rate: 44100 # needed for mel filter bank from librosa
27
+ stft_n_fft: 2048
28
+ stft_hop_length: 441
29
+ stft_win_length: 2048
30
+ stft_normalized: False
31
+ mask_estimator_depth: 2
32
+ multi_stft_resolution_loss_weight: 1.0
33
+ multi_stft_resolutions_window_sizes: !!python/tuple
34
+ - 4096
35
+ - 2048
36
+ - 1024
37
+ - 512
38
+ - 256
39
+ multi_stft_hop_size: 147
40
+ multi_stft_normalized: False
41
+
42
+ training:
43
+ batch_size: 1
44
+ gradient_accumulation_steps: 8
45
+ grad_clip: 0
46
+ instruments:
47
+ - dry
48
+ - other
49
+ lr: 4.0e-05
50
+ patience: 2
51
+ reduce_factor: 0.95
52
+ target_instrument: null
53
+ num_epochs: 1000
54
+ num_steps: 1000
55
+ q: 0.95
56
+ coarse_loss_clip: true
57
+ ema_momentum: 0.999
58
+ optimizer: adam
59
+ other_fix: false # it's needed for checking on multisong dataset if other is actually instrumental
60
+ use_amp: true # enable or disable usage of mixed precision (float16) - usually it must be true
61
+
62
+ augmentations:
63
+ enable: true # enable or disable all augmentations (to fast disable if needed)
64
+ loudness: true # randomly change loudness of each stem on the range (loudness_min; loudness_max)
65
+ loudness_min: 0.5
66
+ loudness_max: 1.5
67
+ mixup: false # mix several stems of same type with some probability (only works for dataset types: 1, 2, 3)
68
+ mixup_probs: !!python/tuple # 2 additional stems of the same type (1st with prob 0.2, 2nd with prob 0.02)
69
+ - 0.2
70
+ - 0.02
71
+ mixup_loudness_min: 0.5
72
+ mixup_loudness_max: 1.5
73
+
74
+ inference:
75
+ batch_size: 4
76
+ dim_t: 801
77
+ num_overlap: 4
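The audio values in this config are consistent with each other, which the following check illustrates. It assumes a centered STFT, for which the frame count is `1 + length // hop_length` (the convention documented for `torch.stft` with `center=True`). Whether the training code relies on exactly this relation is an assumption here.

```python
# Values copied from the config above.
chunk_size = 352800      # samples per chunk
sample_rate = 44100      # Hz
hop_length = 441         # samples between STFT frames

# Duration of one chunk in seconds.
chunk_seconds = chunk_size / sample_rate

# Frame count of a centered STFT over one chunk.
num_frames = 1 + chunk_size // hop_length

print(chunk_seconds, num_frames)
```

Under these assumptions one chunk covers 8 seconds of audio and yields 801 frames, matching `dim_t: 801`.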
Dereverb-Echo-MelBand-Sucial/config_dereverb_echo_mbr_v2.yaml ADDED
@@ -0,0 +1,65 @@
1
+ audio:
2
+ chunk_size: 352800
3
+ dim_f: 1024
4
+ dim_t: 801
5
+ hop_length: 441
6
+ n_fft: 2048
7
+ num_channels: 2
8
+ sample_rate: 44100
9
+ min_mean_abs: 0.000
10
+
11
+ model:
12
+ dim: 256
13
+ depth: 8
14
+ stereo: true
15
+ num_stems: 1
16
+ time_transformer_depth: 1
17
+ freq_transformer_depth: 1
18
+ linear_transformer_depth: 0
19
+ num_bands: 60
20
+ dim_head: 64
21
+ heads: 8
22
+ attn_dropout: 0.1
23
+ ff_dropout: 0.1
24
+ flash_attn: True
25
+ dim_freqs_in: 1025
26
+ sample_rate: 44100
27
+ stft_n_fft: 2048
28
+ stft_hop_length: 441
29
+ stft_win_length: 2048
30
+ stft_normalized: False
31
+ mask_estimator_depth: 2
32
+ multi_stft_resolution_loss_weight: 1.0
33
+ multi_stft_resolutions_window_sizes: !!python/tuple
34
+ - 4096
35
+ - 2048
36
+ - 1024
37
+ - 512
38
+ - 256
39
+ multi_stft_hop_size: 147
40
+ multi_stft_normalized: False
41
+
42
+ training:
43
+ batch_size: 1
44
+ gradient_accumulation_steps: 8
45
+ grad_clip: 0
46
+ instruments:
47
+ - dry
48
+ - other
49
+ lr: 1.0e-05
50
+ patience: 2
51
+ reduce_factor: 0.95
52
+ target_instrument: dry
53
+ num_epochs: 1000
54
+ num_steps: 1000
55
+ q: 0.95
56
+ coarse_loss_clip: true
57
+ ema_momentum: 0.999
58
+ optimizer: adam
59
+ other_fix: false
60
+ use_amp: true
61
+
62
+ inference:
63
+ batch_size: 1
64
+ dim_t: 801
65
+ num_overlap: 4
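With `batch_size: 1` and `gradient_accumulation_steps: 8`, the effective batch per optimizer update is larger than a single chunk. The sketch below works out that arithmetic, assuming the usual meaning of gradient accumulation (gradients from several batches are summed before one optimizer step). How the training script applies it in detail is not shown in this config.

```python
# Values copied from the training and audio sections above.
batch_size = 1
gradient_accumulation_steps = 8
chunk_size = 352800
sample_rate = 44100

# Chunks contributing to a single optimizer update.
chunks_per_update = batch_size * gradient_accumulation_steps

# Total audio seen per update, in seconds.
audio_seconds_per_update = chunks_per_update * chunk_size / sample_rate

print(chunks_per_update, audio_seconds_per_update)
```

This keeps per-step memory at the cost of one chunk while each update still averages over roughly a minute of audio.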
Dereverb-MelBand-Anvuew/.gitattributes ADDED
@@ -0,0 +1,35 @@
1
+ *.7z filter=lfs diff=lfs merge=lfs -text
2
+ *.arrow filter=lfs diff=lfs merge=lfs -text
3
+ *.bin filter=lfs diff=lfs merge=lfs -text
4
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
5
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
6
+ *.ftz filter=lfs diff=lfs merge=lfs -text
7
+ *.gz filter=lfs diff=lfs merge=lfs -text
8
+ *.h5 filter=lfs diff=lfs merge=lfs -text
9
+ *.joblib filter=lfs diff=lfs merge=lfs -text
10
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
11
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
12
+ *.model filter=lfs diff=lfs merge=lfs -text
13
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
14
+ *.npy filter=lfs diff=lfs merge=lfs -text
15
+ *.npz filter=lfs diff=lfs merge=lfs -text
16
+ *.onnx filter=lfs diff=lfs merge=lfs -text
17
+ *.ot filter=lfs diff=lfs merge=lfs -text
18
+ *.parquet filter=lfs diff=lfs merge=lfs -text
19
+ *.pb filter=lfs diff=lfs merge=lfs -text
20
+ *.pickle filter=lfs diff=lfs merge=lfs -text
21
+ *.pkl filter=lfs diff=lfs merge=lfs -text
22
+ *.pt filter=lfs diff=lfs merge=lfs -text
23
+ *.pth filter=lfs diff=lfs merge=lfs -text
24
+ *.rar filter=lfs diff=lfs merge=lfs -text
25
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
26
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
27
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
28
+ *.tar filter=lfs diff=lfs merge=lfs -text
29
+ *.tflite filter=lfs diff=lfs merge=lfs -text
30
+ *.tgz filter=lfs diff=lfs merge=lfs -text
31
+ *.wasm filter=lfs diff=lfs merge=lfs -text
32
+ *.xz filter=lfs diff=lfs merge=lfs -text
33
+ *.zip filter=lfs diff=lfs merge=lfs -text
34
+ *.zst filter=lfs diff=lfs merge=lfs -text
35
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
Dereverb-MelBand-Anvuew/README.md ADDED
@@ -0,0 +1,15 @@
1
+ ---
2
+ license: gpl-3.0
3
+ ---
4
+
5
+ A dereverb model dedicated to vocals, for use with https://github.com/ZFTurbo/Music-Source-Separation-Training
6
+
7
+ Fine-tuned from [Kim's vocal/instrument separation model](https://huggingface.co/KimberleyJSN/melbandroformer), so it retains some ability to separate vocals from instruments.
8
+
9
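+ The dry vocals in the training data are mono recordings of a single person singing or speaking, so the model tries to make the separated dry vocal as mono as possible. This can cause problems on audio whose vocals are stereo or have been processed with techniques such as track layering. `dereverb_mel_band_roformer_less_aggressive_anvuew_sdr_18.8050.ckpt` is a mid-training checkpoint of `dereverb_mel_band_roformer_anvuew_sdr_19.1729.ckpt`, selected using samples of these cases.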
10
+
11
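+ Because of a [bug](https://github.com/ZFTurbo/Music-Source-Separation-Training/commit/0ca5691f22ea71d9afe297926d6e1517cdb38e55) in the training code, the reverb and the vocals were not actually aligned during training for the two checkpoints `dereverb_mel_band_roformer_anvuew_sdr_19.1729.ckpt` and `dereverb_mel_band_roformer_less_aggressive_anvuew_sdr_18.8050.ckpt`. As a result, these two models show some ability to remove the residue left over from vocal/instrument separation (usually strings) as well as some backing harmonies (those outside the center channel).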
12
+
13
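+ `dereverb_mel_band_roformer_mono_anvuew_sdr_20.4029.ckpt` was trained with the alignment bug fixed, so its dereverberation is stronger, but its ability to remove separation residue and harmonies is weaker.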
14
+
15
+
Dereverb-MelBand-Anvuew/dereverb_mel_band_roformer_anvuew.yaml ADDED
@@ -0,0 +1,76 @@
1
+ audio:
2
+ chunk_size: 352800
3
+ dim_f: 1024
4
+ dim_t: 256
5
+ hop_length: 441
6
+ n_fft: 2048
7
+ num_channels: 2
8
+ sample_rate: 44100
9
+ min_mean_abs: 0.000
10
+
11
+ model:
12
+ dim: 384
13
+ depth: 6
14
+ stereo: true
15
+ num_stems: 1
16
+ time_transformer_depth: 1
17
+ freq_transformer_depth: 1
18
+ num_bands: 60
19
+ dim_head: 64
20
+ heads: 8
21
+ attn_dropout: 0
22
+ ff_dropout: 0
23
+ flash_attn: True
24
+ dim_freqs_in: 1025
25
+ sample_rate: 44100 # needed for mel filter bank from librosa
26
+ stft_n_fft: 2048
27
+ stft_hop_length: 441
28
+ stft_win_length: 2048
29
+ stft_normalized: False
30
+ mask_estimator_depth: 2
31
+ multi_stft_resolution_loss_weight: 1.0
32
+ multi_stft_resolutions_window_sizes: !!python/tuple
33
+ - 4096
34
+ - 2048
35
+ - 1024
36
+ - 512
37
+ - 256
38
+ multi_stft_hop_size: 147
39
+ multi_stft_normalized: False
40
+
41
+ training:
42
+ batch_size: 3
43
+ gradient_accumulation_steps: 1
44
+ grad_clip: 0
45
+ instruments:
46
+ - noreverb
47
+ - reverb
48
+ lr: 5.0e-05
49
+ patience: 2
50
+ reduce_factor: 0.95
51
+ target_instrument: noreverb
52
+ num_epochs: 1000
53
+ num_steps: 4000
54
+ q: 0.95
55
+ coarse_loss_clip: false
56
+ ema_momentum: 0.999
57
+ optimizer: adamw
58
+ other_fix: true # it's needed for checking on multisong dataset if other is actually instrumental
59
+ use_amp: true # enable or disable usage of mixed precision (float16) - usually it must be true
60
+
61
+ augmentations:
62
+ enable: true # enable or disable all augmentations (to fast disable if needed)
63
+ loudness: true # randomly change loudness of each stem on the range (loudness_min; loudness_max)
64
+ loudness_min: 0.1
65
+ loudness_max: 1.0
66
+ mixup: false # mix several stems of same type with some probability (only works for dataset types: 1, 2, 3)
67
+ mixup_probs: !!python/tuple # 2 additional stems of the same type (1st with prob 0.2, 2nd with prob 0.02)
68
+ - 0.2
69
+ - 0.02
70
+ mixup_loudness_min: 0.5
71
+ mixup_loudness_max: 1.5
72
+
73
+ inference:
74
+ batch_size: 1
75
+ dim_t: 801
76
+ num_overlap: 2
Dereverb-Room-Anvuew/.gitattributes ADDED
@@ -0,0 +1,35 @@
1
+ *.7z filter=lfs diff=lfs merge=lfs -text
2
+ *.arrow filter=lfs diff=lfs merge=lfs -text
3
+ *.bin filter=lfs diff=lfs merge=lfs -text
4
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
5
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
6
+ *.ftz filter=lfs diff=lfs merge=lfs -text
7
+ *.gz filter=lfs diff=lfs merge=lfs -text
8
+ *.h5 filter=lfs diff=lfs merge=lfs -text
9
+ *.joblib filter=lfs diff=lfs merge=lfs -text
10
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
11
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
12
+ *.model filter=lfs diff=lfs merge=lfs -text
13
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
14
+ *.npy filter=lfs diff=lfs merge=lfs -text
15
+ *.npz filter=lfs diff=lfs merge=lfs -text
16
+ *.onnx filter=lfs diff=lfs merge=lfs -text
17
+ *.ot filter=lfs diff=lfs merge=lfs -text
18
+ *.parquet filter=lfs diff=lfs merge=lfs -text
19
+ *.pb filter=lfs diff=lfs merge=lfs -text
20
+ *.pickle filter=lfs diff=lfs merge=lfs -text
21
+ *.pkl filter=lfs diff=lfs merge=lfs -text
22
+ *.pt filter=lfs diff=lfs merge=lfs -text
23
+ *.pth filter=lfs diff=lfs merge=lfs -text
24
+ *.rar filter=lfs diff=lfs merge=lfs -text
25
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
26
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
27
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
28
+ *.tar filter=lfs diff=lfs merge=lfs -text
29
+ *.tflite filter=lfs diff=lfs merge=lfs -text
30
+ *.tgz filter=lfs diff=lfs merge=lfs -text
31
+ *.wasm filter=lfs diff=lfs merge=lfs -text
32
+ *.xz filter=lfs diff=lfs merge=lfs -text
33
+ *.zip filter=lfs diff=lfs merge=lfs -text
34
+ *.zst filter=lfs diff=lfs merge=lfs -text
35
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
Dereverb-Room-Anvuew/README.md ADDED
@@ -0,0 +1,23 @@
1
+ ---
2
+ license: gpl-3.0
3
+ ---
4
+ A dereverb model specifically for mono vocal room reverb.
5
+
6
+ **Model type:** `bs_roformer`
7
+ **Channels:** mono
8
+ **Reverb in training data:** only convolutional reverbs, generated with [pyroomacoustics](https://github.com/LCAV/pyroomacoustics)
9
+ **Example:**
10
+ - input.flac
11
+ <audio controls>
12
+ <source src="https://huggingface.co/anvuew/dereverb_room/resolve/main/example/input.flac" type="audio/flac">
13
+ </audio>
14
+ - noreverb.flac
15
+ <audio controls>
16
+ <source src="https://huggingface.co/anvuew/dereverb_room/resolve/main/example/noreverb.flac" type="audio/flac">
17
+ </audio>
18
+ - reverb.flac
19
+ <audio controls>
20
+ <source src="https://huggingface.co/anvuew/dereverb_room/resolve/main/example/reverb.flac" type="audio/flac">
21
+ </audio>
22
+
23
+ For reference, [dereverb_mel_band_roformer_mono](https://huggingface.co/anvuew/dereverb_mel_band_roformer/blob/main/dereverb_mel_band_roformer_mono_anvuew_sdr_20.4029.ckpt) scored an SDR of 7.6685 on the same validation set.
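The README states that the training reverb is purely convolutional. In general, convolutional reverb means convolving a dry signal with a room impulse response (RIR). The sketch below shows that operation in plain Python on toy data. It does not use pyroomacoustics, and the tiny impulse response is made up for illustration. The author's actual data pipeline is not shown in this repository excerpt.

```python
from typing import List, Sequence


def convolve(dry: Sequence[float], impulse_response: Sequence[float]) -> List[float]:
    """Full discrete convolution of a dry signal with an impulse response."""
    out = [0.0] * (len(dry) + len(impulse_response) - 1)
    for i, sample in enumerate(dry):
        for j, tap in enumerate(impulse_response):
            # Each input sample excites a scaled, delayed copy of the response.
            out[i + j] += sample * tap
    return out


# Toy example: a single click followed by silence, and a two-tap "room".
dry_signal = [1.0, 0.0, 0.0]
toy_rir = [1.0, 0.5]          # direct sound plus one quieter reflection

wet_signal = convolve(dry_signal, toy_rir)
print(wet_signal)
```

A dereverb model of this kind is trained to map signals like `wet_signal` back to `dry_signal`.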
Dereverb-Room-Anvuew/dereverb_room_anvuew.yaml ADDED
@@ -0,0 +1,132 @@
1
+ audio:
2
+ chunk_size: 384000
3
+ dim_f: 1024
4
+ dim_t: 801 # don't work (use in model)
5
+ hop_length: 441 # don't work (use in model)
6
+ n_fft: 2048
7
+ num_channels: 1
8
+ sample_rate: 44100
9
+ min_mean_abs: 0.000
10
+
11
+ model:
12
+ dim: 128
13
+ depth: 16
14
+ stereo: false
15
+ num_stems: 1
16
+ time_transformer_depth: 1
17
+ freq_transformer_depth: 1
18
+ linear_transformer_depth: 0
19
+ freqs_per_bands: !!python/tuple
20
+ - 2
21
+ - 2
22
+ - 2
23
+ - 2
24
+ - 2
25
+ - 2
26
+ - 3
27
+ - 3
28
+ - 3
29
+ - 3
30
+ - 3
31
+ - 4
32
+ - 4
33
+ - 4
34
+ - 4
35
+ - 4
36
+ - 5
37
+ - 5
38
+ - 5
39
+ - 5
40
+ - 6
41
+ - 6
42
+ - 6
43
+ - 6
44
+ - 7
45
+ - 7
46
+ - 7
47
+ - 8
48
+ - 8
49
+ - 8
50
+ - 9
51
+ - 9
52
+ - 10
53
+ - 10
54
+ - 11
55
+ - 12
56
+ - 13
57
+ - 14
58
+ - 15
59
+ - 16
60
+ - 17
61
+ - 18
62
+ - 19
63
+ - 20
64
+ - 21
65
+ - 22
66
+ - 23
67
+ - 24
68
+ - 25
69
+ - 27
70
+ - 29
71
+ - 31
72
+ - 33
73
+ - 35
74
+ - 37
75
+ - 39
76
+ - 41
77
+ - 43
78
+ - 45
79
+ - 48
80
+ - 52
81
+ - 57
82
+ - 64
83
+ dim_head: 16
84
+ heads: 8
85
+ attn_dropout: 0.0
86
+ ff_dropout: 0.0
87
+ flash_attn: true
88
+ dim_freqs_in: 1025
89
+ stft_n_fft: 2048
90
+ stft_hop_length: 512
91
+ stft_win_length: 2048
92
+ stft_normalized: False
93
+ mask_estimator_depth: 3
94
+ multi_stft_resolution_loss_weight: 1.0
95
+ multi_stft_resolutions_window_sizes: !!python/tuple
96
+ - 4096
97
+ - 2048
98
+ - 1024
99
+ - 512
100
+ - 256
101
+ multi_stft_hop_size: 147
102
+ multi_stft_normalized: False
103
+ mlp_expansion_factor: 4
104
+ use_torch_checkpoint: True
105
+ skip_connection: False
106
+
107
+
108
+ training:
109
+ batch_size: 4
110
+ gradient_accumulation_steps: 1
111
+ grad_clip: 1000.0
112
+ instruments: ['noreverb', 'reverb']
113
+ lr: 5.0e-5
114
+ patience: 5
115
+ reduce_factor: 0.75
116
+ target_instrument: noreverb
117
+ num_epochs: 1000
118
+ num_steps: 1000
119
+ q: 0.95
120
+ coarse_loss_clip: true
121
+ ema_momentum: 0.999
122
+ optimizer: adam
123
+
124
+ other_fix: False # it's needed for checking on multisong dataset if other is actually instrumental
125
+ use_amp: true # enable or disable usage of mixed precision (float16) - usually it must be true
126
+
127
+
128
+
129
+ inference:
130
+ batch_size: 1
131
+ dim_t: 871
132
+ num_overlap: 2
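The `lr`, `patience` and `reduce_factor` values above suggest a reduce-on-plateau learning-rate schedule. The following is a simplified sketch of such a schedule, assuming the validation metric should increase (as SDR does) and ignoring refinements such as improvement thresholds or cooldown periods. `plateau_schedule` is a hypothetical helper, and whether the training code behaves exactly like this is an assumption.

```python
from typing import List, Sequence


def plateau_schedule(metrics: Sequence[float], lr: float,
                     patience: int, factor: float) -> List[float]:
    """Return the learning rate in effect after each validation result."""
    best = float("-inf")
    bad_epochs = 0
    history = []
    for metric in metrics:
        if metric > best:
            best = metric
            bad_epochs = 0
        else:
            bad_epochs += 1
        if bad_epochs > patience:
            # No improvement for more than `patience` epochs: shrink the rate.
            lr *= factor
            bad_epochs = 0
        history.append(lr)
    return history


# One good epoch followed by six epochs without improvement.
validation_sdr = [10.0] + [9.0] * 6
lrs = plateau_schedule(validation_sdr, lr=5.0e-5, patience=5, factor=0.75)
print(lrs)
```

In this toy run the rate stays at 5.0e-5 for the first six epochs and drops by the factor 0.75 on the seventh.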
Karaoke-BS-RoFormer-Anvuew/.gitattributes ADDED
@@ -0,0 +1,35 @@
1
+ *.7z filter=lfs diff=lfs merge=lfs -text
2
+ *.arrow filter=lfs diff=lfs merge=lfs -text
3
+ *.bin filter=lfs diff=lfs merge=lfs -text
4
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
5
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
6
+ *.ftz filter=lfs diff=lfs merge=lfs -text
7
+ *.gz filter=lfs diff=lfs merge=lfs -text
8
+ *.h5 filter=lfs diff=lfs merge=lfs -text
9
+ *.joblib filter=lfs diff=lfs merge=lfs -text
10
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
11
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
12
+ *.model filter=lfs diff=lfs merge=lfs -text
13
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
14
+ *.npy filter=lfs diff=lfs merge=lfs -text
15
+ *.npz filter=lfs diff=lfs merge=lfs -text
16
+ *.onnx filter=lfs diff=lfs merge=lfs -text
17
+ *.ot filter=lfs diff=lfs merge=lfs -text
18
+ *.parquet filter=lfs diff=lfs merge=lfs -text
19
+ *.pb filter=lfs diff=lfs merge=lfs -text
20
+ *.pickle filter=lfs diff=lfs merge=lfs -text
21
+ *.pkl filter=lfs diff=lfs merge=lfs -text
22
+ *.pt filter=lfs diff=lfs merge=lfs -text
23
+ *.pth filter=lfs diff=lfs merge=lfs -text
24
+ *.rar filter=lfs diff=lfs merge=lfs -text
25
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
26
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
27
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
28
+ *.tar filter=lfs diff=lfs merge=lfs -text
29
+ *.tflite filter=lfs diff=lfs merge=lfs -text
30
+ *.tgz filter=lfs diff=lfs merge=lfs -text
31
+ *.wasm filter=lfs diff=lfs merge=lfs -text
32
+ *.xz filter=lfs diff=lfs merge=lfs -text
33
+ *.zip filter=lfs diff=lfs merge=lfs -text
34
+ *.zst filter=lfs diff=lfs merge=lfs -text
35
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
Karaoke-BS-RoFormer-Anvuew/README.md ADDED
@@ -0,0 +1,5 @@
1
+ ---
2
+ license: gpl-3.0
3
+ ---
4
+
5
+ dataset by [becruily](https://huggingface.co/becruily)
Karaoke-BS-RoFormer-Anvuew/karaoke_bs_roformer_anvuew.yaml ADDED
@@ -0,0 +1,131 @@
1
+ audio:
2
+ chunk_size: 640000
3
+ dim_f: 1024
4
+ dim_t: 801 # don't work (use in model)
5
+ hop_length: 441 # don't work (use in model)
6
+ n_fft: 2048
7
+ num_channels: 2
8
+ sample_rate: 44100
9
+ min_mean_abs: 0.000
10
+
11
+ model:
12
+ dim: 256
13
+ depth: 12
14
+ stereo: true
15
+ num_stems: 1
16
+ time_transformer_depth: 1
17
+ freq_transformer_depth: 1
18
+ linear_transformer_depth: 0
19
+ freqs_per_bands: !!python/tuple
20
+ - 2
21
+ - 2
22
+ - 2
23
+ - 2
24
+ - 2
25
+ - 2
26
+ - 2
27
+ - 2
28
+ - 2
29
+ - 2
30
+ - 2
31
+ - 2
32
+ - 2
33
+ - 2
34
+ - 2
35
+ - 2
36
+ - 2
37
+ - 2
38
+ - 2
39
+ - 2
40
+ - 2
41
+ - 2
42
+ - 2
43
+ - 2
44
+ - 4
45
+ - 4
46
+ - 4
47
+ - 4
48
+ - 4
49
+ - 4
50
+ - 4
51
+ - 4
52
+ - 4
53
+ - 4
54
+ - 4
55
+ - 4
56
+ - 12
57
+ - 12
58
+ - 12
59
+ - 12
60
+ - 12
61
+ - 12
62
+ - 12
63
+ - 12
64
+ - 24
65
+ - 24
66
+ - 24
67
+ - 24
68
+ - 24
69
+ - 24
70
+ - 24
71
+ - 24
72
+ - 48
73
+ - 48
74
+ - 48
75
+ - 48
76
+ - 48
77
+ - 48
78
+ - 48
79
+ - 48
80
+ - 128
81
+ - 129
82
+ dim_head: 64
83
+ heads: 8
84
+ attn_dropout: 0.0
85
+ ff_dropout: 0.0
86
+ flash_attn: true
87
+ dim_freqs_in: 1025
88
+ stft_n_fft: 2048
89
+ stft_hop_length: 512
90
+ stft_win_length: 2048
91
+ stft_normalized: false
92
+ mask_estimator_depth: 2
93
+ multi_stft_resolution_loss_weight: 1.0
94
+ multi_stft_resolutions_window_sizes: !!python/tuple
95
+ - 4096
96
+ - 2048
97
+ - 1024
98
+ - 512
99
+ - 256
100
+ multi_stft_hop_size: 147
101
+ multi_stft_normalized: False
102
+ mlp_expansion_factor: 4
103
+ use_torch_checkpoint: True
104
+ skip_connection: False
105
+
106
+
107
+ training:
108
+ batch_size: 1
109
+ gradient_accumulation_steps: 1
110
+ grad_clip: 0
111
+ instruments: ['Vocals', 'Instrumental']
112
+ lr: 5.0e-5
113
+ patience: 7
114
+ reduce_factor: 0.75
115
+ target_instrument: Vocals
116
+ num_epochs: 1000
117
+ num_steps: 1000
118
+ q: 0.95
119
+ coarse_loss_clip: true
120
+ ema_momentum: 0.999
121
+ optimizer: adam
122
+ other_fix: False # it's needed for checking on multisong dataset if other is actually instrumental
123
+ use_amp: true # enable or disable usage of mixed precision (float16) - usually it must be true
124
+
125
+
126
+
127
+
128
+ inference:
129
+ batch_size: 2
130
+ dim_t: 1251
131
+ num_overlap: 4
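Two relations in this config can be checked by hand. The band widths in `freqs_per_bands` add up to `dim_freqs_in`, which in turn equals `stft_n_fft // 2 + 1`, and `dim_t` matches the frame count of a centered STFT over one chunk. The sketch below rebuilds the band list from run lengths read off the config. That both relations are required by the model code is an assumption here.

```python
# (band width, number of consecutive bands) as listed in the config above.
runs = [(2, 24), (4, 12), (12, 8), (24, 8), (48, 8), (128, 1), (129, 1)]
freqs_per_bands = [width for width, count in runs for _ in range(count)]

stft_n_fft = 2048
stft_hop_length = 512
chunk_size = 640000

# Number of frequency bins of a one-sided STFT.
num_freq_bins = stft_n_fft // 2 + 1

# Frame count of a centered STFT over one chunk.
num_frames = 1 + chunk_size // stft_hop_length

print(len(freqs_per_bands), sum(freqs_per_bands), num_freq_bins, num_frames)
```

Under these assumptions the 62 bands cover all 1025 frequency bins, and one chunk yields 1251 frames, matching `dim_freqs_in: 1025` and `dim_t: 1251`.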
MedleyVox-MultiSinger/.gitattributes ADDED
@@ -0,0 +1,55 @@
1
+ *.7z filter=lfs diff=lfs merge=lfs -text
2
+ *.arrow filter=lfs diff=lfs merge=lfs -text
3
+ *.bin filter=lfs diff=lfs merge=lfs -text
4
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
5
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
6
+ *.ftz filter=lfs diff=lfs merge=lfs -text
7
+ *.gz filter=lfs diff=lfs merge=lfs -text
8
+ *.h5 filter=lfs diff=lfs merge=lfs -text
9
+ *.joblib filter=lfs diff=lfs merge=lfs -text
10
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
11
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
12
+ *.model filter=lfs diff=lfs merge=lfs -text
13
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
14
+ *.npy filter=lfs diff=lfs merge=lfs -text
15
+ *.npz filter=lfs diff=lfs merge=lfs -text
16
+ *.onnx filter=lfs diff=lfs merge=lfs -text
17
+ *.ot filter=lfs diff=lfs merge=lfs -text
18
+ *.parquet filter=lfs diff=lfs merge=lfs -text
19
+ *.pb filter=lfs diff=lfs merge=lfs -text
20
+ *.pickle filter=lfs diff=lfs merge=lfs -text
21
+ *.pkl filter=lfs diff=lfs merge=lfs -text
22
+ *.pt filter=lfs diff=lfs merge=lfs -text
23
+ *.pth filter=lfs diff=lfs merge=lfs -text
24
+ *.rar filter=lfs diff=lfs merge=lfs -text
25
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
26
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
27
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
28
+ *.tar filter=lfs diff=lfs merge=lfs -text
29
+ *.tflite filter=lfs diff=lfs merge=lfs -text
30
+ *.tgz filter=lfs diff=lfs merge=lfs -text
31
+ *.wasm filter=lfs diff=lfs merge=lfs -text
32
+ *.xz filter=lfs diff=lfs merge=lfs -text
33
+ *.zip filter=lfs diff=lfs merge=lfs -text
34
+ *.zst filter=lfs diff=lfs merge=lfs -text
35
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ vocals[[:space:]]135/vocals.chkpnt filter=lfs diff=lfs merge=lfs -text
37
+ vocals[[:space:]]163/vocals.chkpnt filter=lfs diff=lfs merge=lfs -text
38
+ vocals[[:space:]]188/vocals.chkpnt filter=lfs diff=lfs merge=lfs -text
39
+ vocals[[:space:]]200/vocals.chkpnt filter=lfs diff=lfs merge=lfs -text
40
+ singing_librispeech_iSRNet/vocals.chkpnt filter=lfs diff=lfs merge=lfs -text
41
+ eval_results/singing_librispeech_iSRNet/examples/ex_68/mixture.wav filter=lfs diff=lfs merge=lfs -text
42
+ eval_results/singing_librispeech_iSRNet/examples/ex_68/s0_estimate.wav filter=lfs diff=lfs merge=lfs -text
43
+ eval_results/singing_librispeech_iSRNet/examples/ex_68/s0.wav filter=lfs diff=lfs merge=lfs -text
44
+ eval_results/singing_librispeech_iSRNet/examples/ex_68/s1_estimate.wav filter=lfs diff=lfs merge=lfs -text
45
+ eval_results/singing_librispeech_iSRNet/examples/ex_68/s1.wav filter=lfs diff=lfs merge=lfs -text
46
+ eval_results/singing_librispeech/examples/ex_69/mixture.wav filter=lfs diff=lfs merge=lfs -text
47
+ eval_results/singing_librispeech/examples/ex_69/s0_estimate.wav filter=lfs diff=lfs merge=lfs -text
48
+ eval_results/singing_librispeech/examples/ex_69/s0.wav filter=lfs diff=lfs merge=lfs -text
49
+ eval_results/singing_librispeech/examples/ex_69/s1_estimate.wav filter=lfs diff=lfs merge=lfs -text
50
+ eval_results/singing_librispeech/examples/ex_69/s1.wav filter=lfs diff=lfs merge=lfs -text
51
+ vocal[[:space:]]231/vocals.chkpnt filter=lfs diff=lfs merge=lfs -text
52
+ vocals[[:space:]]238/vocals.chkpnt filter=lfs diff=lfs merge=lfs -text
53
+ singing_librispeech_ft_iSRNet/vocals.chkpnt filter=lfs diff=lfs merge=lfs -text
54
+ multi_singing_librispeech/vocals.chkpnt filter=lfs diff=lfs merge=lfs -text
55
+ multi_singing_librispeech_138/vocals.chkpnt filter=lfs diff=lfs merge=lfs -text
MedleyVox-MultiSinger/README.md ADDED
@@ -0,0 +1,156 @@
1
+ ---
2
+ license: cc-by-4.0
3
+ library_name: asteroid
4
+ ---
5
+
6
+ # Model Card for Model ID
7
+
8
+ This model aims to separate duets, unisons, or any other number of voices from a given audio track.
9
+
10
+ ## Model Details
11
+
12
+ ### Model Description
13
+
14
+ <!-- Provide a longer summary of what this model is. -->
15
+
16
+ - **Developed by:** Carson Evans
17
+ - **Model type:** Audio Separation
18
+ - **License:** Creative Commons Attribution 4.0
19
+
20
+ ### Model Sources [optional]
21
+
22
+ <!-- Provide the basic links for the model. -->
23
+
24
+ - **Repository:** https://github.com/CBeast25/MedleyVox
25
+ - **Paper:** https://arxiv.org/abs/2211.07302
26
+ - **Demo:** https://catnip-leaf-c6a.notion.site/Audio-Samples-of-MedleyVox-An-Evaluation-Dataset-for-Multiple-Singing-Voices-Separation-30074b2c88d24f46b68d9293f6095962
27
+
28
+ ## Uses
29
+
30
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
31
+
32
+ ### Direct Use
33
+
34
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
35
+
36
+ [More Information Needed]
37
+
38
+ ### Downstream Use [optional]
39
+
40
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
41
+
42
+ [More Information Needed]
43
+
44
+ ### Out-of-Scope Use
45
+
46
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
47
+
48
+ [More Information Needed]
49
+
50
+ ## Bias, Risks, and Limitations
51
+
52
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
53
+
54
+ [More Information Needed]
55
+
56
+ ### Recommendations
57
+
58
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
59
+
60
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
61
+
62
+ ## How to Get Started with the Model
63
+
64
+ Use the code below to get started with the model.
65
+
66
+ [More Information Needed]
67
+
68
+ ## Training Details
69
+
70
+ ### Training Data
71
+
72
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
73
+
74
+ [More Information Needed]
75
+
76
+ ### Training Procedure
77
+
78
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
79
+
80
+ #### Preprocessing
81
+
82
+ [More Information Needed]
83
+
84
+
85
+ #### Training Hyperparameters
86
+
87
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
88
+
89
+ #### Speeds, Sizes, Times [optional]
90
+
91
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
92
+
93
+ [More Information Needed]
94
+
95
+ ## Evaluation
96
+
97
+ <!-- This section describes the evaluation protocols and provides the results. -->
98
+
99
+ ### Testing Data, Factors & Metrics
100
+
101
+ #### Testing Data
102
+
103
+ <!-- This should link to a Dataset Card if possible. -->
104
+
105
+ [More Information Needed]
106
+
107
+ #### Factors
108
+
109
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
110
+
111
+ [More Information Needed]
112
+
113
+ #### Metrics
114
+
115
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
116
+
117
+ [More Information Needed]
118
+
119
+ ### Results
120
+
121
+ [More Information Needed]
122
+
123
+ #### Summary
124
+
125
+
126
+ ## Environmental Impact
127
+
128
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
129
+
130
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
131
+
132
+ - **Hardware Type:** [More Information Needed]
133
+ - **Hours used:** [More Information Needed]
134
+ - **Cloud Provider:** [More Information Needed]
135
+ - **Compute Region:** [More Information Needed]
136
+ - **Carbon Emitted:** [More Information Needed]
137
+
138
+ ## Citation [optional]
139
+
140
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
141
+
142
+ **BibTeX:**
143
+
144
+ [More Information Needed]
145
+
146
+ **APA:**
147
+
148
+ [More Information Needed]
149
+
150
+ ## Glossary [optional]
151
+
152
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
153
+
154
+ ## Model Card Contact
155
+
156
+ carson.evans@colostate.edu
MedleyVox-MultiSinger/multi_singing_librispeech/loss_graph_vocals.png ADDED
MedleyVox-MultiSinger/multi_singing_librispeech/vocals.json ADDED
@@ -0,0 +1,642 @@
1
+ {
2
+ "args": {
3
+ "above_freq": 300.0,
4
+ "architecture": "conv_tasnet_stft",
5
+ "batch_size": 58,
6
+ "beta1": 0.5,
7
+ "beta2": 0.9,
8
+ "bn_chan": 256,
9
+ "continual_train": false,
10
+ "dataset": "multi_singing_librispeech",
11
+ "db_normalize": false,
12
+ "ema": true,
13
+ "encoder_activation": null,
14
+ "entity": "carson2050",
15
+ "epochs": 200,
16
+ "eps": 1e-08,
17
+ "exp_name": "multi_singing_librispeech",
18
+ "ff_activation": "relu",
19
+ "gpu": 0,
20
+ "gradient_clip": null,
21
+ "hid_chan": 1024,
22
+ "load_ema_online_model": false,
23
+ "lr": 0.0002,
24
+ "lr_decay_gamma": 0.5,
25
+ "lr_decay_patience": 20,
26
+ "lr_scheduler": "step_lr",
27
+ "mask_act": "linear",
28
+ "max_n_src": 4,
29
+ "min_n_src": 2,
30
+ "mixed_precision": false,
31
+ "mixture_consistency": "mixture_consistency",
32
+ "multi_spec_loss_log_scale": false,
33
+ "n_blocks": 6,
34
+ "n_filter": 512,
35
+ "n_kernel": 512,
36
+ "n_nodes": 1,
37
+ "n_repeats": 4,
38
+ "n_src": 2,
39
+ "nb_workers": 8,
40
+ "nfft": 2048,
41
+ "ngpus_per_node": 1,
42
+ "nhop": 512,
43
+ "no_cuda": false,
44
+ "no_mask": false,
45
+ "no_mask_residual": false,
46
+ "optimizer": "adam",
47
+ "output": "results/singing_sep/checkpoint/multi_singing_librispeech",
48
+ "output_directory": "results/singing_sep",
49
+ "part_of_data": null,
50
+ "patience": 50,
51
+ "pitch_formant_augment_prob": 0.4,
52
+ "port": null,
53
+ "project": "MedleyVox_home_pt2",
54
+ "quiet": false,
55
+ "rank": 0,
56
+ "reduced_training_data_ratio": 1.0,
57
+ "resume": "results/singing_sep/checkpoint/multi_singing_librispeech",
58
+ "same_singer_dict_path": [
59
+ [
60
+ "../data/24k/OpenSinger",
61
+ "./svs/preprocess/make_same_singer_dict/same_singer_OpenSinger.json",
62
+ "OpenSinger"
63
+ ],
64
+ [
65
+ "../data/24k/k_multisinger",
66
+ "./svs/preprocess/make_same_singer_dict/same_singer_k_multisinger.json",
67
+ "k_multisinger"
68
+ ],
69
+ [
70
+ "../data/24k/CSD",
71
+ "./svs/preprocess/make_same_singer_dict/same_singer_CSD.json",
72
+ "CSD"
73
+ ],
74
+ [
75
+ "../data/24k/jsut-song_ver1",
76
+ "./svs/preprocess/make_same_singer_dict/same_singer_jsut-song_ver1.json",
77
+ "jsut-song_ver1"
78
+ ],
79
+ [
80
+ "../data/24k/jvs_music_ver1",
81
+ "./svs/preprocess/make_same_singer_dict/same_singer_jvs_music_ver1.json",
82
+ "jvs_music_ver1"
83
+ ],
84
+ [
85
+ "../data/24k/k_multitimbre",
86
+ "./svs/preprocess/make_same_singer_dict/same_singer_k_multitimbre.json",
87
+ "k_multitimbre"
88
+ ],
89
+ [
90
+ "../data/24k/kiritan_revised",
91
+ "./svs/preprocess/make_same_singer_dict/same_singer_kiritan.json",
92
+ "kiritan"
93
+ ],
94
+ [
95
+ "../data/24k/musdb_a_train",
96
+ "./svs/preprocess/make_same_singer_dict/same_singer_musdb_a_train.json",
97
+ "musdb_a_train"
98
+ ],
99
+ [
100
+ "../data/24k/NUS",
101
+ "./svs/preprocess/make_same_singer_dict/same_singer_NUS.json",
102
+ "NUS"
103
+ ],
104
+ [
105
+ "../data/24k/VocalSet",
106
+ "./svs/preprocess/make_same_singer_dict/same_singer_VocalSet.json",
107
+ "VocalSet"
108
+ ]
109
+ ],
110
+ "same_singer_ratio": 0.2,
111
+ "same_song_dict_path": [
112
+ [
113
+ "../data/24k/k_multisinger",
114
+ "./svs/preprocess/make_same_song_dict/same_song_k_multisinger.json",
115
+ "k_multisinger"
116
+ ]
117
+ ],
118
+ "same_song_ratio": 0.2,
119
+ "same_speaker_dict_path": [
120
+ [
121
+ "../data/24k/LibriSpeech_train-clean-100",
122
+ "./svs/preprocess/make_same_speaker_dict/same_singer_LibriSpeech_train-clean-100.json",
123
+ "LibriSpeech_train-clean-100"
124
+ ],
125
+ [
126
+ "../data/24k/LibriSpeech_train-clean-360",
127
+ "./svs/preprocess/make_same_speaker_dict/same_singer_LibriSpeech_train-clean-360.json",
128
+ "LibriSpeech_train-clean-360"
129
+ ]
130
+ ],
131
+ "same_speaker_ratio": 0.15,
132
+ "sample_rate": 24000,
133
+ "seed": 777,
134
+ "seq_dur": 3.0,
135
+ "sing_sing_ratio": 0.15,
136
+ "sing_speech_ratio": 0.15,
137
+ "skip_chan": 256,
138
+ "song_length_dict_path": "./svs/preprocess/song_length_dict_24k.json",
139
+ "speech_train_root": [
140
+ "../data/24k/LibriSpeech_train-clean-360",
141
+ "../data/24k/LibriSpeech_train-clean-100"
142
+ ],
143
+ "sr_input_res": false,
144
+ "sr_out_mix_consistency": false,
145
+ "srnet": "orig",
146
+ "start_from_best": false,
147
+ "sweep": false,
148
+ "target": "vocals",
149
+ "train_loss_func": [
150
+ "pit_snr",
151
+ "multi_spectral_l1"
152
+ ],
153
+ "train_root": [
154
+ "../data/24k/CSD",
155
+ "../data/24k/NUS",
156
+ "../data/24k/TONAS",
157
+ "../data/24k/VocalSet",
158
+ "../data/24k/jsut-song_ver1",
159
+ "../data/24k/jvs_music_ver1",
160
+ "../data/24k/kiritan_revised",
161
+ "../data/24k/vocadito",
162
+ "../data/24k/musdb_a_train",
163
+ "../data/24k/OpenSinger",
164
+ "../data/24k/medleyDB_v1_in_musdb",
165
+ "../data/24k/k_multisinger",
166
+ "../data/24k/k_multitimbre"
167
+ ],
168
+ "unison_prob": 0.3,
169
+ "use_wandb": true,
170
+ "valid_loss_func": [
171
+ "pit_si_sdr"
172
+ ],
173
+ "valid_regions_dict_path": "./svs/preprocess/valid_regions_dict_singing_singing.json",
174
+ "valid_root": [
175
+ [
176
+ "../data/24k/musdb_a_test",
177
+ "../data/24k/musdb_a_test",
178
+ "./svs/preprocess/make_validation_dict/for_2_srcs/valid_regions_dict_singing_singing.json",
179
+ "sing_sing_diff"
180
+ ],
181
+ [
182
+ "../data/24k/musdb_a_test",
183
+ "../data/24k/musdb_a_test",
184
+ "./svs/preprocess/make_validation_dict/for_2_srcs/valid_regions_dict_singing_unison.json",
185
+ "sing_sing_unison"
186
+ ],
187
+ [
188
+ "../data/24k/musdb_a_test",
189
+ "../data/24k/musdb_a_test",
190
+ "./svs/preprocess/make_validation_dict/for_2_srcs/valid_regions_dict_singing_singing_same_singer.json",
191
+ "sing_sing_same_singer"
192
+ ],
193
+ [
194
+ "../data/24k/LibriSpeech_dev-clean",
195
+ "../data/24k/LibriSpeech_dev-clean",
196
+ "./svs/preprocess/make_validation_dict/for_2_srcs/valid_regions_dict_speech_speech.json",
197
+ "speech_speech_diff"
198
+ ],
199
+ [
200
+ "../data/24k/LibriSpeech_dev-clean",
201
+ "../data/24k/LibriSpeech_dev-clean",
202
+ "./svs/preprocess/make_validation_dict/for_2_srcs/valid_regions_dict_speech_unison.json",
203
+ "speech_speech_unison"
204
+ ],
205
+ [
206
+ "../data/24k/LibriSpeech_dev-clean",
207
+ "../data/24k/LibriSpeech_dev-clean",
208
+ "./svs/preprocess/make_validation_dict/for_2_srcs/valid_regions_dict_speech_speech_same_speaker.json",
209
+ "speech_speech_same_speaker"
210
+ ],
211
+ [
212
+ "../data/24k/musdb_a_test",
213
+ "../data/24k/LibriSpeech_dev-clean",
214
+ "./svs/preprocess/make_validation_dict/for_2_srcs/valid_regions_dict_singing_speech.json",
215
+ "singing_speech"
216
+ ]
217
+ ],
218
+ "valid_root_orpit": [
219
+ [
220
+ "../data/24k/musdb_a_test",
221
+ "../data/24k/musdb_a_test",
222
+ "./svs/preprocess/make_validation_dict/for_n_srcs/valid_regions_dict_singing_singing_n_srcs.json",
223
+ "sing_sing_diff"
224
+ ],
225
+ [
226
+ "../data/24k/musdb_a_test",
227
+ "../data/24k/musdb_a_test",
228
+ "./svs/preprocess/make_validation_dict/for_n_srcs/valid_regions_dict_singing_unison_n_srcs.json",
229
+ "sing_sing_unison"
230
+ ],
231
+ [
232
+ "../data/24k/musdb_a_test",
233
+ "../data/24k/musdb_a_test",
234
+ "./svs/preprocess/make_validation_dict/for_n_srcs/valid_regions_dict_singing_singing_same_singer_n_srcs.json",
235
+ "sing_sing_same_singer"
236
+ ],
237
+ [
238
+ "../data/24k/LibriSpeech_dev-clean",
239
+ "../data/24k/LibriSpeech_dev-clean",
240
+ "./svs/preprocess/make_validation_dict/for_n_srcs/valid_regions_dict_speech_speech_n_srcs.json",
241
+ "speech_speech_diff"
242
+ ],
243
+ [
244
+ "../data/24k/LibriSpeech_dev-clean",
245
+ "../data/24k/LibriSpeech_dev-clean",
246
+ "./svs/preprocess/make_validation_dict/for_n_srcs/valid_regions_dict_speech_unison_n_srcs.json",
247
+ "speech_speech_unison"
248
+ ],
249
+ [
250
+ "../data/24k/LibriSpeech_dev-clean",
251
+ "../data/24k/LibriSpeech_dev-clean",
252
+ "./svs/preprocess/make_validation_dict/for_n_srcs/valid_regions_dict_speech_speech_same_speaker_n_srcs.json",
253
+ "speech_speech_same_speaker"
254
+ ],
255
+ [
256
+ "../data/24k/musdb_a_test",
257
+ "../data/24k/LibriSpeech_dev-clean",
258
+ "./svs/preprocess/make_validation_dict/for_n_srcs/valid_regions_dict_singing_speech_n_srcs.json",
259
+ "singing_speech"
260
+ ]
261
+ ],
262
+ "weight_decay": 1e-06,
263
+ "world_size": 1
264
+ },
265
+ "best_epoch": 92,
266
+ "best_loss": -7.675145898546491,
267
+ "epochs_trained": 93,
268
+ "num_bad_epochs": 1,
269
+ "train_loss_history": [
270
+ -1.4921716451644897,
271
+ -2.8664329051971436,
272
+ -3.3393054008483887,
273
+ -3.638092517852783,
274
+ -3.879303216934204,
275
+ -4.088993072509766,
276
+ -4.227543830871582,
277
+ -4.412248134613037,
278
+ -4.58026123046875,
279
+ -4.71837043762207,
280
+ -4.800468444824219,
281
+ -4.882855415344238,
282
+ -5.011181831359863,
283
+ -5.128243923187256,
284
+ -5.150334358215332,
285
+ -5.240769386291504,
286
+ -5.357062816619873,
287
+ -5.35420560836792,
288
+ -5.427922248840332,
289
+ -5.536999225616455,
290
+ -5.6000895500183105,
291
+ -5.664849758148193,
292
+ -5.704154968261719,
293
+ -5.791101455688477,
294
+ -5.794349670410156,
295
+ -5.784161567687988,
296
+ -5.824007511138916,
297
+ -5.931461811065674,
298
+ -5.981809139251709,
299
+ -6.045787334442139,
300
+ -6.045494079589844,
301
+ -6.075621128082275,
302
+ -6.086508750915527,
303
+ -6.123781681060791,
304
+ -6.192169666290283,
305
+ -6.248963832855225,
306
+ -6.183308124542236,
307
+ -6.25191593170166,
308
+ -6.301548004150391,
309
+ -6.259702682495117,
310
+ -6.338959217071533,
311
+ -6.372439861297607,
312
+ -6.436537742614746,
313
+ -6.462899684906006,
314
+ -6.449411392211914,
315
+ -6.489621639251709,
316
+ -6.461447238922119,
317
+ -6.558005332946777,
318
+ -6.603482723236084,
319
+ -6.555445194244385,
320
+ -6.579801082611084,
321
+ -6.639071464538574,
322
+ -6.648660182952881,
323
+ -6.6866631507873535,
324
+ -6.767474174499512,
325
+ -6.750443935394287,
326
+ -6.7306742668151855,
327
+ -6.7853617668151855,
328
+ -6.818509101867676,
329
+ -6.761360168457031,
330
+ -6.795668601989746,
331
+ -6.82188606262207,
332
+ -6.795504093170166,
333
+ -6.914917469024658,
334
+ -6.921656131744385,
335
+ -6.950718402862549,
336
+ -6.966548919677734,
337
+ -6.965545654296875,
338
+ -6.964168548583984,
339
+ -6.888548374176025,
340
+ -6.932443141937256,
341
+ -6.930734634399414,
342
+ -6.949597358703613,
343
+ -6.947843551635742,
344
+ -6.959360599517822,
345
+ -6.974522590637207,
346
+ -7.005373954772949,
347
+ -7.039368629455566,
348
+ -7.008696556091309,
349
+ -7.064368724822998,
350
+ -7.038439750671387,
351
+ -7.046519756317139,
352
+ -7.052777290344238,
353
+ -7.06027889251709,
354
+ -7.048835277557373,
355
+ -7.095891952514648,
356
+ -7.080573558807373,
357
+ -7.120383262634277,
358
+ -7.1035075187683105,
359
+ -7.147456645965576,
360
+ -7.133329391479492,
361
+ -7.134939670562744,
362
+ -7.155049800872803
363
+ ],
364
+ "train_time_history": [
365
+ 4810.419310808182,
366
+ 4810.429551362991,
367
+ 4780.074353456497,
368
+ 4780.08434343338,
369
+ 4793.850719213486,
370
+ 4793.851686954498,
371
+ 4799.262031078339,
372
+ 4799.2719786167145,
373
+ 4776.265509605408,
374
+ 4776.275769710541,
375
+ 4800.915772199631,
376
+ 4800.925550937653,
377
+ 4782.19565987587,
378
+ 4870.2729279994965,
379
+ 4864.202353715897,
380
+ 5526.39341044426,
381
+ 5526.402764797211,
382
+ 5210.057184457779,
383
+ 5210.0663821697235,
384
+ 5192.114199876785,
385
+ 5192.115474700928,
386
+ 5119.568732976913,
387
+ 5119.579450130463,
388
+ 4854.391019105911,
389
+ 4854.4009165763855,
390
+ 4825.207883834839,
391
+ 4825.218036174774,
392
+ 4839.370161294937,
393
+ 4839.3797080516815,
394
+ 4829.168277978897,
395
+ 4829.178178310394,
396
+ 4831.754481077194,
397
+ 4831.764403104782,
398
+ 4840.167069673538,
399
+ 4840.1764142513275,
400
+ 4839.306309938431,
401
+ 4839.315984725952,
402
+ 4835.479310274124,
403
+ 4835.489530324936,
404
+ 4991.815203428268,
405
+ 4826.761980772018,
406
+ 4826.772101163864,
407
+ 4819.3496108055115,
408
+ 4819.358952999115,
409
+ 4820.984974384308,
410
+ 4820.995014190674,
411
+ 4820.539752483368,
412
+ 4820.548979997635,
413
+ 4814.873534917831,
414
+ 4814.876268863678,
415
+ 4812.354250907898,
416
+ 4812.363839626312,
417
+ 4822.391925573349,
418
+ 4822.40118765831,
419
+ 4809.686738491058,
420
+ 4809.697638034821,
421
+ 4832.5055372715,
422
+ 4832.515355587006,
423
+ 4831.67563867569,
424
+ 4831.685403108597,
425
+ 4824.845934391022,
426
+ 4824.85514998436,
427
+ 4835.57625246048,
428
+ 4835.587289094925,
429
+ 4817.744952201843,
430
+ 4817.7542552948,
431
+ 4807.804133653641,
432
+ 4807.814810037613,
433
+ 4818.521605968475,
434
+ 4818.532015800476,
435
+ 4981.354954957962,
436
+ 4981.368631839752,
437
+ 4875.586889028549,
438
+ 4875.597553014755,
439
+ 4801.111567258835,
440
+ 4801.1219182014465,
441
+ 4799.074081897736,
442
+ 4799.08514547348,
443
+ 4809.401276350021,
444
+ 4809.41465306282,
445
+ 4809.043102502823,
446
+ 4809.04475402832,
447
+ 4818.2070748806,
448
+ 4818.209503889084,
449
+ 4796.3679666519165,
450
+ 4796.377726793289,
451
+ 4794.153427362442,
452
+ 4794.155965805054,
453
+ 4804.1572597026825,
454
+ 4804.168130159378,
455
+ 4797.392125368118,
456
+ 4797.401923418045,
457
+ 4797.116873264313,
458
+ 4797.12747836113,
459
+ 4799.205674409866,
460
+ 4799.215870857239,
461
+ 4969.960748910904,
462
+ 4969.971879482269,
463
+ 5270.599810838699,
464
+ 5270.6101796627045,
465
+ 4881.989181518555,
466
+ 4882.000226974487,
467
+ 4867.6136746406555,
468
+ 4867.624637126923,
469
+ 5128.904933452606,
470
+ 5128.915862798691,
471
+ 4879.79870891571,
472
+ 4879.80947971344,
473
+ 4969.744366407394,
474
+ 4969.754128456116,
475
+ 4907.097052812576,
476
+ 4907.107843637466,
477
+ 4812.9132516384125,
478
+ 4812.9242560863495,
479
+ 4815.909214496613,
480
+ 4815.920344591141,
481
+ 4806.699935913086,
482
+ 4806.70260477066,
483
+ 4831.170897245407,
484
+ 4831.180289506912,
485
+ 4839.252681255341,
486
+ 4839.262135982513,
487
+ 4833.886634111404,
488
+ 4833.8898758888245,
489
+ 4830.524186134338,
490
+ 4830.53564286232,
491
+ 4824.74093079567,
492
+ 4824.747734546661,
493
+ 4818.754670858383,
494
+ 4818.764072179794,
495
+ 4816.966838121414,
496
+ 4816.977759599686,
497
+ 4831.582427740097,
498
+ 4831.592094898224,
499
+ 4804.51261639595,
500
+ 4993.473606586456,
501
+ 4815.601177692413,
502
+ 4815.610737085342,
503
+ 4789.788247346878,
504
+ 4865.854624032974,
505
+ 4865.864605426788,
506
+ 5030.177618980408,
507
+ 5030.188777208328,
508
+ 4769.904754638672,
509
+ 4895.086503267288,
510
+ 4895.0979063510895,
511
+ 4869.957269668579,
512
+ 4869.958615779877,
513
+ 4962.7930123806,
514
+ 4962.803097486496,
515
+ 5163.898764133453,
516
+ 5163.90244436264,
517
+ 4803.290739297867,
518
+ 4803.3002672195435,
519
+ 4819.002298593521,
520
+ 4819.014036178589,
521
+ 4812.0083973407745,
522
+ 5130.011174440384,
523
+ 5130.020927429199,
524
+ 5238.748838424683,
525
+ 5162.933927536011,
526
+ 5162.94544506073,
527
+ 5014.217702865601,
528
+ 5014.227581739426,
529
+ 5119.955267906189,
530
+ 5119.966482877731,
531
+ 4877.71505856514,
532
+ 4947.4076771736145,
533
+ 4947.418792486191,
534
+ 4980.132425069809,
535
+ 4980.143876552582,
536
+ 5166.483239412308,
537
+ 5166.49423623085,
538
+ 4906.088274717331,
539
+ 4906.0993638038635,
540
+ 4880.329564332962,
541
+ 4880.339328289032,
542
+ 4873.104112148285,
543
+ 4873.119816303253,
544
+ 4885.143585205078,
545
+ 5015.694309234619
546
+ ],
547
+ "valid_loss_history": [
548
+ -3.0233164174216136,
549
+ -4.020770004817417,
550
+ -4.493505137307303,
551
+ -4.805826323372977,
552
+ -5.009723663330078,
553
+ -5.3551515851702005,
554
+ -5.507791314806257,
555
+ -5.709285395486014,
556
+ -5.82812111718314,
557
+ -5.88963999067034,
558
+ -5.970332486288888,
559
+ -5.988547257014683,
560
+ -6.113276481628418,
561
+ -6.132954188755581,
562
+ -6.210943358285086,
563
+ -6.279647214072091,
564
+ -6.3300862312316895,
565
+ -6.36109277180263,
566
+ -6.427222183772495,
567
+ -6.453009741646903,
568
+ -6.489914894104004,
569
+ -6.48867974962507,
570
+ -6.536007336207798,
571
+ -6.532879625047956,
572
+ -6.572577135903495,
573
+ -6.5566478456769675,
574
+ -6.660695620945522,
575
+ -6.70451055254255,
576
+ -6.65756470816476,
577
+ -6.701659406934466,
578
+ -6.7815567425319125,
579
+ -6.804818085261753,
580
+ -6.783689567020962,
581
+ -6.844764641353062,
582
+ -6.868685790470669,
583
+ -6.888231481824603,
584
+ -6.942419528961182,
585
+ -6.951289176940918,
586
+ -6.975076675415039,
587
+ -6.991657052721296,
588
+ -7.000387941087995,
589
+ -7.082024574279785,
590
+ -7.087371553693499,
591
+ -7.114969117300851,
592
+ -7.163520812988281,
593
+ -7.1951784406389505,
594
+ -7.216815676007952,
595
+ -7.201807635171073,
596
+ -7.183896745954241,
597
+ -7.227273804800851,
598
+ -7.232961927141462,
599
+ -7.2955668313162665,
600
+ -7.2690509387425015,
601
+ -7.273542472294399,
602
+ -7.281754766191755,
603
+ -7.311358247484479,
604
+ -7.287418706076486,
605
+ -7.261871746608189,
606
+ -7.2840664727347235,
607
+ -7.316314697265625,
608
+ -7.376913070678711,
609
+ -7.367326668330601,
610
+ -7.438824789864676,
611
+ -7.427623748779297,
612
+ -7.45092739377703,
613
+ -7.4810590744018555,
614
+ -7.43196405683245,
615
+ -7.413298266274588,
616
+ -7.448171275002616,
617
+ -7.470413276127407,
618
+ -7.478131294250488,
619
+ -7.494483879634312,
620
+ -7.459411212376186,
621
+ -7.515866688319615,
622
+ -7.571803229195731,
623
+ -7.573634147644043,
624
+ -7.562024729592459,
625
+ -7.550929818834577,
626
+ -7.53609037399292,
627
+ -7.563671180180141,
628
+ -7.578108038221087,
629
+ -7.531997203826904,
630
+ -7.600094999585833,
631
+ -7.639314787728446,
632
+ -7.65882362638201,
633
+ -7.624989918300083,
634
+ -7.5906588690621515,
635
+ -7.606484276907785,
636
+ -7.59099394934518,
637
+ -7.611005442483084,
638
+ -7.649059908730643,
639
+ -7.675145898546491,
640
+ -7.672644138336182
641
+ ]
642
+ }
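The summary fields in this log (`best_epoch`, `best_loss`, `epochs_trained`, `num_bad_epochs`) appear to be derivable from `valid_loss_history` alone. The minimal sketch below recomputes them, assuming that epochs are counted from 1, that a lower validation loss is better (the `pit_si_sdr` values look like negated SI-SDR), and that `num_bad_epochs` counts the epochs since the best one. These assumptions are inferred from the values in this file, not from the training code. The helper name `summarize_log` is hypothetical.

```python
import json


def summarize_log(log):
    """Recompute best-epoch bookkeeping from a log's validation history."""
    valid = log["valid_loss_history"]
    # Index of the lowest validation loss (first occurrence on ties).
    best_index = min(range(len(valid)), key=valid.__getitem__)
    best_epoch = best_index + 1  # assumption: epochs are counted from 1
    return {
        "best_epoch": best_epoch,
        "best_loss": valid[best_index],
        "epochs_trained": len(valid),
        # assumption: "bad" epochs are those trained after the best one
        "num_bad_epochs": len(valid) - best_epoch,
    }


# Toy log holding only the last three validation values of the file above.
sample = json.loads(
    '{"valid_loss_history": '
    "[-7.649059908730643, -7.675145898546491, -7.672644138336182]}"
)
summary = summarize_log(sample)
print(summary)
```

To check a real log, load it with `json.load` and compare the result against the stored fields. For the file above, the lowest value in the 93-entry `valid_loss_history` is the 92nd one, which is consistent with `"best_epoch": 92` and `"num_bad_epochs": 1`.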
MedleyVox-MultiSinger/multi_singing_librispeech_138/loss_graph_vocals.png ADDED
MedleyVox-MultiSinger/multi_singing_librispeech_138/vocals.json ADDED
@@ -0,0 +1,812 @@
1
+ {
2
+ "args": {
3
+ "above_freq": 300.0,
4
+ "architecture": "conv_tasnet_stft",
5
+ "batch_size": 58,
6
+ "beta1": 0.5,
7
+ "beta2": 0.9,
8
+ "bn_chan": 256,
9
+ "continual_train": false,
10
+ "dataset": "multi_singing_librispeech",
11
+ "db_normalize": false,
12
+ "ema": true,
13
+ "encoder_activation": null,
14
+ "entity": "carson2050",
15
+ "epochs": 200,
16
+ "eps": 1e-08,
17
+ "exp_name": "multi_singing_librispeech",
18
+ "ff_activation": "relu",
19
+ "gpu": 0,
20
+ "gradient_clip": null,
21
+ "hid_chan": 1024,
22
+ "load_ema_online_model": false,
23
+ "lr": 0.0002,
24
+ "lr_decay_gamma": 0.5,
25
+ "lr_decay_patience": 20,
26
+ "lr_scheduler": "step_lr",
27
+ "mask_act": "linear",
28
+ "max_n_src": 4,
29
+ "min_n_src": 2,
30
+ "mixed_precision": false,
31
+ "mixture_consistency": "mixture_consistency",
32
+ "multi_spec_loss_log_scale": false,
33
+ "n_blocks": 6,
34
+ "n_filter": 512,
35
+ "n_kernel": 512,
36
+ "n_nodes": 1,
37
+ "n_repeats": 4,
38
+ "n_src": 2,
39
+ "nb_workers": 8,
40
+ "nfft": 2048,
41
+ "ngpus_per_node": 1,
42
+ "nhop": 512,
43
+ "no_cuda": false,
44
+ "no_mask": false,
45
+ "no_mask_residual": false,
46
+ "optimizer": "adam",
47
+ "output": "results/singing_sep/checkpoint/multi_singing_librispeech",
48
+ "output_directory": "results/singing_sep",
49
+ "part_of_data": null,
50
+ "patience": 50,
51
+ "pitch_formant_augment_prob": 0.4,
52
+ "port": null,
53
+ "project": "MedleyVox_home_pt2",
54
+ "quiet": false,
55
+ "rank": 0,
56
+ "reduced_training_data_ratio": 1.0,
57
+ "resume": "results/singing_sep/checkpoint/multi_singing_librispeech",
58
+ "same_singer_dict_path": [
59
+ [
60
+ "../data/24k/OpenSinger",
61
+ "./svs/preprocess/make_same_singer_dict/same_singer_OpenSinger.json",
62
+ "OpenSinger"
63
+ ],
64
+ [
65
+ "../data/24k/k_multisinger",
66
+ "./svs/preprocess/make_same_singer_dict/same_singer_k_multisinger.json",
67
+ "k_multisinger"
68
+ ],
69
+ [
70
+ "../data/24k/CSD",
71
+ "./svs/preprocess/make_same_singer_dict/same_singer_CSD.json",
72
+ "CSD"
73
+ ],
74
+ [
75
+ "../data/24k/jsut-song_ver1",
76
+ "./svs/preprocess/make_same_singer_dict/same_singer_jsut-song_ver1.json",
77
+ "jsut-song_ver1"
78
+ ],
79
+ [
80
+ "../data/24k/jvs_music_ver1",
81
+ "./svs/preprocess/make_same_singer_dict/same_singer_jvs_music_ver1.json",
82
+ "jvs_music_ver1"
83
+ ],
84
+ [
85
+ "../data/24k/k_multitimbre",
86
+ "./svs/preprocess/make_same_singer_dict/same_singer_k_multitimbre.json",
87
+ "k_multitimbre"
88
+ ],
89
+ [
90
+ "../data/24k/kiritan_revised",
91
+ "./svs/preprocess/make_same_singer_dict/same_singer_kiritan.json",
92
+ "kiritan"
93
+ ],
94
+ [
95
+ "../data/24k/musdb_a_train",
96
+ "./svs/preprocess/make_same_singer_dict/same_singer_musdb_a_train.json",
97
+ "musdb_a_train"
98
+ ],
99
+ [
100
+ "../data/24k/NUS",
101
+ "./svs/preprocess/make_same_singer_dict/same_singer_NUS.json",
102
+ "NUS"
103
+ ],
104
+ [
105
+ "../data/24k/VocalSet",
106
+ "./svs/preprocess/make_same_singer_dict/same_singer_VocalSet.json",
107
+ "VocalSet"
108
+ ]
109
+ ],
110
+ "same_singer_ratio": 0.2,
111
+ "same_song_dict_path": [
112
+ [
113
+ "../data/24k/k_multisinger",
114
+ "./svs/preprocess/make_same_song_dict/same_song_k_multisinger.json",
115
+ "k_multisinger"
116
+ ]
117
+ ],
118
+ "same_song_ratio": 0.2,
119
+ "same_speaker_dict_path": [
120
+ [
121
+ "../data/24k/LibriSpeech_train-clean-100",
122
+ "./svs/preprocess/make_same_speaker_dict/same_singer_LibriSpeech_train-clean-100.json",
123
+ "LibriSpeech_train-clean-100"
124
+ ],
125
+ [
126
+ "../data/24k/LibriSpeech_train-clean-360",
127
+ "./svs/preprocess/make_same_speaker_dict/same_singer_LibriSpeech_train-clean-360.json",
128
+ "LibriSpeech_train-clean-360"
129
+ ]
130
+ ],
131
+ "same_speaker_ratio": 0.15,
132
+ "sample_rate": 24000,
133
+ "seed": 777,
134
+ "seq_dur": 3.0,
135
+ "sing_sing_ratio": 0.15,
136
+ "sing_speech_ratio": 0.15,
137
+ "skip_chan": 256,
138
+ "song_length_dict_path": "./svs/preprocess/song_length_dict_24k.json",
139
+ "speech_train_root": [
140
+ "../data/24k/LibriSpeech_train-clean-360",
141
+ "../data/24k/LibriSpeech_train-clean-100"
142
+ ],
143
+ "sr_input_res": false,
144
+ "sr_out_mix_consistency": false,
145
+ "srnet": "orig",
146
+ "start_from_best": true,
147
+ "sweep": false,
148
+ "target": "vocals",
149
+ "train_loss_func": [
150
+ "pit_snr",
151
+ "multi_spectral_l1"
152
+ ],
153
+ "train_root": [
154
+ "../data/24k/CSD",
155
+ "../data/24k/NUS",
156
+ "../data/24k/TONAS",
157
+ "../data/24k/VocalSet",
158
+ "../data/24k/jsut-song_ver1",
159
+ "../data/24k/jvs_music_ver1",
160
+ "../data/24k/kiritan_revised",
161
+ "../data/24k/vocadito",
162
+ "../data/24k/musdb_a_train",
163
+ "../data/24k/OpenSinger",
164
+ "../data/24k/medleyDB_v1_in_musdb",
165
+ "../data/24k/k_multisinger",
166
+ "../data/24k/k_multitimbre"
167
+ ],
168
+ "unison_prob": 0.3,
169
+ "use_wandb": true,
170
+ "valid_loss_func": [
171
+ "pit_si_sdr"
172
+ ],
173
+ "valid_regions_dict_path": "./svs/preprocess/valid_regions_dict_singing_singing.json",
174
+ "valid_root": [
175
+ [
176
+ "../data/24k/musdb_a_test",
177
+ "../data/24k/musdb_a_test",
178
+ "./svs/preprocess/make_validation_dict/for_2_srcs/valid_regions_dict_singing_singing.json",
179
+ "sing_sing_diff"
180
+ ],
181
+ [
182
+ "../data/24k/musdb_a_test",
183
+ "../data/24k/musdb_a_test",
184
+ "./svs/preprocess/make_validation_dict/for_2_srcs/valid_regions_dict_singing_unison.json",
185
+ "sing_sing_unison"
186
+ ],
187
+ [
188
+ "../data/24k/musdb_a_test",
189
+ "../data/24k/musdb_a_test",
190
+ "./svs/preprocess/make_validation_dict/for_2_srcs/valid_regions_dict_singing_singing_same_singer.json",
191
+ "sing_sing_same_singer"
192
+ ],
193
+ [
194
+ "../data/24k/LibriSpeech_dev-clean",
195
+ "../data/24k/LibriSpeech_dev-clean",
196
+ "./svs/preprocess/make_validation_dict/for_2_srcs/valid_regions_dict_speech_speech.json",
197
+ "speech_speech_diff"
198
+ ],
199
+ [
200
+ "../data/24k/LibriSpeech_dev-clean",
201
+ "../data/24k/LibriSpeech_dev-clean",
202
+ "./svs/preprocess/make_validation_dict/for_2_srcs/valid_regions_dict_speech_unison.json",
203
+ "speech_speech_unison"
204
+ ],
205
+ [
206
+ "../data/24k/LibriSpeech_dev-clean",
207
+ "../data/24k/LibriSpeech_dev-clean",
208
+ "./svs/preprocess/make_validation_dict/for_2_srcs/valid_regions_dict_speech_speech_same_speaker.json",
209
+ "speech_speech_same_speaker"
210
+ ],
211
+ [
212
+ "../data/24k/musdb_a_test",
213
+ "../data/24k/LibriSpeech_dev-clean",
214
+ "./svs/preprocess/make_validation_dict/for_2_srcs/valid_regions_dict_singing_speech.json",
215
+ "singing_speech"
216
+ ]
217
+ ],
218
+ "valid_root_orpit": [
219
+ [
220
+ "../data/24k/musdb_a_test",
221
+ "../data/24k/musdb_a_test",
222
+ "./svs/preprocess/make_validation_dict/for_n_srcs/valid_regions_dict_singing_singing_n_srcs.json",
223
+ "sing_sing_diff"
224
+ ],
225
+ [
226
+ "../data/24k/musdb_a_test",
227
+ "../data/24k/musdb_a_test",
228
+ "./svs/preprocess/make_validation_dict/for_n_srcs/valid_regions_dict_singing_unison_n_srcs.json",
229
+ "sing_sing_unison"
230
+ ],
231
+ [
232
+ "../data/24k/musdb_a_test",
233
+ "../data/24k/musdb_a_test",
234
+ "./svs/preprocess/make_validation_dict/for_n_srcs/valid_regions_dict_singing_singing_same_singer_n_srcs.json",
235
+ "sing_sing_same_singer"
236
+ ],
237
+ [
238
+ "../data/24k/LibriSpeech_dev-clean",
239
+ "../data/24k/LibriSpeech_dev-clean",
240
+ "./svs/preprocess/make_validation_dict/for_n_srcs/valid_regions_dict_speech_speech_n_srcs.json",
241
+ "speech_speech_diff"
242
+ ],
243
+ [
244
+ "../data/24k/LibriSpeech_dev-clean",
245
+ "../data/24k/LibriSpeech_dev-clean",
246
+ "./svs/preprocess/make_validation_dict/for_n_srcs/valid_regions_dict_speech_unison_n_srcs.json",
247
+ "speech_speech_unison"
248
+ ],
249
+ [
250
+ "../data/24k/LibriSpeech_dev-clean",
251
+ "../data/24k/LibriSpeech_dev-clean",
252
+ "./svs/preprocess/make_validation_dict/for_n_srcs/valid_regions_dict_speech_speech_same_speaker_n_srcs.json",
253
+ "speech_speech_same_speaker"
254
+ ],
255
+ [
256
+ "../data/24k/musdb_a_test",
257
+ "../data/24k/LibriSpeech_dev-clean",
258
+ "./svs/preprocess/make_validation_dict/for_n_srcs/valid_regions_dict_singing_speech_n_srcs.json",
259
+ "singing_speech"
260
+ ]
261
+ ],
262
+ "weight_decay": 1e-06,
263
+ "world_size": 1
264
+ },
265
+ "best_epoch": 133,
266
+ "best_loss": -8.078724997384208,
267
+ "epochs_trained": 138,
268
+ "num_bad_epochs": 5,
269
+ "train_loss_history": [
270
+ -1.4921716451644897,
271
+ -2.8664329051971436,
272
+ -3.3393054008483887,
273
+ -3.638092517852783,
274
+ -3.879303216934204,
275
+ -4.088993072509766,
276
+ -4.227543830871582,
277
+ -4.412248134613037,
278
+ -4.58026123046875,
279
+ -4.71837043762207,
280
+ -4.800468444824219,
281
+ -4.882855415344238,
282
+ -5.011181831359863,
283
+ -5.128243923187256,
284
+ -5.150334358215332,
285
+ -5.240769386291504,
286
+ -5.357062816619873,
287
+ -5.35420560836792,
288
+ -5.427922248840332,
289
+ -5.536999225616455,
290
+ -5.6000895500183105,
291
+ -5.664849758148193,
292
+ -5.704154968261719,
293
+ -5.791101455688477,
294
+ -5.794349670410156,
295
+ -5.784161567687988,
296
+ -5.824007511138916,
297
+ -5.931461811065674,
298
+ -5.981809139251709,
299
+ -6.045787334442139,
300
+ -6.045494079589844,
301
+ -6.075621128082275,
302
+ -6.086508750915527,
303
+ -6.123781681060791,
304
+ -6.192169666290283,
305
+ -6.248963832855225,
306
+ -6.183308124542236,
307
+ -6.25191593170166,
308
+ -6.301548004150391,
309
+ -6.259702682495117,
310
+ -6.338959217071533,
311
+ -6.372439861297607,
312
+ -6.436537742614746,
313
+ -6.462899684906006,
314
+ -6.449411392211914,
315
+ -6.489621639251709,
316
+ -6.461447238922119,
317
+ -6.558005332946777,
318
+ -6.603482723236084,
319
+ -6.555445194244385,
320
+ -6.579801082611084,
321
+ -6.639071464538574,
322
+ -6.648660182952881,
323
+ -6.6866631507873535,
324
+ -6.767474174499512,
325
+ -6.750443935394287,
326
+ -6.7306742668151855,
327
+ -6.7853617668151855,
328
+ -6.818509101867676,
329
+ -6.761360168457031,
330
+ -6.795668601989746,
331
+ -6.82188606262207,
332
+ -6.795504093170166,
333
+ -6.914917469024658,
334
+ -6.921656131744385,
335
+ -6.950718402862549,
336
+ -6.966548919677734,
337
+ -6.965545654296875,
338
+ -6.964168548583984,
339
+ -6.888548374176025,
340
+ -6.932443141937256,
341
+ -6.930734634399414,
342
+ -6.949597358703613,
343
+ -6.947843551635742,
344
+ -6.959360599517822,
345
+ -6.974522590637207,
346
+ -7.005373954772949,
347
+ -7.039368629455566,
348
+ -7.008696556091309,
349
+ -7.064368724822998,
350
+ -7.038439750671387,
351
+ -7.046519756317139,
352
+ -7.052777290344238,
353
+ -7.06027889251709,
354
+ -7.048835277557373,
355
+ -7.095891952514648,
356
+ -7.080573558807373,
357
+ -7.120383262634277,
358
+ -7.1035075187683105,
359
+ -7.147456645965576,
360
+ -7.133329391479492,
361
+ -7.134939670562744,
362
+ -7.155049800872803,
363
+ -7.205596446990967,
364
+ -7.225539207458496,
365
+ -7.267192840576172,
366
+ -7.250244617462158,
367
+ -7.287757873535156,
368
+ -7.301974296569824,
369
+ -7.254255294799805,
370
+ -7.3585429191589355,
371
+ -7.332351207733154,
372
+ -7.346045970916748,
373
+ -7.384589672088623,
374
+ -7.356956958770752,
375
+ -7.392472743988037,
376
+ -7.418970584869385,
377
+ -7.446511745452881,
378
+ -7.445052623748779,
379
+ -7.42110538482666,
380
+ -7.461490631103516,
381
+ -7.509364128112793,
382
+ -7.508744716644287,
383
+ -7.480283260345459,
384
+ -7.561679363250732,
385
+ -7.4522271156311035,
386
+ -7.437519550323486,
387
+ -7.473508834838867,
388
+ -7.49954080581665,
389
+ -7.418591499328613,
390
+ -7.478306293487549,
391
+ -7.459006309509277,
392
+ -7.478801250457764,
393
+ -7.432499408721924,
394
+ -7.565118312835693,
395
+ -7.627929210662842,
396
+ -7.529797554016113,
397
+ -7.611763954162598,
398
+ -7.658102989196777,
399
+ -7.665148735046387,
400
+ -7.690982818603516,
401
+ -7.733800411224365,
402
+ -7.499368667602539,
403
+ -7.578357696533203,
404
+ -7.613222122192383,
405
+ -7.657804489135742,
406
+ -7.653645038604736,
407
+ -7.742368221282959
408
+ ],
409
+ "train_time_history": [
410
+ 4810.419310808182,
411
+ 4810.429551362991,
412
+ 4780.074353456497,
413
+ 4780.08434343338,
414
+ 4793.850719213486,
415
+ 4793.851686954498,
416
+ 4799.262031078339,
417
+ 4799.2719786167145,
418
+ 4776.265509605408,
419
+ 4776.275769710541,
420
+ 4800.915772199631,
421
+ 4800.925550937653,
422
+ 4782.19565987587,
423
+ 4870.2729279994965,
424
+ 4864.202353715897,
425
+ 5526.39341044426,
426
+ 5526.402764797211,
427
+ 5210.057184457779,
428
+ 5210.0663821697235,
429
+ 5192.114199876785,
430
+ 5192.115474700928,
431
+ 5119.568732976913,
432
+ 5119.579450130463,
433
+ 4854.391019105911,
434
+ 4854.4009165763855,
435
+ 4825.207883834839,
436
+ 4825.218036174774,
437
+ 4839.370161294937,
438
+ 4839.3797080516815,
439
+ 4829.168277978897,
440
+ 4829.178178310394,
441
+ 4831.754481077194,
442
+ 4831.764403104782,
443
+ 4840.167069673538,
444
+ 4840.1764142513275,
445
+ 4839.306309938431,
446
+ 4839.315984725952,
447
+ 4835.479310274124,
448
+ 4835.489530324936,
449
+ 4991.815203428268,
450
+ 4826.761980772018,
451
+ 4826.772101163864,
452
+ 4819.3496108055115,
453
+ 4819.358952999115,
454
+ 4820.984974384308,
455
+ 4820.995014190674,
456
+ 4820.539752483368,
457
+ 4820.548979997635,
458
+ 4814.873534917831,
459
+ 4814.876268863678,
460
+ 4812.354250907898,
461
+ 4812.363839626312,
462
+ 4822.391925573349,
463
+ 4822.40118765831,
464
+ 4809.686738491058,
465
+ 4809.697638034821,
466
+ 4832.5055372715,
467
+ 4832.515355587006,
468
+ 4831.67563867569,
469
+ 4831.685403108597,
470
+ 4824.845934391022,
471
+ 4824.85514998436,
472
+ 4835.57625246048,
473
+ 4835.587289094925,
474
+ 4817.744952201843,
475
+ 4817.7542552948,
476
+ 4807.804133653641,
477
+ 4807.814810037613,
478
+ 4818.521605968475,
479
+ 4818.532015800476,
480
+ 4981.354954957962,
481
+ 4981.368631839752,
482
+ 4875.586889028549,
483
+ 4875.597553014755,
484
+ 4801.111567258835,
485
+ 4801.1219182014465,
486
+ 4799.074081897736,
487
+ 4799.08514547348,
488
+ 4809.401276350021,
489
+ 4809.41465306282,
490
+ 4809.043102502823,
491
+ 4809.04475402832,
492
+ 4818.2070748806,
493
+ 4818.209503889084,
494
+ 4796.3679666519165,
495
+ 4796.377726793289,
496
+ 4794.153427362442,
497
+ 4794.155965805054,
498
+ 4804.1572597026825,
499
+ 4804.168130159378,
500
+ 4797.392125368118,
501
+ 4797.401923418045,
502
+ 4797.116873264313,
503
+ 4797.12747836113,
504
+ 4799.205674409866,
505
+ 4799.215870857239,
506
+ 4969.960748910904,
507
+ 4969.971879482269,
508
+ 5270.599810838699,
509
+ 5270.6101796627045,
510
+ 4881.989181518555,
511
+ 4882.000226974487,
512
+ 4867.6136746406555,
513
+ 4867.624637126923,
514
+ 5128.904933452606,
515
+ 5128.915862798691,
516
+ 4879.79870891571,
517
+ 4879.80947971344,
518
+ 4969.744366407394,
519
+ 4969.754128456116,
520
+ 4907.097052812576,
521
+ 4907.107843637466,
522
+ 4812.9132516384125,
523
+ 4812.9242560863495,
524
+ 4815.909214496613,
525
+ 4815.920344591141,
526
+ 4806.699935913086,
527
+ 4806.70260477066,
528
+ 4831.170897245407,
529
+ 4831.180289506912,
530
+ 4839.252681255341,
531
+ 4839.262135982513,
532
+ 4833.886634111404,
533
+ 4833.8898758888245,
534
+ 4830.524186134338,
535
+ 4830.53564286232,
536
+ 4824.74093079567,
537
+ 4824.747734546661,
538
+ 4818.754670858383,
539
+ 4818.764072179794,
540
+ 4816.966838121414,
541
+ 4816.977759599686,
542
+ 4831.582427740097,
543
+ 4831.592094898224,
544
+ 4804.51261639595,
545
+ 4993.473606586456,
546
+ 4815.601177692413,
547
+ 4815.610737085342,
548
+ 4789.788247346878,
549
+ 4865.854624032974,
550
+ 4865.864605426788,
551
+ 5030.177618980408,
552
+ 5030.188777208328,
553
+ 4769.904754638672,
554
+ 4895.086503267288,
555
+ 4895.0979063510895,
556
+ 4869.957269668579,
557
+ 4869.958615779877,
558
+ 4962.7930123806,
559
+ 4962.803097486496,
560
+ 5163.898764133453,
561
+ 5163.90244436264,
562
+ 4803.290739297867,
563
+ 4803.3002672195435,
564
+ 4819.002298593521,
565
+ 4819.014036178589,
566
+ 4812.0083973407745,
567
+ 5130.011174440384,
568
+ 5130.020927429199,
569
+ 5238.748838424683,
570
+ 5162.933927536011,
571
+ 5162.94544506073,
572
+ 5014.217702865601,
573
+ 5014.227581739426,
574
+ 5119.955267906189,
575
+ 5119.966482877731,
576
+ 4877.71505856514,
577
+ 4947.4076771736145,
578
+ 4947.418792486191,
579
+ 4980.132425069809,
580
+ 4980.143876552582,
581
+ 5166.483239412308,
582
+ 5166.49423623085,
583
+ 4906.088274717331,
584
+ 4906.0993638038635,
585
+ 4880.329564332962,
586
+ 4880.339328289032,
587
+ 4873.104112148285,
588
+ 4873.119816303253,
589
+ 4885.143585205078,
590
+ 5015.694309234619,
591
+ 5418.331888914108,
592
+ 5144.6408631801605,
593
+ 5144.652040481567,
594
+ 5510.665446281433,
595
+ 5510.677114725113,
596
+ 4798.924424171448,
597
+ 4798.935866594315,
598
+ 4812.6511833667755,
599
+ 4812.66309094429,
600
+ 4802.269027709961,
601
+ 4802.280553340912,
602
+ 4802.639967918396,
603
+ 4802.649654150009,
604
+ 4785.577591180801,
605
+ 4785.58954501152,
606
+ 4792.5177211761475,
607
+ 4792.5278561115265,
608
+ 4783.97540807724,
609
+ 4835.069321632385,
610
+ 4857.578319787979,
611
+ 4857.590538024902,
612
+ 4861.290355205536,
613
+ 4861.301545619965,
614
+ 4851.626524686813,
615
+ 4597.863308668137,
616
+ 4597.874926805496,
617
+ 4598.068494558334,
618
+ 4598.072705507278,
619
+ 4607.527726888657,
620
+ 4607.5391726493835,
621
+ 4593.976358413696,
622
+ 4593.987717866898,
623
+ 4608.605073928833,
624
+ 4608.616888284683,
625
+ 4604.218909025192,
626
+ 4604.230967283249,
627
+ 4601.031387329102,
628
+ 4936.647894382477,
629
+ 4936.660204172134,
630
+ 4616.291204214096,
631
+ 4828.3469569683075,
632
+ 5099.975877046585,
633
+ 5099.978621482849,
634
+ 5208.007155179977,
635
+ 5208.019089221954,
636
+ 4924.660996437073,
637
+ 4631.26912856102,
638
+ 4631.2805788517,
639
+ 4622.089585542679,
640
+ 4669.914644002914,
641
+ 4669.9266991615295,
642
+ 5124.021832227707,
643
+ 5124.033423900604,
644
+ 4878.845312595367,
645
+ 4878.8553302288055,
646
+ 4774.273676395416,
647
+ 4936.869963884354,
648
+ 4936.871921777725,
649
+ 4737.119180679321,
650
+ 4737.129071235657,
651
+ 4738.499984264374,
652
+ 4738.504971981049,
653
+ 4728.711101770401,
654
+ 4728.721209287643,
655
+ 4720.726502895355,
656
+ 4720.738488435745,
657
+ 4740.279819250107,
658
+ 4740.292397260666,
659
+ 4727.014559984207,
660
+ 4771.3221616744995,
661
+ 4771.332203388214,
662
+ 4708.109707355499,
663
+ 4708.121538639069,
664
+ 4709.580441951752,
665
+ 4709.592922925949,
666
+ 4704.416685819626,
667
+ 4704.427433013916,
668
+ 4723.0361750125885,
669
+ 4723.046160697937,
670
+ 4742.564235210419
671
+ ],
672
+ "valid_loss_history": [
673
+ -3.0233164174216136,
674
+ -4.020770004817417,
675
+ -4.493505137307303,
676
+ -4.805826323372977,
677
+ -5.009723663330078,
678
+ -5.3551515851702005,
679
+ -5.507791314806257,
680
+ -5.709285395486014,
681
+ -5.82812111718314,
682
+ -5.88963999067034,
683
+ -5.970332486288888,
684
+ -5.988547257014683,
685
+ -6.113276481628418,
686
+ -6.132954188755581,
687
+ -6.210943358285086,
688
+ -6.279647214072091,
689
+ -6.3300862312316895,
690
+ -6.36109277180263,
691
+ -6.427222183772495,
692
+ -6.453009741646903,
693
+ -6.489914894104004,
694
+ -6.48867974962507,
695
+ -6.536007336207798,
696
+ -6.532879625047956,
697
+ -6.572577135903495,
698
+ -6.5566478456769675,
699
+ -6.660695620945522,
700
+ -6.70451055254255,
701
+ -6.65756470816476,
702
+ -6.701659406934466,
703
+ -6.7815567425319125,
704
+ -6.804818085261753,
705
+ -6.783689567020962,
706
+ -6.844764641353062,
707
+ -6.868685790470669,
708
+ -6.888231481824603,
709
+ -6.942419528961182,
710
+ -6.951289176940918,
711
+ -6.975076675415039,
712
+ -6.991657052721296,
713
+ -7.000387941087995,
714
+ -7.082024574279785,
715
+ -7.087371553693499,
716
+ -7.114969117300851,
717
+ -7.163520812988281,
718
+ -7.1951784406389505,
719
+ -7.216815676007952,
720
+ -7.201807635171073,
721
+ -7.183896745954241,
722
+ -7.227273804800851,
723
+ -7.232961927141462,
724
+ -7.2955668313162665,
725
+ -7.2690509387425015,
726
+ -7.273542472294399,
727
+ -7.281754766191755,
728
+ -7.311358247484479,
729
+ -7.287418706076486,
730
+ -7.261871746608189,
731
+ -7.2840664727347235,
732
+ -7.316314697265625,
733
+ -7.376913070678711,
734
+ -7.367326668330601,
735
+ -7.438824789864676,
736
+ -7.427623748779297,
737
+ -7.45092739377703,
738
+ -7.4810590744018555,
739
+ -7.43196405683245,
740
+ -7.413298266274588,
741
+ -7.448171275002616,
742
+ -7.470413276127407,
743
+ -7.478131294250488,
744
+ -7.494483879634312,
745
+ -7.459411212376186,
746
+ -7.515866688319615,
747
+ -7.571803229195731,
748
+ -7.573634147644043,
749
+ -7.562024729592459,
750
+ -7.550929818834577,
751
+ -7.53609037399292,
752
+ -7.563671180180141,
753
+ -7.578108038221087,
754
+ -7.531997203826904,
755
+ -7.600094999585833,
756
+ -7.639314787728446,
757
+ -7.65882362638201,
758
+ -7.624989918300083,
759
+ -7.5906588690621515,
760
+ -7.606484276907785,
761
+ -7.59099394934518,
762
+ -7.611005442483084,
763
+ -7.649059908730643,
764
+ -7.675145898546491,
765
+ -7.672644138336182,
766
+ -7.692336968013218,
767
+ -7.745299407414028,
768
+ -7.703637259347098,
769
+ -7.7217142922537665,
770
+ -7.755190917423794,
771
+ -7.724456242152622,
772
+ -7.718445096697126,
773
+ -7.70384795325143,
774
+ -7.769181455884661,
775
+ -7.737368106842041,
776
+ -7.775186266217913,
777
+ -7.79545715876988,
778
+ -7.832847731454032,
779
+ -7.88108880179269,
780
+ -7.83744832447597,
781
+ -7.77746193749564,
782
+ -7.796889645712716,
783
+ -7.857666560581753,
784
+ -7.839055401938302,
785
+ -7.861930165972028,
786
+ -7.839671952383859,
787
+ -7.864107472555978,
788
+ -7.894429411206927,
789
+ -7.918878350939069,
790
+ -7.923129831041608,
791
+ -7.894604206085205,
792
+ -7.947047778538296,
793
+ -8.002508435930524,
794
+ -7.973720823015485,
795
+ -7.974254812513079,
796
+ -8.013154642922538,
797
+ -8.002701963697161,
798
+ -8.019224030630928,
799
+ -8.05652904510498,
800
+ -8.018050602504186,
801
+ -7.991105215890067,
802
+ -8.010712623596191,
803
+ -8.022557122366768,
804
+ -8.00721972329276,
805
+ -8.078724997384208,
806
+ -8.01572070802961,
807
+ -8.054968425205775,
808
+ -7.982895714896066,
809
+ -7.920760086604527,
810
+ -7.992146287645612
811
+ ]
812
+ }
MedleyVox-MultiSinger/singing_librispeech_ft_iSRNet/loss_graph_vocals.png ADDED
MedleyVox-MultiSinger/singing_librispeech_ft_iSRNet/vocals.json ADDED
@@ -0,0 +1,1321 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "args": {
3
+ "above_freq": 3000.0,
4
+ "architecture": "conv_tasnet_stft",
5
+ "batch_size": 5,
6
+ "beta1": 0.5,
7
+ "beta2": 0.9,
8
+ "bn_chan": 256,
9
+ "continual_train": true,
10
+ "dataset": "singing_librispeech",
11
+ "db_normalize": false,
12
+ "ema": true,
13
+ "encoder_activation": null,
14
+ "entity": "carson2050",
15
+ "epochs": 280,
16
+ "eps": 1e-08,
17
+ "exp_name": "singin_librispeech_ft_iSRNet",
18
+ "ff_activation": "relu",
19
+ "gpu": 0,
20
+ "gradient_clip": 5.0,
21
+ "hid_chan": 1024,
22
+ "load_ema_online_model": false,
23
+ "lr": 2e-05,
24
+ "lr_decay_gamma": 0.5,
25
+ "lr_decay_patience": 3,
26
+ "lr_scheduler": "cos_warmup",
27
+ "mask_act": "linear",
28
+ "max_n_src": 4,
29
+ "min_n_src": 2,
30
+ "mixed_precision": false,
31
+ "mixture_consistency": "sfsrnet",
32
+ "multi_spec_loss_log_scale": false,
33
+ "n_blocks": 6,
34
+ "n_filter": 512,
35
+ "n_kernel": 512,
36
+ "n_nodes": 1,
37
+ "n_repeats": 4,
38
+ "n_src": 2,
39
+ "nb_workers": 8,
40
+ "nfft": 2048,
41
+ "ngpus_per_node": 1,
42
+ "nhop": 512,
43
+ "no_cuda": false,
44
+ "no_mask": false,
45
+ "no_mask_residual": false,
46
+ "optimizer": "adam",
47
+ "output": "/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/MedleyVox/results/singing_sep/checkpoint/singin_librispeech_ft_iSRNet",
48
+ "output_directory": "/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/MedleyVox/results/singing_sep",
49
+ "part_of_data": null,
50
+ "patience": 15,
51
+ "pitch_formant_augment_prob": 0.4,
52
+ "port": null,
53
+ "project": "MedleyVox_home",
54
+ "quiet": false,
55
+ "rank": 0,
56
+ "reduced_training_data_ratio": 0.1,
57
+ "resume": "results/singing_sep/checkpoint/singing_librispeech_ft2",
58
+ "same_singer_dict_path": [
59
+ [
60
+ "../data/24k/OpenSinger",
61
+ "./svs/preprocess/make_same_singer_dict/same_singer_OpenSinger.json",
62
+ "OpenSinger"
63
+ ],
64
+ [
65
+ "../data/24k/k_multisinger",
66
+ "./svs/preprocess/make_same_singer_dict/same_singer_k_multisinger.json",
67
+ "k_multisinger"
68
+ ],
69
+ [
70
+ "../data/24k/CSD",
71
+ "./svs/preprocess/make_same_singer_dict/same_singer_CSD.json",
72
+ "CSD"
73
+ ],
74
+ [
75
+ "../data/24k/jsut-song_ver1",
76
+ "./svs/preprocess/make_same_singer_dict/same_singer_jsut-song_ver1.json",
77
+ "jsut-song_ver1"
78
+ ],
79
+ [
80
+ "../data/24k/jvs_music_ver1",
81
+ "./svs/preprocess/make_same_singer_dict/same_singer_jvs_music_ver1.json",
82
+ "jvs_music_ver1"
83
+ ],
84
+ [
85
+ "../data/24k/k_multitimbre",
86
+ "./svs/preprocess/make_same_singer_dict/same_singer_k_multitimbre.json",
87
+ "k_multitimbre"
88
+ ],
89
+ [
90
+ "../data/24k/kiritan_revised",
91
+ "./svs/preprocess/make_same_singer_dict/same_singer_kiritan.json",
92
+ "kiritan"
93
+ ],
94
+ [
95
+ "../data/24k/musdb_a_train",
96
+ "./svs/preprocess/make_same_singer_dict/same_singer_musdb_a_train.json",
97
+ "musdb_a_train"
98
+ ],
99
+ [
100
+ "../data/24k/NUS",
101
+ "./svs/preprocess/make_same_singer_dict/same_singer_NUS.json",
102
+ "NUS"
103
+ ],
104
+ [
105
+ "../data/24k/VocalSet",
106
+ "./svs/preprocess/make_same_singer_dict/same_singer_VocalSet.json",
107
+ "VocalSet"
108
+ ]
109
+ ],
110
+ "same_singer_ratio": 0.2,
111
+ "same_song_dict_path": [
112
+ [
113
+ "../data/24k/k_multisinger",
114
+ "./svs/preprocess/make_same_song_dict/same_song_k_multisinger.json",
115
+ "k_multisinger"
116
+ ]
117
+ ],
118
+ "same_song_ratio": 0.2,
119
+ "same_speaker_dict_path": [
120
+ [
121
+ "../data/24k/LibriSpeech_train-clean-100",
122
+ "./svs/preprocess/make_same_speaker_dict/same_singer_LibriSpeech_train-clean-100.json",
123
+ "LibriSpeech_train-clean-100"
124
+ ],
125
+ [
126
+ "../data/24k/LibriSpeech_train-clean-360",
127
+ "./svs/preprocess/make_same_speaker_dict/same_singer_LibriSpeech_train-clean-360.json",
128
+ "LibriSpeech_train-clean-360"
129
+ ]
130
+ ],
131
+ "same_speaker_ratio": 0.15,
132
+ "sample_rate": 24000,
133
+ "seed": 777,
134
+ "seq_dur": 3.0,
135
+ "sing_sing_ratio": 0.15,
136
+ "sing_speech_ratio": 0.15,
137
+ "skip_chan": 256,
138
+ "song_length_dict_path": "./svs/preprocess/song_length_dict_24k.json",
139
+ "speech_train_root": [
140
+ "../data/24k/LibriSpeech_train-clean-360",
141
+ "../data/24k/LibriSpeech_train-clean-100"
142
+ ],
143
+ "sr_input_res": false,
144
+ "sr_out_mix_consistency": false,
145
+ "srnet": "convnext",
146
+ "start_from_best": true,
147
+ "sweep": false,
148
+ "target": "vocals",
149
+ "train_loss_func": [
150
+ "pit_snr",
151
+ "multi_spectral_l1",
152
+ "snr"
153
+ ],
154
+ "train_root": [
155
+ "../data/24k/CSD",
156
+ "../data/24k/NUS",
157
+ "../data/24k/TONAS",
158
+ "../data/24k/VocalSet",
159
+ "../data/24k/jsut-song_ver1",
160
+ "../data/24k/jvs_music_ver1",
161
+ "../data/24k/kiritan_revised",
162
+ "../data/24k/vocadito",
163
+ "../data/24k/musdb_a_train",
164
+ "../data/24k/OpenSinger",
165
+ "../data/24k/medleyDB_v1_in_musdb",
166
+ "../data/24k/k_multisinger",
167
+ "../data/24k/k_multitimbre"
168
+ ],
169
+ "unison_prob": 0.3,
170
+ "use_wandb": true,
171
+ "valid_loss_func": [
172
+ "pit_si_sdr"
173
+ ],
174
+ "valid_regions_dict_path": "./svs/preprocess/valid_regions_dict_singing_singing.json",
175
+ "valid_root": [
176
+ [
177
+ "../data/24k/musdb_a_test",
178
+ "../data/24k/musdb_a_test",
179
+ "./svs/preprocess/make_validation_dict/for_2_srcs/valid_regions_dict_singing_singing.json",
180
+ "sing_sing_diff"
181
+ ],
182
+ [
183
+ "../data/24k/musdb_a_test",
184
+ "../data/24k/musdb_a_test",
185
+ "./svs/preprocess/make_validation_dict/for_2_srcs/valid_regions_dict_singing_unison.json",
186
+ "sing_sing_unison"
187
+ ],
188
+ [
189
+ "../data/24k/musdb_a_test",
190
+ "../data/24k/musdb_a_test",
191
+ "./svs/preprocess/make_validation_dict/for_2_srcs/valid_regions_dict_singing_singing_same_singer.json",
192
+ "sing_sing_same_singer"
193
+ ],
194
+ [
195
+ "../data/24k/LibriSpeech_dev-clean",
196
+ "../data/24k/LibriSpeech_dev-clean",
197
+ "./svs/preprocess/make_validation_dict/for_2_srcs/valid_regions_dict_speech_speech.json",
198
+ "speech_speech_diff"
199
+ ],
200
+ [
201
+ "../data/24k/LibriSpeech_dev-clean",
202
+ "../data/24k/LibriSpeech_dev-clean",
203
+ "./svs/preprocess/make_validation_dict/for_2_srcs/valid_regions_dict_speech_unison.json",
204
+ "speech_speech_unison"
205
+ ],
206
+ [
207
+ "../data/24k/LibriSpeech_dev-clean",
208
+ "../data/24k/LibriSpeech_dev-clean",
209
+ "./svs/preprocess/make_validation_dict/for_2_srcs/valid_regions_dict_speech_speech_same_speaker.json",
210
+ "speech_speech_same_speaker"
211
+ ],
212
+ [
213
+ "../data/24k/musdb_a_test",
214
+ "../data/24k/LibriSpeech_dev-clean",
215
+ "./svs/preprocess/make_validation_dict/for_2_srcs/valid_regions_dict_singing_speech.json",
216
+ "singing_speech"
217
+ ]
218
+ ],
219
+ "valid_root_orpit": [
220
+ [
221
+ "../data/24k/musdb_a_test",
222
+ "../data/24k/musdb_a_test",
223
+ "./svs/preprocess/make_validation_dict/for_n_srcs/valid_regions_dict_singing_singing_n_srcs.json",
224
+ "sing_sing_diff"
225
+ ],
226
+ [
227
+ "../data/24k/musdb_a_test",
228
+ "../data/24k/musdb_a_test",
229
+ "./svs/preprocess/make_validation_dict/for_n_srcs/valid_regions_dict_singing_unison_n_srcs.json",
230
+ "sing_sing_unison"
231
+ ],
232
+ [
233
+ "../data/24k/musdb_a_test",
234
+ "../data/24k/musdb_a_test",
235
+ "./svs/preprocess/make_validation_dict/for_n_srcs/valid_regions_dict_singing_singing_same_singer_n_srcs.json",
236
+ "sing_sing_same_singer"
237
+ ],
238
+ [
239
+ "../data/24k/LibriSpeech_dev-clean",
240
+ "../data/24k/LibriSpeech_dev-clean",
241
+ "./svs/preprocess/make_validation_dict/for_n_srcs/valid_regions_dict_speech_speech_n_srcs.json",
242
+ "speech_speech_diff"
243
+ ],
244
+ [
245
+ "../data/24k/LibriSpeech_dev-clean",
246
+ "../data/24k/LibriSpeech_dev-clean",
247
+ "./svs/preprocess/make_validation_dict/for_n_srcs/valid_regions_dict_speech_unison_n_srcs.json",
248
+ "speech_speech_unison"
249
+ ],
250
+ [
251
+ "../data/24k/LibriSpeech_dev-clean",
252
+ "../data/24k/LibriSpeech_dev-clean",
253
+ "./svs/preprocess/make_validation_dict/for_n_srcs/valid_regions_dict_speech_speech_same_speaker_n_srcs.json",
254
+ "speech_speech_same_speaker"
255
+ ],
256
+ [
257
+ "../data/24k/musdb_a_test",
258
+ "../data/24k/LibriSpeech_dev-clean",
259
+ "./svs/preprocess/make_validation_dict/for_n_srcs/valid_regions_dict_singing_speech_n_srcs.json",
260
+ "singing_speech"
261
+ ]
262
+ ],
263
+ "weight_decay": 1e-06,
264
+ "world_size": 1
265
+ },
266
+ "best_epoch": 267,
267
+ "best_loss": -9.572482517787389,
268
+ "epochs_trained": 267,
269
+ "num_bad_epochs": 0,
270
+ "train_loss_history": [
271
+ -1.3311041593551636,
272
+ -3.4447357654571533,
273
+ -4.284253120422363,
274
+ -4.726616382598877,
275
+ -5.099369049072266,
276
+ -5.331325054168701,
277
+ -5.553539752960205,
278
+ -5.740077018737793,
279
+ -5.918744087219238,
280
+ -6.005505561828613,
281
+ -6.201973915100098,
282
+ -6.26826286315918,
283
+ -6.3942413330078125,
284
+ -6.4803619384765625,
285
+ -6.592747688293457,
286
+ -6.6781134605407715,
287
+ -6.777161121368408,
288
+ -6.848526477813721,
289
+ -6.911881923675537,
290
+ -7.017796993255615,
291
+ -7.12304162979126,
292
+ -7.14536190032959,
293
+ -7.289445400238037,
294
+ -7.409412384033203,
295
+ -7.7652082443237305,
296
+ -7.837531089782715,
297
+ -7.850446701049805,
298
+ -7.941095352172852,
299
+ -7.939220428466797,
300
+ -8.047593116760254,
301
+ -8.07531452178955,
302
+ -8.134244918823242,
303
+ -8.143590927124023,
304
+ -8.190814018249512,
305
+ -8.217510223388672,
306
+ -8.175138473510742,
307
+ -7.989644527435303,
308
+ -8.09794807434082,
309
+ -8.24197006225586,
310
+ -8.232804298400879,
311
+ -8.328511238098145,
312
+ -8.389233589172363,
313
+ -8.267472267150879,
314
+ -8.301199913024902,
315
+ -8.36364459991455,
316
+ -8.43917465209961,
317
+ -8.493982315063477,
318
+ -8.481128692626953,
319
+ -8.429868698120117,
320
+ -8.501734733581543,
321
+ -8.54090404510498,
322
+ -8.568470001220703,
323
+ -8.50845718383789,
324
+ -8.597081184387207,
325
+ -8.513223648071289,
326
+ -8.38924503326416,
327
+ -8.502962112426758,
328
+ -8.518073081970215,
329
+ -8.56679916381836,
330
+ -8.698277473449707,
331
+ -8.630810737609863,
332
+ -8.755276679992676,
333
+ -8.700800895690918,
334
+ -8.74862003326416,
335
+ -8.734071731567383,
336
+ -8.633768081665039,
337
+ -8.633097648620605,
338
+ -8.872031211853027,
339
+ -8.828736305236816,
340
+ -8.753975868225098,
341
+ -8.886126518249512,
342
+ -8.758654594421387,
343
+ -8.883810997009277,
344
+ -8.952722549438477,
345
+ -8.945046424865723,
346
+ -8.907071113586426,
347
+ -8.891634941101074,
348
+ -8.91631031036377,
349
+ -8.951156616210938,
350
+ -8.931319236755371,
351
+ -8.960397720336914,
352
+ -8.841835975646973,
353
+ -8.834044456481934,
354
+ -8.786222457885742,
355
+ -8.903646469116211,
356
+ -8.947869300842285,
357
+ -8.696074485778809,
358
+ -8.99515438079834,
359
+ -9.005078315734863,
360
+ -8.934849739074707,
361
+ -8.99370002746582,
362
+ -9.030400276184082,
363
+ -9.101688385009766,
364
+ -9.08572006225586,
365
+ -9.075435638427734,
366
+ -9.125774383544922,
367
+ -9.102258682250977,
368
+ -9.160833358764648,
369
+ -8.999387741088867,
370
+ -8.929178237915039,
371
+ -9.085306167602539,
372
+ -9.149312019348145,
373
+ -9.201435089111328,
374
+ -9.119452476501465,
375
+ -9.192963600158691,
376
+ -9.153352737426758,
377
+ -9.16665267944336,
378
+ -9.187670707702637,
379
+ -9.213151931762695,
380
+ -9.295731544494629,
381
+ -9.204228401184082,
382
+ -9.2329683303833,
383
+ -9.198917388916016,
384
+ -9.242225646972656,
385
+ -9.251509666442871,
386
+ -9.233222007751465,
387
+ -9.235602378845215,
388
+ -9.264388084411621,
389
+ -9.286247253417969,
390
+ -9.287186622619629,
391
+ -9.327977180480957,
392
+ -9.304702758789062,
393
+ -9.34760570526123,
394
+ -9.314836502075195,
395
+ -9.300081253051758,
396
+ -9.20028018951416,
397
+ -9.35509967803955,
398
+ -9.345370292663574,
399
+ -9.36442756652832,
400
+ -9.351317405700684,
401
+ -9.352913856506348,
402
+ -9.388010025024414,
403
+ -9.326189994812012,
404
+ -9.411141395568848,
405
+ -9.424927711486816,
406
+ -9.376615524291992,
407
+ -9.394768714904785,
408
+ -9.382343292236328,
409
+ -9.345908164978027,
410
+ -9.387025833129883,
411
+ -9.397958755493164,
412
+ -9.370079040527344,
413
+ -9.419344902038574,
414
+ -9.414657592773438,
415
+ -9.450013160705566,
416
+ -9.424891471862793,
417
+ -9.468652725219727,
418
+ -9.437067031860352,
419
+ -9.452010154724121,
420
+ -9.476055145263672,
421
+ -9.454631805419922,
422
+ -9.519726753234863,
423
+ -9.494053840637207,
424
+ -9.349456787109375,
425
+ -9.444249153137207,
426
+ -9.432062149047852,
427
+ -9.469500541687012,
428
+ -9.506385803222656,
429
+ -9.541167259216309,
430
+ -9.514572143554688,
431
+ -9.517498016357422,
432
+ -9.508042335510254,
433
+ -9.524667739868164,
434
+ -9.513023376464844,
435
+ -9.518259048461914,
436
+ -9.491355895996094,
437
+ -9.527623176574707,
438
+ -9.503666877746582,
439
+ -9.575556755065918,
440
+ -9.51135540008545,
441
+ -9.574329376220703,
442
+ -9.559322357177734,
443
+ -9.576539993286133,
444
+ -9.587591171264648,
445
+ -9.615789413452148,
446
+ -9.586484909057617,
447
+ -9.597373008728027,
448
+ -9.565719604492188,
449
+ -9.580348014831543,
450
+ -9.544068336486816,
451
+ -9.576735496520996,
452
+ -9.617915153503418,
453
+ -9.634200096130371,
454
+ -9.50833511352539,
455
+ -9.633086204528809,
456
+ -9.622976303100586,
457
+ -9.628181457519531,
458
+ -9.385575294494629,
459
+ -9.312309265136719,
460
+ -8.996809005737305,
461
+ -9.591567993164062,
462
+ -9.602102279663086,
463
+ -9.606905937194824,
464
+ -9.660425186157227,
465
+ -9.59228229522705,
466
+ -9.66215991973877,
467
+ -9.652912139892578,
468
+ -9.683008193969727,
469
+ -9.550703048706055,
470
+ -9.616209983825684,
471
+ -9.262633323669434,
472
+ -9.222973823547363,
473
+ -9.16146469116211,
474
+ -9.264670372009277,
475
+ -9.241007804870605,
476
+ -9.89056396484375,
477
+ -9.639961242675781,
478
+ -9.945752143859863,
479
+ -10.058592796325684,
480
+ -9.94412899017334,
481
+ -9.352773666381836,
482
+ -9.2145357131958,
483
+ -9.298417091369629,
484
+ -9.264565467834473,
485
+ -9.239808082580566,
486
+ -9.254988670349121,
487
+ -9.295654296875,
488
+ -9.311361312866211,
489
+ -9.360262870788574,
490
+ -9.345294952392578,
491
+ -9.313271522521973,
492
+ -9.594743728637695,
493
+ -9.626945495605469,
494
+ -9.65655517578125,
495
+ -9.62312126159668,
496
+ -9.634873390197754,
497
+ -9.6712007522583,
498
+ -9.6635160446167,
499
+ -9.65237045288086,
500
+ -9.622418403625488,
501
+ -9.854077339172363,
502
+ -9.864712715148926,
503
+ -9.863157272338867,
504
+ -9.855356216430664,
505
+ -9.89089584350586,
506
+ -9.856369972229004,
507
+ -9.876996040344238,
508
+ -9.889692306518555,
509
+ -9.916558265686035,
510
+ -10.030950546264648,
511
+ -10.064481735229492,
512
+ -10.070244789123535,
513
+ -10.073690414428711,
514
+ -10.170597076416016,
515
+ -10.180663108825684,
516
+ -10.210295677185059,
517
+ -10.190850257873535,
518
+ -10.214118957519531,
519
+ -7.201298236846924,
520
+ -8.108235359191895,
521
+ -8.210838317871094,
522
+ -8.138957023620605,
523
+ -8.41263484954834,
524
+ -8.359173774719238,
525
+ -8.656364440917969,
526
+ -8.237541198730469,
527
+ -8.040386199951172,
528
+ -8.405668258666992,
529
+ -8.21469497680664,
530
+ -8.536149978637695,
531
+ -8.825751304626465,
532
+ -8.615899085998535,
533
+ -8.655414581298828,
534
+ -8.60315990447998,
535
+ -8.940108299255371,
536
+ -9.022863388061523,
537
+ -8.983457565307617
538
+ ],
539
+ "train_time_history": [
540
+ 4284.811353683472,
541
+ 4284.813168525696,
542
+ 4239.820109844208,
543
+ 4358.5235912799835,
544
+ 4358.525362968445,
545
+ 4289.520437240601,
546
+ 4289.5296330451965,
547
+ 4233.677313089371,
548
+ 4233.679363965988,
549
+ 4209.371140003204,
550
+ 4209.381086587906,
551
+ 4202.905996799469,
552
+ 4469.978202342987,
553
+ 4469.989181756973,
554
+ 4247.160337924957,
555
+ 4247.1704177856445,
556
+ 4190.890568256378,
557
+ 4190.900403022766,
558
+ 4185.636907577515,
559
+ 4185.647009372711,
560
+ 4180.687466144562,
561
+ 4215.30419754982,
562
+ 4215.314230442047,
563
+ 4206.753845453262,
564
+ 4206.76371717453,
565
+ 4206.280591726303,
566
+ 4206.290879011154,
567
+ 4222.331785202026,
568
+ 4222.341979503632,
569
+ 4220.95298576355,
570
+ 4220.962949752808,
571
+ 4199.66743016243,
572
+ 4199.67768073082,
573
+ 4200.696933507919,
574
+ 4200.706924915314,
575
+ 4200.071183204651,
576
+ 4200.073669195175,
577
+ 4201.461757183075,
578
+ 4201.47197842598,
579
+ 4212.675180196762,
580
+ 4212.685215473175,
581
+ 4266.539958238602,
582
+ 4266.55042219162,
583
+ 4254.028660058975,
584
+ 4254.030869007111,
585
+ 4586.545968532562,
586
+ 4586.556686401367,
587
+ 4486.801070451736,
588
+ 4486.811651468277,
589
+ 4201.306690454483,
590
+ 4201.308066606522,
591
+ 4204.077554225922,
592
+ 4204.087781906128,
593
+ 4194.944247722626,
594
+ 4194.954358577728,
595
+ 4193.961704969406,
596
+ 4222.590797185898,
597
+ 4222.594073057175,
598
+ 4221.6570999622345,
599
+ 4221.666466474533,
600
+ 4221.045345544815,
601
+ 4221.055670261383,
602
+ 4214.11606669426,
603
+ 4214.125596284866,
604
+ 4479.404296398163,
605
+ 4479.414994955063,
606
+ 4262.62514591217,
607
+ 4262.635618209839,
608
+ 4214.268101215363,
609
+ 4214.2785403728485,
610
+ 4218.142910718918,
611
+ 4218.15364408493,
612
+ 4215.917347192764,
613
+ 4215.927803516388,
614
+ 4218.397645950317,
615
+ 4218.408536672592,
616
+ 4233.58446598053,
617
+ 4233.59490442276,
618
+ 4318.161808013916,
619
+ 4318.171140432358,
620
+ 4237.026048898697,
621
+ 4237.036669015884,
622
+ 4220.925004482269,
623
+ 4220.9352016448975,
624
+ 4226.221168041229,
625
+ 4223.1825070381165,
626
+ 4223.192782878876,
627
+ 4219.102268218994,
628
+ 4219.113127231598,
629
+ 4216.297616004944,
630
+ 4216.308108329773,
631
+ 4217.926244974136,
632
+ 4217.937202453613,
633
+ 4426.571401119232,
634
+ 4426.573066711426,
635
+ 4612.790915489197,
636
+ 4612.801674365997,
637
+ 4719.1595368385315,
638
+ 4719.169989824295,
639
+ 4305.255445480347,
640
+ 4305.266388177872,
641
+ 4221.674624681473,
642
+ 4221.686189174652,
643
+ 4229.138904571533,
644
+ 4178.568962574005,
645
+ 4178.5717051029205,
646
+ 4178.647545337677,
647
+ 4178.650447130203,
648
+ 4169.984578132629,
649
+ 4169.995152950287,
650
+ 4173.8019506931305,
651
+ 4173.804402589798,
652
+ 4179.692799806595,
653
+ 4179.695784330368,
654
+ 4176.926806688309,
655
+ 4176.937863111496,
656
+ 4189.7040383815765,
657
+ 4189.7144474983215,
658
+ 4194.854960680008,
659
+ 4194.8661851882935,
660
+ 4488.314256668091,
661
+ 4488.324142932892,
662
+ 4301.72206735611,
663
+ 4301.732882022858,
664
+ 4203.297667264938,
665
+ 4203.307426214218,
666
+ 4212.263510465622,
667
+ 4212.2729642391205,
668
+ 4202.838434457779,
669
+ 4202.8495717048645,
670
+ 4206.559844255447,
671
+ 4206.570970535278,
672
+ 4202.594026565552,
673
+ 4202.6052367687225,
674
+ 4204.671685695648,
675
+ 4204.675058603287,
676
+ 4201.653420209885,
677
+ 4201.664590358734,
678
+ 4203.356340646744,
679
+ 4203.3675968647,
680
+ 4226.834460258484,
681
+ 4226.84539103508,
682
+ 4432.4133422374725,
683
+ 4432.424476385117,
684
+ 4194.520195245743,
685
+ 4194.531393289566,
686
+ 4185.361557483673,
687
+ 4185.372809171677,
688
+ 4178.024575471878,
689
+ 4178.035531282425,
690
+ 4183.264570951462,
691
+ 4183.275583267212,
692
+ 4178.5521404743195,
693
+ 4178.563311338425,
694
+ 4178.228582620621,
695
+ 4178.238200426102,
696
+ 4181.432615280151,
697
+ 4181.443482160568,
698
+ 4181.636572599411,
699
+ 4181.647958517075,
700
+ 4180.119422197342,
701
+ 4180.130319356918,
702
+ 4181.348428249359,
703
+ 4181.3601496219635,
704
+ 4182.4969573020935,
705
+ 4182.508371829987,
706
+ 4255.815136909485,
707
+ 4255.824706077576,
708
+ 4447.2853989601135,
709
+ 4447.294949054718,
710
+ 4375.476977586746,
711
+ 4375.488611936569,
712
+ 4216.147409915924,
713
+ 4216.157112836838,
714
+ 4184.855574131012,
715
+ 4184.867551803589,
716
+ 4182.2731301784515,
717
+ 4182.284587860107,
718
+ 4182.427225112915,
719
+ 4182.438867807388,
720
+ 4181.939938545227,
721
+ 4181.951656103134,
722
+ 4183.5050485134125,
723
+ 4183.516293287277,
724
+ 4180.313590764999,
725
+ 4180.325238704681,
726
+ 4184.185824394226,
727
+ 4184.196978807449,
728
+ 4175.860624790192,
729
+ 4175.8725233078,
730
+ 4174.206290960312,
731
+ 4174.217987298965,
732
+ 4225.280811309814,
733
+ 4346.787808179855,
734
+ 4346.791662693024,
735
+ 4299.203949213028,
736
+ 4334.719336986542,
737
+ 4334.72660279274,
738
+ 4307.453342437744,
739
+ 4307.463569164276,
740
+ 4243.263749361038,
741
+ 4243.27504825592,
742
+ 4232.403777837753,
743
+ 4232.415019750595,
744
+ 4234.023860692978,
745
+ 4234.036010503769,
746
+ 4232.419568777084,
747
+ 4232.430717229843,
748
+ 4228.692707538605,
749
+ 4228.695293188095,
750
+ 4235.275017976761,
751
+ 4235.286781549454,
752
+ 4231.93186712265,
753
+ 4231.934266328812,
754
+ 4237.727004766464,
755
+ 4237.736963748932,
756
+ 4448.2472088336945,
757
+ 4448.257912635803,
758
+ 4283.024597644806,
759
+ 4283.03609752655,
760
+ 4270.3121337890625,
761
+ 4270.324274778366,
762
+ 4244.299434423447,
763
+ 4244.311620950699,
764
+ 4363.46278834343,
765
+ 4180.62579703331,
766
+ 4180.635629653931,
767
+ 4363.069185256958,
768
+ 4220.090236663818,
769
+ 4220.102267503738,
770
+ 4190.208593130112,
771
+ 4190.220735549927,
772
+ 4181.494255304337,
773
+ 4181.50580906868,
774
+ 4186.210835933685,
775
+ 4186.214511394501,
776
+ 4188.612834215164,
777
+ 4188.625131607056,
778
+ 4182.178534984589,
779
+ 4182.189949512482,
780
+ 4183.857384443283,
781
+ 4183.869287014008,
782
+ 4183.761756181717,
783
+ 4241.330404281616,
784
+ 4241.341110467911,
785
+ 4207.978038311005,
786
+ 4207.990997314453,
787
+ 4209.410867214203,
788
+ 4209.421168088913,
789
+ 4207.717931270599,
790
+ 4207.730401754379,
791
+ 4204.301562309265,
792
+ 4204.313354253769,
793
+ 4297.861345052719,
794
+ 4297.873908042908,
795
+ 4282.807532548904,
796
+ 4282.820100307465,
797
+ 4269.668355226517,
798
+ 4269.680841684341,
799
+ 4198.918546676636,
800
+ 4198.928604364395,
801
+ 4239.654682636261,
802
+ 4239.659080028534,
803
+ 4419.87956905365,
804
+ 4419.889652013779,
805
+ 4302.591921806335,
806
+ 4302.60400891304,
807
+ 4199.097110033035,
808
+ 4199.109765052795,
809
+ 4202.586899995804,
810
+ 4202.596865415573,
811
+ 4223.580963373184,
812
+ 4236.571214199066,
813
+ 4236.583789110184,
814
+ 4266.631365537643,
815
+ 4266.643340587616,
816
+ 4206.533836603165,
817
+ 4206.543870687485,
818
+ 4196.797498226166,
819
+ 4196.809820890427,
820
+ 4202.778592824936,
821
+ 4202.791028261185,
822
+ 4200.911655426025,
823
+ 4200.922192811966,
824
+ 4218.757748126984,
825
+ 4218.7700316905975,
826
+ 4197.834621667862,
827
+ 4197.8472237586975,
828
+ 4194.553659200668,
829
+ 4194.558137655258,
830
+ 4210.2872478961945,
831
+ 4210.291656970978,
832
+ 4269.952535390854,
833
+ 4269.963551998138,
834
+ 4214.965420722961,
835
+ 4214.9777710437775,
836
+ 4268.254637956619,
837
+ 4268.267082452774,
838
+ 4188.457591295242,
839
+ 4188.467690706253,
840
+ 4188.935349225998,
841
+ 4188.947833776474,
842
+ 4192.73951125145,
843
+ 4192.749709367752,
844
+ 4188.534428119659,
845
+ 4188.53829908371,
846
+ 4196.497691392899,
847
+ 4196.510225534439,
848
+ 4318.416720151901,
849
+ 4318.4267864227295,
850
+ 4209.298709154129,
851
+ 4204.6052923202515,
852
+ 4204.609621763229,
853
+ 4192.598699092865,
854
+ 4192.6110072135925,
855
+ 4264.5488522052765,
856
+ 4264.562687158585,
857
+ 4342.3707575798035,
858
+ 4342.3756980896,
859
+ 4299.415410995483,
860
+ 4299.425767421722,
861
+ 4285.986501693726,
862
+ 4285.999414205551,
863
+ 4251.881839513779,
864
+ 4251.89198923111,
865
+ 4217.251371145248,
866
+ 4217.262971401215,
867
+ 4265.004074335098,
868
+ 4265.016601800919,
869
+ 4422.643936634064,
870
+ 4453.576984167099,
871
+ 4453.588968753815,
872
+ 4183.795456409454,
873
+ 4183.80871462822,
874
+ 4183.177849292755,
875
+ 4183.1909646987915,
876
+ 4190.727601289749,
877
+ 4190.740168809891,
878
+ 4185.585786104202,
879
+ 4185.596675872803,
880
+ 4186.326423406601,
881
+ 4186.3365132808685,
882
+ 4188.701127767563,
883
+ 4188.713495969772,
884
+ 4183.693524837494,
885
+ 4183.706875085831,
886
+ 4182.603164672852,
887
+ 4182.169225692749,
888
+ 4182.182250261307,
889
+ 4183.1377918720245,
890
+ 4183.142628669739,
891
+ 4179.616315603256,
892
+ 4179.626562833786,
893
+ 4304.994537830353,
894
+ 4305.007478475571,
895
+ 4361.554908275604,
896
+ 4361.56044960022,
897
+ 4368.104673624039,
898
+ 4368.11031460762,
899
+ 4246.525162935257,
900
+ 4246.5380046367645,
901
+ 4183.925352096558,
902
+ 4232.265904188156,
903
+ 4232.277180671692,
904
+ 4238.892568349838,
905
+ 4238.905729055405,
906
+ 4187.827491521835,
907
+ 4187.84108877182,
908
+ 4190.126079082489,
909
+ 4190.13965845108,
910
+ 4190.435103654861,
911
+ 4190.440406799316,
912
+ 4191.884477853775,
913
+ 4191.897578239441,
914
+ 4187.4977107048035,
915
+ 4172.838095903397,
916
+ 4172.843760967255,
917
+ 4177.684302330017,
918
+ 4177.6969130039215,
919
+ 4172.654875993729,
920
+ 4172.667930603027,
921
+ 4174.483522415161,
922
+ 4174.496375083923,
923
+ 4166.372047901154,
924
+ 4166.384793281555,
925
+ 4283.736061811447,
926
+ 4257.7525935173035,
927
+ 4257.7630007267,
928
+ 4203.545964479446,
929
+ 4203.558753013611,
930
+ 4198.144237518311,
931
+ 4198.157437801361,
932
+ 4194.472889661789,
933
+ 4194.487104177475,
934
+ 4197.728852272034,
935
+ 4197.739155769348,
936
+ 4202.638717889786,
937
+ 4179.445859909058,
938
+ 4179.456418514252,
939
+ 4170.633600950241,
940
+ 4170.638606786728,
941
+ 4173.595223903656,
942
+ 4345.430767297745,
943
+ 4345.4413626194,
944
+ 4403.088153839111,
945
+ 4403.099495649338,
946
+ 4243.333677768707,
947
+ 4243.347104310989,
948
+ 4341.46756529808,
949
+ 4341.480928659439,
950
+ 4317.847608089447,
951
+ 4317.858085870743,
952
+ 4196.552426815033,
953
+ 4196.5675711631775,
954
+ 4192.795216798782,
955
+ 4192.80850481987,
956
+ 4247.769198179245,
957
+ 4247.783056497574,
958
+ 4450.5884919166565,
959
+ 4450.602509021759,
960
+ 4386.362091779709,
961
+ 4386.375445127487,
962
+ 4194.8344893455505,
963
+ 4194.847893476486,
964
+ 4365.374780893326,
965
+ 4365.388372182846,
966
+ 4594.672197341919,
967
+ 4594.6835501194,
968
+ 4450.229032039642,
969
+ 4450.242944955826,
970
+ 4768.1948499679565,
971
+ 4768.208532333374,
972
+ 4320.927686691284,
973
+ 4320.932461023331,
974
+ 4389.044877767563,
975
+ 4389.060523271561,
976
+ 4506.545570850372,
977
+ 4506.56330370903,
978
+ 4187.451607465744,
979
+ 4492.475999116898,
980
+ 4492.487664937973,
981
+ 4207.333253145218,
982
+ 4207.347226142883,
983
+ 4454.522627592087,
984
+ 4417.526381015778,
985
+ 4195.074825525284,
986
+ 4195.089487314224,
987
+ 4224.457670927048,
988
+ 4224.472229957581,
989
+ 4764.19175863266,
990
+ 4764.202345132828,
991
+ 4315.793431043625,
992
+ 4315.799200534821,
993
+ 4265.365842103958,
994
+ 4252.945762634277,
995
+ 4478.979041814804,
996
+ 4478.992881536484,
997
+ 4318.3227870464325,
998
+ 4318.337471246719,
999
+ 4319.654689788818,
1000
+ 4319.666926622391,
1001
+ 4320.253043174744,
1002
+ 4320.26652598381,
1003
+ 4316.997335195541,
1004
+ 4317.007848501205,
1005
+ 4317.8134751319885,
1006
+ 4317.827590227127,
1007
+ 4315.411971092224,
1008
+ 4315.42355298996,
1009
+ 4325.969897270203,
1010
+ 4325.972640752792,
1011
+ 4311.006960868835,
1012
+ 4311.017538309097,
1013
+ 4324.960598230362,
1014
+ 3680.7179527282715,
1015
+ 3680.7326424121857,
1016
+ 3721.3555817604065,
1017
+ 3721.3586716651917,
1018
+ 3818.7410044670105,
1019
+ 3818.744511604309,
1020
+ 3689.685672521591,
1021
+ 3689.7003977298737,
1022
+ 3688.9338262081146,
1023
+ 3688.9487912654877,
1024
+ 3734.376760005951,
1025
+ 3734.3914697170258,
1026
+ 3721.3628540039062,
1027
+ 3721.37668967247,
1028
+ 3655.3936855793,
1029
+ 3655.4077792167664,
1030
+ 3610.097437620163,
1031
+ 3610.111466407776,
1032
+ 3715.0868566036224,
1033
+ 3715.099429130554,
1034
+ 3636.3001956939697,
1035
+ 3636.3086059093475,
1036
+ 3668.0241372585297,
1037
+ 3668.034808397293,
1038
+ 3659.740085363388,
1039
+ 3659.7512934207916,
1040
+ 3611.7954156398773,
1041
+ 3611.810293197632,
1042
+ 3611.7872862815857,
1043
+ 3611.802482843399,
1044
+ 3612.5097110271454,
1045
+ 3612.520439386368,
1046
+ 3609.9256060123444,
1047
+ 3609.9406599998474,
1048
+ 3615.199702978134,
1049
+ 3615.213776111603,
1050
+ 3614.617516040802
1051
+ ],
1052
+ "valid_loss_history": [
1053
+ -2.2420080729893277,
1054
+ -3.6040473665509904,
1055
+ -4.652349131447928,
1056
+ -5.269411563873291,
1057
+ -5.602223873138428,
1058
+ -5.948959009987967,
1059
+ -6.180064678192139,
1060
+ -6.373329707554409,
1061
+ -6.4635710035051614,
1062
+ -6.628378936222622,
1063
+ -6.765629632132394,
1064
+ -6.878908634185791,
1065
+ -6.975889819008963,
1066
+ -7.089849744524274,
1067
+ -7.137168339320591,
1068
+ -7.214839458465576,
1069
+ -7.248862539018903,
1070
+ -7.323270389011928,
1071
+ -7.374068532671247,
1072
+ -7.447478975568499,
1073
+ -7.470496041434152,
1074
+ -7.578763212476458,
1075
+ -7.638515608651297,
1076
+ -7.603791032518659,
1077
+ -7.658165522984096,
1078
+ -7.660087721688407,
1079
+ -7.711926255907331,
1080
+ -7.763034411839077,
1081
+ -7.80566440309797,
1082
+ -7.829599516732352,
1083
+ -7.908110482352121,
1084
+ -7.871029717581613,
1085
+ -7.790640013558524,
1086
+ -7.807113443102155,
1087
+ -7.826304980686733,
1088
+ -7.77531235558646,
1089
+ -7.879563399723598,
1090
+ -7.897988796234131,
1091
+ -7.845814909253802,
1092
+ -7.848473821367536,
1093
+ -7.912371976034982,
1094
+ -7.943405968802316,
1095
+ -8.085525648934501,
1096
+ -8.010899543762207,
1097
+ -8.028815746307373,
1098
+ -8.061845302581787,
1099
+ -8.02747140611921,
1100
+ -8.03413268498012,
1101
+ -8.033596924373082,
1102
+ -8.068816934313093,
1103
+ -8.067536762782506,
1104
+ -8.144167695726667,
1105
+ -8.148260184696742,
1106
+ -8.180625711168561,
1107
+ -8.180845873696464,
1108
+ -8.25086770738874,
1109
+ -8.261961323874337,
1110
+ -8.260808059147426,
1111
+ -8.186679295131139,
1112
+ -8.165157794952393,
1113
+ -8.194125039236885,
1114
+ -8.254536492483956,
1115
+ -8.292360033307757,
1116
+ -8.267435346330915,
1117
+ -8.27747208731515,
1118
+ -8.366285255977086,
1119
+ -8.354675361088344,
1120
+ -8.365063190460205,
1121
+ -8.427791595458984,
1122
+ -8.452910355159215,
1123
+ -8.395057133265905,
1124
+ -8.455147879464286,
1125
+ -8.485073634556361,
1126
+ -8.504877976008824,
1127
+ -8.502339363098145,
1128
+ -8.485261576516288,
1129
+ -8.50761045728411,
1130
+ -8.482435567038399,
1131
+ -8.516456604003906,
1132
+ -8.503895146506173,
1133
+ -8.515655858176094,
1134
+ -8.574515002114433,
1135
+ -8.580681255885533,
1136
+ -8.593669959477015,
1137
+ -8.538264206477574,
1138
+ -8.570460319519043,
1139
+ -8.610838617597308,
1140
+ -8.576563426426478,
1141
+ -8.631826945713588,
1142
+ -8.593990189688546,
1143
+ -8.584804126194545,
1144
+ -8.616937228611537,
1145
+ -8.616405078342982,
1146
+ -8.636415685926165,
1147
+ -8.736162253788539,
1148
+ -8.684600080762591,
1149
+ -8.751097747257777,
1150
+ -8.744481086730957,
1151
+ -8.760670593806676,
1152
+ -8.81410721370152,
1153
+ -8.762031418936592,
1154
+ -8.731195313589913,
1155
+ -8.680067879813057,
1156
+ -8.73148284639631,
1157
+ -8.770104340144567,
1158
+ -8.83363403592791,
1159
+ -8.797364848000663,
1160
+ -8.756126131330218,
1161
+ -8.717773846217565,
1162
+ -8.755549158368792,
1163
+ -8.798967293330602,
1164
+ -8.80781262261527,
1165
+ -8.879967212677002,
1166
+ -8.83057907649449,
1167
+ -8.910664354051862,
1168
+ -8.930669920785087,
1169
+ -8.850233895438057,
1170
+ -8.87684679031372,
1171
+ -8.860790797642299,
1172
+ -8.854635306767054,
1173
+ -8.871529306684222,
1174
+ -8.870055334908622,
1175
+ -8.814562388828822,
1176
+ -8.895111628941127,
1177
+ -8.95235286440168,
1178
+ -8.978583880833217,
1179
+ -8.970093931470599,
1180
+ -8.94366032736642,
1181
+ -8.930564199175153,
1182
+ -8.896938255855016,
1183
+ -9.003027439117432,
1184
+ -8.967686380658831,
1185
+ -8.945790427071708,
1186
+ -8.978134904588972,
1187
+ -8.926983833312988,
1188
+ -8.911829403468541,
1189
+ -9.004649843488421,
1190
+ -8.982011726924352,
1191
+ -9.004248074122838,
1192
+ -9.022075244358607,
1193
+ -9.055972508021764,
1194
+ -9.095445496695381,
1195
+ -9.014348983764648,
1196
+ -9.017100266047887,
1197
+ -9.06740631375994,
1198
+ -9.062205382755824,
1199
+ -9.006571020398821,
1200
+ -9.060756206512451,
1201
+ -9.114073821476527,
1202
+ -9.12088053567069,
1203
+ -9.146572181156703,
1204
+ -9.129499162946429,
1205
+ -9.162499564034599,
1206
+ -9.146372726985387,
1207
+ -9.138916151864189,
1208
+ -9.140360014779228,
1209
+ -9.14337342126029,
1210
+ -9.13001537322998,
1211
+ -9.089552674974714,
1212
+ -9.172866821289062,
1213
+ -9.200943265642438,
1214
+ -9.191112245832171,
1215
+ -9.207633904048375,
1216
+ -9.147029059273857,
1217
+ -9.17673145021711,
1218
+ -9.129148755754743,
1219
+ -9.157607623508998,
1220
+ -9.13064786366054,
1221
+ -9.154420512063163,
1222
+ -9.181631565093994,
1223
+ -9.155359063829694,
1224
+ -9.158296721322197,
1225
+ -9.156671251569476,
1226
+ -9.154706001281738,
1227
+ -9.167226382664271,
1228
+ -9.163607052394322,
1229
+ -9.209595475878034,
1230
+ -9.310745784214564,
1231
+ -9.238739694867816,
1232
+ -9.288273334503174,
1233
+ -9.2847033228193,
1234
+ -9.313508306230817,
1235
+ -9.334877354758126,
1236
+ -9.270281859806605,
1237
+ -9.189015797206334,
1238
+ -9.247245516095843,
1239
+ -9.272651195526123,
1240
+ -9.430454867226738,
1241
+ -9.431772300175258,
1242
+ -9.406911509377617,
1243
+ -9.434791496821813,
1244
+ -9.40122835976737,
1245
+ -9.331563881465367,
1246
+ -9.266850130898613,
1247
+ -9.263189588274274,
1248
+ -9.341036796569824,
1249
+ -9.302794524601527,
1250
+ -9.364838123321533,
1251
+ -9.468104021889824,
1252
+ -9.427109173366002,
1253
+ -9.488504341670446,
1254
+ -9.461405617850167,
1255
+ -9.434092794145856,
1256
+ -9.448193890707833,
1257
+ -9.491405623299736,
1258
+ -9.586788518088204,
1259
+ -9.494200706481934,
1260
+ -9.47681747164045,
1261
+ -9.457686015537806,
1262
+ -9.591959748949323,
1263
+ -9.581428391592842,
1264
+ -9.579002380371094,
1265
+ -9.538570063454765,
1266
+ -9.59873376573835,
1267
+ -9.606725556509835,
1268
+ -9.610026700156075,
1269
+ -9.668677466256279,
1270
+ -9.631781101226807,
1271
+ -9.603316238948278,
1272
+ -9.663758277893066,
1273
+ -9.63963794708252,
1274
+ -9.662949085235596,
1275
+ -9.706490448543004,
1276
+ -9.720975807734899,
1277
+ -9.734819480351039,
1278
+ -9.786265100751605,
1279
+ -9.737053121839251,
1280
+ -9.700168677738734,
1281
+ -9.778143337794713,
1282
+ -9.780944415501185,
1283
+ -9.77445820399693,
1284
+ -9.772279262542725,
1285
+ -9.786255019051689,
1286
+ -9.787315436771937,
1287
+ -9.809428351266044,
1288
+ -9.77673625946045,
1289
+ -9.78390223639352,
1290
+ -9.798577308654785,
1291
+ -9.79927212851388,
1292
+ -9.762826034000941,
1293
+ -9.76457827431815,
1294
+ -9.798729487827845,
1295
+ -9.776831013815743,
1296
+ -9.773336342402867,
1297
+ -9.794628483908516,
1298
+ -9.795281887054443,
1299
+ -9.76816953931536,
1300
+ -9.776653221675328,
1301
+ -7.936585630689349,
1302
+ -9.18690013885498,
1303
+ -9.344774450574603,
1304
+ -9.366782733372279,
1305
+ -9.370089326586042,
1306
+ -9.374211038861956,
1307
+ -9.36532722200666,
1308
+ -9.383607932499476,
1309
+ -9.390820026397705,
1310
+ -9.426494870867048,
1311
+ -9.435582705906459,
1312
+ -9.458767277853829,
1313
+ -9.49565941946847,
1314
+ -9.505523000444684,
1315
+ -9.5315888268607,
1316
+ -9.545233454023089,
1317
+ -9.539818559374128,
1318
+ -9.561526230403356,
1319
+ -9.572482517787389
1320
+ ]
1321
+ }
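The `valid_loss_history` array above can be summarized with a few lines of Python. This is a minimal sketch, not part of the commit: the helper name `summarize_history` and the inline sample values are hypothetical, and it assumes that a lower (more negative) validation loss is better, as the negative values in the log suggest. Whether the log's own `best_epoch` field is 0- or 1-based is not stated in the file, so the sketch reports a 1-based index and labels it as such.

```python
import json


def summarize_history(log: dict) -> dict:
    """Return the epoch count and the best (lowest) validation loss.

    Hypothetical helper: assumes `log` holds a "valid_loss_history" list
    with one entry per epoch, and that lower is better.
    """
    valid = log["valid_loss_history"]
    # Index of the smallest value; ties resolve to the earliest epoch.
    best_idx = min(range(len(valid)), key=valid.__getitem__)
    return {
        "epochs_logged": len(valid),
        "best_epoch_1based": best_idx + 1,
        "best_valid_loss": valid[best_idx],
    }


# Illustrative values only, not copied from the log above.
sample = json.loads('{"valid_loss_history": [-2.24, -3.60, -4.65, -4.10]}')
summary = summarize_history(sample)
print(summary)
```

For a real file, replace the inline sample with `json.load(open(path))`, where `path` points at a downloaded `vocals.json`. A sudden jump in the history, such as the one near the end of the array above, will show up as a gap between the last epoch and the best epoch.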
MedleyVox-MultiSinger/singing_librispeech_iSRNet/loss_graph_vocals.png ADDED
MedleyVox-MultiSinger/singing_librispeech_iSRNet/vocals.json ADDED
@@ -0,0 +1,1180 @@
1
+ {
2
+ "args": {
3
+ "above_freq": 3000.0,
4
+ "architecture": "conv_tasnet_stft",
5
+ "batch_size": 5,
6
+ "beta1": 0.5,
7
+ "beta2": 0.9,
8
+ "bn_chan": 256,
9
+ "continual_train": true,
10
+ "dataset": "singing_librispeech",
11
+ "db_normalize": false,
12
+ "ema": true,
13
+ "encoder_activation": null,
14
+ "entity": "carson2050",
15
+ "epochs": 230,
16
+ "eps": 1e-08,
17
+ "exp_name": "singing_librispeech_iSRNet",
18
+ "ff_activation": "relu",
19
+ "gpu": 0,
20
+ "gradient_clip": 5.0,
21
+ "hid_chan": 1024,
22
+ "load_ema_online_model": false,
23
+ "lr": 3e-05,
24
+ "lr_decay_gamma": 0.5,
25
+ "lr_decay_patience": 6,
26
+ "lr_scheduler": "step_lr",
27
+ "mask_act": "linear",
28
+ "max_n_src": 4,
29
+ "min_n_src": 2,
30
+ "mixed_precision": false,
31
+ "mixture_consistency": "sfsrnet",
32
+ "multi_spec_loss_log_scale": false,
33
+ "n_blocks": 6,
34
+ "n_filter": 512,
35
+ "n_kernel": 512,
36
+ "n_nodes": 1,
37
+ "n_repeats": 4,
38
+ "n_src": 2,
39
+ "nb_workers": 10,
40
+ "nfft": 2048,
41
+ "ngpus_per_node": 1,
42
+ "nhop": 512,
43
+ "no_cuda": false,
44
+ "no_mask": false,
45
+ "no_mask_residual": false,
46
+ "optimizer": "adam",
47
+ "output": "/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/MedleyVox/results/singing_sep/checkpoint/singing_librispeech_iSRNet",
48
+ "output_directory": "/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/MedleyVox/results/singing_sep",
49
+ "part_of_data": null,
50
+ "patience": 15,
51
+ "pitch_formant_augment_prob": 0.4,
52
+ "port": null,
53
+ "project": "MedleyVox_home",
54
+ "quiet": false,
55
+ "rank": 0,
56
+ "reduced_training_data_ratio": 0.1,
57
+ "resume": "results/singing_sep/checkpoint/singing_librispeech_iSRNet",
58
+ "same_singer_dict_path": [
59
+ [
60
+ "/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/OpenSinger",
61
+ "./svs/preprocess/make_same_singer_dict/same_singer_OpenSinger.json",
62
+ "OpenSinger"
63
+ ],
64
+ [
65
+ "/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/k_multisinger",
66
+ "./svs/preprocess/make_same_singer_dict/same_singer_k_multisinger.json",
67
+ "k_multisinger"
68
+ ],
69
+ [
70
+ "/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/CSD",
71
+ "./svs/preprocess/make_same_singer_dict/same_singer_CSD.json",
72
+ "CSD"
73
+ ],
74
+ [
75
+ "/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/jsut-song_ver1",
76
+ "./svs/preprocess/make_same_singer_dict/same_singer_jsut-song_ver1.json",
77
+ "jsut-song_ver1"
78
+ ],
79
+ [
80
+ "/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/jvs_music_ver1",
81
+ "./svs/preprocess/make_same_singer_dict/same_singer_jvs_music_ver1.json",
82
+ "jvs_music_ver1"
83
+ ],
84
+ [
85
+ "/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/k_multitimbre",
86
+ "./svs/preprocess/make_same_singer_dict/same_singer_k_multitimbre.json",
87
+ "k_multitimbre"
88
+ ],
89
+ [
90
+ "/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/kiritan_revised",
91
+ "./svs/preprocess/make_same_singer_dict/same_singer_kiritan.json",
92
+ "kiritan"
93
+ ],
94
+ [
95
+ "/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/musdb_a_train",
96
+ "./svs/preprocess/make_same_singer_dict/same_singer_musdb_a_train.json",
97
+ "musdb_a_train"
98
+ ],
99
+ [
100
+ "/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/NUS",
101
+ "./svs/preprocess/make_same_singer_dict/same_singer_NUS.json",
102
+ "NUS"
103
+ ],
104
+ [
105
+ "/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/VocalSet",
106
+ "./svs/preprocess/make_same_singer_dict/same_singer_VocalSet.json",
107
+ "VocalSet"
108
+ ]
109
+ ],
110
+ "same_singer_ratio": 0.2,
111
+ "same_song_dict_path": [
112
+ [
113
+ "/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/k_multisinger",
114
+ "./svs/preprocess/make_same_song_dict/same_song_k_multisinger.json",
115
+ "k_multisinger"
116
+ ]
117
+ ],
118
+ "same_song_ratio": 0.2,
119
+ "same_speaker_dict_path": [
120
+ [
121
+ "/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/LibriSpeech_train-clean-100",
122
+ "./svs/preprocess/make_same_speaker_dict/same_singer_LibriSpeech_train-clean-100.json",
123
+ "LibriSpeech_train-clean-100"
124
+ ],
125
+ [
126
+ "/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/LibriSpeech_train-clean-360",
127
+ "./svs/preprocess/make_same_speaker_dict/same_singer_LibriSpeech_train-clean-360.json",
128
+ "LibriSpeech_train-clean-360"
129
+ ]
130
+ ],
131
+ "same_speaker_ratio": 0.15,
132
+ "sample_rate": 24000,
133
+ "seed": 777,
134
+ "seq_dur": 3.0,
135
+ "sing_sing_ratio": 0.15,
136
+ "sing_speech_ratio": 0.15,
137
+ "skip_chan": 256,
138
+ "song_length_dict_path": "./svs/preprocess/song_length_dict_24k.json",
139
+ "speech_train_root": [
140
+ "/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/LibriSpeech_train-clean-360",
141
+ "/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/LibriSpeech_train-clean-100"
142
+ ],
143
+ "sr_input_res": false,
144
+ "sr_out_mix_consistency": false,
145
+ "srnet": "convnext",
146
+ "start_from_best": true,
147
+ "sweep": false,
148
+ "target": "vocals",
149
+ "train_loss_func": [
150
+ "pit_snr",
151
+ "multi_spectral_l1",
152
+ "snr"
153
+ ],
154
+ "train_root": [
155
+ "/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/CSD",
156
+ "/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/NUS",
157
+ "/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/TONAS",
158
+ "/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/VocalSet",
159
+ "/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/jsut-song_ver1",
160
+ "/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/jvs_music_ver1",
161
+ "/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/kiritan_revised",
162
+ "/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/vocadito",
163
+ "/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/musdb_a_train",
164
+ "/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/OpenSinger",
165
+ "/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/medleyDB_v1_in_musdb",
166
+ "/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/k_multisinger",
167
+ "/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/k_multitimbre"
168
+ ],
169
+ "unison_prob": 0.3,
170
+ "use_wandb": true,
171
+ "valid_loss_func": [
172
+ "pit_si_sdr"
173
+ ],
174
+ "valid_regions_dict_path": "./svs/preprocess/valid_regions_dict_singing_singing.json",
175
+ "valid_root": [
176
+ [
177
+ "/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/musdb_a_test",
178
+ "/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/musdb_a_test",
179
+ "./svs/preprocess/make_validation_dict/for_2_srcs/valid_regions_dict_singing_singing.json",
180
+ "sing_sing_diff"
181
+ ],
182
+ [
183
+ "/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/musdb_a_test",
184
+ "/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/musdb_a_test",
185
+ "./svs/preprocess/make_validation_dict/for_2_srcs/valid_regions_dict_singing_unison.json",
186
+ "sing_sing_unison"
187
+ ],
188
+ [
189
+ "/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/musdb_a_test",
190
+ "/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/musdb_a_test",
191
+ "./svs/preprocess/make_validation_dict/for_2_srcs/valid_regions_dict_singing_singing_same_singer.json",
192
+ "sing_sing_same_singer"
193
+ ],
194
+ [
195
+ "/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/LibriSpeech_dev-clean",
196
+ "/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/LibriSpeech_dev-clean",
197
+ "./svs/preprocess/make_validation_dict/for_2_srcs/valid_regions_dict_speech_speech.json",
198
+ "speech_speech_diff"
199
+ ],
200
+ [
201
+ "/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/LibriSpeech_dev-clean",
202
+ "/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/LibriSpeech_dev-clean",
203
+ "./svs/preprocess/make_validation_dict/for_2_srcs/valid_regions_dict_speech_unison.json",
204
+ "speech_speech_unison"
205
+ ],
206
+ [
207
+ "/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/LibriSpeech_dev-clean",
208
+ "/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/LibriSpeech_dev-clean",
209
+ "./svs/preprocess/make_validation_dict/for_2_srcs/valid_regions_dict_speech_speech_same_speaker.json",
210
+ "speech_speech_same_speaker"
211
+ ],
212
+ [
213
+ "/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/musdb_a_test",
214
+ "/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/LibriSpeech_dev-clean",
215
+ "./svs/preprocess/make_validation_dict/for_2_srcs/valid_regions_dict_singing_speech.json",
216
+ "singing_speech"
217
+ ]
218
+ ],
219
+ "valid_root_orpit": [
220
+ [
221
+ "/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/musdb_a_test",
222
+ "/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/musdb_a_test",
223
+ "./svs/preprocess/make_validation_dict/for_n_srcs/valid_regions_dict_singing_singing_n_srcs.json",
224
+ "sing_sing_diff"
225
+ ],
226
+ [
227
+ "/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/musdb_a_test",
228
+ "/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/musdb_a_test",
229
+ "./svs/preprocess/make_validation_dict/for_n_srcs/valid_regions_dict_singing_unison_n_srcs.json",
230
+ "sing_sing_unison"
231
+ ],
232
+ [
233
+ "/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/musdb_a_test",
234
+ "/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/musdb_a_test",
235
+ "./svs/preprocess/make_validation_dict/for_n_srcs/valid_regions_dict_singing_singing_same_singer_n_srcs.json",
236
+ "sing_sing_same_singer"
237
+ ],
238
+ [
239
+ "/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/LibriSpeech_dev-clean",
240
+ "/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/LibriSpeech_dev-clean",
241
+ "./svs/preprocess/make_validation_dict/for_n_srcs/valid_regions_dict_speech_speech_n_srcs.json",
242
+ "speech_speech_diff"
243
+ ],
244
+ [
245
+ "/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/LibriSpeech_dev-clean",
246
+ "/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/LibriSpeech_dev-clean",
247
+ "./svs/preprocess/make_validation_dict/for_n_srcs/valid_regions_dict_speech_unison_n_srcs.json",
248
+ "speech_speech_unison"
249
+ ],
250
+ [
251
+ "/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/LibriSpeech_dev-clean",
252
+ "/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/LibriSpeech_dev-clean",
253
+ "./svs/preprocess/make_validation_dict/for_n_srcs/valid_regions_dict_speech_speech_same_speaker_n_srcs.json",
254
+ "speech_speech_same_speaker"
255
+ ],
256
+ [
257
+ "/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/musdb_a_test",
258
+ "/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/LibriSpeech_dev-clean",
259
+ "./svs/preprocess/make_validation_dict/for_n_srcs/valid_regions_dict_singing_speech_n_srcs.json",
260
+ "singing_speech"
261
+ ]
262
+ ],
263
+ "weight_decay": 1e-06,
264
+ "world_size": 1
265
+ },
266
+ "best_epoch": 216,
267
+ "best_loss": -9.209092957632881,
268
+ "epochs_trained": 230,
269
+ "num_bad_epochs": 14,
270
+ "train_loss_history": [
271
+ -1.3311041593551636,
272
+ -3.4447357654571533,
273
+ -4.284253120422363,
274
+ -4.726616382598877,
275
+ -5.099369049072266,
276
+ -5.331325054168701,
277
+ -5.553539752960205,
278
+ -5.740077018737793,
279
+ -5.918744087219238,
280
+ -6.005505561828613,
281
+ -6.201973915100098,
282
+ -6.26826286315918,
283
+ -6.3942413330078125,
284
+ -6.4803619384765625,
285
+ -6.592747688293457,
286
+ -6.6781134605407715,
287
+ -6.777161121368408,
288
+ -6.848526477813721,
289
+ -6.911881923675537,
290
+ -7.017796993255615,
291
+ -7.12304162979126,
292
+ -7.14536190032959,
293
+ -7.289445400238037,
294
+ -7.409412384033203,
295
+ -7.7652082443237305,
296
+ -7.837531089782715,
297
+ -7.850446701049805,
298
+ -7.941095352172852,
299
+ -7.939220428466797,
300
+ -8.047593116760254,
301
+ -8.07531452178955,
302
+ -8.134244918823242,
303
+ -8.143590927124023,
304
+ -8.190814018249512,
305
+ -8.217510223388672,
306
+ -8.175138473510742,
307
+ -7.989644527435303,
308
+ -8.09794807434082,
309
+ -8.24197006225586,
310
+ -8.232804298400879,
311
+ -8.328511238098145,
312
+ -8.389233589172363,
313
+ -8.267472267150879,
314
+ -8.301199913024902,
315
+ -8.36364459991455,
316
+ -8.43917465209961,
317
+ -8.493982315063477,
318
+ -8.481128692626953,
319
+ -8.429868698120117,
320
+ -8.501734733581543,
321
+ -8.54090404510498,
322
+ -8.568470001220703,
323
+ -8.50845718383789,
324
+ -8.597081184387207,
325
+ -8.513223648071289,
326
+ -8.38924503326416,
327
+ -8.502962112426758,
328
+ -8.518073081970215,
329
+ -8.56679916381836,
330
+ -8.698277473449707,
331
+ -8.630810737609863,
332
+ -8.755276679992676,
333
+ -8.700800895690918,
334
+ -8.74862003326416,
335
+ -8.734071731567383,
336
+ -8.633768081665039,
337
+ -8.633097648620605,
338
+ -8.872031211853027,
339
+ -8.828736305236816,
340
+ -8.753975868225098,
341
+ -8.886126518249512,
342
+ -8.758654594421387,
343
+ -8.883810997009277,
344
+ -8.952722549438477,
345
+ -8.945046424865723,
346
+ -8.907071113586426,
347
+ -8.891634941101074,
348
+ -8.91631031036377,
349
+ -8.951156616210938,
350
+ -8.931319236755371,
351
+ -8.960397720336914,
352
+ -8.841835975646973,
353
+ -8.834044456481934,
354
+ -8.786222457885742,
355
+ -8.903646469116211,
356
+ -8.947869300842285,
357
+ -8.696074485778809,
358
+ -8.99515438079834,
359
+ -9.005078315734863,
360
+ -8.934849739074707,
361
+ -8.99370002746582,
362
+ -9.030400276184082,
363
+ -9.101688385009766,
364
+ -9.08572006225586,
365
+ -9.075435638427734,
366
+ -9.125774383544922,
367
+ -9.102258682250977,
368
+ -9.160833358764648,
369
+ -8.999387741088867,
370
+ -8.929178237915039,
371
+ -9.085306167602539,
372
+ -9.149312019348145,
373
+ -9.201435089111328,
374
+ -9.119452476501465,
375
+ -9.192963600158691,
376
+ -9.153352737426758,
377
+ -9.16665267944336,
378
+ -9.187670707702637,
379
+ -9.213151931762695,
380
+ -9.295731544494629,
381
+ -9.204228401184082,
382
+ -9.2329683303833,
383
+ -9.198917388916016,
384
+ -9.242225646972656,
385
+ -9.251509666442871,
386
+ -9.233222007751465,
387
+ -9.235602378845215,
388
+ -9.264388084411621,
389
+ -9.286247253417969,
390
+ -9.287186622619629,
391
+ -9.327977180480957,
392
+ -9.304702758789062,
393
+ -9.34760570526123,
394
+ -9.314836502075195,
395
+ -9.300081253051758,
396
+ -9.20028018951416,
397
+ -9.35509967803955,
398
+ -9.345370292663574,
399
+ -9.36442756652832,
400
+ -9.351317405700684,
401
+ -9.352913856506348,
402
+ -9.388010025024414,
403
+ -9.326189994812012,
404
+ -9.411141395568848,
405
+ -9.424927711486816,
406
+ -9.376615524291992,
407
+ -9.394768714904785,
408
+ -9.382343292236328,
409
+ -9.345908164978027,
410
+ -9.387025833129883,
411
+ -9.397958755493164,
412
+ -9.370079040527344,
413
+ -9.419344902038574,
414
+ -9.414657592773438,
415
+ -9.450013160705566,
416
+ -9.424891471862793,
417
+ -9.468652725219727,
418
+ -9.437067031860352,
419
+ -9.452010154724121,
420
+ -9.476055145263672,
421
+ -9.454631805419922,
422
+ -9.519726753234863,
423
+ -9.494053840637207,
424
+ -9.349456787109375,
425
+ -9.444249153137207,
426
+ -9.432062149047852,
427
+ -9.469500541687012,
428
+ -9.506385803222656,
429
+ -9.541167259216309,
430
+ -9.514572143554688,
431
+ -9.517498016357422,
432
+ -9.508042335510254,
433
+ -9.524667739868164,
434
+ -9.513023376464844,
435
+ -9.518259048461914,
436
+ -9.491355895996094,
437
+ -9.527623176574707,
438
+ -9.503666877746582,
439
+ -9.575556755065918,
440
+ -9.51135540008545,
441
+ -9.574329376220703,
442
+ -9.559322357177734,
443
+ -9.576539993286133,
444
+ -9.587591171264648,
445
+ -9.615789413452148,
446
+ -9.586484909057617,
447
+ -9.597373008728027,
448
+ -9.565719604492188,
449
+ -9.580348014831543,
450
+ -9.544068336486816,
451
+ -9.576735496520996,
452
+ -9.617915153503418,
453
+ -9.634200096130371,
454
+ -9.50833511352539,
455
+ -9.633086204528809,
456
+ -9.622976303100586,
457
+ -9.628181457519531,
458
+ -9.385575294494629,
459
+ -9.312309265136719,
460
+ -8.996809005737305,
461
+ -9.591567993164062,
462
+ -9.602102279663086,
463
+ -9.606905937194824,
464
+ -9.660425186157227,
465
+ -9.59228229522705,
466
+ -9.66215991973877,
467
+ -9.652912139892578,
468
+ -9.683008193969727,
469
+ -9.550703048706055,
470
+ -9.616209983825684,
471
+ -7.74033784866333,
472
+ -8.984460830688477,
473
+ -9.136533737182617,
474
+ -9.426427841186523,
475
+ -9.425248146057129,
476
+ -9.407462120056152,
477
+ -9.523107528686523,
478
+ -9.644789695739746,
479
+ -9.905366897583008,
480
+ -10.454096794128418,
481
+ -10.200395584106445,
482
+ -10.226968765258789,
483
+ -10.290560722351074,
484
+ -10.246292114257812,
485
+ -10.437302589416504,
486
+ -10.198098182678223,
487
+ -10.481816291809082,
488
+ -10.32693862915039,
489
+ -10.341702461242676,
490
+ -10.409103393554688,
491
+ -10.289705276489258,
492
+ -10.6312255859375,
493
+ -10.405678749084473,
494
+ -10.4303617477417,
495
+ -10.45798110961914,
496
+ -10.44364070892334,
497
+ -10.312222480773926,
498
+ -10.264429092407227,
499
+ -10.573458671569824,
500
+ -10.27155590057373
501
+ ],
502
+ "train_time_history": [
503
+ 4284.811353683472,
504
+ 4284.813168525696,
505
+ 4239.820109844208,
506
+ 4358.5235912799835,
507
+ 4358.525362968445,
508
+ 4289.520437240601,
509
+ 4289.5296330451965,
510
+ 4233.677313089371,
511
+ 4233.679363965988,
512
+ 4209.371140003204,
513
+ 4209.381086587906,
514
+ 4202.905996799469,
515
+ 4469.978202342987,
516
+ 4469.989181756973,
517
+ 4247.160337924957,
518
+ 4247.1704177856445,
519
+ 4190.890568256378,
520
+ 4190.900403022766,
521
+ 4185.636907577515,
522
+ 4185.647009372711,
523
+ 4180.687466144562,
524
+ 4215.30419754982,
525
+ 4215.314230442047,
526
+ 4206.753845453262,
527
+ 4206.76371717453,
528
+ 4206.280591726303,
529
+ 4206.290879011154,
530
+ 4222.331785202026,
531
+ 4222.341979503632,
532
+ 4220.95298576355,
533
+ 4220.962949752808,
534
+ 4199.66743016243,
535
+ 4199.67768073082,
536
+ 4200.696933507919,
537
+ 4200.706924915314,
538
+ 4200.071183204651,
539
+ 4200.073669195175,
540
+ 4201.461757183075,
541
+ 4201.47197842598,
542
+ 4212.675180196762,
543
+ 4212.685215473175,
544
+ 4266.539958238602,
545
+ 4266.55042219162,
546
+ 4254.028660058975,
547
+ 4254.030869007111,
548
+ 4586.545968532562,
549
+ 4586.556686401367,
550
+ 4486.801070451736,
551
+ 4486.811651468277,
552
+ 4201.306690454483,
553
+ 4201.308066606522,
554
+ 4204.077554225922,
555
+ 4204.087781906128,
556
+ 4194.944247722626,
557
+ 4194.954358577728,
558
+ 4193.961704969406,
559
+ 4222.590797185898,
560
+ 4222.594073057175,
561
+ 4221.6570999622345,
562
+ 4221.666466474533,
563
+ 4221.045345544815,
564
+ 4221.055670261383,
565
+ 4214.11606669426,
566
+ 4214.125596284866,
567
+ 4479.404296398163,
568
+ 4479.414994955063,
569
+ 4262.62514591217,
570
+ 4262.635618209839,
571
+ 4214.268101215363,
572
+ 4214.2785403728485,
573
+ 4218.142910718918,
574
+ 4218.15364408493,
575
+ 4215.917347192764,
576
+ 4215.927803516388,
577
+ 4218.397645950317,
578
+ 4218.408536672592,
579
+ 4233.58446598053,
580
+ 4233.59490442276,
581
+ 4318.161808013916,
582
+ 4318.171140432358,
583
+ 4237.026048898697,
584
+ 4237.036669015884,
585
+ 4220.925004482269,
586
+ 4220.9352016448975,
587
+ 4226.221168041229,
588
+ 4223.1825070381165,
589
+ 4223.192782878876,
590
+ 4219.102268218994,
591
+ 4219.113127231598,
592
+ 4216.297616004944,
593
+ 4216.308108329773,
594
+ 4217.926244974136,
595
+ 4217.937202453613,
596
+ 4426.571401119232,
597
+ 4426.573066711426,
598
+ 4612.790915489197,
599
+ 4612.801674365997,
600
+ 4719.1595368385315,
601
+ 4719.169989824295,
602
+ 4305.255445480347,
603
+ 4305.266388177872,
604
+ 4221.674624681473,
605
+ 4221.686189174652,
606
+ 4229.138904571533,
607
+ 4178.568962574005,
608
+ 4178.5717051029205,
609
+ 4178.647545337677,
610
+ 4178.650447130203,
611
+ 4169.984578132629,
612
+ 4169.995152950287,
613
+ 4173.8019506931305,
614
+ 4173.804402589798,
615
+ 4179.692799806595,
616
+ 4179.695784330368,
617
+ 4176.926806688309,
618
+ 4176.937863111496,
619
+ 4189.7040383815765,
620
+ 4189.7144474983215,
621
+ 4194.854960680008,
622
+ 4194.8661851882935,
623
+ 4488.314256668091,
624
+ 4488.324142932892,
625
+ 4301.72206735611,
626
+ 4301.732882022858,
627
+ 4203.297667264938,
628
+ 4203.307426214218,
629
+ 4212.263510465622,
630
+ 4212.2729642391205,
631
+ 4202.838434457779,
632
+ 4202.8495717048645,
633
+ 4206.559844255447,
634
+ 4206.570970535278,
635
+ 4202.594026565552,
636
+ 4202.6052367687225,
637
+ 4204.671685695648,
638
+ 4204.675058603287,
639
+ 4201.653420209885,
640
+ 4201.664590358734,
641
+ 4203.356340646744,
642
+ 4203.3675968647,
643
+ 4226.834460258484,
644
+ 4226.84539103508,
645
+ 4432.4133422374725,
646
+ 4432.424476385117,
647
+ 4194.520195245743,
648
+ 4194.531393289566,
649
+ 4185.361557483673,
650
+ 4185.372809171677,
651
+ 4178.024575471878,
652
+ 4178.035531282425,
653
+ 4183.264570951462,
654
+ 4183.275583267212,
655
+ 4178.5521404743195,
656
+ 4178.563311338425,
657
+ 4178.228582620621,
658
+ 4178.238200426102,
659
+ 4181.432615280151,
660
+ 4181.443482160568,
661
+ 4181.636572599411,
662
+ 4181.647958517075,
663
+ 4180.119422197342,
664
+ 4180.130319356918,
665
+ 4181.348428249359,
666
+ 4181.3601496219635,
667
+ 4182.4969573020935,
668
+ 4182.508371829987,
669
+ 4255.815136909485,
670
+ 4255.824706077576,
671
+ 4447.2853989601135,
672
+ 4447.294949054718,
673
+ 4375.476977586746,
674
+ 4375.488611936569,
675
+ 4216.147409915924,
676
+ 4216.157112836838,
677
+ 4184.855574131012,
678
+ 4184.867551803589,
679
+ 4182.2731301784515,
680
+ 4182.284587860107,
681
+ 4182.427225112915,
682
+ 4182.438867807388,
683
+ 4181.939938545227,
684
+ 4181.951656103134,
685
+ 4183.5050485134125,
686
+ 4183.516293287277,
687
+ 4180.313590764999,
688
+ 4180.325238704681,
689
+ 4184.185824394226,
690
+ 4184.196978807449,
691
+ 4175.860624790192,
692
+ 4175.8725233078,
693
+ 4174.206290960312,
694
+ 4174.217987298965,
695
+ 4225.280811309814,
696
+ 4346.787808179855,
697
+ 4346.791662693024,
698
+ 4299.203949213028,
699
+ 4334.719336986542,
700
+ 4334.72660279274,
701
+ 4307.453342437744,
702
+ 4307.463569164276,
703
+ 4243.263749361038,
704
+ 4243.27504825592,
705
+ 4232.403777837753,
706
+ 4232.415019750595,
707
+ 4234.023860692978,
708
+ 4234.036010503769,
709
+ 4232.419568777084,
710
+ 4232.430717229843,
711
+ 4228.692707538605,
712
+ 4228.695293188095,
713
+ 4235.275017976761,
714
+ 4235.286781549454,
715
+ 4231.93186712265,
716
+ 4231.934266328812,
717
+ 4237.727004766464,
718
+ 4237.736963748932,
719
+ 4448.2472088336945,
720
+ 4448.257912635803,
721
+ 4283.024597644806,
722
+ 4283.03609752655,
723
+ 4270.3121337890625,
724
+ 4270.324274778366,
725
+ 4244.299434423447,
726
+ 4244.311620950699,
727
+ 4363.46278834343,
728
+ 4180.62579703331,
729
+ 4180.635629653931,
730
+ 4363.069185256958,
731
+ 4220.090236663818,
732
+ 4220.102267503738,
733
+ 4190.208593130112,
734
+ 4190.220735549927,
735
+ 4181.494255304337,
736
+ 4181.50580906868,
737
+ 4186.210835933685,
738
+ 4186.214511394501,
739
+ 4188.612834215164,
740
+ 4188.625131607056,
741
+ 4182.178534984589,
742
+ 4182.189949512482,
743
+ 4183.857384443283,
744
+ 4183.869287014008,
745
+ 4183.761756181717,
746
+ 4241.330404281616,
747
+ 4241.341110467911,
748
+ 4207.978038311005,
749
+ 4207.990997314453,
750
+ 4209.410867214203,
751
+ 4209.421168088913,
752
+ 4207.717931270599,
753
+ 4207.730401754379,
754
+ 4204.301562309265,
755
+ 4204.313354253769,
756
+ 4297.861345052719,
757
+ 4297.873908042908,
758
+ 4282.807532548904,
759
+ 4282.820100307465,
760
+ 4269.668355226517,
761
+ 4269.680841684341,
762
+ 4198.918546676636,
763
+ 4198.928604364395,
764
+ 4239.654682636261,
765
+ 4239.659080028534,
766
+ 4419.87956905365,
767
+ 4419.889652013779,
768
+ 4302.591921806335,
769
+ 4302.60400891304,
770
+ 4199.097110033035,
771
+ 4199.109765052795,
772
+ 4202.586899995804,
773
+ 4202.596865415573,
774
+ 4223.580963373184,
775
+ 4236.571214199066,
776
+ 4236.583789110184,
777
+ 4266.631365537643,
778
+ 4266.643340587616,
779
+ 4206.533836603165,
780
+ 4206.543870687485,
781
+ 4196.797498226166,
782
+ 4196.809820890427,
783
+ 4202.778592824936,
784
+ 4202.791028261185,
785
+ 4200.911655426025,
786
+ 4200.922192811966,
787
+ 4218.757748126984,
788
+ 4218.7700316905975,
789
+ 4197.834621667862,
790
+ 4197.8472237586975,
791
+ 4194.553659200668,
792
+ 4194.558137655258,
793
+ 4210.2872478961945,
794
+ 4210.291656970978,
795
+ 4269.952535390854,
796
+ 4269.963551998138,
797
+ 4214.965420722961,
798
+ 4214.9777710437775,
799
+ 4268.254637956619,
800
+ 4268.267082452774,
801
+ 4188.457591295242,
802
+ 4188.467690706253,
803
+ 4188.935349225998,
804
+ 4188.947833776474,
805
+ 4192.73951125145,
806
+ 4192.749709367752,
807
+ 4188.534428119659,
808
+ 4188.53829908371,
809
+ 4196.497691392899,
810
+ 4196.510225534439,
811
+ 4318.416720151901,
812
+ 4318.4267864227295,
813
+ 4209.298709154129,
814
+ 4204.6052923202515,
815
+ 4204.609621763229,
816
+ 4192.598699092865,
817
+ 4192.6110072135925,
818
+ 4264.5488522052765,
819
+ 4264.562687158585,
820
+ 4342.3707575798035,
821
+ 4342.3756980896,
822
+ 4299.415410995483,
823
+ 4299.425767421722,
824
+ 4285.986501693726,
825
+ 4285.999414205551,
826
+ 4251.881839513779,
827
+ 4251.89198923111,
828
+ 4217.251371145248,
829
+ 4217.262971401215,
830
+ 4265.004074335098,
831
+ 4265.016601800919,
832
+ 4422.643936634064,
833
+ 4453.576984167099,
834
+ 4453.588968753815,
835
+ 4183.795456409454,
836
+ 4183.80871462822,
837
+ 4183.177849292755,
838
+ 4183.1909646987915,
839
+ 4190.727601289749,
840
+ 4190.740168809891,
841
+ 4185.585786104202,
842
+ 4185.596675872803,
843
+ 4186.326423406601,
844
+ 4186.3365132808685,
845
+ 4188.701127767563,
846
+ 4188.713495969772,
847
+ 4183.693524837494,
848
+ 4183.706875085831,
849
+ 4182.603164672852,
850
+ 4182.169225692749,
851
+ 4182.182250261307,
852
+ 4183.1377918720245,
853
+ 4183.142628669739,
854
+ 4179.616315603256,
855
+ 4179.626562833786,
856
+ 4304.994537830353,
857
+ 4305.007478475571,
858
+ 4361.554908275604,
859
+ 4361.56044960022,
860
+ 4368.104673624039,
861
+ 4368.11031460762,
862
+ 4246.525162935257,
863
+ 4246.5380046367645,
864
+ 4183.925352096558,
865
+ 4232.265904188156,
866
+ 4232.277180671692,
867
+ 4238.892568349838,
868
+ 4238.905729055405,
869
+ 4187.827491521835,
870
+ 4187.84108877182,
871
+ 4190.126079082489,
872
+ 4190.13965845108,
873
+ 4190.435103654861,
874
+ 4190.440406799316,
875
+ 4191.884477853775,
876
+ 4191.897578239441,
877
+ 4187.4977107048035,
878
+ 4172.838095903397,
879
+ 4172.843760967255,
880
+ 4177.684302330017,
881
+ 4177.6969130039215,
882
+ 4172.654875993729,
883
+ 4172.667930603027,
884
+ 4174.483522415161,
885
+ 4174.496375083923,
886
+ 4166.372047901154,
887
+ 4166.384793281555,
888
+ 4283.736061811447,
889
+ 3653.9717135429382,
890
+ 3653.97727560997,
891
+ 3628.154771566391,
892
+ 3628.159923315048,
893
+ 3652.242630004883,
894
+ 3652.2448382377625,
895
+ 3646.540367603302,
896
+ 3646.542966604233,
897
+ 3608.7122309207916,
898
+ 3608.717301607132,
899
+ 3608.8411026000977,
900
+ 3608.846682548523,
901
+ 3606.5311863422394,
902
+ 3606.5361762046814,
903
+ 3611.4129967689514,
904
+ 3611.418157339096,
905
+ 3610.7246301174164,
906
+ 3610.729764699936,
907
+ 3607.0119185447693,
908
+ 3607.0174593925476,
909
+ 3607.5829951763153,
910
+ 3607.5891518592834,
911
+ 3607.95986866951,
912
+ 3607.964668035507,
913
+ 3614.2318153381348,
914
+ 3614.2375481128693,
915
+ 3618.1517746448517,
916
+ 3618.1568336486816,
917
+ 3622.268902540207,
918
+ 3667.4287581443787,
919
+ 3667.433854341507,
920
+ 3623.2074506282806,
921
+ 3623.212779045105,
922
+ 3643.333916425705,
923
+ 3643.339797258377,
924
+ 3641.6545128822327,
925
+ 3641.6596987247467,
926
+ 3627.3986847400665,
927
+ 3627.4038894176483,
928
+ 3628.012758731842,
929
+ 3628.017865419388,
930
+ 3635.4565312862396,
931
+ 3635.461765527725,
932
+ 3620.5242562294006,
933
+ 3620.529673099518,
934
+ 3640.751862049103,
935
+ 3640.7576014995575,
936
+ 3647.903746366501,
937
+ 3647.9063782691956,
938
+ 3645.3971898555756,
939
+ 3645.4029626846313,
940
+ 3645.676437139511,
941
+ 3645.680727005005,
942
+ 3624.857933282852,
943
+ 3624.863513469696,
944
+ 3629.3647339344025,
945
+ 3629.370223760605,
946
+ 3664.6942942142487
947
+ ],
948
+ "valid_loss_history": [
949
+ -2.2420080729893277,
950
+ -3.6040473665509904,
951
+ -4.652349131447928,
952
+ -5.269411563873291,
953
+ -5.602223873138428,
954
+ -5.948959009987967,
955
+ -6.180064678192139,
956
+ -6.373329707554409,
957
+ -6.4635710035051614,
958
+ -6.628378936222622,
959
+ -6.765629632132394,
960
+ -6.878908634185791,
961
+ -6.975889819008963,
962
+ -7.089849744524274,
963
+ -7.137168339320591,
964
+ -7.214839458465576,
965
+ -7.248862539018903,
966
+ -7.323270389011928,
967
+ -7.374068532671247,
968
+ -7.447478975568499,
969
+ -7.470496041434152,
970
+ -7.578763212476458,
971
+ -7.638515608651297,
972
+ -7.603791032518659,
973
+ -7.658165522984096,
974
+ -7.660087721688407,
975
+ -7.711926255907331,
976
+ -7.763034411839077,
977
+ -7.80566440309797,
978
+ -7.829599516732352,
979
+ -7.908110482352121,
980
+ -7.871029717581613,
981
+ -7.790640013558524,
982
+ -7.807113443102155,
983
+ -7.826304980686733,
984
+ -7.77531235558646,
985
+ -7.879563399723598,
986
+ -7.897988796234131,
987
+ -7.845814909253802,
988
+ -7.848473821367536,
989
+ -7.912371976034982,
990
+ -7.943405968802316,
991
+ -8.085525648934501,
992
+ -8.010899543762207,
993
+ -8.028815746307373,
994
+ -8.061845302581787,
995
+ -8.02747140611921,
996
+ -8.03413268498012,
997
+ -8.033596924373082,
998
+ -8.068816934313093,
999
+ -8.067536762782506,
1000
+ -8.144167695726667,
1001
+ -8.148260184696742,
1002
+ -8.180625711168561,
1003
+ -8.180845873696464,
1004
+ -8.25086770738874,
1005
+ -8.261961323874337,
1006
+ -8.260808059147426,
1007
+ -8.186679295131139,
1008
+ -8.165157794952393,
1009
+ -8.194125039236885,
1010
+ -8.254536492483956,
1011
+ -8.292360033307757,
1012
+ -8.267435346330915,
1013
+ -8.27747208731515,
1014
+ -8.366285255977086,
1015
+ -8.354675361088344,
1016
+ -8.365063190460205,
1017
+ -8.427791595458984,
1018
+ -8.452910355159215,
1019
+ -8.395057133265905,
1020
+ -8.455147879464286,
1021
+ -8.485073634556361,
1022
+ -8.504877976008824,
1023
+ -8.502339363098145,
1024
+ -8.485261576516288,
1025
+ -8.50761045728411,
1026
+ -8.482435567038399,
1027
+ -8.516456604003906,
1028
+ -8.503895146506173,
1029
+ -8.515655858176094,
1030
+ -8.574515002114433,
1031
+ -8.580681255885533,
1032
+ -8.593669959477015,
1033
+ -8.538264206477574,
1034
+ -8.570460319519043,
1035
+ -8.610838617597308,
1036
+ -8.576563426426478,
1037
+ -8.631826945713588,
1038
+ -8.593990189688546,
1039
+ -8.584804126194545,
1040
+ -8.616937228611537,
1041
+ -8.616405078342982,
1042
+ -8.636415685926165,
1043
+ -8.736162253788539,
1044
+ -8.684600080762591,
1045
+ -8.751097747257777,
1046
+ -8.744481086730957,
1047
+ -8.760670593806676,
1048
+ -8.81410721370152,
1049
+ -8.762031418936592,
1050
+ -8.731195313589913,
1051
+ -8.680067879813057,
1052
+ -8.73148284639631,
1053
+ -8.770104340144567,
1054
+ -8.83363403592791,
1055
+ -8.797364848000663,
1056
+ -8.756126131330218,
1057
+ -8.717773846217565,
1058
+ -8.755549158368792,
1059
+ -8.798967293330602,
1060
+ -8.80781262261527,
1061
+ -8.879967212677002,
1062
+ -8.83057907649449,
1063
+ -8.910664354051862,
1064
+ -8.930669920785087,
1065
+ -8.850233895438057,
1066
+ -8.87684679031372,
1067
+ -8.860790797642299,
1068
+ -8.854635306767054,
1069
+ -8.871529306684222,
1070
+ -8.870055334908622,
1071
+ -8.814562388828822,
1072
+ -8.895111628941127,
1073
+ -8.95235286440168,
1074
+ -8.978583880833217,
1075
+ -8.970093931470599,
1076
+ -8.94366032736642,
1077
+ -8.930564199175153,
1078
+ -8.896938255855016,
1079
+ -9.003027439117432,
1080
+ -8.967686380658831,
1081
+ -8.945790427071708,
1082
+ -8.978134904588972,
1083
+ -8.926983833312988,
1084
+ -8.911829403468541,
1085
+ -9.004649843488421,
1086
+ -8.982011726924352,
1087
+ -9.004248074122838,
1088
+ -9.022075244358607,
1089
+ -9.055972508021764,
1090
+ -9.095445496695381,
1091
+ -9.014348983764648,
1092
+ -9.017100266047887,
1093
+ -9.06740631375994,
1094
+ -9.062205382755824,
1095
+ -9.006571020398821,
1096
+ -9.060756206512451,
1097
+ -9.114073821476527,
1098
+ -9.12088053567069,
1099
+ -9.146572181156703,
1100
+ -9.129499162946429,
1101
+ -9.162499564034599,
1102
+ -9.146372726985387,
1103
+ -9.138916151864189,
1104
+ -9.140360014779228,
1105
+ -9.14337342126029,
1106
+ -9.13001537322998,
1107
+ -9.089552674974714,
1108
+ -9.172866821289062,
1109
+ -9.200943265642438,
1110
+ -9.191112245832171,
1111
+ -9.207633904048375,
1112
+ -9.147029059273857,
1113
+ -9.17673145021711,
1114
+ -9.129148755754743,
1115
+ -9.157607623508998,
1116
+ -9.13064786366054,
1117
+ -9.154420512063163,
1118
+ -9.181631565093994,
1119
+ -9.155359063829694,
1120
+ -9.158296721322197,
1121
+ -9.156671251569476,
1122
+ -9.154706001281738,
1123
+ -9.167226382664271,
1124
+ -9.163607052394322,
1125
+ -9.209595475878034,
1126
+ -9.310745784214564,
1127
+ -9.238739694867816,
1128
+ -9.288273334503174,
1129
+ -9.2847033228193,
1130
+ -9.313508306230817,
1131
+ -9.334877354758126,
1132
+ -9.270281859806605,
1133
+ -9.189015797206334,
1134
+ -9.247245516095843,
1135
+ -9.272651195526123,
1136
+ -9.430454867226738,
1137
+ -9.431772300175258,
1138
+ -9.406911509377617,
1139
+ -9.434791496821813,
1140
+ -9.40122835976737,
1141
+ -9.331563881465367,
1142
+ -9.266850130898613,
1143
+ -9.263189588274274,
1144
+ -9.341036796569824,
1145
+ -9.302794524601527,
1146
+ -9.364838123321533,
1147
+ -9.468104021889824,
1148
+ -9.427109173366002,
1149
+ -7.843665736062186,
1150
+ -8.728734561375209,
1151
+ -8.838280609675817,
1152
+ -8.92993450164795,
1153
+ -8.983613082340785,
1154
+ -9.073682171957833,
1155
+ -9.108558654785156,
1156
+ -9.189598287854876,
1157
+ -9.265264647347587,
1158
+ -9.251329898834229,
1159
+ -9.130606515066964,
1160
+ -9.009151935577393,
1161
+ -8.934782436915807,
1162
+ -8.819936275482178,
1163
+ -8.798521995544434,
1164
+ -9.209092957632881,
1165
+ -9.022621767861503,
1166
+ -8.851909978049141,
1167
+ -8.771939413888115,
1168
+ -8.668200629098076,
1169
+ -8.705128737858363,
1170
+ -8.659645216805595,
1171
+ -8.618493284497942,
1172
+ -8.664817196982247,
1173
+ -8.702061380658831,
1174
+ -8.703030790601458,
1175
+ -8.59453991481236,
1176
+ -8.648234503609794,
1177
+ -8.672728061676025,
1178
+ -8.749418190547399
1179
+ ]
1180
+ }
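The summary fields in this training-state JSON can be cross-checked against its history arrays. Below is a minimal Python sketch of such a check. It assumes `best_epoch` is 1-based and that `num_bad_epochs` counts the epochs since the best one. Both assumptions fit the values above: 230 - 216 = 14, and the 216th entry of `valid_loss_history` equals `best_loss`. The function name `check_training_state` and the toy values are hypothetical and not part of the repository. The sketch does not assume `best_loss` is the minimum of the whole history (earlier entries in this file are lower), and it leaves `train_time_history` alone because its length differs from `epochs_trained` here.

```python
import math


def check_training_state(state):
    """Return a list of inconsistencies found in a training-state dict."""
    problems = []
    hist = state["valid_loss_history"]

    # One validation loss per trained epoch (assumption).
    if len(hist) != state["epochs_trained"]:
        problems.append("valid_loss_history length != epochs_trained")

    # Assumption: best_epoch is 1-based, so it maps to index best_epoch - 1.
    idx = state["best_epoch"] - 1
    if not (0 <= idx < len(hist)) or not math.isclose(hist[idx], state["best_loss"]):
        problems.append("best_loss != valid_loss_history[best_epoch - 1]")

    # Assumption: num_bad_epochs counts epochs since the best one.
    if state["epochs_trained"] - state["best_epoch"] != state["num_bad_epochs"]:
        problems.append("num_bad_epochs != epochs_trained - best_epoch")

    return problems


# Toy values for illustration only (not taken from vocals.json).
toy = {
    "best_epoch": 2,
    "best_loss": -3.0,
    "epochs_trained": 4,
    "num_bad_epochs": 2,
    "valid_loss_history": [-1.0, -3.0, -2.5, -2.0],
}
print(check_training_state(toy))
```

To run it on the real file, load it with `json.load` and pass the resulting dict.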
MedleyVox-MultiSinger/vocal 231/loss_graph_vocals.png ADDED
MelBand-Roformer-Deux-Becruily/.gitattributes ADDED
@@ -0,0 +1,35 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
MelBand-Roformer-Deux-Becruily/README.md ADDED
@@ -0,0 +1,8 @@
+ ---
+ license: cc-by-nc-4.0
+ ---
+ Dual model for vocal and instrumental separation based on Mel-Band RoFormer architecture.
+
+ Metric sdr for instrum: 17.5466
+
+ Metric sdr for vocals: 11.3695
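The README reports an SDR (signal-to-distortion ratio) figure per stem. For orientation, here is a minimal Python sketch of the plain definition, 10 * log10(||s||^2 / ||s - s_hat||^2). The source does not say which SDR variant or evaluation set produced the numbers above, so treating them as this plain form is an assumption. The helper name `sdr_db` and the sample values are hypothetical.

```python
import math


def sdr_db(reference, estimate, eps=1e-12):
    """Signal-to-distortion ratio in dB for two equal-length sample sequences."""
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    signal = sum(r * r for r in reference)
    distortion = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    # eps guards against division by zero and log10(0).
    return 10.0 * math.log10((signal + eps) / (distortion + eps))


# Toy signals: the estimate has a 10% amplitude error on every sample.
ref = [1.0, -1.0, 1.0, -1.0]
est = [0.9, -0.9, 0.9, -0.9]
print(round(sdr_db(ref, est), 2))
```

A 10% amplitude error gives an energy ratio of about 100, which is roughly 20 dB.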
MelBand-Roformer-Deux-Becruily/config_deux_becruily.yaml ADDED
@@ -0,0 +1,64 @@
+ audio:
+   chunk_size: 573300
+   dim_f: 1024
+   dim_t: 256
+   hop_length: 441
+   n_fft: 2048
+   num_channels: 2
+   sample_rate: 44100
+   min_mean_abs: 0.000
+
+ model:
+   dim: 256
+   depth: 12
+   stereo: true
+   num_stems: 2
+   time_transformer_depth: 1
+   freq_transformer_depth: 1
+   num_bands: 60
+   dim_head: 64
+   heads: 8
+   attn_dropout: 0
+   ff_dropout: 0
+   flash_attn: true
+   dim_freqs_in: 1025
+   sample_rate: 44100
+   stft_n_fft: 2048
+   stft_hop_length: 441
+   stft_win_length: 2048
+   stft_normalized: false
+   mask_estimator_depth: 2
+   multi_stft_resolution_loss_weight: 1.0
+   multi_stft_resolutions_window_sizes: !!python/tuple
+   - 4096
+   - 2048
+   - 1024
+   - 512
+   - 256
+   multi_stft_hop_size: 147
+   multi_stft_normalized: false
+
+ training:
+   batch_size: 1
+   gradient_accumulation_steps: 1
+   grad_clip: 0
+   instruments:
+   - Vocals
+   - Instrumental
+   lr: 0.0001
+   patience: 2
+   reduce_factor: 0.95
+   target_instrument:
+   num_epochs: 1000
+   num_steps: 1000
+   q: 0.95
+   coarse_loss_clip: false
+   ema_momentum: 0.999
+   optimizer: adamw
+   other_fix: false
+   use_amp: true
+
+ inference:
+   batch_size: 1
+   dim_t: 1101
+   num_overlap: 2
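The `audio` values in this config determine how much audio the model sees per chunk. The Python sketch below works through that arithmetic using only numbers from the file. The frame count assumes a centered STFT (one frame per hop, plus one), which is how `torch.stft` behaves with `center=True`. The source does not state which setting the model uses, so treat that line as an assumption.

```python
chunk_size = 573300   # samples, from audio.chunk_size
sample_rate = 44100   # Hz, from audio.sample_rate
hop_length = 441      # samples, from audio.hop_length and model.stft_hop_length

# Duration of one chunk in seconds.
chunk_seconds = chunk_size / sample_rate

# Time step between consecutive STFT frames, in milliseconds.
hop_ms = 1000 * hop_length / sample_rate

# Assumption: centered STFT, so frames = floor(samples / hop) + 1.
frames_centered = chunk_size // hop_length + 1

print(chunk_seconds, hop_ms, frames_centered)
```

With these values a chunk is 13 seconds long, frames are 10 ms apart, and a centered STFT would yield 1301 frames per chunk.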
MelBandRoformer-Original/.gitattributes ADDED
@@ -0,0 +1,35 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
MelBandRoformer-Original/README.md ADDED
@@ -0,0 +1,3 @@
+ ---
+ license: gpl-3.0
+ ---
MelBandRoformers/.gitattributes ADDED
@@ -0,0 +1,35 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
MelBandRoformers/bsroformers/karaoke_bs_roformer.yaml ADDED
@@ -0,0 +1,129 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ audio:
2
+ chunk_size: 352800
3
+ dim_f: 1024
4
+ dim_t: 801 # don't work (use in model)
5
+ hop_length: 441 # don't work (use in model)
6
+ n_fft: 2048
7
+ num_channels: 2
8
+ sample_rate: 44100
9
+ min_mean_abs: 0.000
10
+
11
+ model:
12
+ dim: 256
13
+ depth: 12
14
+ stereo: true
15
+ num_stems: 1
16
+ time_transformer_depth: 1
17
+ freq_transformer_depth: 1
18
+ linear_transformer_depth: 0
19
+ freqs_per_bands: !!python/tuple
20
+ - 2
21
+ - 2
22
+ - 2
23
+ - 2
24
+ - 2
25
+ - 2
26
+ - 2
27
+ - 2
28
+ - 2
29
+ - 2
30
+ - 2
31
+ - 2
32
+ - 2
33
+ - 2
34
+ - 2
35
+ - 2
36
+ - 2
37
+ - 2
38
+ - 2
39
+ - 2
40
+ - 2
41
+ - 2
42
+ - 2
43
+ - 2
44
+ - 4
45
+ - 4
46
+ - 4
47
+ - 4
48
+ - 4
49
+ - 4
50
+ - 4
51
+ - 4
52
+ - 4
53
+ - 4
54
+ - 4
55
+ - 4
56
+ - 12
57
+ - 12
58
+ - 12
59
+ - 12
60
+ - 12
61
+ - 12
62
+ - 12
63
+ - 12
64
+ - 24
65
+ - 24
66
+ - 24
67
+ - 24
68
+ - 24
69
+ - 24
70
+ - 24
71
+ - 24
72
+ - 48
73
+ - 48
74
+ - 48
75
+ - 48
76
+ - 48
77
+ - 48
78
+ - 48
79
+ - 48
80
+ - 128
81
+ - 129
82
+ dim_head: 64
83
+ heads: 8
84
+ attn_dropout: 0.0
85
+ ff_dropout: 0.0
86
+ flash_attn: true
87
+ dim_freqs_in: 1025
88
+ stft_n_fft: 2048
89
+ stft_hop_length: 512
90
+ stft_win_length: 2048
91
+ stft_normalized: false
92
+ mask_estimator_depth: 2
93
+ multi_stft_resolution_loss_weight: 1.0
94
+ multi_stft_resolutions_window_sizes: !!python/tuple
95
+ - 4096
96
+ - 2048
97
+ - 1024
98
+ - 512
99
+ - 256
100
+ multi_stft_hop_size: 147
101
+ multi_stft_normalized: False
102
+ mlp_expansion_factor: 4
103
+ use_torch_checkpoint: True
104
+ skip_connection: False
105
+
106
+
107
+ training:
108
+ batch_size: 1
109
+ gradient_accumulation_steps: 999
110
+ grad_clip: 1
111
+ instruments: ['vocals', 'other']
112
+ lr: 1.0e-5
113
+ patience: 1000000
114
+ reduce_factor: 0.75
115
+ target_instrument: vocals
116
+ num_epochs: 1000
117
+ num_steps: 1000
118
+ q: 0.95
119
+ coarse_loss_clip: true
120
+ ema_momentum: 0.999
121
+ optimizer: Fira
122
+ other_fix: True # it's needed for checking on multisong dataset if other is actually instrumental
123
+ use_amp: true # enable or disable usage of mixed precision (float16) - usually it must be true
124
+ use_torch_checkpoint: True
125
+
126
+ inference:
127
+ batch_size: 6
128
+ dim_t: 1251
129
+ num_overlap: 2
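The `freqs_per_bands` tuple in the config above splits the STFT bins into bands, so its entries should add up to `dim_freqs_in`, which for `stft_n_fft: 2048` is 2048 / 2 + 1 = 1025. The following is a minimal sketch of that sanity check. The run-length table and the helper name `expand_bands` are illustrative assumptions, not part of the committed files; the widths and counts are read off the listing.

```python
# Run-length view of freqs_per_bands as listed in karaoke_bs_roformer.yaml:
# (band width in STFT bins, number of consecutive bands with that width).
BAND_RUNS = [(2, 24), (4, 12), (12, 8), (24, 8), (48, 8), (128, 1), (129, 1)]


def expand_bands(runs):
    """Expand (width, count) pairs into the flat tuple stored in the config."""
    return tuple(width for width, count in runs for _ in range(count))


freqs_per_bands = expand_bands(BAND_RUNS)
stft_n_fft = 2048
dim_freqs_in = stft_n_fft // 2 + 1  # bin count of a one-sided STFT

# The band widths must cover every frequency bin exactly once.
print(len(freqs_per_bands), sum(freqs_per_bands), dim_freqs_in)
```

If the sum and `dim_freqs_in` disagree after editing the band list, the band split no longer matches the STFT size.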
MelBandRoformers/melbandroformers/instrumental/inst_gabox.yaml ADDED
@@ -0,0 +1,51 @@
1
+ audio:
2
+ chunk_size: 485100
3
+ dim_f: 1024
4
+ dim_t: 1101
5
+ hop_length: 441
6
+ n_fft: 2048
7
+ num_channels: 2
8
+ sample_rate: 44100
9
+ min_mean_abs: 0.000
10
+
11
+ model:
12
+ dim: 384
13
+ depth: 6
14
+ stereo: true
15
+ num_stems: 1
16
+ time_transformer_depth: 1
17
+ freq_transformer_depth: 1
18
+ num_bands: 60
19
+ dim_head: 64
20
+ heads: 8
21
+ attn_dropout: 0
22
+ ff_dropout: 0
23
+ flash_attn: True
24
+ dim_freqs_in: 1025
25
+ sample_rate: 44100 # needed for mel filter bank from librosa
26
+ stft_n_fft: 2048
27
+ stft_hop_length: 441
28
+ stft_win_length: 2048
29
+ stft_normalized: False
30
+ mask_estimator_depth: 2
31
+ multi_stft_resolution_loss_weight: 1.0
32
+ multi_stft_resolutions_window_sizes: !!python/tuple
33
+ - 4096
34
+ - 2048
35
+ - 1024
36
+ - 512
37
+ - 256
38
+ multi_stft_hop_size: 147
39
+ multi_stft_normalized: False
40
+
41
+ training:
42
+ instruments:
43
+ - Instrumental
44
+ - Vocals
45
+ target_instrument: Instrumental
46
+ use_amp: True
47
+
48
+ inference:
49
+ batch_size: 1
50
+ dim_t: 1101
51
+ num_overlap: 2
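The numbers in the `audio` block above are consistent with each other: `chunk_size` divided by `sample_rate` gives the chunk length in seconds, and `chunk_size` divided by `hop_length`, plus one, gives `dim_t`. The sketch below checks both, assuming a centered STFT (one frame per hop plus one). That framing convention is an assumption, and the helper names are illustrative.

```python
def chunk_seconds(chunk_size, sample_rate):
    """Length of one chunk in seconds."""
    return chunk_size / sample_rate


def centered_stft_frames(chunk_size, hop_length):
    """Frame count of a centered STFT over chunk_size samples (assumed convention)."""
    return chunk_size // hop_length + 1


# Values copied from the audio block of inst_gabox.yaml.
audio = {"chunk_size": 485100, "hop_length": 441, "sample_rate": 44100, "dim_t": 1101}

seconds = chunk_seconds(audio["chunk_size"], audio["sample_rate"])
frames = centered_stft_frames(audio["chunk_size"], audio["hop_length"])
print(seconds, frames, frames == audio["dim_t"])
```

The same relation holds for the 352800-sample configs in this commit that list `dim_t: 801`.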
MelBandRoformers/melbandroformers/instrumental/v10.yaml ADDED
@@ -0,0 +1,73 @@
1
+ audio:
2
+ chunk_size: 352800
3
+ dim_f: 1024
4
+ dim_t: 256
5
+ hop_length: 441
6
+ n_fft: 2048
7
+ num_channels: 2
8
+ sample_rate: 44100
9
+ min_mean_abs: 0.00
10
+
11
+ model:
12
+ dim: 256
13
+ depth: 12
14
+ stereo: true
15
+ num_stems: 1
16
+ time_transformer_depth: 1
17
+ freq_transformer_depth: 1
18
+ num_bands: 60
19
+ dim_head: 64
20
+ heads: 8
21
+ attn_dropout: 0
22
+ ff_dropout: 0
23
+ flash_attn: true
24
+ dim_freqs_in: 1025
25
+ sample_rate: 44100
26
+ stft_n_fft: 2048
27
+ stft_hop_length: 441
28
+ stft_win_length: 2048
29
+ stft_normalized: true
30
+ mask_estimator_depth: 2
31
+ multi_stft_resolution_loss_weight: 1.0
32
+ multi_stft_resolutions_window_sizes: !!python/tuple
33
+ - 4096
34
+ - 2048
35
+ - 1024
36
+ - 512
37
+ - 256
38
+ multi_stft_hop_size: 250
39
+ multi_stft_normalized: false
40
+ use_torch_checkpoint: true
41
+
42
+ training:
43
+ batch_size: 1
44
+ gradient_accumulation_steps: 999999999999999999999999
45
+ grad_clip: 0
46
+ instruments:
47
+ - other
48
+ - vocals
49
+ lr: 0.00001
50
+ patience: 100000000
51
+ reduce_factor: 0.95
52
+ target_instrument: other
53
+ num_epochs: 1000
54
+ num_steps: 1000
55
+ augmentation: false # enable augmentations by audiomentations and pedalboard
56
+ augmentation_type: simple1
57
+ use_mp3_compress: false # Deprecated
58
+ augmentation_mix: false # Mix several stems of the same type with some probability
59
+ augmentation_loudness: true # randomly change loudness of each stem
60
+ augmentation_loudness_type: 1 # Type 1 or 2
61
+ augmentation_loudness_min: 0
62
+ augmentation_loudness_max: 0
63
+ q: 0.95
64
+ coarse_loss_clip: false
65
+ ema_momentum: 0.999
66
+ optimizer: adamw
67
+ other_fix: false # it's needed for checking on multisong dataset if other is actually instrumental
68
+ use_amp: true # enable or disable usage of mixed precision (float16) - usually it must be true
69
+
70
+ inference:
71
+ batch_size: 1
72
+ dim_t: 1101
73
+ num_overlap: 2
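The `inference` block above pairs `num_overlap: 2` with a `chunk_size` of 352800 samples. The inference script is not part of this commit, so the sketch below rests on an assumption: consecutive chunks start `chunk_size // num_overlap` samples apart, so each sample is covered by roughly `num_overlap` chunks. The function name `chunk_starts` is hypothetical.

```python
def chunk_starts(total_samples, chunk_size, num_overlap):
    """Start offsets of overlapping chunks, assuming step = chunk_size // num_overlap."""
    step = chunk_size // num_overlap
    return list(range(0, total_samples, step))


# chunk_size and num_overlap from v10.yaml; the track length is made up.
step = 352800 // 2
starts = chunk_starts(total_samples=1_000_000, chunk_size=352800, num_overlap=2)
print(step, starts)
```

Under this assumption, raising `num_overlap` shrinks the step and increases the number of forward passes proportionally.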
MelBandRoformers/melbandroformers/karaoke/karaokegabox_1750911344.yaml ADDED
@@ -0,0 +1,72 @@
1
+ audio:
2
+ chunk_size: 485100
3
+ dim_f: 1024
4
+ dim_t: 256
5
+ hop_length: 441
6
+ n_fft: 2048
7
+ num_channels: 2
8
+ sample_rate: 44100
9
+ min_mean_abs: 0.000
10
+
11
+ model:
12
+ dim: 384
13
+ depth: 6
14
+ stereo: true
15
+ num_stems: 1
16
+ time_transformer_depth: 1
17
+ freq_transformer_depth: 1
18
+ num_bands: 60
19
+ dim_head: 64
20
+ heads: 8
21
+ attn_dropout: 0
22
+ ff_dropout: 0
23
+ flash_attn: true
24
+ dim_freqs_in: 1025
25
+ sample_rate: 44100 # needed for mel filter bank from librosa
26
+ stft_n_fft: 2048
27
+ stft_hop_length: 441
28
+ stft_win_length: 2048
29
+ stft_normalized: false
30
+ mask_estimator_depth: 2
31
+ multi_stft_resolution_loss_weight: 1.0
32
+ multi_stft_resolutions_window_sizes: !!python/tuple
33
+ - 4096
34
+ - 2048
35
+ - 1024
36
+ - 512
37
+ - 256
38
+ multi_stft_hop_size: 147
39
+ multi_stft_normalized: true
40
+
41
+ training:
42
+ batch_size: 1
43
+ gradient_accumulation_steps: 1
44
+ grad_clip: 0
45
+ instruments:
46
+ - Vocals
47
+ - Instrumental
48
+ lr: 0.0005
49
+ patience: 2
50
+ reduce_factor: 0.95
51
+ target_instrument: Vocals
52
+ num_epochs: 1000
53
+ num_steps: 1000
54
+ augmentation: false # enable augmentations by audiomentations and pedalboard
55
+ augmentation_type:
56
+ use_mp3_compress: false # Deprecated
57
+ augmentation_mix: false # Mix several stems of the same type with some probability
58
+ augmentation_loudness: false # randomly change loudness of each stem
59
+ augmentation_loudness_type: 1 # Type 1 or 2
60
+ augmentation_loudness_min: 0
61
+ augmentation_loudness_max: 0
62
+ q: 0.95
63
+ coarse_loss_clip: false
64
+ ema_momentum: 0.999
65
+ optimizer: adamw
66
+ other_fix: false # it's needed for checking on multisong dataset if other is actually instrumental
67
+ use_amp: true # enable or disable usage of mixed precision (float16) - usually it must be true
68
+
69
+ inference:
70
+ batch_size: 1
71
+ dim_t: 1101
72
+ num_overlap: 8
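The `lr`, `patience`, and `reduce_factor` fields in the `training` block above read like the inputs of a reduce-on-plateau schedule. The training script is not shown here, so this is a sketch under that assumption, with a higher-is-better validation metric. The class `PlateauLR` and the metric values are hypothetical.

```python
class PlateauLR:
    """Multiply lr by `factor` once the metric fails to improve for more than `patience` epochs."""

    def __init__(self, lr, factor, patience):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.best = float("-inf")
        self.bad_epochs = 0

    def step(self, metric):
        if metric > self.best:
            # New best score: remember it and reset the counter.
            self.best = metric
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs > self.patience:
                self.lr *= self.factor
                self.bad_epochs = 0
        return self.lr


# lr, reduce_factor and patience from karaokegabox_1750911344.yaml.
scheduler = PlateauLR(lr=0.0005, factor=0.95, patience=2)
history = [scheduler.step(m) for m in [10.0, 10.1, 10.1, 10.0, 9.9, 10.2]]
print(history)
```

With `patience: 2`, the rate drops only after the third consecutive epoch without improvement, which is the fifth value in this made-up sequence.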
MelBandRoformers/melbandroformers/vocals/voc_gabox.yaml ADDED
@@ -0,0 +1,51 @@
1
+ audio:
2
+ chunk_size: 352800
3
+ dim_f: 1024
4
+ dim_t: 256
5
+ hop_length: 441
6
+ n_fft: 2048
7
+ num_channels: 2
8
+ sample_rate: 44100
9
+ min_mean_abs: 0.001
10
+
11
+ model:
12
+ dim: 384
13
+ depth: 6
14
+ stereo: true
15
+ num_stems: 1
16
+ time_transformer_depth: 1
17
+ freq_transformer_depth: 1
18
+ num_bands: 60
19
+ dim_head: 64
20
+ heads: 8
21
+ attn_dropout: 0
22
+ ff_dropout: 0
23
+ flash_attn: True
24
+ dim_freqs_in: 1025
25
+ sample_rate: 44100 # needed for mel filter bank from librosa
26
+ stft_n_fft: 2048
27
+ stft_hop_length: 441
28
+ stft_win_length: 2048
29
+ stft_normalized: False
30
+ mask_estimator_depth: 2
31
+ multi_stft_resolution_loss_weight: 1.0
32
+ multi_stft_resolutions_window_sizes: !!python/tuple
33
+ - 4096
34
+ - 2048
35
+ - 1024
36
+ - 512
37
+ - 256
38
+ multi_stft_hop_size: 147
39
+ multi_stft_normalized: False
40
+
41
+ training:
42
+ instruments:
43
+ - Vocals
44
+ - Instrumental
45
+ target_instrument: Vocals
46
+
47
+ inference:
48
+ batch_size: 1
49
+ dim_t: 1101
50
+ num_overlap: 1
51
+ chunk_size: 352800
Single_Models/ZFTurbo/Vocals/config_vocals_htdemucs.yaml ADDED
@@ -0,0 +1,123 @@
1
+ audio:
2
+ chunk_size: 485100 # samplerate * segment
3
+ min_mean_abs: 0.001
4
+ hop_length: 1024
5
+
6
+ training:
7
+ batch_size: 10
8
+ gradient_accumulation_steps: 1
9
+ grad_clip: 0
10
+ segment: 11
11
+ shift: 1
12
+ samplerate: 44100
13
+ channels: 2
14
+ normalize: true
15
+ instruments: ['vocals', 'other']
16
+ target_instrument: null
17
+ num_epochs: 1000
18
+ num_steps: 1000
19
+ optimizer: adam
20
+ lr: 9.0e-05
21
+ patience: 2
22
+ reduce_factor: 0.95
23
+ q: 0.95
24
+ coarse_loss_clip: true
25
+ ema_momentum: 0.999
26
+ other_fix: true # it's needed for checking on multisong dataset if other is actually instrumental
27
+ use_amp: true # enable or disable usage of mixed precision (float16) - usually it must be true
28
+
29
+ augmentations:
30
+ enable: true # enable or disable all augmentations (a quick way to turn them off if needed)
31
+ loudness: true # randomly change loudness of each stem in the range (loudness_min; loudness_max)
32
+ loudness_min: 0.5
33
+ loudness_max: 1.5
34
+ mixup: true # mix several stems of same type with some probability (only works for dataset types: 1, 2, 3)
35
+ mixup_probs: [0.2, 0.02]
36
+ mixup_loudness_min: 0.5
37
+ mixup_loudness_max: 1.5
38
+
39
+ inference:
40
+ num_overlap: 2
41
+ batch_size: 8
42
+
43
+ model: htdemucs
44
+
45
+ htdemucs: # see demucs/htdemucs.py for a detailed description
46
+ # Channels
47
+ channels: 48
48
+ channels_time:
49
+ growth: 2
50
+ # STFT
51
+ num_subbands: 1
52
+ nfft: 4096
53
+ wiener_iters: 0
54
+ end_iters: 0
55
+ wiener_residual: false
56
+ cac: true
57
+ # Main structure
58
+ depth: 4
59
+ rewrite: true
60
+ # Frequency Branch
61
+ multi_freqs: []
62
+ multi_freqs_depth: 3
63
+ freq_emb: 0.2
64
+ emb_scale: 10
65
+ emb_smooth: true
66
+ # Convolutions
67
+ kernel_size: 8
68
+ stride: 4
69
+ time_stride: 2
70
+ context: 1
71
+ context_enc: 0
72
+ # normalization
73
+ norm_starts: 4
74
+ norm_groups: 4
75
+ # DConv residual branch
76
+ dconv_mode: 3
77
+ dconv_depth: 2
78
+ dconv_comp: 8
79
+ dconv_init: 1e-3
80
+ # Before the Transformer
81
+ bottom_channels: 512
82
+ # CrossTransformer
83
+ # ------ Common to all
84
+ # Regular parameters
85
+ t_layers: 5
86
+ t_hidden_scale: 4.0
87
+ t_heads: 8
88
+ t_dropout: 0.0
89
+ t_layer_scale: True
90
+ t_gelu: True
91
+ # ------------- Positional Embedding
92
+ t_emb: sin
93
+ t_max_positions: 10000 # for the scaled embedding
94
+ t_max_period: 10000.0
95
+ t_weight_pos_embed: 1.0
96
+ t_cape_mean_normalize: True
97
+ t_cape_augment: True
98
+ t_cape_glob_loc_scale: [5000.0, 1.0, 1.4]
99
+ t_sin_random_shift: 0
100
+ # ------------- norm before a transformer encoder
101
+ t_norm_in: True
102
+ t_norm_in_group: False
103
+ # ------------- norm inside the encoder
104
+ t_group_norm: False
105
+ t_norm_first: True
106
+ t_norm_out: True
107
+ # ------------- optim
108
+ t_weight_decay: 0.0
109
+ t_lr:
110
+ # ------------- sparsity
111
+ t_sparse_self_attn: False
112
+ t_sparse_cross_attn: False
113
+ t_mask_type: diag
114
+ t_mask_random_seed: 42
115
+ t_sparse_attn_window: 400
116
+ t_global_window: 100
117
+ t_sparsity: 0.95
118
+ t_auto_sparsity: False
119
+ # Cross Encoder First (False)
120
+ t_cross_first: False
121
+ # Weight init
122
+ rescale: 0.1
123
+
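The `augmentations` block in the config above describes loudness augmentation as randomly changing the loudness of each stem within (`loudness_min`; `loudness_max`). A minimal sketch of that idea follows, assuming an independent uniform gain per stem. The function name, the toy stems, and the seeded generator are illustrative assumptions, not the training code itself.

```python
import random


def random_loudness(stems, loudness_min, loudness_max, rng):
    """Scale each stem by its own gain drawn uniformly from [loudness_min, loudness_max]."""
    scaled = {}
    for name, samples in stems.items():
        gain = rng.uniform(loudness_min, loudness_max)
        scaled[name] = [s * gain for s in samples]
    return scaled


rng = random.Random(0)  # seeded only to make the sketch repeatable
stems = {"vocals": [0.1, -0.2, 0.3], "other": [0.05, 0.0, -0.05]}

# loudness_min / loudness_max from the augmentations block.
augmented = random_loudness(stems, 0.5, 1.5, rng)

# The training mixture is then rebuilt from the rescaled stems.
mixture = [sum(values) for values in zip(*augmented.values())]
print(augmented, mixture)
```

Rebuilding the mixture after scaling keeps the input and the targets consistent with each other.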
Single_Models/ZFTurbo/Vocals/config_vocals_mdx23c.yaml ADDED
@@ -0,0 +1,54 @@
1
+ audio:
2
+ chunk_size: 261120
3
+ dim_f: 4096
4
+ dim_t: 256
5
+ hop_length: 1024
6
+ n_fft: 8192
7
+ num_channels: 2
8
+ sample_rate: 44100
9
+ min_mean_abs: 0.001
10
+
11
+ model:
12
+ act: gelu
13
+ bottleneck_factor: 4
14
+ growth: 128
15
+ norm: InstanceNorm
16
+ num_blocks_per_scale: 2
17
+ num_channels: 128
18
+ num_scales: 5
19
+ num_subbands: 4
20
+ scale:
21
+ - 2
22
+ - 2
23
+
24
+ training:
25
+ batch_size: 6
26
+ gradient_accumulation_steps: 1
27
+ grad_clip: 0
28
+ instruments:
29
+ - vocals
30
+ - other
31
+ lr: 9.0e-05
32
+ patience: 2
33
+ reduce_factor: 0.95
34
+ target_instrument: null
35
+ num_epochs: 1000
36
+ num_steps: 1000
37
+ augmentation: false # enable augmentations by audiomentations and pedalboard
38
+ augmentation_type: simple1
39
+ use_mp3_compress: false # Deprecated
40
+ augmentation_mix: true # Mix several stems of the same type with some probability
41
+ augmentation_loudness: true # randomly change loudness of each stem
42
+ augmentation_loudness_type: 1 # Type 1 or 2
43
+ augmentation_loudness_min: 0.5
44
+ augmentation_loudness_max: 1.5
45
+ q: 0.95
46
+ coarse_loss_clip: true
47
+ ema_momentum: 0.999
48
+ optimizer: adam
49
+ other_fix: true # it's needed for checking on multisong dataset if other is actually instrumental
50
+
51
+ inference:
52
+ batch_size: 1
53
+ dim_t: 256
54
+ num_overlap: 4
Stable-Audio-Open-1.0/LICENSE.md ADDED
@@ -0,0 +1,58 @@
1
+ STABILITY AI COMMUNITY LICENSE AGREEMENT
2
+
3
+ Last Updated: July 5, 2024
4
+
5
+ 1. INTRODUCTION
6
+
7
+ This Agreement applies to any individual person or entity (“You”, “Your” or “Licensee”) that uses or distributes any portion or element of the Stability AI Materials or Derivative Works thereof for any Research & Non-Commercial or Commercial purpose. Capitalized terms not otherwise defined herein are defined in Section V below.
8
+
9
+ This Agreement is intended to allow research, non-commercial, and limited commercial uses of the Models free of charge. In order to ensure that certain limited commercial uses of the Models continue to be allowed, this Agreement preserves free access to the Models for people or organizations generating annual revenue of less than US $1,000,000 (or local currency equivalent).
10
+
11
+ By clicking “I Accept” or by using or distributing or using any portion or element of the Stability Materials or Derivative Works, You agree that You have read, understood and are bound by the terms of this Agreement. If You are acting on behalf of a company, organization or other entity, then “You” includes you and that entity, and You agree that You: (i) are an authorized representative of such entity with the authority to bind such entity to this Agreement, and (ii) You agree to the terms of this Agreement on that entity’s behalf.
12
+
13
+ 2. RESEARCH & NON-COMMERCIAL USE LICENSE
14
+
15
+ Subject to the terms of this Agreement, Stability AI grants You a non-exclusive, worldwide, non-transferable, non-sublicensable, revocable and royalty-free limited license under Stability AI’s intellectual property or other rights owned by Stability AI embodied in the Stability AI Materials to use, reproduce, distribute, and create Derivative Works of, and make modifications to, the Stability AI Materials for any Research or Non-Commercial Purpose. “Research Purpose” means academic or scientific advancement, and in each case, is not primarily intended for commercial advantage or monetary compensation to You or others. “Non-Commercial Purpose” means any purpose other than a Research Purpose that is not primarily intended for commercial advantage or monetary compensation to You or others, such as personal use (i.e., hobbyist) or evaluation and testing.
16
+
17
+ 3. COMMERCIAL USE LICENSE
18
+
19
+ Subject to the terms of this Agreement (including the remainder of this Section III), Stability AI grants You a non-exclusive, worldwide, non-transferable, non-sublicensable, revocable and royalty-free limited license under Stability AI’s intellectual property or other rights owned by Stability AI embodied in the Stability AI Materials to use, reproduce, distribute, and create Derivative Works of, and make modifications to, the Stability AI Materials for any Commercial Purpose. “Commercial Purpose” means any purpose other than a Research Purpose or Non-Commercial Purpose that is primarily intended for commercial advantage or monetary compensation to You or others, including but not limited to, (i) creating, modifying, or distributing Your product or service, including via a hosted service or application programming interface, and (ii) for Your business’s or organization’s internal operations.
20
+ If You are using or distributing the Stability AI Materials for a Commercial Purpose, You must register with Stability AI at (https://stability.ai/community-license). If at any time You or Your Affiliate(s), either individually or in aggregate, generate more than USD $1,000,000 in annual revenue (or the equivalent thereof in Your local currency), regardless of whether that revenue is generated directly or indirectly from the Stability AI Materials or Derivative Works, any licenses granted to You under this Agreement shall terminate as of such date. You must request a license from Stability AI at (https://stability.ai/enterprise) , which Stability AI may grant to You in its sole discretion. If you receive Stability AI Materials, or any Derivative Works thereof, from a Licensee as part of an integrated end user product, then Section III of this Agreement will not apply to you.
21
+
22
+ 4. GENERAL TERMS
23
+
24
+ Your Research, Non-Commercial, and Commercial License(s) under this Agreement are subject to the following terms.
25
+ a. Distribution & Attribution. If You distribute or make available the Stability AI Materials or a Derivative Work to a third party, or a product or service that uses any portion of them, You shall: (i) provide a copy of this Agreement to that third party, (ii) retain the following attribution notice within a "Notice" text file distributed as a part of such copies: "This Stability AI Model is licensed under the Stability AI Community License, Copyright © Stability AI Ltd. All Rights Reserved”, and (iii) prominently display “Powered by Stability AI” on a related website, user interface, blogpost, about page, or product documentation. If You create a Derivative Work, You may add your own attribution notice(s) to the “Notice” text file included with that Derivative Work, provided that You clearly indicate which attributions apply to the Stability AI Materials and state in the “Notice” text file that You changed the Stability AI Materials and how it was modified.
26
+ b. Use Restrictions. Your use of the Stability AI Materials and Derivative Works, including any output or results of the Stability AI Materials or Derivative Works, must comply with applicable laws and regulations (including Trade Control Laws and equivalent regulations) and adhere to the Documentation and Stability AI’s AUP, which is hereby incorporated by reference. Furthermore, You will not use the Stability AI Materials or Derivative Works, or any output or results of the Stability AI Materials or Derivative Works, to create or improve any foundational generative AI model (excluding the Models or Derivative Works).
27
+ c. Intellectual Property.
28
+ (i) Trademark License. No trademark licenses are granted under this Agreement, and in connection with the Stability AI Materials or Derivative Works, You may not use any name or mark owned by or associated with Stability AI or any of its Affiliates, except as required under Section IV(a) herein.
29
+ (ii) Ownership of Derivative Works. As between You and Stability AI, You are the owner of Derivative Works You create, subject to Stability AI’s ownership of the Stability AI Materials and any Derivative Works made by or for Stability AI.
30
+ (iii) Ownership of Outputs. As between You and Stability AI, You own any outputs generated from the Models or Derivative Works to the extent permitted by applicable law.
31
+ (iv) Disputes. If You or Your Affiliate(s) institute litigation or other proceedings against Stability AI (including a cross-claim or counterclaim in a lawsuit) alleging that the Stability AI Materials, Derivative Works or associated outputs or results, or any portion of any of the foregoing, constitutes infringement of intellectual property or other rights owned or licensable by You, then any licenses granted to You under this Agreement shall terminate as of the date such litigation or claim is filed or instituted. You will indemnify and hold harmless Stability AI from and against any claim by any third party arising out of or related to Your use or distribution of the Stability AI Materials or Derivative Works in violation of this Agreement.
32
+ (v) Feedback. From time to time, You may provide Stability AI with verbal and/or written suggestions, comments or other feedback related to Stability AI’s existing or prospective technology, products or services (collectively, “Feedback”). You are not obligated to provide Stability AI with Feedback, but to the extent that You do, You hereby grant Stability AI a perpetual, irrevocable, royalty-free, fully-paid, sub-licensable, transferable, non-exclusive, worldwide right and license to exploit the Feedback in any manner without restriction. Your Feedback is provided “AS IS” and You make no warranties whatsoever about any Feedback.
33
+ d. Disclaimer Of Warranty. UNLESS REQUIRED BY APPLICABLE LAW, THE STABILITY AI MATERIALS AND ANY OUTPUT AND RESULTS THEREFROM ARE PROVIDED ON AN "AS IS" BASIS, WITHOUT WARRANTIES OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE FOR DETERMINING THE APPROPRIATENESS OR LAWFULNESS OF USING OR REDISTRIBUTING THE STABILITY AI MATERIALS, DERIVATIVE WORKS OR ANY OUTPUT OR RESULTS AND ASSUME ANY RISKS ASSOCIATED WITH YOUR USE OF THE STABILITY AI MATERIALS, DERIVATIVE WORKS AND ANY OUTPUT AND RESULTS.
34
+ e. Limitation Of Liability. IN NO EVENT WILL STABILITY AI OR ITS AFFILIATES BE LIABLE UNDER ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, TORT, NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, ARISING OUT OF THIS AGREEMENT, FOR ANY LOST PROFITS OR ANY DIRECT, INDIRECT, SPECIAL, CONSEQUENTIAL, INCIDENTAL, EXEMPLARY OR PUNITIVE DAMAGES, EVEN IF STABILITY AI OR ITS AFFILIATES HAVE BEEN ADVISED OF THE POSSIBILITY OF ANY OF THE FOREGOING.
35
+ f. Term And Termination. The term of this Agreement will commence upon Your acceptance of this Agreement or access to the Stability AI Materials and will continue in full force and effect until terminated in accordance with the terms and conditions herein. Stability AI may terminate this Agreement if You are in breach of any term or condition of this Agreement. Upon termination of this Agreement, You shall delete and cease use of any Stability AI Materials or Derivative Works. Section IV(d), (e), and (g) shall survive the termination of this Agreement.
36
+ g. Governing Law. This Agreement will be governed by and constructed in accordance with the laws of the United States and the State of California without regard to choice of law principles, and the UN Convention on Contracts for International Sale of Goods does not apply to this Agreement.
37
+
38
+ 5. DEFINITIONS
39
+
40
+ “Affiliate(s)” means any entity that directly or indirectly controls, is controlled by, or is under common control with the subject entity; for purposes of this definition, “control” means direct or indirect ownership or control of more than 50% of the voting interests of the subject entity.
41
+
42
+ "Agreement" means this Stability AI Community License Agreement.
43
+
44
+ “AUP” means the Stability AI Acceptable Use Policy available at (https://stability.ai/use-policy), as may be updated from time to time.
45
+
46
+ "Derivative Work(s)” means (a) any derivative work of the Stability AI Materials as recognized by U.S. copyright laws and (b) any modifications to a Model, and any other model created which is based on or derived from the Model or the Model’s output, including “fine tune” and “low-rank adaptation” models derived from a Model or a Model’s output, but do not include the output of any Model.
47
+
48
+ “Documentation” means any specifications, manuals, documentation, and other written information provided by Stability AI related to the Software or Models.
49
+
50
+ “Model(s)" means, collectively, Stability AI’s proprietary models and algorithms, including machine-learning models, trained model weights and other elements of the foregoing listed on Stability’s Core Models Webpage available at (https://stability.ai/core-models), as may be updated from time to time.
51
+
52
+ "Stability AI" or "we" means Stability AI Ltd. and its Affiliates.
53
+
54
+ "Software" means Stability AI’s proprietary software made available under this Agreement now or in the future.
55
+
56
+ “Stability AI Materials” means, collectively, Stability’s proprietary Models, Software and Documentation (and any portion or combination thereof) made available under this Agreement.
57
+
58
+ “Trade Control Laws” means any applicable U.S. and non-U.S. export control and trade sanctions laws and regulations.
Stable-Audio-Open-1.0/README.md ADDED
@@ -0,0 +1,182 @@
1
+ ---
2
+ language:
3
+ - en
4
+ library_name: stable-audio-tools
5
+ license: other
6
+ license_name: stable-audio-community
7
+ license_link: LICENSE
8
+ pipeline_tag: text-to-audio
9
+ extra_gated_prompt: By clicking "Agree", you agree to the [License Agreement](https://huggingface.co/stabilityai/stable-audio-open-1.0/blob/main/LICENSE.md)
10
+ and acknowledge Stability AI's [Privacy Policy](https://stability.ai/privacy-policy).
11
+ extra_gated_fields:
12
+ Name: text
13
+ Email: text
14
+ Country: country
15
+ Organization or Affiliation: text
16
+ Receive email updates and promotions on Stability AI products, services, and research?:
17
+ type: select
18
+ options:
19
+ - 'Yes'
20
+ - 'No'
21
+ What do you intend to use the model for?:
22
+ type: select
23
+ options:
24
+ - Research
25
+ - Personal use
26
+ - Creative Professional
27
+ - Startup
28
+ - Enterprise
29
+ I agree to the License Agreement and acknowledge Stability AI's Privacy Policy: checkbox
30
+ ---
31
+
32
+ # Stable Audio Open 1.0
33
+
34
+ ![Stable Audio Open logo](./stable_audio_light.png)
35
+
36
+ Please note: For commercial use, please refer to [https://stability.ai/license](https://stability.ai/license)
37
+
38
+ ## Model Description
39
+ `Stable Audio Open 1.0` generates variable-length (up to 47s) stereo audio at 44.1kHz from text prompts. It comprises three components: an autoencoder that compresses waveforms into a manageable sequence length, a T5-based text embedding for text conditioning, and a transformer-based diffusion (DiT) model that operates in the latent space of the autoencoder.
40
+
41
+ ## Usage
42
+
43
+ This model can be used with:
44
+ 1. the [`stable-audio-tools`](https://github.com/Stability-AI/stable-audio-tools) library
45
+ 2. the [`diffusers`](https://huggingface.co/docs/diffusers/main/en/index) library
46
+
47
+
48
+ ### Using with `stable-audio-tools`
49
+
50
+ This model is made to be used with the [`stable-audio-tools`](https://github.com/Stability-AI/stable-audio-tools) library for inference, for example:
51
+
52
+ ```python
+ import torch
+ import torchaudio
+ from einops import rearrange
+ from stable_audio_tools import get_pretrained_model
+ from stable_audio_tools.inference.generation import generate_diffusion_cond
+
+ device = "cuda" if torch.cuda.is_available() else "cpu"
+
+ # Download model
+ model, model_config = get_pretrained_model("stabilityai/stable-audio-open-1.0")
+ sample_rate = model_config["sample_rate"]
+ sample_size = model_config["sample_size"]
+
+ model = model.to(device)
+
+ # Set up text and timing conditioning
+ conditioning = [{
+     "prompt": "128 BPM tech house drum loop",
+     "seconds_start": 0,
+     "seconds_total": 30
+ }]
+
+ # Generate stereo audio
+ output = generate_diffusion_cond(
+     model,
+     steps=100,
+     cfg_scale=7,
+     conditioning=conditioning,
+     sample_size=sample_size,
+     sigma_min=0.3,
+     sigma_max=500,
+     sampler_type="dpmpp-3m-sde",
+     device=device
+ )
+
+ # Rearrange audio batch to a single sequence
+ output = rearrange(output, "b d n -> d (b n)")
+
+ # Peak normalize, clip, convert to int16, and save to file
+ output = output.to(torch.float32).div(torch.max(torch.abs(output))).clamp(-1, 1).mul(32767).to(torch.int16).cpu()
+ torchaudio.save("output.wav", output, sample_rate)
+ ```
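The last step of the example above packs peak normalization, clipping, and int16 conversion into one chained call. The dependency-free sketch below spells out the same arithmetic on a plain list of floats. The function name and the guard for an all-zero signal are additions for illustration and are not in the original snippet.

```python
def peak_normalize_to_int16(samples):
    """Scale so the loudest sample hits full scale, clamp to [-1, 1], map to int16."""
    peak = max(abs(s) for s in samples)
    if peak == 0.0:
        # Added guard: a silent signal would otherwise divide by zero.
        return [0 for _ in samples]
    pcm = []
    for s in samples:
        x = max(-1.0, min(1.0, s / peak))  # normalize, then clamp
        pcm.append(int(x * 32767))         # truncates toward zero, like a float-to-int cast
    return pcm


pcm = peak_normalize_to_int16([0.25, -0.5, 0.125])
print(pcm)
```

Scaling by 32767 rather than 32768 keeps both +1.0 and -1.0 inside the int16 range.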
95
+
96
+ ### Using with `diffusers`
97
+
98
+ Upgrade to the latest version of diffusers with `pip install -U diffusers`, then run:
99
+
100
+ ```py
+ import torch
+ import soundfile as sf
+ from diffusers import StableAudioPipeline
+
+ pipe = StableAudioPipeline.from_pretrained("stabilityai/stable-audio-open-1.0", torch_dtype=torch.float16)
+ pipe = pipe.to("cuda")
+
+ # define the prompts
+ prompt = "The sound of a hammer hitting a wooden surface."
+ negative_prompt = "Low quality."
+
+ # set the seed for generator
+ generator = torch.Generator("cuda").manual_seed(0)
+
+ # run the generation
+ audio = pipe(
+     prompt,
+     negative_prompt=negative_prompt,
+     num_inference_steps=200,
+     audio_end_in_s=10.0,
+     num_waveforms_per_prompt=3,
+     generator=generator,
+ ).audios
+
+ output = audio[0].T.float().cpu().numpy()
+ sf.write("hammer.wav", output, pipe.vae.sampling_rate)
+ ```
129
+ Refer to the [documentation](https://huggingface.co/docs/diffusers/main/en/index) for more details on optimization and usage.
130
+
131
+
132
+
133
+
134
+ ## Model Details
135
+ * **Model type**: `Stable Audio Open 1.0` is a latent diffusion model based on a transformer architecture.
136
+ * **Language(s)**: English
137
+ * **License**: [Stability AI Community License](https://huggingface.co/stabilityai/stable-audio-open-1.0/blob/main/LICENSE.md).
138
+ * **Commercial License**: to use this model commercially, please refer to [https://stability.ai/license](https://stability.ai/license)
139
+ * **Research Paper**: [https://arxiv.org/abs/2407.14358](https://arxiv.org/abs/2407.14358)
140
+
141
+ ## Training dataset
142
+
143
+ ### Datasets Used
144
+ Our dataset consists of 486492 audio recordings, where 472618 are from Freesound and 13874 are from the Free Music Archive (FMA). All audio files are licensed under CC0, CC BY, or CC Sampling+. This data is used to train our autoencoder and DiT. We use a publicly available pre-trained T5 model ([t5-base](https://huggingface.co/google-t5/t5-base)) for text conditioning.
145
+
146
+ ### Attribution
147
+ Attribution for all audio recordings used to train Stable Audio Open 1.0 can be found on our [attribution page](https://info.stability.ai/attributions).
148
+
149
+ ### Mitigations
150
+ We conducted an in-depth analysis to ensure no unauthorized copyrighted music was present in our training data before we began training.
151
+
152
+ To that end, we first identified music samples in Freesound using the [PANNs](https://github.com/qiuqiangkong/audioset_tagging_cnn) music classifier based on AudioSet classes. The identified music samples had at least 30 seconds of music that was predicted to belong to a music-related class with a threshold of 0.15 (PANNs output probabilities range from 0 to 1). This threshold was determined by classifying known music examples from FMA and ensuring no false negatives were present.
153
+
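The music-detection rule described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the classifier output has already been reduced to one music probability per frame, that frames are evenly spaced (`hop_seconds` is a hypothetical parameter), and that the 30 seconds of music need not be contiguous. The function name `is_music_sample` is also hypothetical; only the 0.15 threshold and the 30-second minimum come from the text.

```python
from typing import Sequence

MUSIC_THRESHOLD = 0.15    # probability threshold stated in the text
MIN_MUSIC_SECONDS = 30.0  # minimum amount of detected music stated in the text


def is_music_sample(music_probs: Sequence[float], hop_seconds: float = 1.0) -> bool:
    """Return True if a recording would be treated as a music sample.

    music_probs: assumed per-frame probabilities (0 to 1) that the frame
        belongs to a music-related class, e.g. derived from a tagger.
    hop_seconds: assumed duration covered by each frame.
    """
    # Add up the duration of every frame whose music probability
    # reaches the threshold (frames do not have to be adjacent).
    music_seconds = sum(hop_seconds for p in music_probs if p >= MUSIC_THRESHOLD)
    return music_seconds >= MIN_MUSIC_SECONDS
```

Under these assumptions, a recording with 30 one-second frames at or above 0.15 is flagged for the next step, while one with only 29 such frames is not.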
154
+ The identified music samples were sent to the identification services of Audible Magic, a trusted content detection company, to ensure the absence of copyrighted music. Audible Magic flagged suspected copyrighted music, which we subsequently removed before training on the dataset. The majority of the removed content was field recordings in which copyrighted music was playing in the background. Following this procedure, we were left with 266324 CC0, 194840 CC-BY, and 11454 CC Sampling+ audio recordings.
155
+
156
+ We also conducted an in-depth analysis to ensure no copyrighted content was present in the FMA subset. In this case, the procedure was slightly different because the FMA subset consists of music signals. We ran a metadata search against a large database of copyrighted music (https://www.kaggle.com/datasets/maharshipandya/-spotify-tracks-dataset) and flagged any potential match. The flagged content was reviewed individually by humans. After this process, we ended up with 8967 CC-BY and 4907 CC0 tracks.
157
+
158
+
159
+ ## Use and Limitations
160
+
161
+
162
+ ### Intended Use
163
+ The primary use of Stable Audio Open is research and experimentation on AI-based music and audio generation, including:
164
+
165
+ - Research efforts to better understand the limitations of generative models and further improve the state of science.
166
+ - Generation of music and audio guided by text to explore current abilities of generative AI models by machine learning practitioners and artists.
167
+
168
+
169
+ ### Out-of-Scope Use Cases
170
+ The model should not be used in downstream applications without further risk evaluation and mitigation. The model should not be used to intentionally create or disseminate audio or music pieces that create hostile or alienating environments for people.
171
+
172
+
173
+ ### Limitations
174
+ - The model is not able to generate realistic vocals.
175
+ - The model has been trained with English descriptions and will not perform as well in other languages.
176
+ - The model does not perform equally well for all music styles and cultures.
177
+ - The model is better at generating sound effects and field recordings than music.
178
+ - It is sometimes difficult to assess what types of text descriptions provide the best generations. Prompt engineering may be required to obtain satisfying results.
179
+
180
+
181
+ ### Biases
182
+ The source data potentially lacks diversity, and not all cultures are equally represented in the dataset. The model may not perform equally well on the wide variety of music genres and sound effects that exist. The generated samples from the model will reflect the biases in the training data.