Add files using upload-large-folder tool
- Apollo-Enhancement/.gitattributes +35 -0
- Apollo-Enhancement/README.md +125 -0
- Apollo-Vocal-MSST/.gitattributes +35 -0
- Apollo-Vocal-MSST/README.md +10 -0
- Apollo-Vocal-MSST/config_apollo_vocals_ep_54.yaml +31 -0
- Aspiration-MelBand-Sucial/.gitattributes +40 -0
- Aspiration-MelBand-Sucial/README.md +30 -0
- Aspiration-MelBand-Sucial/config_aspiration_mel_band_roformer.yaml +77 -0
- BS-RoFormer-Anvuew/.gitattributes +35 -0
- BS-RoFormer-Anvuew/README.md +5 -0
- BS-RoFormer-Anvuew/config.yaml +129 -0
- Dereverb-Echo-MelBand-Sucial/.gitattributes +55 -0
- Dereverb-Echo-MelBand-Sucial/README.md +85 -0
- Dereverb-Echo-MelBand-Sucial/config_dereverb-echo_mel_band_roformer.yaml +77 -0
- Dereverb-Echo-MelBand-Sucial/config_dereverb_echo_mbr_v2.yaml +65 -0
- Dereverb-MelBand-Anvuew/.gitattributes +35 -0
- Dereverb-MelBand-Anvuew/README.md +15 -0
- Dereverb-MelBand-Anvuew/dereverb_mel_band_roformer_anvuew.yaml +76 -0
- Dereverb-Room-Anvuew/.gitattributes +35 -0
- Dereverb-Room-Anvuew/README.md +23 -0
- Dereverb-Room-Anvuew/dereverb_room_anvuew.yaml +132 -0
- Karaoke-BS-RoFormer-Anvuew/.gitattributes +35 -0
- Karaoke-BS-RoFormer-Anvuew/README.md +5 -0
- Karaoke-BS-RoFormer-Anvuew/karaoke_bs_roformer_anvuew.yaml +131 -0
- MedleyVox-MultiSinger/.gitattributes +55 -0
- MedleyVox-MultiSinger/README.md +156 -0
- MedleyVox-MultiSinger/multi_singing_librispeech/loss_graph_vocals.png +0 -0
- MedleyVox-MultiSinger/multi_singing_librispeech/vocals.json +642 -0
- MedleyVox-MultiSinger/multi_singing_librispeech_138/loss_graph_vocals.png +0 -0
- MedleyVox-MultiSinger/multi_singing_librispeech_138/vocals.json +812 -0
- MedleyVox-MultiSinger/singing_librispeech_ft_iSRNet/loss_graph_vocals.png +0 -0
- MedleyVox-MultiSinger/singing_librispeech_ft_iSRNet/vocals.json +1321 -0
- MedleyVox-MultiSinger/singing_librispeech_iSRNet/loss_graph_vocals.png +0 -0
- MedleyVox-MultiSinger/singing_librispeech_iSRNet/vocals.json +1180 -0
- MedleyVox-MultiSinger/vocal 231/loss_graph_vocals.png +0 -0
- MelBand-Roformer-Deux-Becruily/.gitattributes +35 -0
- MelBand-Roformer-Deux-Becruily/README.md +8 -0
- MelBand-Roformer-Deux-Becruily/config_deux_becruily.yaml +64 -0
- MelBandRoformer-Original/.gitattributes +35 -0
- MelBandRoformer-Original/README.md +3 -0
- MelBandRoformers/.gitattributes +35 -0
- MelBandRoformers/bsroformers/karaoke_bs_roformer.yaml +129 -0
- MelBandRoformers/melbandroformers/instrumental/inst_gabox.yaml +51 -0
- MelBandRoformers/melbandroformers/instrumental/v10.yaml +73 -0
- MelBandRoformers/melbandroformers/karaoke/karaokegabox_1750911344.yaml +72 -0
- MelBandRoformers/melbandroformers/vocals/voc_gabox.yaml +51 -0
- Single_Models/ZFTurbo/Vocals/config_vocals_htdemucs.yaml +123 -0
- Single_Models/ZFTurbo/Vocals/config_vocals_mdx23c.yaml +54 -0
- Stable-Audio-Open-1.0/LICENSE.md +58 -0
- Stable-Audio-Open-1.0/README.md +182 -0
Apollo-Enhancement/.gitattributes
ADDED
@@ -0,0 +1,35 @@
+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tar filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text
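As a rough illustration of what the `.gitattributes` patterns in this commit do, the sketch below checks which file names would be routed through Git LFS. It is an approximation only: it assumes slash-free patterns are matched against the file's base name, it uses Python's `fnmatchcase` rather than Git's own matcher, and the helper name `tracked_by_lfs` and the sample paths are hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# A few of the slash-free patterns from the .gitattributes files in this commit.
LFS_PATTERNS = ["*.ckpt", "*.safetensors", "*.zip", "*.tar.*", "*tfevents*"]


def tracked_by_lfs(path, patterns=LFS_PATTERNS):
    """Approximate check: does the base name match any LFS pattern?"""
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in patterns)


if __name__ == "__main__":
    for candidate in ["weights/model.ckpt", "README.md", "config.yaml"]:
        print(candidate, tracked_by_lfs(candidate))
```

Under these assumptions, checkpoint files are stored as LFS pointers while the README and YAML configs remain ordinary text files.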
Apollo-Enhancement/README.md
ADDED
@@ -0,0 +1,125 @@
+---
+license: cc-by-sa-4.0
+datasets:
+- sebchw/musdb18
+pipeline_tag: audio-to-audio
+tags:
+- music
+---
+
+<p align="center">
+<img src="https://cslikai.cn/Apollo/asserts/apollo-logo.png" alt="Logo" width="150"/>
+</p>
+
+<p align="center">
+<strong>Kai Li<sup>1,2</sup>, Yi Luo<sup>2</sup></strong><br>
+<strong><sup>1</sup>Tsinghua University, Beijing, China</strong><br>
+<strong><sup>2</sup>Tencent AI Lab, Shenzhen, China</strong><br>
+<a href="#">ArXiv</a> | <a href="https://cslikai.cn/Apollo/">Demo</a>
+
+<p align="center">
+<img src="https://visitor-badge.laobi.icu/badge?page_id=JusperLee.Apollo" alt="Visitor count" />
+<img src="https://img.shields.io/github/stars/JusperLee/Apollo?style=social" alt="GitHub stars" />
+<img alt="Static Badge" src="https://img.shields.io/badge/license-CC%20BY--SA%204.0-blue">
+</p>
+
+<p align="center">
+
+# Apollo: Band-sequence Modeling for High-Quality Music Restoration in Compressed Audio
+
+## 📖 Abstract
+
+Apollo is a novel music restoration method designed to address distortions and artefacts caused by audio codecs, especially at low bitrates. Operating in the frequency domain, Apollo uses a frequency band-split module, band-sequence modeling, and frequency band reconstruction to restore the audio quality of **MP3-compressed music**. It divides the spectrogram into sub-bands, extracts gain-shape representations, and models both sub-band and temporal information for high-quality audio recovery. Trained with a Generative Adversarial Network (GAN), Apollo outperforms existing SR-GAN models on the **MUSDB18-HQ and MoisesDB** datasets, excelling in complex multi-instrument and vocal scenarios, while maintaining efficiency.
+
+## 🔥 News
+
+- [2024.09.10] Apollo is now available on [ArXiv](#) and [Demo](https://cslikai.cn/Apollo/).
+- [2024.09.106] Apollo checkpoints and pre-trained models are available for download.
+
+## ⚡️ Installation
+
+Clone the repository and create the conda environment:
+
+```bash
+git clone https://github.com/JusperLee/Apollo.git && cd Apollo
+conda create --name look2hear --file look2hear.yml
+conda activate look2hear
+```
+
+## 🖥️ Usage
+
+### 🗂️ Datasets
+
+Apollo is trained on the MUSDB18-HQ and MoisesDB datasets. To download the datasets, run the following commands:
+
+```bash
+wget https://zenodo.org/records/3338373/files/musdb18hq.zip?download=1
+wget https://ds-website-downloads.55c2710389d9da776875002a7d018e59.r2.cloudflarestorage.com/moisesdb.zip
+```
+During data preprocessing, we drew inspiration from music separation techniques and implemented the following steps:
+
+1. **Source Activity Detection (SAD):**
+   We used a Source Activity Detector (SAD) to remove silent regions from the audio tracks, retaining only the significant portions for training.
+
+2. **Data Augmentation:**
+   We performed real-time data augmentation by mixing tracks from different songs. For each mix, we randomly selected between 1 and 8 stems from the 11 available tracks, extracting 3-second clips from each selected stem. These clips were scaled in energy by a random factor within the range of [-10, 10] dB relative to their original levels. The selected clips were then summed together to create simulated mixed music.
+
+3. **Simulating Dynamic Bitrate Compression:**
+   We simulated various bitrate scenarios by applying MP3 codecs with bitrates of [24000, 32000, 48000, 64000, 96000, 128000] bps.
+
+4. **Rescaling:**
+   To ensure consistency across all samples, we rescaled both the target and the encoded audio based on their maximum absolute values.
+
+5. **Saving as HDF5:**
+   After preprocessing, all data (including the source stems, mixed tracks, and compressed audio) was saved in HDF5 format, making it easy to load for training and evaluation purposes.
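The augmentation and rescaling steps in the README's preprocessing list can be sketched roughly as follows. This is a minimal illustration and not the project's actual pipeline: the function names, the synthetic sine-wave stems, and the shortened clip length are all hypothetical, the MP3 encoding step is left out (only a bitrate is drawn), and normalizing each signal by its own peak is one assumed reading of the rescaling step.

```python
import math
import random

# Bitrates listed in the README (step 3); the MP3 encoding itself is not shown here.
BITRATES = [24000, 32000, 48000, 64000, 96000, 128000]


def db_to_gain(db):
    """Convert a level change in dB to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


def simulate_mix(stems, rng, clip_len):
    """Pick 1 to 8 stems, cut a random clip from each, apply a random gain
    in [-10, 10] dB, and sum the clips into one simulated mixture."""
    k = rng.randint(1, min(8, len(stems)))
    mix = [0.0] * clip_len
    for stem in rng.sample(stems, k):
        start = rng.randint(0, len(stem) - clip_len)
        gain = db_to_gain(rng.uniform(-10.0, 10.0))
        for i in range(clip_len):
            mix[i] += gain * stem[start + i]
    return mix, k


def peak_normalize(signal, eps=1e-8):
    """Rescale a signal by its maximum absolute value (assumed per-signal)."""
    peak = max(abs(x) for x in signal)
    return [x / (peak + eps) for x in signal]


rng = random.Random(0)
# Eleven synthetic stand-in stems; real stems would be audio tracks.
stems = [[math.sin(0.01 * (n + 1) * t) for t in range(1000)] for n in range(11)]
mix, k = simulate_mix(stems, rng, clip_len=300)
target = peak_normalize(mix)
bitrate = rng.choice(BITRATES)
```

In a real pipeline the clip length would be 3 seconds of audio at the dataset sample rate, and the normalized mixture would be passed through an MP3 encoder at the drawn bitrate to produce the degraded input.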
+
+### 🚀 Training
+To train the Apollo model, run the following command:
+
+```bash
+python train.py --conf_dir=configs/apollo.yml
+```
+
+### 🎨 Evaluation
+To evaluate the Apollo model, run the following command:
+
+```bash
+python inference.py --in_wav=assets/input.wav --out_wav=assets/output.wav
+```
+
+## 📊 Results
+
+*Here, you can include a brief overview of the performance metrics or results that Apollo achieves using different bitrates*
+
+![](./asserts/bitrates.png)
+
+
+*Different methods' SDR/SI-SNR/VISQOL scores for various types of music, as well as the number of model parameters and GPU inference time. For the GPU inference time test, a music signal with a sampling rate of 44.1 kHz and a length of 1 second was used.*
+![](./asserts/types.png)
+
+## License
+
+<a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">Creative Commons Attribution-ShareAlike 4.0 International License</a>.
+
+## Acknowledgements
+
+Apollo is developed by **Look2Hear** at Tsinghua University.
+
+## Citation
+
+If you use Apollo in your research or project, please cite the following paper:
+
+```
+@article{li2024apollo,
+title={Apollo: Band-sequence Modeling for High-Quality Music Restoration in Compressed Audio},
+author={Li, Kai and Luo, Yi},
+journal={xxxxxx},
+year={2024}
+}
+```
+
+## Contact
+
+For any questions or feedback regarding Apollo, feel free to reach out to us via email: `tsinghua.kaili@gmail.com`
+
+
Apollo-Vocal-MSST/.gitattributes
ADDED
@@ -0,0 +1,35 @@
+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tar filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text
Apollo-Vocal-MSST/README.md
ADDED
@@ -0,0 +1,10 @@
+---
+license: cc-by-nc-sa-4.0
+---
+Apollo Official GitHub: https://github.com/JusperLee/Apollo
+
+Apollo is a novel music restoration method designed to address distortions and artefacts caused by audio codecs, especially at low bitrates. Operating in the frequency domain, Apollo uses a frequency band-split module, band-sequence modeling, and frequency band reconstruction to restore the audio quality of MP3-compressed music. It divides the spectrogram into sub-bands, extracts gain-shape representations, and models both sub-band and temporal information for high-quality audio recovery. Trained with a Generative Adversarial Network (GAN), Apollo outperforms existing SR-GAN models on the MUSDB18-HQ and MoisesDB datasets, excelling in complex multi-instrument and vocal scenarios, while maintaining efficiency.
+
+The open-source release includes models for inference with https://github.com/ZFTurbo/Music-Source-Separation-Training, plus the original weights from fewer training steps. Training used Sucial's project at https://github.com/SUC-DriverOld/Apollo-Training on a 92-hour high-quality vocal dataset for 1 million steps.
+
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/6403282ca7a4b5c8b6f7f3e1/ihyAxp4A8WJVKBUXfIT3q.png)
Apollo-Vocal-MSST/config_apollo_vocals_ep_54.yaml
ADDED
@@ -0,0 +1,31 @@
+audio:
+  chunk_size: 441000
+  min_mean_abs: 0.0
+  num_channels: 2
+  sample_rate: 44100
+augmentations:
+  enable: false
+inference:
+  batch_size: 1
+  num_overlap: 4
+model:
+  feature_dim: 384
+  layer: 8
+  sr: 44100
+  win: 20
+training:
+  batch_size: 1
+  coarse_loss_clip: true
+  grad_clip: 0
+  instruments:
+    - restored
+    - addition
+  lr: 1.0
+  num_epochs: 1000
+  num_steps: 1000
+  optimizer: prodigy
+  patience: 2
+  q: 0.95
+  reduce_factor: 0.95
+  target_instrument: restored
+  use_amp: true
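To make the `audio` and `inference` values in this config easier to read, the sketch below converts `chunk_size` to seconds and derives a chunk step from `num_overlap`. The relation `step = chunk_size // num_overlap` is an assumption about how the inference code uses `num_overlap`, and `chunk_starts` is a hypothetical helper rather than a function from any of the linked repositories.

```python
chunk_size = 441000   # audio.chunk_size from the config
sample_rate = 44100   # audio.sample_rate from the config
num_overlap = 4       # inference.num_overlap from the config

# Length of one processing chunk in seconds.
chunk_seconds = chunk_size / sample_rate

# Assumed hop between consecutive chunks during overlapped inference.
step = chunk_size // num_overlap


def chunk_starts(num_samples, chunk_size, step):
    """Start offsets of overlapping chunks covering a track (hypothetical helper)."""
    last_start = max(num_samples - chunk_size, 0)
    return list(range(0, last_start + 1, step))


# Example: chunk offsets for a 30-second stereo track at 44.1 kHz.
starts = chunk_starts(30 * sample_rate, chunk_size, step)
print(chunk_seconds, step, len(starts))
```

Under this assumption each sample is covered by up to `num_overlap` chunks, whose outputs would then be averaged or cross-faded.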
Aspiration-MelBand-Sucial/.gitattributes
ADDED
@@ -0,0 +1,40 @@
+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tar filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text
+example_audio/example_aspiration_less_aggr.wav filter=lfs diff=lfs merge=lfs -text
+example_audio/example_aspiration.wav filter=lfs diff=lfs merge=lfs -text
+example_audio/example_other_less_aggr.wav filter=lfs diff=lfs merge=lfs -text
+example_audio/example_other.wav filter=lfs diff=lfs merge=lfs -text
+example_audio/example_raw.wav filter=lfs diff=lfs merge=lfs -text
Aspiration-MelBand-Sucial/README.md
ADDED
@@ -0,0 +1,30 @@
+---
+license: cc-by-nc-sa-4.0
+---
+
+You can listen to example outputs of this model [here](https://huggingface.co/Sucial/Aspiration_Mel_Band_Roformer/tree/main/example_audio).
+
+How to use the model?<br>
+Try it with [ZFTurbo's Music-Source-Separation-Training](https://github.com/ZFTurbo/Music-Source-Separation-Training)
+
+Description: This model separates aspiration, which some mixers may find useful when mixing.<br>
+Instruments: aspiration, other<br>
+Dataset: My own dataset (171 songs for training and 17 songs for validation).<br>
+Metrics: SDR measured on the 17 validation songs.<br>
+Finetuned from: `model_mel_band_roformer_ep_3005_sdr_11.4360.ckpt`<br>
+Configs: [config_aspiration_mel_band_roformer.yaml](./config_aspiration_mel_band_roformer.yaml)
+
+Model: [aspiration_mel_band_roformer_sdr_18.9845.ckpt](./aspiration_mel_band_roformer_sdr_18.9845.ckpt)<br>
+Epoch: 123<br>
+Instr SDR aspiration: 9.8554<br>
+Instr SDR other: 28.1136<br>
+SDR Avg: 18.9845<br>
+
+Model: [aspiration_mel_band_roformer_less_aggr_sdr_18.1201.ckpt](./aspiration_mel_band_roformer_less_aggr_sdr_18.1201.ckpt)<br>
+Epoch: 27<br>
+Instr SDR aspiration: 9.0704<br>
+Instr SDR other: 27.1699<br>
+SDR Avg: 18.1201<br>
+
+Training logs:
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/66fa43ea0afdc0a8c0cf6a28/zK2qMTpkZaiDsjGAy8Tqi.png)
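The "SDR Avg" figures in the model card above agree with the arithmetic mean of the two per-instrument SDR values. The sketch below checks that arithmetic and also shows a common textbook definition of SDR in dB. The `sdr_db` function is an illustrative assumption and may differ in detail from the metric code actually used for validation.

```python
import math


def sdr_db(reference, estimate, eps=1e-8):
    """Signal-to-distortion ratio in dB for two equal-length sample lists.

    Uses the common definition 10 * log10(|ref|^2 / |ref - est|^2); the small
    eps avoids division by zero and is an assumption of this sketch.
    """
    signal = sum(r * r for r in reference)
    error = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10.0 * math.log10((signal + eps) / (error + eps))


def average_sdr(per_stem):
    """Arithmetic mean of per-instrument SDR values."""
    return sum(per_stem.values()) / len(per_stem)


# Per-instrument values copied from the model card.
full = average_sdr({"aspiration": 9.8554, "other": 28.1136})
less_aggr = average_sdr({"aspiration": 9.0704, "other": 27.1699})
print(round(full, 4), less_aggr)
```

Note how the average is dominated by the much higher `other` score, so the `aspiration` SDR is the more informative number when comparing the two checkpoints.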
Aspiration-MelBand-Sucial/config_aspiration_mel_band_roformer.yaml
ADDED
@@ -0,0 +1,77 @@
+audio:
+  chunk_size: 352800
+  dim_f: 1024
+  dim_t: 801 # doesn't work (used in model)
+  hop_length: 441 # doesn't work (used in model)
+  n_fft: 2048
+  num_channels: 2
+  sample_rate: 44100
+  min_mean_abs: 0.000
+
+model:
+  dim: 256
+  depth: 8
+  stereo: true
+  num_stems: 2
+  time_transformer_depth: 1
+  freq_transformer_depth: 1
+  linear_transformer_depth: 0
+  num_bands: 60
+  dim_head: 64
+  heads: 8
+  attn_dropout: 0.1
+  ff_dropout: 0.1
+  flash_attn: True
+  dim_freqs_in: 1025
+  sample_rate: 44100 # needed for mel filter bank from librosa
+  stft_n_fft: 2048
+  stft_hop_length: 441
+  stft_win_length: 2048
+  stft_normalized: False
+  mask_estimator_depth: 2
+  multi_stft_resolution_loss_weight: 1.0
+  multi_stft_resolutions_window_sizes: !!python/tuple
+    - 4096
+    - 2048
+    - 1024
+    - 512
+    - 256
+  multi_stft_hop_size: 147
+  multi_stft_normalized: False
+
+training:
+  batch_size: 1
+  gradient_accumulation_steps: 8
+  grad_clip: 0
+  instruments:
+    - aspiration
+    - other
+  lr: 4.0e-05
+  patience: 2
+  reduce_factor: 0.95
+  target_instrument: null
+  num_epochs: 1000
+  num_steps: 1000
+  q: 0.95
+  coarse_loss_clip: true
+  ema_momentum: 0.999
+  optimizer: adam
+  other_fix: false # needed for checking on the multisong dataset whether "other" is actually instrumental
+  use_amp: true # enable or disable mixed precision (float16); usually this should be true
+
+augmentations:
+  enable: true # enable or disable all augmentations (to quickly disable them if needed)
+  loudness: true # randomly change the loudness of each stem within the range (loudness_min; loudness_max)
+  loudness_min: 0.5
+  loudness_max: 1.5
+  mixup: false # mix several stems of the same type with some probability (only works for dataset types: 1, 2, 3)
+  mixup_probs: !!python/tuple # 2 additional stems of the same type (1st with prob 0.2, 2nd with prob 0.02)
+    - 0.2
+    - 0.02
+  mixup_loudness_min: 0.5
+  mixup_loudness_max: 1.5
+
+inference:
+  batch_size: 4
+  dim_t: 801
+  num_overlap: 2
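Several numbers in this config are tied together by standard STFT arithmetic. The sketch below shows how `dim_t: 801` and `dim_freqs_in: 1025` follow from `chunk_size`, `stft_hop_length`, and `stft_n_fft`, assuming a centered STFT (one frame per hop plus one). The helper names are hypothetical, and the centered-STFT assumption is not stated in the config itself.

```python
def stft_frames(num_samples, hop_length, n_fft=2048, center=True):
    """Number of STFT frames for a signal of num_samples samples.

    With center=True the signal is padded by n_fft // 2 on both sides,
    which gives num_samples // hop_length + 1 frames.
    """
    if center:
        return num_samples // hop_length + 1
    return (num_samples - n_fft) // hop_length + 1


def stft_bins(n_fft):
    """Number of one-sided frequency bins for a real-valued signal."""
    return n_fft // 2 + 1


# Values from the config above.
frames = stft_frames(num_samples=352800, hop_length=441)
bins = stft_bins(2048)
print(frames, bins)
```

The same arithmetic can be used to pick a consistent `dim_t` when changing `chunk_size` or the hop length.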
BS-RoFormer-Anvuew/.gitattributes
ADDED
@@ -0,0 +1,35 @@
+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tar filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text
BS-RoFormer-Anvuew/README.md
ADDED
@@ -0,0 +1,5 @@
+---
+license: gpl-3.0
+---
+
+dataset by [bascurtiz](https://github.com/bascurtiz)
BS-RoFormer-Anvuew/config.yaml
ADDED
@@ -0,0 +1,129 @@
+audio:
+  chunk_size: 960000
+  dim_f: 1024
+  dim_t: 801 # doesn't work (used in model)
+  hop_length: 441 # doesn't work (used in model)
+  n_fft: 2048
+  num_channels: 2
+  sample_rate: 44100
+  min_mean_abs: 0.0001
+
+model:
+  dim: 256
+  depth: 12
+  stereo: true
+  num_stems: 1
+  time_transformer_depth: 1
+  freq_transformer_depth: 1
+  linear_transformer_depth: 0
+  freqs_per_bands: !!python/tuple
+    - 2
+    - 2
+    - 2
+    - 2
+    - 2
+    - 2
+    - 2
+    - 2
+    - 2
+    - 2
+    - 2
+    - 2
+    - 2
+    - 2
+    - 2
+    - 2
+    - 2
+    - 2
+    - 2
+    - 2
+    - 2
+    - 2
+    - 2
+    - 2
+    - 4
+    - 4
+    - 4
+    - 4
+    - 4
+    - 4
+    - 4
+    - 4
+    - 4
+    - 4
+    - 4
+    - 4
+    - 12
+    - 12
+    - 12
+    - 12
+    - 12
+    - 12
+    - 12
+    - 12
+    - 24
+    - 24
+    - 24
+    - 24
+    - 24
+    - 24
+    - 24
+    - 24
+    - 48
+    - 48
+    - 48
+    - 48
+    - 48
+    - 48
+    - 48
+    - 48
+    - 128
+    - 129
+  dim_head: 64
+  heads: 8
+  attn_dropout: 0.0
+  ff_dropout: 0.0
+  flash_attn: true
+  dim_freqs_in: 1025
+  stft_n_fft: 2048
+  stft_hop_length: 512
+  stft_win_length: 2048
+  stft_normalized: false
+  mask_estimator_depth: 2
+  multi_stft_resolution_loss_weight: 1.0
+  multi_stft_resolutions_window_sizes: !!python/tuple
+    - 4096
+    - 2048
+    - 1024
+    - 512
+    - 256
+  multi_stft_hop_size: 147
+  multi_stft_normalized: False
+  mlp_expansion_factor: 4
+  use_torch_checkpoint: True
+  skip_connection: False
+
+
+training:
+  batch_size: 1
+  gradient_accumulation_steps: 1
+  grad_clip: 0
+  instruments: ['vocals', 'instrument']
+  lr: 1.0e-5
+  patience: 5
+  reduce_factor: 0.9
+  target_instrument: vocals
+  num_epochs: 1000
+  num_steps: 1000
+  q: 0.95
+  coarse_loss_clip: true
+  ema_momentum: 0.999
+  optimizer: adam
+  other_fix: false # needed for checking on the multisong dataset whether "other" is actually instrumental
+  use_amp: true # enable or disable mixed precision (float16); usually this should be true
+
+
+inference:
+  batch_size: 2
+  dim_t: 1876
+  num_overlap: 4
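The `freqs_per_bands` tuple in this config partitions the STFT frequency bins into contiguous bands, so its entries should sum to `dim_freqs_in` (1025). The sketch below rebuilds the tuple from the values listed above and derives each band's bin range. The helper `band_slices` is hypothetical and only illustrates the bookkeeping, not the model's internal code.

```python
# Band widths copied from the config: 24 bands of 2 bins, 12 of 4, 8 of 12,
# 8 of 24, 8 of 48, then two final bands of 128 and 129 bins.
freqs_per_bands = (
    (2,) * 24 + (4,) * 12 + (12,) * 8 + (24,) * 8 + (48,) * 8 + (128, 129)
)

dim_freqs_in = 1025  # stft_n_fft // 2 + 1 for stft_n_fft = 2048


def band_slices(widths):
    """Return (start, stop) bin indices for each band, in order."""
    slices = []
    start = 0
    for width in widths:
        slices.append((start, start + width))
        start += width
    return slices


slices = band_slices(freqs_per_bands)
num_bands = len(freqs_per_bands)
total_bins = sum(freqs_per_bands)
print(num_bands, total_bins, slices[0], slices[-1])
```

The narrow bands at low frequencies and wide bands at high frequencies give the model finer resolution where most musical pitch content lies.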
Dereverb-Echo-MelBand-Sucial/.gitattributes
ADDED
@@ -0,0 +1,55 @@
+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tar filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text
+examples/example_dry.wav filter=lfs diff=lfs merge=lfs -text
+examples/example_other.wav filter=lfs diff=lfs merge=lfs -text
+examples/example_raw.wav filter=lfs diff=lfs merge=lfs -text
+examples/other_v1.wav filter=lfs diff=lfs merge=lfs -text
+examples/raw.wav filter=lfs diff=lfs merge=lfs -text
+examples/dry_v1.wav filter=lfs diff=lfs merge=lfs -text
+examples/dry_v2.wav filter=lfs diff=lfs merge=lfs -text
+examples/other_v2.wav filter=lfs diff=lfs merge=lfs -text
+example/de_super_big_reverb_mbr_ep_346/model_super_reverb_dry.flac filter=lfs diff=lfs merge=lfs -text
+example/de_super_big_reverb_mbr_ep_346/model_super_reverb_other.flac filter=lfs diff=lfs merge=lfs -text
+example/de_super_big_reverb_mbr_ep_346/raw.flac filter=lfs diff=lfs merge=lfs -text
|
| 47 |
+
example/dereverb_echo_mbr_fused_model/model_fused_reverb_dry.flac filter=lfs diff=lfs merge=lfs -text
|
| 48 |
+
example/dereverb_echo_mbr_fused_model/model_fused_reverb_other.flac filter=lfs diff=lfs merge=lfs -text
|
| 49 |
+
example/dereverb_echo_mbr_fused_model/raw.flac filter=lfs diff=lfs merge=lfs -text
|
| 50 |
+
example/dereverb_echo_mbr_v2_sdr_dry_13.4843/dry_v2.wav filter=lfs diff=lfs merge=lfs -text
|
| 51 |
+
example/dereverb_echo_mbr_v2_sdr_dry_13.4843/other_v2.wav filter=lfs diff=lfs merge=lfs -text
|
| 52 |
+
example/dereverb_echo_mbr_v2_sdr_dry_13.4843/raw.wav filter=lfs diff=lfs merge=lfs -text
|
| 53 |
+
example/dereverb-echo_mbr_v1_sdr_10.0169/dry_v1.wav filter=lfs diff=lfs merge=lfs -text
|
| 54 |
+
example/dereverb-echo_mbr_v1_sdr_10.0169/other_v1.wav filter=lfs diff=lfs merge=lfs -text
|
| 55 |
+
example/dereverb-echo_mbr_v1_sdr_10.0169/raw.wav filter=lfs diff=lfs merge=lfs -text
|
Dereverb-Echo-MelBand-Sucial/README.md
ADDED
|
@@ -0,0 +1,85 @@
|
| 1 |
+
---
|
| 2 |
+
license: cc-by-nc-sa-4.0
|
| 3 |
+
---
|
| 4 |
+
|
| 5 |
+
## Description
|
| 6 |
+
|
| 7 |
+
These models are used to separate reverb and delay effects from vocals. In addition, **these models also have the ability to remove most of the harmonies.** I applied a random high cut after the reverb and delay effects in the dataset, so these models' handling of high frequencies is not particularly aggressive.<br>
|
| 8 |
+
You can try listening to the performance of these models [here](https://huggingface.co/Sucial/Dereverb-Echo_Mel_Band_Roformer/tree/main/example)!
|
| 9 |
+
|
| 10 |
+
## How to use the model?
|
| 11 |
+
|
| 12 |
+
Try it with [ZFTurbo's Music-Source-Separation-Training](https://github.com/ZFTurbo/Music-Source-Separation-Training)
|
| 13 |
+
|
| 14 |
+
## Models
|
| 15 |
+
|
| 16 |
+
### ===Note: The following models are only effective for vocals!===
|
| 17 |
+
|
| 18 |
+
### 1. Fused Models (I personally recommend using this model)
|
| 19 |
+
|
| 20 |
+
I used [a model fusion script](https://huggingface.co/Sucial/Dereverb-Echo_Mel_Band_Roformer/blob/main/scripts/model_fusion.py) to fuse three models with the same model structure. The three models and their corresponding fusion ratios are as follows:<br>
|
| 21 |
+
**0.5 * dereverb_echo_mbr_v2_sdr_dry_13.4843.ckpt + 0.25 * de_big_reverb_mbr_ep_362.ckpt + 0.25 * de_super_big_reverb_mbr_ep_346.ckpt**<br>
|
| 22 |
+
Therefore, the fused model has the ability to remove both small and large reverberations simultaneously. However, I did not carefully adjust the fusion ratio of each model. If any experts are willing to help me adjust it carefully, I would be very grateful!
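As an illustration of the fusion ratios above, here is a minimal sketch of a weighted average over checkpoints that share one structure. It is not the linked `model_fusion.py` script: the function name `fuse_state_dicts` and the toy one-parameter "checkpoints" are hypothetical, and plain Python lists stand in for the real tensors.

```python
from typing import Dict, List, Sequence

# Hypothetical stand-in for a checkpoint: parameter name -> flat list of values.
StateDict = Dict[str, List[float]]


def fuse_state_dicts(state_dicts: Sequence[StateDict], ratios: Sequence[float]) -> StateDict:
    """Return the element-wise weighted sum of state dicts with identical keys."""
    if len(state_dicts) != len(ratios):
        raise ValueError("need exactly one ratio per state dict")
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios should sum to 1")
    keys = set(state_dicts[0])
    if any(set(sd) != keys for sd in state_dicts):
        raise ValueError("all state dicts must have the same parameter names")
    fused: StateDict = {}
    for name in state_dicts[0]:
        # Walk the same position of this parameter in every checkpoint.
        columns = zip(*(sd[name] for sd in state_dicts))
        fused[name] = [sum(r * v for r, v in zip(ratios, col)) for col in columns]
    return fused


# Toy values standing in for the v2, big-reverb and super-big-reverb checkpoints.
v2 = {"layer.weight": [1.0, 2.0]}
big = {"layer.weight": [3.0, 2.0]}
super_big = {"layer.weight": [5.0, 2.0]}

fused = fuse_state_dicts([v2, big, super_big], [0.5, 0.25, 0.25])
print(fused)
```

Averaging only makes sense here because all three checkpoints use the same configuration, so every parameter has a counterpart with the same name and shape.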
|
| 23 |
+
|
| 24 |
+
config: the same as v2 models and big reverb models: [config_dereverb_echo_mbr_v2.yaml](./config_dereverb_echo_mbr_v2.yaml)<br>
|
| 25 |
+
fused_model: [dereverb_echo_mbr_fused_0.5_v2_0.25_big_0.25_super.ckpt](./dereverb_echo_mbr_fused_0.5_v2_0.25_big_0.25_super.ckpt)
|
| 26 |
+
|
| 27 |
+
### 2. Big reverb Models
|
| 28 |
+
|
| 29 |
+
There are two models for removing large reverberation: [de_big_reverb_mbr_ep_362.ckpt](./de_big_reverb_mbr_ep_362.ckpt) and [de_super_big_reverb_mbr_ep_346.ckpt](./de_super_big_reverb_mbr_ep_346.ckpt). In general, the `de_big_reverb_mbr` model is sufficient for large reverberations. The `de_super_big_reverb_mbr` model is trained for extremely large reverberations and is needed less often. Both models use the same configuration file as the v2 model, and both are fine-tuned from `dereverb_echo_mbr_v2_sdr_dry_13.4843.ckpt`.
|
| 30 |
+
|
| 31 |
+
config: [config_dereverb_echo_mbr_v2.yaml](./config_dereverb_echo_mbr_v2.yaml)<br>
|
| 32 |
+
Model_de_big_reverb: [de_big_reverb_mbr_ep_362.ckpt](./de_big_reverb_mbr_ep_362.ckpt)<br>
|
| 33 |
+
Model_de_super_big_reverb: [de_super_big_reverb_mbr_ep_346.ckpt](./de_super_big_reverb_mbr_ep_346.ckpt)
|
| 34 |
+
|
| 35 |
+
To better validate the models' performance, I added two metrics, `f0_fitness` and `uv_fitness`, defined as follows:<br>
|
| 36 |
+
They measure the F0 and voiced/unvoiced (UV) fitness between a reference and an estimated audio signal. These two metrics are only meaningful for vocals.<br>
|
| 37 |
+
The F0 fitness measures how similar the fundamental frequency (F0) contours of the reference and estimated signals are, while the UV fitness evaluates how well the voiced/unvoiced decisions of the two signals agree. Both are computed by extracting F0 and UV information with pitch analysis and then calculating the Pearson correlation between the corresponding F0 and UV sequences. The F0 fitness can also be used to compare how completely the fundamental frequency of a human voice signal has been extracted. Both metrics range from -1 to 1, and the closer the value is to 1, the better the fit.
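The description above can be sketched as follows. This is a minimal illustration, not the validation code used for these models: the pitch-analysis step is assumed to have already produced one F0 value per frame, an F0 of 0 Hz is assumed to mark an unvoiced frame, and the names `pearson` and `f0_uv_fitness` are hypothetical.

```python
import math
from typing import Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Assumption: report 0 when the correlation is undefined (constant input).
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def f0_uv_fitness(f0_ref: Sequence[float], f0_est: Sequence[float]) -> Tuple[float, float]:
    """Return (f0_fitness, uv_fitness) for two frame-aligned F0 tracks in Hz."""
    # Assumption: a frame is voiced when its F0 is above 0 Hz.
    uv_ref = [1.0 if f > 0 else 0.0 for f in f0_ref]
    uv_est = [1.0 if f > 0 else 0.0 for f in f0_est]
    return pearson(f0_ref, f0_est), pearson(uv_ref, uv_est)


# Made-up F0 tracks; the estimate wrongly voices the fourth frame.
reference_f0 = [0.0, 220.0, 221.5, 0.0, 330.0]
estimate_f0 = [0.0, 219.0, 222.0, 110.0, 331.0]
f0_fit, uv_fit = f0_uv_fitness(reference_f0, estimate_f0)
print(f"f0_fitness={f0_fit:.4f} uv_fitness={uv_fit:.4f}")
```

A single wrongly voiced frame lowers both scores, which is why the two metrics are reported side by side.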
|
| 38 |
+
|
| 39 |
+
For these two models, I used different validation sets (so the SDR values are not meaningful for comparison), and the validation results are as follows:
|
| 40 |
+
```
|
| 41 |
+
de_big_reverb_mbr_ep_362.ckpt
|
| 42 |
+
Num overlap: 2
|
| 43 |
+
Instr dry sdr: 14.0030 (Std: 2.9492)
|
| 44 |
+
Instr dry bleedless: 43.6501 (Std: 10.1362)
|
| 45 |
+
Instr dry fullness: 21.7776 (Std: 5.9445)
|
| 46 |
+
Instr dry f0_fitness: 0.8405 (Std: 0.1520)
|
| 47 |
+
Instr dry uv_fitness: 0.9759 (Std: 0.0162)
|
| 48 |
+
|
| 49 |
+
de_super_big_reverb_mbr_ep_346.ckpt
|
| 50 |
+
Num overlap: 2
|
| 51 |
+
Instr dry sdr: 11.3164 (Std: 2.4877)
|
| 52 |
+
Instr dry bleedless: 43.3989 (Std: 10.7918)
|
| 53 |
+
Instr dry fullness: 17.5554 (Std: 4.0178)
|
| 54 |
+
Instr dry f0_fitness: 0.7845 (Std: 0.1864)
|
| 55 |
+
Instr dry uv_fitness: 0.9662 (Std: 0.0172)
|
| 56 |
+
```
|
| 57 |
+
|
| 58 |
+
### 3. V2 Models
|
| 59 |
+
|
| 60 |
+
Config: [config_dereverb_echo_mbr_v2.yaml](./config_dereverb_echo_mbr_v2.yaml)<br>
|
| 61 |
+
Model: [dereverb_echo_mbr_v2_sdr_dry_13.4843.ckpt](./dereverb_echo_mbr_v2_sdr_dry_13.4843.ckpt)<br>
|
| 62 |
+
Instr dry sdr: 13.4843 (Std: 4.8675)
|
| 63 |
+
|
| 64 |
+
Finetuned from: `dereverb-echo_mel_band_roformer_sdr_10.0169.ckpt`<br>
|
| 65 |
+
Fine-tuned on 1000+ songs.
|
| 66 |
+
|
| 67 |
+
### 4. V1 Models
|
| 68 |
+
|
| 69 |
+
Configs: [config_dereverb-echo_mel_band_roformer.yaml](./config_dereverb-echo_mel_band_roformer.yaml)<br>
|
| 70 |
+
Model: [dereverb-echo_mel_band_roformer_sdr_10.0169.ckpt](./dereverb-echo_mel_band_roformer_sdr_10.0169.ckpt)<br>
|
| 71 |
+
Instr dry sdr: 13.1507, Instr other sdr: 6.8830, Metric avg sdr: 10.0169
|
| 72 |
+
|
| 73 |
+
Instruments: [dry, other]<br>
|
| 74 |
+
Finetuned from: `model_mel_band_roformer_ep_3005_sdr_11.4360.ckpt`<br>
|
| 75 |
+
Datasets:
|
| 76 |
+
- Training datasets: 270 songs from [opencpop](https://github.com/wenet-e2e/opencpop) and [GTSinger](https://github.com/GTSinger/GTSinger)
|
| 77 |
+
- Validation datasets: 30 songs from my own collection
|
| 78 |
+
- All random reverb and delay effects are generated by [this Python script](./scripts/create_reverb_delay.py) and sorted into the MUSDB18 dataset format.
|
| 79 |
+
|
| 80 |
+
## Thanks
|
| 81 |
+
|
| 82 |
+
- Mel-Band-Roformer [[Paper](https://arxiv.org/abs/2310.01809), [Repository](https://github.com/lucidrains/BS-RoFormer)]
|
| 83 |
+
- [ZFTurbo](https://github.com/ZFTurbo)'s training code [[Music-Source-Separation-Training](https://github.com/ZFTurbo/Music-Source-Separation-Training)]
|
| 84 |
+
- [CN17161](https://github.com/CN17161) provided GPUs.
|
| 85 |
+
- [Glucy-2](https://github.com/Glucy-2) provided technical assistance.
|
Dereverb-Echo-MelBand-Sucial/config_dereverb-echo_mel_band_roformer.yaml
ADDED
|
@@ -0,0 +1,77 @@
|
| 1 |
+
audio:
|
| 2 |
+
chunk_size: 352800
|
| 3 |
+
dim_f: 1024
|
| 4 |
+
dim_t: 801 # has no effect here (see the model section)
|
| 5 |
+
hop_length: 441 # has no effect here (see the model section)
|
| 6 |
+
n_fft: 2048
|
| 7 |
+
num_channels: 2
|
| 8 |
+
sample_rate: 44100
|
| 9 |
+
min_mean_abs: 0.000
|
| 10 |
+
|
| 11 |
+
model:
|
| 12 |
+
dim: 256
|
| 13 |
+
depth: 8
|
| 14 |
+
stereo: true
|
| 15 |
+
num_stems: 2
|
| 16 |
+
time_transformer_depth: 1
|
| 17 |
+
freq_transformer_depth: 1
|
| 18 |
+
linear_transformer_depth: 0
|
| 19 |
+
num_bands: 60
|
| 20 |
+
dim_head: 64
|
| 21 |
+
heads: 8
|
| 22 |
+
attn_dropout: 0.1
|
| 23 |
+
ff_dropout: 0.1
|
| 24 |
+
flash_attn: True
|
| 25 |
+
dim_freqs_in: 1025
|
| 26 |
+
sample_rate: 44100 # needed for mel filter bank from librosa
|
| 27 |
+
stft_n_fft: 2048
|
| 28 |
+
stft_hop_length: 441
|
| 29 |
+
stft_win_length: 2048
|
| 30 |
+
stft_normalized: False
|
| 31 |
+
mask_estimator_depth: 2
|
| 32 |
+
multi_stft_resolution_loss_weight: 1.0
|
| 33 |
+
multi_stft_resolutions_window_sizes: !!python/tuple
|
| 34 |
+
- 4096
|
| 35 |
+
- 2048
|
| 36 |
+
- 1024
|
| 37 |
+
- 512
|
| 38 |
+
- 256
|
| 39 |
+
multi_stft_hop_size: 147
|
| 40 |
+
multi_stft_normalized: False
|
| 41 |
+
|
| 42 |
+
training:
|
| 43 |
+
batch_size: 1
|
| 44 |
+
gradient_accumulation_steps: 8
|
| 45 |
+
grad_clip: 0
|
| 46 |
+
instruments:
|
| 47 |
+
- dry
|
| 48 |
+
- other
|
| 49 |
+
lr: 4.0e-05
|
| 50 |
+
patience: 2
|
| 51 |
+
reduce_factor: 0.95
|
| 52 |
+
target_instrument: null
|
| 53 |
+
num_epochs: 1000
|
| 54 |
+
num_steps: 1000
|
| 55 |
+
q: 0.95
|
| 56 |
+
coarse_loss_clip: true
|
| 57 |
+
ema_momentum: 0.999
|
| 58 |
+
optimizer: adam
|
| 59 |
+
other_fix: false # it's needed for checking on multisong dataset if other is actually instrumental
|
| 60 |
+
use_amp: true # enable or disable usage of mixed precision (float16) - usually it must be true
|
| 61 |
+
|
| 62 |
+
augmentations:
|
| 63 |
+
enable: true # enable or disable all augmentations (to fast disable if needed)
|
| 64 |
+
loudness: true # randomly change loudness of each stem on the range (loudness_min; loudness_max)
|
| 65 |
+
loudness_min: 0.5
|
| 66 |
+
loudness_max: 1.5
|
| 67 |
+
mixup: false # mix several stems of same type with some probability (only works for dataset types: 1, 2, 3)
|
| 68 |
+
mixup_probs: !!python/tuple # 2 additional stems of the same type (1st with prob 0.2, 2nd with prob 0.02)
|
| 69 |
+
- 0.2
|
| 70 |
+
- 0.02
|
| 71 |
+
mixup_loudness_min: 0.5
|
| 72 |
+
mixup_loudness_max: 1.5
|
| 73 |
+
|
| 74 |
+
inference:
|
| 75 |
+
batch_size: 4
|
| 76 |
+
dim_t: 801
|
| 77 |
+
num_overlap: 4
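As a reading aid for the `audio` and `inference` values in the config above, the sketch below shows how `chunk_size`, `sample_rate`, `stft_hop_length` and `dim_t` relate numerically. It assumes a centre-padded STFT, which produces one frame per hop plus one. Whether the training code derives `dim_t` this way is not stated in the source, and the helper name `stft_frames` is hypothetical.

```python
def stft_frames(num_samples: int, hop_length: int) -> int:
    """Frame count of a centre-padded STFT: one frame per hop, plus one."""
    return num_samples // hop_length + 1


# Values copied from the config above.
chunk_size = 352800
sample_rate = 44100
hop_length = 441

chunk_seconds = chunk_size / sample_rate       # length of one chunk in seconds
frames = stft_frames(chunk_size, hop_length)   # STFT frames in one chunk

print(f"{chunk_seconds} s per chunk, {frames} frames")
```

Under that assumption, an 8-second chunk at 44100 Hz with a hop of 441 samples gives 801 frames, which matches the `dim_t: 801` entries.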
|
Dereverb-Echo-MelBand-Sucial/config_dereverb_echo_mbr_v2.yaml
ADDED
|
@@ -0,0 +1,65 @@
|
| 1 |
+
audio:
|
| 2 |
+
chunk_size: 352800
|
| 3 |
+
dim_f: 1024
|
| 4 |
+
dim_t: 801
|
| 5 |
+
hop_length: 441
|
| 6 |
+
n_fft: 2048
|
| 7 |
+
num_channels: 2
|
| 8 |
+
sample_rate: 44100
|
| 9 |
+
min_mean_abs: 0.000
|
| 10 |
+
|
| 11 |
+
model:
|
| 12 |
+
dim: 256
|
| 13 |
+
depth: 8
|
| 14 |
+
stereo: true
|
| 15 |
+
num_stems: 1
|
| 16 |
+
time_transformer_depth: 1
|
| 17 |
+
freq_transformer_depth: 1
|
| 18 |
+
linear_transformer_depth: 0
|
| 19 |
+
num_bands: 60
|
| 20 |
+
dim_head: 64
|
| 21 |
+
heads: 8
|
| 22 |
+
attn_dropout: 0.1
|
| 23 |
+
ff_dropout: 0.1
|
| 24 |
+
flash_attn: True
|
| 25 |
+
dim_freqs_in: 1025
|
| 26 |
+
sample_rate: 44100
|
| 27 |
+
stft_n_fft: 2048
|
| 28 |
+
stft_hop_length: 441
|
| 29 |
+
stft_win_length: 2048
|
| 30 |
+
stft_normalized: False
|
| 31 |
+
mask_estimator_depth: 2
|
| 32 |
+
multi_stft_resolution_loss_weight: 1.0
|
| 33 |
+
multi_stft_resolutions_window_sizes: !!python/tuple
|
| 34 |
+
- 4096
|
| 35 |
+
- 2048
|
| 36 |
+
- 1024
|
| 37 |
+
- 512
|
| 38 |
+
- 256
|
| 39 |
+
multi_stft_hop_size: 147
|
| 40 |
+
multi_stft_normalized: False
|
| 41 |
+
|
| 42 |
+
training:
|
| 43 |
+
batch_size: 1
|
| 44 |
+
gradient_accumulation_steps: 8
|
| 45 |
+
grad_clip: 0
|
| 46 |
+
instruments:
|
| 47 |
+
- dry
|
| 48 |
+
- other
|
| 49 |
+
lr: 1.0e-05
|
| 50 |
+
patience: 2
|
| 51 |
+
reduce_factor: 0.95
|
| 52 |
+
target_instrument: dry
|
| 53 |
+
num_epochs: 1000
|
| 54 |
+
num_steps: 1000
|
| 55 |
+
q: 0.95
|
| 56 |
+
coarse_loss_clip: true
|
| 57 |
+
ema_momentum: 0.999
|
| 58 |
+
optimizer: adam
|
| 59 |
+
other_fix: false
|
| 60 |
+
use_amp: true
|
| 61 |
+
|
| 62 |
+
inference:
|
| 63 |
+
batch_size: 1
|
| 64 |
+
dim_t: 801
|
| 65 |
+
num_overlap: 4
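The `lr`, `patience` and `reduce_factor` entries in the config above suggest a reduce-on-plateau learning-rate rule. The sketch below simulates a simplified version of such a rule: the rate is multiplied by `reduce_factor` once the validation metric has failed to improve for more than `patience` consecutive epochs. This is an assumption about how the trainer uses these keys, not something the source states. The SDR values are made up, and real schedulers usually add an improvement threshold that is ignored here.

```python
from typing import List, Sequence


def simulate_lr(metrics: Sequence[float], lr: float, patience: int, reduce_factor: float) -> List[float]:
    """Return the learning rate in effect after each epoch (higher metric is better)."""
    best = float("-inf")
    bad_epochs = 0
    history: List[float] = []
    for metric in metrics:
        if metric > best:
            best = metric
            bad_epochs = 0
        else:
            bad_epochs += 1
        if bad_epochs > patience:
            # Plateau detected: shrink the rate and start counting again.
            lr *= reduce_factor
            bad_epochs = 0
        history.append(lr)
    return history


# Hypothetical per-epoch validation SDR values.
sdr_per_epoch = [10.0, 10.5, 10.4, 10.3, 10.2, 10.6]
lr_history = simulate_lr(sdr_per_epoch, lr=1.0e-05, patience=2, reduce_factor=0.95)
print(lr_history)
```

With `reduce_factor: 0.95` each reduction is small, so the rate decays gently even over a long run of `num_epochs: 1000`.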
|
Dereverb-MelBand-Anvuew/.gitattributes
ADDED
|
@@ -0,0 +1,35 @@
|
| 1 |
+
*.7z filter=lfs diff=lfs merge=lfs -text
|
| 2 |
+
*.arrow filter=lfs diff=lfs merge=lfs -text
|
| 3 |
+
*.bin filter=lfs diff=lfs merge=lfs -text
|
| 4 |
+
*.bz2 filter=lfs diff=lfs merge=lfs -text
|
| 5 |
+
*.ckpt filter=lfs diff=lfs merge=lfs -text
|
| 6 |
+
*.ftz filter=lfs diff=lfs merge=lfs -text
|
| 7 |
+
*.gz filter=lfs diff=lfs merge=lfs -text
|
| 8 |
+
*.h5 filter=lfs diff=lfs merge=lfs -text
|
| 9 |
+
*.joblib filter=lfs diff=lfs merge=lfs -text
|
| 10 |
+
*.lfs.* filter=lfs diff=lfs merge=lfs -text
|
| 11 |
+
*.mlmodel filter=lfs diff=lfs merge=lfs -text
|
| 12 |
+
*.model filter=lfs diff=lfs merge=lfs -text
|
| 13 |
+
*.msgpack filter=lfs diff=lfs merge=lfs -text
|
| 14 |
+
*.npy filter=lfs diff=lfs merge=lfs -text
|
| 15 |
+
*.npz filter=lfs diff=lfs merge=lfs -text
|
| 16 |
+
*.onnx filter=lfs diff=lfs merge=lfs -text
|
| 17 |
+
*.ot filter=lfs diff=lfs merge=lfs -text
|
| 18 |
+
*.parquet filter=lfs diff=lfs merge=lfs -text
|
| 19 |
+
*.pb filter=lfs diff=lfs merge=lfs -text
|
| 20 |
+
*.pickle filter=lfs diff=lfs merge=lfs -text
|
| 21 |
+
*.pkl filter=lfs diff=lfs merge=lfs -text
|
| 22 |
+
*.pt filter=lfs diff=lfs merge=lfs -text
|
| 23 |
+
*.pth filter=lfs diff=lfs merge=lfs -text
|
| 24 |
+
*.rar filter=lfs diff=lfs merge=lfs -text
|
| 25 |
+
*.safetensors filter=lfs diff=lfs merge=lfs -text
|
| 26 |
+
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
|
| 27 |
+
*.tar.* filter=lfs diff=lfs merge=lfs -text
|
| 28 |
+
*.tar filter=lfs diff=lfs merge=lfs -text
|
| 29 |
+
*.tflite filter=lfs diff=lfs merge=lfs -text
|
| 30 |
+
*.tgz filter=lfs diff=lfs merge=lfs -text
|
| 31 |
+
*.wasm filter=lfs diff=lfs merge=lfs -text
|
| 32 |
+
*.xz filter=lfs diff=lfs merge=lfs -text
|
| 33 |
+
*.zip filter=lfs diff=lfs merge=lfs -text
|
| 34 |
+
*.zst filter=lfs diff=lfs merge=lfs -text
|
| 35 |
+
*tfevents* filter=lfs diff=lfs merge=lfs -text
|
Dereverb-MelBand-Anvuew/README.md
ADDED
|
@@ -0,0 +1,15 @@
|
| 1 |
+
---
|
| 2 |
+
license: gpl-3.0
|
| 3 |
+
---
|
| 4 |
+
|
| 5 |
+
A dereverb model specifically for vocals, for use with https://github.com/ZFTurbo/Music-Source-Separation-Training
|
| 6 |
+
|
| 7 |
+
Fine-tuned from [Kim's vocal/instrument separation model](https://huggingface.co/KimberleyJSN/melbandroformer), so it retains some ability to separate vocals from instruments.
|
| 8 |
+
|
| 9 |
+
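The dry vocals in the training data are mono recordings of a single person singing or speaking, so the model tries to make the separated dry vocal mono as far as possible. This can cause problems when the vocals in the input are stereo or have been processed with techniques such as track layering. `dereverb_mel_band_roformer_less_aggressive_anvuew_sdr_18.8050.ckpt` is a mid-training checkpoint of `dereverb_mel_band_roformer_anvuew_sdr_19.1729.ckpt`, selected using samples of these cases.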
|
| 10 |
+
|
| 11 |
+
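Because of a [bug](https://github.com/ZFTurbo/Music-Source-Separation-Training/commit/0ca5691f22ea71d9afe297926d6e1517cdb38e55) in the training code, the reverb and the vocals were not actually aligned during training of the two checkpoints `dereverb_mel_band_roformer_anvuew_sdr_19.1729.ckpt` and `dereverb_mel_band_roformer_less_aggressive_anvuew_sdr_18.8050.ckpt`. As a result, these two models show some ability to remove residue left over from vocal/instrument separation (usually strings) and some harmonies (those outside the center channel).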
|
| 12 |
+
|
| 13 |
+
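`dereverb_mel_band_roformer_mono_anvuew_sdr_20.4029.ckpt` was trained with the alignment bug fixed, so its dereverb ability is stronger, but its ability to remove vocal/instrument separation residue and harmonies is reduced.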
|
| 14 |
+
|
| 15 |
+
|
Dereverb-MelBand-Anvuew/dereverb_mel_band_roformer_anvuew.yaml
ADDED
|
@@ -0,0 +1,76 @@
|
| 1 |
+
audio:
|
| 2 |
+
chunk_size: 352800
|
| 3 |
+
dim_f: 1024
|
| 4 |
+
dim_t: 256
|
| 5 |
+
hop_length: 441
|
| 6 |
+
n_fft: 2048
|
| 7 |
+
num_channels: 2
|
| 8 |
+
sample_rate: 44100
|
| 9 |
+
min_mean_abs: 0.000
|
| 10 |
+
|
| 11 |
+
model:
|
| 12 |
+
dim: 384
|
| 13 |
+
depth: 6
|
| 14 |
+
stereo: true
|
| 15 |
+
num_stems: 1
|
| 16 |
+
time_transformer_depth: 1
|
| 17 |
+
freq_transformer_depth: 1
|
| 18 |
+
num_bands: 60
|
| 19 |
+
dim_head: 64
|
| 20 |
+
heads: 8
|
| 21 |
+
attn_dropout: 0
|
| 22 |
+
ff_dropout: 0
|
| 23 |
+
flash_attn: True
|
| 24 |
+
dim_freqs_in: 1025
|
| 25 |
+
sample_rate: 44100 # needed for mel filter bank from librosa
|
| 26 |
+
stft_n_fft: 2048
|
| 27 |
+
stft_hop_length: 441
|
| 28 |
+
stft_win_length: 2048
|
| 29 |
+
stft_normalized: False
|
| 30 |
+
mask_estimator_depth: 2
|
| 31 |
+
multi_stft_resolution_loss_weight: 1.0
|
| 32 |
+
multi_stft_resolutions_window_sizes: !!python/tuple
|
| 33 |
+
- 4096
|
| 34 |
+
- 2048
|
| 35 |
+
- 1024
|
| 36 |
+
- 512
|
| 37 |
+
- 256
|
| 38 |
+
multi_stft_hop_size: 147
|
| 39 |
+
multi_stft_normalized: False
|
| 40 |
+
|
| 41 |
+
training:
|
| 42 |
+
batch_size: 3
|
| 43 |
+
gradient_accumulation_steps: 1
|
| 44 |
+
grad_clip: 0
|
| 45 |
+
instruments:
|
| 46 |
+
- noreverb
|
| 47 |
+
- reverb
|
| 48 |
+
lr: 5.0e-05
|
| 49 |
+
patience: 2
|
| 50 |
+
reduce_factor: 0.95
|
| 51 |
+
target_instrument: noreverb
|
| 52 |
+
num_epochs: 1000
|
| 53 |
+
num_steps: 4000
|
| 54 |
+
q: 0.95
|
| 55 |
+
coarse_loss_clip: false
|
| 56 |
+
ema_momentum: 0.999
|
| 57 |
+
optimizer: adamw
|
| 58 |
+
other_fix: true # it's needed for checking on multisong dataset if other is actually instrumental
|
| 59 |
+
use_amp: true # enable or disable usage of mixed precision (float16) - usually it must be true
|
| 60 |
+
|
| 61 |
+
augmentations:
|
| 62 |
+
enable: true # enable or disable all augmentations (to fast disable if needed)
|
| 63 |
+
loudness: true # randomly change loudness of each stem on the range (loudness_min; loudness_max)
|
| 64 |
+
loudness_min: 0.1
|
| 65 |
+
loudness_max: 1.0
|
| 66 |
+
mixup: false # mix several stems of same type with some probability (only works for dataset types: 1, 2, 3)
|
| 67 |
+
mixup_probs: !!python/tuple # 2 additional stems of the same type (1st with prob 0.2, 2nd with prob 0.02)
|
| 68 |
+
- 0.2
|
| 69 |
+
- 0.02
|
| 70 |
+
mixup_loudness_min: 0.5
|
| 71 |
+
mixup_loudness_max: 1.5
|
| 72 |
+
|
| 73 |
+
inference:
|
| 74 |
+
batch_size: 1
|
| 75 |
+
dim_t: 801
|
| 76 |
+
num_overlap: 2
|
Dereverb-Room-Anvuew/.gitattributes
ADDED
|
@@ -0,0 +1,35 @@
|
| 1 |
+
*.7z filter=lfs diff=lfs merge=lfs -text
|
| 2 |
+
*.arrow filter=lfs diff=lfs merge=lfs -text
|
| 3 |
+
*.bin filter=lfs diff=lfs merge=lfs -text
|
| 4 |
+
*.bz2 filter=lfs diff=lfs merge=lfs -text
|
| 5 |
+
*.ckpt filter=lfs diff=lfs merge=lfs -text
|
| 6 |
+
*.ftz filter=lfs diff=lfs merge=lfs -text
|
| 7 |
+
*.gz filter=lfs diff=lfs merge=lfs -text
|
| 8 |
+
*.h5 filter=lfs diff=lfs merge=lfs -text
|
| 9 |
+
*.joblib filter=lfs diff=lfs merge=lfs -text
|
| 10 |
+
*.lfs.* filter=lfs diff=lfs merge=lfs -text
|
| 11 |
+
*.mlmodel filter=lfs diff=lfs merge=lfs -text
|
| 12 |
+
*.model filter=lfs diff=lfs merge=lfs -text
|
| 13 |
+
*.msgpack filter=lfs diff=lfs merge=lfs -text
|
| 14 |
+
*.npy filter=lfs diff=lfs merge=lfs -text
|
| 15 |
+
*.npz filter=lfs diff=lfs merge=lfs -text
|
| 16 |
+
*.onnx filter=lfs diff=lfs merge=lfs -text
|
| 17 |
+
*.ot filter=lfs diff=lfs merge=lfs -text
|
| 18 |
+
*.parquet filter=lfs diff=lfs merge=lfs -text
|
| 19 |
+
*.pb filter=lfs diff=lfs merge=lfs -text
|
| 20 |
+
*.pickle filter=lfs diff=lfs merge=lfs -text
|
| 21 |
+
*.pkl filter=lfs diff=lfs merge=lfs -text
|
| 22 |
+
*.pt filter=lfs diff=lfs merge=lfs -text
|
| 23 |
+
*.pth filter=lfs diff=lfs merge=lfs -text
|
| 24 |
+
*.rar filter=lfs diff=lfs merge=lfs -text
|
| 25 |
+
*.safetensors filter=lfs diff=lfs merge=lfs -text
|
| 26 |
+
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
|
| 27 |
+
*.tar.* filter=lfs diff=lfs merge=lfs -text
|
| 28 |
+
*.tar filter=lfs diff=lfs merge=lfs -text
|
| 29 |
+
*.tflite filter=lfs diff=lfs merge=lfs -text
|
| 30 |
+
*.tgz filter=lfs diff=lfs merge=lfs -text
|
| 31 |
+
*.wasm filter=lfs diff=lfs merge=lfs -text
|
| 32 |
+
*.xz filter=lfs diff=lfs merge=lfs -text
|
| 33 |
+
*.zip filter=lfs diff=lfs merge=lfs -text
|
| 34 |
+
*.zst filter=lfs diff=lfs merge=lfs -text
|
| 35 |
+
*tfevents* filter=lfs diff=lfs merge=lfs -text
|
Dereverb-Room-Anvuew/README.md
ADDED
|
@@ -0,0 +1,23 @@
|
| 1 |
+
---
|
| 2 |
+
license: gpl-3.0
|
| 3 |
+
---
|
| 4 |
+
A dereverb model specifically for mono vocal room reverb.
|
| 5 |
+
|
| 6 |
+
**Model type:** `bs_roformer`
|
| 7 |
+
**Channels:** mono
|
| 8 |
+
**Reverb in training data:** only convolutional reverbs, generated with [pyroomacoustics](https://github.com/LCAV/pyroomacoustics)
|
| 9 |
+
**Example:**
|
| 10 |
+
- input.flac
|
| 11 |
+
<audio controls>
|
| 12 |
+
<source src="https://huggingface.co/anvuew/dereverb_room/resolve/main/example/input.flac" type="audio/flac">
|
| 13 |
+
</audio>
|
| 14 |
+
- noreverb.flac
|
| 15 |
+
<audio controls>
|
| 16 |
+
<source src="https://huggingface.co/anvuew/dereverb_room/resolve/main/example/noreverb.flac" type="audio/flac">
|
| 17 |
+
</audio>
|
| 18 |
+
- reverb.flac
|
| 19 |
+
<audio controls>
|
| 20 |
+
<source src="https://huggingface.co/anvuew/dereverb_room/resolve/main/example/reverb.flac" type="audio/flac">
|
| 21 |
+
</audio>
|
| 22 |
+
|
| 23 |
+
For reference, [dereverb_mel_band_roformer_mono](https://huggingface.co/anvuew/dereverb_mel_band_roformer/blob/main/dereverb_mel_band_roformer_mono_anvuew_sdr_20.4029.ckpt) got SDR: 7.6685 on the same validation set.
|
Dereverb-Room-Anvuew/dereverb_room_anvuew.yaml
ADDED
|
@@ -0,0 +1,132 @@
|
| 1 |
+
audio:
|
| 2 |
+
chunk_size: 384000
|
| 3 |
+
dim_f: 1024
|
| 4 |
+
dim_t: 801 # has no effect here (see the model section)
|
| 5 |
+
hop_length: 441 # has no effect here (see the model section)
|
| 6 |
+
n_fft: 2048
|
| 7 |
+
num_channels: 1
|
| 8 |
+
sample_rate: 44100
|
| 9 |
+
min_mean_abs: 0.000
|
| 10 |
+
|
| 11 |
+
model:
|
| 12 |
+
dim: 128
|
| 13 |
+
depth: 16
|
| 14 |
+
stereo: false
|
| 15 |
+
num_stems: 1
|
| 16 |
+
time_transformer_depth: 1
|
| 17 |
+
freq_transformer_depth: 1
|
| 18 |
+
linear_transformer_depth: 0
|
| 19 |
+
freqs_per_bands: !!python/tuple
|
| 20 |
+
- 2
|
| 21 |
+
- 2
|
| 22 |
+
- 2
|
| 23 |
+
- 2
|
| 24 |
+
- 2
|
| 25 |
+
- 2
|
| 26 |
+
- 3
|
| 27 |
+
- 3
|
| 28 |
+
- 3
|
| 29 |
+
- 3
|
| 30 |
+
- 3
|
| 31 |
+
- 4
|
| 32 |
+
- 4
|
| 33 |
+
- 4
|
| 34 |
+
- 4
|
| 35 |
+
- 4
|
| 36 |
+
- 5
|
| 37 |
+
- 5
|
| 38 |
+
- 5
|
| 39 |
+
- 5
|
| 40 |
+
- 6
|
| 41 |
+
- 6
|
| 42 |
+
- 6
|
| 43 |
+
- 6
|
| 44 |
+
- 7
|
| 45 |
+
- 7
|
| 46 |
+
- 7
|
| 47 |
+
- 8
|
| 48 |
+
- 8
|
| 49 |
+
- 8
|
| 50 |
+
- 9
|
| 51 |
+
- 9
|
| 52 |
+
- 10
|
| 53 |
+
- 10
|
| 54 |
+
- 11
|
| 55 |
+
- 12
|
| 56 |
+
- 13
|
| 57 |
+
- 14
|
| 58 |
+
- 15
|
| 59 |
+
- 16
|
| 60 |
+
- 17
|
| 61 |
+
- 18
|
| 62 |
+
- 19
|
| 63 |
+
- 20
|
| 64 |
+
- 21
|
| 65 |
+
- 22
|
| 66 |
+
- 23
|
| 67 |
+
- 24
|
| 68 |
+
- 25
|
| 69 |
+
- 27
|
| 70 |
+
- 29
|
| 71 |
+
- 31
|
| 72 |
+
- 33
|
| 73 |
+
- 35
|
| 74 |
+
- 37
|
| 75 |
+
- 39
|
| 76 |
+
- 41
|
| 77 |
+
- 43
|
| 78 |
+
- 45
|
| 79 |
+
- 48
|
| 80 |
+
- 52
|
| 81 |
+
- 57
|
| 82 |
+
- 64
|
| 83 |
+
dim_head: 16
|
| 84 |
+
heads: 8
|
| 85 |
+
attn_dropout: 0.0
|
| 86 |
+
ff_dropout: 0.0
|
| 87 |
+
flash_attn: true
|
| 88 |
+
dim_freqs_in: 1025
|
| 89 |
+
stft_n_fft: 2048
|
| 90 |
+
stft_hop_length: 512
|
| 91 |
+
stft_win_length: 2048
|
| 92 |
+
stft_normalized: False
|
| 93 |
+
mask_estimator_depth: 3
|
| 94 |
+
multi_stft_resolution_loss_weight: 1.0
|
| 95 |
+
multi_stft_resolutions_window_sizes: !!python/tuple
|
| 96 |
+
- 4096
|
| 97 |
+
- 2048
|
| 98 |
+
- 1024
|
| 99 |
+
- 512
|
| 100 |
+
- 256
|
| 101 |
+
multi_stft_hop_size: 147
|
| 102 |
+
multi_stft_normalized: False
|
| 103 |
+
mlp_expansion_factor: 4
|
| 104 |
+
use_torch_checkpoint: True
|
| 105 |
+
skip_connection: False
|
| 106 |
+
|
| 107 |
+
|
| 108 |
+
training:
|
| 109 |
+
batch_size: 4
|
| 110 |
+
gradient_accumulation_steps: 1
|
| 111 |
+
grad_clip: 1000.0
|
| 112 |
+
instruments: ['noreverb', 'reverb']
|
| 113 |
+
lr: 5.0e-5
|
| 114 |
+
patience: 5
|
| 115 |
+
reduce_factor: 0.75
|
| 116 |
+
target_instrument: noreverb
|
| 117 |
+
num_epochs: 1000
|
| 118 |
+
num_steps: 1000
|
| 119 |
+
q: 0.95
|
| 120 |
+
coarse_loss_clip: true
|
| 121 |
+
ema_momentum: 0.999
|
| 122 |
+
optimizer: adam
|
| 123 |
+
|
| 124 |
+
other_fix: False # it's needed for checking on multisong dataset if other is actually instrumental
|
| 125 |
+
use_amp: true # enable or disable usage of mixed precision (float16) - usually it must be true
|
| 126 |
+
|
| 127 |
+
|
| 128 |
+
|
| 129 |
+
inference:
|
| 130 |
+
batch_size: 1
|
| 131 |
+
dim_t: 871
|
| 132 |
+
num_overlap: 2
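The `freqs_per_bands` tuple in the config above lists how many STFT frequency bins each band covers. Band-split models of this kind generally expect those widths to add up to the number of bins the STFT produces, which is `stft_n_fft // 2 + 1` for a one-sided spectrum. The sketch below checks that for the values in this file. The compact list construction is only a shorthand for the 63 entries written out above.

```python
# Band widths from the config above, written compactly (same values, same order).
freqs_per_bands = (
    [2] * 6 + [3] * 5 + [4] * 5 + [5] * 4 + [6] * 4 + [7] * 3 + [8] * 3
    + [9] * 2 + [10] * 2
    + list(range(11, 26))                        # 11, 12, ..., 25
    + [27, 29, 31, 33, 35, 37, 39, 41, 43, 45]
    + [48, 52, 57, 64]
)

stft_n_fft = 2048      # from the config
dim_freqs_in = 1025    # from the config

# A one-sided STFT of size n_fft has n_fft // 2 + 1 frequency bins.
expected_bins = stft_n_fft // 2 + 1

print(len(freqs_per_bands), sum(freqs_per_bands), expected_bins)
```

The widths grow from 2 bins at the low end to 64 bins at the top, so low frequencies get much finer bands than high frequencies.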
|
Karaoke-BS-RoFormer-Anvuew/.gitattributes
ADDED
|
@@ -0,0 +1,35 @@
|
| 1 |
+
*.7z filter=lfs diff=lfs merge=lfs -text
|
| 2 |
+
*.arrow filter=lfs diff=lfs merge=lfs -text
|
| 3 |
+
*.bin filter=lfs diff=lfs merge=lfs -text
|
| 4 |
+
*.bz2 filter=lfs diff=lfs merge=lfs -text
|
| 5 |
+
*.ckpt filter=lfs diff=lfs merge=lfs -text
|
| 6 |
+
*.ftz filter=lfs diff=lfs merge=lfs -text
|
| 7 |
+
*.gz filter=lfs diff=lfs merge=lfs -text
|
| 8 |
+
*.h5 filter=lfs diff=lfs merge=lfs -text
|
| 9 |
+
*.joblib filter=lfs diff=lfs merge=lfs -text
|
| 10 |
+
*.lfs.* filter=lfs diff=lfs merge=lfs -text
|
| 11 |
+
*.mlmodel filter=lfs diff=lfs merge=lfs -text
|
| 12 |
+
*.model filter=lfs diff=lfs merge=lfs -text
|
| 13 |
+
*.msgpack filter=lfs diff=lfs merge=lfs -text
|
| 14 |
+
*.npy filter=lfs diff=lfs merge=lfs -text
|
| 15 |
+
*.npz filter=lfs diff=lfs merge=lfs -text
|
| 16 |
+
*.onnx filter=lfs diff=lfs merge=lfs -text
|
| 17 |
+
*.ot filter=lfs diff=lfs merge=lfs -text
|
| 18 |
+
*.parquet filter=lfs diff=lfs merge=lfs -text
|
| 19 |
+
*.pb filter=lfs diff=lfs merge=lfs -text
|
| 20 |
+
*.pickle filter=lfs diff=lfs merge=lfs -text
|
| 21 |
+
*.pkl filter=lfs diff=lfs merge=lfs -text
|
| 22 |
+
*.pt filter=lfs diff=lfs merge=lfs -text
|
| 23 |
+
*.pth filter=lfs diff=lfs merge=lfs -text
|
| 24 |
+
*.rar filter=lfs diff=lfs merge=lfs -text
|
| 25 |
+
*.safetensors filter=lfs diff=lfs merge=lfs -text
|
| 26 |
+
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
|
| 27 |
+
*.tar.* filter=lfs diff=lfs merge=lfs -text
|
| 28 |
+
*.tar filter=lfs diff=lfs merge=lfs -text
|
| 29 |
+
*.tflite filter=lfs diff=lfs merge=lfs -text
|
| 30 |
+
*.tgz filter=lfs diff=lfs merge=lfs -text
|
| 31 |
+
*.wasm filter=lfs diff=lfs merge=lfs -text
|
| 32 |
+
*.xz filter=lfs diff=lfs merge=lfs -text
|
| 33 |
+
*.zip filter=lfs diff=lfs merge=lfs -text
|
| 34 |
+
*.zst filter=lfs diff=lfs merge=lfs -text
|
| 35 |
+
*tfevents* filter=lfs diff=lfs merge=lfs -text
|
Karaoke-BS-RoFormer-Anvuew/README.md
ADDED
|
@@ -0,0 +1,5 @@
|
| 1 |
+
---
|
| 2 |
+
license: gpl-3.0
|
| 3 |
+
---
|
| 4 |
+
|
| 5 |
+
dataset by [becruily](https://huggingface.co/becruily)
|
Karaoke-BS-RoFormer-Anvuew/karaoke_bs_roformer_anvuew.yaml
ADDED
|
@@ -0,0 +1,131 @@
|
| 1 |
+
audio:
|
| 2 |
+
chunk_size: 640000
|
| 3 |
+
dim_f: 1024
|
| 4 |
+
dim_t: 801 # don't work (use in model)
|
| 5 |
+
hop_length: 441 # don't work (use in model)
|
| 6 |
+
n_fft: 2048
|
| 7 |
+
num_channels: 2
|
| 8 |
+
sample_rate: 44100
|
| 9 |
+
min_mean_abs: 0.000
|
| 10 |
+
|
| 11 |
+
model:
|
| 12 |
+
dim: 256
|
| 13 |
+
depth: 12
|
| 14 |
+
stereo: true
|
| 15 |
+
num_stems: 1
|
| 16 |
+
time_transformer_depth: 1
|
| 17 |
+
freq_transformer_depth: 1
|
| 18 |
+
linear_transformer_depth: 0
|
| 19 |
+
freqs_per_bands: !!python/tuple
|
| 20 |
+
- 2
|
| 21 |
+
- 2
|
| 22 |
+
- 2
|
| 23 |
+
- 2
|
| 24 |
+
- 2
|
| 25 |
+
- 2
|
| 26 |
+
- 2
|
| 27 |
+
- 2
|
| 28 |
+
- 2
|
| 29 |
+
- 2
|
| 30 |
+
- 2
|
| 31 |
+
- 2
|
| 32 |
+
- 2
|
| 33 |
+
- 2
|
| 34 |
+
- 2
|
| 35 |
+
- 2
|
| 36 |
+
- 2
|
| 37 |
+
- 2
|
| 38 |
+
- 2
|
| 39 |
+
- 2
|
| 40 |
+
- 2
|
| 41 |
+
- 2
|
| 42 |
+
- 2
|
| 43 |
+
- 2
|
| 44 |
+
- 4
|
| 45 |
+
- 4
|
| 46 |
+
- 4
|
| 47 |
+
- 4
|
| 48 |
+
- 4
|
| 49 |
+
- 4
|
| 50 |
+
- 4
|
| 51 |
+
- 4
|
| 52 |
+
- 4
|
| 53 |
+
- 4
|
| 54 |
+
- 4
|
| 55 |
+
- 4
|
| 56 |
+
- 12
|
| 57 |
+
- 12
|
| 58 |
+
- 12
|
| 59 |
+
- 12
|
| 60 |
+
- 12
|
| 61 |
+
- 12
|
| 62 |
+
- 12
|
| 63 |
+
- 12
|
| 64 |
+
- 24
|
| 65 |
+
- 24
|
| 66 |
+
- 24
|
| 67 |
+
- 24
|
| 68 |
+
- 24
|
| 69 |
+
- 24
|
| 70 |
+
- 24
|
| 71 |
+
- 24
|
| 72 |
+
- 48
|
| 73 |
+
- 48
|
| 74 |
+
- 48
|
| 75 |
+
- 48
|
| 76 |
+
- 48
|
| 77 |
+
- 48
|
| 78 |
+
- 48
|
| 79 |
+
- 48
|
| 80 |
+
- 128
|
| 81 |
+
- 129
|
| 82 |
+
dim_head: 64
|
| 83 |
+
heads: 8
|
| 84 |
+
attn_dropout: 0.0
|
| 85 |
+
ff_dropout: 0.0
|
| 86 |
+
flash_attn: true
|
| 87 |
+
dim_freqs_in: 1025
|
| 88 |
+
stft_n_fft: 2048
|
| 89 |
+
stft_hop_length: 512
|
| 90 |
+
stft_win_length: 2048
|
| 91 |
+
stft_normalized: false
|
| 92 |
+
mask_estimator_depth: 2
|
| 93 |
+
multi_stft_resolution_loss_weight: 1.0
|
| 94 |
+
multi_stft_resolutions_window_sizes: !!python/tuple
|
| 95 |
+
- 4096
|
| 96 |
+
- 2048
|
| 97 |
+
- 1024
|
| 98 |
+
- 512
|
| 99 |
+
- 256
|
| 100 |
+
multi_stft_hop_size: 147
|
| 101 |
+
multi_stft_normalized: False
|
| 102 |
+
mlp_expansion_factor: 4
|
| 103 |
+
use_torch_checkpoint: True
|
| 104 |
+
skip_connection: False
|
| 105 |
+
|
| 106 |
+
|
| 107 |
+
training:
|
| 108 |
+
batch_size: 1
|
| 109 |
+
gradient_accumulation_steps: 1
|
| 110 |
+
grad_clip: 0
|
| 111 |
+
instruments: ['Vocals', 'Instrumental']
|
| 112 |
+
lr: 5.0e-5
|
| 113 |
+
patience: 7
|
| 114 |
+
reduce_factor: 0.75
|
| 115 |
+
target_instrument: Vocals
|
| 116 |
+
num_epochs: 1000
|
| 117 |
+
num_steps: 1000
|
| 118 |
+
q: 0.95
|
| 119 |
+
coarse_loss_clip: true
|
| 120 |
+
ema_momentum: 0.999
|
| 121 |
+
optimizer: adam
|
| 122 |
+
other_fix: False # it's needed for checking on multisong dataset if other is actually instrumental
|
| 123 |
+
use_amp: true # enable or disable usage of mixed precision (float16) - usually it must be true
|
| 124 |
+
|
| 125 |
+
|
| 126 |
+
|
| 127 |
+
|
| 128 |
+
inference:
|
| 129 |
+
batch_size: 2
|
| 130 |
+
dim_t: 1251
|
| 131 |
+
num_overlap: 4
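A few values in this config are tied together arithmetically, and a short check makes the relationships visible. The sketch below is not part of the repository. It copies the numbers from the YAML above and assumes (a) that the band widths in `freqs_per_bands` are meant to cover every one-sided STFT bin, and (b) that the STFT is computed with centered frames, so a chunk of `L` samples yields `L // hop + 1` frames. The helper names `onesided_bins` and `centered_frames` are hypothetical.

```python
# Values copied from karaoke_bs_roformer_anvuew.yaml (shown above).
freqs_per_bands = [2] * 24 + [4] * 12 + [12] * 8 + [24] * 8 + [48] * 8 + [128, 129]
stft_n_fft = 2048
stft_hop_length = 512
dim_freqs_in = 1025
chunk_size = 640000
sample_rate = 44100
inference_dim_t = 1251


def onesided_bins(n_fft: int) -> int:
    """Number of frequency bins in a one-sided STFT of size n_fft."""
    return n_fft // 2 + 1


def centered_frames(num_samples: int, hop: int) -> int:
    """Frame count for a centered STFT (assumption: centered framing is used)."""
    return num_samples // hop + 1


num_bands = len(freqs_per_bands)          # how many bands the spectrum is split into
total_bins = sum(freqs_per_bands)         # bins covered by all bands together
chunk_seconds = chunk_size / sample_rate  # chunk length in seconds
frames = centered_frames(chunk_size, stft_hop_length)

print(f"bands: {num_bands}, bins covered: {total_bins}")
print(f"one-sided bins for n_fft={stft_n_fft}: {onesided_bins(stft_n_fft)}")
print(f"chunk: {chunk_seconds:.2f} s, frames per chunk: {frames}")
```

Under these assumptions the 62 band widths add up to 1025, which equals both `dim_freqs_in` and `stft_n_fft // 2 + 1`, and a 640000-sample chunk gives 1251 frames, the same number as `inference.dim_t`.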
|
MedleyVox-MultiSinger/.gitattributes
ADDED
|
@@ -0,0 +1,55 @@
|
| 1 |
+
*.7z filter=lfs diff=lfs merge=lfs -text
|
| 2 |
+
*.arrow filter=lfs diff=lfs merge=lfs -text
|
| 3 |
+
*.bin filter=lfs diff=lfs merge=lfs -text
|
| 4 |
+
*.bz2 filter=lfs diff=lfs merge=lfs -text
|
| 5 |
+
*.ckpt filter=lfs diff=lfs merge=lfs -text
|
| 6 |
+
*.ftz filter=lfs diff=lfs merge=lfs -text
|
| 7 |
+
*.gz filter=lfs diff=lfs merge=lfs -text
|
| 8 |
+
*.h5 filter=lfs diff=lfs merge=lfs -text
|
| 9 |
+
*.joblib filter=lfs diff=lfs merge=lfs -text
|
| 10 |
+
*.lfs.* filter=lfs diff=lfs merge=lfs -text
|
| 11 |
+
*.mlmodel filter=lfs diff=lfs merge=lfs -text
|
| 12 |
+
*.model filter=lfs diff=lfs merge=lfs -text
|
| 13 |
+
*.msgpack filter=lfs diff=lfs merge=lfs -text
|
| 14 |
+
*.npy filter=lfs diff=lfs merge=lfs -text
|
| 15 |
+
*.npz filter=lfs diff=lfs merge=lfs -text
|
| 16 |
+
*.onnx filter=lfs diff=lfs merge=lfs -text
|
| 17 |
+
*.ot filter=lfs diff=lfs merge=lfs -text
|
| 18 |
+
*.parquet filter=lfs diff=lfs merge=lfs -text
|
| 19 |
+
*.pb filter=lfs diff=lfs merge=lfs -text
|
| 20 |
+
*.pickle filter=lfs diff=lfs merge=lfs -text
|
| 21 |
+
*.pkl filter=lfs diff=lfs merge=lfs -text
|
| 22 |
+
*.pt filter=lfs diff=lfs merge=lfs -text
|
| 23 |
+
*.pth filter=lfs diff=lfs merge=lfs -text
|
| 24 |
+
*.rar filter=lfs diff=lfs merge=lfs -text
|
| 25 |
+
*.safetensors filter=lfs diff=lfs merge=lfs -text
|
| 26 |
+
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
|
| 27 |
+
*.tar.* filter=lfs diff=lfs merge=lfs -text
|
| 28 |
+
*.tar filter=lfs diff=lfs merge=lfs -text
|
| 29 |
+
*.tflite filter=lfs diff=lfs merge=lfs -text
|
| 30 |
+
*.tgz filter=lfs diff=lfs merge=lfs -text
|
| 31 |
+
*.wasm filter=lfs diff=lfs merge=lfs -text
|
| 32 |
+
*.xz filter=lfs diff=lfs merge=lfs -text
|
| 33 |
+
*.zip filter=lfs diff=lfs merge=lfs -text
|
| 34 |
+
*.zst filter=lfs diff=lfs merge=lfs -text
|
| 35 |
+
*tfevents* filter=lfs diff=lfs merge=lfs -text
|
| 36 |
+
vocals[[:space:]]135/vocals.chkpnt filter=lfs diff=lfs merge=lfs -text
|
| 37 |
+
vocals[[:space:]]163/vocals.chkpnt filter=lfs diff=lfs merge=lfs -text
|
| 38 |
+
vocals[[:space:]]188/vocals.chkpnt filter=lfs diff=lfs merge=lfs -text
|
| 39 |
+
vocals[[:space:]]200/vocals.chkpnt filter=lfs diff=lfs merge=lfs -text
|
| 40 |
+
singing_librispeech_iSRNet/vocals.chkpnt filter=lfs diff=lfs merge=lfs -text
|
| 41 |
+
eval_results/singing_librispeech_iSRNet/examples/ex_68/mixture.wav filter=lfs diff=lfs merge=lfs -text
|
| 42 |
+
eval_results/singing_librispeech_iSRNet/examples/ex_68/s0_estimate.wav filter=lfs diff=lfs merge=lfs -text
|
| 43 |
+
eval_results/singing_librispeech_iSRNet/examples/ex_68/s0.wav filter=lfs diff=lfs merge=lfs -text
|
| 44 |
+
eval_results/singing_librispeech_iSRNet/examples/ex_68/s1_estimate.wav filter=lfs diff=lfs merge=lfs -text
|
| 45 |
+
eval_results/singing_librispeech_iSRNet/examples/ex_68/s1.wav filter=lfs diff=lfs merge=lfs -text
|
| 46 |
+
eval_results/singing_librispeech/examples/ex_69/mixture.wav filter=lfs diff=lfs merge=lfs -text
|
| 47 |
+
eval_results/singing_librispeech/examples/ex_69/s0_estimate.wav filter=lfs diff=lfs merge=lfs -text
|
| 48 |
+
eval_results/singing_librispeech/examples/ex_69/s0.wav filter=lfs diff=lfs merge=lfs -text
|
| 49 |
+
eval_results/singing_librispeech/examples/ex_69/s1_estimate.wav filter=lfs diff=lfs merge=lfs -text
|
| 50 |
+
eval_results/singing_librispeech/examples/ex_69/s1.wav filter=lfs diff=lfs merge=lfs -text
|
| 51 |
+
vocal[[:space:]]231/vocals.chkpnt filter=lfs diff=lfs merge=lfs -text
|
| 52 |
+
vocals[[:space:]]238/vocals.chkpnt filter=lfs diff=lfs merge=lfs -text
|
| 53 |
+
singing_librispeech_ft_iSRNet/vocals.chkpnt filter=lfs diff=lfs merge=lfs -text
|
| 54 |
+
multi_singing_librispeech/vocals.chkpnt filter=lfs diff=lfs merge=lfs -text
|
| 55 |
+
multi_singing_librispeech_138/vocals.chkpnt filter=lfs diff=lfs merge=lfs -text
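The explicit per-file entries at the end of this `.gitattributes` are there because the checkpoints use the `.chkpnt` extension, which none of the wildcard rules (such as `*.ckpt`) match, and because `[[:space:]]` stands in for the space in directory names like `vocals 135`. The sketch below illustrates that with a simplified matcher. It is an approximation, not Git's real pattern engine: it only replaces `[[:space:]]` with a literal space and uses Python's `fnmatch`, and the function names and the small pattern subset are chosen for illustration.

```python
from fnmatch import fnmatchcase

# A small subset of the patterns listed in the .gitattributes above.
PATTERNS = [
    "*.ckpt",
    "*.safetensors",
    "vocals[[:space:]]135/vocals.chkpnt",
    "vocal[[:space:]]231/vocals.chkpnt",
    "multi_singing_librispeech/vocals.chkpnt",
]


def simplify(pattern: str) -> str:
    """Replace the POSIX space class with a plain space (simplification)."""
    return pattern.replace("[[:space:]]", " ")


def is_lfs_tracked(path: str) -> bool:
    """Approximate check: does any pattern match this repository-relative path?"""
    basename = path.rsplit("/", 1)[-1]
    for pattern in map(simplify, PATTERNS):
        # Patterns without a slash are compared against the file name only;
        # patterns with a slash are compared against the whole relative path.
        target = path if "/" in pattern else basename
        if fnmatchcase(target, pattern):
            return True
    return False


for candidate in [
    "vocals 135/vocals.chkpnt",
    "some_dir/model.ckpt",
    "multi_singing_librispeech/vocals.json",
    "vocals 999/vocals.chkpnt",
]:
    print(candidate, "->", is_lfs_tracked(candidate))
```

In this simplified model, a `.chkpnt` file in a directory that is not listed explicitly (the made-up `vocals 999` here) would not be routed through LFS, which is consistent with each checkpoint directory getting its own line.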
|
MedleyVox-MultiSinger/README.md
ADDED
|
@@ -0,0 +1,156 @@
|
| 1 |
+
---
|
| 2 |
+
license: cc-by-4.0
|
| 3 |
+
library_name: asteroid
|
| 4 |
+
---
|
| 5 |
+
|
| 6 |
+
# Model Card for Model ID
|
| 7 |
+
|
| 8 |
+
This model aims to separate duets, unisons, or any other number of voices from a given audio track.
|
| 9 |
+
|
| 10 |
+
## Model Details
|
| 11 |
+
|
| 12 |
+
### Model Description
|
| 13 |
+
|
| 14 |
+
<!-- Provide a longer summary of what this model is. -->
|
| 15 |
+
|
| 16 |
+
- **Developed by:** Carson Evans
|
| 17 |
+
- **Model type:** Audio Separation
|
| 18 |
+
- **License:** Creative Commons Attribution 4.0
|
| 19 |
+
|
| 20 |
+
### Model Sources [optional]
|
| 21 |
+
|
| 22 |
+
<!-- Provide the basic links for the model. -->
|
| 23 |
+
|
| 24 |
+
- **Repository:** https://github.com/CBeast25/MedleyVox
|
| 25 |
+
- **Paper:** https://arxiv.org/abs/2211.07302
|
| 26 |
+
- **Demo:** https://catnip-leaf-c6a.notion.site/Audio-Samples-of-MedleyVox-An-Evaluation-Dataset-for-Multiple-Singing-Voices-Separation-30074b2c88d24f46b68d9293f6095962
|
| 27 |
+
|
| 28 |
+
## Uses
|
| 29 |
+
|
| 30 |
+
<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
|
| 31 |
+
|
| 32 |
+
### Direct Use
|
| 33 |
+
|
| 34 |
+
<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
|
| 35 |
+
|
| 36 |
+
[More Information Needed]
|
| 37 |
+
|
| 38 |
+
### Downstream Use [optional]
|
| 39 |
+
|
| 40 |
+
<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
|
| 41 |
+
|
| 42 |
+
[More Information Needed]
|
| 43 |
+
|
| 44 |
+
### Out-of-Scope Use
|
| 45 |
+
|
| 46 |
+
<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
|
| 47 |
+
|
| 48 |
+
[More Information Needed]
|
| 49 |
+
|
| 50 |
+
## Bias, Risks, and Limitations
|
| 51 |
+
|
| 52 |
+
<!-- This section is meant to convey both technical and sociotechnical limitations. -->
|
| 53 |
+
|
| 54 |
+
[More Information Needed]
|
| 55 |
+
|
| 56 |
+
### Recommendations
|
| 57 |
+
|
| 58 |
+
<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
|
| 59 |
+
|
| 60 |
+
Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
|
| 61 |
+
|
| 62 |
+
## How to Get Started with the Model
|
| 63 |
+
|
| 64 |
+
Use the code below to get started with the model.
|
| 65 |
+
|
| 66 |
+
[More Information Needed]
|
| 67 |
+
|
| 68 |
+
## Training Details
|
| 69 |
+
|
| 70 |
+
### Training Data
|
| 71 |
+
|
| 72 |
+
<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
|
| 73 |
+
|
| 74 |
+
[More Information Needed]
|
| 75 |
+
|
| 76 |
+
### Training Procedure
|
| 77 |
+
|
| 78 |
+
<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
|
| 79 |
+
|
| 80 |
+
#### Preprocessing
|
| 81 |
+
|
| 82 |
+
[More Information Needed]
|
| 83 |
+
|
| 84 |
+
|
| 85 |
+
#### Training Hyperparameters
|
| 86 |
+
|
| 87 |
+
- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
|
| 88 |
+
|
| 89 |
+
#### Speeds, Sizes, Times [optional]
|
| 90 |
+
|
| 91 |
+
<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
|
| 92 |
+
|
| 93 |
+
[More Information Needed]
|
| 94 |
+
|
| 95 |
+
## Evaluation
|
| 96 |
+
|
| 97 |
+
<!-- This section describes the evaluation protocols and provides the results. -->
|
| 98 |
+
|
| 99 |
+
### Testing Data, Factors & Metrics
|
| 100 |
+
|
| 101 |
+
#### Testing Data
|
| 102 |
+
|
| 103 |
+
<!-- This should link to a Dataset Card if possible. -->
|
| 104 |
+
|
| 105 |
+
[More Information Needed]
|
| 106 |
+
|
| 107 |
+
#### Factors
|
| 108 |
+
|
| 109 |
+
<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
|
| 110 |
+
|
| 111 |
+
[More Information Needed]
|
| 112 |
+
|
| 113 |
+
#### Metrics
|
| 114 |
+
|
| 115 |
+
<!-- These are the evaluation metrics being used, ideally with a description of why. -->
|
| 116 |
+
|
| 117 |
+
[More Information Needed]
|
| 118 |
+
|
| 119 |
+
### Results
|
| 120 |
+
|
| 121 |
+
[More Information Needed]
|
| 122 |
+
|
| 123 |
+
#### Summary
|
| 124 |
+
|
| 125 |
+
|
| 126 |
+
## Environmental Impact
|
| 127 |
+
|
| 128 |
+
<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
|
| 129 |
+
|
| 130 |
+
Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
|
| 131 |
+
|
| 132 |
+
- **Hardware Type:** [More Information Needed]
|
| 133 |
+
- **Hours used:** [More Information Needed]
|
| 134 |
+
- **Cloud Provider:** [More Information Needed]
|
| 135 |
+
- **Compute Region:** [More Information Needed]
|
| 136 |
+
- **Carbon Emitted:** [More Information Needed]
|
| 137 |
+
|
| 138 |
+
## Citation [optional]
|
| 139 |
+
|
| 140 |
+
<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
|
| 141 |
+
|
| 142 |
+
**BibTeX:**
|
| 143 |
+
|
| 144 |
+
[More Information Needed]
|
| 145 |
+
|
| 146 |
+
**APA:**
|
| 147 |
+
|
| 148 |
+
[More Information Needed]
|
| 149 |
+
|
| 150 |
+
## Glossary [optional]
|
| 151 |
+
|
| 152 |
+
<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
|
| 153 |
+
|
| 154 |
+
## Model Card Contact
|
| 155 |
+
|
| 156 |
+
carson.evans@colostate.edu
|
MedleyVox-MultiSinger/multi_singing_librispeech/loss_graph_vocals.png
ADDED
|
MedleyVox-MultiSinger/multi_singing_librispeech/vocals.json
ADDED
|
@@ -0,0 +1,642 @@
|
| 1 |
+
{
|
| 2 |
+
"args": {
|
| 3 |
+
"above_freq": 300.0,
|
| 4 |
+
"architecture": "conv_tasnet_stft",
|
| 5 |
+
"batch_size": 58,
|
| 6 |
+
"beta1": 0.5,
|
| 7 |
+
"beta2": 0.9,
|
| 8 |
+
"bn_chan": 256,
|
| 9 |
+
"continual_train": false,
|
| 10 |
+
"dataset": "multi_singing_librispeech",
|
| 11 |
+
"db_normalize": false,
|
| 12 |
+
"ema": true,
|
| 13 |
+
"encoder_activation": null,
|
| 14 |
+
"entity": "carson2050",
|
| 15 |
+
"epochs": 200,
|
| 16 |
+
"eps": 1e-08,
|
| 17 |
+
"exp_name": "multi_singing_librispeech",
|
| 18 |
+
"ff_activation": "relu",
|
| 19 |
+
"gpu": 0,
|
| 20 |
+
"gradient_clip": null,
|
| 21 |
+
"hid_chan": 1024,
|
| 22 |
+
"load_ema_online_model": false,
|
| 23 |
+
"lr": 0.0002,
|
| 24 |
+
"lr_decay_gamma": 0.5,
|
| 25 |
+
"lr_decay_patience": 20,
|
| 26 |
+
"lr_scheduler": "step_lr",
|
| 27 |
+
"mask_act": "linear",
|
| 28 |
+
"max_n_src": 4,
|
| 29 |
+
"min_n_src": 2,
|
| 30 |
+
"mixed_precision": false,
|
| 31 |
+
"mixture_consistency": "mixture_consistency",
|
| 32 |
+
"multi_spec_loss_log_scale": false,
|
| 33 |
+
"n_blocks": 6,
|
| 34 |
+
"n_filter": 512,
|
| 35 |
+
"n_kernel": 512,
|
| 36 |
+
"n_nodes": 1,
|
| 37 |
+
"n_repeats": 4,
|
| 38 |
+
"n_src": 2,
|
| 39 |
+
"nb_workers": 8,
|
| 40 |
+
"nfft": 2048,
|
| 41 |
+
"ngpus_per_node": 1,
|
| 42 |
+
"nhop": 512,
|
| 43 |
+
"no_cuda": false,
|
| 44 |
+
"no_mask": false,
|
| 45 |
+
"no_mask_residual": false,
|
| 46 |
+
"optimizer": "adam",
|
| 47 |
+
"output": "results/singing_sep/checkpoint/multi_singing_librispeech",
|
| 48 |
+
"output_directory": "results/singing_sep",
|
| 49 |
+
"part_of_data": null,
|
| 50 |
+
"patience": 50,
|
| 51 |
+
"pitch_formant_augment_prob": 0.4,
|
| 52 |
+
"port": null,
|
| 53 |
+
"project": "MedleyVox_home_pt2",
|
| 54 |
+
"quiet": false,
|
| 55 |
+
"rank": 0,
|
| 56 |
+
"reduced_training_data_ratio": 1.0,
|
| 57 |
+
"resume": "results/singing_sep/checkpoint/multi_singing_librispeech",
|
| 58 |
+
"same_singer_dict_path": [
|
| 59 |
+
[
|
| 60 |
+
"../data/24k/OpenSinger",
|
| 61 |
+
"./svs/preprocess/make_same_singer_dict/same_singer_OpenSinger.json",
|
| 62 |
+
"OpenSinger"
|
| 63 |
+
],
|
| 64 |
+
[
|
| 65 |
+
"../data/24k/k_multisinger",
|
| 66 |
+
"./svs/preprocess/make_same_singer_dict/same_singer_k_multisinger.json",
|
| 67 |
+
"k_multisinger"
|
| 68 |
+
],
|
| 69 |
+
[
|
| 70 |
+
"../data/24k/CSD",
|
| 71 |
+
"./svs/preprocess/make_same_singer_dict/same_singer_CSD.json",
|
| 72 |
+
"CSD"
|
| 73 |
+
],
|
| 74 |
+
[
|
| 75 |
+
"../data/24k/jsut-song_ver1",
|
| 76 |
+
"./svs/preprocess/make_same_singer_dict/same_singer_jsut-song_ver1.json",
|
| 77 |
+
"jsut-song_ver1"
|
| 78 |
+
],
|
| 79 |
+
[
|
| 80 |
+
"../data/24k/jvs_music_ver1",
|
| 81 |
+
"./svs/preprocess/make_same_singer_dict/same_singer_jvs_music_ver1.json",
|
| 82 |
+
"jvs_music_ver1"
|
| 83 |
+
],
|
| 84 |
+
[
|
| 85 |
+
"../data/24k/k_multitimbre",
|
| 86 |
+
"./svs/preprocess/make_same_singer_dict/same_singer_k_multitimbre.json",
|
| 87 |
+
"k_multitimbre"
|
| 88 |
+
],
|
| 89 |
+
[
|
| 90 |
+
"../data/24k/kiritan_revised",
|
| 91 |
+
"./svs/preprocess/make_same_singer_dict/same_singer_kiritan.json",
|
| 92 |
+
"kiritan"
|
| 93 |
+
],
|
| 94 |
+
[
|
| 95 |
+
"../data/24k/musdb_a_train",
|
| 96 |
+
"./svs/preprocess/make_same_singer_dict/same_singer_musdb_a_train.json",
|
| 97 |
+
"musdb_a_train"
|
| 98 |
+
],
|
| 99 |
+
[
|
| 100 |
+
"../data/24k/NUS",
|
| 101 |
+
"./svs/preprocess/make_same_singer_dict/same_singer_NUS.json",
|
| 102 |
+
"NUS"
|
| 103 |
+
],
|
| 104 |
+
[
|
| 105 |
+
"../data/24k/VocalSet",
|
| 106 |
+
"./svs/preprocess/make_same_singer_dict/same_singer_VocalSet.json",
|
| 107 |
+
"VocalSet"
|
| 108 |
+
]
|
| 109 |
+
],
|
| 110 |
+
"same_singer_ratio": 0.2,
|
| 111 |
+
"same_song_dict_path": [
|
| 112 |
+
[
|
| 113 |
+
"../data/24k/k_multisinger",
|
| 114 |
+
"./svs/preprocess/make_same_song_dict/same_song_k_multisinger.json",
|
| 115 |
+
"k_multisinger"
|
| 116 |
+
]
|
| 117 |
+
],
|
| 118 |
+
"same_song_ratio": 0.2,
|
| 119 |
+
"same_speaker_dict_path": [
|
| 120 |
+
[
|
| 121 |
+
"../data/24k/LibriSpeech_train-clean-100",
|
| 122 |
+
"./svs/preprocess/make_same_speaker_dict/same_singer_LibriSpeech_train-clean-100.json",
|
| 123 |
+
"LibriSpeech_train-clean-100"
|
| 124 |
+
],
|
| 125 |
+
[
|
| 126 |
+
"../data/24k/LibriSpeech_train-clean-360",
|
| 127 |
+
"./svs/preprocess/make_same_speaker_dict/same_singer_LibriSpeech_train-clean-360.json",
|
| 128 |
+
"LibriSpeech_train-clean-360"
|
| 129 |
+
]
|
| 130 |
+
],
|
| 131 |
+
"same_speaker_ratio": 0.15,
|
| 132 |
+
"sample_rate": 24000,
|
| 133 |
+
"seed": 777,
|
| 134 |
+
"seq_dur": 3.0,
|
| 135 |
+
"sing_sing_ratio": 0.15,
|
| 136 |
+
"sing_speech_ratio": 0.15,
|
| 137 |
+
"skip_chan": 256,
|
| 138 |
+
"song_length_dict_path": "./svs/preprocess/song_length_dict_24k.json",
|
| 139 |
+
"speech_train_root": [
|
| 140 |
+
"../data/24k/LibriSpeech_train-clean-360",
|
| 141 |
+
"../data/24k/LibriSpeech_train-clean-100"
|
| 142 |
+
],
|
| 143 |
+
"sr_input_res": false,
|
| 144 |
+
"sr_out_mix_consistency": false,
|
| 145 |
+
"srnet": "orig",
|
| 146 |
+
"start_from_best": false,
|
| 147 |
+
"sweep": false,
|
| 148 |
+
"target": "vocals",
|
| 149 |
+
"train_loss_func": [
|
| 150 |
+
"pit_snr",
|
| 151 |
+
"multi_spectral_l1"
|
| 152 |
+
],
|
| 153 |
+
"train_root": [
|
| 154 |
+
"../data/24k/CSD",
|
| 155 |
+
"../data/24k/NUS",
|
| 156 |
+
"../data/24k/TONAS",
|
| 157 |
+
"../data/24k/VocalSet",
|
| 158 |
+
"../data/24k/jsut-song_ver1",
|
| 159 |
+
"../data/24k/jvs_music_ver1",
|
| 160 |
+
"../data/24k/kiritan_revised",
|
| 161 |
+
"../data/24k/vocadito",
|
| 162 |
+
"../data/24k/musdb_a_train",
|
| 163 |
+
"../data/24k/OpenSinger",
|
| 164 |
+
"../data/24k/medleyDB_v1_in_musdb",
|
| 165 |
+
"../data/24k/k_multisinger",
|
| 166 |
+
"../data/24k/k_multitimbre"
|
| 167 |
+
],
|
| 168 |
+
"unison_prob": 0.3,
|
| 169 |
+
"use_wandb": true,
|
| 170 |
+
"valid_loss_func": [
|
| 171 |
+
"pit_si_sdr"
|
| 172 |
+
],
|
| 173 |
+
"valid_regions_dict_path": "./svs/preprocess/valid_regions_dict_singing_singing.json",
|
| 174 |
+
"valid_root": [
|
| 175 |
+
[
|
| 176 |
+
"../data/24k/musdb_a_test",
|
| 177 |
+
"../data/24k/musdb_a_test",
|
| 178 |
+
"./svs/preprocess/make_validation_dict/for_2_srcs/valid_regions_dict_singing_singing.json",
|
| 179 |
+
"sing_sing_diff"
|
| 180 |
+
],
|
| 181 |
+
[
|
| 182 |
+
"../data/24k/musdb_a_test",
|
| 183 |
+
"../data/24k/musdb_a_test",
|
| 184 |
+
"./svs/preprocess/make_validation_dict/for_2_srcs/valid_regions_dict_singing_unison.json",
|
| 185 |
+
"sing_sing_unison"
|
| 186 |
+
],
|
| 187 |
+
[
|
| 188 |
+
"../data/24k/musdb_a_test",
|
| 189 |
+
"../data/24k/musdb_a_test",
|
| 190 |
+
"./svs/preprocess/make_validation_dict/for_2_srcs/valid_regions_dict_singing_singing_same_singer.json",
|
| 191 |
+
"sing_sing_same_singer"
|
| 192 |
+
],
|
| 193 |
+
[
|
| 194 |
+
"../data/24k/LibriSpeech_dev-clean",
|
| 195 |
+
"../data/24k/LibriSpeech_dev-clean",
|
| 196 |
+
"./svs/preprocess/make_validation_dict/for_2_srcs/valid_regions_dict_speech_speech.json",
|
| 197 |
+
"speech_speech_diff"
|
| 198 |
+
],
|
| 199 |
+
[
|
| 200 |
+
"../data/24k/LibriSpeech_dev-clean",
|
| 201 |
+
"../data/24k/LibriSpeech_dev-clean",
|
| 202 |
+
"./svs/preprocess/make_validation_dict/for_2_srcs/valid_regions_dict_speech_unison.json",
|
| 203 |
+
"speech_speech_unison"
|
| 204 |
+
],
|
| 205 |
+
[
|
| 206 |
+
"../data/24k/LibriSpeech_dev-clean",
|
| 207 |
+
"../data/24k/LibriSpeech_dev-clean",
|
| 208 |
+
"./svs/preprocess/make_validation_dict/for_2_srcs/valid_regions_dict_speech_speech_same_speaker.json",
|
| 209 |
+
"speech_speech_same_speaker"
|
| 210 |
+
],
|
| 211 |
+
[
|
| 212 |
+
"../data/24k/musdb_a_test",
|
| 213 |
+
"../data/24k/LibriSpeech_dev-clean",
|
| 214 |
+
"./svs/preprocess/make_validation_dict/for_2_srcs/valid_regions_dict_singing_speech.json",
|
| 215 |
+
"singing_speech"
|
| 216 |
+
]
|
| 217 |
+
],
|
| 218 |
+
"valid_root_orpit": [
|
| 219 |
+
[
|
| 220 |
+
"../data/24k/musdb_a_test",
|
| 221 |
+
"../data/24k/musdb_a_test",
|
| 222 |
+
"./svs/preprocess/make_validation_dict/for_n_srcs/valid_regions_dict_singing_singing_n_srcs.json",
|
| 223 |
+
"sing_sing_diff"
|
| 224 |
+
],
|
| 225 |
+
[
|
| 226 |
+
"../data/24k/musdb_a_test",
|
| 227 |
+
"../data/24k/musdb_a_test",
|
| 228 |
+
"./svs/preprocess/make_validation_dict/for_n_srcs/valid_regions_dict_singing_unison_n_srcs.json",
|
| 229 |
+
"sing_sing_unison"
|
| 230 |
+
],
|
| 231 |
+
[
|
| 232 |
+
"../data/24k/musdb_a_test",
|
| 233 |
+
"../data/24k/musdb_a_test",
|
| 234 |
+
"./svs/preprocess/make_validation_dict/for_n_srcs/valid_regions_dict_singing_singing_same_singer_n_srcs.json",
|
| 235 |
+
"sing_sing_same_singer"
|
| 236 |
+
],
|
| 237 |
+
[
|
| 238 |
+
"../data/24k/LibriSpeech_dev-clean",
|
| 239 |
+
"../data/24k/LibriSpeech_dev-clean",
|
| 240 |
+
"./svs/preprocess/make_validation_dict/for_n_srcs/valid_regions_dict_speech_speech_n_srcs.json",
|
| 241 |
+
"speech_speech_diff"
|
| 242 |
+
],
|
| 243 |
+
[
|
| 244 |
+
"../data/24k/LibriSpeech_dev-clean",
|
| 245 |
+
"../data/24k/LibriSpeech_dev-clean",
|
| 246 |
+
"./svs/preprocess/make_validation_dict/for_n_srcs/valid_regions_dict_speech_unison_n_srcs.json",
|
| 247 |
+
"speech_speech_unison"
|
| 248 |
+
],
|
| 249 |
+
[
|
| 250 |
+
"../data/24k/LibriSpeech_dev-clean",
|
| 251 |
+
"../data/24k/LibriSpeech_dev-clean",
|
| 252 |
+
"./svs/preprocess/make_validation_dict/for_n_srcs/valid_regions_dict_speech_speech_same_speaker_n_srcs.json",
|
| 253 |
+
"speech_speech_same_speaker"
|
| 254 |
+
],
|
| 255 |
+
[
|
| 256 |
+
"../data/24k/musdb_a_test",
|
| 257 |
+
"../data/24k/LibriSpeech_dev-clean",
|
| 258 |
+
"./svs/preprocess/make_validation_dict/for_n_srcs/valid_regions_dict_singing_speech_n_srcs.json",
|
| 259 |
+
"singing_speech"
|
| 260 |
+
]
|
| 261 |
+
],
|
| 262 |
+
"weight_decay": 1e-06,
|
| 263 |
+
"world_size": 1
|
| 264 |
+
},
|
| 265 |
+
"best_epoch": 92,
|
| 266 |
+
"best_loss": -7.675145898546491,
|
| 267 |
+
"epochs_trained": 93,
|
| 268 |
+
"num_bad_epochs": 1,
|
| 269 |
+
"train_loss_history": [
|
| 270 |
+
-1.4921716451644897,
|
| 271 |
+
-2.8664329051971436,
|
| 272 |
+
-3.3393054008483887,
|
| 273 |
+
-3.638092517852783,
|
| 274 |
+
-3.879303216934204,
|
| 275 |
+
-4.088993072509766,
|
| 276 |
+
-4.227543830871582,
|
| 277 |
+
-4.412248134613037,
|
| 278 |
+
-4.58026123046875,
|
| 279 |
+
-4.71837043762207,
|
| 280 |
+
-4.800468444824219,
|
| 281 |
+
-4.882855415344238,
|
| 282 |
+
-5.011181831359863,
|
| 283 |
+
-5.128243923187256,
|
| 284 |
+
-5.150334358215332,
|
| 285 |
+
-5.240769386291504,
|
| 286 |
+
-5.357062816619873,
|
| 287 |
+
-5.35420560836792,
|
| 288 |
+
-5.427922248840332,
|
| 289 |
+
-5.536999225616455,
|
| 290 |
+
-5.6000895500183105,
|
| 291 |
+
-5.664849758148193,
|
| 292 |
+
-5.704154968261719,
|
| 293 |
+
-5.791101455688477,
|
| 294 |
+
-5.794349670410156,
|
| 295 |
+
-5.784161567687988,
|
| 296 |
+
-5.824007511138916,
|
| 297 |
+
-5.931461811065674,
|
| 298 |
+
-5.981809139251709,
|
| 299 |
+
-6.045787334442139,
|
| 300 |
+
-6.045494079589844,
|
| 301 |
+
-6.075621128082275,
|
| 302 |
+
-6.086508750915527,
|
| 303 |
+
-6.123781681060791,
|
| 304 |
+
-6.192169666290283,
|
| 305 |
+
-6.248963832855225,
|
| 306 |
+
-6.183308124542236,
|
| 307 |
+
-6.25191593170166,
|
| 308 |
+
-6.301548004150391,
|
| 309 |
+
-6.259702682495117,
|
| 310 |
+
-6.338959217071533,
|
| 311 |
+
-6.372439861297607,
|
| 312 |
+
-6.436537742614746,
|
| 313 |
+
-6.462899684906006,
|
| 314 |
+
-6.449411392211914,
|
| 315 |
+
-6.489621639251709,
|
| 316 |
+
-6.461447238922119,
|
| 317 |
+
-6.558005332946777,
|
| 318 |
+
-6.603482723236084,
|
| 319 |
+
-6.555445194244385,
|
| 320 |
+
-6.579801082611084,
|
| 321 |
+
-6.639071464538574,
|
| 322 |
+
-6.648660182952881,
|
| 323 |
+
-6.6866631507873535,
|
| 324 |
+
-6.767474174499512,
|
| 325 |
+
-6.750443935394287,
|
| 326 |
+
-6.7306742668151855,
|
| 327 |
+
-6.7853617668151855,
|
| 328 |
+
-6.818509101867676,
|
| 329 |
+
-6.761360168457031,
|
| 330 |
+
-6.795668601989746,
|
| 331 |
+
-6.82188606262207,
|
| 332 |
+
-6.795504093170166,
|
| 333 |
+
-6.914917469024658,
|
| 334 |
+
-6.921656131744385,
|
| 335 |
+
-6.950718402862549,
|
| 336 |
+
-6.966548919677734,
|
| 337 |
+
-6.965545654296875,
|
| 338 |
+
-6.964168548583984,
|
| 339 |
+
-6.888548374176025,
|
| 340 |
+
-6.932443141937256,
|
| 341 |
+
-6.930734634399414,
|
| 342 |
+
-6.949597358703613,
|
| 343 |
+
-6.947843551635742,
|
| 344 |
+
-6.959360599517822,
|
| 345 |
+
-6.974522590637207,
|
| 346 |
+
-7.005373954772949,
|
| 347 |
+
-7.039368629455566,
|
| 348 |
+
-7.008696556091309,
|
| 349 |
+
-7.064368724822998,
|
| 350 |
+
-7.038439750671387,
|
| 351 |
+
-7.046519756317139,
|
| 352 |
+
-7.052777290344238,
|
| 353 |
+
-7.06027889251709,
|
| 354 |
+
-7.048835277557373,
|
| 355 |
+
-7.095891952514648,
|
| 356 |
+
-7.080573558807373,
|
| 357 |
+
-7.120383262634277,
|
| 358 |
+
-7.1035075187683105,
|
| 359 |
+
-7.147456645965576,
|
| 360 |
+
-7.133329391479492,
|
| 361 |
+
-7.134939670562744,
|
| 362 |
+
-7.155049800872803
|
| 363 |
+
],
|
| 364 |
+
"train_time_history": [
|
| 365 |
+
4810.419310808182,
|
| 366 |
+
4810.429551362991,
|
| 367 |
+
4780.074353456497,
|
| 368 |
+
4780.08434343338,
|
| 369 |
+
4793.850719213486,
|
| 370 |
+
4793.851686954498,
|
| 371 |
+
4799.262031078339,
|
| 372 |
+
4799.2719786167145,
|
| 373 |
+
4776.265509605408,
|
| 374 |
+
4776.275769710541,
|
| 375 |
+
4800.915772199631,
|
| 376 |
+
4800.925550937653,
|
| 377 |
+
4782.19565987587,
|
| 378 |
+
4870.2729279994965,
|
| 379 |
+
4864.202353715897,
|
| 380 |
+
5526.39341044426,
|
| 381 |
+
5526.402764797211,
|
| 382 |
+
5210.057184457779,
|
| 383 |
+
5210.0663821697235,
|
| 384 |
+
5192.114199876785,
|
| 385 |
+
5192.115474700928,
|
| 386 |
+
5119.568732976913,
|
| 387 |
+
5119.579450130463,
|
| 388 |
+
4854.391019105911,
|
| 389 |
+
4854.4009165763855,
|
| 390 |
+
4825.207883834839,
|
| 391 |
+
4825.218036174774,
|
| 392 |
+
4839.370161294937,
|
| 393 |
+
4839.3797080516815,
|
| 394 |
+
4829.168277978897,
|
| 395 |
+
4829.178178310394,
|
| 396 |
+
4831.754481077194,
|
| 397 |
+
4831.764403104782,
|
| 398 |
+
4840.167069673538,
|
| 399 |
+
4840.1764142513275,
|
| 400 |
+
4839.306309938431,
|
| 401 |
+
4839.315984725952,
|
| 402 |
+
4835.479310274124,
|
| 403 |
+
4835.489530324936,
|
| 404 |
+
4991.815203428268,
|
| 405 |
+
4826.761980772018,
|
| 406 |
+
4826.772101163864,
|
| 407 |
+
4819.3496108055115,
|
| 408 |
+
4819.358952999115,
|
| 409 |
+
4820.984974384308,
|
| 410 |
+
4820.995014190674,
|
| 411 |
+
4820.539752483368,
|
| 412 |
+
4820.548979997635,
|
| 413 |
+
4814.873534917831,
|
| 414 |
+
4814.876268863678,
|
| 415 |
+
4812.354250907898,
|
| 416 |
+
4812.363839626312,
|
| 417 |
+
4822.391925573349,
|
| 418 |
+
4822.40118765831,
|
| 419 |
+
4809.686738491058,
|
| 420 |
+
4809.697638034821,
|
| 421 |
+
4832.5055372715,
|
| 422 |
+
4832.515355587006,
|
| 423 |
+
4831.67563867569,
|
| 424 |
+
4831.685403108597,
|
| 425 |
+
4824.845934391022,
|
| 426 |
+
4824.85514998436,
|
| 427 |
+
4835.57625246048,
|
| 428 |
+
4835.587289094925,
|
| 429 |
+
4817.744952201843,
|
| 430 |
+
4817.7542552948,
|
| 431 |
+
4807.804133653641,
|
| 432 |
+
4807.814810037613,
|
| 433 |
+
4818.521605968475,
|
| 434 |
+
4818.532015800476,
|
| 435 |
+
4981.354954957962,
|
| 436 |
+
4981.368631839752,
|
| 437 |
+
4875.586889028549,
|
| 438 |
+
4875.597553014755,
|
| 439 |
+
4801.111567258835,
|
| 440 |
+
4801.1219182014465,
|
| 441 |
+
4799.074081897736,
|
| 442 |
+
4799.08514547348,
|
| 443 |
+
4809.401276350021,
|
| 444 |
+
4809.41465306282,
|
| 445 |
+
4809.043102502823,
|
| 446 |
+
4809.04475402832,
|
| 447 |
+
4818.2070748806,
|
| 448 |
+
4818.209503889084,
|
| 449 |
+
4796.3679666519165,
|
| 450 |
+
4796.377726793289,
|
| 451 |
+
4794.153427362442,
|
| 452 |
+
4794.155965805054,
|
| 453 |
+
4804.1572597026825,
|
| 454 |
+
4804.168130159378,
|
| 455 |
+
4797.392125368118,
|
| 456 |
+
4797.401923418045,
|
| 457 |
+
4797.116873264313,
|
| 458 |
+
4797.12747836113,
|
| 459 |
+
4799.205674409866,
|
| 460 |
+
4799.215870857239,
|
| 461 |
+
4969.960748910904,
|
| 462 |
+
4969.971879482269,
|
| 463 |
+
5270.599810838699,
|
| 464 |
+
5270.6101796627045,
|
| 465 |
+
4881.989181518555,
|
| 466 |
+
4882.000226974487,
|
| 467 |
+
4867.6136746406555,
|
| 468 |
+
4867.624637126923,
|
| 469 |
+
5128.904933452606,
|
| 470 |
+
5128.915862798691,
|
| 471 |
+
4879.79870891571,
|
| 472 |
+
4879.80947971344,
|
| 473 |
+
4969.744366407394,
|
| 474 |
+
4969.754128456116,
|
| 475 |
+
4907.097052812576,
|
| 476 |
+
4907.107843637466,
|
| 477 |
+
4812.9132516384125,
|
| 478 |
+
4812.9242560863495,
|
| 479 |
+
4815.909214496613,
|
| 480 |
+
4815.920344591141,
|
| 481 |
+
4806.699935913086,
|
| 482 |
+
4806.70260477066,
|
| 483 |
+
4831.170897245407,
|
| 484 |
+
4831.180289506912,
|
| 485 |
+
4839.252681255341,
|
| 486 |
+
4839.262135982513,
|
| 487 |
+
4833.886634111404,
|
| 488 |
+
4833.8898758888245,
|
| 489 |
+
4830.524186134338,
|
| 490 |
+
4830.53564286232,
|
| 491 |
+
4824.74093079567,
|
| 492 |
+
4824.747734546661,
|
| 493 |
+
4818.754670858383,
|
| 494 |
+
4818.764072179794,
|
| 495 |
+
4816.966838121414,
|
| 496 |
+
4816.977759599686,
|
| 497 |
+
4831.582427740097,
|
| 498 |
+
4831.592094898224,
|
| 499 |
+
4804.51261639595,
|
| 500 |
+
4993.473606586456,
|
| 501 |
+
4815.601177692413,
|
| 502 |
+
4815.610737085342,
|
| 503 |
+
4789.788247346878,
|
| 504 |
+
4865.854624032974,
|
| 505 |
+
4865.864605426788,
|
| 506 |
+
5030.177618980408,
|
| 507 |
+
5030.188777208328,
|
| 508 |
+
4769.904754638672,
|
| 509 |
+
4895.086503267288,
|
| 510 |
+
4895.0979063510895,
|
| 511 |
+
4869.957269668579,
|
| 512 |
+
4869.958615779877,
|
| 513 |
+
4962.7930123806,
|
| 514 |
+
4962.803097486496,
|
| 515 |
+
5163.898764133453,
|
| 516 |
+
5163.90244436264,
|
| 517 |
+
4803.290739297867,
|
| 518 |
+
4803.3002672195435,
|
| 519 |
+
4819.002298593521,
|
| 520 |
+
4819.014036178589,
|
| 521 |
+
4812.0083973407745,
|
| 522 |
+
5130.011174440384,
|
| 523 |
+
5130.020927429199,
|
| 524 |
+
5238.748838424683,
|
| 525 |
+
5162.933927536011,
|
| 526 |
+
5162.94544506073,
|
| 527 |
+
5014.217702865601,
|
| 528 |
+
5014.227581739426,
|
| 529 |
+
5119.955267906189,
|
| 530 |
+
5119.966482877731,
|
| 531 |
+
4877.71505856514,
|
| 532 |
+
4947.4076771736145,
|
| 533 |
+
4947.418792486191,
|
| 534 |
+
4980.132425069809,
|
| 535 |
+
4980.143876552582,
|
| 536 |
+
5166.483239412308,
|
| 537 |
+
5166.49423623085,
|
| 538 |
+
4906.088274717331,
|
| 539 |
+
4906.0993638038635,
|
| 540 |
+
4880.329564332962,
|
| 541 |
+
4880.339328289032,
|
| 542 |
+
4873.104112148285,
|
| 543 |
+
4873.119816303253,
|
| 544 |
+
4885.143585205078,
|
| 545 |
+
5015.694309234619
|
| 546 |
+
],
|
| 547 |
+
"valid_loss_history": [
|
| 548 |
+
-3.0233164174216136,
|
| 549 |
+
-4.020770004817417,
|
| 550 |
+
-4.493505137307303,
|
| 551 |
+
-4.805826323372977,
|
| 552 |
+
-5.009723663330078,
|
| 553 |
+
-5.3551515851702005,
|
| 554 |
+
-5.507791314806257,
|
| 555 |
+
-5.709285395486014,
|
| 556 |
+
-5.82812111718314,
|
| 557 |
+
-5.88963999067034,
|
| 558 |
+
-5.970332486288888,
|
| 559 |
+
-5.988547257014683,
|
| 560 |
+
-6.113276481628418,
|
| 561 |
+
-6.132954188755581,
|
| 562 |
+
-6.210943358285086,
|
| 563 |
+
-6.279647214072091,
|
| 564 |
+
-6.3300862312316895,
|
| 565 |
+
-6.36109277180263,
|
| 566 |
+
-6.427222183772495,
|
| 567 |
+
-6.453009741646903,
|
| 568 |
+
-6.489914894104004,
|
| 569 |
+
-6.48867974962507,
|
| 570 |
+
-6.536007336207798,
|
| 571 |
+
-6.532879625047956,
|
| 572 |
+
-6.572577135903495,
|
| 573 |
+
-6.5566478456769675,
|
| 574 |
+
-6.660695620945522,
|
| 575 |
+
-6.70451055254255,
|
| 576 |
+
-6.65756470816476,
|
| 577 |
+
-6.701659406934466,
|
| 578 |
+
-6.7815567425319125,
|
| 579 |
+
-6.804818085261753,
|
| 580 |
+
-6.783689567020962,
|
| 581 |
+
-6.844764641353062,
|
| 582 |
+
-6.868685790470669,
|
| 583 |
+
-6.888231481824603,
|
| 584 |
+
-6.942419528961182,
|
| 585 |
+
-6.951289176940918,
|
| 586 |
+
-6.975076675415039,
|
| 587 |
+
-6.991657052721296,
|
| 588 |
+
-7.000387941087995,
|
| 589 |
+
-7.082024574279785,
|
| 590 |
+
-7.087371553693499,
|
| 591 |
+
-7.114969117300851,
|
| 592 |
+
-7.163520812988281,
|
| 593 |
+
-7.1951784406389505,
|
| 594 |
+
-7.216815676007952,
|
| 595 |
+
-7.201807635171073,
|
| 596 |
+
-7.183896745954241,
|
| 597 |
+
-7.227273804800851,
|
| 598 |
+
-7.232961927141462,
|
| 599 |
+
-7.2955668313162665,
|
| 600 |
+
-7.2690509387425015,
|
| 601 |
+
-7.273542472294399,
|
| 602 |
+
-7.281754766191755,
|
| 603 |
+
-7.311358247484479,
|
| 604 |
+
-7.287418706076486,
|
| 605 |
+
-7.261871746608189,
|
| 606 |
+
-7.2840664727347235,
|
| 607 |
+
-7.316314697265625,
|
| 608 |
+
-7.376913070678711,
|
| 609 |
+
-7.367326668330601,
|
| 610 |
+
-7.438824789864676,
|
| 611 |
+
-7.427623748779297,
|
| 612 |
+
-7.45092739377703,
|
| 613 |
+
-7.4810590744018555,
|
| 614 |
+
-7.43196405683245,
|
| 615 |
+
-7.413298266274588,
|
| 616 |
+
-7.448171275002616,
|
| 617 |
+
-7.470413276127407,
|
| 618 |
+
-7.478131294250488,
|
| 619 |
+
-7.494483879634312,
|
| 620 |
+
-7.459411212376186,
|
| 621 |
+
-7.515866688319615,
|
| 622 |
+
-7.571803229195731,
|
| 623 |
+
-7.573634147644043,
|
| 624 |
+
-7.562024729592459,
|
| 625 |
+
-7.550929818834577,
|
| 626 |
+
-7.53609037399292,
|
| 627 |
+
-7.563671180180141,
|
| 628 |
+
-7.578108038221087,
|
| 629 |
+
-7.531997203826904,
|
| 630 |
+
-7.600094999585833,
|
| 631 |
+
-7.639314787728446,
|
| 632 |
+
-7.65882362638201,
|
| 633 |
+
-7.624989918300083,
|
| 634 |
+
-7.5906588690621515,
|
| 635 |
+
-7.606484276907785,
|
| 636 |
+
-7.59099394934518,
|
| 637 |
+
-7.611005442483084,
|
| 638 |
+
-7.649059908730643,
|
| 639 |
+
-7.675145898546491,
|
| 640 |
+
-7.672644138336182
|
| 641 |
+
]
|
| 642 |
+
}
|
MedleyVox-MultiSinger/multi_singing_librispeech_138/loss_graph_vocals.png
ADDED
|
MedleyVox-MultiSinger/multi_singing_librispeech_138/vocals.json
ADDED
|
@@ -0,0 +1,812 @@
|
| 1 |
+
{
|
| 2 |
+
"args": {
|
| 3 |
+
"above_freq": 300.0,
|
| 4 |
+
"architecture": "conv_tasnet_stft",
|
| 5 |
+
"batch_size": 58,
|
| 6 |
+
"beta1": 0.5,
|
| 7 |
+
"beta2": 0.9,
|
| 8 |
+
"bn_chan": 256,
|
| 9 |
+
"continual_train": false,
|
| 10 |
+
"dataset": "multi_singing_librispeech",
|
| 11 |
+
"db_normalize": false,
|
| 12 |
+
"ema": true,
|
| 13 |
+
"encoder_activation": null,
|
| 14 |
+
"entity": "carson2050",
|
| 15 |
+
"epochs": 200,
|
| 16 |
+
"eps": 1e-08,
|
| 17 |
+
"exp_name": "multi_singing_librispeech",
|
| 18 |
+
"ff_activation": "relu",
|
| 19 |
+
"gpu": 0,
|
| 20 |
+
"gradient_clip": null,
|
| 21 |
+
"hid_chan": 1024,
|
| 22 |
+
"load_ema_online_model": false,
|
| 23 |
+
"lr": 0.0002,
|
| 24 |
+
"lr_decay_gamma": 0.5,
|
| 25 |
+
"lr_decay_patience": 20,
|
| 26 |
+
"lr_scheduler": "step_lr",
|
| 27 |
+
"mask_act": "linear",
|
| 28 |
+
"max_n_src": 4,
|
| 29 |
+
"min_n_src": 2,
|
| 30 |
+
"mixed_precision": false,
|
| 31 |
+
"mixture_consistency": "mixture_consistency",
|
| 32 |
+
"multi_spec_loss_log_scale": false,
|
| 33 |
+
"n_blocks": 6,
|
| 34 |
+
"n_filter": 512,
|
| 35 |
+
"n_kernel": 512,
|
| 36 |
+
"n_nodes": 1,
|
| 37 |
+
"n_repeats": 4,
|
| 38 |
+
"n_src": 2,
|
| 39 |
+
"nb_workers": 8,
|
| 40 |
+
"nfft": 2048,
|
| 41 |
+
"ngpus_per_node": 1,
|
| 42 |
+
"nhop": 512,
|
| 43 |
+
"no_cuda": false,
|
| 44 |
+
"no_mask": false,
|
| 45 |
+
"no_mask_residual": false,
|
| 46 |
+
"optimizer": "adam",
|
| 47 |
+
"output": "results/singing_sep/checkpoint/multi_singing_librispeech",
|
| 48 |
+
"output_directory": "results/singing_sep",
|
| 49 |
+
"part_of_data": null,
|
| 50 |
+
"patience": 50,
|
| 51 |
+
"pitch_formant_augment_prob": 0.4,
|
| 52 |
+
"port": null,
|
| 53 |
+
"project": "MedleyVox_home_pt2",
|
| 54 |
+
"quiet": false,
|
| 55 |
+
"rank": 0,
|
| 56 |
+
"reduced_training_data_ratio": 1.0,
|
| 57 |
+
"resume": "results/singing_sep/checkpoint/multi_singing_librispeech",
|
| 58 |
+
"same_singer_dict_path": [
|
| 59 |
+
[
|
| 60 |
+
"../data/24k/OpenSinger",
|
| 61 |
+
"./svs/preprocess/make_same_singer_dict/same_singer_OpenSinger.json",
|
| 62 |
+
"OpenSinger"
|
| 63 |
+
],
|
| 64 |
+
[
|
| 65 |
+
"../data/24k/k_multisinger",
|
| 66 |
+
"./svs/preprocess/make_same_singer_dict/same_singer_k_multisinger.json",
|
| 67 |
+
"k_multisinger"
|
| 68 |
+
],
|
| 69 |
+
[
|
| 70 |
+
"../data/24k/CSD",
|
| 71 |
+
"./svs/preprocess/make_same_singer_dict/same_singer_CSD.json",
|
| 72 |
+
"CSD"
|
| 73 |
+
],
|
| 74 |
+
[
|
| 75 |
+
"../data/24k/jsut-song_ver1",
|
| 76 |
+
"./svs/preprocess/make_same_singer_dict/same_singer_jsut-song_ver1.json",
|
| 77 |
+
"jsut-song_ver1"
|
| 78 |
+
],
|
| 79 |
+
[
|
| 80 |
+
"../data/24k/jvs_music_ver1",
|
| 81 |
+
"./svs/preprocess/make_same_singer_dict/same_singer_jvs_music_ver1.json",
|
| 82 |
+
"jvs_music_ver1"
|
| 83 |
+
],
|
| 84 |
+
[
|
| 85 |
+
"../data/24k/k_multitimbre",
|
| 86 |
+
"./svs/preprocess/make_same_singer_dict/same_singer_k_multitimbre.json",
|
| 87 |
+
"k_multitimbre"
|
| 88 |
+
],
|
| 89 |
+
[
|
| 90 |
+
"../data/24k/kiritan_revised",
|
| 91 |
+
"./svs/preprocess/make_same_singer_dict/same_singer_kiritan.json",
|
| 92 |
+
"kiritan"
|
| 93 |
+
],
|
| 94 |
+
[
|
| 95 |
+
"../data/24k/musdb_a_train",
|
| 96 |
+
"./svs/preprocess/make_same_singer_dict/same_singer_musdb_a_train.json",
|
| 97 |
+
"musdb_a_train"
|
| 98 |
+
],
|
| 99 |
+
[
|
| 100 |
+
"../data/24k/NUS",
|
| 101 |
+
"./svs/preprocess/make_same_singer_dict/same_singer_NUS.json",
|
| 102 |
+
"NUS"
|
| 103 |
+
],
|
| 104 |
+
[
|
| 105 |
+
"../data/24k/VocalSet",
|
| 106 |
+
"./svs/preprocess/make_same_singer_dict/same_singer_VocalSet.json",
|
| 107 |
+
"VocalSet"
|
| 108 |
+
]
|
| 109 |
+
],
|
| 110 |
+
"same_singer_ratio": 0.2,
|
| 111 |
+
"same_song_dict_path": [
|
| 112 |
+
[
|
| 113 |
+
"../data/24k/k_multisinger",
|
| 114 |
+
"./svs/preprocess/make_same_song_dict/same_song_k_multisinger.json",
|
| 115 |
+
"k_multisinger"
|
| 116 |
+
]
|
| 117 |
+
],
|
| 118 |
+
"same_song_ratio": 0.2,
|
| 119 |
+
"same_speaker_dict_path": [
|
| 120 |
+
[
|
| 121 |
+
"../data/24k/LibriSpeech_train-clean-100",
|
| 122 |
+
"./svs/preprocess/make_same_speaker_dict/same_singer_LibriSpeech_train-clean-100.json",
|
| 123 |
+
"LibriSpeech_train-clean-100"
|
| 124 |
+
],
|
| 125 |
+
[
|
| 126 |
+
"../data/24k/LibriSpeech_train-clean-360",
|
| 127 |
+
"./svs/preprocess/make_same_speaker_dict/same_singer_LibriSpeech_train-clean-360.json",
|
| 128 |
+
"LibriSpeech_train-clean-360"
|
| 129 |
+
]
|
| 130 |
+
],
|
| 131 |
+
"same_speaker_ratio": 0.15,
|
| 132 |
+
"sample_rate": 24000,
|
| 133 |
+
"seed": 777,
|
| 134 |
+
"seq_dur": 3.0,
|
| 135 |
+
"sing_sing_ratio": 0.15,
|
| 136 |
+
"sing_speech_ratio": 0.15,
|
| 137 |
+
"skip_chan": 256,
|
| 138 |
+
"song_length_dict_path": "./svs/preprocess/song_length_dict_24k.json",
|
| 139 |
+
"speech_train_root": [
|
| 140 |
+
"../data/24k/LibriSpeech_train-clean-360",
|
| 141 |
+
"../data/24k/LibriSpeech_train-clean-100"
|
| 142 |
+
],
|
| 143 |
+
"sr_input_res": false,
|
| 144 |
+
"sr_out_mix_consistency": false,
|
| 145 |
+
"srnet": "orig",
|
| 146 |
+
"start_from_best": true,
|
| 147 |
+
"sweep": false,
|
| 148 |
+
"target": "vocals",
|
| 149 |
+
"train_loss_func": [
|
| 150 |
+
"pit_snr",
|
| 151 |
+
"multi_spectral_l1"
|
| 152 |
+
],
|
| 153 |
+
"train_root": [
|
| 154 |
+
"../data/24k/CSD",
|
| 155 |
+
"../data/24k/NUS",
|
| 156 |
+
"../data/24k/TONAS",
|
| 157 |
+
"../data/24k/VocalSet",
|
| 158 |
+
"../data/24k/jsut-song_ver1",
|
| 159 |
+
"../data/24k/jvs_music_ver1",
|
| 160 |
+
"../data/24k/kiritan_revised",
|
| 161 |
+
"../data/24k/vocadito",
|
| 162 |
+
"../data/24k/musdb_a_train",
|
| 163 |
+
"../data/24k/OpenSinger",
|
| 164 |
+
"../data/24k/medleyDB_v1_in_musdb",
|
| 165 |
+
"../data/24k/k_multisinger",
|
| 166 |
+
"../data/24k/k_multitimbre"
|
| 167 |
+
],
|
| 168 |
+
"unison_prob": 0.3,
|
| 169 |
+
"use_wandb": true,
|
| 170 |
+
"valid_loss_func": [
|
| 171 |
+
"pit_si_sdr"
|
| 172 |
+
],
|
| 173 |
+
"valid_regions_dict_path": "./svs/preprocess/valid_regions_dict_singing_singing.json",
|
| 174 |
+
"valid_root": [
|
| 175 |
+
[
|
| 176 |
+
"../data/24k/musdb_a_test",
|
| 177 |
+
"../data/24k/musdb_a_test",
|
| 178 |
+
"./svs/preprocess/make_validation_dict/for_2_srcs/valid_regions_dict_singing_singing.json",
|
| 179 |
+
"sing_sing_diff"
|
| 180 |
+
],
|
| 181 |
+
[
|
| 182 |
+
"../data/24k/musdb_a_test",
|
| 183 |
+
"../data/24k/musdb_a_test",
|
| 184 |
+
"./svs/preprocess/make_validation_dict/for_2_srcs/valid_regions_dict_singing_unison.json",
|
| 185 |
+
"sing_sing_unison"
|
| 186 |
+
],
|
| 187 |
+
[
|
| 188 |
+
"../data/24k/musdb_a_test",
|
| 189 |
+
"../data/24k/musdb_a_test",
|
| 190 |
+
"./svs/preprocess/make_validation_dict/for_2_srcs/valid_regions_dict_singing_singing_same_singer.json",
|
| 191 |
+
"sing_sing_same_singer"
|
| 192 |
+
],
|
| 193 |
+
[
|
| 194 |
+
"../data/24k/LibriSpeech_dev-clean",
|
| 195 |
+
"../data/24k/LibriSpeech_dev-clean",
|
| 196 |
+
"./svs/preprocess/make_validation_dict/for_2_srcs/valid_regions_dict_speech_speech.json",
|
| 197 |
+
"speech_speech_diff"
|
| 198 |
+
],
|
| 199 |
+
[
|
| 200 |
+
"../data/24k/LibriSpeech_dev-clean",
|
| 201 |
+
"../data/24k/LibriSpeech_dev-clean",
|
| 202 |
+
"./svs/preprocess/make_validation_dict/for_2_srcs/valid_regions_dict_speech_unison.json",
|
| 203 |
+
"speech_speech_unison"
|
| 204 |
+
],
|
| 205 |
+
[
|
| 206 |
+
"../data/24k/LibriSpeech_dev-clean",
|
| 207 |
+
"../data/24k/LibriSpeech_dev-clean",
|
| 208 |
+
"./svs/preprocess/make_validation_dict/for_2_srcs/valid_regions_dict_speech_speech_same_speaker.json",
|
| 209 |
+
"speech_speech_same_speaker"
|
| 210 |
+
],
|
| 211 |
+
[
|
| 212 |
+
"../data/24k/musdb_a_test",
|
| 213 |
+
"../data/24k/LibriSpeech_dev-clean",
|
| 214 |
+
"./svs/preprocess/make_validation_dict/for_2_srcs/valid_regions_dict_singing_speech.json",
|
| 215 |
+
"singing_speech"
|
| 216 |
+
]
|
| 217 |
+
],
|
| 218 |
+
"valid_root_orpit": [
|
| 219 |
+
[
|
| 220 |
+
"../data/24k/musdb_a_test",
|
| 221 |
+
"../data/24k/musdb_a_test",
|
| 222 |
+
"./svs/preprocess/make_validation_dict/for_n_srcs/valid_regions_dict_singing_singing_n_srcs.json",
|
| 223 |
+
"sing_sing_diff"
|
| 224 |
+
],
|
| 225 |
+
[
|
| 226 |
+
"../data/24k/musdb_a_test",
|
| 227 |
+
"../data/24k/musdb_a_test",
|
| 228 |
+
"./svs/preprocess/make_validation_dict/for_n_srcs/valid_regions_dict_singing_unison_n_srcs.json",
|
| 229 |
+
"sing_sing_unison"
|
| 230 |
+
],
|
| 231 |
+
[
|
| 232 |
+
"../data/24k/musdb_a_test",
|
| 233 |
+
"../data/24k/musdb_a_test",
|
| 234 |
+
"./svs/preprocess/make_validation_dict/for_n_srcs/valid_regions_dict_singing_singing_same_singer_n_srcs.json",
|
| 235 |
+
"sing_sing_same_singer"
|
| 236 |
+
],
|
| 237 |
+
[
|
| 238 |
+
"../data/24k/LibriSpeech_dev-clean",
|
| 239 |
+
"../data/24k/LibriSpeech_dev-clean",
|
| 240 |
+
"./svs/preprocess/make_validation_dict/for_n_srcs/valid_regions_dict_speech_speech_n_srcs.json",
|
| 241 |
+
"speech_speech_diff"
|
| 242 |
+
],
|
| 243 |
+
[
|
| 244 |
+
"../data/24k/LibriSpeech_dev-clean",
|
| 245 |
+
"../data/24k/LibriSpeech_dev-clean",
|
| 246 |
+
"./svs/preprocess/make_validation_dict/for_n_srcs/valid_regions_dict_speech_unison_n_srcs.json",
|
| 247 |
+
"speech_speech_unison"
|
| 248 |
+
],
|
| 249 |
+
[
|
| 250 |
+
"../data/24k/LibriSpeech_dev-clean",
|
| 251 |
+
"../data/24k/LibriSpeech_dev-clean",
|
| 252 |
+
"./svs/preprocess/make_validation_dict/for_n_srcs/valid_regions_dict_speech_speech_same_speaker_n_srcs.json",
|
| 253 |
+
"speech_speech_same_speaker"
|
| 254 |
+
],
|
| 255 |
+
[
|
| 256 |
+
"../data/24k/musdb_a_test",
|
| 257 |
+
"../data/24k/LibriSpeech_dev-clean",
|
| 258 |
+
"./svs/preprocess/make_validation_dict/for_n_srcs/valid_regions_dict_singing_speech_n_srcs.json",
|
| 259 |
+
"singing_speech"
|
| 260 |
+
]
|
| 261 |
+
],
|
| 262 |
+
"weight_decay": 1e-06,
|
| 263 |
+
"world_size": 1
|
| 264 |
+
},
|
| 265 |
+
"best_epoch": 133,
|
| 266 |
+
"best_loss": -8.078724997384208,
|
| 267 |
+
"epochs_trained": 138,
|
| 268 |
+
"num_bad_epochs": 5,
|
| 269 |
+
"train_loss_history": [
|
| 270 |
+
-1.4921716451644897,
|
| 271 |
+
-2.8664329051971436,
|
| 272 |
+
-3.3393054008483887,
|
| 273 |
+
-3.638092517852783,
|
| 274 |
+
-3.879303216934204,
|
| 275 |
+
-4.088993072509766,
|
| 276 |
+
-4.227543830871582,
|
| 277 |
+
-4.412248134613037,
|
| 278 |
+
-4.58026123046875,
|
| 279 |
+
-4.71837043762207,
|
| 280 |
+
-4.800468444824219,
|
| 281 |
+
-4.882855415344238,
|
| 282 |
+
-5.011181831359863,
|
| 283 |
+
-5.128243923187256,
|
| 284 |
+
-5.150334358215332,
|
| 285 |
+
-5.240769386291504,
|
| 286 |
+
-5.357062816619873,
|
| 287 |
+
-5.35420560836792,
|
| 288 |
+
-5.427922248840332,
|
| 289 |
+
-5.536999225616455,
|
| 290 |
+
-5.6000895500183105,
|
| 291 |
+
-5.664849758148193,
|
| 292 |
+
-5.704154968261719,
|
| 293 |
+
-5.791101455688477,
|
| 294 |
+
-5.794349670410156,
|
| 295 |
+
-5.784161567687988,
|
| 296 |
+
-5.824007511138916,
|
| 297 |
+
-5.931461811065674,
|
| 298 |
+
-5.981809139251709,
|
| 299 |
+
-6.045787334442139,
|
| 300 |
+
-6.045494079589844,
|
| 301 |
+
-6.075621128082275,
|
| 302 |
+
-6.086508750915527,
|
| 303 |
+
-6.123781681060791,
|
| 304 |
+
-6.192169666290283,
|
| 305 |
+
-6.248963832855225,
|
| 306 |
+
-6.183308124542236,
|
| 307 |
+
-6.25191593170166,
|
| 308 |
+
-6.301548004150391,
|
| 309 |
+
-6.259702682495117,
|
| 310 |
+
-6.338959217071533,
|
| 311 |
+
-6.372439861297607,
|
| 312 |
+
-6.436537742614746,
|
| 313 |
+
-6.462899684906006,
|
| 314 |
+
-6.449411392211914,
|
| 315 |
+
-6.489621639251709,
|
| 316 |
+
-6.461447238922119,
|
| 317 |
+
-6.558005332946777,
|
| 318 |
+
-6.603482723236084,
|
| 319 |
+
-6.555445194244385,
|
| 320 |
+
-6.579801082611084,
|
| 321 |
+
-6.639071464538574,
|
| 322 |
+
-6.648660182952881,
|
| 323 |
+
-6.6866631507873535,
|
| 324 |
+
-6.767474174499512,
|
| 325 |
+
-6.750443935394287,
|
| 326 |
+
-6.7306742668151855,
|
| 327 |
+
-6.7853617668151855,
|
| 328 |
+
-6.818509101867676,
|
| 329 |
+
-6.761360168457031,
|
| 330 |
+
-6.795668601989746,
|
| 331 |
+
-6.82188606262207,
|
| 332 |
+
-6.795504093170166,
|
| 333 |
+
-6.914917469024658,
|
| 334 |
+
-6.921656131744385,
|
| 335 |
+
-6.950718402862549,
|
| 336 |
+
-6.966548919677734,
|
| 337 |
+
-6.965545654296875,
|
| 338 |
+
-6.964168548583984,
|
| 339 |
+
-6.888548374176025,
|
| 340 |
+
-6.932443141937256,
|
| 341 |
+
-6.930734634399414,
|
| 342 |
+
-6.949597358703613,
|
| 343 |
+
-6.947843551635742,
|
| 344 |
+
-6.959360599517822,
|
| 345 |
+
-6.974522590637207,
|
| 346 |
+
-7.005373954772949,
|
| 347 |
+
-7.039368629455566,
|
| 348 |
+
-7.008696556091309,
|
| 349 |
+
-7.064368724822998,
|
| 350 |
+
-7.038439750671387,
|
| 351 |
+
-7.046519756317139,
|
| 352 |
+
-7.052777290344238,
|
| 353 |
+
-7.06027889251709,
|
| 354 |
+
-7.048835277557373,
|
| 355 |
+
-7.095891952514648,
|
| 356 |
+
-7.080573558807373,
|
| 357 |
+
-7.120383262634277,
|
| 358 |
+
-7.1035075187683105,
|
| 359 |
+
-7.147456645965576,
|
| 360 |
+
-7.133329391479492,
|
| 361 |
+
-7.134939670562744,
|
| 362 |
+
-7.155049800872803,
|
| 363 |
+
-7.205596446990967,
|
| 364 |
+
-7.225539207458496,
|
| 365 |
+
-7.267192840576172,
|
| 366 |
+
-7.250244617462158,
|
| 367 |
+
-7.287757873535156,
|
| 368 |
+
-7.301974296569824,
|
| 369 |
+
-7.254255294799805,
|
| 370 |
+
-7.3585429191589355,
|
| 371 |
+
-7.332351207733154,
|
| 372 |
+
-7.346045970916748,
|
| 373 |
+
-7.384589672088623,
|
| 374 |
+
-7.356956958770752,
|
| 375 |
+
-7.392472743988037,
|
| 376 |
+
-7.418970584869385,
|
| 377 |
+
-7.446511745452881,
|
| 378 |
+
-7.445052623748779,
|
| 379 |
+
-7.42110538482666,
|
| 380 |
+
-7.461490631103516,
|
| 381 |
+
-7.509364128112793,
|
| 382 |
+
-7.508744716644287,
|
| 383 |
+
-7.480283260345459,
|
| 384 |
+
-7.561679363250732,
|
| 385 |
+
-7.4522271156311035,
|
| 386 |
+
-7.437519550323486,
|
| 387 |
+
-7.473508834838867,
|
| 388 |
+
-7.49954080581665,
|
| 389 |
+
-7.418591499328613,
|
| 390 |
+
-7.478306293487549,
|
| 391 |
+
-7.459006309509277,
|
| 392 |
+
-7.478801250457764,
|
| 393 |
+
-7.432499408721924,
|
| 394 |
+
-7.565118312835693,
|
| 395 |
+
-7.627929210662842,
|
| 396 |
+
-7.529797554016113,
|
| 397 |
+
-7.611763954162598,
|
| 398 |
+
-7.658102989196777,
|
| 399 |
+
-7.665148735046387,
|
| 400 |
+
-7.690982818603516,
|
| 401 |
+
-7.733800411224365,
|
| 402 |
+
-7.499368667602539,
|
| 403 |
+
-7.578357696533203,
|
| 404 |
+
-7.613222122192383,
|
| 405 |
+
-7.657804489135742,
|
| 406 |
+
-7.653645038604736,
|
| 407 |
+
-7.742368221282959
|
| 408 |
+
],
|
| 409 |
+
"train_time_history": [
|
| 410 |
+
4810.419310808182,
|
| 411 |
+
4810.429551362991,
|
| 412 |
+
4780.074353456497,
|
| 413 |
+
4780.08434343338,
|
| 414 |
+
4793.850719213486,
|
| 415 |
+
4793.851686954498,
|
| 416 |
+
4799.262031078339,
|
| 417 |
+
4799.2719786167145,
|
| 418 |
+
4776.265509605408,
|
| 419 |
+
4776.275769710541,
|
| 420 |
+
4800.915772199631,
|
| 421 |
+
4800.925550937653,
|
| 422 |
+
4782.19565987587,
|
| 423 |
+
4870.2729279994965,
|
| 424 |
+
4864.202353715897,
|
| 425 |
+
5526.39341044426,
|
| 426 |
+
5526.402764797211,
|
| 427 |
+
5210.057184457779,
|
| 428 |
+
5210.0663821697235,
|
| 429 |
+
5192.114199876785,
|
| 430 |
+
5192.115474700928,
|
| 431 |
+
5119.568732976913,
|
| 432 |
+
5119.579450130463,
|
| 433 |
+
4854.391019105911,
|
| 434 |
+
4854.4009165763855,
|
| 435 |
+
4825.207883834839,
|
| 436 |
+
4825.218036174774,
|
| 437 |
+
4839.370161294937,
|
| 438 |
+
4839.3797080516815,
|
| 439 |
+
4829.168277978897,
|
| 440 |
+
4829.178178310394,
|
| 441 |
+
4831.754481077194,
|
| 442 |
+
4831.764403104782,
|
| 443 |
+
4840.167069673538,
|
| 444 |
+
4840.1764142513275,
|
| 445 |
+
4839.306309938431,
|
| 446 |
+
4839.315984725952,
|
| 447 |
+
4835.479310274124,
|
| 448 |
+
4835.489530324936,
|
| 449 |
+
4991.815203428268,
|
| 450 |
+
4826.761980772018,
|
| 451 |
+
4826.772101163864,
|
| 452 |
+
4819.3496108055115,
|
| 453 |
+
4819.358952999115,
|
| 454 |
+
4820.984974384308,
|
| 455 |
+
4820.995014190674,
|
| 456 |
+
4820.539752483368,
|
| 457 |
+
4820.548979997635,
|
| 458 |
+
4814.873534917831,
|
| 459 |
+
4814.876268863678,
|
| 460 |
+
4812.354250907898,
|
| 461 |
+
4812.363839626312,
|
| 462 |
+
4822.391925573349,
|
| 463 |
+
4822.40118765831,
|
| 464 |
+
4809.686738491058,
|
| 465 |
+
4809.697638034821,
|
| 466 |
+
4832.5055372715,
|
| 467 |
+
4832.515355587006,
|
| 468 |
+
4831.67563867569,
|
| 469 |
+
4831.685403108597,
|
| 470 |
+
4824.845934391022,
|
| 471 |
+
4824.85514998436,
|
| 472 |
+
4835.57625246048,
|
| 473 |
+
4835.587289094925,
|
| 474 |
+
4817.744952201843,
|
| 475 |
+
4817.7542552948,
|
| 476 |
+
4807.804133653641,
|
| 477 |
+
4807.814810037613,
|
| 478 |
+
4818.521605968475,
|
| 479 |
+
4818.532015800476,
|
| 480 |
+
4981.354954957962,
|
| 481 |
+
4981.368631839752,
|
| 482 |
+
4875.586889028549,
|
| 483 |
+
4875.597553014755,
|
| 484 |
+
4801.111567258835,
|
| 485 |
+
4801.1219182014465,
|
| 486 |
+
4799.074081897736,
|
| 487 |
+
4799.08514547348,
|
| 488 |
+
4809.401276350021,
|
| 489 |
+
4809.41465306282,
|
| 490 |
+
4809.043102502823,
|
| 491 |
+
4809.04475402832,
|
| 492 |
+
4818.2070748806,
|
| 493 |
+
4818.209503889084,
|
| 494 |
+
4796.3679666519165,
|
| 495 |
+
4796.377726793289,
|
| 496 |
+
4794.153427362442,
|
| 497 |
+
4794.155965805054,
|
| 498 |
+
4804.1572597026825,
|
| 499 |
+
4804.168130159378,
|
| 500 |
+
4797.392125368118,
|
| 501 |
+
4797.401923418045,
|
| 502 |
+
4797.116873264313,
|
| 503 |
+
4797.12747836113,
|
| 504 |
+
4799.205674409866,
|
| 505 |
+
4799.215870857239,
|
| 506 |
+
4969.960748910904,
|
| 507 |
+
4969.971879482269,
|
| 508 |
+
5270.599810838699,
|
| 509 |
+
5270.6101796627045,
|
| 510 |
+
4881.989181518555,
|
| 511 |
+
4882.000226974487,
|
| 512 |
+
4867.6136746406555,
|
| 513 |
+
4867.624637126923,
|
| 514 |
+
5128.904933452606,
|
| 515 |
+
5128.915862798691,
|
| 516 |
+
4879.79870891571,
|
| 517 |
+
4879.80947971344,
|
| 518 |
+
4969.744366407394,
|
| 519 |
+
4969.754128456116,
|
| 520 |
+
4907.097052812576,
|
| 521 |
+
4907.107843637466,
|
| 522 |
+
4812.9132516384125,
|
| 523 |
+
4812.9242560863495,
|
| 524 |
+
4815.909214496613,
|
| 525 |
+
4815.920344591141,
|
| 526 |
+
4806.699935913086,
|
| 527 |
+
4806.70260477066,
|
| 528 |
+
4831.170897245407,
|
| 529 |
+
4831.180289506912,
|
| 530 |
+
4839.252681255341,
|
| 531 |
+
4839.262135982513,
|
| 532 |
+
4833.886634111404,
|
| 533 |
+
4833.8898758888245,
|
| 534 |
+
4830.524186134338,
|
| 535 |
+
4830.53564286232,
|
| 536 |
+
4824.74093079567,
|
| 537 |
+
4824.747734546661,
|
| 538 |
+
4818.754670858383,
|
| 539 |
+
4818.764072179794,
|
| 540 |
+
4816.966838121414,
|
| 541 |
+
4816.977759599686,
|
| 542 |
+
4831.582427740097,
|
| 543 |
+
4831.592094898224,
|
| 544 |
+
4804.51261639595,
|
| 545 |
+
4993.473606586456,
|
| 546 |
+
4815.601177692413,
|
| 547 |
+
4815.610737085342,
|
| 548 |
+
4789.788247346878,
|
| 549 |
+
4865.854624032974,
|
| 550 |
+
4865.864605426788,
|
| 551 |
+
5030.177618980408,
|
| 552 |
+
5030.188777208328,
|
| 553 |
+
4769.904754638672,
|
| 554 |
+
4895.086503267288,
|
| 555 |
+
4895.0979063510895,
|
| 556 |
+
4869.957269668579,
|
| 557 |
+
4869.958615779877,
|
| 558 |
+
4962.7930123806,
|
| 559 |
+
4962.803097486496,
|
| 560 |
+
5163.898764133453,
|
| 561 |
+
5163.90244436264,
|
| 562 |
+
4803.290739297867,
|
| 563 |
+
4803.3002672195435,
|
| 564 |
+
4819.002298593521,
|
| 565 |
+
4819.014036178589,
|
| 566 |
+
4812.0083973407745,
|
| 567 |
+
5130.011174440384,
|
| 568 |
+
5130.020927429199,
|
| 569 |
+
5238.748838424683,
|
| 570 |
+
5162.933927536011,
|
| 571 |
+
5162.94544506073,
|
| 572 |
+
5014.217702865601,
|
| 573 |
+
5014.227581739426,
|
| 574 |
+
5119.955267906189,
|
| 575 |
+
5119.966482877731,
|
| 576 |
+
4877.71505856514,
|
| 577 |
+
4947.4076771736145,
|
| 578 |
+
4947.418792486191,
|
| 579 |
+
4980.132425069809,
|
| 580 |
+
4980.143876552582,
|
| 581 |
+
5166.483239412308,
|
| 582 |
+
5166.49423623085,
|
| 583 |
+
4906.088274717331,
|
| 584 |
+
4906.0993638038635,
|
| 585 |
+
4880.329564332962,
|
| 586 |
+
4880.339328289032,
|
| 587 |
+
4873.104112148285,
|
| 588 |
+
4873.119816303253,
|
| 589 |
+
4885.143585205078,
|
| 590 |
+
5015.694309234619,
|
| 591 |
+
5418.331888914108,
|
| 592 |
+
5144.6408631801605,
|
| 593 |
+
5144.652040481567,
|
| 594 |
+
5510.665446281433,
|
| 595 |
+
5510.677114725113,
|
| 596 |
+
4798.924424171448,
|
| 597 |
+
4798.935866594315,
|
| 598 |
+
4812.6511833667755,
|
| 599 |
+
4812.66309094429,
|
| 600 |
+
4802.269027709961,
|
| 601 |
+
4802.280553340912,
|
| 602 |
+
4802.639967918396,
|
| 603 |
+
4802.649654150009,
|
| 604 |
+
4785.577591180801,
|
| 605 |
+
4785.58954501152,
|
| 606 |
+
4792.5177211761475,
|
| 607 |
+
4792.5278561115265,
|
| 608 |
+
4783.97540807724,
|
| 609 |
+
4835.069321632385,
|
| 610 |
+
4857.578319787979,
|
| 611 |
+
4857.590538024902,
|
| 612 |
+
4861.290355205536,
|
| 613 |
+
4861.301545619965,
|
| 614 |
+
4851.626524686813,
|
| 615 |
+
4597.863308668137,
|
| 616 |
+
4597.874926805496,
|
| 617 |
+
4598.068494558334,
|
| 618 |
+
4598.072705507278,
|
| 619 |
+
4607.527726888657,
|
| 620 |
+
4607.5391726493835,
|
| 621 |
+
4593.976358413696,
|
| 622 |
+
4593.987717866898,
|
| 623 |
+
4608.605073928833,
|
| 624 |
+
4608.616888284683,
|
| 625 |
+
4604.218909025192,
|
| 626 |
+
4604.230967283249,
|
| 627 |
+
4601.031387329102,
|
| 628 |
+
4936.647894382477,
|
| 629 |
+
4936.660204172134,
|
| 630 |
+
4616.291204214096,
|
| 631 |
+
4828.3469569683075,
|
| 632 |
+
5099.975877046585,
|
| 633 |
+
5099.978621482849,
|
| 634 |
+
5208.007155179977,
|
| 635 |
+
5208.019089221954,
|
| 636 |
+
4924.660996437073,
|
| 637 |
+
4631.26912856102,
|
| 638 |
+
4631.2805788517,
|
| 639 |
+
4622.089585542679,
|
| 640 |
+
4669.914644002914,
|
| 641 |
+
4669.9266991615295,
|
| 642 |
+
5124.021832227707,
|
| 643 |
+
5124.033423900604,
|
| 644 |
+
4878.845312595367,
|
| 645 |
+
4878.8553302288055,
|
| 646 |
+
4774.273676395416,
|
| 647 |
+
4936.869963884354,
|
| 648 |
+
4936.871921777725,
|
| 649 |
+
4737.119180679321,
|
| 650 |
+
4737.129071235657,
|
| 651 |
+
4738.499984264374,
|
| 652 |
+
4738.504971981049,
|
| 653 |
+
4728.711101770401,
|
| 654 |
+
4728.721209287643,
|
| 655 |
+
4720.726502895355,
|
| 656 |
+
4720.738488435745,
|
| 657 |
+
4740.279819250107,
|
| 658 |
+
4740.292397260666,
|
| 659 |
+
4727.014559984207,
|
| 660 |
+
4771.3221616744995,
|
| 661 |
+
4771.332203388214,
|
| 662 |
+
4708.109707355499,
|
| 663 |
+
4708.121538639069,
|
| 664 |
+
4709.580441951752,
|
| 665 |
+
4709.592922925949,
|
| 666 |
+
4704.416685819626,
|
| 667 |
+
4704.427433013916,
|
| 668 |
+
4723.0361750125885,
|
| 669 |
+
4723.046160697937,
|
| 670 |
+
4742.564235210419
|
| 671 |
+
],
|
| 672 |
+
"valid_loss_history": [
|
| 673 |
+
-3.0233164174216136,
|
| 674 |
+
-4.020770004817417,
|
| 675 |
+
-4.493505137307303,
|
| 676 |
+
-4.805826323372977,
|
| 677 |
+
-5.009723663330078,
|
| 678 |
+
-5.3551515851702005,
|
| 679 |
+
-5.507791314806257,
|
| 680 |
+
-5.709285395486014,
|
| 681 |
+
-5.82812111718314,
|
| 682 |
+
-5.88963999067034,
|
| 683 |
+
-5.970332486288888,
|
| 684 |
+
-5.988547257014683,
|
| 685 |
+
-6.113276481628418,
|
| 686 |
+
-6.132954188755581,
|
| 687 |
+
-6.210943358285086,
|
| 688 |
+
-6.279647214072091,
|
| 689 |
+
-6.3300862312316895,
|
| 690 |
+
-6.36109277180263,
|
| 691 |
+
-6.427222183772495,
|
| 692 |
+
-6.453009741646903,
|
| 693 |
+
-6.489914894104004,
|
| 694 |
+
-6.48867974962507,
|
| 695 |
+
-6.536007336207798,
|
| 696 |
+
-6.532879625047956,
|
| 697 |
+
-6.572577135903495,
|
| 698 |
+
-6.5566478456769675,
|
| 699 |
+
-6.660695620945522,
|
| 700 |
+
-6.70451055254255,
|
| 701 |
+
-6.65756470816476,
|
| 702 |
+
-6.701659406934466,
|
| 703 |
+
-6.7815567425319125,
|
| 704 |
+
-6.804818085261753,
|
| 705 |
+
-6.783689567020962,
|
| 706 |
+
-6.844764641353062,
|
| 707 |
+
-6.868685790470669,
|
| 708 |
+
-6.888231481824603,
|
| 709 |
+
-6.942419528961182,
|
| 710 |
+
-6.951289176940918,
|
| 711 |
+
-6.975076675415039,
|
| 712 |
+
-6.991657052721296,
|
| 713 |
+
-7.000387941087995,
|
| 714 |
+
-7.082024574279785,
|
| 715 |
+
-7.087371553693499,
|
| 716 |
+
-7.114969117300851,
|
| 717 |
+
-7.163520812988281,
|
| 718 |
+
-7.1951784406389505,
|
| 719 |
+
-7.216815676007952,
|
| 720 |
+
-7.201807635171073,
|
| 721 |
+
-7.183896745954241,
|
| 722 |
+
-7.227273804800851,
|
| 723 |
+
-7.232961927141462,
|
| 724 |
+
-7.2955668313162665,
|
| 725 |
+
-7.2690509387425015,
|
| 726 |
+
-7.273542472294399,
|
| 727 |
+
-7.281754766191755,
|
| 728 |
+
-7.311358247484479,
|
| 729 |
+
-7.287418706076486,
|
| 730 |
+
-7.261871746608189,
|
| 731 |
+
-7.2840664727347235,
|
| 732 |
+
-7.316314697265625,
|
| 733 |
+
-7.376913070678711,
|
| 734 |
+
-7.367326668330601,
|
| 735 |
+
-7.438824789864676,
|
| 736 |
+
-7.427623748779297,
|
| 737 |
+
-7.45092739377703,
|
| 738 |
+
-7.4810590744018555,
|
| 739 |
+
-7.43196405683245,
|
| 740 |
+
-7.413298266274588,
|
| 741 |
+
-7.448171275002616,
|
| 742 |
+
-7.470413276127407,
|
| 743 |
+
-7.478131294250488,
|
| 744 |
+
-7.494483879634312,
|
| 745 |
+
-7.459411212376186,
|
| 746 |
+
-7.515866688319615,
|
| 747 |
+
-7.571803229195731,
|
| 748 |
+
-7.573634147644043,
|
| 749 |
+
-7.562024729592459,
|
| 750 |
+
-7.550929818834577,
|
| 751 |
+
-7.53609037399292,
|
| 752 |
+
-7.563671180180141,
|
| 753 |
+
-7.578108038221087,
|
| 754 |
+
-7.531997203826904,
|
| 755 |
+
-7.600094999585833,
|
| 756 |
+
-7.639314787728446,
|
| 757 |
+
-7.65882362638201,
|
| 758 |
+
-7.624989918300083,
|
| 759 |
+
-7.5906588690621515,
|
| 760 |
+
-7.606484276907785,
|
| 761 |
+
-7.59099394934518,
|
| 762 |
+
-7.611005442483084,
|
| 763 |
+
-7.649059908730643,
|
| 764 |
+
-7.675145898546491,
|
| 765 |
+
-7.672644138336182,
|
| 766 |
+
-7.692336968013218,
|
| 767 |
+
-7.745299407414028,
|
| 768 |
+
-7.703637259347098,
|
| 769 |
+
-7.7217142922537665,
|
| 770 |
+
-7.755190917423794,
|
| 771 |
+
-7.724456242152622,
|
| 772 |
+
-7.718445096697126,
|
| 773 |
+
-7.70384795325143,
|
| 774 |
+
-7.769181455884661,
|
| 775 |
+
-7.737368106842041,
|
| 776 |
+
-7.775186266217913,
|
| 777 |
+
-7.79545715876988,
|
| 778 |
+
-7.832847731454032,
|
| 779 |
+
-7.88108880179269,
|
| 780 |
+
-7.83744832447597,
|
| 781 |
+
-7.77746193749564,
|
| 782 |
+
-7.796889645712716,
|
| 783 |
+
-7.857666560581753,
|
| 784 |
+
-7.839055401938302,
|
| 785 |
+
-7.861930165972028,
|
| 786 |
+
-7.839671952383859,
|
| 787 |
+
-7.864107472555978,
|
| 788 |
+
-7.894429411206927,
|
| 789 |
+
-7.918878350939069,
|
| 790 |
+
-7.923129831041608,
|
| 791 |
+
-7.894604206085205,
|
| 792 |
+
-7.947047778538296,
|
| 793 |
+
-8.002508435930524,
|
| 794 |
+
-7.973720823015485,
|
| 795 |
+
-7.974254812513079,
|
| 796 |
+
-8.013154642922538,
|
| 797 |
+
-8.002701963697161,
|
| 798 |
+
-8.019224030630928,
|
| 799 |
+
-8.05652904510498,
|
| 800 |
+
-8.018050602504186,
|
| 801 |
+
-7.991105215890067,
|
| 802 |
+
-8.010712623596191,
|
| 803 |
+
-8.022557122366768,
|
| 804 |
+
-8.00721972329276,
|
| 805 |
+
-8.078724997384208,
|
| 806 |
+
-8.01572070802961,
|
| 807 |
+
-8.054968425205775,
|
| 808 |
+
-7.982895714896066,
|
| 809 |
+
-7.920760086604527,
|
| 810 |
+
-7.992146287645612
|
| 811 |
+
]
|
| 812 |
+
}
|
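A minimal sketch, not part of the commit, for reading a history file such as the `vocals.json` above and locating its lowest validation loss. It assumes that lower values are better and that epochs are counted from 1; the helper name `summarize_history` is hypothetical. The sample values are the first four of the last six entries of the `valid_loss_history` shown above.

```python
import json


def summarize_history(record):
    """Return the position and value of the lowest validation loss.

    Assumes `record` has a "valid_loss_history" list of floats, as in the
    vocals.json files in this commit, and that lower is better.
    """
    valid = record["valid_loss_history"]
    # Index of the smallest entry; ties resolve to the earliest epoch.
    best_index = min(range(len(valid)), key=valid.__getitem__)
    return {
        "epochs_logged": len(valid),
        "best_epoch": best_index + 1,  # assumption: epochs are 1-based
        "best_valid_loss": valid[best_index],
    }


# Excerpt of four consecutive values copied from the history above.
sample = json.loads(
    '{"valid_loss_history": ['
    "-8.078724997384208, -8.01572070802961, "
    "-8.054968425205775, -7.982895714896066]}"
)
summary = summarize_history(sample)
print(summary)
```

For a full file, replace the inline sample with `json.load(open(path))`, where `path` points at a local copy of `vocals.json`.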
MedleyVox-MultiSinger/singing_librispeech_ft_iSRNet/loss_graph_vocals.png
ADDED
|
MedleyVox-MultiSinger/singing_librispeech_ft_iSRNet/vocals.json
ADDED
|
@@ -0,0 +1,1321 @@
|
| 1 |
+
{
|
| 2 |
+
"args": {
|
| 3 |
+
"above_freq": 3000.0,
|
| 4 |
+
"architecture": "conv_tasnet_stft",
|
| 5 |
+
"batch_size": 5,
|
| 6 |
+
"beta1": 0.5,
|
| 7 |
+
"beta2": 0.9,
|
| 8 |
+
"bn_chan": 256,
|
| 9 |
+
"continual_train": true,
|
| 10 |
+
"dataset": "singing_librispeech",
|
| 11 |
+
"db_normalize": false,
|
| 12 |
+
"ema": true,
|
| 13 |
+
"encoder_activation": null,
|
| 14 |
+
"entity": "carson2050",
|
| 15 |
+
"epochs": 280,
|
| 16 |
+
"eps": 1e-08,
|
| 17 |
+
"exp_name": "singin_librispeech_ft_iSRNet",
|
| 18 |
+
"ff_activation": "relu",
|
| 19 |
+
"gpu": 0,
|
| 20 |
+
"gradient_clip": 5.0,
|
| 21 |
+
"hid_chan": 1024,
|
| 22 |
+
"load_ema_online_model": false,
|
| 23 |
+
"lr": 2e-05,
|
| 24 |
+
"lr_decay_gamma": 0.5,
|
| 25 |
+
"lr_decay_patience": 3,
|
| 26 |
+
"lr_scheduler": "cos_warmup",
|
| 27 |
+
"mask_act": "linear",
|
| 28 |
+
"max_n_src": 4,
|
| 29 |
+
"min_n_src": 2,
|
| 30 |
+
"mixed_precision": false,
|
| 31 |
+
"mixture_consistency": "sfsrnet",
|
| 32 |
+
"multi_spec_loss_log_scale": false,
|
| 33 |
+
"n_blocks": 6,
|
| 34 |
+
"n_filter": 512,
|
| 35 |
+
"n_kernel": 512,
|
| 36 |
+
"n_nodes": 1,
|
| 37 |
+
"n_repeats": 4,
|
| 38 |
+
"n_src": 2,
|
| 39 |
+
"nb_workers": 8,
|
| 40 |
+
"nfft": 2048,
|
| 41 |
+
"ngpus_per_node": 1,
|
| 42 |
+
"nhop": 512,
|
| 43 |
+
"no_cuda": false,
|
| 44 |
+
"no_mask": false,
|
| 45 |
+
"no_mask_residual": false,
|
| 46 |
+
"optimizer": "adam",
|
| 47 |
+
"output": "/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/MedleyVox/results/singing_sep/checkpoint/singin_librispeech_ft_iSRNet",
|
| 48 |
+
"output_directory": "/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/MedleyVox/results/singing_sep",
|
| 49 |
+
"part_of_data": null,
|
| 50 |
+
"patience": 15,
|
| 51 |
+
"pitch_formant_augment_prob": 0.4,
|
| 52 |
+
"port": null,
|
| 53 |
+
"project": "MedleyVox_home",
|
| 54 |
+
"quiet": false,
|
| 55 |
+
"rank": 0,
|
| 56 |
+
"reduced_training_data_ratio": 0.1,
|
| 57 |
+
"resume": "results/singing_sep/checkpoint/singing_librispeech_ft2",
|
| 58 |
+
"same_singer_dict_path": [
|
| 59 |
+
[
|
| 60 |
+
"../data/24k/OpenSinger",
|
| 61 |
+
"./svs/preprocess/make_same_singer_dict/same_singer_OpenSinger.json",
|
| 62 |
+
"OpenSinger"
|
| 63 |
+
],
|
| 64 |
+
[
|
| 65 |
+
"../data/24k/k_multisinger",
|
| 66 |
+
"./svs/preprocess/make_same_singer_dict/same_singer_k_multisinger.json",
|
| 67 |
+
"k_multisinger"
|
| 68 |
+
],
|
| 69 |
+
[
|
| 70 |
+
"../data/24k/CSD",
|
| 71 |
+
"./svs/preprocess/make_same_singer_dict/same_singer_CSD.json",
|
| 72 |
+
"CSD"
|
| 73 |
+
],
|
| 74 |
+
[
|
| 75 |
+
"../data/24k/jsut-song_ver1",
|
| 76 |
+
"./svs/preprocess/make_same_singer_dict/same_singer_jsut-song_ver1.json",
|
| 77 |
+
"jsut-song_ver1"
|
| 78 |
+
],
|
| 79 |
+
[
|
| 80 |
+
"../data/24k/jvs_music_ver1",
|
| 81 |
+
"./svs/preprocess/make_same_singer_dict/same_singer_jvs_music_ver1.json",
|
| 82 |
+
"jvs_music_ver1"
|
| 83 |
+
],
|
| 84 |
+
[
|
| 85 |
+
"../data/24k/k_multitimbre",
|
| 86 |
+
"./svs/preprocess/make_same_singer_dict/same_singer_k_multitimbre.json",
|
| 87 |
+
"k_multitimbre"
|
| 88 |
+
],
|
| 89 |
+
[
|
| 90 |
+
"../data/24k/kiritan_revised",
|
| 91 |
+
"./svs/preprocess/make_same_singer_dict/same_singer_kiritan.json",
|
| 92 |
+
"kiritan"
|
| 93 |
+
],
|
| 94 |
+
[
|
| 95 |
+
"../data/24k/musdb_a_train",
|
| 96 |
+
"./svs/preprocess/make_same_singer_dict/same_singer_musdb_a_train.json",
|
| 97 |
+
"musdb_a_train"
|
| 98 |
+
],
|
| 99 |
+
[
|
| 100 |
+
"../data/24k/NUS",
|
| 101 |
+
"./svs/preprocess/make_same_singer_dict/same_singer_NUS.json",
|
| 102 |
+
"NUS"
|
| 103 |
+
],
|
| 104 |
+
[
|
| 105 |
+
"../data/24k/VocalSet",
|
| 106 |
+
"./svs/preprocess/make_same_singer_dict/same_singer_VocalSet.json",
|
| 107 |
+
"VocalSet"
|
| 108 |
+
]
|
| 109 |
+
],
|
| 110 |
+
"same_singer_ratio": 0.2,
|
| 111 |
+
"same_song_dict_path": [
|
| 112 |
+
[
|
| 113 |
+
"../data/24k/k_multisinger",
|
| 114 |
+
"./svs/preprocess/make_same_song_dict/same_song_k_multisinger.json",
|
| 115 |
+
"k_multisinger"
|
| 116 |
+
]
|
| 117 |
+
],
|
| 118 |
+
"same_song_ratio": 0.2,
|
| 119 |
+
"same_speaker_dict_path": [
|
| 120 |
+
[
|
| 121 |
+
"../data/24k/LibriSpeech_train-clean-100",
|
| 122 |
+
"./svs/preprocess/make_same_speaker_dict/same_singer_LibriSpeech_train-clean-100.json",
|
| 123 |
+
"LibriSpeech_train-clean-100"
|
| 124 |
+
],
|
| 125 |
+
[
|
| 126 |
+
"../data/24k/LibriSpeech_train-clean-360",
|
| 127 |
+
"./svs/preprocess/make_same_speaker_dict/same_singer_LibriSpeech_train-clean-360.json",
|
| 128 |
+
"LibriSpeech_train-clean-360"
|
| 129 |
+
]
|
| 130 |
+
],
|
| 131 |
+
"same_speaker_ratio": 0.15,
|
| 132 |
+
"sample_rate": 24000,
|
| 133 |
+
"seed": 777,
|
| 134 |
+
"seq_dur": 3.0,
|
| 135 |
+
"sing_sing_ratio": 0.15,
|
| 136 |
+
"sing_speech_ratio": 0.15,
|
| 137 |
+
"skip_chan": 256,
|
| 138 |
+
"song_length_dict_path": "./svs/preprocess/song_length_dict_24k.json",
|
| 139 |
+
"speech_train_root": [
|
| 140 |
+
"../data/24k/LibriSpeech_train-clean-360",
|
| 141 |
+
"../data/24k/LibriSpeech_train-clean-100"
|
| 142 |
+
],
|
| 143 |
+
"sr_input_res": false,
|
| 144 |
+
"sr_out_mix_consistency": false,
|
| 145 |
+
"srnet": "convnext",
|
| 146 |
+
"start_from_best": true,
|
| 147 |
+
"sweep": false,
|
| 148 |
+
"target": "vocals",
|
| 149 |
+
"train_loss_func": [
|
| 150 |
+
"pit_snr",
|
| 151 |
+
"multi_spectral_l1",
|
| 152 |
+
"snr"
|
| 153 |
+
],
|
| 154 |
+
"train_root": [
|
| 155 |
+
"../data/24k/CSD",
|
| 156 |
+
"../data/24k/NUS",
|
| 157 |
+
"../data/24k/TONAS",
|
| 158 |
+
"../data/24k/VocalSet",
|
| 159 |
+
"../data/24k/jsut-song_ver1",
|
| 160 |
+
"../data/24k/jvs_music_ver1",
|
| 161 |
+
"../data/24k/kiritan_revised",
|
| 162 |
+
"../data/24k/vocadito",
|
| 163 |
+
"../data/24k/musdb_a_train",
|
| 164 |
+
"../data/24k/OpenSinger",
|
| 165 |
+
"../data/24k/medleyDB_v1_in_musdb",
|
| 166 |
+
"../data/24k/k_multisinger",
|
| 167 |
+
"../data/24k/k_multitimbre"
|
| 168 |
+
],
|
| 169 |
+
"unison_prob": 0.3,
|
| 170 |
+
"use_wandb": true,
|
| 171 |
+
"valid_loss_func": [
|
| 172 |
+
"pit_si_sdr"
|
| 173 |
+
],
|
| 174 |
+
"valid_regions_dict_path": "./svs/preprocess/valid_regions_dict_singing_singing.json",
|
| 175 |
+
"valid_root": [
|
| 176 |
+
[
|
| 177 |
+
"../data/24k/musdb_a_test",
|
| 178 |
+
"../data/24k/musdb_a_test",
|
| 179 |
+
"./svs/preprocess/make_validation_dict/for_2_srcs/valid_regions_dict_singing_singing.json",
|
| 180 |
+
"sing_sing_diff"
|
| 181 |
+
],
|
| 182 |
+
[
|
| 183 |
+
"../data/24k/musdb_a_test",
|
| 184 |
+
"../data/24k/musdb_a_test",
|
| 185 |
+
"./svs/preprocess/make_validation_dict/for_2_srcs/valid_regions_dict_singing_unison.json",
|
| 186 |
+
"sing_sing_unison"
|
| 187 |
+
],
|
| 188 |
+
[
|
| 189 |
+
"../data/24k/musdb_a_test",
|
| 190 |
+
"../data/24k/musdb_a_test",
|
| 191 |
+
"./svs/preprocess/make_validation_dict/for_2_srcs/valid_regions_dict_singing_singing_same_singer.json",
|
| 192 |
+
"sing_sing_same_singer"
|
| 193 |
+
],
|
| 194 |
+
[
|
| 195 |
+
"../data/24k/LibriSpeech_dev-clean",
|
| 196 |
+
"../data/24k/LibriSpeech_dev-clean",
|
| 197 |
+
"./svs/preprocess/make_validation_dict/for_2_srcs/valid_regions_dict_speech_speech.json",
|
| 198 |
+
"speech_speech_diff"
|
| 199 |
+
],
|
| 200 |
+
[
|
| 201 |
+
"../data/24k/LibriSpeech_dev-clean",
|
| 202 |
+
"../data/24k/LibriSpeech_dev-clean",
|
| 203 |
+
"./svs/preprocess/make_validation_dict/for_2_srcs/valid_regions_dict_speech_unison.json",
|
| 204 |
+
"speech_speech_unison"
|
| 205 |
+
],
|
| 206 |
+
[
|
| 207 |
+
"../data/24k/LibriSpeech_dev-clean",
|
| 208 |
+
"../data/24k/LibriSpeech_dev-clean",
|
| 209 |
+
"./svs/preprocess/make_validation_dict/for_2_srcs/valid_regions_dict_speech_speech_same_speaker.json",
|
| 210 |
+
"speech_speech_same_speaker"
|
| 211 |
+
],
|
| 212 |
+
[
|
| 213 |
+
"../data/24k/musdb_a_test",
|
| 214 |
+
"../data/24k/LibriSpeech_dev-clean",
|
| 215 |
+
"./svs/preprocess/make_validation_dict/for_2_srcs/valid_regions_dict_singing_speech.json",
|
| 216 |
+
"singing_speech"
|
| 217 |
+
]
|
| 218 |
+
],
|
| 219 |
+
"valid_root_orpit": [
|
| 220 |
+
[
|
| 221 |
+
"../data/24k/musdb_a_test",
|
| 222 |
+
"../data/24k/musdb_a_test",
|
| 223 |
+
"./svs/preprocess/make_validation_dict/for_n_srcs/valid_regions_dict_singing_singing_n_srcs.json",
|
| 224 |
+
"sing_sing_diff"
|
| 225 |
+
],
|
| 226 |
+
[
|
| 227 |
+
"../data/24k/musdb_a_test",
|
| 228 |
+
"../data/24k/musdb_a_test",
|
| 229 |
+
"./svs/preprocess/make_validation_dict/for_n_srcs/valid_regions_dict_singing_unison_n_srcs.json",
|
| 230 |
+
"sing_sing_unison"
|
| 231 |
+
],
|
| 232 |
+
[
|
| 233 |
+
"../data/24k/musdb_a_test",
|
| 234 |
+
"../data/24k/musdb_a_test",
|
| 235 |
+
"./svs/preprocess/make_validation_dict/for_n_srcs/valid_regions_dict_singing_singing_same_singer_n_srcs.json",
|
| 236 |
+
"sing_sing_same_singer"
|
| 237 |
+
],
|
| 238 |
+
[
|
| 239 |
+
"../data/24k/LibriSpeech_dev-clean",
|
| 240 |
+
"../data/24k/LibriSpeech_dev-clean",
|
| 241 |
+
"./svs/preprocess/make_validation_dict/for_n_srcs/valid_regions_dict_speech_speech_n_srcs.json",
|
| 242 |
+
"speech_speech_diff"
|
| 243 |
+
],
|
| 244 |
+
[
|
| 245 |
+
"../data/24k/LibriSpeech_dev-clean",
|
| 246 |
+
"../data/24k/LibriSpeech_dev-clean",
|
| 247 |
+
"./svs/preprocess/make_validation_dict/for_n_srcs/valid_regions_dict_speech_unison_n_srcs.json",
|
| 248 |
+
"speech_speech_unison"
|
| 249 |
+
],
|
| 250 |
+
[
|
| 251 |
+
"../data/24k/LibriSpeech_dev-clean",
|
| 252 |
+
"../data/24k/LibriSpeech_dev-clean",
|
| 253 |
+
"./svs/preprocess/make_validation_dict/for_n_srcs/valid_regions_dict_speech_speech_same_speaker_n_srcs.json",
|
| 254 |
+
"speech_speech_same_speaker"
|
| 255 |
+
],
|
| 256 |
+
[
|
| 257 |
+
"../data/24k/musdb_a_test",
|
| 258 |
+
"../data/24k/LibriSpeech_dev-clean",
|
| 259 |
+
"./svs/preprocess/make_validation_dict/for_n_srcs/valid_regions_dict_singing_speech_n_srcs.json",
|
| 260 |
+
"singing_speech"
|
| 261 |
+
]
|
| 262 |
+
],
|
| 263 |
+
"weight_decay": 1e-06,
|
| 264 |
+
"world_size": 1
|
| 265 |
+
},
|
| 266 |
+
"best_epoch": 267,
|
| 267 |
+
"best_loss": -9.572482517787389,
|
| 268 |
+
"epochs_trained": 267,
|
| 269 |
+
"num_bad_epochs": 0,
|
| 270 |
+
"train_loss_history": [
|
| 271 |
+
-1.3311041593551636,
|
| 272 |
+
-3.4447357654571533,
|
| 273 |
+
-4.284253120422363,
|
| 274 |
+
-4.726616382598877,
|
| 275 |
+
-5.099369049072266,
|
| 276 |
+
-5.331325054168701,
|
| 277 |
+
-5.553539752960205,
|
| 278 |
+
-5.740077018737793,
|
| 279 |
+
-5.918744087219238,
|
| 280 |
+
-6.005505561828613,
|
| 281 |
+
-6.201973915100098,
|
| 282 |
+
-6.26826286315918,
|
| 283 |
+
-6.3942413330078125,
|
| 284 |
+
-6.4803619384765625,
|
| 285 |
+
-6.592747688293457,
|
| 286 |
+
-6.6781134605407715,
|
| 287 |
+
-6.777161121368408,
|
| 288 |
+
-6.848526477813721,
|
| 289 |
+
-6.911881923675537,
|
| 290 |
+
-7.017796993255615,
|
| 291 |
+
-7.12304162979126,
|
| 292 |
+
-7.14536190032959,
|
| 293 |
+
-7.289445400238037,
|
| 294 |
+
-7.409412384033203,
|
| 295 |
+
-7.7652082443237305,
|
| 296 |
+
-7.837531089782715,
|
| 297 |
+
-7.850446701049805,
|
| 298 |
+
-7.941095352172852,
|
| 299 |
+
-7.939220428466797,
|
| 300 |
+
-8.047593116760254,
|
| 301 |
+
-8.07531452178955,
|
| 302 |
+
-8.134244918823242,
|
| 303 |
+
-8.143590927124023,
|
| 304 |
+
-8.190814018249512,
|
| 305 |
+
-8.217510223388672,
|
| 306 |
+
-8.175138473510742,
|
| 307 |
+
-7.989644527435303,
|
| 308 |
+
-8.09794807434082,
|
| 309 |
+
-8.24197006225586,
|
| 310 |
+
-8.232804298400879,
|
| 311 |
+
-8.328511238098145,
|
| 312 |
+
-8.389233589172363,
|
| 313 |
+
-8.267472267150879,
|
| 314 |
+
-8.301199913024902,
|
| 315 |
+
-8.36364459991455,
|
| 316 |
+
-8.43917465209961,
|
| 317 |
+
-8.493982315063477,
|
| 318 |
+
-8.481128692626953,
|
| 319 |
+
-8.429868698120117,
|
| 320 |
+
-8.501734733581543,
|
| 321 |
+
-8.54090404510498,
|
| 322 |
+
-8.568470001220703,
|
| 323 |
+
-8.50845718383789,
|
| 324 |
+
-8.597081184387207,
|
| 325 |
+
-8.513223648071289,
|
| 326 |
+
-8.38924503326416,
|
| 327 |
+
-8.502962112426758,
|
| 328 |
+
-8.518073081970215,
|
| 329 |
+
-8.56679916381836,
|
| 330 |
+
-8.698277473449707,
|
| 331 |
+
-8.630810737609863,
|
| 332 |
+
-8.755276679992676,
|
| 333 |
+
-8.700800895690918,
|
| 334 |
+
-8.74862003326416,
|
| 335 |
+
-8.734071731567383,
|
| 336 |
+
-8.633768081665039,
|
| 337 |
+
-8.633097648620605,
|
| 338 |
+
-8.872031211853027,
|
| 339 |
+
-8.828736305236816,
|
| 340 |
+
-8.753975868225098,
|
| 341 |
+
-8.886126518249512,
|
| 342 |
+
-8.758654594421387,
|
| 343 |
+
-8.883810997009277,
|
| 344 |
+
-8.952722549438477,
|
| 345 |
+
-8.945046424865723,
|
| 346 |
+
-8.907071113586426,
|
| 347 |
+
-8.891634941101074,
|
| 348 |
+
-8.91631031036377,
|
| 349 |
+
-8.951156616210938,
|
| 350 |
+
-8.931319236755371,
|
| 351 |
+
-8.960397720336914,
|
| 352 |
+
-8.841835975646973,
|
| 353 |
+
-8.834044456481934,
|
| 354 |
+
-8.786222457885742,
|
| 355 |
+
-8.903646469116211,
|
| 356 |
+
-8.947869300842285,
|
| 357 |
+
-8.696074485778809,
|
| 358 |
+
-8.99515438079834,
|
| 359 |
+
-9.005078315734863,
|
| 360 |
+
-8.934849739074707,
|
| 361 |
+
-8.99370002746582,
|
| 362 |
+
-9.030400276184082,
|
| 363 |
+
-9.101688385009766,
|
| 364 |
+
-9.08572006225586,
|
| 365 |
+
-9.075435638427734,
|
| 366 |
+
-9.125774383544922,
|
| 367 |
+
-9.102258682250977,
|
| 368 |
+
-9.160833358764648,
|
| 369 |
+
-8.999387741088867,
|
| 370 |
+
-8.929178237915039,
|
| 371 |
+
-9.085306167602539,
|
| 372 |
+
-9.149312019348145,
|
| 373 |
+
-9.201435089111328,
|
| 374 |
+
-9.119452476501465,
|
| 375 |
+
-9.192963600158691,
|
| 376 |
+
-9.153352737426758,
|
| 377 |
+
-9.16665267944336,
|
| 378 |
+
-9.187670707702637,
|
| 379 |
+
-9.213151931762695,
|
| 380 |
+
-9.295731544494629,
|
| 381 |
+
-9.204228401184082,
|
| 382 |
+
-9.2329683303833,
|
| 383 |
+
-9.198917388916016,
|
| 384 |
+
-9.242225646972656,
|
| 385 |
+
-9.251509666442871,
|
| 386 |
+
-9.233222007751465,
|
| 387 |
+
-9.235602378845215,
|
| 388 |
+
-9.264388084411621,
|
| 389 |
+
-9.286247253417969,
|
| 390 |
+
-9.287186622619629,
|
| 391 |
+
-9.327977180480957,
|
| 392 |
+
-9.304702758789062,
|
| 393 |
+
-9.34760570526123,
|
| 394 |
+
-9.314836502075195,
|
| 395 |
+
-9.300081253051758,
|
| 396 |
+
-9.20028018951416,
|
| 397 |
+
-9.35509967803955,
|
| 398 |
+
-9.345370292663574,
|
| 399 |
+
-9.36442756652832,
|
| 400 |
+
-9.351317405700684,
|
| 401 |
+
-9.352913856506348,
|
| 402 |
+
-9.388010025024414,
|
| 403 |
+
-9.326189994812012,
|
| 404 |
+
-9.411141395568848,
|
| 405 |
+
-9.424927711486816,
|
| 406 |
+
-9.376615524291992,
|
| 407 |
+
-9.394768714904785,
|
| 408 |
+
-9.382343292236328,
|
| 409 |
+
-9.345908164978027,
|
| 410 |
+
-9.387025833129883,
|
| 411 |
+
-9.397958755493164,
|
| 412 |
+
-9.370079040527344,
|
| 413 |
+
-9.419344902038574,
|
| 414 |
+
-9.414657592773438,
|
| 415 |
+
-9.450013160705566,
|
| 416 |
+
-9.424891471862793,
|
| 417 |
+
-9.468652725219727,
|
| 418 |
+
-9.437067031860352,
|
| 419 |
+
-9.452010154724121,
|
| 420 |
+
-9.476055145263672,
|
| 421 |
+
-9.454631805419922,
|
| 422 |
+
-9.519726753234863,
|
| 423 |
+
-9.494053840637207,
|
| 424 |
+
-9.349456787109375,
|
| 425 |
+
-9.444249153137207,
|
| 426 |
+
-9.432062149047852,
|
| 427 |
+
-9.469500541687012,
|
| 428 |
+
-9.506385803222656,
|
| 429 |
+
-9.541167259216309,
|
| 430 |
+
-9.514572143554688,
|
| 431 |
+
-9.517498016357422,
|
| 432 |
+
-9.508042335510254,
|
| 433 |
+
-9.524667739868164,
|
| 434 |
+
-9.513023376464844,
|
| 435 |
+
-9.518259048461914,
|
| 436 |
+
-9.491355895996094,
|
| 437 |
+
-9.527623176574707,
|
| 438 |
+
-9.503666877746582,
|
| 439 |
+
-9.575556755065918,
|
| 440 |
+
-9.51135540008545,
|
| 441 |
+
-9.574329376220703,
|
| 442 |
+
-9.559322357177734,
|
| 443 |
+
-9.576539993286133,
|
| 444 |
+
-9.587591171264648,
|
| 445 |
+
-9.615789413452148,
|
| 446 |
+
-9.586484909057617,
|
| 447 |
+
-9.597373008728027,
|
| 448 |
+
-9.565719604492188,
|
| 449 |
+
-9.580348014831543,
|
| 450 |
+
-9.544068336486816,
|
| 451 |
+
-9.576735496520996,
|
| 452 |
+
-9.617915153503418,
|
| 453 |
+
-9.634200096130371,
|
| 454 |
+
-9.50833511352539,
|
| 455 |
+
-9.633086204528809,
|
| 456 |
+
-9.622976303100586,
|
| 457 |
+
-9.628181457519531,
|
| 458 |
+
-9.385575294494629,
|
| 459 |
+
-9.312309265136719,
|
| 460 |
+
-8.996809005737305,
|
| 461 |
+
-9.591567993164062,
|
| 462 |
+
-9.602102279663086,
|
| 463 |
+
-9.606905937194824,
|
| 464 |
+
-9.660425186157227,
|
| 465 |
+
-9.59228229522705,
|
| 466 |
+
-9.66215991973877,
|
| 467 |
+
-9.652912139892578,
|
| 468 |
+
-9.683008193969727,
|
| 469 |
+
-9.550703048706055,
|
| 470 |
+
-9.616209983825684,
|
| 471 |
+
-9.262633323669434,
|
| 472 |
+
-9.222973823547363,
|
| 473 |
+
-9.16146469116211,
|
| 474 |
+
-9.264670372009277,
|
| 475 |
+
-9.241007804870605,
|
| 476 |
+
-9.89056396484375,
|
| 477 |
+
-9.639961242675781,
|
| 478 |
+
-9.945752143859863,
|
| 479 |
+
-10.058592796325684,
|
| 480 |
+
-9.94412899017334,
|
| 481 |
+
-9.352773666381836,
|
| 482 |
+
-9.2145357131958,
|
| 483 |
+
-9.298417091369629,
|
| 484 |
+
-9.264565467834473,
|
| 485 |
+
-9.239808082580566,
|
| 486 |
+
-9.254988670349121,
|
| 487 |
+
-9.295654296875,
|
| 488 |
+
-9.311361312866211,
|
| 489 |
+
-9.360262870788574,
|
| 490 |
+
-9.345294952392578,
|
| 491 |
+
-9.313271522521973,
|
| 492 |
+
-9.594743728637695,
|
| 493 |
+
-9.626945495605469,
|
| 494 |
+
-9.65655517578125,
|
| 495 |
+
-9.62312126159668,
|
| 496 |
+
-9.634873390197754,
|
| 497 |
+
-9.6712007522583,
|
| 498 |
+
-9.6635160446167,
|
| 499 |
+
-9.65237045288086,
|
| 500 |
+
-9.622418403625488,
|
| 501 |
+
-9.854077339172363,
|
| 502 |
+
-9.864712715148926,
|
| 503 |
+
-9.863157272338867,
|
| 504 |
+
-9.855356216430664,
|
| 505 |
+
-9.89089584350586,
|
| 506 |
+
-9.856369972229004,
|
| 507 |
+
-9.876996040344238,
|
| 508 |
+
-9.889692306518555,
|
| 509 |
+
-9.916558265686035,
|
| 510 |
+
-10.030950546264648,
|
| 511 |
+
-10.064481735229492,
|
| 512 |
+
-10.070244789123535,
|
| 513 |
+
-10.073690414428711,
|
| 514 |
+
-10.170597076416016,
|
| 515 |
+
-10.180663108825684,
|
| 516 |
+
-10.210295677185059,
|
| 517 |
+
-10.190850257873535,
|
| 518 |
+
-10.214118957519531,
|
| 519 |
+
-7.201298236846924,
|
| 520 |
+
-8.108235359191895,
|
| 521 |
+
-8.210838317871094,
|
| 522 |
+
-8.138957023620605,
|
| 523 |
+
-8.41263484954834,
|
| 524 |
+
-8.359173774719238,
|
| 525 |
+
-8.656364440917969,
|
| 526 |
+
-8.237541198730469,
|
| 527 |
+
-8.040386199951172,
|
| 528 |
+
-8.405668258666992,
|
| 529 |
+
-8.21469497680664,
|
| 530 |
+
-8.536149978637695,
|
| 531 |
+
-8.825751304626465,
|
| 532 |
+
-8.615899085998535,
|
| 533 |
+
-8.655414581298828,
|
| 534 |
+
-8.60315990447998,
|
| 535 |
+
-8.940108299255371,
|
| 536 |
+
-9.022863388061523,
|
| 537 |
+
-8.983457565307617
|
| 538 |
+
],
|
| 539 |
+
"train_time_history": [
|
| 540 |
+
4284.811353683472,
|
| 541 |
+
4284.813168525696,
|
| 542 |
+
4239.820109844208,
|
| 543 |
+
4358.5235912799835,
|
| 544 |
+
4358.525362968445,
|
| 545 |
+
4289.520437240601,
|
| 546 |
+
4289.5296330451965,
|
| 547 |
+
4233.677313089371,
|
| 548 |
+
4233.679363965988,
|
| 549 |
+
4209.371140003204,
|
| 550 |
+
4209.381086587906,
|
| 551 |
+
4202.905996799469,
|
| 552 |
+
4469.978202342987,
|
| 553 |
+
4469.989181756973,
|
| 554 |
+
4247.160337924957,
|
| 555 |
+
4247.1704177856445,
|
| 556 |
+
4190.890568256378,
|
| 557 |
+
4190.900403022766,
|
| 558 |
+
4185.636907577515,
|
| 559 |
+
4185.647009372711,
|
| 560 |
+
4180.687466144562,
|
| 561 |
+
4215.30419754982,
|
| 562 |
+
4215.314230442047,
|
| 563 |
+
4206.753845453262,
|
| 564 |
+
4206.76371717453,
|
| 565 |
+
4206.280591726303,
|
| 566 |
+
4206.290879011154,
|
| 567 |
+
4222.331785202026,
|
| 568 |
+
4222.341979503632,
|
| 569 |
+
4220.95298576355,
|
| 570 |
+
4220.962949752808,
|
| 571 |
+
4199.66743016243,
|
| 572 |
+
4199.67768073082,
|
| 573 |
+
4200.696933507919,
|
| 574 |
+
4200.706924915314,
|
| 575 |
+
4200.071183204651,
|
| 576 |
+
4200.073669195175,
|
| 577 |
+
4201.461757183075,
|
| 578 |
+
4201.47197842598,
|
| 579 |
+
4212.675180196762,
|
| 580 |
+
4212.685215473175,
|
| 581 |
+
4266.539958238602,
|
| 582 |
+
4266.55042219162,
|
| 583 |
+
4254.028660058975,
|
| 584 |
+
4254.030869007111,
|
| 585 |
+
4586.545968532562,
|
| 586 |
+
4586.556686401367,
|
| 587 |
+
4486.801070451736,
|
| 588 |
+
4486.811651468277,
|
| 589 |
+
4201.306690454483,
|
| 590 |
+
4201.308066606522,
|
| 591 |
+
4204.077554225922,
|
| 592 |
+
4204.087781906128,
|
| 593 |
+
4194.944247722626,
|
| 594 |
+
4194.954358577728,
|
| 595 |
+
4193.961704969406,
|
| 596 |
+
4222.590797185898,
|
| 597 |
+
4222.594073057175,
|
| 598 |
+
4221.6570999622345,
|
| 599 |
+
4221.666466474533,
|
| 600 |
+
4221.045345544815,
|
| 601 |
+
4221.055670261383,
|
| 602 |
+
4214.11606669426,
|
| 603 |
+
4214.125596284866,
|
| 604 |
+
4479.404296398163,
|
| 605 |
+
4479.414994955063,
|
| 606 |
+
4262.62514591217,
|
| 607 |
+
4262.635618209839,
|
| 608 |
+
4214.268101215363,
|
| 609 |
+
4214.2785403728485,
|
| 610 |
+
4218.142910718918,
|
| 611 |
+
4218.15364408493,
|
| 612 |
+
4215.917347192764,
|
| 613 |
+
4215.927803516388,
|
| 614 |
+
4218.397645950317,
|
| 615 |
+
4218.408536672592,
|
| 616 |
+
4233.58446598053,
|
| 617 |
+
4233.59490442276,
|
| 618 |
+
4318.161808013916,
|
| 619 |
+
4318.171140432358,
|
| 620 |
+
4237.026048898697,
|
| 621 |
+
4237.036669015884,
|
| 622 |
+
4220.925004482269,
|
| 623 |
+
4220.9352016448975,
|
| 624 |
+
4226.221168041229,
|
| 625 |
+
4223.1825070381165,
|
| 626 |
+
4223.192782878876,
|
| 627 |
+
4219.102268218994,
|
| 628 |
+
4219.113127231598,
|
| 629 |
+
4216.297616004944,
|
| 630 |
+
4216.308108329773,
|
| 631 |
+
4217.926244974136,
|
| 632 |
+
4217.937202453613,
|
| 633 |
+
4426.571401119232,
|
| 634 |
+
4426.573066711426,
|
| 635 |
+
4612.790915489197,
|
| 636 |
+
4612.801674365997,
|
| 637 |
+
4719.1595368385315,
|
| 638 |
+
4719.169989824295,
|
| 639 |
+
4305.255445480347,
|
| 640 |
+
4305.266388177872,
|
| 641 |
+
4221.674624681473,
|
| 642 |
+
4221.686189174652,
|
| 643 |
+
4229.138904571533,
|
| 644 |
+
4178.568962574005,
|
| 645 |
+
4178.5717051029205,
|
| 646 |
+
4178.647545337677,
|
| 647 |
+
4178.650447130203,
|
| 648 |
+
4169.984578132629,
|
| 649 |
+
4169.995152950287,
|
| 650 |
+
4173.8019506931305,
|
| 651 |
+
4173.804402589798,
|
| 652 |
+
4179.692799806595,
|
| 653 |
+
4179.695784330368,
|
| 654 |
+
4176.926806688309,
|
| 655 |
+
4176.937863111496,
|
| 656 |
+
4189.7040383815765,
|
| 657 |
+
4189.7144474983215,
|
| 658 |
+
4194.854960680008,
|
| 659 |
+
4194.8661851882935,
|
| 660 |
+
4488.314256668091,
|
| 661 |
+
4488.324142932892,
|
| 662 |
+
4301.72206735611,
|
| 663 |
+
4301.732882022858,
|
| 664 |
+
4203.297667264938,
|
| 665 |
+
4203.307426214218,
|
| 666 |
+
4212.263510465622,
|
| 667 |
+
4212.2729642391205,
|
| 668 |
+
4202.838434457779,
|
| 669 |
+
4202.8495717048645,
|
| 670 |
+
4206.559844255447,
|
| 671 |
+
4206.570970535278,
|
| 672 |
+
4202.594026565552,
|
| 673 |
+
4202.6052367687225,
|
| 674 |
+
4204.671685695648,
|
| 675 |
+
4204.675058603287,
|
| 676 |
+
4201.653420209885,
|
| 677 |
+
4201.664590358734,
|
| 678 |
+
4203.356340646744,
|
| 679 |
+
4203.3675968647,
|
| 680 |
+
4226.834460258484,
|
| 681 |
+
4226.84539103508,
|
| 682 |
+
4432.4133422374725,
|
| 683 |
+
4432.424476385117,
|
| 684 |
+
4194.520195245743,
|
| 685 |
+
4194.531393289566,
|
| 686 |
+
4185.361557483673,
|
| 687 |
+
4185.372809171677,
|
| 688 |
+
4178.024575471878,
|
| 689 |
+
4178.035531282425,
|
| 690 |
+
4183.264570951462,
|
| 691 |
+
4183.275583267212,
|
| 692 |
+
4178.5521404743195,
|
| 693 |
+
4178.563311338425,
|
| 694 |
+
4178.228582620621,
|
| 695 |
+
4178.238200426102,
|
| 696 |
+
4181.432615280151,
|
| 697 |
+
4181.443482160568,
|
| 698 |
+
4181.636572599411,
|
| 699 |
+
4181.647958517075,
|
| 700 |
+
4180.119422197342,
|
| 701 |
+
4180.130319356918,
|
| 702 |
+
4181.348428249359,
|
| 703 |
+
4181.3601496219635,
|
| 704 |
+
4182.4969573020935,
|
| 705 |
+
4182.508371829987,
|
| 706 |
+
4255.815136909485,
|
| 707 |
+
4255.824706077576,
|
| 708 |
+
4447.2853989601135,
|
| 709 |
+
4447.294949054718,
|
| 710 |
+
4375.476977586746,
|
| 711 |
+
4375.488611936569,
|
| 712 |
+
4216.147409915924,
|
| 713 |
+
4216.157112836838,
|
| 714 |
+
4184.855574131012,
|
| 715 |
+
4184.867551803589,
|
| 716 |
+
4182.2731301784515,
|
| 717 |
+
4182.284587860107,
|
| 718 |
+
4182.427225112915,
|
| 719 |
+
4182.438867807388,
|
| 720 |
+
4181.939938545227,
|
| 721 |
+
4181.951656103134,
|
| 722 |
+
4183.5050485134125,
|
| 723 |
+
4183.516293287277,
|
| 724 |
+
4180.313590764999,
|
| 725 |
+
4180.325238704681,
|
| 726 |
+
4184.185824394226,
|
| 727 |
+
4184.196978807449,
|
| 728 |
+
4175.860624790192,
|
| 729 |
+
4175.8725233078,
|
| 730 |
+
4174.206290960312,
|
| 731 |
+
4174.217987298965,
|
| 732 |
+
4225.280811309814,
|
| 733 |
+
4346.787808179855,
|
| 734 |
+
4346.791662693024,
|
| 735 |
+
4299.203949213028,
|
| 736 |
+
4334.719336986542,
|
| 737 |
+
4334.72660279274,
|
| 738 |
+
4307.453342437744,
|
| 739 |
+
4307.463569164276,
|
| 740 |
+
4243.263749361038,
|
| 741 |
+
4243.27504825592,
|
| 742 |
+
4232.403777837753,
|
| 743 |
+
4232.415019750595,
|
| 744 |
+
4234.023860692978,
|
| 745 |
+
4234.036010503769,
|
| 746 |
+
4232.419568777084,
|
| 747 |
+
4232.430717229843,
|
| 748 |
+
4228.692707538605,
|
| 749 |
+
4228.695293188095,
|
| 750 |
+
4235.275017976761,
|
| 751 |
+
4235.286781549454,
|
| 752 |
+
4231.93186712265,
|
| 753 |
+
4231.934266328812,
|
| 754 |
+
4237.727004766464,
|
| 755 |
+
4237.736963748932,
|
| 756 |
+
4448.2472088336945,
|
| 757 |
+
4448.257912635803,
|
| 758 |
+
4283.024597644806,
|
| 759 |
+
4283.03609752655,
|
| 760 |
+
4270.3121337890625,
|
| 761 |
+
4270.324274778366,
|
| 762 |
+
4244.299434423447,
|
| 763 |
+
4244.311620950699,
|
| 764 |
+
4363.46278834343,
|
| 765 |
+
4180.62579703331,
|
| 766 |
+
4180.635629653931,
|
| 767 |
+
4363.069185256958,
|
| 768 |
+
4220.090236663818,
|
| 769 |
+
4220.102267503738,
|
| 770 |
+
4190.208593130112,
|
| 771 |
+
4190.220735549927,
|
| 772 |
+
4181.494255304337,
|
| 773 |
+
4181.50580906868,
|
| 774 |
+
4186.210835933685,
|
| 775 |
+
4186.214511394501,
|
| 776 |
+
4188.612834215164,
|
| 777 |
+
4188.625131607056,
|
| 778 |
+
4182.178534984589,
|
| 779 |
+
4182.189949512482,
|
| 780 |
+
4183.857384443283,
|
| 781 |
+
4183.869287014008,
|
| 782 |
+
4183.761756181717,
|
| 783 |
+
4241.330404281616,
|
| 784 |
+
4241.341110467911,
|
| 785 |
+
4207.978038311005,
|
| 786 |
+
4207.990997314453,
|
| 787 |
+
4209.410867214203,
|
| 788 |
+
4209.421168088913,
|
| 789 |
+
4207.717931270599,
|
| 790 |
+
4207.730401754379,
|
| 791 |
+
4204.301562309265,
|
| 792 |
+
4204.313354253769,
|
| 793 |
+
4297.861345052719,
|
| 794 |
+
4297.873908042908,
|
| 795 |
+
4282.807532548904,
|
| 796 |
+
4282.820100307465,
|
| 797 |
+
4269.668355226517,
|
| 798 |
+
4269.680841684341,
|
| 799 |
+
4198.918546676636,
|
| 800 |
+
4198.928604364395,
|
| 801 |
+
4239.654682636261,
|
| 802 |
+
4239.659080028534,
|
| 803 |
+
4419.87956905365,
|
| 804 |
+
4419.889652013779,
|
| 805 |
+
4302.591921806335,
|
| 806 |
+
4302.60400891304,
|
| 807 |
+
4199.097110033035,
|
| 808 |
+
4199.109765052795,
|
| 809 |
+
4202.586899995804,
|
| 810 |
+
4202.596865415573,
|
| 811 |
+
4223.580963373184,
|
| 812 |
+
4236.571214199066,
|
| 813 |
+
4236.583789110184,
|
| 814 |
+
4266.631365537643,
|
| 815 |
+
4266.643340587616,
|
| 816 |
+
4206.533836603165,
|
| 817 |
+
4206.543870687485,
|
| 818 |
+
4196.797498226166,
|
| 819 |
+
4196.809820890427,
|
| 820 |
+
4202.778592824936,
|
| 821 |
+
4202.791028261185,
|
| 822 |
+
4200.911655426025,
|
| 823 |
+
4200.922192811966,
|
| 824 |
+
4218.757748126984,
|
| 825 |
+
4218.7700316905975,
|
| 826 |
+
4197.834621667862,
|
| 827 |
+
4197.8472237586975,
|
| 828 |
+
4194.553659200668,
|
| 829 |
+
4194.558137655258,
|
| 830 |
+
4210.2872478961945,
|
| 831 |
+
4210.291656970978,
|
| 832 |
+
4269.952535390854,
|
| 833 |
+
4269.963551998138,
|
| 834 |
+
4214.965420722961,
|
| 835 |
+
4214.9777710437775,
|
| 836 |
+
4268.254637956619,
|
| 837 |
+
4268.267082452774,
|
| 838 |
+
4188.457591295242,
|
| 839 |
+
4188.467690706253,
|
| 840 |
+
4188.935349225998,
|
| 841 |
+
4188.947833776474,
|
| 842 |
+
4192.73951125145,
|
| 843 |
+
4192.749709367752,
|
| 844 |
+
4188.534428119659,
|
| 845 |
+
4188.53829908371,
|
| 846 |
+
4196.497691392899,
|
| 847 |
+
4196.510225534439,
|
| 848 |
+
4318.416720151901,
|
| 849 |
+
4318.4267864227295,
|
| 850 |
+
4209.298709154129,
|
| 851 |
+
4204.6052923202515,
|
| 852 |
+
4204.609621763229,
|
| 853 |
+
4192.598699092865,
|
| 854 |
+
4192.6110072135925,
|
| 855 |
+
4264.5488522052765,
|
| 856 |
+
4264.562687158585,
|
| 857 |
+
4342.3707575798035,
|
| 858 |
+
4342.3756980896,
|
| 859 |
+
4299.415410995483,
|
| 860 |
+
4299.425767421722,
|
| 861 |
+
4285.986501693726,
|
| 862 |
+
4285.999414205551,
|
| 863 |
+
4251.881839513779,
|
| 864 |
+
4251.89198923111,
|
| 865 |
+
4217.251371145248,
|
| 866 |
+
4217.262971401215,
|
| 867 |
+
4265.004074335098,
|
| 868 |
+
4265.016601800919,
|
| 869 |
+
4422.643936634064,
|
| 870 |
+
4453.576984167099,
|
| 871 |
+
4453.588968753815,
|
| 872 |
+
4183.795456409454,
|
| 873 |
+
4183.80871462822,
|
| 874 |
+
4183.177849292755,
|
| 875 |
+
4183.1909646987915,
|
| 876 |
+
4190.727601289749,
|
| 877 |
+
4190.740168809891,
|
| 878 |
+
4185.585786104202,
|
| 879 |
+
4185.596675872803,
|
| 880 |
+
4186.326423406601,
|
| 881 |
+
4186.3365132808685,
|
| 882 |
+
4188.701127767563,
|
| 883 |
+
4188.713495969772,
|
| 884 |
+
4183.693524837494,
|
| 885 |
+
4183.706875085831,
|
| 886 |
+
4182.603164672852,
|
| 887 |
+
4182.169225692749,
|
| 888 |
+
4182.182250261307,
|
| 889 |
+
4183.1377918720245,
|
| 890 |
+
4183.142628669739,
|
| 891 |
+
4179.616315603256,
|
| 892 |
+
4179.626562833786,
|
| 893 |
+
4304.994537830353,
|
| 894 |
+
4305.007478475571,
|
| 895 |
+
4361.554908275604,
|
| 896 |
+
4361.56044960022,
|
| 897 |
+
4368.104673624039,
|
| 898 |
+
4368.11031460762,
|
| 899 |
+
4246.525162935257,
|
| 900 |
+
4246.5380046367645,
|
| 901 |
+
4183.925352096558,
|
| 902 |
+
4232.265904188156,
|
| 903 |
+
4232.277180671692,
|
| 904 |
+
4238.892568349838,
|
| 905 |
+
4238.905729055405,
|
| 906 |
+
4187.827491521835,
|
| 907 |
+
4187.84108877182,
|
| 908 |
+
4190.126079082489,
|
| 909 |
+
4190.13965845108,
|
| 910 |
+
4190.435103654861,
|
| 911 |
+
4190.440406799316,
|
| 912 |
+
4191.884477853775,
|
| 913 |
+
4191.897578239441,
|
| 914 |
+
4187.4977107048035,
|
| 915 |
+
4172.838095903397,
|
| 916 |
+
4172.843760967255,
|
| 917 |
+
4177.684302330017,
|
| 918 |
+
4177.6969130039215,
|
| 919 |
+
4172.654875993729,
|
| 920 |
+
4172.667930603027,
|
| 921 |
+
4174.483522415161,
|
| 922 |
+
4174.496375083923,
|
| 923 |
+
4166.372047901154,
|
| 924 |
+
4166.384793281555,
|
| 925 |
+
4283.736061811447,
|
| 926 |
+
4257.7525935173035,
|
| 927 |
+
4257.7630007267,
|
| 928 |
+
4203.545964479446,
|
| 929 |
+
4203.558753013611,
|
| 930 |
+
4198.144237518311,
|
| 931 |
+
4198.157437801361,
|
| 932 |
+
4194.472889661789,
|
| 933 |
+
4194.487104177475,
|
| 934 |
+
4197.728852272034,
|
| 935 |
+
4197.739155769348,
|
| 936 |
+
4202.638717889786,
|
| 937 |
+
4179.445859909058,
|
| 938 |
+
4179.456418514252,
|
| 939 |
+
4170.633600950241,
|
| 940 |
+
4170.638606786728,
|
| 941 |
+
4173.595223903656,
|
| 942 |
+
4345.430767297745,
|
| 943 |
+
4345.4413626194,
|
| 944 |
+
4403.088153839111,
|
| 945 |
+
4403.099495649338,
|
| 946 |
+
4243.333677768707,
|
| 947 |
+
4243.347104310989,
|
| 948 |
+
4341.46756529808,
|
| 949 |
+
4341.480928659439,
|
| 950 |
+
4317.847608089447,
|
| 951 |
+
4317.858085870743,
|
| 952 |
+
4196.552426815033,
|
| 953 |
+
4196.5675711631775,
|
| 954 |
+
4192.795216798782,
|
| 955 |
+
4192.80850481987,
|
| 956 |
+
4247.769198179245,
|
| 957 |
+
4247.783056497574,
|
| 958 |
+
4450.5884919166565,
|
| 959 |
+
4450.602509021759,
|
| 960 |
+
4386.362091779709,
|
| 961 |
+
4386.375445127487,
|
| 962 |
+
4194.8344893455505,
|
| 963 |
+
4194.847893476486,
|
| 964 |
+
4365.374780893326,
|
| 965 |
+
4365.388372182846,
|
| 966 |
+
4594.672197341919,
|
| 967 |
+
4594.6835501194,
|
| 968 |
+
4450.229032039642,
|
| 969 |
+
4450.242944955826,
|
| 970 |
+
4768.1948499679565,
|
| 971 |
+
4768.208532333374,
|
| 972 |
+
4320.927686691284,
|
| 973 |
+
4320.932461023331,
|
| 974 |
+
4389.044877767563,
|
| 975 |
+
4389.060523271561,
|
| 976 |
+
4506.545570850372,
|
| 977 |
+
4506.56330370903,
|
| 978 |
+
4187.451607465744,
|
| 979 |
+
4492.475999116898,
|
| 980 |
+
4492.487664937973,
|
| 981 |
+
4207.333253145218,
|
| 982 |
+
4207.347226142883,
|
| 983 |
+
4454.522627592087,
|
| 984 |
+
4417.526381015778,
|
| 985 |
+
4195.074825525284,
|
| 986 |
+
4195.089487314224,
|
| 987 |
+
4224.457670927048,
|
| 988 |
+
4224.472229957581,
|
| 989 |
+
4764.19175863266,
|
| 990 |
+
4764.202345132828,
|
| 991 |
+
4315.793431043625,
|
| 992 |
+
4315.799200534821,
|
| 993 |
+
4265.365842103958,
|
| 994 |
+
4252.945762634277,
|
| 995 |
+
4478.979041814804,
|
| 996 |
+
4478.992881536484,
|
| 997 |
+
4318.3227870464325,
|
| 998 |
+
4318.337471246719,
|
| 999 |
+
4319.654689788818,
|
| 1000 |
+
4319.666926622391,
|
| 1001 |
+
4320.253043174744,
|
| 1002 |
+
4320.26652598381,
|
| 1003 |
+
4316.997335195541,
|
| 1004 |
+
4317.007848501205,
|
| 1005 |
+
4317.8134751319885,
|
| 1006 |
+
4317.827590227127,
|
| 1007 |
+
4315.411971092224,
|
| 1008 |
+
4315.42355298996,
|
| 1009 |
+
4325.969897270203,
|
| 1010 |
+
4325.972640752792,
|
| 1011 |
+
4311.006960868835,
|
| 1012 |
+
4311.017538309097,
|
| 1013 |
+
4324.960598230362,
|
| 1014 |
+
3680.7179527282715,
|
| 1015 |
+
3680.7326424121857,
|
| 1016 |
+
3721.3555817604065,
|
| 1017 |
+
3721.3586716651917,
|
| 1018 |
+
3818.7410044670105,
|
| 1019 |
+
3818.744511604309,
|
| 1020 |
+
3689.685672521591,
|
| 1021 |
+
3689.7003977298737,
|
| 1022 |
+
3688.9338262081146,
|
| 1023 |
+
3688.9487912654877,
|
| 1024 |
+
3734.376760005951,
|
| 1025 |
+
3734.3914697170258,
|
| 1026 |
+
3721.3628540039062,
|
| 1027 |
+
3721.37668967247,
|
| 1028 |
+
3655.3936855793,
|
| 1029 |
+
3655.4077792167664,
|
| 1030 |
+
3610.097437620163,
|
| 1031 |
+
3610.111466407776,
|
| 1032 |
+
3715.0868566036224,
|
| 1033 |
+
3715.099429130554,
|
| 1034 |
+
3636.3001956939697,
|
| 1035 |
+
3636.3086059093475,
|
| 1036 |
+
3668.0241372585297,
|
| 1037 |
+
3668.034808397293,
|
| 1038 |
+
3659.740085363388,
|
| 1039 |
+
3659.7512934207916,
|
| 1040 |
+
3611.7954156398773,
|
| 1041 |
+
3611.810293197632,
|
| 1042 |
+
3611.7872862815857,
|
| 1043 |
+
3611.802482843399,
|
| 1044 |
+
3612.5097110271454,
|
| 1045 |
+
3612.520439386368,
|
| 1046 |
+
3609.9256060123444,
|
| 1047 |
+
3609.9406599998474,
|
| 1048 |
+
3615.199702978134,
|
| 1049 |
+
3615.213776111603,
|
| 1050 |
+
3614.617516040802
|
| 1051 |
+
],
|
| 1052 |
+
"valid_loss_history": [
|
| 1053 |
+
-2.2420080729893277,
|
| 1054 |
+
-3.6040473665509904,
|
| 1055 |
+
-4.652349131447928,
|
| 1056 |
+
-5.269411563873291,
|
| 1057 |
+
-5.602223873138428,
|
| 1058 |
+
-5.948959009987967,
|
| 1059 |
+
-6.180064678192139,
|
| 1060 |
+
-6.373329707554409,
|
| 1061 |
+
-6.4635710035051614,
|
| 1062 |
+
-6.628378936222622,
|
| 1063 |
+
-6.765629632132394,
|
| 1064 |
+
-6.878908634185791,
|
| 1065 |
+
-6.975889819008963,
|
| 1066 |
+
-7.089849744524274,
|
| 1067 |
+
-7.137168339320591,
|
| 1068 |
+
-7.214839458465576,
|
| 1069 |
+
-7.248862539018903,
|
| 1070 |
+
-7.323270389011928,
|
| 1071 |
+
-7.374068532671247,
|
| 1072 |
+
-7.447478975568499,
|
| 1073 |
+
-7.470496041434152,
|
| 1074 |
+
-7.578763212476458,
|
| 1075 |
+
-7.638515608651297,
|
| 1076 |
+
-7.603791032518659,
|
| 1077 |
+
-7.658165522984096,
|
| 1078 |
+
-7.660087721688407,
|
| 1079 |
+
-7.711926255907331,
|
| 1080 |
+
-7.763034411839077,
|
| 1081 |
+
-7.80566440309797,
|
| 1082 |
+
-7.829599516732352,
|
| 1083 |
+
-7.908110482352121,
|
| 1084 |
+
-7.871029717581613,
|
| 1085 |
+
-7.790640013558524,
|
| 1086 |
+
-7.807113443102155,
|
| 1087 |
+
-7.826304980686733,
|
| 1088 |
+
-7.77531235558646,
|
| 1089 |
+
-7.879563399723598,
|
| 1090 |
+
-7.897988796234131,
|
| 1091 |
+
-7.845814909253802,
|
| 1092 |
+
-7.848473821367536,
|
| 1093 |
+
-7.912371976034982,
|
| 1094 |
+
-7.943405968802316,
|
| 1095 |
+
-8.085525648934501,
|
| 1096 |
+
-8.010899543762207,
|
| 1097 |
+
-8.028815746307373,
|
| 1098 |
+
-8.061845302581787,
|
| 1099 |
+
-8.02747140611921,
|
| 1100 |
+
-8.03413268498012,
|
| 1101 |
+
-8.033596924373082,
|
| 1102 |
+
-8.068816934313093,
|
| 1103 |
+
-8.067536762782506,
|
| 1104 |
+
-8.144167695726667,
|
| 1105 |
+
-8.148260184696742,
|
| 1106 |
+
-8.180625711168561,
|
| 1107 |
+
-8.180845873696464,
|
| 1108 |
+
-8.25086770738874,
|
| 1109 |
+
-8.261961323874337,
|
| 1110 |
+
-8.260808059147426,
|
| 1111 |
+
-8.186679295131139,
|
| 1112 |
+
-8.165157794952393,
|
| 1113 |
+
-8.194125039236885,
|
| 1114 |
+
-8.254536492483956,
|
| 1115 |
+
-8.292360033307757,
|
| 1116 |
+
-8.267435346330915,
|
| 1117 |
+
-8.27747208731515,
|
| 1118 |
+
-8.366285255977086,
|
| 1119 |
+
-8.354675361088344,
|
| 1120 |
+
-8.365063190460205,
|
| 1121 |
+
-8.427791595458984,
|
| 1122 |
+
-8.452910355159215,
|
| 1123 |
+
-8.395057133265905,
|
| 1124 |
+
-8.455147879464286,
|
| 1125 |
+
-8.485073634556361,
|
| 1126 |
+
-8.504877976008824,
|
| 1127 |
+
-8.502339363098145,
|
| 1128 |
+
-8.485261576516288,
|
| 1129 |
+
-8.50761045728411,
|
| 1130 |
+
-8.482435567038399,
|
| 1131 |
+
-8.516456604003906,
|
| 1132 |
+
-8.503895146506173,
|
| 1133 |
+
-8.515655858176094,
|
| 1134 |
+
-8.574515002114433,
|
| 1135 |
+
-8.580681255885533,
|
| 1136 |
+
-8.593669959477015,
|
| 1137 |
+
-8.538264206477574,
|
| 1138 |
+
-8.570460319519043,
|
| 1139 |
+
-8.610838617597308,
|
| 1140 |
+
-8.576563426426478,
|
| 1141 |
+
-8.631826945713588,
|
| 1142 |
+
-8.593990189688546,
|
| 1143 |
+
-8.584804126194545,
|
| 1144 |
+
-8.616937228611537,
|
| 1145 |
+
-8.616405078342982,
|
| 1146 |
+
-8.636415685926165,
|
| 1147 |
+
-8.736162253788539,
|
| 1148 |
+
-8.684600080762591,
|
| 1149 |
+
-8.751097747257777,
|
| 1150 |
+
-8.744481086730957,
|
| 1151 |
+
-8.760670593806676,
|
| 1152 |
+
-8.81410721370152,
|
| 1153 |
+
-8.762031418936592,
|
| 1154 |
+
-8.731195313589913,
|
| 1155 |
+
-8.680067879813057,
|
| 1156 |
+
-8.73148284639631,
|
| 1157 |
+
-8.770104340144567,
|
| 1158 |
+
-8.83363403592791,
|
| 1159 |
+
-8.797364848000663,
|
| 1160 |
+
-8.756126131330218,
|
| 1161 |
+
-8.717773846217565,
|
| 1162 |
+
-8.755549158368792,
|
| 1163 |
+
-8.798967293330602,
|
| 1164 |
+
-8.80781262261527,
|
| 1165 |
+
-8.879967212677002,
|
| 1166 |
+
-8.83057907649449,
|
| 1167 |
+
-8.910664354051862,
|
| 1168 |
+
-8.930669920785087,
|
| 1169 |
+
-8.850233895438057,
|
| 1170 |
+
-8.87684679031372,
|
| 1171 |
+
-8.860790797642299,
|
| 1172 |
+
-8.854635306767054,
|
| 1173 |
+
-8.871529306684222,
|
| 1174 |
+
-8.870055334908622,
|
| 1175 |
+
-8.814562388828822,
|
| 1176 |
+
-8.895111628941127,
|
| 1177 |
+
-8.95235286440168,
|
| 1178 |
+
-8.978583880833217,
|
| 1179 |
+
-8.970093931470599,
|
| 1180 |
+
-8.94366032736642,
|
| 1181 |
+
-8.930564199175153,
|
| 1182 |
+
-8.896938255855016,
|
| 1183 |
+
-9.003027439117432,
|
| 1184 |
+
-8.967686380658831,
|
| 1185 |
+
-8.945790427071708,
|
| 1186 |
+
-8.978134904588972,
|
| 1187 |
+
-8.926983833312988,
|
| 1188 |
+
-8.911829403468541,
|
| 1189 |
+
-9.004649843488421,
|
| 1190 |
+
-8.982011726924352,
|
| 1191 |
+
-9.004248074122838,
|
| 1192 |
+
-9.022075244358607,
|
| 1193 |
+
-9.055972508021764,
|
| 1194 |
+
-9.095445496695381,
|
| 1195 |
+
-9.014348983764648,
|
| 1196 |
+
-9.017100266047887,
|
| 1197 |
+
-9.06740631375994,
|
| 1198 |
+
-9.062205382755824,
|
| 1199 |
+
-9.006571020398821,
|
| 1200 |
+
-9.060756206512451,
|
| 1201 |
+
-9.114073821476527,
|
| 1202 |
+
-9.12088053567069,
|
| 1203 |
+
-9.146572181156703,
|
| 1204 |
+
-9.129499162946429,
|
| 1205 |
+
-9.162499564034599,
|
| 1206 |
+
-9.146372726985387,
|
| 1207 |
+
-9.138916151864189,
|
| 1208 |
+
-9.140360014779228,
|
| 1209 |
+
-9.14337342126029,
|
| 1210 |
+
-9.13001537322998,
|
| 1211 |
+
-9.089552674974714,
|
| 1212 |
+
-9.172866821289062,
|
| 1213 |
+
-9.200943265642438,
|
| 1214 |
+
-9.191112245832171,
|
| 1215 |
+
-9.207633904048375,
|
| 1216 |
+
-9.147029059273857,
|
| 1217 |
+
-9.17673145021711,
|
| 1218 |
+
-9.129148755754743,
|
| 1219 |
+
-9.157607623508998,
|
| 1220 |
+
-9.13064786366054,
|
| 1221 |
+
-9.154420512063163,
|
| 1222 |
+
-9.181631565093994,
|
| 1223 |
+
-9.155359063829694,
|
| 1224 |
+
-9.158296721322197,
|
| 1225 |
+
-9.156671251569476,
|
| 1226 |
+
-9.154706001281738,
|
| 1227 |
+
-9.167226382664271,
|
| 1228 |
+
-9.163607052394322,
|
| 1229 |
+
-9.209595475878034,
|
| 1230 |
+
-9.310745784214564,
|
| 1231 |
+
-9.238739694867816,
|
| 1232 |
+
-9.288273334503174,
|
| 1233 |
+
-9.2847033228193,
|
| 1234 |
+
-9.313508306230817,
|
| 1235 |
+
-9.334877354758126,
|
| 1236 |
+
-9.270281859806605,
|
| 1237 |
+
-9.189015797206334,
|
| 1238 |
+
-9.247245516095843,
|
| 1239 |
+
-9.272651195526123,
|
| 1240 |
+
-9.430454867226738,
|
| 1241 |
+
-9.431772300175258,
|
| 1242 |
+
-9.406911509377617,
|
| 1243 |
+
-9.434791496821813,
|
| 1244 |
+
-9.40122835976737,
|
| 1245 |
+
-9.331563881465367,
|
| 1246 |
+
-9.266850130898613,
|
| 1247 |
+
-9.263189588274274,
|
| 1248 |
+
-9.341036796569824,
|
| 1249 |
+
-9.302794524601527,
|
| 1250 |
+
-9.364838123321533,
|
| 1251 |
+
-9.468104021889824,
|
| 1252 |
+
-9.427109173366002,
|
| 1253 |
+
-9.488504341670446,
|
| 1254 |
+
-9.461405617850167,
|
| 1255 |
+
-9.434092794145856,
|
| 1256 |
+
-9.448193890707833,
|
| 1257 |
+
-9.491405623299736,
|
| 1258 |
+
-9.586788518088204,
|
| 1259 |
+
-9.494200706481934,
|
| 1260 |
+
-9.47681747164045,
|
| 1261 |
+
-9.457686015537806,
|
| 1262 |
+
-9.591959748949323,
|
| 1263 |
+
-9.581428391592842,
|
| 1264 |
+
-9.579002380371094,
|
| 1265 |
+
-9.538570063454765,
|
| 1266 |
+
-9.59873376573835,
|
| 1267 |
+
-9.606725556509835,
|
| 1268 |
+
-9.610026700156075,
|
| 1269 |
+
-9.668677466256279,
|
| 1270 |
+
-9.631781101226807,
|
| 1271 |
+
-9.603316238948278,
|
| 1272 |
+
-9.663758277893066,
|
| 1273 |
+
-9.63963794708252,
|
| 1274 |
+
-9.662949085235596,
|
| 1275 |
+
-9.706490448543004,
|
| 1276 |
+
-9.720975807734899,
|
| 1277 |
+
-9.734819480351039,
|
| 1278 |
+
-9.786265100751605,
|
| 1279 |
+
-9.737053121839251,
|
| 1280 |
+
-9.700168677738734,
|
| 1281 |
+
-9.778143337794713,
|
| 1282 |
+
-9.780944415501185,
|
| 1283 |
+
-9.77445820399693,
|
| 1284 |
+
-9.772279262542725,
|
| 1285 |
+
-9.786255019051689,
|
| 1286 |
+
-9.787315436771937,
|
| 1287 |
+
-9.809428351266044,
|
| 1288 |
+
-9.77673625946045,
|
| 1289 |
+
-9.78390223639352,
|
| 1290 |
+
-9.798577308654785,
|
| 1291 |
+
-9.79927212851388,
|
| 1292 |
+
-9.762826034000941,
|
| 1293 |
+
-9.76457827431815,
|
| 1294 |
+
-9.798729487827845,
|
| 1295 |
+
-9.776831013815743,
|
| 1296 |
+
-9.773336342402867,
|
| 1297 |
+
-9.794628483908516,
|
| 1298 |
+
-9.795281887054443,
|
| 1299 |
+
-9.76816953931536,
|
| 1300 |
+
-9.776653221675328,
|
| 1301 |
+
-7.936585630689349,
|
| 1302 |
+
-9.18690013885498,
|
| 1303 |
+
-9.344774450574603,
|
| 1304 |
+
-9.366782733372279,
|
| 1305 |
+
-9.370089326586042,
|
| 1306 |
+
-9.374211038861956,
|
| 1307 |
+
-9.36532722200666,
|
| 1308 |
+
-9.383607932499476,
|
| 1309 |
+
-9.390820026397705,
|
| 1310 |
+
-9.426494870867048,
|
| 1311 |
+
-9.435582705906459,
|
| 1312 |
+
-9.458767277853829,
|
| 1313 |
+
-9.49565941946847,
|
| 1314 |
+
-9.505523000444684,
|
| 1315 |
+
-9.5315888268607,
|
| 1316 |
+
-9.545233454023089,
|
| 1317 |
+
-9.539818559374128,
|
| 1318 |
+
-9.561526230403356,
|
| 1319 |
+
-9.572482517787389
|
| 1320 |
+
]
|
| 1321 |
+
}
|
MedleyVox-MultiSinger/singing_librispeech_iSRNet/loss_graph_vocals.png
ADDED
|
MedleyVox-MultiSinger/singing_librispeech_iSRNet/vocals.json
ADDED
|
@@ -0,0 +1,1180 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
{
|
| 2 |
+
"args": {
|
| 3 |
+
"above_freq": 3000.0,
|
| 4 |
+
"architecture": "conv_tasnet_stft",
|
| 5 |
+
"batch_size": 5,
|
| 6 |
+
"beta1": 0.5,
|
| 7 |
+
"beta2": 0.9,
|
| 8 |
+
"bn_chan": 256,
|
| 9 |
+
"continual_train": true,
|
| 10 |
+
"dataset": "singing_librispeech",
|
| 11 |
+
"db_normalize": false,
|
| 12 |
+
"ema": true,
|
| 13 |
+
"encoder_activation": null,
|
| 14 |
+
"entity": "carson2050",
|
| 15 |
+
"epochs": 230,
|
| 16 |
+
"eps": 1e-08,
|
| 17 |
+
"exp_name": "singing_librispeech_iSRNet",
|
| 18 |
+
"ff_activation": "relu",
|
| 19 |
+
"gpu": 0,
|
| 20 |
+
"gradient_clip": 5.0,
|
| 21 |
+
"hid_chan": 1024,
|
| 22 |
+
"load_ema_online_model": false,
|
| 23 |
+
"lr": 3e-05,
|
| 24 |
+
"lr_decay_gamma": 0.5,
|
| 25 |
+
"lr_decay_patience": 6,
|
| 26 |
+
"lr_scheduler": "step_lr",
|
| 27 |
+
"mask_act": "linear",
|
| 28 |
+
"max_n_src": 4,
|
| 29 |
+
"min_n_src": 2,
|
| 30 |
+
"mixed_precision": false,
|
| 31 |
+
"mixture_consistency": "sfsrnet",
|
| 32 |
+
"multi_spec_loss_log_scale": false,
|
| 33 |
+
"n_blocks": 6,
|
| 34 |
+
"n_filter": 512,
|
| 35 |
+
"n_kernel": 512,
|
| 36 |
+
"n_nodes": 1,
|
| 37 |
+
"n_repeats": 4,
|
| 38 |
+
"n_src": 2,
|
| 39 |
+
"nb_workers": 10,
|
| 40 |
+
"nfft": 2048,
|
| 41 |
+
"ngpus_per_node": 1,
|
| 42 |
+
"nhop": 512,
|
| 43 |
+
"no_cuda": false,
|
| 44 |
+
"no_mask": false,
|
| 45 |
+
"no_mask_residual": false,
|
| 46 |
+
"optimizer": "adam",
|
| 47 |
+
"output": "/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/MedleyVox/results/singing_sep/checkpoint/singing_librispeech_iSRNet",
|
| 48 |
+
"output_directory": "/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/MedleyVox/results/singing_sep",
|
| 49 |
+
"part_of_data": null,
|
| 50 |
+
"patience": 15,
|
| 51 |
+
"pitch_formant_augment_prob": 0.4,
|
| 52 |
+
"port": null,
|
| 53 |
+
"project": "MedleyVox_home",
|
| 54 |
+
"quiet": false,
|
| 55 |
+
"rank": 0,
|
| 56 |
+
"reduced_training_data_ratio": 0.1,
|
| 57 |
+
"resume": "results/singing_sep/checkpoint/singing_librispeech_iSRNet",
|
| 58 |
+
"same_singer_dict_path": [
|
| 59 |
+
[
|
| 60 |
+
"/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/OpenSinger",
|
| 61 |
+
"./svs/preprocess/make_same_singer_dict/same_singer_OpenSinger.json",
|
| 62 |
+
"OpenSinger"
|
| 63 |
+
],
|
| 64 |
+
[
|
| 65 |
+
"/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/k_multisinger",
|
| 66 |
+
"./svs/preprocess/make_same_singer_dict/same_singer_k_multisinger.json",
|
| 67 |
+
"k_multisinger"
|
| 68 |
+
],
|
| 69 |
+
[
|
| 70 |
+
"/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/CSD",
|
| 71 |
+
"./svs/preprocess/make_same_singer_dict/same_singer_CSD.json",
|
| 72 |
+
"CSD"
|
| 73 |
+
],
|
| 74 |
+
[
|
| 75 |
+
"/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/jsut-song_ver1",
|
| 76 |
+
"./svs/preprocess/make_same_singer_dict/same_singer_jsut-song_ver1.json",
|
| 77 |
+
"jsut-song_ver1"
|
| 78 |
+
],
|
| 79 |
+
[
|
| 80 |
+
"/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/jvs_music_ver1",
|
| 81 |
+
"./svs/preprocess/make_same_singer_dict/same_singer_jvs_music_ver1.json",
|
| 82 |
+
"jvs_music_ver1"
|
| 83 |
+
],
|
| 84 |
+
[
|
| 85 |
+
"/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/k_multitimbre",
|
| 86 |
+
"./svs/preprocess/make_same_singer_dict/same_singer_k_multitimbre.json",
|
| 87 |
+
"k_multitimbre"
|
| 88 |
+
],
|
| 89 |
+
[
|
| 90 |
+
"/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/kiritan_revised",
|
| 91 |
+
"./svs/preprocess/make_same_singer_dict/same_singer_kiritan.json",
|
| 92 |
+
"kiritan"
|
| 93 |
+
],
|
| 94 |
+
[
|
| 95 |
+
"/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/musdb_a_train",
|
| 96 |
+
"./svs/preprocess/make_same_singer_dict/same_singer_musdb_a_train.json",
|
| 97 |
+
"musdb_a_train"
|
| 98 |
+
],
|
| 99 |
+
[
|
| 100 |
+
"/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/NUS",
|
| 101 |
+
"./svs/preprocess/make_same_singer_dict/same_singer_NUS.json",
|
| 102 |
+
"NUS"
|
| 103 |
+
],
|
| 104 |
+
[
|
| 105 |
+
"/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/VocalSet",
|
| 106 |
+
"./svs/preprocess/make_same_singer_dict/same_singer_VocalSet.json",
|
| 107 |
+
"VocalSet"
|
| 108 |
+
]
|
| 109 |
+
],
|
| 110 |
+
"same_singer_ratio": 0.2,
|
| 111 |
+
"same_song_dict_path": [
|
| 112 |
+
[
|
| 113 |
+
"/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/k_multisinger",
|
| 114 |
+
"./svs/preprocess/make_same_song_dict/same_song_k_multisinger.json",
|
| 115 |
+
"k_multisinger"
|
| 116 |
+
]
|
| 117 |
+
],
|
| 118 |
+
"same_song_ratio": 0.2,
|
| 119 |
+
"same_speaker_dict_path": [
|
| 120 |
+
[
|
| 121 |
+
"/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/LibriSpeech_train-clean-100",
|
| 122 |
+
"./svs/preprocess/make_same_speaker_dict/same_singer_LibriSpeech_train-clean-100.json",
|
| 123 |
+
"LibriSpeech_train-clean-100"
|
| 124 |
+
],
|
| 125 |
+
[
|
| 126 |
+
"/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/LibriSpeech_train-clean-360",
|
| 127 |
+
"./svs/preprocess/make_same_speaker_dict/same_singer_LibriSpeech_train-clean-360.json",
|
| 128 |
+
"LibriSpeech_train-clean-360"
|
| 129 |
+
]
|
| 130 |
+
],
|
| 131 |
+
"same_speaker_ratio": 0.15,
|
| 132 |
+
"sample_rate": 24000,
|
| 133 |
+
"seed": 777,
|
| 134 |
+
"seq_dur": 3.0,
|
| 135 |
+
"sing_sing_ratio": 0.15,
|
| 136 |
+
"sing_speech_ratio": 0.15,
|
| 137 |
+
"skip_chan": 256,
|
| 138 |
+
"song_length_dict_path": "./svs/preprocess/song_length_dict_24k.json",
|
| 139 |
+
"speech_train_root": [
|
| 140 |
+
"/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/LibriSpeech_train-clean-360",
|
| 141 |
+
"/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/LibriSpeech_train-clean-100"
|
| 142 |
+
],
|
| 143 |
+
"sr_input_res": false,
|
| 144 |
+
"sr_out_mix_consistency": false,
|
| 145 |
+
"srnet": "convnext",
|
| 146 |
+
"start_from_best": true,
|
| 147 |
+
"sweep": false,
|
| 148 |
+
"target": "vocals",
|
| 149 |
+
"train_loss_func": [
|
| 150 |
+
"pit_snr",
|
| 151 |
+
"multi_spectral_l1",
|
| 152 |
+
"snr"
|
| 153 |
+
],
|
| 154 |
+
"train_root": [
|
| 155 |
+
"/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/CSD",
|
| 156 |
+
"/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/NUS",
|
| 157 |
+
"/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/TONAS",
|
| 158 |
+
"/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/VocalSet",
|
| 159 |
+
"/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/jsut-song_ver1",
|
| 160 |
+
"/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/jvs_music_ver1",
|
| 161 |
+
"/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/kiritan_revised",
|
| 162 |
+
"/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/vocadito",
|
| 163 |
+
"/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/musdb_a_train",
|
| 164 |
+
"/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/OpenSinger",
|
| 165 |
+
"/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/medleyDB_v1_in_musdb",
|
| 166 |
+
"/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/k_multisinger",
|
| 167 |
+
"/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/k_multitimbre"
|
| 168 |
+
],
|
| 169 |
+
"unison_prob": 0.3,
|
| 170 |
+
"use_wandb": true,
|
| 171 |
+
"valid_loss_func": [
|
| 172 |
+
"pit_si_sdr"
|
| 173 |
+
],
|
| 174 |
+
"valid_regions_dict_path": "./svs/preprocess/valid_regions_dict_singing_singing.json",
|
| 175 |
+
"valid_root": [
|
| 176 |
+
[
|
| 177 |
+
"/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/musdb_a_test",
|
| 178 |
+
"/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/musdb_a_test",
|
| 179 |
+
"./svs/preprocess/make_validation_dict/for_2_srcs/valid_regions_dict_singing_singing.json",
|
| 180 |
+
"sing_sing_diff"
|
| 181 |
+
],
|
| 182 |
+
[
|
| 183 |
+
"/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/musdb_a_test",
|
| 184 |
+
"/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/musdb_a_test",
|
| 185 |
+
"./svs/preprocess/make_validation_dict/for_2_srcs/valid_regions_dict_singing_unison.json",
|
| 186 |
+
"sing_sing_unison"
|
| 187 |
+
],
|
| 188 |
+
[
|
| 189 |
+
"/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/musdb_a_test",
|
| 190 |
+
"/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/musdb_a_test",
|
| 191 |
+
"./svs/preprocess/make_validation_dict/for_2_srcs/valid_regions_dict_singing_singing_same_singer.json",
|
| 192 |
+
"sing_sing_same_singer"
|
| 193 |
+
],
|
| 194 |
+
[
|
| 195 |
+
"/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/LibriSpeech_dev-clean",
|
| 196 |
+
"/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/LibriSpeech_dev-clean",
|
| 197 |
+
"./svs/preprocess/make_validation_dict/for_2_srcs/valid_regions_dict_speech_speech.json",
|
| 198 |
+
"speech_speech_diff"
|
| 199 |
+
],
|
| 200 |
+
[
|
| 201 |
+
"/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/LibriSpeech_dev-clean",
|
| 202 |
+
"/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/LibriSpeech_dev-clean",
|
| 203 |
+
"./svs/preprocess/make_validation_dict/for_2_srcs/valid_regions_dict_speech_unison.json",
|
| 204 |
+
"speech_speech_unison"
|
| 205 |
+
],
|
| 206 |
+
[
|
| 207 |
+
"/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/LibriSpeech_dev-clean",
|
| 208 |
+
"/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/LibriSpeech_dev-clean",
|
| 209 |
+
"./svs/preprocess/make_validation_dict/for_2_srcs/valid_regions_dict_speech_speech_same_speaker.json",
|
| 210 |
+
"speech_speech_same_speaker"
|
| 211 |
+
],
|
| 212 |
+
[
|
| 213 |
+
"/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/musdb_a_test",
|
| 214 |
+
"/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/LibriSpeech_dev-clean",
|
| 215 |
+
"./svs/preprocess/make_validation_dict/for_2_srcs/valid_regions_dict_singing_speech.json",
|
| 216 |
+
"singing_speech"
|
| 217 |
+
]
|
| 218 |
+
],
|
| 219 |
+
"valid_root_orpit": [
|
| 220 |
+
[
|
| 221 |
+
"/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/musdb_a_test",
|
| 222 |
+
"/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/musdb_a_test",
|
| 223 |
+
"./svs/preprocess/make_validation_dict/for_n_srcs/valid_regions_dict_singing_singing_n_srcs.json",
|
| 224 |
+
"sing_sing_diff"
|
| 225 |
+
],
|
| 226 |
+
[
|
| 227 |
+
"/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/musdb_a_test",
|
| 228 |
+
"/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/musdb_a_test",
|
| 229 |
+
"./svs/preprocess/make_validation_dict/for_n_srcs/valid_regions_dict_singing_unison_n_srcs.json",
|
| 230 |
+
"sing_sing_unison"
|
| 231 |
+
],
|
| 232 |
+
[
|
| 233 |
+
"/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/musdb_a_test",
|
| 234 |
+
"/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/musdb_a_test",
|
| 235 |
+
"./svs/preprocess/make_validation_dict/for_n_srcs/valid_regions_dict_singing_singing_same_singer_n_srcs.json",
|
| 236 |
+
"sing_sing_same_singer"
|
| 237 |
+
],
|
| 238 |
+
[
|
| 239 |
+
"/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/LibriSpeech_dev-clean",
|
| 240 |
+
"/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/LibriSpeech_dev-clean",
|
| 241 |
+
"./svs/preprocess/make_validation_dict/for_n_srcs/valid_regions_dict_speech_speech_n_srcs.json",
|
| 242 |
+
"speech_speech_diff"
|
| 243 |
+
],
|
| 244 |
+
[
|
| 245 |
+
"/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/LibriSpeech_dev-clean",
|
| 246 |
+
"/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/LibriSpeech_dev-clean",
|
| 247 |
+
"./svs/preprocess/make_validation_dict/for_n_srcs/valid_regions_dict_speech_unison_n_srcs.json",
|
| 248 |
+
"speech_speech_unison"
|
| 249 |
+
],
|
| 250 |
+
[
|
| 251 |
+
"/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/LibriSpeech_dev-clean",
|
| 252 |
+
"/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/LibriSpeech_dev-clean",
|
| 253 |
+
"./svs/preprocess/make_validation_dict/for_n_srcs/valid_regions_dict_speech_speech_same_speaker_n_srcs.json",
|
| 254 |
+
"speech_speech_same_speaker"
|
| 255 |
+
],
|
| 256 |
+
[
|
| 257 |
+
"/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/musdb_a_test",
|
| 258 |
+
"/media/carson/80AC3E70AC3E60B8/Users/Carson/Documents/data/24k/LibriSpeech_dev-clean",
|
| 259 |
+
"./svs/preprocess/make_validation_dict/for_n_srcs/valid_regions_dict_singing_speech_n_srcs.json",
|
| 260 |
+
"singing_speech"
|
| 261 |
+
]
|
| 262 |
+
],
|
| 263 |
+
"weight_decay": 1e-06,
|
| 264 |
+
"world_size": 1
|
| 265 |
+
},
|
| 266 |
+
"best_epoch": 216,
|
| 267 |
+
"best_loss": -9.209092957632881,
|
| 268 |
+
"epochs_trained": 230,
|
| 269 |
+
"num_bad_epochs": 14,
|
| 270 |
+
"train_loss_history": [
|
| 271 |
+
-1.3311041593551636,
|
| 272 |
+
-3.4447357654571533,
|
| 273 |
+
-4.284253120422363,
|
| 274 |
+
-4.726616382598877,
|
| 275 |
+
-5.099369049072266,
|
| 276 |
+
-5.331325054168701,
|
| 277 |
+
-5.553539752960205,
|
| 278 |
+
-5.740077018737793,
|
| 279 |
+
-5.918744087219238,
|
| 280 |
+
-6.005505561828613,
|
| 281 |
+
-6.201973915100098,
|
| 282 |
+
-6.26826286315918,
|
| 283 |
+
-6.3942413330078125,
|
| 284 |
+
-6.4803619384765625,
|
| 285 |
+
-6.592747688293457,
|
| 286 |
+
-6.6781134605407715,
|
| 287 |
+
-6.777161121368408,
|
| 288 |
+
-6.848526477813721,
|
| 289 |
+
-6.911881923675537,
|
| 290 |
+
-7.017796993255615,
|
| 291 |
+
-7.12304162979126,
|
| 292 |
+
-7.14536190032959,
|
| 293 |
+
-7.289445400238037,
|
| 294 |
+
-7.409412384033203,
|
| 295 |
+
-7.7652082443237305,
|
| 296 |
+
-7.837531089782715,
|
| 297 |
+
-7.850446701049805,
|
| 298 |
+
-7.941095352172852,
|
| 299 |
+
-7.939220428466797,
|
| 300 |
+
-8.047593116760254,
|
| 301 |
+
-8.07531452178955,
|
| 302 |
+
-8.134244918823242,
|
| 303 |
+
-8.143590927124023,
|
| 304 |
+
-8.190814018249512,
|
| 305 |
+
-8.217510223388672,
|
| 306 |
+
-8.175138473510742,
|
| 307 |
+
-7.989644527435303,
|
| 308 |
+
-8.09794807434082,
|
| 309 |
+
-8.24197006225586,
|
| 310 |
+
-8.232804298400879,
|
| 311 |
+
-8.328511238098145,
|
| 312 |
+
-8.389233589172363,
|
| 313 |
+
-8.267472267150879,
|
| 314 |
+
-8.301199913024902,
|
| 315 |
+
-8.36364459991455,
|
| 316 |
+
-8.43917465209961,
|
| 317 |
+
-8.493982315063477,
|
| 318 |
+
-8.481128692626953,
|
| 319 |
+
-8.429868698120117,
|
| 320 |
+
-8.501734733581543,
|
| 321 |
+
-8.54090404510498,
|
| 322 |
+
-8.568470001220703,
|
| 323 |
+
-8.50845718383789,
|
| 324 |
+
-8.597081184387207,
|
| 325 |
+
-8.513223648071289,
|
| 326 |
+
-8.38924503326416,
|
| 327 |
+
-8.502962112426758,
|
| 328 |
+
-8.518073081970215,
|
| 329 |
+
-8.56679916381836,
|
| 330 |
+
-8.698277473449707,
|
| 331 |
+
-8.630810737609863,
|
| 332 |
+
-8.755276679992676,
|
| 333 |
+
-8.700800895690918,
|
| 334 |
+
-8.74862003326416,
|
| 335 |
+
-8.734071731567383,
|
| 336 |
+
-8.633768081665039,
|
| 337 |
+
-8.633097648620605,
|
| 338 |
+
-8.872031211853027,
|
| 339 |
+
-8.828736305236816,
|
| 340 |
+
-8.753975868225098,
|
| 341 |
+
-8.886126518249512,
|
| 342 |
+
-8.758654594421387,
|
| 343 |
+
-8.883810997009277,
|
| 344 |
+
-8.952722549438477,
|
| 345 |
+
-8.945046424865723,
|
| 346 |
+
-8.907071113586426,
|
| 347 |
+
-8.891634941101074,
|
| 348 |
+
-8.91631031036377,
|
| 349 |
+
-8.951156616210938,
|
| 350 |
+
-8.931319236755371,
|
| 351 |
+
-8.960397720336914,
|
| 352 |
+
-8.841835975646973,
|
| 353 |
+
-8.834044456481934,
|
| 354 |
+
-8.786222457885742,
|
| 355 |
+
-8.903646469116211,
|
| 356 |
+
-8.947869300842285,
|
| 357 |
+
-8.696074485778809,
|
| 358 |
+
-8.99515438079834,
|
| 359 |
+
-9.005078315734863,
|
| 360 |
+
-8.934849739074707,
|
| 361 |
+
-8.99370002746582,
|
| 362 |
+
-9.030400276184082,
|
| 363 |
+
-9.101688385009766,
|
| 364 |
+
-9.08572006225586,
|
| 365 |
+
-9.075435638427734,
|
| 366 |
+
-9.125774383544922,
|
| 367 |
+
-9.102258682250977,
|
| 368 |
+
-9.160833358764648,
|
| 369 |
+
-8.999387741088867,
|
| 370 |
+
-8.929178237915039,
|
| 371 |
+
-9.085306167602539,
|
| 372 |
+
-9.149312019348145,
|
| 373 |
+
-9.201435089111328,
|
| 374 |
+
-9.119452476501465,
|
| 375 |
+
-9.192963600158691,
|
| 376 |
+
-9.153352737426758,
|
| 377 |
+
-9.16665267944336,
|
| 378 |
+
-9.187670707702637,
|
| 379 |
+
-9.213151931762695,
|
| 380 |
+
-9.295731544494629,
|
| 381 |
+
-9.204228401184082,
|
| 382 |
+
-9.2329683303833,
|
| 383 |
+
-9.198917388916016,
|
| 384 |
+
-9.242225646972656,
|
| 385 |
+
-9.251509666442871,
|
| 386 |
+
-9.233222007751465,
|
| 387 |
+
-9.235602378845215,
|
| 388 |
+
-9.264388084411621,
|
| 389 |
+
-9.286247253417969,
|
| 390 |
+
-9.287186622619629,
|
| 391 |
+
-9.327977180480957,
|
| 392 |
+
-9.304702758789062,
|
| 393 |
+
-9.34760570526123,
|
| 394 |
+
-9.314836502075195,
|
| 395 |
+
-9.300081253051758,
|
| 396 |
+
-9.20028018951416,
|
| 397 |
+
-9.35509967803955,
|
| 398 |
+
-9.345370292663574,
|
| 399 |
+
-9.36442756652832,
|
| 400 |
+
-9.351317405700684,
|
| 401 |
+
-9.352913856506348,
|
| 402 |
+
-9.388010025024414,
|
| 403 |
+
-9.326189994812012,
|
| 404 |
+
-9.411141395568848,
|
| 405 |
+
-9.424927711486816,
|
| 406 |
+
-9.376615524291992,
|
| 407 |
+
-9.394768714904785,
|
| 408 |
+
-9.382343292236328,
|
| 409 |
+
-9.345908164978027,
|
| 410 |
+
-9.387025833129883,
|
| 411 |
+
-9.397958755493164,
|
| 412 |
+
-9.370079040527344,
|
| 413 |
+
-9.419344902038574,
|
| 414 |
+
-9.414657592773438,
|
| 415 |
+
-9.450013160705566,
|
| 416 |
+
-9.424891471862793,
|
| 417 |
+
-9.468652725219727,
|
| 418 |
+
-9.437067031860352,
|
| 419 |
+
-9.452010154724121,
|
| 420 |
+
-9.476055145263672,
|
| 421 |
+
-9.454631805419922,
|
| 422 |
+
-9.519726753234863,
|
| 423 |
+
-9.494053840637207,
|
| 424 |
+
-9.349456787109375,
|
| 425 |
+
-9.444249153137207,
|
| 426 |
+
-9.432062149047852,
|
| 427 |
+
-9.469500541687012,
|
| 428 |
+
-9.506385803222656,
|
| 429 |
+
-9.541167259216309,
|
| 430 |
+
-9.514572143554688,
|
| 431 |
+
-9.517498016357422,
|
| 432 |
+
-9.508042335510254,
|
| 433 |
+
-9.524667739868164,
|
| 434 |
+
-9.513023376464844,
|
| 435 |
+
-9.518259048461914,
|
| 436 |
+
-9.491355895996094,
|
| 437 |
+
-9.527623176574707,
|
| 438 |
+
-9.503666877746582,
|
| 439 |
+
-9.575556755065918,
|
| 440 |
+
-9.51135540008545,
|
| 441 |
+
-9.574329376220703,
|
| 442 |
+
-9.559322357177734,
|
| 443 |
+
-9.576539993286133,
|
| 444 |
+
-9.587591171264648,
|
| 445 |
+
-9.615789413452148,
|
| 446 |
+
-9.586484909057617,
|
| 447 |
+
-9.597373008728027,
|
| 448 |
+
-9.565719604492188,
|
| 449 |
+
-9.580348014831543,
|
| 450 |
+
-9.544068336486816,
|
| 451 |
+
-9.576735496520996,
|
| 452 |
+
-9.617915153503418,
|
| 453 |
+
-9.634200096130371,
|
| 454 |
+
-9.50833511352539,
|
| 455 |
+
-9.633086204528809,
|
| 456 |
+
-9.622976303100586,
|
| 457 |
+
-9.628181457519531,
|
| 458 |
+
-9.385575294494629,
|
| 459 |
+
-9.312309265136719,
|
| 460 |
+
-8.996809005737305,
|
| 461 |
+
-9.591567993164062,
|
| 462 |
+
-9.602102279663086,
|
| 463 |
+
-9.606905937194824,
|
| 464 |
+
-9.660425186157227,
|
| 465 |
+
-9.59228229522705,
|
| 466 |
+
-9.66215991973877,
|
| 467 |
+
-9.652912139892578,
|
| 468 |
+
-9.683008193969727,
|
| 469 |
+
-9.550703048706055,
|
| 470 |
+
-9.616209983825684,
|
| 471 |
+
-7.74033784866333,
|
| 472 |
+
-8.984460830688477,
|
| 473 |
+
-9.136533737182617,
|
| 474 |
+
-9.426427841186523,
|
| 475 |
+
-9.425248146057129,
|
| 476 |
+
-9.407462120056152,
|
| 477 |
+
-9.523107528686523,
|
| 478 |
+
-9.644789695739746,
|
| 479 |
+
-9.905366897583008,
|
| 480 |
+
-10.454096794128418,
|
| 481 |
+
-10.200395584106445,
|
| 482 |
+
-10.226968765258789,
|
| 483 |
+
-10.290560722351074,
|
| 484 |
+
-10.246292114257812,
|
| 485 |
+
-10.437302589416504,
|
| 486 |
+
-10.198098182678223,
|
| 487 |
+
-10.481816291809082,
|
| 488 |
+
-10.32693862915039,
|
| 489 |
+
-10.341702461242676,
|
| 490 |
+
-10.409103393554688,
|
| 491 |
+
-10.289705276489258,
|
| 492 |
+
-10.6312255859375,
|
| 493 |
+
-10.405678749084473,
|
| 494 |
+
-10.4303617477417,
|
| 495 |
+
-10.45798110961914,
|
| 496 |
+
-10.44364070892334,
|
| 497 |
+
-10.312222480773926,
|
| 498 |
+
-10.264429092407227,
|
| 499 |
+
-10.573458671569824,
|
| 500 |
+
-10.27155590057373
|
| 501 |
+
],
|
| 502 |
+
"train_time_history": [
|
| 503 |
+
4284.811353683472,
|
| 504 |
+
4284.813168525696,
|
| 505 |
+
4239.820109844208,
|
| 506 |
+
4358.5235912799835,
|
| 507 |
+
4358.525362968445,
|
| 508 |
+
4289.520437240601,
|
| 509 |
+
4289.5296330451965,
|
| 510 |
+
4233.677313089371,
|
| 511 |
+
4233.679363965988,
|
| 512 |
+
4209.371140003204,
|
| 513 |
+
4209.381086587906,
|
| 514 |
+
4202.905996799469,
|
| 515 |
+
4469.978202342987,
|
| 516 |
+
4469.989181756973,
|
| 517 |
+
4247.160337924957,
|
| 518 |
+
4247.1704177856445,
|
| 519 |
+
4190.890568256378,
|
| 520 |
+
4190.900403022766,
|
| 521 |
+
4185.636907577515,
|
| 522 |
+
4185.647009372711,
|
| 523 |
+
4180.687466144562,
|
| 524 |
+
4215.30419754982,
|
| 525 |
+
4215.314230442047,
|
| 526 |
+
4206.753845453262,
|
| 527 |
+
4206.76371717453,
|
| 528 |
+
4206.280591726303,
|
| 529 |
+
4206.290879011154,
|
| 530 |
+
4222.331785202026,
|
| 531 |
+
4222.341979503632,
|
| 532 |
+
4220.95298576355,
|
| 533 |
+
4220.962949752808,
|
| 534 |
+
4199.66743016243,
|
| 535 |
+
4199.67768073082,
|
| 536 |
+
4200.696933507919,
|
| 537 |
+
4200.706924915314,
|
| 538 |
+
4200.071183204651,
|
| 539 |
+
4200.073669195175,
|
| 540 |
+
4201.461757183075,
|
| 541 |
+
4201.47197842598,
|
| 542 |
+
4212.675180196762,
|
| 543 |
+
4212.685215473175,
|
| 544 |
+
4266.539958238602,
|
| 545 |
+
4266.55042219162,
|
| 546 |
+
4254.028660058975,
|
| 547 |
+
4254.030869007111,
|
| 548 |
+
4586.545968532562,
|
| 549 |
+
4586.556686401367,
|
| 550 |
+
4486.801070451736,
|
| 551 |
+
4486.811651468277,
|
| 552 |
+
4201.306690454483,
|
| 553 |
+
4201.308066606522,
|
| 554 |
+
4204.077554225922,
|
| 555 |
+
4204.087781906128,
|
| 556 |
+
4194.944247722626,
|
| 557 |
+
4194.954358577728,
|
| 558 |
+
4193.961704969406,
|
| 559 |
+
4222.590797185898,
|
| 560 |
+
4222.594073057175,
|
| 561 |
+
4221.6570999622345,
|
| 562 |
+
4221.666466474533,
|
| 563 |
+
4221.045345544815,
|
| 564 |
+
4221.055670261383,
|
| 565 |
+
4214.11606669426,
|
| 566 |
+
4214.125596284866,
|
| 567 |
+
4479.404296398163,
|
| 568 |
+
4479.414994955063,
|
| 569 |
+
4262.62514591217,
|
| 570 |
+
4262.635618209839,
|
| 571 |
+
4214.268101215363,
|
| 572 |
+
4214.2785403728485,
|
| 573 |
+
4218.142910718918,
|
| 574 |
+
4218.15364408493,
|
| 575 |
+
4215.917347192764,
|
| 576 |
+
4215.927803516388,
|
| 577 |
+
4218.397645950317,
|
| 578 |
+
4218.408536672592,
|
| 579 |
+
4233.58446598053,
|
| 580 |
+
4233.59490442276,
|
| 581 |
+
4318.161808013916,
|
| 582 |
+
4318.171140432358,
|
| 583 |
+
4237.026048898697,
|
| 584 |
+
4237.036669015884,
|
| 585 |
+
4220.925004482269,
|
| 586 |
+
4220.9352016448975,
|
| 587 |
+
4226.221168041229,
|
| 588 |
+
4223.1825070381165,
|
| 589 |
+
4223.192782878876,
|
| 590 |
+
4219.102268218994,
|
| 591 |
+
4219.113127231598,
|
| 592 |
+
4216.297616004944,
|
| 593 |
+
4216.308108329773,
|
| 594 |
+
4217.926244974136,
|
| 595 |
+
4217.937202453613,
|
| 596 |
+
4426.571401119232,
|
| 597 |
+
4426.573066711426,
|
| 598 |
+
4612.790915489197,
|
| 599 |
+
4612.801674365997,
|
| 600 |
+
4719.1595368385315,
|
| 601 |
+
4719.169989824295,
|
| 602 |
+
4305.255445480347,
|
| 603 |
+
4305.266388177872,
|
| 604 |
+
4221.674624681473,
|
| 605 |
+
4221.686189174652,
|
| 606 |
+
4229.138904571533,
|
| 607 |
+
4178.568962574005,
|
| 608 |
+
4178.5717051029205,
|
| 609 |
+
4178.647545337677,
|
| 610 |
+
4178.650447130203,
|
| 611 |
+
4169.984578132629,
|
| 612 |
+
4169.995152950287,
|
| 613 |
+
4173.8019506931305,
|
| 614 |
+
4173.804402589798,
|
| 615 |
+
4179.692799806595,
|
| 616 |
+
4179.695784330368,
|
| 617 |
+
4176.926806688309,
|
| 618 |
+
4176.937863111496,
|
| 619 |
+
4189.7040383815765,
|
| 620 |
+
4189.7144474983215,
|
| 621 |
+
4194.854960680008,
|
| 622 |
+
4194.8661851882935,
|
| 623 |
+
4488.314256668091,
|
| 624 |
+
4488.324142932892,
|
| 625 |
+
4301.72206735611,
|
| 626 |
+
4301.732882022858,
|
| 627 |
+
4203.297667264938,
|
| 628 |
+
4203.307426214218,
|
| 629 |
+
4212.263510465622,
|
| 630 |
+
4212.2729642391205,
|
| 631 |
+
4202.838434457779,
|
| 632 |
+
4202.8495717048645,
|
| 633 |
+
4206.559844255447,
|
| 634 |
+
4206.570970535278,
|
| 635 |
+
4202.594026565552,
|
| 636 |
+
4202.6052367687225,
|
| 637 |
+
4204.671685695648,
|
| 638 |
+
4204.675058603287,
|
| 639 |
+
4201.653420209885,
|
| 640 |
+
4201.664590358734,
|
| 641 |
+
4203.356340646744,
|
| 642 |
+
4203.3675968647,
|
| 643 |
+
4226.834460258484,
|
| 644 |
+
4226.84539103508,
|
| 645 |
+
4432.4133422374725,
|
| 646 |
+
4432.424476385117,
|
| 647 |
+
4194.520195245743,
|
| 648 |
+
4194.531393289566,
|
| 649 |
+
4185.361557483673,
|
| 650 |
+
4185.372809171677,
|
| 651 |
+
4178.024575471878,
|
| 652 |
+
4178.035531282425,
|
| 653 |
+
4183.264570951462,
|
| 654 |
+
4183.275583267212,
|
| 655 |
+
4178.5521404743195,
|
| 656 |
+
4178.563311338425,
|
| 657 |
+
4178.228582620621,
|
| 658 |
+
4178.238200426102,
|
| 659 |
+
4181.432615280151,
|
| 660 |
+
4181.443482160568,
|
| 661 |
+
4181.636572599411,
|
| 662 |
+
4181.647958517075,
|
| 663 |
+
4180.119422197342,
|
| 664 |
+
4180.130319356918,
|
| 665 |
+
4181.348428249359,
|
| 666 |
+
4181.3601496219635,
|
| 667 |
+
4182.4969573020935,
|
| 668 |
+
4182.508371829987,
|
| 669 |
+
4255.815136909485,
|
| 670 |
+
4255.824706077576,
|
| 671 |
+
4447.2853989601135,
|
| 672 |
+
4447.294949054718,
|
| 673 |
+
4375.476977586746,
|
| 674 |
+
4375.488611936569,
|
| 675 |
+
4216.147409915924,
|
| 676 |
+
4216.157112836838,
|
| 677 |
+
4184.855574131012,
|
| 678 |
+
4184.867551803589,
|
| 679 |
+
4182.2731301784515,
|
| 680 |
+
4182.284587860107,
|
| 681 |
+
4182.427225112915,
|
| 682 |
+
4182.438867807388,
|
| 683 |
+
4181.939938545227,
|
| 684 |
+
4181.951656103134,
|
| 685 |
+
4183.5050485134125,
|
| 686 |
+
4183.516293287277,
|
| 687 |
+
4180.313590764999,
|
| 688 |
+
4180.325238704681,
|
| 689 |
+
4184.185824394226,
|
| 690 |
+
4184.196978807449,
|
| 691 |
+
4175.860624790192,
|
| 692 |
+
4175.8725233078,
|
| 693 |
+
4174.206290960312,
|
| 694 |
+
4174.217987298965,
|
| 695 |
+
4225.280811309814,
|
| 696 |
+
4346.787808179855,
|
| 697 |
+
4346.791662693024,
|
| 698 |
+
4299.203949213028,
|
| 699 |
+
4334.719336986542,
|
| 700 |
+
4334.72660279274,
|
| 701 |
+
4307.453342437744,
|
| 702 |
+
4307.463569164276,
|
| 703 |
+
4243.263749361038,
|
| 704 |
+
4243.27504825592,
|
| 705 |
+
4232.403777837753,
|
| 706 |
+
4232.415019750595,
|
| 707 |
+
4234.023860692978,
|
| 708 |
+
4234.036010503769,
|
| 709 |
+
4232.419568777084,
|
| 710 |
+
4232.430717229843,
|
| 711 |
+
4228.692707538605,
|
| 712 |
+
4228.695293188095,
|
| 713 |
+
4235.275017976761,
|
| 714 |
+
4235.286781549454,
|
| 715 |
+
4231.93186712265,
|
| 716 |
+
4231.934266328812,
|
| 717 |
+
4237.727004766464,
|
| 718 |
+
4237.736963748932,
|
| 719 |
+
4448.2472088336945,
|
| 720 |
+
4448.257912635803,
|
| 721 |
+
4283.024597644806,
|
| 722 |
+
4283.03609752655,
|
| 723 |
+
4270.3121337890625,
|
| 724 |
+
4270.324274778366,
|
| 725 |
+
4244.299434423447,
|
| 726 |
+
4244.311620950699,
|
| 727 |
+
4363.46278834343,
|
| 728 |
+
4180.62579703331,
|
| 729 |
+
4180.635629653931,
|
| 730 |
+
4363.069185256958,
|
| 731 |
+
4220.090236663818,
|
| 732 |
+
4220.102267503738,
|
| 733 |
+
4190.208593130112,
|
| 734 |
+
4190.220735549927,
|
| 735 |
+
4181.494255304337,
|
| 736 |
+
4181.50580906868,
|
| 737 |
+
4186.210835933685,
|
| 738 |
+
4186.214511394501,
|
| 739 |
+
4188.612834215164,
|
| 740 |
+
4188.625131607056,
|
| 741 |
+
4182.178534984589,
|
| 742 |
+
4182.189949512482,
|
| 743 |
+
4183.857384443283,
|
| 744 |
+
4183.869287014008,
|
| 745 |
+
4183.761756181717,
|
| 746 |
+
4241.330404281616,
|
| 747 |
+
4241.341110467911,
|
| 748 |
+
4207.978038311005,
|
| 749 |
+
4207.990997314453,
|
| 750 |
+
4209.410867214203,
|
| 751 |
+
4209.421168088913,
|
| 752 |
+
4207.717931270599,
|
| 753 |
+
4207.730401754379,
|
| 754 |
+
4204.301562309265,
|
| 755 |
+
4204.313354253769,
|
| 756 |
+
4297.861345052719,
|
| 757 |
+
4297.873908042908,
|
| 758 |
+
4282.807532548904,
|
| 759 |
+
4282.820100307465,
|
| 760 |
+
4269.668355226517,
|
| 761 |
+
4269.680841684341,
|
| 762 |
+
4198.918546676636,
|
| 763 |
+
4198.928604364395,
|
| 764 |
+
4239.654682636261,
|
| 765 |
+
4239.659080028534,
|
| 766 |
+
4419.87956905365,
|
| 767 |
+
4419.889652013779,
|
| 768 |
+
4302.591921806335,
|
| 769 |
+
4302.60400891304,
|
| 770 |
+
4199.097110033035,
|
| 771 |
+
4199.109765052795,
|
| 772 |
+
4202.586899995804,
|
| 773 |
+
4202.596865415573,
|
| 774 |
+
4223.580963373184,
|
| 775 |
+
4236.571214199066,
|
| 776 |
+
4236.583789110184,
|
| 777 |
+
4266.631365537643,
|
| 778 |
+
4266.643340587616,
|
| 779 |
+
4206.533836603165,
|
| 780 |
+
4206.543870687485,
|
| 781 |
+
4196.797498226166,
|
| 782 |
+
4196.809820890427,
|
| 783 |
+
4202.778592824936,
|
| 784 |
+
4202.791028261185,
|
| 785 |
+
4200.911655426025,
|
| 786 |
+
4200.922192811966,
|
| 787 |
+
4218.757748126984,
|
| 788 |
+
4218.7700316905975,
|
| 789 |
+
4197.834621667862,
|
| 790 |
+
4197.8472237586975,
|
| 791 |
+
4194.553659200668,
|
| 792 |
+
4194.558137655258,
|
| 793 |
+
4210.2872478961945,
|
| 794 |
+
4210.291656970978,
|
| 795 |
+
4269.952535390854,
|
| 796 |
+
4269.963551998138,
|
| 797 |
+
4214.965420722961,
|
| 798 |
+
4214.9777710437775,
|
| 799 |
+
4268.254637956619,
|
| 800 |
+
4268.267082452774,
|
| 801 |
+
4188.457591295242,
|
| 802 |
+
4188.467690706253,
|
| 803 |
+
4188.935349225998,
|
| 804 |
+
4188.947833776474,
|
| 805 |
+
4192.73951125145,
|
| 806 |
+
4192.749709367752,
|
| 807 |
+
4188.534428119659,
|
| 808 |
+
4188.53829908371,
|
| 809 |
+
4196.497691392899,
|
| 810 |
+
4196.510225534439,
|
| 811 |
+
4318.416720151901,
|
| 812 |
+
4318.4267864227295,
|
| 813 |
+
4209.298709154129,
|
| 814 |
+
4204.6052923202515,
|
| 815 |
+
4204.609621763229,
|
| 816 |
+
4192.598699092865,
|
| 817 |
+
4192.6110072135925,
|
| 818 |
+
4264.5488522052765,
|
| 819 |
+
4264.562687158585,
|
| 820 |
+
4342.3707575798035,
|
| 821 |
+
4342.3756980896,
|
| 822 |
+
4299.415410995483,
|
| 823 |
+
4299.425767421722,
|
| 824 |
+
4285.986501693726,
|
| 825 |
+
4285.999414205551,
|
| 826 |
+
4251.881839513779,
|
| 827 |
+
4251.89198923111,
|
| 828 |
+
4217.251371145248,
|
| 829 |
+
4217.262971401215,
|
| 830 |
+
4265.004074335098,
|
| 831 |
+
4265.016601800919,
|
| 832 |
+
4422.643936634064,
|
| 833 |
+
4453.576984167099,
|
| 834 |
+
4453.588968753815,
|
| 835 |
+
4183.795456409454,
|
| 836 |
+
4183.80871462822,
|
| 837 |
+
4183.177849292755,
|
| 838 |
+
4183.1909646987915,
|
| 839 |
+
4190.727601289749,
|
| 840 |
+
4190.740168809891,
|
| 841 |
+
4185.585786104202,
|
| 842 |
+
4185.596675872803,
|
| 843 |
+
4186.326423406601,
|
| 844 |
+
4186.3365132808685,
|
| 845 |
+
4188.701127767563,
|
| 846 |
+
4188.713495969772,
|
| 847 |
+
4183.693524837494,
|
| 848 |
+
4183.706875085831,
|
| 849 |
+
4182.603164672852,
|
| 850 |
+
4182.169225692749,
|
| 851 |
+
4182.182250261307,
|
| 852 |
+
4183.1377918720245,
|
| 853 |
+
4183.142628669739,
|
| 854 |
+
4179.616315603256,
|
| 855 |
+
4179.626562833786,
|
| 856 |
+
4304.994537830353,
|
| 857 |
+
4305.007478475571,
|
| 858 |
+
4361.554908275604,
|
| 859 |
+
4361.56044960022,
|
| 860 |
+
4368.104673624039,
|
| 861 |
+
4368.11031460762,
|
| 862 |
+
4246.525162935257,
|
| 863 |
+
4246.5380046367645,
|
| 864 |
+
4183.925352096558,
|
| 865 |
+
4232.265904188156,
|
| 866 |
+
4232.277180671692,
|
| 867 |
+
4238.892568349838,
|
| 868 |
+
4238.905729055405,
|
| 869 |
+
4187.827491521835,
|
| 870 |
+
4187.84108877182,
|
| 871 |
+
4190.126079082489,
|
| 872 |
+
4190.13965845108,
|
| 873 |
+
4190.435103654861,
|
| 874 |
+
4190.440406799316,
|
| 875 |
+
4191.884477853775,
|
| 876 |
+
4191.897578239441,
|
| 877 |
+
4187.4977107048035,
|
| 878 |
+
4172.838095903397,
|
| 879 |
+
4172.843760967255,
|
| 880 |
+
4177.684302330017,
|
| 881 |
+
4177.6969130039215,
|
| 882 |
+
4172.654875993729,
|
| 883 |
+
4172.667930603027,
|
| 884 |
+
4174.483522415161,
|
| 885 |
+
4174.496375083923,
|
| 886 |
+
4166.372047901154,
|
| 887 |
+
4166.384793281555,
|
| 888 |
+
4283.736061811447,
|
| 889 |
+
3653.9717135429382,
|
| 890 |
+
3653.97727560997,
|
| 891 |
+
3628.154771566391,
|
| 892 |
+
3628.159923315048,
|
| 893 |
+
3652.242630004883,
|
| 894 |
+
3652.2448382377625,
|
| 895 |
+
3646.540367603302,
|
| 896 |
+
3646.542966604233,
|
| 897 |
+
3608.7122309207916,
|
| 898 |
+
3608.717301607132,
|
| 899 |
+
3608.8411026000977,
|
| 900 |
+
3608.846682548523,
|
| 901 |
+
3606.5311863422394,
|
| 902 |
+
3606.5361762046814,
|
| 903 |
+
3611.4129967689514,
|
| 904 |
+
3611.418157339096,
|
| 905 |
+
3610.7246301174164,
|
| 906 |
+
3610.729764699936,
|
| 907 |
+
3607.0119185447693,
|
| 908 |
+
3607.0174593925476,
|
| 909 |
+
3607.5829951763153,
|
| 910 |
+
3607.5891518592834,
|
| 911 |
+
3607.95986866951,
|
| 912 |
+
3607.964668035507,
|
| 913 |
+
3614.2318153381348,
|
| 914 |
+
3614.2375481128693,
|
| 915 |
+
3618.1517746448517,
|
| 916 |
+
3618.1568336486816,
|
| 917 |
+
3622.268902540207,
|
| 918 |
+
3667.4287581443787,
|
| 919 |
+
3667.433854341507,
|
| 920 |
+
3623.2074506282806,
|
| 921 |
+
3623.212779045105,
|
| 922 |
+
3643.333916425705,
|
| 923 |
+
3643.339797258377,
|
| 924 |
+
3641.6545128822327,
|
| 925 |
+
3641.6596987247467,
|
| 926 |
+
3627.3986847400665,
|
| 927 |
+
3627.4038894176483,
|
| 928 |
+
3628.012758731842,
|
| 929 |
+
3628.017865419388,
|
| 930 |
+
3635.4565312862396,
|
| 931 |
+
3635.461765527725,
|
| 932 |
+
3620.5242562294006,
|
| 933 |
+
3620.529673099518,
|
| 934 |
+
3640.751862049103,
|
| 935 |
+
3640.7576014995575,
|
| 936 |
+
3647.903746366501,
|
| 937 |
+
3647.9063782691956,
|
| 938 |
+
3645.3971898555756,
|
| 939 |
+
3645.4029626846313,
|
| 940 |
+
3645.676437139511,
|
| 941 |
+
3645.680727005005,
|
| 942 |
+
3624.857933282852,
|
| 943 |
+
3624.863513469696,
|
| 944 |
+
3629.3647339344025,
|
| 945 |
+
3629.370223760605,
|
| 946 |
+
3664.6942942142487
|
| 947 |
+
],
|
| 948 |
+
"valid_loss_history": [
|
| 949 |
+
-2.2420080729893277,
|
| 950 |
+
-3.6040473665509904,
|
| 951 |
+
-4.652349131447928,
|
| 952 |
+
-5.269411563873291,
|
| 953 |
+
-5.602223873138428,
|
| 954 |
+
-5.948959009987967,
|
| 955 |
+
-6.180064678192139,
|
| 956 |
+
-6.373329707554409,
|
| 957 |
+
-6.4635710035051614,
|
| 958 |
+
-6.628378936222622,
|
| 959 |
+
-6.765629632132394,
|
| 960 |
+
-6.878908634185791,
|
| 961 |
+
-6.975889819008963,
|
| 962 |
+
-7.089849744524274,
|
| 963 |
+
-7.137168339320591,
|
| 964 |
+
-7.214839458465576,
|
| 965 |
+
-7.248862539018903,
|
| 966 |
+
-7.323270389011928,
|
| 967 |
+
-7.374068532671247,
|
| 968 |
+
-7.447478975568499,
|
| 969 |
+
-7.470496041434152,
|
| 970 |
+
-7.578763212476458,
|
| 971 |
+
-7.638515608651297,
|
| 972 |
+
-7.603791032518659,
|
| 973 |
+
-7.658165522984096,
|
| 974 |
+
-7.660087721688407,
|
| 975 |
+
-7.711926255907331,
|
| 976 |
+
-7.763034411839077,
|
| 977 |
+
-7.80566440309797,
|
| 978 |
+
-7.829599516732352,
|
| 979 |
+
-7.908110482352121,
|
| 980 |
+
-7.871029717581613,
|
| 981 |
+
-7.790640013558524,
|
| 982 |
+
-7.807113443102155,
|
| 983 |
+
-7.826304980686733,
|
| 984 |
+
-7.77531235558646,
|
| 985 |
+
-7.879563399723598,
|
| 986 |
+
-7.897988796234131,
|
| 987 |
+
-7.845814909253802,
|
| 988 |
+
-7.848473821367536,
|
| 989 |
+
-7.912371976034982,
|
| 990 |
+
-7.943405968802316,
|
| 991 |
+
-8.085525648934501,
|
| 992 |
+
-8.010899543762207,
|
| 993 |
+
-8.028815746307373,
|
| 994 |
+
-8.061845302581787,
|
| 995 |
+
-8.02747140611921,
|
| 996 |
+
-8.03413268498012,
|
| 997 |
+
-8.033596924373082,
|
| 998 |
+
-8.068816934313093,
|
| 999 |
+
-8.067536762782506,
|
| 1000 |
+
-8.144167695726667,
|
| 1001 |
+
-8.148260184696742,
|
| 1002 |
+
-8.180625711168561,
|
| 1003 |
+
-8.180845873696464,
|
| 1004 |
+
-8.25086770738874,
|
| 1005 |
+
-8.261961323874337,
|
| 1006 |
+
-8.260808059147426,
|
| 1007 |
+
-8.186679295131139,
|
| 1008 |
+
-8.165157794952393,
|
| 1009 |
+
-8.194125039236885,
|
| 1010 |
+
-8.254536492483956,
|
| 1011 |
+
-8.292360033307757,
|
| 1012 |
+
-8.267435346330915,
|
| 1013 |
+
-8.27747208731515,
|
| 1014 |
+
-8.366285255977086,
|
| 1015 |
+
-8.354675361088344,
|
| 1016 |
+
-8.365063190460205,
|
| 1017 |
+
-8.427791595458984,
|
| 1018 |
+
-8.452910355159215,
|
| 1019 |
+
-8.395057133265905,
|
| 1020 |
+
-8.455147879464286,
|
| 1021 |
+
-8.485073634556361,
|
| 1022 |
+
-8.504877976008824,
|
| 1023 |
+
-8.502339363098145,
|
| 1024 |
+
-8.485261576516288,
|
| 1025 |
+
-8.50761045728411,
|
| 1026 |
+
-8.482435567038399,
|
| 1027 |
+
-8.516456604003906,
|
| 1028 |
+
-8.503895146506173,
|
| 1029 |
+
-8.515655858176094,
|
| 1030 |
+
-8.574515002114433,
|
| 1031 |
+
-8.580681255885533,
|
| 1032 |
+
-8.593669959477015,
|
| 1033 |
+
-8.538264206477574,
|
| 1034 |
+
-8.570460319519043,
|
| 1035 |
+
-8.610838617597308,
|
| 1036 |
+
-8.576563426426478,
|
| 1037 |
+
-8.631826945713588,
|
| 1038 |
+
-8.593990189688546,
|
| 1039 |
+
-8.584804126194545,
|
| 1040 |
+
-8.616937228611537,
|
| 1041 |
+
-8.616405078342982,
|
| 1042 |
+
-8.636415685926165,
|
| 1043 |
+
-8.736162253788539,
|
| 1044 |
+
-8.684600080762591,
|
| 1045 |
+
-8.751097747257777,
|
| 1046 |
+
-8.744481086730957,
|
| 1047 |
+
-8.760670593806676,
|
| 1048 |
+
-8.81410721370152,
|
| 1049 |
+
-8.762031418936592,
|
| 1050 |
+
-8.731195313589913,
|
| 1051 |
+
-8.680067879813057,
|
| 1052 |
+
-8.73148284639631,
|
| 1053 |
+
-8.770104340144567,
|
| 1054 |
+
-8.83363403592791,
|
| 1055 |
+
-8.797364848000663,
|
| 1056 |
+
-8.756126131330218,
|
| 1057 |
+
-8.717773846217565,
|
| 1058 |
+
-8.755549158368792,
|
| 1059 |
+
-8.798967293330602,
|
| 1060 |
+
-8.80781262261527,
|
| 1061 |
+
-8.879967212677002,
|
| 1062 |
+
-8.83057907649449,
|
| 1063 |
+
-8.910664354051862,
|
| 1064 |
+
-8.930669920785087,
|
| 1065 |
+
-8.850233895438057,
|
| 1066 |
+
-8.87684679031372,
|
| 1067 |
+
-8.860790797642299,
|
| 1068 |
+
-8.854635306767054,
|
| 1069 |
+
-8.871529306684222,
|
| 1070 |
+
-8.870055334908622,
|
| 1071 |
+
-8.814562388828822,
|
| 1072 |
+
-8.895111628941127,
|
| 1073 |
+
-8.95235286440168,
|
| 1074 |
+
-8.978583880833217,
|
| 1075 |
+
-8.970093931470599,
|
| 1076 |
+
-8.94366032736642,
|
| 1077 |
+
-8.930564199175153,
|
| 1078 |
+
-8.896938255855016,
|
| 1079 |
+
-9.003027439117432,
|
| 1080 |
+
-8.967686380658831,
|
| 1081 |
+
-8.945790427071708,
|
| 1082 |
+
-8.978134904588972,
|
| 1083 |
+
-8.926983833312988,
|
| 1084 |
+
-8.911829403468541,
|
| 1085 |
+
-9.004649843488421,
|
| 1086 |
+
-8.982011726924352,
|
| 1087 |
+
-9.004248074122838,
|
| 1088 |
+
-9.022075244358607,
|
| 1089 |
+
-9.055972508021764,
|
| 1090 |
+
-9.095445496695381,
|
| 1091 |
+
-9.014348983764648,
|
| 1092 |
+
-9.017100266047887,
|
| 1093 |
+
-9.06740631375994,
|
| 1094 |
+
-9.062205382755824,
|
| 1095 |
+
-9.006571020398821,
|
| 1096 |
+
-9.060756206512451,
|
| 1097 |
+
-9.114073821476527,
|
| 1098 |
+
-9.12088053567069,
|
| 1099 |
+
-9.146572181156703,
|
| 1100 |
+
-9.129499162946429,
|
| 1101 |
+
-9.162499564034599,
|
| 1102 |
+
-9.146372726985387,
|
| 1103 |
+
-9.138916151864189,
|
| 1104 |
+
-9.140360014779228,
|
| 1105 |
+
-9.14337342126029,
|
| 1106 |
+
-9.13001537322998,
|
| 1107 |
+
-9.089552674974714,
|
| 1108 |
+
-9.172866821289062,
|
| 1109 |
+
-9.200943265642438,
|
| 1110 |
+
-9.191112245832171,
|
| 1111 |
+
-9.207633904048375,
|
| 1112 |
+
-9.147029059273857,
|
| 1113 |
+
-9.17673145021711,
|
| 1114 |
+
-9.129148755754743,
|
| 1115 |
+
-9.157607623508998,
|
| 1116 |
+
-9.13064786366054,
|
| 1117 |
+
-9.154420512063163,
|
| 1118 |
+
-9.181631565093994,
|
| 1119 |
+
-9.155359063829694,
|
| 1120 |
+
-9.158296721322197,
|
| 1121 |
+
-9.156671251569476,
|
| 1122 |
+
-9.154706001281738,
|
| 1123 |
+
-9.167226382664271,
|
| 1124 |
+
-9.163607052394322,
|
| 1125 |
+
-9.209595475878034,
|
| 1126 |
+
-9.310745784214564,
|
| 1127 |
+
-9.238739694867816,
|
| 1128 |
+
-9.288273334503174,
|
| 1129 |
+
-9.2847033228193,
|
| 1130 |
+
-9.313508306230817,
|
| 1131 |
+
-9.334877354758126,
|
| 1132 |
+
-9.270281859806605,
|
| 1133 |
+
-9.189015797206334,
|
| 1134 |
+
-9.247245516095843,
|
| 1135 |
+
-9.272651195526123,
|
| 1136 |
+
-9.430454867226738,
|
| 1137 |
+
-9.431772300175258,
|
| 1138 |
+
-9.406911509377617,
|
| 1139 |
+
-9.434791496821813,
|
| 1140 |
+
-9.40122835976737,
|
| 1141 |
+
-9.331563881465367,
|
| 1142 |
+
-9.266850130898613,
|
| 1143 |
+
-9.263189588274274,
|
| 1144 |
+
-9.341036796569824,
|
| 1145 |
+
-9.302794524601527,
|
| 1146 |
+
-9.364838123321533,
|
| 1147 |
+
-9.468104021889824,
|
| 1148 |
+
-9.427109173366002,
|
| 1149 |
+
-7.843665736062186,
|
| 1150 |
+
-8.728734561375209,
|
| 1151 |
+
-8.838280609675817,
|
| 1152 |
+
-8.92993450164795,
|
| 1153 |
+
-8.983613082340785,
|
| 1154 |
+
-9.073682171957833,
|
| 1155 |
+
-9.108558654785156,
|
| 1156 |
+
-9.189598287854876,
|
| 1157 |
+
-9.265264647347587,
|
| 1158 |
+
-9.251329898834229,
|
| 1159 |
+
-9.130606515066964,
|
| 1160 |
+
-9.009151935577393,
|
| 1161 |
+
-8.934782436915807,
|
| 1162 |
+
-8.819936275482178,
|
| 1163 |
+
-8.798521995544434,
|
| 1164 |
+
-9.209092957632881,
|
| 1165 |
+
-9.022621767861503,
|
| 1166 |
+
-8.851909978049141,
|
| 1167 |
+
-8.771939413888115,
|
| 1168 |
+
-8.668200629098076,
|
| 1169 |
+
-8.705128737858363,
|
| 1170 |
+
-8.659645216805595,
|
| 1171 |
+
-8.618493284497942,
|
| 1172 |
+
-8.664817196982247,
|
| 1173 |
+
-8.702061380658831,
|
| 1174 |
+
-8.703030790601458,
|
| 1175 |
+
-8.59453991481236,
|
| 1176 |
+
-8.648234503609794,
|
| 1177 |
+
-8.672728061676025,
|
| 1178 |
+
-8.749418190547399
|
| 1179 |
+
]
|
| 1180 |
+
}
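A note for readers of this log: the `valid_loss_history` values are negative and mostly decrease over training, which suggests a negated quality metric where lower is better. That reading is an assumption; the diff does not say which loss is used. Under that assumption, a minimal sketch for locating the best validation entry could look like the following. The helper name `best_validation_epoch` and the toy values are illustrative only and are not part of the repository.

```python
import json


def best_validation_epoch(history: dict) -> tuple[int, float]:
    """Return (zero-based index, value) of the lowest validation loss.

    Assumes lower is better, which the diff itself does not confirm.
    """
    losses = history["valid_loss_history"]
    best_index = min(range(len(losses)), key=losses.__getitem__)
    return best_index, losses[best_index]


# Toy data shaped like the logged file; the numbers are made up.
toy_history = json.loads('{"valid_loss_history": [-2.24, -3.60, -4.65, -4.10]}')
index, loss = best_validation_epoch(toy_history)
print(f"best index: {index}, loss: {loss}")
```

To apply it to a real log, load the file with `json.load` and pass the resulting dictionary to the helper.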
|
MedleyVox-MultiSinger/vocal 231/loss_graph_vocals.png
ADDED
|
MelBand-Roformer-Deux-Becruily/.gitattributes
ADDED
|
@@ -0,0 +1,35 @@
|
| 1 |
+
*.7z filter=lfs diff=lfs merge=lfs -text
|
| 2 |
+
*.arrow filter=lfs diff=lfs merge=lfs -text
|
| 3 |
+
*.bin filter=lfs diff=lfs merge=lfs -text
|
| 4 |
+
*.bz2 filter=lfs diff=lfs merge=lfs -text
|
| 5 |
+
*.ckpt filter=lfs diff=lfs merge=lfs -text
|
| 6 |
+
*.ftz filter=lfs diff=lfs merge=lfs -text
|
| 7 |
+
*.gz filter=lfs diff=lfs merge=lfs -text
|
| 8 |
+
*.h5 filter=lfs diff=lfs merge=lfs -text
|
| 9 |
+
*.joblib filter=lfs diff=lfs merge=lfs -text
|
| 10 |
+
*.lfs.* filter=lfs diff=lfs merge=lfs -text
|
| 11 |
+
*.mlmodel filter=lfs diff=lfs merge=lfs -text
|
| 12 |
+
*.model filter=lfs diff=lfs merge=lfs -text
|
| 13 |
+
*.msgpack filter=lfs diff=lfs merge=lfs -text
|
| 14 |
+
*.npy filter=lfs diff=lfs merge=lfs -text
|
| 15 |
+
*.npz filter=lfs diff=lfs merge=lfs -text
|
| 16 |
+
*.onnx filter=lfs diff=lfs merge=lfs -text
|
| 17 |
+
*.ot filter=lfs diff=lfs merge=lfs -text
|
| 18 |
+
*.parquet filter=lfs diff=lfs merge=lfs -text
|
| 19 |
+
*.pb filter=lfs diff=lfs merge=lfs -text
|
| 20 |
+
*.pickle filter=lfs diff=lfs merge=lfs -text
|
| 21 |
+
*.pkl filter=lfs diff=lfs merge=lfs -text
|
| 22 |
+
*.pt filter=lfs diff=lfs merge=lfs -text
|
| 23 |
+
*.pth filter=lfs diff=lfs merge=lfs -text
|
| 24 |
+
*.rar filter=lfs diff=lfs merge=lfs -text
|
| 25 |
+
*.safetensors filter=lfs diff=lfs merge=lfs -text
|
| 26 |
+
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
|
| 27 |
+
*.tar.* filter=lfs diff=lfs merge=lfs -text
|
| 28 |
+
*.tar filter=lfs diff=lfs merge=lfs -text
|
| 29 |
+
*.tflite filter=lfs diff=lfs merge=lfs -text
|
| 30 |
+
*.tgz filter=lfs diff=lfs merge=lfs -text
|
| 31 |
+
*.wasm filter=lfs diff=lfs merge=lfs -text
|
| 32 |
+
*.xz filter=lfs diff=lfs merge=lfs -text
|
| 33 |
+
*.zip filter=lfs diff=lfs merge=lfs -text
|
| 34 |
+
*.zst filter=lfs diff=lfs merge=lfs -text
|
| 35 |
+
*tfevents* filter=lfs diff=lfs merge=lfs -text
|
MelBand-Roformer-Deux-Becruily/README.md
ADDED
|
@@ -0,0 +1,8 @@
|
| 1 |
+
---
|
| 2 |
+
license: cc-by-nc-4.0
|
| 3 |
+
---
|
| 4 |
+
Dual model for vocal and instrumental separation based on Mel-Band RoFormer architecture.
|
| 5 |
+
|
| 6 |
+
Metric sdr for instrum: 17.5466
|
| 7 |
+
|
| 8 |
+
Metric sdr for vocals: 11.3695
|
MelBand-Roformer-Deux-Becruily/config_deux_becruily.yaml
ADDED
|
@@ -0,0 +1,64 @@
|
| 1 |
+
audio:
|
| 2 |
+
chunk_size: 573300
|
| 3 |
+
dim_f: 1024
|
| 4 |
+
dim_t: 256
|
| 5 |
+
hop_length: 441
|
| 6 |
+
n_fft: 2048
|
| 7 |
+
num_channels: 2
|
| 8 |
+
sample_rate: 44100
|
| 9 |
+
min_mean_abs: 0.000
|
| 10 |
+
|
| 11 |
+
model:
|
| 12 |
+
dim: 256
|
| 13 |
+
depth: 12
|
| 14 |
+
stereo: true
|
| 15 |
+
num_stems: 2
|
| 16 |
+
time_transformer_depth: 1
|
| 17 |
+
freq_transformer_depth: 1
|
| 18 |
+
num_bands: 60
|
| 19 |
+
dim_head: 64
|
| 20 |
+
heads: 8
|
| 21 |
+
attn_dropout: 0
|
| 22 |
+
ff_dropout: 0
|
| 23 |
+
flash_attn: true
|
| 24 |
+
dim_freqs_in: 1025
|
| 25 |
+
sample_rate: 44100
|
| 26 |
+
stft_n_fft: 2048
|
| 27 |
+
stft_hop_length: 441
|
| 28 |
+
stft_win_length: 2048
|
| 29 |
+
stft_normalized: false
|
| 30 |
+
mask_estimator_depth: 2
|
| 31 |
+
multi_stft_resolution_loss_weight: 1.0
|
| 32 |
+
multi_stft_resolutions_window_sizes: !!python/tuple
|
| 33 |
+
- 4096
|
| 34 |
+
- 2048
|
| 35 |
+
- 1024
|
| 36 |
+
- 512
|
| 37 |
+
- 256
|
| 38 |
+
multi_stft_hop_size: 147
|
| 39 |
+
multi_stft_normalized: false
|
| 40 |
+
|
| 41 |
+
training:
|
| 42 |
+
batch_size: 1
|
| 43 |
+
gradient_accumulation_steps: 1
|
| 44 |
+
grad_clip: 0
|
| 45 |
+
instruments:
|
| 46 |
+
- Vocals
|
| 47 |
+
- Instrumental
|
| 48 |
+
lr: 0.0001
|
| 49 |
+
patience: 2
|
| 50 |
+
reduce_factor: 0.95
|
| 51 |
+
target_instrument:
|
| 52 |
+
num_epochs: 1000
|
| 53 |
+
num_steps: 1000
|
| 54 |
+
q: 0.95
|
| 55 |
+
coarse_loss_clip: false
|
| 56 |
+
ema_momentum: 0.999
|
| 57 |
+
optimizer: adamw
|
| 58 |
+
other_fix: false
|
| 59 |
+
use_amp: true
|
| 60 |
+
|
| 61 |
+
inference:
|
| 62 |
+
batch_size: 1
|
| 63 |
+
dim_t: 1101
|
| 64 |
+
num_overlap: 2
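The `audio` values in these configs are easier to read once converted to seconds and STFT frames. The sketch below does that arithmetic for the three `chunk_size` values that appear in these configs (573300, 352800 and 485100), all with `sample_rate: 44100` and `hop_length: 441`. The frame formula assumes a centre-padded STFT (frames = chunk_size // hop_length + 1). The diff does not state how the training code frames audio, so treat this as a rough check rather than a description of the actual pipeline. The function name `chunk_summary` is hypothetical.

```python
def chunk_summary(chunk_size: int, sample_rate: int, hop_length: int) -> tuple[float, int]:
    """Return (duration in seconds, frame count) for one audio chunk.

    Assumes a centre-padded STFT: frames = chunk_size // hop_length + 1.
    """
    seconds = chunk_size / sample_rate
    frames = chunk_size // hop_length + 1
    return seconds, frames


# chunk_size values taken from the configs in this commit.
for chunk_size in (573300, 352800, 485100):
    seconds, frames = chunk_summary(chunk_size, sample_rate=44100, hop_length=441)
    print(f"chunk_size={chunk_size}: {seconds} s, {frames} frames")
```

Under these assumptions, 485100 samples correspond to 1101 frames and 352800 samples to 801 frames, which matches `dim_t` values that appear in some of the configs.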
|
MelBandRoformer-Original/.gitattributes
ADDED
|
@@ -0,0 +1,35 @@
|
| 1 |
+
*.7z filter=lfs diff=lfs merge=lfs -text
|
| 2 |
+
*.arrow filter=lfs diff=lfs merge=lfs -text
|
| 3 |
+
*.bin filter=lfs diff=lfs merge=lfs -text
|
| 4 |
+
*.bz2 filter=lfs diff=lfs merge=lfs -text
|
| 5 |
+
*.ckpt filter=lfs diff=lfs merge=lfs -text
|
| 6 |
+
*.ftz filter=lfs diff=lfs merge=lfs -text
|
| 7 |
+
*.gz filter=lfs diff=lfs merge=lfs -text
|
| 8 |
+
*.h5 filter=lfs diff=lfs merge=lfs -text
|
| 9 |
+
*.joblib filter=lfs diff=lfs merge=lfs -text
|
| 10 |
+
*.lfs.* filter=lfs diff=lfs merge=lfs -text
|
| 11 |
+
*.mlmodel filter=lfs diff=lfs merge=lfs -text
|
| 12 |
+
*.model filter=lfs diff=lfs merge=lfs -text
|
| 13 |
+
*.msgpack filter=lfs diff=lfs merge=lfs -text
|
| 14 |
+
*.npy filter=lfs diff=lfs merge=lfs -text
|
| 15 |
+
*.npz filter=lfs diff=lfs merge=lfs -text
|
| 16 |
+
*.onnx filter=lfs diff=lfs merge=lfs -text
|
| 17 |
+
*.ot filter=lfs diff=lfs merge=lfs -text
|
| 18 |
+
*.parquet filter=lfs diff=lfs merge=lfs -text
|
| 19 |
+
*.pb filter=lfs diff=lfs merge=lfs -text
|
| 20 |
+
*.pickle filter=lfs diff=lfs merge=lfs -text
|
| 21 |
+
*.pkl filter=lfs diff=lfs merge=lfs -text
|
| 22 |
+
*.pt filter=lfs diff=lfs merge=lfs -text
|
| 23 |
+
*.pth filter=lfs diff=lfs merge=lfs -text
|
| 24 |
+
*.rar filter=lfs diff=lfs merge=lfs -text
|
| 25 |
+
*.safetensors filter=lfs diff=lfs merge=lfs -text
|
| 26 |
+
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
|
| 27 |
+
*.tar.* filter=lfs diff=lfs merge=lfs -text
|
| 28 |
+
*.tar filter=lfs diff=lfs merge=lfs -text
|
| 29 |
+
*.tflite filter=lfs diff=lfs merge=lfs -text
|
| 30 |
+
*.tgz filter=lfs diff=lfs merge=lfs -text
|
| 31 |
+
*.wasm filter=lfs diff=lfs merge=lfs -text
|
| 32 |
+
*.xz filter=lfs diff=lfs merge=lfs -text
|
| 33 |
+
*.zip filter=lfs diff=lfs merge=lfs -text
|
| 34 |
+
*.zst filter=lfs diff=lfs merge=lfs -text
|
| 35 |
+
*tfevents* filter=lfs diff=lfs merge=lfs -text
|
MelBandRoformer-Original/README.md
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
---
|
| 2 |
+
license: gpl-3.0
|
| 3 |
+
---
|
MelBandRoformers/.gitattributes
ADDED
|
@@ -0,0 +1,35 @@
|
| 1 |
+
*.7z filter=lfs diff=lfs merge=lfs -text
|
| 2 |
+
*.arrow filter=lfs diff=lfs merge=lfs -text
|
| 3 |
+
*.bin filter=lfs diff=lfs merge=lfs -text
|
| 4 |
+
*.bz2 filter=lfs diff=lfs merge=lfs -text
|
| 5 |
+
*.ckpt filter=lfs diff=lfs merge=lfs -text
|
| 6 |
+
*.ftz filter=lfs diff=lfs merge=lfs -text
|
| 7 |
+
*.gz filter=lfs diff=lfs merge=lfs -text
|
| 8 |
+
*.h5 filter=lfs diff=lfs merge=lfs -text
|
| 9 |
+
*.joblib filter=lfs diff=lfs merge=lfs -text
|
| 10 |
+
*.lfs.* filter=lfs diff=lfs merge=lfs -text
|
| 11 |
+
*.mlmodel filter=lfs diff=lfs merge=lfs -text
|
| 12 |
+
*.model filter=lfs diff=lfs merge=lfs -text
|
| 13 |
+
*.msgpack filter=lfs diff=lfs merge=lfs -text
|
| 14 |
+
*.npy filter=lfs diff=lfs merge=lfs -text
|
| 15 |
+
*.npz filter=lfs diff=lfs merge=lfs -text
|
| 16 |
+
*.onnx filter=lfs diff=lfs merge=lfs -text
|
| 17 |
+
*.ot filter=lfs diff=lfs merge=lfs -text
|
| 18 |
+
*.parquet filter=lfs diff=lfs merge=lfs -text
|
| 19 |
+
*.pb filter=lfs diff=lfs merge=lfs -text
|
| 20 |
+
*.pickle filter=lfs diff=lfs merge=lfs -text
|
| 21 |
+
*.pkl filter=lfs diff=lfs merge=lfs -text
|
| 22 |
+
*.pt filter=lfs diff=lfs merge=lfs -text
|
| 23 |
+
*.pth filter=lfs diff=lfs merge=lfs -text
|
| 24 |
+
*.rar filter=lfs diff=lfs merge=lfs -text
|
| 25 |
+
*.safetensors filter=lfs diff=lfs merge=lfs -text
|
| 26 |
+
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
|
| 27 |
+
*.tar.* filter=lfs diff=lfs merge=lfs -text
|
| 28 |
+
*.tar filter=lfs diff=lfs merge=lfs -text
|
| 29 |
+
*.tflite filter=lfs diff=lfs merge=lfs -text
|
| 30 |
+
*.tgz filter=lfs diff=lfs merge=lfs -text
|
| 31 |
+
*.wasm filter=lfs diff=lfs merge=lfs -text
|
| 32 |
+
*.xz filter=lfs diff=lfs merge=lfs -text
|
| 33 |
+
*.zip filter=lfs diff=lfs merge=lfs -text
|
| 34 |
+
*.zst filter=lfs diff=lfs merge=lfs -text
|
| 35 |
+
*tfevents* filter=lfs diff=lfs merge=lfs -text
|
MelBandRoformers/bsroformers/karaoke_bs_roformer.yaml
ADDED
|
@@ -0,0 +1,129 @@
|
| 1 |
+
audio:
|
| 2 |
+
chunk_size: 352800
|
| 3 |
+
dim_f: 1024
|
| 4 |
+
dim_t: 801 # don't work (use in model)
|
| 5 |
+
hop_length: 441 # don't work (use in model)
|
| 6 |
+
n_fft: 2048
|
| 7 |
+
num_channels: 2
|
| 8 |
+
sample_rate: 44100
|
| 9 |
+
min_mean_abs: 0.000
|
| 10 |
+
|
| 11 |
+
model:
|
| 12 |
+
dim: 256
|
| 13 |
+
depth: 12
|
| 14 |
+
stereo: true
|
| 15 |
+
num_stems: 1
|
| 16 |
+
time_transformer_depth: 1
|
| 17 |
+
freq_transformer_depth: 1
|
| 18 |
+
linear_transformer_depth: 0
|
| 19 |
+
freqs_per_bands: !!python/tuple
|
| 20 |
+
- 2
|
| 21 |
+
- 2
|
| 22 |
+
- 2
|
| 23 |
+
- 2
|
| 24 |
+
- 2
|
| 25 |
+
- 2
|
| 26 |
+
- 2
|
| 27 |
+
- 2
|
| 28 |
+
- 2
|
| 29 |
+
- 2
|
| 30 |
+
- 2
|
| 31 |
+
- 2
|
| 32 |
+
- 2
|
| 33 |
+
- 2
|
| 34 |
+
- 2
|
| 35 |
+
- 2
|
| 36 |
+
- 2
|
| 37 |
+
- 2
|
| 38 |
+
- 2
|
| 39 |
+
- 2
|
| 40 |
+
- 2
|
| 41 |
+
- 2
|
| 42 |
+
- 2
|
| 43 |
+
- 2
|
| 44 |
+
- 4
|
| 45 |
+
- 4
|
| 46 |
+
- 4
|
| 47 |
+
- 4
|
| 48 |
+
- 4
|
| 49 |
+
- 4
|
| 50 |
+
- 4
|
| 51 |
+
- 4
|
| 52 |
+
- 4
|
| 53 |
+
- 4
|
| 54 |
+
- 4
|
| 55 |
+
- 4
|
| 56 |
+
- 12
|
| 57 |
+
- 12
|
| 58 |
+
- 12
|
| 59 |
+
- 12
|
| 60 |
+
- 12
|
| 61 |
+
- 12
|
| 62 |
+
- 12
|
| 63 |
+
- 12
|
| 64 |
+
- 24
|
| 65 |
+
- 24
|
| 66 |
+
- 24
|
| 67 |
+
- 24
|
| 68 |
+
- 24
|
| 69 |
+
- 24
|
| 70 |
+
- 24
|
| 71 |
+
- 24
|
| 72 |
+
- 48
|
| 73 |
+
- 48
|
| 74 |
+
- 48
|
| 75 |
+
- 48
|
| 76 |
+
- 48
|
| 77 |
+
- 48
|
| 78 |
+
- 48
|
| 79 |
+
- 48
|
| 80 |
+
- 128
|
| 81 |
+
- 129
|
| 82 |
+
dim_head: 64
|
| 83 |
+
heads: 8
|
| 84 |
+
attn_dropout: 0.0
|
| 85 |
+
ff_dropout: 0.0
|
| 86 |
+
flash_attn: true
|
| 87 |
+
dim_freqs_in: 1025
|
| 88 |
+
stft_n_fft: 2048
|
| 89 |
+
stft_hop_length: 512
|
| 90 |
+
stft_win_length: 2048
|
| 91 |
+
stft_normalized: false
|
| 92 |
+
mask_estimator_depth: 2
|
| 93 |
+
multi_stft_resolution_loss_weight: 1.0
|
| 94 |
+
multi_stft_resolutions_window_sizes: !!python/tuple
|
| 95 |
+
- 4096
|
| 96 |
+
- 2048
|
| 97 |
+
- 1024
|
| 98 |
+
- 512
|
| 99 |
+
- 256
|
| 100 |
+
multi_stft_hop_size: 147
|
| 101 |
+
multi_stft_normalized: False
|
| 102 |
+
mlp_expansion_factor: 4
|
| 103 |
+
use_torch_checkpoint: True
|
| 104 |
+
skip_connection: False
|
| 105 |
+
|
| 106 |
+
|
| 107 |
+
training:
|
| 108 |
+
batch_size: 1
|
| 109 |
+
gradient_accumulation_steps: 999
|
| 110 |
+
grad_clip: 1
|
| 111 |
+
instruments: ['vocals', 'other']
|
| 112 |
+
lr: 1.0e-5
|
| 113 |
+
patience: 1000000
|
| 114 |
+
reduce_factor: 0.75
|
| 115 |
+
target_instrument: vocals
|
| 116 |
+
num_epochs: 1000
|
| 117 |
+
num_steps: 1000
|
| 118 |
+
q: 0.95
|
| 119 |
+
coarse_loss_clip: true
|
| 120 |
+
ema_momentum: 0.999
|
| 121 |
+
optimizer: Fira
|
| 122 |
+
other_fix: True # it's needed for checking on multisong dataset if other is actually instrumental
|
| 123 |
+
use_amp: true # enable or disable usage of mixed precision (float16) - usually it must be true
|
| 124 |
+
use_torch_checkpoint: True
|
| 125 |
+
|
| 126 |
+
inference:
|
| 127 |
+
batch_size: 6
|
| 128 |
+
dim_t: 1251
|
| 129 |
+
num_overlap: 2
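One consistency check worth running on a band-split config like this one is that the `freqs_per_bands` entries add up to `dim_freqs_in`, which here equals `stft_n_fft // 2 + 1`. The sketch below rebuilds the tuple from the values listed in this file and checks both totals. It is a standalone arithmetic check and does not call any model code. The variable names mirror the config keys, and reading the tuple as a per-band partition of the frequency bins is an interpretation the diff does not spell out.

```python
# Band widths as listed under model.freqs_per_bands in karaoke_bs_roformer.yaml.
freqs_per_bands = (
    (2,) * 24      # 24 bands of 2 bins
    + (4,) * 12    # 12 bands of 4 bins
    + (12,) * 8    # 8 bands of 12 bins
    + (24,) * 8    # 8 bands of 24 bins
    + (48,) * 8    # 8 bands of 48 bins
    + (128, 129)   # two wide bands at the top
)

stft_n_fft = 2048
dim_freqs_in = 1025

total_bins = sum(freqs_per_bands)
print(f"bands: {len(freqs_per_bands)}, total bins: {total_bins}")

# Both checks should hold if the band widths cover every STFT bin exactly once.
assert total_bins == dim_freqs_in
assert dim_freqs_in == stft_n_fft // 2 + 1
```

If either assertion fails after editing the band list, the widths no longer cover the spectrum exactly.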
|
MelBandRoformers/melbandroformers/instrumental/inst_gabox.yaml
ADDED
|
@@ -0,0 +1,51 @@
|
| 1 |
+
audio:
|
| 2 |
+
chunk_size: 485100
|
| 3 |
+
dim_f: 1024
|
| 4 |
+
dim_t: 1101
|
| 5 |
+
hop_length: 441
|
| 6 |
+
n_fft: 2048
|
| 7 |
+
num_channels: 2
|
| 8 |
+
sample_rate: 44100
|
| 9 |
+
min_mean_abs: 0.000
|
| 10 |
+
|
| 11 |
+
model:
|
| 12 |
+
dim: 384
|
| 13 |
+
depth: 6
|
| 14 |
+
stereo: true
|
| 15 |
+
num_stems: 1
|
| 16 |
+
time_transformer_depth: 1
|
| 17 |
+
freq_transformer_depth: 1
|
| 18 |
+
num_bands: 60
|
| 19 |
+
dim_head: 64
|
| 20 |
+
heads: 8
|
| 21 |
+
attn_dropout: 0
|
| 22 |
+
ff_dropout: 0
|
| 23 |
+
flash_attn: True
|
| 24 |
+
dim_freqs_in: 1025
|
| 25 |
+
sample_rate: 44100 # needed for mel filter bank from librosa
|
| 26 |
+
stft_n_fft: 2048
|
| 27 |
+
stft_hop_length: 441
|
| 28 |
+
stft_win_length: 2048
|
| 29 |
+
stft_normalized: False
|
| 30 |
+
mask_estimator_depth: 2
|
| 31 |
+
multi_stft_resolution_loss_weight: 1.0
|
| 32 |
+
multi_stft_resolutions_window_sizes: !!python/tuple
|
| 33 |
+
- 4096
|
| 34 |
+
- 2048
|
| 35 |
+
- 1024
|
| 36 |
+
- 512
|
| 37 |
+
- 256
|
| 38 |
+
multi_stft_hop_size: 147
|
| 39 |
+
multi_stft_normalized: False
|
| 40 |
+
|
| 41 |
+
training:
|
| 42 |
+
instruments:
|
| 43 |
+
- Instrumental
|
| 44 |
+
- Vocals
|
| 45 |
+
target_instrument: Instrumental
|
| 46 |
+
use_amp: True
|
| 47 |
+
|
| 48 |
+
inference:
|
| 49 |
+
batch_size: 1
|
| 50 |
+
dim_t: 1101
|
| 51 |
+
num_overlap: 2
|
MelBandRoformers/melbandroformers/instrumental/v10.yaml
ADDED
|
@@ -0,0 +1,73 @@
|
| 1 |
+
audio:
|
| 2 |
+
chunk_size: 352800
|
| 3 |
+
dim_f: 1024
|
| 4 |
+
dim_t: 256
|
| 5 |
+
hop_length: 441
|
| 6 |
+
n_fft: 2048
|
| 7 |
+
num_channels: 2
|
| 8 |
+
sample_rate: 44100
|
| 9 |
+
min_mean_abs: 0.00
|
| 10 |
+
|
| 11 |
+
model:
|
| 12 |
+
dim: 256
|
| 13 |
+
depth: 12
|
| 14 |
+
stereo: true
|
| 15 |
+
num_stems: 1
|
| 16 |
+
time_transformer_depth: 1
|
| 17 |
+
freq_transformer_depth: 1
|
| 18 |
+
num_bands: 60
|
| 19 |
+
dim_head: 64
|
| 20 |
+
heads: 8
|
| 21 |
+
attn_dropout: 0
|
| 22 |
+
ff_dropout: 0
|
| 23 |
+
flash_attn: true
|
| 24 |
+
dim_freqs_in: 1025
|
| 25 |
+
sample_rate: 44100
|
| 26 |
+
stft_n_fft: 2048
|
| 27 |
+
stft_hop_length: 441
|
| 28 |
+
stft_win_length: 2048
|
| 29 |
+
stft_normalized: true
|
| 30 |
+
mask_estimator_depth: 2
|
| 31 |
+
multi_stft_resolution_loss_weight: 1.0
|
| 32 |
+
multi_stft_resolutions_window_sizes: !!python/tuple
|
| 33 |
+
- 4096
|
| 34 |
+
- 2048
|
| 35 |
+
- 1024
|
| 36 |
+
- 512
|
| 37 |
+
- 256
|
| 38 |
+
multi_stft_hop_size: 250
|
| 39 |
+
multi_stft_normalized: false
|
| 40 |
+
use_torch_checkpoint: true
|
| 41 |
+
|
| 42 |
+
training:
|
| 43 |
+
batch_size: 1
|
| 44 |
+
gradient_accumulation_steps: 999999999999999999999999
|
| 45 |
+
grad_clip: 0
|
| 46 |
+
instruments:
|
| 47 |
+
- other
|
| 48 |
+
- vocals
|
| 49 |
+
lr: 0.00001
|
| 50 |
+
patience: 100000000
|
| 51 |
+
reduce_factor: 0.95
|
| 52 |
+
target_instrument: other
|
| 53 |
+
num_epochs: 1000
|
| 54 |
+
num_steps: 1000
|
| 55 |
+
augmentation: false # enable augmentations by audiomentations and pedalboard
|
| 56 |
+
augmentation_type: simple1
|
| 57 |
+
use_mp3_compress: false # Deprecated
|
| 58 |
+
augmentation_mix: false # Mix several stems of the same type with some probability
|
| 59 |
+
augmentation_loudness: true # randomly change loudness of each stem
|
| 60 |
+
augmentation_loudness_type: 1 # Type 1 or 2
|
| 61 |
+
augmentation_loudness_min: 0
|
| 62 |
+
augmentation_loudness_max: 0
|
| 63 |
+
q: 0.95
|
| 64 |
+
coarse_loss_clip: false
|
| 65 |
+
ema_momentum: 0.999
|
| 66 |
+
optimizer: adamw
|
| 67 |
+
other_fix: false # it's needed for checking on multisong dataset if other is actually instrumental
|
| 68 |
+
use_amp: true # enable or disable usage of mixed precision (float16) - usually it must be true
|
| 69 |
+
|
| 70 |
+
inference:
|
| 71 |
+
batch_size: 1
|
| 72 |
+
dim_t: 1101
|
| 73 |
+
num_overlap: 2
|
MelBandRoformers/melbandroformers/karaoke/karaokegabox_1750911344.yaml
ADDED
|
@@ -0,0 +1,72 @@
|
| 1 |
+
audio:
|
| 2 |
+
chunk_size: 485100
|
| 3 |
+
dim_f: 1024
|
| 4 |
+
dim_t: 256
|
| 5 |
+
hop_length: 441
|
| 6 |
+
n_fft: 2048
|
| 7 |
+
num_channels: 2
|
| 8 |
+
sample_rate: 44100
|
| 9 |
+
min_mean_abs: 0.000
|
| 10 |
+
|
| 11 |
+
model:
|
| 12 |
+
dim: 384
|
| 13 |
+
depth: 6
|
| 14 |
+
stereo: true
|
| 15 |
+
num_stems: 1
|
| 16 |
+
time_transformer_depth: 1
|
| 17 |
+
freq_transformer_depth: 1
|
| 18 |
+
num_bands: 60
|
| 19 |
+
dim_head: 64
|
| 20 |
+
heads: 8
|
| 21 |
+
attn_dropout: 0
|
| 22 |
+
ff_dropout: 0
|
| 23 |
+
flash_attn: true
|
| 24 |
+
dim_freqs_in: 1025
|
| 25 |
+
sample_rate: 44100 # needed for mel filter bank from librosa
|
| 26 |
+
stft_n_fft: 2048
|
| 27 |
+
stft_hop_length: 441
|
| 28 |
+
stft_win_length: 2048
|
| 29 |
+
stft_normalized: false
|
| 30 |
+
mask_estimator_depth: 2
|
| 31 |
+
multi_stft_resolution_loss_weight: 1.0
|
| 32 |
+
multi_stft_resolutions_window_sizes: !!python/tuple
|
| 33 |
+
- 4096
|
| 34 |
+
- 2048
|
| 35 |
+
- 1024
|
| 36 |
+
- 512
|
| 37 |
+
- 256
|
| 38 |
+
multi_stft_hop_size: 147
|
| 39 |
+
multi_stft_normalized: true
|
| 40 |
+
|
| 41 |
+
training:
|
| 42 |
+
batch_size: 1
|
| 43 |
+
gradient_accumulation_steps: 1
|
| 44 |
+
grad_clip: 0
|
| 45 |
+
instruments:
|
| 46 |
+
- Vocals
|
| 47 |
+
- Instrumental
|
| 48 |
+
lr: 0.0005
|
| 49 |
+
patience: 2
|
| 50 |
+
reduce_factor: 0.95
|
| 51 |
+
target_instrument: Vocals
|
| 52 |
+
num_epochs: 1000
|
| 53 |
+
num_steps: 1000
|
| 54 |
+
augmentation: false # enable augmentations by audiomentations and pedalboard
|
| 55 |
+
augmentation_type:
|
| 56 |
+
use_mp3_compress: false # Deprecated
|
| 57 |
+
augmentation_mix: false # Mix several stems of the same type with some probability
|
| 58 |
+
augmentation_loudness: false # randomly change loudness of each stem
|
| 59 |
+
augmentation_loudness_type: 1 # Type 1 or 2
|
| 60 |
+
augmentation_loudness_min: 0
|
| 61 |
+
augmentation_loudness_max: 0
|
| 62 |
+
q: 0.95
|
| 63 |
+
coarse_loss_clip: false
|
| 64 |
+
ema_momentum: 0.999
|
| 65 |
+
optimizer: adamw
|
| 66 |
+
other_fix: false # it's needed for checking on multisong dataset if other is actually instrumental
|
| 67 |
+
use_amp: true # enable or disable usage of mixed precision (float16) - usually it must be true
|
| 68 |
+
|
| 69 |
+
inference:
|
| 70 |
+
batch_size: 1
|
| 71 |
+
dim_t: 1101
|
| 72 |
+
num_overlap: 8
|
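The numbers in the karaoke config above appear to be linked: `chunk_size` (485100) equals `hop_length` times `inference.dim_t - 1`, and dividing it by `sample_rate` gives the chunk duration. This relation is read off the values, not stated by the config, and the helper names below are hypothetical. A minimal sketch of the arithmetic:

```python
# Hypothetical helpers; the formulas are inferred from the config values,
# not taken from any training or inference script.

def chunk_samples(hop_length: int, dim_t: int) -> int:
    """Samples spanned by dim_t STFT frames spaced hop_length samples apart."""
    return hop_length * (dim_t - 1)


def chunk_seconds(chunk_size: int, sample_rate: int) -> float:
    """Duration of one chunk in seconds."""
    return chunk_size / sample_rate


if __name__ == "__main__":
    size = chunk_samples(hop_length=441, dim_t=1101)
    print(size)                        # 485100
    print(chunk_seconds(size, 44100))  # 11.0
```

With `hop_length: 441` and `sample_rate: 44100`, each STFT frame advances by 10 ms, so 1100 hops cover exactly 11 seconds.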
MelBandRoformers/melbandroformers/vocals/voc_gabox.yaml
ADDED
@@ -0,0 +1,51 @@
audio:
  chunk_size: 352800
  dim_f: 1024
  dim_t: 256
  hop_length: 441
  n_fft: 2048
  num_channels: 2
  sample_rate: 44100
  min_mean_abs: 0.001

model:
  dim: 384
  depth: 6
  stereo: true
  num_stems: 1
  time_transformer_depth: 1
  freq_transformer_depth: 1
  num_bands: 60
  dim_head: 64
  heads: 8
  attn_dropout: 0
  ff_dropout: 0
  flash_attn: True
  dim_freqs_in: 1025
  sample_rate: 44100 # needed for mel filter bank from librosa
  stft_n_fft: 2048
  stft_hop_length: 441
  stft_win_length: 2048
  stft_normalized: False
  mask_estimator_depth: 2
  multi_stft_resolution_loss_weight: 1.0
  multi_stft_resolutions_window_sizes: !!python/tuple
  - 4096
  - 2048
  - 1024
  - 512
  - 256
  multi_stft_hop_size: 147
  multi_stft_normalized: False

training:
  instruments:
  - Vocals
  - Instrumental
  target_instrument: Vocals

inference:
  batch_size: 1
  dim_t: 1101
  num_overlap: 1
  chunk_size: 352800
Single_Models/ZFTurbo/Vocals/config_vocals_htdemucs.yaml
ADDED
@@ -0,0 +1,123 @@
audio:
  chunk_size: 485100 # samplerate * segment
  min_mean_abs: 0.001
  hop_length: 1024

training:
  batch_size: 10
  gradient_accumulation_steps: 1
  grad_clip: 0
  segment: 11
  shift: 1
  samplerate: 44100
  channels: 2
  normalize: true
  instruments: ['vocals', 'other']
  target_instrument: null
  num_epochs: 1000
  num_steps: 1000
  optimizer: adam
  lr: 9.0e-05
  patience: 2
  reduce_factor: 0.95
  q: 0.95
  coarse_loss_clip: true
  ema_momentum: 0.999
  other_fix: true # it's needed for checking on multisong dataset if other is actually instrumental
  use_amp: true # enable or disable usage of mixed precision (float16) - usually it must be true

augmentations:
  enable: true # enable or disable all augmentations (to fast disable if needed)
  loudness: true # randomly change loudness of each stem on the range (loudness_min; loudness_max)
  loudness_min: 0.5
  loudness_max: 1.5
  mixup: true # mix several stems of same type with some probability (only works for dataset types: 1, 2, 3)
  mixup_probs: [0.2, 0.02]
  mixup_loudness_min: 0.5
  mixup_loudness_max: 1.5

inference:
  num_overlap: 2
  batch_size: 8

model: htdemucs

htdemucs: # see demucs/htdemucs.py for a detailed description
  # Channels
  channels: 48
  channels_time:
  growth: 2
  # STFT
  num_subbands: 1
  nfft: 4096
  wiener_iters: 0
  end_iters: 0
  wiener_residual: false
  cac: true
  # Main structure
  depth: 4
  rewrite: true
  # Frequency Branch
  multi_freqs: []
  multi_freqs_depth: 3
  freq_emb: 0.2
  emb_scale: 10
  emb_smooth: true
  # Convolutions
  kernel_size: 8
  stride: 4
  time_stride: 2
  context: 1
  context_enc: 0
  # normalization
  norm_starts: 4
  norm_groups: 4
  # DConv residual branch
  dconv_mode: 3
  dconv_depth: 2
  dconv_comp: 8
  dconv_init: 1e-3
  # Before the Transformer
  bottom_channels: 512
  # CrossTransformer
  # ------ Common to all
  # Regular parameters
  t_layers: 5
  t_hidden_scale: 4.0
  t_heads: 8
  t_dropout: 0.0
  t_layer_scale: True
  t_gelu: True
  # ------------- Positional Embedding
  t_emb: sin
  t_max_positions: 10000 # for the scaled embedding
  t_max_period: 10000.0
  t_weight_pos_embed: 1.0
  t_cape_mean_normalize: True
  t_cape_augment: True
  t_cape_glob_loc_scale: [5000.0, 1.0, 1.4]
  t_sin_random_shift: 0
  # ------------- norm before a transformer encoder
  t_norm_in: True
  t_norm_in_group: False
  # ------------- norm inside the encoder
  t_group_norm: False
  t_norm_first: True
  t_norm_out: True
  # ------------- optim
  t_weight_decay: 0.0
  t_lr:
  # ------------- sparsity
  t_sparse_self_attn: False
  t_sparse_cross_attn: False
  t_mask_type: diag
  t_mask_random_seed: 42
  t_sparse_attn_window: 400
  t_global_window: 100
  t_sparsity: 0.95
  t_auto_sparsity: False
  # Cross Encoder First (False)
  t_cross_first: False
  # Weight init
  rescale: 0.1

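The `augmentations` section of the HTDemucs config describes the loudness augmentation as randomly changing the loudness of each stem within `(loudness_min; loudness_max)`. The sketch below shows one plausible reading of that comment: a single gain drawn uniformly per stem and applied to every sample. The uniform distribution, the per-stem granularity, and the function name are assumptions; the training code may do this differently.

```python
import random


def random_loudness(stem, loudness_min=0.5, loudness_max=1.5, rng=random):
    """Scale all samples of one stem by a single random gain.

    Assumption: the gain is uniform on [loudness_min, loudness_max];
    the config comment only names the range.
    """
    gain = rng.uniform(loudness_min, loudness_max)
    return [sample * gain for sample in stem], gain


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so the run is repeatable
    scaled, gain = random_loudness([1.0, -0.5, 0.25], rng=rng)
    print(gain, scaled)
```

With the config values `loudness_min: 0.5` and `loudness_max: 1.5`, a stem ends up between half and one and a half times its original amplitude.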
Single_Models/ZFTurbo/Vocals/config_vocals_mdx23c.yaml
ADDED
@@ -0,0 +1,54 @@
audio:
  chunk_size: 261120
  dim_f: 4096
  dim_t: 256
  hop_length: 1024
  n_fft: 8192
  num_channels: 2
  sample_rate: 44100
  min_mean_abs: 0.001

model:
  act: gelu
  bottleneck_factor: 4
  growth: 128
  norm: InstanceNorm
  num_blocks_per_scale: 2
  num_channels: 128
  num_scales: 5
  num_subbands: 4
  scale:
  - 2
  - 2

training:
  batch_size: 6
  gradient_accumulation_steps: 1
  grad_clip: 0
  instruments:
  - vocals
  - other
  lr: 9.0e-05
  patience: 2
  reduce_factor: 0.95
  target_instrument: null
  num_epochs: 1000
  num_steps: 1000
  augmentation: false # enable augmentations by audiomentations and pedalboard
  augmentation_type: simple1
  use_mp3_compress: false # Deprecated
  augmentation_mix: true # Mix several stems of the same type with some probability
  augmentation_loudness: true # randomly change loudness of each stem
  augmentation_loudness_type: 1 # Type 1 or 2
  augmentation_loudness_min: 0.5
  augmentation_loudness_max: 1.5
  q: 0.95
  coarse_loss_clip: true
  ema_momentum: 0.999
  optimizer: adam
  other_fix: true # it's needed for checking on multisong dataset if other is actually instrumental

inference:
  batch_size: 1
  dim_t: 256
  num_overlap: 4
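The MDX23C config sets `chunk_size: 261120` and `num_overlap: 4` but does not say how the two interact at inference time. The sketch below assumes the inference script slides a window of `chunk_size` samples forward by `chunk_size // num_overlap` samples, so that most samples are covered by `num_overlap` windows. That stepping rule and the function name are assumptions for illustration, not something the config confirms.

```python
def chunk_starts(total_samples: int, chunk_size: int, num_overlap: int) -> list[int]:
    """Start offsets of overlapping windows over a track.

    Assumption: the hop between windows is chunk_size // num_overlap.
    """
    step = chunk_size // num_overlap
    return list(range(0, total_samples, step))


if __name__ == "__main__":
    # One chunk's worth of audio, values taken from the config above.
    print(chunk_starts(261120, chunk_size=261120, num_overlap=4))
    # [0, 65280, 130560, 195840]
```

Under this assumption a higher `num_overlap` means more windows per second of audio, which trades inference time for smoother blending at chunk boundaries.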
Stable-Audio-Open-1.0/LICENSE.md
ADDED
@@ -0,0 +1,58 @@
STABILITY AI COMMUNITY LICENSE AGREEMENT

Last Updated: July 5, 2024

1. INTRODUCTION

This Agreement applies to any individual person or entity (“You”, “Your” or “Licensee”) that uses or distributes any portion or element of the Stability AI Materials or Derivative Works thereof for any Research & Non-Commercial or Commercial purpose. Capitalized terms not otherwise defined herein are defined in Section V below.

This Agreement is intended to allow research, non-commercial, and limited commercial uses of the Models free of charge. In order to ensure that certain limited commercial uses of the Models continue to be allowed, this Agreement preserves free access to the Models for people or organizations generating annual revenue of less than US $1,000,000 (or local currency equivalent).

By clicking “I Accept” or by using or distributing or using any portion or element of the Stability Materials or Derivative Works, You agree that You have read, understood and are bound by the terms of this Agreement. If You are acting on behalf of a company, organization or other entity, then “You” includes you and that entity, and You agree that You: (i) are an authorized representative of such entity with the authority to bind such entity to this Agreement, and (ii) You agree to the terms of this Agreement on that entity’s behalf.

2. RESEARCH & NON-COMMERCIAL USE LICENSE

Subject to the terms of this Agreement, Stability AI grants You a non-exclusive, worldwide, non-transferable, non-sublicensable, revocable and royalty-free limited license under Stability AI’s intellectual property or other rights owned by Stability AI embodied in the Stability AI Materials to use, reproduce, distribute, and create Derivative Works of, and make modifications to, the Stability AI Materials for any Research or Non-Commercial Purpose. “Research Purpose” means academic or scientific advancement, and in each case, is not primarily intended for commercial advantage or monetary compensation to You or others. “Non-Commercial Purpose” means any purpose other than a Research Purpose that is not primarily intended for commercial advantage or monetary compensation to You or others, such as personal use (i.e., hobbyist) or evaluation and testing.

3. COMMERCIAL USE LICENSE

Subject to the terms of this Agreement (including the remainder of this Section III), Stability AI grants You a non-exclusive, worldwide, non-transferable, non-sublicensable, revocable and royalty-free limited license under Stability AI’s intellectual property or other rights owned by Stability AI embodied in the Stability AI Materials to use, reproduce, distribute, and create Derivative Works of, and make modifications to, the Stability AI Materials for any Commercial Purpose. “Commercial Purpose” means any purpose other than a Research Purpose or Non-Commercial Purpose that is primarily intended for commercial advantage or monetary compensation to You or others, including but not limited to, (i) creating, modifying, or distributing Your product or service, including via a hosted service or application programming interface, and (ii) for Your business’s or organization’s internal operations.
If You are using or distributing the Stability AI Materials for a Commercial Purpose, You must register with Stability AI at (https://stability.ai/community-license). If at any time You or Your Affiliate(s), either individually or in aggregate, generate more than USD $1,000,000 in annual revenue (or the equivalent thereof in Your local currency), regardless of whether that revenue is generated directly or indirectly from the Stability AI Materials or Derivative Works, any licenses granted to You under this Agreement shall terminate as of such date. You must request a license from Stability AI at (https://stability.ai/enterprise) , which Stability AI may grant to You in its sole discretion. If you receive Stability AI Materials, or any Derivative Works thereof, from a Licensee as part of an integrated end user product, then Section III of this Agreement will not apply to you.

4. GENERAL TERMS

Your Research, Non-Commercial, and Commercial License(s) under this Agreement are subject to the following terms.
a. Distribution & Attribution. If You distribute or make available the Stability AI Materials or a Derivative Work to a third party, or a product or service that uses any portion of them, You shall: (i) provide a copy of this Agreement to that third party, (ii) retain the following attribution notice within a "Notice" text file distributed as a part of such copies: "This Stability AI Model is licensed under the Stability AI Community License, Copyright © Stability AI Ltd. All Rights Reserved”, and (iii) prominently display “Powered by Stability AI” on a related website, user interface, blogpost, about page, or product documentation. If You create a Derivative Work, You may add your own attribution notice(s) to the “Notice” text file included with that Derivative Work, provided that You clearly indicate which attributions apply to the Stability AI Materials and state in the “Notice” text file that You changed the Stability AI Materials and how it was modified.
b. Use Restrictions. Your use of the Stability AI Materials and Derivative Works, including any output or results of the Stability AI Materials or Derivative Works, must comply with applicable laws and regulations (including Trade Control Laws and equivalent regulations) and adhere to the Documentation and Stability AI’s AUP, which is hereby incorporated by reference. Furthermore, You will not use the Stability AI Materials or Derivative Works, or any output or results of the Stability AI Materials or Derivative Works, to create or improve any foundational generative AI model (excluding the Models or Derivative Works).
c. Intellectual Property.
(i) Trademark License. No trademark licenses are granted under this Agreement, and in connection with the Stability AI Materials or Derivative Works, You may not use any name or mark owned by or associated with Stability AI or any of its Affiliates, except as required under Section IV(a) herein.
(ii) Ownership of Derivative Works. As between You and Stability AI, You are the owner of Derivative Works You create, subject to Stability AI’s ownership of the Stability AI Materials and any Derivative Works made by or for Stability AI.
(iii) Ownership of Outputs. As between You and Stability AI, You own any outputs generated from the Models or Derivative Works to the extent permitted by applicable law.
(iv) Disputes. If You or Your Affiliate(s) institute litigation or other proceedings against Stability AI (including a cross-claim or counterclaim in a lawsuit) alleging that the Stability AI Materials, Derivative Works or associated outputs or results, or any portion of any of the foregoing, constitutes infringement of intellectual property or other rights owned or licensable by You, then any licenses granted to You under this Agreement shall terminate as of the date such litigation or claim is filed or instituted. You will indemnify and hold harmless Stability AI from and against any claim by any third party arising out of or related to Your use or distribution of the Stability AI Materials or Derivative Works in violation of this Agreement.
(v) Feedback. From time to time, You may provide Stability AI with verbal and/or written suggestions, comments or other feedback related to Stability AI’s existing or prospective technology, products or services (collectively, “Feedback”). You are not obligated to provide Stability AI with Feedback, but to the extent that You do, You hereby grant Stability AI a perpetual, irrevocable, royalty-free, fully-paid, sub-licensable, transferable, non-exclusive, worldwide right and license to exploit the Feedback in any manner without restriction. Your Feedback is provided “AS IS” and You make no warranties whatsoever about any Feedback.
d. Disclaimer Of Warranty. UNLESS REQUIRED BY APPLICABLE LAW, THE STABILITY AI MATERIALS AND ANY OUTPUT AND RESULTS THEREFROM ARE PROVIDED ON AN "AS IS" BASIS, WITHOUT WARRANTIES OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE FOR DETERMINING THE APPROPRIATENESS OR LAWFULNESS OF USING OR REDISTRIBUTING THE STABILITY AI MATERIALS, DERIVATIVE WORKS OR ANY OUTPUT OR RESULTS AND ASSUME ANY RISKS ASSOCIATED WITH YOUR USE OF THE STABILITY AI MATERIALS, DERIVATIVE WORKS AND ANY OUTPUT AND RESULTS.
e. Limitation Of Liability. IN NO EVENT WILL STABILITY AI OR ITS AFFILIATES BE LIABLE UNDER ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, TORT, NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, ARISING OUT OF THIS AGREEMENT, FOR ANY LOST PROFITS OR ANY DIRECT, INDIRECT, SPECIAL, CONSEQUENTIAL, INCIDENTAL, EXEMPLARY OR PUNITIVE DAMAGES, EVEN IF STABILITY AI OR ITS AFFILIATES HAVE BEEN ADVISED OF THE POSSIBILITY OF ANY OF THE FOREGOING.
f. Term And Termination. The term of this Agreement will commence upon Your acceptance of this Agreement or access to the Stability AI Materials and will continue in full force and effect until terminated in accordance with the terms and conditions herein. Stability AI may terminate this Agreement if You are in breach of any term or condition of this Agreement. Upon termination of this Agreement, You shall delete and cease use of any Stability AI Materials or Derivative Works. Section IV(d), (e), and (g) shall survive the termination of this Agreement.
g. Governing Law. This Agreement will be governed by and constructed in accordance with the laws of the United States and the State of California without regard to choice of law principles, and the UN Convention on Contracts for International Sale of Goods does not apply to this Agreement.

5. DEFINITIONS

“Affiliate(s)” means any entity that directly or indirectly controls, is controlled by, or is under common control with the subject entity; for purposes of this definition, “control” means direct or indirect ownership or control of more than 50% of the voting interests of the subject entity.

"Agreement" means this Stability AI Community License Agreement.

“AUP” means the Stability AI Acceptable Use Policy available at (https://stability.ai/use-policy), as may be updated from time to time.

"Derivative Work(s)” means (a) any derivative work of the Stability AI Materials as recognized by U.S. copyright laws and (b) any modifications to a Model, and any other model created which is based on or derived from the Model or the Model’s output, including “fine tune” and “low-rank adaptation” models derived from a Model or a Model’s output, but do not include the output of any Model.

“Documentation” means any specifications, manuals, documentation, and other written information provided by Stability AI related to the Software or Models.

“Model(s)" means, collectively, Stability AI’s proprietary models and algorithms, including machine-learning models, trained model weights and other elements of the foregoing listed on Stability’s Core Models Webpage available at (https://stability.ai/core-models), as may be updated from time to time.

"Stability AI" or "we" means Stability AI Ltd. and its Affiliates.

"Software" means Stability AI’s proprietary software made available under this Agreement now or in the future.

“Stability AI Materials” means, collectively, Stability’s proprietary Models, Software and Documentation (and any portion or combination thereof) made available under this Agreement.

“Trade Control Laws” means any applicable U.S. and non-U.S. export control and trade sanctions laws and regulations.
Stable-Audio-Open-1.0/README.md
ADDED
@@ -0,0 +1,182 @@
---
language:
- en
library_name: stable-audio-tools
license: other
license_name: stable-audio-community
license_link: LICENSE
pipeline_tag: text-to-audio
extra_gated_prompt: By clicking "Agree", you agree to the [License Agreement](https://huggingface.co/stabilityai/stable-audio-open-1.0/blob/main/LICENSE.md)
  and acknowledge Stability AI's [Privacy Policy](https://stability.ai/privacy-policy).
extra_gated_fields:
  Name: text
  Email: text
  Country: country
  Organization or Affiliation: text
  Receive email updates and promotions on Stability AI products, services, and research?:
    type: select
    options:
      - 'Yes'
      - 'No'
  What do you intend to use the model for?:
    type: select
    options:
      - Research
      - Personal use
      - Creative Professional
      - Startup
      - Enterprise
  I agree to the License Agreement and acknowledge Stability AI's Privacy Policy: checkbox
---

# Stable Audio Open 1.0

![Stable Audio Open logo](./stable_audio_light.png)

Please note: For commercial use, please refer to [https://stability.ai/license](https://stability.ai/license)

## Model Description
`Stable Audio Open 1.0` generates variable-length (up to 47s) stereo audio at 44.1kHz from text prompts. It comprises three components: an autoencoder that compresses waveforms into a manageable sequence length, a T5-based text embedding for text conditioning, and a transformer-based diffusion (DiT) model that operates in the latent space of the autoencoder.

## Usage

This model can be used with:
1. the [`stable-audio-tools`](https://github.com/Stability-AI/stable-audio-tools) library
2. the [`diffusers`](https://huggingface.co/docs/diffusers/main/en/index) library

### Using with `stable-audio-tools`

This model is made to be used with the [`stable-audio-tools`](https://github.com/Stability-AI/stable-audio-tools) library for inference, for example:

```python
import torch
import torchaudio
from einops import rearrange
from stable_audio_tools import get_pretrained_model
from stable_audio_tools.inference.generation import generate_diffusion_cond

device = "cuda" if torch.cuda.is_available() else "cpu"

# Download model
model, model_config = get_pretrained_model("stabilityai/stable-audio-open-1.0")
sample_rate = model_config["sample_rate"]
sample_size = model_config["sample_size"]

model = model.to(device)

# Set up text and timing conditioning
conditioning = [{
    "prompt": "128 BPM tech house drum loop",
    "seconds_start": 0,
    "seconds_total": 30
}]

# Generate stereo audio
output = generate_diffusion_cond(
    model,
    steps=100,
    cfg_scale=7,
    conditioning=conditioning,
    sample_size=sample_size,
    sigma_min=0.3,
    sigma_max=500,
    sampler_type="dpmpp-3m-sde",
    device=device
)

# Rearrange audio batch to a single sequence
output = rearrange(output, "b d n -> d (b n)")

# Peak normalize, clip, convert to int16, and save to file
output = output.to(torch.float32).div(torch.max(torch.abs(output))).clamp(-1, 1).mul(32767).to(torch.int16).cpu()
torchaudio.save("output.wav", output, sample_rate)
```
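
The last processing step in the example above chains peak normalization, clamping, and conversion to 16-bit integers in a single expression. The pure-Python sketch below spells out the same arithmetic on a plain list of floats so each stage is visible. The function name is hypothetical, and the zero-peak guard is an addition of this sketch: the one-line tensor version divides by the peak unconditionally.

```python
def peak_normalize_to_int16(samples):
    """Scale so the loudest sample reaches full scale, clamp, map to the int16 range."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        # Guard added in this sketch: an all-silent signal stays silent.
        return [0 for _ in samples]
    out = []
    for s in samples:
        x = s / peak                # peak normalize to [-1, 1]
        x = max(-1.0, min(1.0, x))  # clamp against rounding overshoot
        out.append(int(x * 32767))  # int() truncates toward zero
    return out


if __name__ == "__main__":
    print(peak_normalize_to_int16([0.5, -0.25, 0.1]))  # [32767, -16383, 6553]
```

Because the signal is divided by its own peak, the loudest sample always maps to 32767 or -32767 regardless of how quiet the generated audio was.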
|
| 95 |
+
|
| 96 |
+
## Using with `diffusers`
|
| 97 |
+
|
| 98 |
+
Make sure you upgrade to the latest version of diffusers: `pip install -U diffusers`. And then you can run:
|
| 99 |
+
|
| 100 |
+
```py
|
| 101 |
+
import torch
|
| 102 |
+
import soundfile as sf
|
| 103 |
+
from diffusers import StableAudioPipeline
|
| 104 |
+
|
| 105 |
+
pipe = StableAudioPipeline.from_pretrained("stabilityai/stable-audio-open-1.0", torch_dtype=torch.float16)
|
| 106 |
+
pipe = pipe.to("cuda")
|
| 107 |
+
|
| 108 |
+
# define the prompts
|
| 109 |
+
prompt = "The sound of a hammer hitting a wooden surface."
|
| 110 |
+
negative_prompt = "Low quality."
|
| 111 |
+
|
| 112 |
+
# set the seed for generator
|
| 113 |
+
generator = torch.Generator("cuda").manual_seed(0)
|
| 114 |
+
|
| 115 |
+
# run the generation
|
| 116 |
+
audio = pipe(
|
| 117 |
+
prompt,
|
| 118 |
+
negative_prompt=negative_prompt,
|
| 119 |
+
num_inference_steps=200,
|
| 120 |
+
audio_end_in_s=10.0,
|
| 121 |
+
num_waveforms_per_prompt=3,
|
| 122 |
+
generator=generator,
|
| 123 |
+
).audios
|
| 124 |
+
|
| 125 |
+
output = audio[0].T.float().cpu().numpy()
|
| 126 |
+
sf.write("hammer.wav", output, pipe.vae.sampling_rate)
|
| 127 |
+
|
| 128 |
+
```
|
| 129 |
+
Refer to the [documentation](https://huggingface.co/docs/diffusers/main/en/index) for more details on optimization and usage.
|
| 130 |
+
|
| 131 |
+
|
| 132 |
+
|
| 133 |
+
|
| 134 |
+
## Model Details
|
| 135 |
+
* **Model type**: `Stable Audio Open 1.0` is a latent diffusion model based on a transformer architecture.
|
| 136 |
+
* **Language(s)**: English
|
| 137 |
+
* **License**: [Stability AI Community License](https://huggingface.co/stabilityai/stable-audio-open-1.0/blob/main/LICENSE.md).
|
| 138 |
+
* **Commercial License**: to use this model commercially, please refer to [https://stability.ai/license](https://stability.ai/license)
|
| 139 |
+
* **Research Paper**: [https://arxiv.org/abs/2407.14358](https://arxiv.org/abs/2407.14358)
|
| 140 |
+
|
| 141 |
+
## Training dataset
|
| 142 |
+
|
| 143 |
+
### Datasets Used
|
| 144 |
+
Our dataset consists of 486492 audio recordings, where 472618 are from Freesound and 13874 are from the Free Music Archive (FMA). All audio files are licensed under CC0, CC BY, or CC Sampling+. This data is used to train our autoencoder and DiT. We use a publicly available pre-trained T5 model ([t5-base](https://huggingface.co/google-t5/t5-base)) for text conditioning.
|
| 145 |
+
|
| 146 |
+
### Attribution
|
| 147 |
+
Attribution for all audio recordings used to train Stable Audio Open 1.0 can be found on our [attribution page](https://info.stability.ai/attributions).
|
| 148 |
+
|
| 149 |
+
### Mitigations
|
| 150 |
+
We conducted an in-depth analysis to ensure no unauthorized copyrighted music was present in our training data before we began training.
|
| 151 |
+
|
| 152 |
+
To that end, we first identified music samples in Freesound using the [PANNs](https://github.com/qiuqiangkong/audioset_tagging_cnn) music classifier based on AudioSet classes. A sample was identified as music if at least 30 seconds of it was predicted to belong to a music-related class at a threshold of 0.15 (PANNs output probabilities range from 0 to 1). This threshold was determined by classifying known music examples from FMA and ensuring no false negatives were present.
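
The selection rule described above can be sketched in a few lines of Python. This is a minimal illustration and not the code used to build the dataset: it assumes the classifier output has already been reduced to one music-class probability per fixed-length frame, and the names `music_seconds`, `is_music_sample`, `frame_probs`, and `frame_seconds` are hypothetical. Whether the original comparison against 0.15 was inclusive is also an assumption.

```python
from typing import Sequence


def music_seconds(
    frame_probs: Sequence[float],
    frame_seconds: float,
    threshold: float = 0.15,
) -> float:
    """Return the total duration of frames classified as music.

    frame_probs holds one music-class probability (0 to 1) per frame;
    frame_seconds is the assumed duration of each frame.
    """
    # Count frames at or above the threshold (inclusive by assumption).
    n_music_frames = sum(1 for p in frame_probs if p >= threshold)
    return n_music_frames * frame_seconds


def is_music_sample(
    frame_probs: Sequence[float],
    frame_seconds: float,
    threshold: float = 0.15,
    min_music_seconds: float = 30.0,
) -> bool:
    """Apply the rule: at least 30 seconds predicted as music at threshold 0.15."""
    return music_seconds(frame_probs, frame_seconds, threshold) >= min_music_seconds
```

With one-second frames, a recording with 30 frames at probability 0.2 would be identified as music, while one with only 29 such frames would not. A low threshold such as 0.15 favors recall, which matches the stated goal of having no false negatives on known music.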
|
| 153 |
+
|
| 154 |
+
The identified music samples were sent to the identification services of Audible Magic, a trusted content detection company, to ensure the absence of copyrighted music. Audible Magic flagged suspected copyrighted music, which we removed before training on the dataset. The majority of the removed content was field recordings in which copyrighted music was playing in the background. Following this procedure, we were left with 266324 CC0, 194840 CC-BY, and 11454 CC Sampling+ audio recordings.
|
| 155 |
+
|
| 156 |
+
We also conducted an in-depth analysis to ensure no copyrighted content was present in the FMA subset. In this case, the procedure was slightly different because the FMA subset consists of music signals. We ran a metadata search against a large database of copyrighted music (https://www.kaggle.com/datasets/maharshipandya/-spotify-tracks-dataset) and flagged any potential match. The flagged content was then reviewed individually by humans. After this process, we ended up with 8967 CC-BY and 4907 CC0 tracks.
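
A metadata search of this kind might look like the sketch below. The matching criteria actually used are not described here, so everything in the example is an assumption: it matches on a normalized (artist, title) pair, and the names `normalize`, `flag_for_review`, and the `artist` and `title` fields are hypothetical. Flagged tracks are only candidates, to be passed on to human review as described above.

```python
import re
import unicodedata
from typing import Dict, Iterable, List, Tuple


def normalize(text: str) -> str:
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Replace every run of characters outside a-z and 0-9 with a single space.
    return re.sub(r"[^a-z0-9]+", " ", no_accents.lower()).strip()


def flag_for_review(
    tracks: Iterable[Dict[str, str]],
    reference: Iterable[Tuple[str, str]],
) -> List[Dict[str, str]]:
    """Return tracks whose normalized (artist, title) appears in the reference set."""
    reference_keys = {(normalize(a), normalize(t)) for a, t in reference}
    return [
        track
        for track in tracks
        if (normalize(track["artist"]), normalize(track["title"])) in reference_keys
    ]
```

Exact matching on normalized strings keeps the example short. Because this normalization discards non-Latin characters, a real pipeline would likely need fuzzier or script-aware matching.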
|
| 157 |
+
|
| 158 |
+
|
| 159 |
+
## Use and Limitations
|
| 160 |
+
|
| 161 |
+
|
| 162 |
+
### Intended Use
|
| 163 |
+
The primary use of Stable Audio Open is research and experimentation on AI-based music and audio generation, including:
|
| 164 |
+
|
| 165 |
+
- Research efforts to better understand the limitations of generative models and further advance the state of the science.
|
| 166 |
+
- Generation of music and audio guided by text to explore current abilities of generative AI models by machine learning practitioners and artists.
|
| 167 |
+
|
| 168 |
+
|
| 169 |
+
### Out-of-Scope Use Cases
|
| 170 |
+
The model should not be used in downstream applications without further risk evaluation and mitigation. The model should not be used to intentionally create or disseminate audio or music pieces that create hostile or alienating environments for people.
|
| 171 |
+
|
| 172 |
+
|
| 173 |
+
### Limitations
|
| 174 |
+
- The model is not able to generate realistic vocals.
|
| 175 |
+
- The model has been trained with English descriptions and will not perform as well in other languages.
|
| 176 |
+
- The model does not perform equally well for all music styles and cultures.
|
| 177 |
+
- The model is better at generating sound effects and field recordings than music.
|
| 178 |
+
- It is sometimes difficult to assess what types of text descriptions provide the best generations. Prompt engineering may be required to obtain satisfactory results.
|
| 179 |
+
|
| 180 |
+
|
| 181 |
+
### Biases
|
| 182 |
+
The data source potentially lacks diversity, and not all cultures are equally represented in the dataset. The model may not perform equally well on the wide variety of music genres and sound effects that exist. Samples generated by the model will reflect the biases of the training data.
|