ConvTasNet (code, models, paper)
This view is limited to 50 files because it contains too many changes. See the raw diff for the full change.
- .gitattributes +2 -0
- An empirical study of Conv-TasNet.pdf +3 -0
- Conv-TasNet. Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation.pdf +3 -0
- code/Conv-TasNet [jwr1995] +2 -1 original-model +74 -1 multichan +68 -1.zip +3 -0
- code/Conv-TasNet [nobel861017] +16 -1 DEMO.zip +3 -0
- code/Conv-TasNet [perottievan] +10.zip +3 -0
- code/Conv-TasNet [yoshonabee] +8 -1.zip +3 -0
- code/Conv-TasNet [zhenhaoge] +2.zip +3 -0
- code/Conv-TasNet.zip +3 -0
- code/Forked-Conv-TasNet [OfekCohen1] +8 -1.zip +3 -0
- models/ConvTasNet-DAMP-Vocals/.gitattributes +16 -0
- models/ConvTasNet-DAMP-Vocals/README.md +69 -0
- models/ConvTasNet-DAMP-Vocals/metadata.json +1 -0
- models/ConvTasNet-DAMP-Vocals/model.pt +3 -0
- models/ConvTasNet-DAMP-Vocals/source.txt +1 -0
- models/ConvTasNet-IF-Itera-SepNoisy8k-FT/.gitattributes +35 -0
- models/ConvTasNet-IF-Itera-SepNoisy8k-FT/ConvTasNet-IF-Itera-SepNoisy8k-FT.pth +3 -0
- models/ConvTasNet-IF-Itera-SepNoisy8k-FT/README.md +75 -0
- models/ConvTasNet-IF-Itera-SepNoisy8k-FT/source.txt +1 -0
- models/ConvTasNet-ONNX (broken)/conv_tasnet.onnx +3 -0
- models/ConvTasNet-ONNX (broken)/conv_tasnet.py +393 -0
- models/ConvTasNet-ONNX (broken)/source.txt +5 -0
- models/ConvTasNet-ONNX/conv_tasnet.onnx +3 -0
- models/ConvTasNet-ONNX/source.txt +2 -0
- models/ConvTasNet_DAMP-VSEP_enhboth/.gitattributes +16 -0
- models/ConvTasNet_DAMP-VSEP_enhboth/README.md +73 -0
- models/ConvTasNet_DAMP-VSEP_enhboth/pytorch_model.bin +3 -0
- models/ConvTasNet_DAMP-VSEP_enhboth/source.txt +1 -0
- models/ConvTasNet_DAMPVSEP_EnglishNonEnglish_baseline/.gitattributes +27 -0
- models/ConvTasNet_DAMPVSEP_EnglishNonEnglish_baseline/README.md +106 -0
- models/ConvTasNet_DAMPVSEP_EnglishNonEnglish_baseline/pytorch_model.bin +3 -0
- models/ConvTasNet_DAMPVSEP_EnglishNonEnglish_baseline/source.txt +1 -0
- models/ConvTasNet_Libri1Mix_enhsignle_16k/.gitattributes +17 -0
- models/ConvTasNet_Libri1Mix_enhsignle_16k/metadata.json +1 -0
- models/ConvTasNet_Libri1Mix_enhsignle_16k/model.pt +3 -0
- models/ConvTasNet_Libri1Mix_enhsignle_16k/source.txt +1 -0
- models/ConvTasNet_Libri1Mix_enhsingle_8k/.gitattributes +16 -0
- models/ConvTasNet_Libri1Mix_enhsingle_8k/README.md +73 -0
- models/ConvTasNet_Libri1Mix_enhsingle_8k/pytorch_model.bin +3 -0
- models/ConvTasNet_Libri1Mix_enhsingle_8k/source.txt +1 -0
- models/ConvTasNet_Libri2Mix_SepClean/.gitattributes +34 -0
- models/ConvTasNet_Libri2Mix_SepClean/README.md +25 -0
- models/ConvTasNet_Libri2Mix_SepClean/model.bin +3 -0
- models/ConvTasNet_Libri2Mix_SepClean/source.txt +1 -0
- models/ConvTasNet_Libri2Mix_sepclean_16k/.gitattributes +9 -0
- models/ConvTasNet_Libri2Mix_sepclean_16k/README.md +74 -0
- models/ConvTasNet_Libri2Mix_sepclean_16k/pytorch_model.bin +3 -0
- models/ConvTasNet_Libri2Mix_sepclean_16k/source.txt +1 -0
- models/ConvTasNet_Libri2Mix_sepclean_8k/.gitattributes +9 -0
- models/ConvTasNet_Libri2Mix_sepclean_8k/README.md +75 -0
.gitattributes CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+An[[:space:]]empirical[[:space:]]study[[:space:]]of[[:space:]]Conv-TasNet.pdf filter=lfs diff=lfs merge=lfs -text
+Conv-TasNet.[[:space:]]Surpassing[[:space:]]Ideal[[:space:]]Time-Frequency[[:space:]]Magnitude[[:space:]]Masking[[:space:]]for[[:space:]]Speech[[:space:]]Separation.pdf filter=lfs diff=lfs merge=lfs -text
An empirical study of Conv-TasNet.pdf ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:2eaf57ff260e0e5f6b7c91ce84666dcdbda886639df6f759c85b5a4dbb6cfa99
+size 2052626

Conv-TasNet. Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation.pdf ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:27983424312ccfe350faa0cafbeff880a5b7ac165ecc74114b8f370eed20e9ce
+size 1121503

code/Conv-TasNet [jwr1995] +2 -1 original-model +74 -1 multichan +68 -1.zip ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:60f7e7dcf59d3c2c4a2e18bba2241b9e993690baf5f38c8e02121b38d0db8ad3
+size 2722654

code/Conv-TasNet [nobel861017] +16 -1 DEMO.zip ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:c602cf74f1f83271458f77e4d080aab50b7ff565fa7cf36e622751084e39413e
+size 119439140

code/Conv-TasNet [perottievan] +10.zip ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:05e9da2c9ae3f630622f6f92a34bf0f7a55e4046813d3f81e75cf7f711b82df0
+size 2635035

code/Conv-TasNet [yoshonabee] +8 -1.zip ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:5e940a912feffdb915a3159598abec64eda6ba9163040822654d4d902534315f
+size 2574608

code/Conv-TasNet [zhenhaoge] +2.zip ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:7f79df4cd604fb52a6d9293daedbb7a6128d15a467891b85cd29a081b44a74ee
+size 2683582

code/Conv-TasNet.zip ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:e81ade8475daa75010b757bbe075b3621c35fee4aa3bf29fa3cfb45473df038f
+size 2620033

code/Forked-Conv-TasNet [OfekCohen1] +8 -1.zip ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:347cd40cccd86f013501f1197f8505f75247fdbf2a2443aff8621ae6cf4ed31f
+size 2439577
models/ConvTasNet-DAMP-Vocals/.gitattributes ADDED
@@ -0,0 +1,16 @@
+*.bin.* filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tar.gz filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
models/ConvTasNet-DAMP-Vocals/README.md ADDED
@@ -0,0 +1,69 @@
+---
+tags:
+- audacity
+inference: false
+sample_rate: 8000
+
+---
+
+This is an Audacity wrapper for the model, forked from the repository `groadabike/ConvTasNet_DAMP-VSEP_enhboth`.
+This model was trained using the Asteroid library: https://github.com/asteroid-team/asteroid.
+
+The following info was copied directly from `groadabike/ConvTasNet_DAMP-VSEP_enhboth`:
+
+### Description:
+This model was trained by Gerardo Roa Dabike using Asteroid. It was trained on the enh_both task of the DAMP-VSEP dataset.
+### Training config:
+```yaml
+data:
+  channels: 1
+  n_src: 2
+  root_path: data
+  sample_rate: 16000
+  samples_per_track: 10
+  segment: 3.0
+  task: enh_both
+filterbank:
+  kernel_size: 20
+  n_filters: 256
+  stride: 10
+main_args:
+  exp_dir: exp/train_convtasnet
+  help: None
+masknet:
+  bn_chan: 256
+  conv_kernel_size: 3
+  hid_chan: 512
+  mask_act: relu
+  n_blocks: 8
+  n_repeats: 4
+  n_src: 2
+  norm_type: gLN
+  skip_chan: 256
+optim:
+  lr: 0.0003
+  optimizer: adam
+  weight_decay: 0.0
+positional arguments:
+training:
+  batch_size: 12
+  early_stop: True
+  epochs: 50
+  half_lr: True
+  num_workers: 12
+```
+### Results:
+```yaml
+si_sdr: 14.018196157142519
+si_sdr_imp: 14.017103133809577
+sdr: 14.498517291333885
+sdr_imp: 14.463389151567865
+sir: 24.149634529133372
+sir_imp: 24.11450638936735
+sar: 15.338597389045935
+sar_imp: -137.30634122401517
+stoi: 0.7639416744417206
+stoi_imp: 0.1843383526963759
+```
+### License notice:
+This work "ConvTasNet_DAMP-VSEP_enhboth" is a derivative of DAMP-VSEP: Smule Digital Archive of Mobile Performances - Vocal Separation (Version 1.0.1) by Smule, Inc, used under Smule's Research Data License Agreement (Research only). "ConvTasNet_DAMP-VSEP_enhboth" is licensed under Attribution-ShareAlike 3.0 Unported by Gerardo Roa Dabike.
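The filterbank settings in the training config above fix how many encoder frames the separator operates on. A quick sanity check in plain Python, using only numbers from the config (kernel_size 20, stride 10, sample_rate 16000, segment 3.0 s):

```python
# Values taken from the training config above (data + filterbank sections).
sample_rate = 16000   # Hz
segment = 3.0         # seconds per training example
L = 20                # filterbank kernel_size, in samples
stride = 10           # filterbank stride (50% overlap)

T = int(sample_rate * segment)  # samples per segment
K = (T - L) // stride + 1       # encoder frames: K = (T - L) / stride + 1

print(f"{T} samples -> {K} encoder frames of {1000 * L / sample_rate} ms each")
# 48000 samples -> 4799 encoder frames of 1.25 ms each
```

Each 3-second segment thus becomes a sequence of 4799 basis-coefficient vectors of size n_filters = 256 before masking.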
models/ConvTasNet-DAMP-Vocals/metadata.json ADDED
@@ -0,0 +1 @@
+{"sample_rate": 8000, "domain_tags": ["music"], "tags": ["vocals separation"], "effect_type": "waveform-to-waveform", "multichannel": false, "labels": ["source-0", "source-1"], "short_description": "Use me for separating vocals from music!", "long_description": "Instant karaoke! A vocals separation model, trained on the DAMP dataset. Forked from groadabike/ConvTasNet_DAMP-VSEP_enhboth. Trained using Asteroid."}

models/ConvTasNet-DAMP-Vocals/model.pt ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:2304be278f6d81ca09b27fc8d68359fa66fb7d2d46b65a8da38fb854fceb2648
+size 52373994

models/ConvTasNet-DAMP-Vocals/source.txt ADDED
@@ -0,0 +1 @@
+https://huggingface.co/hugggof/ConvTasNet-DAMP-Vocals
models/ConvTasNet-IF-Itera-SepNoisy8k-FT/.gitattributes ADDED
@@ -0,0 +1,35 @@
+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tar filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text

models/ConvTasNet-IF-Itera-SepNoisy8k-FT/ConvTasNet-IF-Itera-SepNoisy8k-FT.pth ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:62d4bd9e929d95b7b407778c03b29b8dc6fee87d826ab4bea18aba5f9ac406bd
+size 20273170
models/ConvTasNet-IF-Itera-SepNoisy8k-FT/README.md ADDED
@@ -0,0 +1,75 @@
+---
+license: mit
+language:
+- id
+- en
+library_name: pytorch
+tags:
+- audio-source-separation
+- speech-separation
+- convtasnet
+- asteroid
+- itera
+datasets:
+- librimix
+- custom-indonesian-noisy-speech
+metrics:
+- si-sdr
+base_model: JorisCos/ConvTasNet_Libri2Mix_sepnoisy_8k
+pipeline_tag: audio-to-audio
+---
+
+## Fine-tuned model: [FransXav/ConvTasNet-IF-Itera-SepNoisy8k-FT](https://huggingface.co/FransXav/ConvTasNet-IF-Itera-SepNoisy8k-FT)
+
+This model is a *fine-tuned* version of [`JorisCos/ConvTasNet_Libri2Mix_sepnoisy_8k`](https://huggingface.co/JorisCos/ConvTasNet_Libri2Mix_sepnoisy_8k).
+
+### Description:
+This model was fine-tuned by researchers from **Informatics Engineering, Institut Teknologi Sumatera (ITERA)**. The fine-tuning process used the scripts available in the [project's GitHub repository](https://github.com/fransiskus-121140010/itera-informatics-convtasnet-ft). The model was trained on a custom dataset consisting of Indonesian-language vocal audio mixed with a variety of noise.
+
+### Fine-tuning config:
+```yaml
+# Configuration used during fine-tuning
+data:
+  root: "data/processed/"
+  sample_rate: 8000
+  segment_seconds: 4
+  num_workers: 4
+
+training:
+  project_name: "itera-speech-separation-ft"
+  model_name: "ConvTasNet-ITERA-FT"  # name used during training
+  epochs: 50
+  batch_size: 8
+  learning_rate: 0.0005
+  gradient_clip_val: 0.5
+  precision: "16-mixed"
+  early_stopping_patience: 5
+
+model:
+  freeze_encoder_decoder: false
+
+remix:
+  dynamic: true
+  snr_low: 0.0
+  snr_high: 10.0
+```
+
+## Results
+
+Evaluation on our internal test set gave the following results:
+```yaml
+si_sdr:
+  baseline_score: -30.2842
+  fine_tuned_score: -24.9016
+  improvement: +5.3826
+```
+
+### License Notice
+
+This work, "[YOUR_USERNAME]/itera-informatics-convtasnet-ft", is a derivative of [`JorisCos/ConvTasNet_Libri2Mix_sepnoisy_8k`](https://huggingface.co/JorisCos/ConvTasNet_Libri2Mix_sepnoisy_8k). The original work is a derivative of:
+> * [LibriSpeech ASR corpus](https://www.openslr.org/12) by Vassil Panayotov, used under [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/);
+> * The WSJ0 Hipster Ambient Mixtures dataset by [Whisper.ai](https://whisper.ai/), used under [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/).
+>
+> The original work is licensed under [Attribution-ShareAlike 3.0 Unported](https://creativecommons.org/licenses/by-sa/3.0/) by Joris Cosentino.
+
+This derivative work is licensed under the **[MIT License](https://opensource.org/licenses/MIT)** by the project authors at Institut Teknologi Sumatera.
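The results in the README above are reported in SI-SDR (scale-invariant signal-to-distortion ratio). As a reference for what that metric measures, here is a minimal, unbatched SI-SDR sketch in plain Python; this is illustrative only and is not the project's evaluation code (which would typically use Asteroid's implementation):

```python
import math

def si_sdr(estimate, reference):
    """Scale-invariant SDR in dB for two equal-length, zero-mean signals."""
    # Project the estimate onto the reference to find the optimal scaling.
    dot = sum(e * r for e, r in zip(estimate, reference))
    ref_energy = sum(r * r for r in reference)
    alpha = dot / ref_energy
    target = [alpha * r for r in reference]            # scaled target
    noise = [e - t for e, t in zip(estimate, target)]  # residual distortion
    ratio = sum(t * t for t in target) / sum(n * n for n in noise)
    return 10 * math.log10(ratio)

# Toy signals: a reference and a slightly perturbed estimate.
ref = [0.0, 1.0, -1.0, 0.5]
est = [0.1, 0.9, -1.1, 0.5]
print(round(si_sdr(est, ref), 2))  # ≈ 18.75 dB
```

Because the target is rescaled by alpha, uniformly amplifying or attenuating the estimate does not change the score, which is why SI-SDR is preferred over plain SNR for separation systems.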
models/ConvTasNet-IF-Itera-SepNoisy8k-FT/source.txt ADDED
@@ -0,0 +1 @@
+https://huggingface.co/FransXav/ConvTasNet-IF-Itera-SepNoisy8k-FT

models/ConvTasNet-ONNX (broken)/conv_tasnet.onnx ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:1a48dec63f5c8691482d8cd5560fa1ba7b3d449fff378fa8085fc66012186a6c
+size 35618928
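In the conv_tasnet.py that follows, TemporalConvNet stacks R repeats of X dilated conv blocks with dilation 2 ** x and stride 1, so the temporal receptive field grows exponentially with X. A small illustrative helper (the function name is ours, not from the file) computes that field in encoder frames:

```python
def tcn_receptive_field(P, X, R):
    """Receptive field (in encoder frames) of R repeats of X conv blocks
    with kernel size P, stride 1, and dilation 2**x for x in 0..X-1."""
    # Each block with dilation d widens the field by (P - 1) * d frames.
    return 1 + sum((P - 1) * 2 ** x for _ in range(R) for x in range(X))

# With the Conv-TasNet paper's default separator (P=3, X=8, R=3):
print(tcn_receptive_field(3, 8, 3))  # 1531
```

At 1531 frames, each mask value depends on several seconds of input, which is what lets the fully convolutional separator compete with recurrent ones.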
models/ConvTasNet-ONNX (broken)/conv_tasnet.py
ADDED
|
@@ -0,0 +1,393 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
import torch
|
| 2 |
+
import torch.nn as nn
|
| 3 |
+
import torch.nn.functional as F
|
| 4 |
+
|
| 5 |
+
from signal_processors.conv_tasnet.utils import overlap_and_add
|
| 6 |
+
|
| 7 |
+
EPS = 1e-8
|
| 8 |
+
|
| 9 |
+
|
| 10 |
+
class ConvTasNet(nn.Module):
|
| 11 |
+
def __init__(self, N, L, B, H, P, X, R, C, norm_type="gLN", causal=False,
|
| 12 |
+
mask_nonlinear='relu'):
|
| 13 |
+
"""
|
| 14 |
+
Args:
|
| 15 |
+
N: Number of filters in autoencoder
|
| 16 |
+
L: Length of the filters (in samples)
|
| 17 |
+
B: Number of channels in bottleneck 1 × 1-conv block
|
| 18 |
+
H: Number of channels in convolutional blocks
|
| 19 |
+
P: Kernel size in convolutional blocks
|
| 20 |
+
X: Number of convolutional blocks in each repeat
|
| 21 |
+
R: Number of repeats
|
| 22 |
+
C: Number of speakers
|
| 23 |
+
norm_type: BN, gLN, cLN
|
| 24 |
+
causal: causal or non-causal
|
| 25 |
+
mask_nonlinear: use which non-linear function to generate mask
|
| 26 |
+
"""
|
| 27 |
+
super(ConvTasNet, self).__init__()
|
| 28 |
+
# Hyper-parameter
|
| 29 |
+
self.N, self.L, self.B, self.H, self.P, self.X, self.R, self.C = N, L, B, H, P, X, R, C
|
| 30 |
+
self.norm_type = norm_type
|
| 31 |
+
self.causal = causal
|
| 32 |
+
self.mask_nonlinear = mask_nonlinear
|
| 33 |
+
# Components
|
| 34 |
+
self.encoder = Encoder(L, N)
|
| 35 |
+
self.separator = TemporalConvNet(N, B, H, P, X, R, C, norm_type, causal, mask_nonlinear)
|
| 36 |
+
self.decoder = Decoder(N, L)
|
| 37 |
+
# init
|
| 38 |
+
for p in self.parameters():
|
| 39 |
+
if p.dim() > 1:
|
| 40 |
+
nn.init.xavier_normal_(p)
|
| 41 |
+
|
| 42 |
+
def forward(self, mixture):
|
| 43 |
+
"""
|
| 44 |
+
Args:
|
| 45 |
+
mixture: [M, T], M is batch size, T is #samples
|
| 46 |
+
Returns:
|
| 47 |
+
est_source: [M, C, T]
|
| 48 |
+
"""
|
| 49 |
+
mixture_w = self.encoder(mixture)
|
| 50 |
+
est_mask = self.separator(mixture_w)
|
| 51 |
+
est_source = self.decoder(mixture_w, est_mask)
|
| 52 |
+
|
| 53 |
+
# T changed after conv1d in encoder, fix it here
|
| 54 |
+
# T_origin = mixture.size(-1)
|
| 55 |
+
# T_conv = est_source.size(-1)
|
| 56 |
+
T_origin = torch.tensor(88200)
|
| 57 |
+
T_conv = torch.tensor(88200)
|
| 58 |
+
est_source = F.pad(est_source, (0, T_origin - T_conv))
|
| 59 |
+
return est_source
|
| 60 |
+
|
| 61 |
+
@classmethod
|
| 62 |
+
def load_model(cls, path):
|
| 63 |
+
# Load to CPU
|
| 64 |
+
package = torch.load(path, map_location=lambda storage, loc: storage)
|
| 65 |
+
model = cls.load_model_from_package(package)
|
| 66 |
+
return model
|
| 67 |
+
|
| 68 |
+
@classmethod
|
| 69 |
+
def load_model_from_package(cls, package):
|
| 70 |
+
model = cls(package['N'], package['L'], package['B'], package['H'],
|
| 71 |
+
package['P'], package['X'], package['R'], package['C'],
|
| 72 |
+
norm_type=package['norm_type'], causal=package['causal'],
|
| 73 |
+
mask_nonlinear=package['mask_nonlinear'])
|
| 74 |
+
model.load_state_dict(package['state_dict'])
|
| 75 |
+
return model
|
| 76 |
+
|
| 77 |
+
@staticmethod
|
| 78 |
+
def serialize(model, optimizer, epoch, tr_loss=None, cv_loss=None):
|
| 79 |
+
package = {
|
| 80 |
+
# hyper-parameter
|
| 81 |
+
'N': model.N, 'L': model.L, 'B': model.B, 'H': model.H,
|
| 82 |
+
'P': model.P, 'X': model.X, 'R': model.R, 'C': model.C,
|
| 83 |
+
'norm_type': model.norm_type, 'causal': model.causal,
|
| 84 |
+
'mask_nonlinear': model.mask_nonlinear,
|
| 85 |
+
# state
|
| 86 |
+
'state_dict': model.state_dict(),
|
| 87 |
+
'optim_dict': optimizer.state_dict(),
|
| 88 |
+
'epoch': epoch
|
| 89 |
+
}
|
| 90 |
+
if tr_loss is not None:
|
| 91 |
+
package['tr_loss'] = tr_loss
|
| 92 |
+
package['cv_loss'] = cv_loss
|
| 93 |
+
return package
|
| 94 |
+
|
| 95 |
+
|
| 96 |
+
class Encoder(nn.Module):
|
| 97 |
+
"""Estimation of the nonnegative mixture weight by a 1-D conv layer.
|
| 98 |
+
"""
|
| 99 |
+
|
| 100 |
+
def __init__(self, L, N):
|
| 101 |
+
super(Encoder, self).__init__()
|
| 102 |
+
# Hyper-parameter
|
| 103 |
+
self.L, self.N = L, N
|
| 104 |
+
# Components
|
| 105 |
+
# 50% overlap
|
| 106 |
+
self.conv1d_U = nn.Conv1d(1, N, kernel_size=L, stride=L // 2, bias=False)
|
| 107 |
+
|
| 108 |
+
def forward(self, mixture):
|
| 109 |
+
"""
|
| 110 |
+
Args:
|
| 111 |
+
mixture: [M, T], M is batch size, T is #samples
|
| 112 |
+
Returns:
|
| 113 |
+
mixture_w: [M, N, K], where K = (T-L)/(L/2)+1 = 2T/L-1
|
| 114 |
+
"""
|
| 115 |
+
mixture = torch.unsqueeze(mixture, 1) # [M, 1, T]
|
| 116 |
+
mixture_w = F.relu(self.conv1d_U(mixture)) # [M, N, K]
|
| 117 |
+
return mixture_w
|
| 118 |
+
|
| 119 |
+
|
| 120 |
+
class Decoder(nn.Module):
|
| 121 |
+
def __init__(self, N, L):
|
| 122 |
+
super(Decoder, self).__init__()
|
| 123 |
+
# Hyper-parameter
|
| 124 |
+
self.N, self.L = N, L
|
| 125 |
+
# Components
|
| 126 |
+
self.basis_signals = nn.Linear(N, L, bias=False)
|
| 127 |
+
|
| 128 |
+
def forward(self, mixture_w, est_mask):
|
| 129 |
+
"""
|
| 130 |
+
Args:
|
| 131 |
+
mixture_w: [M, N, K]
|
| 132 |
+
est_mask: [M, C, N, K]
|
| 133 |
+
Returns:
|
| 134 |
+
est_source: [M, C, T]
|
| 135 |
+
"""
|
| 136 |
+
# D = W * M
|
| 137 |
+
source_w = torch.unsqueeze(mixture_w, 1) * est_mask # [M, C, N, K]
|
| 138 |
+
source_w = torch.transpose(source_w, 2, 3) # [M, C, K, N]
|
| 139 |
+
# S = DV
|
| 140 |
+
est_source = self.basis_signals(source_w) # [M, C, K, L]
|
| 141 |
+
est_source = overlap_and_add(est_source, self.L // 2) # M x C x T
|
| 142 |
+
return est_source
|
| 143 |
+
|
| 144 |
+
|
| 145 |
+
class TemporalConvNet(nn.Module):
|
| 146 |
+
def __init__(self, N, B, H, P, X, R, C, norm_type="gLN", causal=False,
|
| 147 |
+
mask_nonlinear='relu'):
|
| 148 |
+
"""
|
| 149 |
+
Args:
|
| 150 |
+
N: Number of filters in autoencoder
|
| 151 |
+
B: Number of channels in bottleneck 1 × 1-conv block
|
| 152 |
+
H: Number of channels in convolutional blocks
|
| 153 |
+
P: Kernel size in convolutional blocks
|
| 154 |
+
X: Number of convolutional blocks in each repeat
|
| 155 |
+
R: Number of repeats
|
| 156 |
+
C: Number of speakers
|
| 157 |
+
norm_type: BN, gLN, cLN
|
| 158 |
+
causal: causal or non-causal
|
| 159 |
+
mask_nonlinear: use which non-linear function to generate mask
|
| 160 |
+
"""
|
| 161 |
+
super(TemporalConvNet, self).__init__()
|
| 162 |
+
# Hyper-parameter
|
| 163 |
+
self.C = C
|
| 164 |
+
self.mask_nonlinear = mask_nonlinear
|
| 165 |
+
# Components
|
| 166 |
+
# [M, N, K] -> [M, N, K]
|
| 167 |
+
layer_norm = ChannelwiseLayerNorm(N)
|
| 168 |
+
# [M, N, K] -> [M, B, K]
|
| 169 |
+
bottleneck_conv1x1 = nn.Conv1d(N, B, 1, bias=False)
|
| 170 |
+
# [M, B, K] -> [M, B, K]
|
| 171 |
+
repeats = []
|
| 172 |
+
for r in range(R):
|
| 173 |
+
blocks = []
|
| 174 |
+
for x in range(X):
|
| 175 |
+
dilation = 2 ** x
|
| 176 |
+
padding = (P - 1) * dilation if causal else (P - 1) * dilation // 2
|
| 177 |
+
blocks += [TemporalBlock(B, H, P, stride=1,
|
| 178 |
+
padding=padding,
|
| 179 |
+
dilation=dilation,
|
| 180 |
+
norm_type=norm_type,
|
| 181 |
+
causal=causal)]
|
| 182 |
+
repeats += [nn.Sequential(*blocks)]
|
| 183 |
+
temporal_conv_net = nn.Sequential(*repeats)
|
| 184 |
+
# [M, B, K] -> [M, C*N, K]
|
| 185 |
+
mask_conv1x1 = nn.Conv1d(B, C * N, 1, bias=False)
|
| 186 |
+
# Put together
|
| 187 |
+
self.network = nn.Sequential(layer_norm,
|
| 188 |
+
bottleneck_conv1x1,
|
| 189 |
+
temporal_conv_net,
|
| 190 |
+
mask_conv1x1)
|
| 191 |
+
|
| 192 |
+
def forward(self, mixture_w):
|
| 193 |
+
"""
|
| 194 |
+
Keep this API same with TasNet
|
| 195 |
+
Args:
|
| 196 |
+
mixture_w: [M, N, K], M is batch size
|
| 197 |
+
returns:
|
| 198 |
+
est_mask: [M, C, N, K]
|
| 199 |
+
"""
|
| 200 |
+
M, N, K = mixture_w.size()
|
| 201 |
+
score = self.network(mixture_w) # [M, N, K] -> [M, C*N, K]
|
| 202 |
+
score = score.view(M, self.C, N, K) # [M, C*N, K] -> [M, C, N, K]
|
| 203 |
+
if self.mask_nonlinear == 'softmax':
|
| 204 |
+
est_mask = F.softmax(score, dim=1)
|
| 205 |
+
elif self.mask_nonlinear == 'relu':
|
| 206 |
+
est_mask = F.relu(score)
|
| 207 |
+
else:
|
| 208 |
+
raise ValueError("Unsupported mask non-linear function")
|
| 209 |
+
return est_mask
|
| 210 |
+
|
| 211 |
+
|
| 212 |
+
class TemporalBlock(nn.Module):
|
| 213 |
+
def __init__(self, in_channels, out_channels, kernel_size,
|
| 214 |
+
stride, padding, dilation, norm_type="gLN", causal=False):
|
| 215 |
+
super(TemporalBlock, self).__init__()
|
| 216 |
+
# [M, B, K] -> [M, H, K]
|
| 217 |
+
conv1x1 = nn.Conv1d(in_channels, out_channels, 1, bias=False)
|
| 218 |
+
prelu = nn.PReLU()
|
| 219 |
+
norm = chose_norm(norm_type, out_channels)
|
| 220 |
+
# [M, H, K] -> [M, B, K]
|
| 221 |
+
dsconv = DepthwiseSeparableConv(out_channels, in_channels, kernel_size,
|
| 222 |
+
stride, padding, dilation, norm_type,
|
| 223 |
+
causal)
|
| 224 |
+
# Put together
|
| 225 |
+
self.net = nn.Sequential(conv1x1, prelu, norm, dsconv)
|
| 226 |
+
|
| 227 |
+
def forward(self, x):
|
| 228 |
+
"""
|
| 229 |
+
Args:
|
| 230 |
+
x: [M, B, K]
|
| 231 |
+
Returns:
|
| 232 |
+
[M, B, K]
|
| 233 |
+
"""
|
| 234 |
+
residual = x
|
| 235 |
+
out = self.net(x)
|
| 236 |
+
# TODO: when P = 3 here works fine, but when P = 2 maybe need to pad?
|
| 237 |
+
return out + residual # look like w/o F.relu is better than w/ F.relu
|
| 238 |
+
# return F.relu(out + residual)
|
| 239 |
+
|
| 240 |
+
|
| 241 |
+
class DepthwiseSeparableConv(nn.Module):
|
| 242 |
+
def __init__(self, in_channels, out_channels, kernel_size,
|
| 243 |
+
stride, padding, dilation, norm_type="gLN", causal=False):
|
| 244 |
+
super(DepthwiseSeparableConv, self).__init__()
|
| 245 |
+
# Use `groups` option to implement depthwise convolution
|
| 246 |
+
# [M, H, K] -> [M, H, K]
|
| 247 |
+
depthwise_conv = nn.Conv1d(in_channels, in_channels, kernel_size,
|
| 248 |
+
stride=stride, padding=padding,
|
| 249 |
+
dilation=dilation, groups=in_channels,
|
| 250 |
+
bias=False)
|
| 251 |
+
if causal:
|
| 252 |
+
chomp = Chomp1d(padding)
|
| 253 |
+
prelu = nn.PReLU()
|
| 254 |
+
norm = chose_norm(norm_type, in_channels)
|
| 255 |
+
# [M, H, K] -> [M, B, K]
|
| 256 |
+
pointwise_conv = nn.Conv1d(in_channels, out_channels, 1, bias=False)
|
| 257 |
+
# Put together
|
| 258 |
+
if causal:
|
| 259 |
+
self.net = nn.Sequential(depthwise_conv, chomp, prelu, norm,
|
| 260 |
+
pointwise_conv)
|
| 261 |
+
else:
|
| 262 |
+
self.net = nn.Sequential(depthwise_conv, prelu, norm,
|
| 263 |
+
pointwise_conv)
|
| 264 |
+
|
| 265 |
+
def forward(self, x):
|
| 266 |
+
"""
|
| 267 |
+
Args:
|
| 268 |
+
x: [M, H, K]
|
| 269 |
+
Returns:
|
| 270 |
+
result: [M, B, K]
|
| 271 |
+
"""
|
| 272 |
+
return self.net(x)
|
| 273 |
+
|
| 274 |
+
|
| 275 |
+
class Chomp1d(nn.Module):
|
| 276 |
+
"""To ensure the output length is the same as the input.
|
| 277 |
+
"""
|
| 278 |
+
|
| 279 |
+
def __init__(self, chomp_size):
|
| 280 |
+
super(Chomp1d, self).__init__()
|
| 281 |
+
self.chomp_size = chomp_size
|
| 282 |
+
|
| 283 |
+
def forward(self, x):
|
| 284 |
+
"""
|
| 285 |
+
Args:
|
| 286 |
+
x: [M, H, Kpad]
|
| 287 |
+
Returns:
|
| 288 |
+
[M, H, K]
|
| 289 |
+
"""
|
| 290 |
+
return x[:, :, :-self.chomp_size].contiguous()
|
| 291 |
+
|
| 292 |
+
|
| 293 |
+
def chose_norm(norm_type, channel_size):
    """The input of normalization will be (M, C, K), where M is batch size,
    C is channel size and K is sequence length.
    """
    if norm_type == "gLN":
        return GlobalLayerNorm(channel_size)
    elif norm_type == "cLN":
        return ChannelwiseLayerNorm(channel_size)
    else:  # norm_type == "BN"
        # Given input (M, C, K), nn.BatchNorm1d(C) will accumulate statistics
        # along M and K, so this BN usage is right.
        return nn.BatchNorm1d(channel_size)

# TODO: Use nn.LayerNorm to impl cLN to speed up
class ChannelwiseLayerNorm(nn.Module):
    """Channel-wise Layer Normalization (cLN)"""

    def __init__(self, channel_size):
        super(ChannelwiseLayerNorm, self).__init__()
        self.gamma = nn.Parameter(torch.Tensor(1, channel_size, 1))  # [1, N, 1]
        self.beta = nn.Parameter(torch.Tensor(1, channel_size, 1))  # [1, N, 1]
        self.reset_parameters()

    def reset_parameters(self):
        self.gamma.data.fill_(1)
        self.beta.data.zero_()

    def forward(self, y):
        """
        Args:
            y: [M, N, K], M is batch size, N is channel size, K is length
        Returns:
            cLN_y: [M, N, K]
        """
        mean = torch.mean(y, dim=1, keepdim=True)  # [M, 1, K]
        # var = torch.var(y, dim=1, keepdim=True, unbiased=False)  # [M, 1, K]
        var = (torch.pow(y - mean, 2)).mean(dim=1, keepdim=True)
        cLN_y = self.gamma * (y - mean) / torch.pow(var + EPS, 0.5) + self.beta
        return cLN_y

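cLN normalizes each time step independently across the channel dimension, which is why it is safe for the causal (streaming) configuration: no future frames are needed. The same computation in plain NumPy (assuming gamma = 1, beta = 0):

```python
import numpy as np

EPS = 1e-8
M, N, K = 2, 4, 5                                   # batch, channels, length
y = np.random.default_rng(0).standard_normal((M, N, K))

mean = y.mean(axis=1, keepdims=True)                # [M, 1, K]: one mean per time step
var = ((y - mean) ** 2).mean(axis=1, keepdims=True)
cLN_y = (y - mean) / np.sqrt(var + EPS)             # gamma = 1, beta = 0
```

After this, every time step is zero-mean and unit-variance across channels.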
class GlobalLayerNorm(nn.Module):
    """Global Layer Normalization (gLN)"""

    def __init__(self, channel_size):
        super(GlobalLayerNorm, self).__init__()
        self.gamma = nn.Parameter(torch.Tensor(1, channel_size, 1))  # [1, N, 1]
        self.beta = nn.Parameter(torch.Tensor(1, channel_size, 1))  # [1, N, 1]
        self.reset_parameters()

    def reset_parameters(self):
        self.gamma.data.fill_(1)
        self.beta.data.zero_()

    def forward(self, y):
        """
        Args:
            y: [M, N, K], M is batch size, N is channel size, K is length
        Returns:
            gLN_y: [M, N, K]
        """
        # TODO: in torch 1.0, torch.mean() supports a dim list
        mean = y.mean(dim=1, keepdim=True).mean(dim=2, keepdim=True)  # [M, 1, 1]
        var = (torch.pow(y - mean, 2)).mean(dim=1, keepdim=True).mean(dim=2, keepdim=True)
        gLN_y = self.gamma * (y - mean) / torch.pow(var + EPS, 0.5) + self.beta
        return gLN_y

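gLN differs from cLN only in the reduction axes: one mean and variance per utterance, taken over both channels and time. Because it looks at the whole sequence, it is reserved for the non-causal model. A NumPy sketch of the reduction (gamma = 1, beta = 0 assumed):

```python
import numpy as np

EPS = 1e-8
M, N, K = 2, 4, 5
y = np.random.default_rng(0).standard_normal((M, N, K))

mean = y.mean(axis=(1, 2), keepdims=True)             # [M, 1, 1]: per utterance
var = ((y - mean) ** 2).mean(axis=(1, 2), keepdims=True)
gLN_y = (y - mean) / np.sqrt(var + EPS)               # gamma = 1, beta = 0
```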
if __name__ == "__main__":
    torch.manual_seed(123)
    M, N, L, T = 2, 3, 4, 12
    K = 2 * T // L - 1
    B, H, P, X, R, C, norm_type, causal = 2, 3, 3, 3, 2, 2, "gLN", False
    mixture = torch.randint(3, (M, T))
    # test Encoder
    encoder = Encoder(L, N)
    encoder.conv1d_U.weight.data = torch.randint(2, encoder.conv1d_U.weight.size())
    mixture_w = encoder(mixture)
    print('mixture', mixture)
    print('U', encoder.conv1d_U.weight)
    print('mixture_w', mixture_w)
    print('mixture_w size', mixture_w.size())

    # test TemporalConvNet
    separator = TemporalConvNet(N, B, H, P, X, R, C, norm_type=norm_type, causal=causal)
    est_mask = separator(mixture_w)
    print('est_mask', est_mask)

    # test Decoder
    decoder = Decoder(N, L)
    est_mask = torch.randint(2, (B, K, C, N))
    est_source = decoder(mixture_w, est_mask)
    print('est_source', est_source)

    # test Conv-TasNet
    conv_tasnet = ConvTasNet(N, L, B, H, P, X, R, C, norm_type=norm_type)
    est_source = conv_tasnet(mixture)
    print('est_source', est_source)
    print('est_source size', est_source.size())
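In the self-test above, `K = 2 * T // L - 1` is the number of encoder frames for a T-sample input with window L and 50% overlap (stride L // 2); it is equivalent to the usual framing formula. A quick check:

```python
T, L = 12, 4
K = 2 * T // L - 1              # frames with window L, stride L // 2
assert K == (T - L) // (L // 2) + 1 == 5
```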
models/ConvTasNet-ONNX (broken)/source.txt
ADDED
@@ -0,0 +1,5 @@
https://github.com/onnx/onnx/issues/3067
https://github.com/pytorch/pytorch/issues/46898
https://github.com/pytorch/pytorch/issues/47182
https://drive.google.com/file/d/1we2YpPVWVlIPNTXT6N92x_lH6fTRTd4r/view?usp=sharing
https://drive.google.com/file/d/1-UEej2yIXsvZWmN-VYdHHwSeIrxrS4BQ/view?usp=sharing

models/ConvTasNet-ONNX/conv_tasnet.onnx
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:781e8fcef71fdf3589fcc44ae44601f21d51d5d85381cfdf77d435a8e6720745
size 35449169

models/ConvTasNet-ONNX/source.txt
ADDED
@@ -0,0 +1,2 @@
https://github.com/PINTO0309/onnx2tf/issues/447
https://drive.google.com/file/d/189UHTs9OvDiNBc6BiZDG5zde2zSyTe6E/view
models/ConvTasNet_DAMP-VSEP_enhboth/.gitattributes
ADDED
@@ -0,0 +1,16 @@
*.bin.* filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tar.gz filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
models/ConvTasNet_DAMP-VSEP_enhboth/README.md
ADDED
@@ -0,0 +1,73 @@
---
tags:
- asteroid
- audio
- ConvTasNet
- audio-to-audio
datasets:
- DAMP-VSEP
license: cc-by-sa-4.0
---

## Asteroid model `groadabike/ConvTasNet_DAMP-VSEP_enhboth`
Imported from [Zenodo](https://zenodo.org/record/3994193)

### Description:
This model was trained by Gerardo Roa Dabike using Asteroid. It was trained on the enh_both task of the DAMP-VSEP dataset.

### Training config:
```yaml
data:
  channels: 1
  n_src: 2
  root_path: data
  sample_rate: 16000
  samples_per_track: 10
  segment: 3.0
  task: enh_both
filterbank:
  kernel_size: 20
  n_filters: 256
  stride: 10
main_args:
  exp_dir: exp/train_convtasnet
  help: None
masknet:
  bn_chan: 256
  conv_kernel_size: 3
  hid_chan: 512
  mask_act: relu
  n_blocks: 8
  n_repeats: 4
  n_src: 2
  norm_type: gLN
  skip_chan: 256
optim:
  lr: 0.0003
  optimizer: adam
  weight_decay: 0.0
positional arguments:
training:
  batch_size: 12
  early_stop: True
  epochs: 50
  half_lr: True
  num_workers: 12
```

### Results:
```yaml
si_sdr: 14.018196157142519
si_sdr_imp: 14.017103133809577
sdr: 14.498517291333885
sdr_imp: 14.463389151567865
sir: 24.149634529133372
sir_imp: 24.11450638936735
sar: 15.338597389045935
sar_imp: -137.30634122401517
stoi: 0.7639416744417206
stoi_imp: 0.1843383526963759
```

### License notice:
This work "ConvTasNet_DAMP-VSEP_enhboth" is a derivative of DAMP-VSEP: Smule Digital Archive of Mobile Performances - Vocal Separation (Version 1.0.1) by Smule, Inc, used under Smule's Research Data License Agreement (Research only). "ConvTasNet_DAMP-VSEP_enhboth" is licensed under Attribution-ShareAlike 3.0 Unported by Gerardo Roa Dabike.
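The `si_sdr` figures in these result blocks are scale-invariant SDR in dB, and the `*_imp` entries are the improvement over scoring the unprocessed mixture as the estimate. A minimal NumPy sketch of the metric (not the exact evaluation code behind these cards):

```python
import numpy as np

def si_sdr(est, ref):
    """Scale-invariant SDR (dB): project the estimate onto the reference,
    then compare target energy with residual energy."""
    est, ref = est - est.mean(), ref - ref.mean()
    target = (est @ ref / (ref @ ref)) * ref
    noise = est - target
    return 10 * np.log10((target @ target) / (noise @ noise))

rng = np.random.default_rng(0)
ref = rng.standard_normal(16000)
est = ref + 0.1 * rng.standard_normal(16000)
print(si_sdr(est, ref))     # ~20 dB for 10% added noise; rescaling est leaves it unchanged
```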
models/ConvTasNet_DAMP-VSEP_enhboth/pytorch_model.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8519e8658572f0d3a5e07002849337cb0ff07dcf3b3a641244e0905ceb0adc44
size 51990656

models/ConvTasNet_DAMP-VSEP_enhboth/source.txt
ADDED
@@ -0,0 +1 @@
https://huggingface.co/groadabike/ConvTasNet_DAMP-VSEP_enhboth
models/ConvTasNet_DAMPVSEP_EnglishNonEnglish_baseline/.gitattributes
ADDED
@@ -0,0 +1,27 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bin.* filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zstandard filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
models/ConvTasNet_DAMPVSEP_EnglishNonEnglish_baseline/README.md
ADDED
@@ -0,0 +1,106 @@
---
tags:
- asteroid
- audio
- ConvTasNet
- audio-to-audio
datasets:
- DAMP-VSEP
- Singing/Accompaniment Separation
license: cc-by-sa-4.0
---

## Description:
This model was trained by Gerardo Roa using the dampvsep recipe in Asteroid.
It was trained on the `singing/accompaniment` task of the `DAMP-VSEP` dataset.

## Training config:
```yaml
data:
  channels: 1
  emb_model: 'no'
  metadata_path: metadata
  mixture: remix
  root_path: /fastdata/acp13gr/DAMP/DAMP-VSEP
  sample_rate: 16000
  train_set: english_nonenglish
filterbank:
  kernel_size: 20
  n_filters: 256
  stride: 10
main_args:
  exp_dir: exp/train_convtasnet_remix-no-0.0-english_nonenglish-0.0005-jade
  help: null
masknet:
  bn_chan: 256
  conv_kernel_size: 3
  hid_chan: 512
  mask_act: relu
  n_blocks: 10
  n_repeats: 4
  n_src: 2
  norm_type: gLN
  skip_chan: 256
optim:
  lr: 0.0005
  optimizer: adam
  weight_decay: 0.0
positional arguments: {}
training:
  batch_size: 7
  early_stop: true
  epochs: 50
  half_lr: true
  loss_alpha: 0.0
  num_workers: 10
```

## Results:
```yaml
"si_sdr": 15.111802516750586,
"si_sdr_imp": 15.178209807687663,
"si_sdr_s0": 12.160261214703553,
"si_sdr_s0_imp": 17.434593619085675,
"si_sdr_s1": 18.063343818797623,
"si_sdr_s1_imp": 12.92182599628965,
"sdr": 15.959722569460281,
"sdr_imp": 14.927002467087567,
"sdr_s0": 13.270412028426595,
"sdr_s0_imp": 16.45867572657551,
"sdr_s1": 18.64903311049397,
"sdr_s1_imp": 13.39532920759962,
"sir": 23.935932341084754,
"sir_imp": 22.903212238712012,
"sir_s0": 22.30777879911744,
"sir_s0_imp": 25.49604249726635,
"sir_s1": 25.56408588305207,
"sir_s1_imp": 20.310381980157665,
"sar": 17.174899162445882,
"sar_imp": -134.47377304178818,
"sar_s0": 14.268071153965913,
"sar_s0_imp": -137.38060105026818,
"sar_s1": 20.081727170925856,
"sar_s1_imp": -131.56694503330817,
"stoi": 0.7746496376326059,
"stoi_imp": 0.19613735629114643,
"stoi_s0": 0.6611376621212413,
"stoi_s0_imp": 0.21162695175464794,
"stoi_s1": 0.8881616131439705,
"stoi_s1_imp": 0.1806477608276449
```

## License notice:
This work "ConvTasNet_DAMPVSEP_EnglishNonEnglish_baseline"
is a derivative of [DAMP-VSEP corpus](https://zenodo.org/record/3553059) by
[Smule, Inc](https://www.smule.com/),
used under [Restricted License](https://zenodo.org/record/3553059) (Research only).
"ConvTasNet_DAMPVSEP_EnglishNonEnglish_baseline"
is licensed under [Attribution-ShareAlike 3.0 Unported](https://creativecommons.org/licenses/by-sa/3.0/)
by Gerardo Roa.
models/ConvTasNet_DAMPVSEP_EnglishNonEnglish_baseline/pytorch_model.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f77ed26005b8cc6d9b6ca4f313e252b4b80b17378a0097c47eb60811708b75b0
size 64766287

models/ConvTasNet_DAMPVSEP_EnglishNonEnglish_baseline/source.txt
ADDED
@@ -0,0 +1 @@
https://huggingface.co/groadabike/ConvTasNet_DAMPVSEP_EnglishNonEnglish_baseline
models/ConvTasNet_Libri1Mix_enhsignle_16k/.gitattributes
ADDED
@@ -0,0 +1,17 @@
*.bin.* filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tar.gz filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text

models/ConvTasNet_Libri1Mix_enhsignle_16k/metadata.json
ADDED
@@ -0,0 +1 @@
{"sample_rate": 16000, "domain_tags": ["speech"], "short_description": "Use me for speech enhancement! Works with 1 speaker.", "long_description": "This model was trained by Joris Cosentino using the librimix recipe in Asteroid. It was trained on the enh_single task of the Libri1Mix dataset.", "tags": ["speech enhancement", "speech"], "labels": ["enhanced"], "effect_type": "waveform-to-waveform", "multichannel": false}

models/ConvTasNet_Libri1Mix_enhsignle_16k/model.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ee430a56e84cf617044cd986d56a26800b96278d618ffc738b6e81f8eff6a88d
size 20500235

models/ConvTasNet_Libri1Mix_enhsignle_16k/source.txt
ADDED
@@ -0,0 +1 @@
https://huggingface.co/hugggof/ConvTasNet_Libri1Mix_enhsignle_16k
models/ConvTasNet_Libri1Mix_enhsingle_8k/.gitattributes
ADDED
@@ -0,0 +1,16 @@
*.bin.* filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tar.gz filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
models/ConvTasNet_Libri1Mix_enhsingle_8k/README.md
ADDED
@@ -0,0 +1,73 @@
---
tags:
- asteroid
- audio
- ConvTasNet
datasets:
- LibriMix
- enh_single
license: cc-by-sa-4.0
---

## Asteroid model
Imported from this Zenodo [model page](https://zenodo.org/record/3970768).

## Description:
This model was trained by Brij Mohan using the Librimix/ConvTasNet recipe in Asteroid.
It was trained on the `enh_single` task of the Libri1Mix dataset.

## Training config:
```yaml
data:
  n_src: 1
  sample_rate: 8000
  segment: 3
  task: enh_single
  train_dir: data/wav8k/min/train-360
  valid_dir: data/wav8k/min/dev
filterbank:
  kernel_size: 16
  n_filters: 512
  stride: 8
masknet:
  bn_chan: 128
  hid_chan: 512
  mask_act: relu
  n_blocks: 8
  n_repeats: 3
  n_src: 1
  skip_chan: 128
optim:
  lr: 0.001
  optimizer: adam
  weight_decay: 0.0
training:
  batch_size: 24
  early_stop: True
  epochs: 200
  half_lr: True
```

## Results:
```yaml
si_sdr: 14.783675142685572
si_sdr_imp: 11.464625198953202
sdr: 15.497505907983102
sdr_imp: 12.07230150154914
sar: 15.497505907983102
sar_imp: 12.07230150154914
stoi: 0.9270030254700518
stoi_imp: 0.1320547197597893
```

## License notice:
This work "ConvTasNet_Libri1Mix_enhsingle_8k"
is a derivative of [LibriSpeech ASR corpus](http://www.openslr.org/12) by
[Vassil Panayotov](https://github.com/vdp),
used under [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/).
"ConvTasNet_Libri1Mix_enhsingle_8k"
is licensed under [Attribution-ShareAlike 3.0 Unported](https://creativecommons.org/licenses/by-sa/3.0/)
by Manuel Pariente.
models/ConvTasNet_Libri1Mix_enhsingle_8k/pytorch_model.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2c82d07cfb842778c26eeed222b313fcd9ae2776ce038a26f22dba0b700e597c
size 20063674

models/ConvTasNet_Libri1Mix_enhsingle_8k/source.txt
ADDED
@@ -0,0 +1 @@
https://huggingface.co/mpariente/ConvTasNet_Libri1Mix_enhsingle_8k
models/ConvTasNet_Libri2Mix_SepClean/.gitattributes
ADDED
@@ -0,0 +1,34 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
models/ConvTasNet_Libri2Mix_SepClean/README.md
ADDED
@@ -0,0 +1,25 @@
---
license: gpl
language:
- en
library_name: asteroid
tags:
- speech separation
- audio processing
---

# Model Card for model.bin

This model was trained by Dhruv Saini using the libri2mix sep_clean dataset.

# Model Details

It is a ConvTasNet model for 2 speakers' speech separation.

## Model Description

This model was trained by Dhruv Saini using the libri2mix sep_clean dataset.

- **Developed by:** Dhruv Saini
models/ConvTasNet_Libri2Mix_SepClean/model.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b6e58d1cec93826da50c883f0c31edc1f95557f0312cd6448ef110049e12bed6
size 20410329

models/ConvTasNet_Libri2Mix_SepClean/source.txt
ADDED
@@ -0,0 +1 @@
https://huggingface.co/Dhruv73/ConvTasNet_Libri2Mix_SepClean
models/ConvTasNet_Libri2Mix_sepclean_16k/.gitattributes
ADDED
@@ -0,0 +1,9 @@
*.bin.* filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tar.gz filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
pytorch_model.bin filter=lfs diff=lfs merge=lfs -text
models/ConvTasNet_Libri2Mix_sepclean_16k/README.md
ADDED
@@ -0,0 +1,74 @@
---
tags:
- asteroid
- audio
- ConvTasNet
- audio-to-audio
datasets:
- Libri2Mix
- sep_clean
license: cc-by-sa-4.0
---

## Asteroid model `JorisCos/ConvTasNet_Libri2Mix_sepclean_16k`

Description:

This model was trained by Joris Cosentino using the librimix recipe in [Asteroid](https://github.com/asteroid-team/asteroid).
It was trained on the `sep_clean` task of the Libri2Mix dataset.

Training config:
```yaml
data:
  n_src: 2
  sample_rate: 16000
  segment: 3
  task: sep_clean
  train_dir: data/wav16k/min/train-360
  valid_dir: data/wav16k/min/dev
filterbank:
  kernel_size: 32
  n_filters: 512
  stride: 16
masknet:
  bn_chan: 128
  hid_chan: 512
  mask_act: relu
  n_blocks: 8
  n_repeats: 3
  skip_chan: 128
optim:
  lr: 0.001
  optimizer: adam
  weight_decay: 0.0
training:
  batch_size: 6
  early_stop: true
  epochs: 200
  half_lr: true
  num_workers: 4
```

Results:

On the Libri2Mix min test set:
```yaml
si_sdr: 15.243671356901526
si_sdr_imp: 15.243034178473609
sdr: 15.668108919568112
sdr_imp: 15.578229918028036
sir: 25.295100756629957
sir_imp: 25.205219921301754
sar: 16.307682590197313
sar_imp: -51.64989963759405
stoi: 0.9394951175291422
stoi_imp: 0.22640192740016568
```

License notice:

This work "ConvTasNet_Libri2Mix_sepclean_16k"
is a derivative of [LibriSpeech ASR corpus](http://www.openslr.org/12) by Vassil Panayotov,
used under [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/). "ConvTasNet_Libri2Mix_sepclean_16k"
is licensed under [Attribution-ShareAlike 3.0 Unported](https://creativecommons.org/licenses/by-sa/3.0/) by Cosentino Joris.
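From a config like the one above, the separator's temporal context can be read off directly: R repeats of X dilated blocks (dilations 2^0 .. 2^(X-1)) with TCN kernel P give a receptive field of 1 + R*(P-1)*(2^X - 1) encoder frames, mapped back to samples through the filterbank stride. A back-of-the-envelope sketch (P = 3 is assumed here, Asteroid's default `conv_kernel_size`, which this card does not list):

```python
P, X, R = 3, 8, 3                           # TCN kernel, n_blocks, n_repeats (masknet)
kernel_size, stride, sr = 32, 16, 16000     # filterbank settings, 16 kHz audio

rf_frames = 1 + R * (P - 1) * (2**X - 1)            # encoder frames seen by one output frame
rf_samples = (rf_frames - 1) * stride + kernel_size  # back through the encoder to samples
print(rf_frames, rf_samples / sr)                    # 1531 frames, ~1.53 s of context
```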
models/ConvTasNet_Libri2Mix_sepclean_16k/pytorch_model.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8d97f012f7b2f22bb79cb0d0983a7ba27a52c1796ee3f63cbf25b4d28630adce
size 20394640

models/ConvTasNet_Libri2Mix_sepclean_16k/source.txt
ADDED
@@ -0,0 +1 @@
https://huggingface.co/JorisCos/ConvTasNet_Libri2Mix_sepclean_16k
models/ConvTasNet_Libri2Mix_sepclean_8k/.gitattributes
ADDED
@@ -0,0 +1,9 @@
*.bin.* filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tar.gz filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
pytorch_model.bin filter=lfs diff=lfs merge=lfs -text
models/ConvTasNet_Libri2Mix_sepclean_8k/README.md
ADDED
@@ -0,0 +1,75 @@
---
tags:
- asteroid
- audio
- ConvTasNet
- audio-to-audio
datasets:
- Libri2Mix
- sep_clean
license: cc-by-sa-4.0
---

## Asteroid model `JorisCos/ConvTasNet_Libri2Mix_sepclean_8k`
Imported from [Zenodo](https://zenodo.org/record/3873572#.X9M69cLjJH4)

Description:

This model was trained by Joris Cosentino using the librimix recipe in [Asteroid](https://github.com/asteroid-team/asteroid).
It was trained on the `sep_clean` task of the Libri2Mix dataset.

Training config:
```yaml
data:
  n_src: 2
  sample_rate: 8000
  segment: 3
  task: sep_clean
  train_dir: data/wav8k/min/train-360
  valid_dir: data/wav8k/min/dev
filterbank:
  kernel_size: 16
  n_filters: 512
  stride: 8
masknet:
  bn_chan: 128
  hid_chan: 512
  mask_act: relu
  n_blocks: 8
  n_repeats: 3
  skip_chan: 128
optim:
  lr: 0.001
  optimizer: adam
  weight_decay: 0.0
training:
  batch_size: 24
  early_stop: True
  epochs: 200
  half_lr: True
  num_workers: 2
```

Results:

On the Libri2Mix min test set:
```yaml
si_sdr: 14.764543634468069
si_sdr_imp: 14.764029375607246
sdr: 15.29337970745095
sdr_imp: 15.114146605113111
sir: 24.092904661115366
sir_imp: 23.913669683141528
sar: 16.06055906916849
sar_imp: -51.980784441287454
stoi: 0.9311142440593033
stoi_imp: 0.21817376142710482
```

License notice:

This work "ConvTasNet_Libri2Mix_sepclean_8k"
is a derivative of [LibriSpeech ASR corpus](http://www.openslr.org/12) by Vassil Panayotov,
used under [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/). "ConvTasNet_Libri2Mix_sepclean_8k"
is licensed under [Attribution-ShareAlike 3.0 Unported](https://creativecommons.org/licenses/by-sa/3.0/) by Cosentino Joris.