niobures committed
Commit 658500d · verified · 1 Parent(s): 0c64b11

ConvTasNet (code, models, paper)

This view is limited to 50 files because it contains too many changes.
Files changed (50)
  1. .gitattributes +2 -0
  2. An empirical study of Conv-TasNet.pdf +3 -0
  3. Conv-TasNet. Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation.pdf +3 -0
  4. code/Conv-TasNet [jwr1995] +2 -1 original-model +74 -1 multichan +68 -1.zip +3 -0
  5. code/Conv-TasNet [nobel861017] +16 -1 DEMO.zip +3 -0
  6. code/Conv-TasNet [perottievan] +10.zip +3 -0
  7. code/Conv-TasNet [yoshonabee] +8 -1.zip +3 -0
  8. code/Conv-TasNet [zhenhaoge] +2.zip +3 -0
  9. code/Conv-TasNet.zip +3 -0
  10. code/Forked-Conv-TasNet [OfekCohen1] +8 -1.zip +3 -0
  11. models/ConvTasNet-DAMP-Vocals/.gitattributes +16 -0
  12. models/ConvTasNet-DAMP-Vocals/README.md +69 -0
  13. models/ConvTasNet-DAMP-Vocals/metadata.json +1 -0
  14. models/ConvTasNet-DAMP-Vocals/model.pt +3 -0
  15. models/ConvTasNet-DAMP-Vocals/source.txt +1 -0
  16. models/ConvTasNet-IF-Itera-SepNoisy8k-FT/.gitattributes +35 -0
  17. models/ConvTasNet-IF-Itera-SepNoisy8k-FT/ConvTasNet-IF-Itera-SepNoisy8k-FT.pth +3 -0
  18. models/ConvTasNet-IF-Itera-SepNoisy8k-FT/README.md +75 -0
  19. models/ConvTasNet-IF-Itera-SepNoisy8k-FT/source.txt +1 -0
  20. models/ConvTasNet-ONNX (broken)/conv_tasnet.onnx +3 -0
  21. models/ConvTasNet-ONNX (broken)/conv_tasnet.py +393 -0
  22. models/ConvTasNet-ONNX (broken)/source.txt +5 -0
  23. models/ConvTasNet-ONNX/conv_tasnet.onnx +3 -0
  24. models/ConvTasNet-ONNX/source.txt +2 -0
  25. models/ConvTasNet_DAMP-VSEP_enhboth/.gitattributes +16 -0
  26. models/ConvTasNet_DAMP-VSEP_enhboth/README.md +73 -0
  27. models/ConvTasNet_DAMP-VSEP_enhboth/pytorch_model.bin +3 -0
  28. models/ConvTasNet_DAMP-VSEP_enhboth/source.txt +1 -0
  29. models/ConvTasNet_DAMPVSEP_EnglishNonEnglish_baseline/.gitattributes +27 -0
  30. models/ConvTasNet_DAMPVSEP_EnglishNonEnglish_baseline/README.md +106 -0
  31. models/ConvTasNet_DAMPVSEP_EnglishNonEnglish_baseline/pytorch_model.bin +3 -0
  32. models/ConvTasNet_DAMPVSEP_EnglishNonEnglish_baseline/source.txt +1 -0
  33. models/ConvTasNet_Libri1Mix_enhsignle_16k/.gitattributes +17 -0
  34. models/ConvTasNet_Libri1Mix_enhsignle_16k/metadata.json +1 -0
  35. models/ConvTasNet_Libri1Mix_enhsignle_16k/model.pt +3 -0
  36. models/ConvTasNet_Libri1Mix_enhsignle_16k/source.txt +1 -0
  37. models/ConvTasNet_Libri1Mix_enhsingle_8k/.gitattributes +16 -0
  38. models/ConvTasNet_Libri1Mix_enhsingle_8k/README.md +73 -0
  39. models/ConvTasNet_Libri1Mix_enhsingle_8k/pytorch_model.bin +3 -0
  40. models/ConvTasNet_Libri1Mix_enhsingle_8k/source.txt +1 -0
  41. models/ConvTasNet_Libri2Mix_SepClean/.gitattributes +34 -0
  42. models/ConvTasNet_Libri2Mix_SepClean/README.md +25 -0
  43. models/ConvTasNet_Libri2Mix_SepClean/model.bin +3 -0
  44. models/ConvTasNet_Libri2Mix_SepClean/source.txt +1 -0
  45. models/ConvTasNet_Libri2Mix_sepclean_16k/.gitattributes +9 -0
  46. models/ConvTasNet_Libri2Mix_sepclean_16k/README.md +74 -0
  47. models/ConvTasNet_Libri2Mix_sepclean_16k/pytorch_model.bin +3 -0
  48. models/ConvTasNet_Libri2Mix_sepclean_16k/source.txt +1 -0
  49. models/ConvTasNet_Libri2Mix_sepclean_8k/.gitattributes +9 -0
  50. models/ConvTasNet_Libri2Mix_sepclean_8k/README.md +75 -0
.gitattributes CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ An[[:space:]]empirical[[:space:]]study[[:space:]]of[[:space:]]Conv-TasNet.pdf filter=lfs diff=lfs merge=lfs -text
+ Conv-TasNet.[[:space:]]Surpassing[[:space:]]Ideal[[:space:]]Time-Frequency[[:space:]]Magnitude[[:space:]]Masking[[:space:]]for[[:space:]]Speech[[:space:]]Separation.pdf filter=lfs diff=lfs merge=lfs -text
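The `[[:space:]]` runs in the two new patterns are how `git lfs track` escapes spaces in file names when it writes `.gitattributes`. A minimal sketch of that mapping (the helper name is illustrative, not part of any git tooling):

```python
def lfs_attr_line(path: str) -> str:
    """Render a .gitattributes line the way `git lfs track` writes it,
    escaping spaces in the path with the POSIX [[:space:]] class."""
    return path.replace(" ", "[[:space:]]") + " filter=lfs diff=lfs merge=lfs -text"

print(lfs_attr_line("An empirical study of Conv-TasNet.pdf"))
```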
An empirical study of Conv-TasNet.pdf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2eaf57ff260e0e5f6b7c91ce84666dcdbda886639df6f759c85b5a4dbb6cfa99
+ size 2052626
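Each of these "ADDED" binary files is a Git LFS pointer: a tiny text stub of `key value` lines standing in for the real object. A small parser sketch (function name illustrative), using the pointer above:

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into a dict of its key/value fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, value = line.split(" ", 1)  # each line is "<key> <value>"
        fields[key] = value
    fields["size"] = int(fields["size"])  # size is the byte count of the real blob
    return fields

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:2eaf57ff260e0e5f6b7c91ce84666dcdbda886639df6f759c85b5a4dbb6cfa99
size 2052626"""
info = parse_lfs_pointer(pointer)
```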
Conv-TasNet. Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation.pdf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:27983424312ccfe350faa0cafbeff880a5b7ac165ecc74114b8f370eed20e9ce
+ size 1121503
code/Conv-TasNet [jwr1995] +2 -1 original-model +74 -1 multichan +68 -1.zip ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:60f7e7dcf59d3c2c4a2e18bba2241b9e993690baf5f38c8e02121b38d0db8ad3
+ size 2722654
code/Conv-TasNet [nobel861017] +16 -1 DEMO.zip ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c602cf74f1f83271458f77e4d080aab50b7ff565fa7cf36e622751084e39413e
+ size 119439140
code/Conv-TasNet [perottievan] +10.zip ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:05e9da2c9ae3f630622f6f92a34bf0f7a55e4046813d3f81e75cf7f711b82df0
+ size 2635035
code/Conv-TasNet [yoshonabee] +8 -1.zip ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5e940a912feffdb915a3159598abec64eda6ba9163040822654d4d902534315f
+ size 2574608
code/Conv-TasNet [zhenhaoge] +2.zip ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7f79df4cd604fb52a6d9293daedbb7a6128d15a467891b85cd29a081b44a74ee
+ size 2683582
code/Conv-TasNet.zip ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e81ade8475daa75010b757bbe075b3621c35fee4aa3bf29fa3cfb45473df038f
+ size 2620033
code/Forked-Conv-TasNet [OfekCohen1] +8 -1.zip ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:347cd40cccd86f013501f1197f8505f75247fdbf2a2443aff8621ae6cf4ed31f
+ size 2439577
models/ConvTasNet-DAMP-Vocals/.gitattributes ADDED
@@ -0,0 +1,16 @@
+ *.bin.* filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tar.gz filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
models/ConvTasNet-DAMP-Vocals/README.md ADDED
@@ -0,0 +1,69 @@
+ ---
+ tags:
+ - audacity
+ inference: false
+ sample_rate: 8000
+
+ ---
+
+ This is an Audacity wrapper for the model, forked from the repository `groadabike/ConvTasNet_DAMP-VSEP_enhboth`.
+ This model was trained using the Asteroid library: https://github.com/asteroid-team/asteroid.
+
+ The following info was copied directly from `groadabike/ConvTasNet_DAMP-VSEP_enhboth`:
+
+ ### Description:
+ This model was trained by Gerardo Roa Dabike using Asteroid. It was trained on the enh_both task of the DAMP-VSEP dataset.
+ ### Training config:
+ ```yaml
+ data:
+   channels: 1
+   n_src: 2
+   root_path: data
+   sample_rate: 16000
+   samples_per_track: 10
+   segment: 3.0
+   task: enh_both
+ filterbank:
+   kernel_size: 20
+   n_filters: 256
+   stride: 10
+ main_args:
+   exp_dir: exp/train_convtasnet
+   help: None
+ masknet:
+   bn_chan: 256
+   conv_kernel_size: 3
+   hid_chan: 512
+   mask_act: relu
+   n_blocks: 8
+   n_repeats: 4
+   n_src: 2
+   norm_type: gLN
+   skip_chan: 256
+ optim:
+   lr: 0.0003
+   optimizer: adam
+   weight_decay: 0.0
+ positional arguments:
+ training:
+   batch_size: 12
+   early_stop: True
+   epochs: 50
+   half_lr: True
+   num_workers: 12
+ ```
+ ### Results:
+ ```yaml
+ si_sdr: 14.018196157142519
+ si_sdr_imp: 14.017103133809577
+ sdr: 14.498517291333885
+ sdr_imp: 14.463389151567865
+ sir: 24.149634529133372
+ sir_imp: 24.11450638936735
+ sar: 15.338597389045935
+ sar_imp: -137.30634122401517
+ stoi: 0.7639416744417206
+ stoi_imp: 0.1843383526963759
+ ```
+ ### License notice:
+ This work "ConvTasNet_DAMP-VSEP_enhboth" is a derivative of DAMP-VSEP: Smule Digital Archive of Mobile Performances - Vocal Separation (Version 1.0.1) by Smule, Inc, used under Smule's Research Data License Agreement (Research only). "ConvTasNet_DAMP-VSEP_enhboth" is licensed under Attribution-ShareAlike 3.0 Unported by Gerardo Roa Dabike.
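The `si_sdr` figures in model cards like the one above are scale-invariant SDR values in dB (and `*_imp` entries are improvements over the unprocessed mixture). A minimal pure-Python sketch of the metric — not the Asteroid implementation, just the textbook definition:

```python
import math

def si_sdr(est, ref):
    """Scale-invariant SDR in dB: project the estimate onto the reference,
    then compare the energy of that target to the energy of the residual."""
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref)
    target = [dot / ref_energy * r for r in ref]   # scaled projection onto ref
    noise = [e - t for e, t in zip(est, target)]   # residual error
    return 10 * math.log10(sum(t * t for t in target) / sum(n * n for n in noise))

print(si_sdr([1.0, 1.0], [1.0, 0.0]))  # orthogonal error of equal energy -> 0.0 dB
```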
models/ConvTasNet-DAMP-Vocals/metadata.json ADDED
@@ -0,0 +1 @@
+ {"sample_rate": 8000, "domain_tags": ["music"], "tags": ["vocals separation"], "effect_type": "waveform-to-waveform", "multichannel": false, "labels": ["source-0", "source-1"], "short_description": "Use me for separating vocals from music!", "long_description": "Instant karaoke! A vocals separation model, trained on the DAMP dataset. Forked from groadabike/ConvTasNet_DAMP-VSEP_enhboth. Trained using Asteroid."}
models/ConvTasNet-DAMP-Vocals/model.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2304be278f6d81ca09b27fc8d68359fa66fb7d2d46b65a8da38fb854fceb2648
+ size 52373994
models/ConvTasNet-DAMP-Vocals/source.txt ADDED
@@ -0,0 +1 @@
+ https://huggingface.co/hugggof/ConvTasNet-DAMP-Vocals
models/ConvTasNet-IF-Itera-SepNoisy8k-FT/.gitattributes ADDED
@@ -0,0 +1,35 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
models/ConvTasNet-IF-Itera-SepNoisy8k-FT/ConvTasNet-IF-Itera-SepNoisy8k-FT.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:62d4bd9e929d95b7b407778c03b29b8dc6fee87d826ab4bea18aba5f9ac406bd
+ size 20273170
models/ConvTasNet-IF-Itera-SepNoisy8k-FT/README.md ADDED
@@ -0,0 +1,75 @@
+ ---
+ license: mit
+ language:
+ - id
+ - en
+ library_name: pytorch
+ tags:
+ - audio-source-separation
+ - speech-separation
+ - convtasnet
+ - asteroid
+ - itera
+ datasets:
+ - librimix
+ - custom-indonesian-noisy-speech
+ metrics:
+ - si-sdr
+ base_model: JorisCos/ConvTasNet_Libri2Mix_sepnoisy_8k
+ pipeline_tag: audio-to-audio
+ ---
+
+ ## Fine-tuned model: [FransXav/ConvTasNet-IF-Itera-SepNoisy8k-FT](https://huggingface.co/FransXav/ConvTasNet-IF-Itera-SepNoisy8k-FT)
+
+ This model is a fine-tuned version of [`JorisCos/ConvTasNet_Libri2Mix_sepnoisy_8k`](https://huggingface.co/JorisCos/ConvTasNet_Libri2Mix_sepnoisy_8k).
+
+ ### Description:
+ This model was fine-tuned by researchers from **Informatics Engineering, Institut Teknologi Sumatera (ITERA)**. Fine-tuning used the scripts available in the [project's GitHub repository](https://github.com/fransiskus-121140010/itera-informatics-convtasnet-ft). The model was trained on a custom dataset consisting of Indonesian-language vocal audio mixed with a variety of noise.
+
+ ### Fine-tuning config:
+ ```yaml
+ # Configuration used during fine-tuning
+ data:
+   root: "data/processed/"
+   sample_rate: 8000
+   segment_seconds: 4
+   num_workers: 4
+
+ training:
+   project_name: "itera-speech-separation-ft"
+   model_name: "ConvTasNet-ITERA-FT"  # Name used during training
+   epochs: 50
+   batch_size: 8
+   learning_rate: 0.0005
+   gradient_clip_val: 0.5
+   precision: "16-mixed"
+   early_stopping_patience: 5
+
+ model:
+   freeze_encoder_decoder: false
+
+ remix:
+   dynamic: true
+   snr_low: 0.0
+   snr_high: 10.0
+ ```
+
+ ## Results
+
+ Evaluation on our internal test set gave the following results:
+ ```yaml
+ si_sdr:
+   baseline_score: -30.2842
+   fine_tuned_score: -24.9016
+   improvement: +5.3826
+ ```
+
+ ### License Notice
+
+ This work, "[NAMA_USERNAME_ANDA]/itera-informatics-convtasnet-ft", is a derivative of [`JorisCos/ConvTasNet_Libri2Mix_sepnoisy_8k`](https://huggingface.co/JorisCos/ConvTasNet_Libri2Mix_sepnoisy_8k). The original work is a derivative of:
+ > * [LibriSpeech ASR corpus](https://www.openslr.org/12) by Vassil Panayotov, used under [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/);
+ > * The WSJ0 Hipster Ambient Mixtures dataset by [Whisper.ai](https://whisper.ai/), used under [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/).
+ >
+ > The original work is licensed under [Attribution-ShareAlike 3.0 Unported](https://creativecommons.org/licenses/by-sa/3.0/) by Joris Cosentino.
+
+ This derivative work is licensed under the **[MIT License](https://opensource.org/licenses/MIT)** by the project authors at Institut Teknologi Sumatera.
models/ConvTasNet-IF-Itera-SepNoisy8k-FT/source.txt ADDED
@@ -0,0 +1 @@
+ https://huggingface.co/FransXav/ConvTasNet-IF-Itera-SepNoisy8k-FT
models/ConvTasNet-ONNX (broken)/conv_tasnet.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1a48dec63f5c8691482d8cd5560fa1ba7b3d449fff378fa8085fc66012186a6c
+ size 35618928
models/ConvTasNet-ONNX (broken)/conv_tasnet.py ADDED
@@ -0,0 +1,393 @@
+ import torch
+ import torch.nn as nn
+ import torch.nn.functional as F
+
+ from signal_processors.conv_tasnet.utils import overlap_and_add
+
+ EPS = 1e-8
+
+
+ class ConvTasNet(nn.Module):
+     def __init__(self, N, L, B, H, P, X, R, C, norm_type="gLN", causal=False,
+                  mask_nonlinear='relu'):
+         """
+         Args:
+             N: Number of filters in autoencoder
+             L: Length of the filters (in samples)
+             B: Number of channels in bottleneck 1 × 1-conv block
+             H: Number of channels in convolutional blocks
+             P: Kernel size in convolutional blocks
+             X: Number of convolutional blocks in each repeat
+             R: Number of repeats
+             C: Number of speakers
+             norm_type: BN, gLN, cLN
+             causal: causal or non-causal
+             mask_nonlinear: use which non-linear function to generate mask
+         """
+         super(ConvTasNet, self).__init__()
+         # Hyper-parameter
+         self.N, self.L, self.B, self.H, self.P, self.X, self.R, self.C = N, L, B, H, P, X, R, C
+         self.norm_type = norm_type
+         self.causal = causal
+         self.mask_nonlinear = mask_nonlinear
+         # Components
+         self.encoder = Encoder(L, N)
+         self.separator = TemporalConvNet(N, B, H, P, X, R, C, norm_type, causal, mask_nonlinear)
+         self.decoder = Decoder(N, L)
+         # init
+         for p in self.parameters():
+             if p.dim() > 1:
+                 nn.init.xavier_normal_(p)
+
+     def forward(self, mixture):
+         """
+         Args:
+             mixture: [M, T], M is batch size, T is #samples
+         Returns:
+             est_source: [M, C, T]
+         """
+         mixture_w = self.encoder(mixture)
+         est_mask = self.separator(mixture_w)
+         est_source = self.decoder(mixture_w, est_mask)
+
+         # T changed after conv1d in encoder, fix it here
+         # T_origin = mixture.size(-1)
+         # T_conv = est_source.size(-1)
+         T_origin = torch.tensor(88200)
+         T_conv = torch.tensor(88200)
+         est_source = F.pad(est_source, (0, T_origin - T_conv))
+         return est_source
+
+     @classmethod
+     def load_model(cls, path):
+         # Load to CPU
+         package = torch.load(path, map_location=lambda storage, loc: storage)
+         model = cls.load_model_from_package(package)
+         return model
+
+     @classmethod
+     def load_model_from_package(cls, package):
+         model = cls(package['N'], package['L'], package['B'], package['H'],
+                     package['P'], package['X'], package['R'], package['C'],
+                     norm_type=package['norm_type'], causal=package['causal'],
+                     mask_nonlinear=package['mask_nonlinear'])
+         model.load_state_dict(package['state_dict'])
+         return model
+
+     @staticmethod
+     def serialize(model, optimizer, epoch, tr_loss=None, cv_loss=None):
+         package = {
+             # hyper-parameter
+             'N': model.N, 'L': model.L, 'B': model.B, 'H': model.H,
+             'P': model.P, 'X': model.X, 'R': model.R, 'C': model.C,
+             'norm_type': model.norm_type, 'causal': model.causal,
+             'mask_nonlinear': model.mask_nonlinear,
+             # state
+             'state_dict': model.state_dict(),
+             'optim_dict': optimizer.state_dict(),
+             'epoch': epoch
+         }
+         if tr_loss is not None:
+             package['tr_loss'] = tr_loss
+             package['cv_loss'] = cv_loss
+         return package
+
+
+ class Encoder(nn.Module):
+     """Estimation of the nonnegative mixture weight by a 1-D conv layer.
+     """
+
+     def __init__(self, L, N):
+         super(Encoder, self).__init__()
+         # Hyper-parameter
+         self.L, self.N = L, N
+         # Components
+         # 50% overlap
+         self.conv1d_U = nn.Conv1d(1, N, kernel_size=L, stride=L // 2, bias=False)
+
+     def forward(self, mixture):
+         """
+         Args:
+             mixture: [M, T], M is batch size, T is #samples
+         Returns:
+             mixture_w: [M, N, K], where K = (T-L)/(L/2)+1 = 2T/L-1
+         """
+         mixture = torch.unsqueeze(mixture, 1)  # [M, 1, T]
+         mixture_w = F.relu(self.conv1d_U(mixture))  # [M, N, K]
+         return mixture_w
+
+
+ class Decoder(nn.Module):
+     def __init__(self, N, L):
+         super(Decoder, self).__init__()
+         # Hyper-parameter
+         self.N, self.L = N, L
+         # Components
+         self.basis_signals = nn.Linear(N, L, bias=False)
+
+     def forward(self, mixture_w, est_mask):
+         """
+         Args:
+             mixture_w: [M, N, K]
+             est_mask: [M, C, N, K]
+         Returns:
+             est_source: [M, C, T]
+         """
+         # D = W * M
+         source_w = torch.unsqueeze(mixture_w, 1) * est_mask  # [M, C, N, K]
+         source_w = torch.transpose(source_w, 2, 3)  # [M, C, K, N]
+         # S = DV
+         est_source = self.basis_signals(source_w)  # [M, C, K, L]
+         est_source = overlap_and_add(est_source, self.L // 2)  # M x C x T
+         return est_source
+
+
+ class TemporalConvNet(nn.Module):
+     def __init__(self, N, B, H, P, X, R, C, norm_type="gLN", causal=False,
+                  mask_nonlinear='relu'):
+         """
+         Args:
+             N: Number of filters in autoencoder
+             B: Number of channels in bottleneck 1 × 1-conv block
+             H: Number of channels in convolutional blocks
+             P: Kernel size in convolutional blocks
+             X: Number of convolutional blocks in each repeat
+             R: Number of repeats
+             C: Number of speakers
+             norm_type: BN, gLN, cLN
+             causal: causal or non-causal
+             mask_nonlinear: use which non-linear function to generate mask
+         """
+         super(TemporalConvNet, self).__init__()
+         # Hyper-parameter
+         self.C = C
+         self.mask_nonlinear = mask_nonlinear
+         # Components
+         # [M, N, K] -> [M, N, K]
+         layer_norm = ChannelwiseLayerNorm(N)
+         # [M, N, K] -> [M, B, K]
+         bottleneck_conv1x1 = nn.Conv1d(N, B, 1, bias=False)
+         # [M, B, K] -> [M, B, K]
+         repeats = []
+         for r in range(R):
+             blocks = []
+             for x in range(X):
+                 dilation = 2 ** x
+                 padding = (P - 1) * dilation if causal else (P - 1) * dilation // 2
+                 blocks += [TemporalBlock(B, H, P, stride=1,
+                                          padding=padding,
+                                          dilation=dilation,
+                                          norm_type=norm_type,
+                                          causal=causal)]
+             repeats += [nn.Sequential(*blocks)]
+         temporal_conv_net = nn.Sequential(*repeats)
+         # [M, B, K] -> [M, C*N, K]
+         mask_conv1x1 = nn.Conv1d(B, C * N, 1, bias=False)
+         # Put together
+         self.network = nn.Sequential(layer_norm,
+                                      bottleneck_conv1x1,
+                                      temporal_conv_net,
+                                      mask_conv1x1)
+
+     def forward(self, mixture_w):
+         """
+         Keep this API same with TasNet
+         Args:
+             mixture_w: [M, N, K], M is batch size
+         returns:
+             est_mask: [M, C, N, K]
+         """
+         M, N, K = mixture_w.size()
+         score = self.network(mixture_w)  # [M, N, K] -> [M, C*N, K]
+         score = score.view(M, self.C, N, K)  # [M, C*N, K] -> [M, C, N, K]
+         if self.mask_nonlinear == 'softmax':
+             est_mask = F.softmax(score, dim=1)
+         elif self.mask_nonlinear == 'relu':
+             est_mask = F.relu(score)
+         else:
+             raise ValueError("Unsupported mask non-linear function")
+         return est_mask
+
+
+ class TemporalBlock(nn.Module):
+     def __init__(self, in_channels, out_channels, kernel_size,
+                  stride, padding, dilation, norm_type="gLN", causal=False):
+         super(TemporalBlock, self).__init__()
+         # [M, B, K] -> [M, H, K]
+         conv1x1 = nn.Conv1d(in_channels, out_channels, 1, bias=False)
+         prelu = nn.PReLU()
+         norm = chose_norm(norm_type, out_channels)
+         # [M, H, K] -> [M, B, K]
+         dsconv = DepthwiseSeparableConv(out_channels, in_channels, kernel_size,
+                                         stride, padding, dilation, norm_type,
+                                         causal)
+         # Put together
+         self.net = nn.Sequential(conv1x1, prelu, norm, dsconv)
+
+     def forward(self, x):
+         """
+         Args:
+             x: [M, B, K]
+         Returns:
+             [M, B, K]
+         """
+         residual = x
+         out = self.net(x)
+         # TODO: when P = 3 here works fine, but when P = 2 maybe need to pad?
+         return out + residual  # looks like w/o F.relu is better than w/ F.relu
+         # return F.relu(out + residual)
+
+
+ class DepthwiseSeparableConv(nn.Module):
+     def __init__(self, in_channels, out_channels, kernel_size,
+                  stride, padding, dilation, norm_type="gLN", causal=False):
+         super(DepthwiseSeparableConv, self).__init__()
+         # Use `groups` option to implement depthwise convolution
+         # [M, H, K] -> [M, H, K]
+         depthwise_conv = nn.Conv1d(in_channels, in_channels, kernel_size,
+                                    stride=stride, padding=padding,
+                                    dilation=dilation, groups=in_channels,
+                                    bias=False)
+         if causal:
+             chomp = Chomp1d(padding)
+         prelu = nn.PReLU()
+         norm = chose_norm(norm_type, in_channels)
+         # [M, H, K] -> [M, B, K]
+         pointwise_conv = nn.Conv1d(in_channels, out_channels, 1, bias=False)
+         # Put together
+         if causal:
+             self.net = nn.Sequential(depthwise_conv, chomp, prelu, norm,
+                                      pointwise_conv)
+         else:
+             self.net = nn.Sequential(depthwise_conv, prelu, norm,
+                                      pointwise_conv)
+
+     def forward(self, x):
+         """
+         Args:
+             x: [M, H, K]
+         Returns:
+             result: [M, B, K]
+         """
+         return self.net(x)
+
+
+ class Chomp1d(nn.Module):
+     """To ensure the output length is the same as the input.
+     """
+
+     def __init__(self, chomp_size):
+         super(Chomp1d, self).__init__()
+         self.chomp_size = chomp_size
+
+     def forward(self, x):
+         """
+         Args:
+             x: [M, H, Kpad]
+         Returns:
+             [M, H, K]
+         """
+         return x[:, :, :-self.chomp_size].contiguous()
+
+
+ def chose_norm(norm_type, channel_size):
+     """The input of normalization will be (M, C, K), where M is batch size,
+     C is channel size and K is sequence length.
+     """
+     if norm_type == "gLN":
+         return GlobalLayerNorm(channel_size)
+     elif norm_type == "cLN":
+         return ChannelwiseLayerNorm(channel_size)
+     else:  # norm_type == "BN"
+         # Given input (M, C, K), nn.BatchNorm1d(C) will accumulate statistics
+         # along M and K, so this BN usage is right.
+         return nn.BatchNorm1d(channel_size)
+
+
+ # TODO: Use nn.LayerNorm to impl cLN to speed up
+ class ChannelwiseLayerNorm(nn.Module):
+     """Channel-wise Layer Normalization (cLN)"""
+
+     def __init__(self, channel_size):
+         super(ChannelwiseLayerNorm, self).__init__()
+         self.gamma = nn.Parameter(torch.Tensor(1, channel_size, 1))  # [1, N, 1]
+         self.beta = nn.Parameter(torch.Tensor(1, channel_size, 1))  # [1, N, 1]
+         self.reset_parameters()
+
+     def reset_parameters(self):
+         self.gamma.data.fill_(1)
+         self.beta.data.zero_()
+
+     def forward(self, y):
+         """
+         Args:
+             y: [M, N, K], M is batch size, N is channel size, K is length
+         Returns:
+             cLN_y: [M, N, K]
+         """
+         mean = torch.mean(y, dim=1, keepdim=True)  # [M, 1, K]
+         # var = torch.var(y, dim=1, keepdim=True, unbiased=False)  # [M, 1, K]
+         var = (torch.pow(y - mean, 2)).mean(dim=1, keepdim=True)
+
+         cLN_y = self.gamma * (y - mean) / torch.pow(var + EPS, 0.5) + self.beta
+         return cLN_y
+
+
+ class GlobalLayerNorm(nn.Module):
+     """Global Layer Normalization (gLN)"""
+
+     def __init__(self, channel_size):
+         super(GlobalLayerNorm, self).__init__()
+         self.gamma = nn.Parameter(torch.Tensor(1, channel_size, 1))  # [1, N, 1]
+         self.beta = nn.Parameter(torch.Tensor(1, channel_size, 1))  # [1, N, 1]
+         self.reset_parameters()
+
+     def reset_parameters(self):
+         self.gamma.data.fill_(1)
+         self.beta.data.zero_()
+
+     def forward(self, y):
+         """
+         Args:
+             y: [M, N, K], M is batch size, N is channel size, K is length
+         Returns:
+             gLN_y: [M, N, K]
+         """
+         # TODO: in torch 1.0, torch.mean() supports dim list
+         mean = y.mean(dim=1, keepdim=True).mean(dim=2, keepdim=True)  # [M, 1, 1]
+         var = (torch.pow(y - mean, 2)).mean(dim=1, keepdim=True).mean(dim=2, keepdim=True)
+         gLN_y = self.gamma * (y - mean) / torch.pow(var + EPS, 0.5) + self.beta
+         return gLN_y
+
+
+ if __name__ == "__main__":
+     torch.manual_seed(123)
+     M, N, L, T = 2, 3, 4, 12
+     K = 2 * T // L - 1
+     B, H, P, X, R, C, norm_type, causal = 2, 3, 3, 3, 2, 2, "gLN", False
+     mixture = torch.randint(3, (M, T))
+     # test Encoder
+     encoder = Encoder(L, N)
+     encoder.conv1d_U.weight.data = torch.randint(2, encoder.conv1d_U.weight.size())
+     mixture_w = encoder(mixture)
+     print('mixture', mixture)
+     print('U', encoder.conv1d_U.weight)
+     print('mixture_w', mixture_w)
+     print('mixture_w size', mixture_w.size())
+
+     # test TemporalConvNet
+     separator = TemporalConvNet(N, B, H, P, X, R, C, norm_type=norm_type, causal=causal)
+     est_mask = separator(mixture_w)
+     print('est_mask', est_mask)
+
+     # test Decoder
+     decoder = Decoder(N, L)
+     est_mask = torch.randint(2, (B, K, C, N))
+     est_source = decoder(mixture_w, est_mask)
+     print('est_source', est_source)
+
+     # test Conv-TasNet
+     conv_tasnet = ConvTasNet(N, L, B, H, P, X, R, C, norm_type=norm_type)
+     est_source = conv_tasnet(mixture)
+     print('est_source', est_source)
+     print('est_source size', est_source.size())
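The Encoder docstring above gives the frame count as K = (T-L)/(L/2)+1 = 2T/L-1, which follows from the 50%-overlap stride of the valid (unpadded) 1-D convolution. A quick sanity check of that identity, assuming L is even and divides T:

```python
def n_frames(T: int, L: int) -> int:
    """Number of encoder frames for input length T, kernel L, stride L//2
    (valid convolution, no padding, as in the Encoder's nn.Conv1d)."""
    return (T - L) // (L // 2) + 1

# matches the closed form 2T/L - 1 from the docstring
for T, L in [(12, 4), (24000, 16), (88200, 40)]:
    assert n_frames(T, L) == 2 * T // L - 1
```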
models/ConvTasNet-ONNX (broken)/source.txt ADDED
@@ -0,0 +1,5 @@
+ https://github.com/onnx/onnx/issues/3067
+ https://github.com/pytorch/pytorch/issues/46898
+ https://github.com/pytorch/pytorch/issues/47182
+ https://drive.google.com/file/d/1we2YpPVWVlIPNTXT6N92x_lH6fTRTd4r/view?usp=sharing
+ https://drive.google.com/file/d/1-UEej2yIXsvZWmN-VYdHHwSeIrxrS4BQ/view?usp=sharing
models/ConvTasNet-ONNX/conv_tasnet.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:781e8fcef71fdf3589fcc44ae44601f21d51d5d85381cfdf77d435a8e6720745
+ size 35449169
models/ConvTasNet-ONNX/source.txt ADDED
@@ -0,0 +1,2 @@
+ https://github.com/PINTO0309/onnx2tf/issues/447
+ https://drive.google.com/file/d/189UHTs9OvDiNBc6BiZDG5zde2zSyTe6E/view
models/ConvTasNet_DAMP-VSEP_enhboth/.gitattributes ADDED
@@ -0,0 +1,16 @@
+ *.bin.* filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tar.gz filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
models/ConvTasNet_DAMP-VSEP_enhboth/README.md ADDED
@@ -0,0 +1,73 @@
+ ---
+ tags:
+ - asteroid
+ - audio
+ - ConvTasNet
+ - audio-to-audio
+ datasets:
+ - DAMP-VSEP
+ license: cc-by-sa-4.0
+ ---
+
+ ## Asteroid model `groadabike/ConvTasNet_DAMP-VSEP_enhboth`
+ Imported from [Zenodo](https://zenodo.org/record/3994193)
+
+ ### Description:
+ This model was trained by Gerardo Roa Dabike using Asteroid. It was trained on the enh_both task of the DAMP-VSEP dataset.
+
+ ### Training config:
+ ```yaml
+ data:
+   channels: 1
+   n_src: 2
+   root_path: data
+   sample_rate: 16000
+   samples_per_track: 10
+   segment: 3.0
+   task: enh_both
+ filterbank:
+   kernel_size: 20
+   n_filters: 256
+   stride: 10
+ main_args:
+   exp_dir: exp/train_convtasnet
+   help: None
+ masknet:
+   bn_chan: 256
+   conv_kernel_size: 3
+   hid_chan: 512
+   mask_act: relu
+   n_blocks: 8
+   n_repeats: 4
+   n_src: 2
+   norm_type: gLN
+   skip_chan: 256
+ optim:
+   lr: 0.0003
+   optimizer: adam
+   weight_decay: 0.0
+ positional arguments:
+ training:
+   batch_size: 12
+   early_stop: True
+   epochs: 50
+   half_lr: True
+   num_workers: 12
+ ```
+
+ ### Results:
+ ```yaml
+ si_sdr: 14.018196157142519
+ si_sdr_imp: 14.017103133809577
+ sdr: 14.498517291333885
+ sdr_imp: 14.463389151567865
+ sir: 24.149634529133372
+ sir_imp: 24.11450638936735
+ sar: 15.338597389045935
+ sar_imp: -137.30634122401517
+ stoi: 0.7639416744417206
+ stoi_imp: 0.1843383526963759
+ ```
+
+ ### License notice:
+ This work "ConvTasNet_DAMP-VSEP_enhboth" is a derivative of DAMP-VSEP: Smule Digital Archive of Mobile Performances - Vocal Separation (Version 1.0.1) by Smule, Inc, used under Smule's Research Data License Agreement (Research only). "ConvTasNet_DAMP-VSEP_enhboth" is licensed under Attribution-ShareAlike 3.0 Unported by Gerardo Roa Dabike.
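The masknet settings in this card (conv_kernel_size 3, n_blocks 8, n_repeats 4) together with the encoder geometry (kernel 20, stride 10 at 16 kHz) determine the TCN's temporal receptive field. A back-of-the-envelope sketch, assuming the standard Conv-TasNet dilation pattern 1, 2, ..., 2^(n_blocks-1) per repeat (the function name and formula are ours, not from the recipe):

```python
def tcn_receptive_field(conv_kernel_size, n_blocks, n_repeats,
                        enc_kernel_size, enc_stride, sample_rate):
    """Approximate receptive field of a Conv-TasNet TCN.

    Assumes each repeat stacks n_blocks dilated convs with dilations
    1, 2, ..., 2**(n_blocks - 1), as in the original Conv-TasNet paper.
    """
    # Frames covered by the stack of dilated depthwise convolutions.
    frames = 1 + n_repeats * (conv_kernel_size - 1) * (2 ** n_blocks - 1)
    # Map frames back to waveform samples via the encoder kernel/stride.
    samples = (frames - 1) * enc_stride + enc_kernel_size
    return frames, samples, samples / sample_rate

frames, samples, seconds = tcn_receptive_field(3, 8, 4, 20, 10, 16000)
print(frames, samples, round(seconds, 3))  # 2041 20420 1.276
```

Under these assumptions the model sees roughly 1.3 s of context per output sample, which fits comfortably inside the 3 s training segments.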
models/ConvTasNet_DAMP-VSEP_enhboth/pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8519e8658572f0d3a5e07002849337cb0ff07dcf3b3a641244e0905ceb0adc44
+ size 51990656
models/ConvTasNet_DAMP-VSEP_enhboth/source.txt ADDED
@@ -0,0 +1 @@
+ https://huggingface.co/groadabike/ConvTasNet_DAMP-VSEP_enhboth
models/ConvTasNet_DAMPVSEP_EnglishNonEnglish_baseline/.gitattributes ADDED
@@ -0,0 +1,27 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bin.* filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zstandard filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
models/ConvTasNet_DAMPVSEP_EnglishNonEnglish_baseline/README.md ADDED
@@ -0,0 +1,106 @@
+ ---
+ tags:
+ - asteroid
+ - audio
+ - ConvTasNet
+ - audio-to-audio
+ datasets:
+ - DAMP-VSEP
+ - Singing/Accompaniment Separation
+ license: cc-by-sa-4.0
+ ---
+
+
+ ## Description:
+ This model was trained by Gerardo Roa using the dampvsep recipe in Asteroid.
+ It was trained on the `singing/accompaniment` task of the `DAMP-VSEP` dataset.
+
+
+ ## Training config:
+ ```yaml
+ data:
+   channels: 1
+   emb_model: 'no'
+   metadata_path: metadata
+   mixture: remix
+   root_path: /fastdata/acp13gr/DAMP/DAMP-VSEP
+   sample_rate: 16000
+   train_set: english_nonenglish
+ filterbank:
+   kernel_size: 20
+   n_filters: 256
+   stride: 10
+ main_args:
+   exp_dir: exp/train_convtasnet_remix-no-0.0-english_nonenglish-0.0005-jade
+   help: null
+ masknet:
+   bn_chan: 256
+   conv_kernel_size: 3
+   hid_chan: 512
+   mask_act: relu
+   n_blocks: 10
+   n_repeats: 4
+   n_src: 2
+   norm_type: gLN
+   skip_chan: 256
+ optim:
+   lr: 0.0005
+   optimizer: adam
+   weight_decay: 0.0
+ positional arguments: {}
+ training:
+   batch_size: 7
+   early_stop: true
+   epochs: 50
+   half_lr: true
+   loss_alpha: 0.0
+   num_workers: 10
+ ```
+
+
+ ## Results:
+ ```yaml
+ "si_sdr": 15.111802516750586,
+ "si_sdr_imp": 15.178209807687663,
+ "si_sdr_s0": 12.160261214703553,
+ "si_sdr_s0_imp": 17.434593619085675,
+ "si_sdr_s1": 18.063343818797623,
+ "si_sdr_s1_imp": 12.92182599628965,
+ "sdr": 15.959722569460281,
+ "sdr_imp": 14.927002467087567,
+ "sdr_s0": 13.270412028426595,
+ "sdr_s0_imp": 16.45867572657551,
+ "sdr_s1": 18.64903311049397,
+ "sdr_s1_imp": 13.39532920759962,
+ "sir": 23.935932341084754,
+ "sir_imp": 22.903212238712012,
+ "sir_s0": 22.30777879911744,
+ "sir_s0_imp": 25.49604249726635,
+ "sir_s1": 25.56408588305207,
+ "sir_s1_imp": 20.310381980157665,
+ "sar": 17.174899162445882,
+ "sar_imp": -134.47377304178818,
+ "sar_s0": 14.268071153965913,
+ "sar_s0_imp": -137.38060105026818,
+ "sar_s1": 20.081727170925856,
+ "sar_s1_imp": -131.56694503330817,
+ "stoi": 0.7746496376326059,
+ "stoi_imp": 0.19613735629114643,
+ "stoi_s0": 0.6611376621212413,
+ "stoi_s0_imp": 0.21162695175464794,
+ "stoi_s1": 0.8881616131439705,
+ "stoi_s1_imp": 0.1806477608276449
+ ```
+
+
+ ## License notice:
+
+ ** This is important, please fill it, if you need help, you can ask on Asteroid's slack.**
+
+ This work "ConvTasNet_DAMPVSEP_EnglishNonEnglish_baseline"
+ is a derivative of [DAMP-VSEP corpus](https://zenodo.org/record/3553059) by
+ [Smule, Inc](https://www.smule.com/),
+ used under [Restricted License](https://zenodo.org/record/3553059)(Research only).
+ "ConvTasNet_DAMPVSEP_EnglishNonEnglish_baseline"
+ is licensed under [Attribution-ShareAlike 3.0 Unported](https://creativecommons.org/licenses/by-sa/3.0/)
+ by Gerardo Roa.
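Most of the numbers reported in these cards are SI-SDR scores. A minimal sketch of the metric (scale-invariant SDR, following the Le Roux et al. definition; pure Python, assumes roughly zero-mean signals, and is not the exact evaluation code used by the recipes):

```python
import math

def si_sdr(estimate, reference):
    """Scale-invariant SDR in dB between an estimate and a reference signal."""
    dot = sum(e * r for e, r in zip(estimate, reference))
    ref_energy = sum(r * r for r in reference)
    alpha = dot / ref_energy                       # optimal scaling of the reference
    target = [alpha * r for r in reference]        # scaled target component
    noise = [e - t for e, t in zip(estimate, target)]
    return 10 * math.log10(sum(t * t for t in target) / sum(n * n for n in noise))

# Toy signals: a sine reference and an estimate with a small additive offset.
ref = [math.sin(0.01 * n) for n in range(8000)]
est = [r + 0.01 for r in ref]
score = si_sdr(est, ref)
```

Because the reference is rescaled by `alpha` before computing the error, multiplying the estimate by any constant leaves the score unchanged, which is what makes the metric scale-invariant.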
models/ConvTasNet_DAMPVSEP_EnglishNonEnglish_baseline/pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f77ed26005b8cc6d9b6ca4f313e252b4b80b17378a0097c47eb60811708b75b0
+ size 64766287
models/ConvTasNet_DAMPVSEP_EnglishNonEnglish_baseline/source.txt ADDED
@@ -0,0 +1 @@
+ https://huggingface.co/groadabike/ConvTasNet_DAMPVSEP_EnglishNonEnglish_baseline
models/ConvTasNet_Libri1Mix_enhsignle_16k/.gitattributes ADDED
@@ -0,0 +1,17 @@
+ *.bin.* filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tar.gz filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
models/ConvTasNet_Libri1Mix_enhsignle_16k/metadata.json ADDED
@@ -0,0 +1 @@
+ {"sample_rate": 16000, "domain_tags": ["speech"], "short_description": "Use me for speech enhancement! Works with 1 speaker.", "long_description": "This model was trained by Joris Cosentino using the librimix recipe in Asteroid. It was trained on the enh_single task of the Libri1Mix dataset.", "tags": ["speech enhancement", "speech"], "labels": ["enhanced"], "effect_type": "waveform-to-waveform", "multichannel": false}
models/ConvTasNet_Libri1Mix_enhsignle_16k/model.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ee430a56e84cf617044cd986d56a26800b96278d618ffc738b6e81f8eff6a88d
+ size 20500235
models/ConvTasNet_Libri1Mix_enhsignle_16k/source.txt ADDED
@@ -0,0 +1 @@
+ https://huggingface.co/hugggof/ConvTasNet_Libri1Mix_enhsignle_16k
models/ConvTasNet_Libri1Mix_enhsingle_8k/.gitattributes ADDED
@@ -0,0 +1,16 @@
+ *.bin.* filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tar.gz filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
models/ConvTasNet_Libri1Mix_enhsingle_8k/README.md ADDED
@@ -0,0 +1,73 @@
+ ---
+ tags:
+ - asteroid
+ - audio
+ - ConvTasNet
+ datasets:
+ - LibriMix
+ - enh_single
+ license: cc-by-sa-4.0
+ ---
+
+ ## Asteroid model
+ Imported from this Zenodo [model page](https://zenodo.org/record/3970768).
+
+ ## Description:
+ This model was trained by Brij Mohan using the Librimix/ConvTasNet recipe in Asteroid.
+ It was trained on the `enh_single` task of the Libri1Mix dataset.
+
+
+ ## Training config:
+ ```yaml
+ data:
+   n_src: 1
+   sample_rate: 8000
+   segment: 3
+   task: enh_single
+   train_dir: data/wav8k/min/train-360
+   valid_dir: data/wav8k/min/dev
+ filterbank:
+   kernel_size: 16
+   n_filters: 512
+   stride: 8
+ masknet:
+   bn_chan: 128
+   hid_chan: 512
+   mask_act: relu
+   n_blocks: 8
+   n_repeats: 3
+   n_src: 1
+   skip_chan: 128
+ optim:
+   lr: 0.001
+   optimizer: adam
+   weight_decay: 0.0
+ training:
+   batch_size: 24
+   early_stop: True
+   epochs: 200
+   half_lr: True
+ ```
+
+
+ ## Results:
+ ```yaml
+ si_sdr: 14.783675142685572
+ si_sdr_imp: 11.464625198953202
+ sdr: 15.497505907983102
+ sdr_imp: 12.07230150154914
+ sar: 15.497505907983102
+ sar_imp: 12.07230150154914
+ stoi: 0.9270030254700518
+ stoi_imp: 0.1320547197597893
+ ```
+
+
+ ## License notice:
+ This work "ConvTasNet_Libri1Mix_enhsingle_8k"
+ is a derivative of [LibriSpeech ASR corpus](http://www.openslr.org/12) by
+ [Vassil Panayotov](https://github.com/vdp),
+ used under [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/).
+ "ConvTasNet_Libri1Mix_enhsingle_8k"
+ is licensed under [Attribution-ShareAlike 3.0 Unported](https://creativecommons.org/licenses/by-sa/3.0/)
+ by Manuel Pariente.
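The encoder geometry in the config above (kernel_size 16, stride 8, 8 kHz audio, 3 s segments) fixes how many encoder frames the masking network sees per training segment, via the usual strided-convolution output-length formula. A quick sketch (assuming no padding; the helper name is ours):

```python
def encoder_frames(n_samples, kernel_size, stride):
    """Number of frames a strided 1-D conv encoder produces (no padding)."""
    return (n_samples - kernel_size) // stride + 1

segment_samples = 3 * 8000  # 3 s segments at 8 kHz, per the training config
print(encoder_frames(segment_samples, 16, 8))  # 2999
```

So each 3 s segment becomes roughly 3000 frames of 512-dimensional encoder output before masking.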
models/ConvTasNet_Libri1Mix_enhsingle_8k/pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2c82d07cfb842778c26eeed222b313fcd9ae2776ce038a26f22dba0b700e597c
+ size 20063674
models/ConvTasNet_Libri1Mix_enhsingle_8k/source.txt ADDED
@@ -0,0 +1 @@
+ https://huggingface.co/mpariente/ConvTasNet_Libri1Mix_enhsingle_8k
models/ConvTasNet_Libri2Mix_SepClean/.gitattributes ADDED
@@ -0,0 +1,34 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
models/ConvTasNet_Libri2Mix_SepClean/README.md ADDED
@@ -0,0 +1,25 @@
+ ---
+ license: gpl
+ language:
+ - en
+ library_name: asteroid
+ tags:
+ - speech separation
+ - audio processing
+ ---
+
+ # Model Card for model.bin
+
+ <!-- Provide a quick summary of what the model is/does. [Optional] -->
+ This model was trained by Dhruv Saini using the libri2mix sep_clean dataset.
+
+ # Model Details
+
+ It is a ConvTasNet model for 2 speakers' speech separation.
+
+ ## Model Description
+
+ <!-- Provide a longer summary of what this model is/does. -->
+ This model was trained by Dhruv Saini using the libri2mix sep_clean dataset.
+
+ - **Developed by:** Dhruv Saini
models/ConvTasNet_Libri2Mix_SepClean/model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b6e58d1cec93826da50c883f0c31edc1f95557f0312cd6448ef110049e12bed6
+ size 20410329
models/ConvTasNet_Libri2Mix_SepClean/source.txt ADDED
@@ -0,0 +1 @@
+ https://huggingface.co/Dhruv73/ConvTasNet_Libri2Mix_SepClean
models/ConvTasNet_Libri2Mix_sepclean_16k/.gitattributes ADDED
@@ -0,0 +1,9 @@
+ *.bin.* filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tar.gz filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ pytorch_model.bin filter=lfs diff=lfs merge=lfs -text
models/ConvTasNet_Libri2Mix_sepclean_16k/README.md ADDED
@@ -0,0 +1,74 @@
+ ---
+ tags:
+ - asteroid
+ - audio
+ - ConvTasNet
+ - audio-to-audio
+ datasets:
+ - Libri2Mix
+ - sep_clean
+ license: cc-by-sa-4.0
+ ---
+
+ ## Asteroid model `JorisCos/ConvTasNet_Libri2Mix_sepclean_16k`
+
+ Description:
+
+ This model was trained by Joris Cosentino using the librimix recipe in [Asteroid](https://github.com/asteroid-team/asteroid).
+ It was trained on the `sep_clean` task of the Libri2Mix dataset.
+
+ Training config:
+ ```yaml
+ data:
+   n_src: 2
+   sample_rate: 16000
+   segment: 3
+   task: sep_clean
+   train_dir: data/wav16k/min/train-360
+   valid_dir: data/wav16k/min/dev
+ filterbank:
+   kernel_size: 32
+   n_filters: 512
+   stride: 16
+ masknet:
+   bn_chan: 128
+   hid_chan: 512
+   mask_act: relu
+   n_blocks: 8
+   n_repeats: 3
+   skip_chan: 128
+ optim:
+   lr: 0.001
+   optimizer: adam
+   weight_decay: 0.0
+ training:
+   batch_size: 6
+   early_stop: true
+   epochs: 200
+   half_lr: true
+   num_workers: 4
+ ```
+
+
+ Results:
+
+ On Libri2Mix min test set:
+ ```yaml
+ si_sdr: 15.243671356901526
+ si_sdr_imp: 15.243034178473609
+ sdr: 15.668108919568112
+ sdr_imp: 15.578229918028036
+ sir: 25.295100756629957
+ sir_imp: 25.205219921301754
+ sar: 16.307682590197313
+ sar_imp: -51.64989963759405
+ stoi: 0.9394951175291422
+ stoi_imp: 0.22640192740016568
+ ```
+
+ License notice:
+
+ This work "ConvTasNet_Libri2Mix_sepclean_16k"
+ is a derivative of [LibriSpeech ASR corpus](http://www.openslr.org/12) by Vassil Panayotov,
+ used under [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/). "ConvTasNet_Libri2Mix_sepclean_16k"
+ is licensed under [Attribution-ShareAlike 3.0 Unported](https://creativecommons.org/licenses/by-sa/3.0/) by Cosentino Joris.
models/ConvTasNet_Libri2Mix_sepclean_16k/pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8d97f012f7b2f22bb79cb0d0983a7ba27a52c1796ee3f63cbf25b4d28630adce
+ size 20394640
models/ConvTasNet_Libri2Mix_sepclean_16k/source.txt ADDED
@@ -0,0 +1 @@
+ https://huggingface.co/JorisCos/ConvTasNet_Libri2Mix_sepclean_16k
models/ConvTasNet_Libri2Mix_sepclean_8k/.gitattributes ADDED
@@ -0,0 +1,9 @@
+ *.bin.* filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tar.gz filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ pytorch_model.bin filter=lfs diff=lfs merge=lfs -text
models/ConvTasNet_Libri2Mix_sepclean_8k/README.md ADDED
@@ -0,0 +1,75 @@
+ ---
+ tags:
+ - asteroid
+ - audio
+ - ConvTasNet
+ - audio-to-audio
+ datasets:
+ - Libri2Mix
+ - sep_clean
+ license: cc-by-sa-4.0
+ ---
+
+ ## Asteroid model `JorisCos/ConvTasNet_Libri2Mix_sepclean_8k`
+ Imported from [Zenodo](https://zenodo.org/record/3873572#.X9M69cLjJH4)
+
+ Description:
+
+ This model was trained by Joris Cosentino using the librimix recipe in [Asteroid](https://github.com/asteroid-team/asteroid).
+ It was trained on the `sep_clean` task of the Libri2Mix dataset.
+
+ Training config:
+ ```yaml
+ data:
+   n_src: 2
+   sample_rate: 8000
+   segment: 3
+   task: sep_clean
+   train_dir: data/wav8k/min/train-360
+   valid_dir: data/wav8k/min/dev
+ filterbank:
+   kernel_size: 16
+   n_filters: 512
+   stride: 8
+ masknet:
+   bn_chan: 128
+   hid_chan: 512
+   mask_act: relu
+   n_blocks: 8
+   n_repeats: 3
+   skip_chan: 128
+ optim:
+   lr: 0.001
+   optimizer: adam
+   weight_decay: 0.0
+ training:
+   batch_size: 24
+   early_stop: True
+   epochs: 200
+   half_lr: True
+   num_workers: 2
+ ```
+
+
+ Results:
+
+ On Libri2Mix min test set:
+ ```yaml
+ si_sdr: 14.764543634468069
+ si_sdr_imp: 14.764029375607246
+ sdr: 15.29337970745095
+ sdr_imp: 15.114146605113111
+ sir: 24.092904661115366
+ sir_imp: 23.913669683141528
+ sar: 16.06055906916849
+ sar_imp: -51.980784441287454
+ stoi: 0.9311142440593033
+ stoi_imp: 0.21817376142710482
+ ```
+
+ License notice:
+
+ This work "ConvTasNet_Libri2Mix_sepclean_8k"
+ is a derivative of [LibriSpeech ASR corpus](http://www.openslr.org/12) by Vassil Panayotov,
+ used under [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/). "ConvTasNet_Libri2Mix_sepclean_8k"
+ is licensed under [Attribution-ShareAlike 3.0 Unported](https://creativecommons.org/licenses/by-sa/3.0/) by Cosentino Joris.