niobures committed
Commit aa8dfd8 · verified · 1 Parent(s): cfefc26

SGMSE (code, models, paper)

This view is limited to 50 files because it contains too many changes.
Files changed (50)
  1. .gitattributes +10 -0
  2. Analysing Diffusion-based Generative Approaches Versus Discriminative Approaches for Speech Restoration.pdf +3 -0
  3. Investigating Training Objectives for Generative Speech Enhancement.pdf +3 -0
  4. Reducing the Prior Mismatch of Stochastic Differential Equations for Diffusion-based Speech Enhancement.pdf +3 -0
  5. Single and Few-step Diffusion for Generative Speech Enhancement.pdf +3 -0
  6. Speech Enhancement and Dereverberation with Diffusion-based Generative Models.pdf +3 -0
  7. Speech Enhancement with Score-Based Generative Models in the Complex STFT Domain.pdf +3 -0
  8. StoRM. A Diffusion-based Stochastic Regeneration Model for Speech Enhancement and Dereverberation.pdf +3 -0
  9. code/Dereverbration-model (Samrinah).zip +3 -0
  10. code/SGMSE (PeymanAudioML).zip +3 -0
  11. code/sgmse-bbed.zip +3 -0
  12. code/sgmse-speech-enhancement-deverb-replicate.zip +3 -0
  13. code/sgmse.zip +3 -0
  14. code/sgmse_crp.zip +3 -0
  15. models/EARS/EARS. An Anechoic Fullband Speech Dataset Benchmarked for Speech Enhancement and Dereverberation.pdf +3 -0
  16. models/EARS/ears_reverb.ckpt +3 -0
  17. models/EARS/ears_wham.ckpt +3 -0
  18. models/EARS/source.txt +2 -0
  19. models/SGMSE+ Dereverberation/epoch=326-step=408750.ckpt +3 -0
  20. models/SGMSE+ Dereverberation/source.txt +2 -0
  21. models/SGMSE+/source.txt +2 -0
  22. models/SGMSE+/train_vb_29nqe0uh_epoch=115.ckpt +3 -0
  23. models/SGMSE+/train_wsj0_2cta4cov_epoch=159.ckpt +3 -0
  24. models/SGMSE/EARS-WHAM + VB-DMD/SB_VB-DMD_EARS-WHAM_900k.pt +3 -0
  25. models/SGMSE/icassp2025_gense/checkpoints/m1.ckpt +3 -0
  26. models/SGMSE/icassp2025_gense/checkpoints/m2.ckpt +3 -0
  27. models/SGMSE/icassp2025_gense/checkpoints/m3.ckpt +3 -0
  28. models/SGMSE/icassp2025_gense/checkpoints/m4.ckpt +3 -0
  29. models/SGMSE/icassp2025_gense/checkpoints/m5.ckpt +3 -0
  30. models/SGMSE/icassp2025_gense/checkpoints/m6.ckpt +3 -0
  31. models/SGMSE/icassp2025_gense/checkpoints/m7.ckpt +3 -0
  32. models/SGMSE/icassp2025_gense/checkpoints/m8.ckpt +3 -0
  33. models/SGMSE/icassp2025_gense/source.txt +1 -0
  34. models/SGMSE/itg2025-reverbfx/checkpoints/sb_artificial_rir_350k.ckpt +3 -0
  35. models/SGMSE/itg2025-reverbfx/checkpoints/sgmse_artificial_rir_350k.ckpt +3 -0
  36. models/SGMSE/itg2025-reverbfx/checkpoints/sgmse_natural_rir_350k.ckpt +3 -0
  37. models/SGMSE/itg2025-reverbfx/source.txt +1 -0
  38. models/SGMSE/source.txt +1 -0
  39. models/SGMSE_BBED/epoch=222-pesq=3.04.ckpt +3 -0
  40. models/SGMSE_BBED/source.txt +2 -0
  41. models/SGMSE_CRP/epoch=184-pesq=2.71.ckpt +3 -0
  42. models/SGMSE_CRP/epoch=222-pesq=3.04.ckpt +3 -0
  43. models/SGMSE_CRP/epoch=3-pesq=2.80.ckpt +3 -0
  44. models/SGMSE_CRP/source.txt +6 -0
  45. models/SR105-SGMSE-NEW/.gitattributes +1 -0
  46. models/SR105-SGMSE-NEW/.gitignore +162 -0
  47. models/SR105-SGMSE-NEW/Dockerfile +33 -0
  48. models/SR105-SGMSE-NEW/LICENSE +34 -0
  49. models/SR105-SGMSE-NEW/README.md +77 -0
  50. models/SR105-SGMSE-NEW/THIRD_PARTY_LICENSES +21 -0
.gitattributes CHANGED
@@ -33,3 +33,13 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ Analysing[[:space:]]Diffusion-based[[:space:]]Generative[[:space:]]Approaches[[:space:]]Versus[[:space:]]Discriminative[[:space:]]Approaches[[:space:]]for[[:space:]]Speech[[:space:]]Restoration.pdf filter=lfs diff=lfs merge=lfs -text
37
+ Investigating[[:space:]]Training[[:space:]]Objectives[[:space:]]for[[:space:]]Generative[[:space:]]Speech[[:space:]]Enhancement.pdf filter=lfs diff=lfs merge=lfs -text
38
+ models/EARS/EARS.[[:space:]]An[[:space:]]Anechoic[[:space:]]Fullband[[:space:]]Speech[[:space:]]Dataset[[:space:]]Benchmarked[[:space:]]for[[:space:]]Speech[[:space:]]Enhancement[[:space:]]and[[:space:]]Dereverberation.pdf filter=lfs diff=lfs merge=lfs -text
39
+ models/sgmse-voicebank/example.wav filter=lfs diff=lfs merge=lfs -text
40
+ models/speech-enhancement-sgmse/diffusion_process.png filter=lfs diff=lfs merge=lfs -text
41
+ Reducing[[:space:]]the[[:space:]]Prior[[:space:]]Mismatch[[:space:]]of[[:space:]]Stochastic[[:space:]]Differential[[:space:]]Equations[[:space:]]for[[:space:]]Diffusion-based[[:space:]]Speech[[:space:]]Enhancement.pdf filter=lfs diff=lfs merge=lfs -text
42
+ Single[[:space:]]and[[:space:]]Few-step[[:space:]]Diffusion[[:space:]]for[[:space:]]Generative[[:space:]]Speech[[:space:]]Enhancement.pdf filter=lfs diff=lfs merge=lfs -text
43
+ Speech[[:space:]]Enhancement[[:space:]]and[[:space:]]Dereverberation[[:space:]]with[[:space:]]Diffusion-based[[:space:]]Generative[[:space:]]Models.pdf filter=lfs diff=lfs merge=lfs -text
44
+ Speech[[:space:]]Enhancement[[:space:]]with[[:space:]]Score-Based[[:space:]]Generative[[:space:]]Models[[:space:]]in[[:space:]]the[[:space:]]Complex[[:space:]]STFT[[:space:]]Domain.pdf filter=lfs diff=lfs merge=lfs -text
45
+ StoRM.[[:space:]]A[[:space:]]Diffusion-based[[:space:]]Stochastic[[:space:]]Regeneration[[:space:]]Model[[:space:]]for[[:space:]]Speech[[:space:]]Enhancement[[:space:]]and[[:space:]]Dereverberation.pdf filter=lfs diff=lfs merge=lfs -text
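The patterns added above write each space in a file name as `[[:space:]]`, because whitespace separates the pattern from its attributes in `.gitattributes`. A minimal Python sketch of that escaping follows; `lfs_attr_line` is a hypothetical helper for illustration, not part of Git or Git LFS, and it handles spaces only.

```python
LFS_ATTRS = "filter=lfs diff=lfs merge=lfs -text"


def lfs_attr_line(path: str) -> str:
    """Build a .gitattributes line that tracks `path` with Git LFS.

    Spaces become the POSIX class [[:space:]], matching the entries
    added in this commit. Other special characters are not handled
    in this sketch.
    """
    pattern = path.replace(" ", "[[:space:]]")
    return f"{pattern} {LFS_ATTRS}"


print(lfs_attr_line("Single and Few-step Diffusion for Generative Speech Enhancement.pdf"))
```

Paths without spaces, such as `models/sgmse-voicebank/example.wav`, pass through unchanged.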
Analysing Diffusion-based Generative Approaches Versus Discriminative Approaches for Speech Restoration.pdf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e1e659b7a85febf1f21c8e441e1d9083291973901f9ad945458a5077fa183ba1
3
+ size 371786
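Each large file in this commit is stored as a three-line Git LFS pointer (`version`, `oid`, `size`) like the one above. The sketch below parses such a pointer and checks a downloaded file against it. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the check assumes the `oid` uses SHA-256, as every pointer in this commit does.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"version": fields["version"], "algo": algo,
            "digest": digest, "size": int(fields["size"])}


def matches_pointer(path: Path, pointer: dict, chunk: int = 1 << 20) -> bool:
    """Return True if the file's size and SHA-256 digest match the pointer."""
    if path.stat().st_size != pointer["size"]:
        return False  # cheap check first, avoids hashing a wrong-sized file
    h = hashlib.sha256()
    with path.open("rb") as f:
        # Read in chunks so multi-gigabyte checkpoints do not fill memory.
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest() == pointer["digest"]


POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:e1e659b7a85febf1f21c8e441e1d9083291973901f9ad945458a5077fa183ba1
size 371786
"""
info = parse_lfs_pointer(POINTER)
print(info["algo"], info["size"])
```

Comparing the size before hashing is a design choice that matters mostly for the checkpoints of roughly 1.3 GB, where a truncated download can be rejected without reading the file.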
Investigating Training Objectives for Generative Speech Enhancement.pdf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:7883a9d45ce17f4bd8b2be38fd23d86c45b8f2ece08e63b5e19c8abccd52cdde
3
+ size 485378
Reducing the Prior Mismatch of Stochastic Differential Equations for Diffusion-based Speech Enhancement.pdf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:1fb0f55d15a6b2ec822ed483c9d8f586fa8fe46cbe4f055294f883240c01d7f8
3
+ size 581232
Single and Few-step Diffusion for Generative Speech Enhancement.pdf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:3a260c057b74a16a0398a9a590d82341804cdb53f7f00563d74e63af2ea3da6b
3
+ size 351945
Speech Enhancement and Dereverberation with Diffusion-based Generative Models.pdf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:4145fa6db8457d7812b221aa5d8e44c4238abd3888363219aa49ca1b14e9c810
3
+ size 2282648
Speech Enhancement with Score-Based Generative Models in the Complex STFT Domain.pdf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ca4046ae971f7acd79222503483f2f16804f10964e0e0f87e9232ba7f2b83597
3
+ size 794848
StoRM. A Diffusion-based Stochastic Regeneration Model for Speech Enhancement and Dereverberation.pdf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:50b9785f3d74caa7a8dadc21b067ea87529b507693503cd86c30a6b5d7a6d4b9
3
+ size 8201030
code/Dereverbration-model (Samrinah).zip ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:382f26884bd57499e8da392eca6e584cafb081a132e88b7f15dbef4242c7cbe6
3
+ size 53952502
code/SGMSE (PeymanAudioML).zip ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:42e415d49bbbdf06a813883dadd1f84386990da68e1db206fc27447c432c8552
3
+ size 868721
code/sgmse-bbed.zip ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:bdafc1f039cca2577436b38e107eb43a0c932c080df91491c1c903c8cccb7f26
3
+ size 302782
code/sgmse-speech-enhancement-deverb-replicate.zip ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:4bad42e76535206b568b31aad10e06b5a38e703d22e1f15af8c71715cf6b0394
3
+ size 40259
code/sgmse.zip ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:965a1a12b5e71d852fa7d17fc344983c1b6fbf175ee3d8f8d6d0ca134ebdeed0
3
+ size 4289852
code/sgmse_crp.zip ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:edfeef9ae75f53abd90798ae2bd877b64f39c76d6154c50970d27980a89198b9
3
+ size 348452
models/EARS/EARS. An Anechoic Fullband Speech Dataset Benchmarked for Speech Enhancement and Dereverberation.pdf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:9b9456a260b2a118367f825a4ac8aad6e6a94a6a255ddf758be2bab230fd3d1c
3
+ size 3444783
models/EARS/ears_reverb.ckpt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:9be62a3df0401398e5626a1c4dc65039220499d506941ecf110b52cccb7309b2
3
+ size 1295848436
models/EARS/ears_wham.ckpt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:2768097c11e8192ce499416c48e67d5085d31cefb63ff928e383dd32817ba2c9
3
+ size 1295841456
models/EARS/source.txt ADDED
@@ -0,0 +1,2 @@
1
+ https://github.com/sp-uhh/sgmse
2
+ https://drive.google.com/drive/folders/1Tn6pVwjxUAy1DJ8167JCg3enuSi0hiw5
models/SGMSE+ Dereverberation/epoch=326-step=408750.ckpt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:b770d098538dec1c06c6917bf327a7922ef326321aa23678f071d86c5f39716f
3
+ size 1312808247
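Several checkpoint names in this commit, such as `epoch=326-step=408750.ckpt`, appear to encode training metadata as `key=value` pairs. A small sketch for extracting those pairs follows; the meaning of each key is an assumption based on the names alone, and `checkpoint_fields` is a hypothetical helper.

```python
import re

# A lowercase key, '=', then an integer or decimal number.
_METRIC = re.compile(r"([a-z]+)=(\d+(?:\.\d+)?)")


def checkpoint_fields(filename: str) -> dict:
    """Extract key=value pairs such as epoch, step, or pesq from a name."""
    return {key: float(val) if "." in val else int(val)
            for key, val in _METRIC.findall(filename)}


for name in ("epoch=326-step=408750.ckpt",
             "epoch=222-pesq=3.04.ckpt",
             "train_vb_29nqe0uh_epoch=115.ckpt"):
    print(name, checkpoint_fields(name))
```

Names without `key=value` pairs, such as `ears_wham.ckpt`, yield an empty dictionary.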
models/SGMSE+ Dereverberation/source.txt ADDED
@@ -0,0 +1,2 @@
1
+ https://github.com/sp-uhh/sgmse
2
+ https://drive.google.com/drive/folders/1082_PSEgrqoVVrNsAkSIcpLF1AAtzGwV
models/SGMSE+/source.txt ADDED
@@ -0,0 +1,2 @@
1
+ https://github.com/sp-uhh/sgmse
2
+ https://drive.google.com/drive/folders/1CSnkhUSoiv3RG0xg7WEcVapyLuwDaLbe
models/SGMSE+/train_vb_29nqe0uh_epoch=115.ckpt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e3875747b5646092d5c556bae68e5af639e2c1f45f009c669f379cd4d415cbd8
3
+ size 1312806643
models/SGMSE+/train_wsj0_2cta4cov_epoch=159.ckpt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:2ec94cf546ef0a9d66f90364bd735820c78d9a214133588e90ce9ce01cd8a73b
3
+ size 1312806643
models/SGMSE/EARS-WHAM + VB-DMD/SB_VB-DMD_EARS-WHAM_900k.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e9d6a9a5245d99938bf46d738e14bf60438e843a7da937a107b620cf099dc6c0
3
+ size 1312970335
models/SGMSE/icassp2025_gense/checkpoints/m1.ckpt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:f43ac4da4591632483637e8b23c65fec1b1024e28d148fc649135dc1f9e3a89f
3
+ size 1312746286
models/SGMSE/icassp2025_gense/checkpoints/m2.ckpt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:097a71404a8da1d210904e2d2b8afa1fd32b96eb54436bc9ea4d985a18b56f5b
3
+ size 1312753390
models/SGMSE/icassp2025_gense/checkpoints/m3.ckpt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:57338426f0ee53cfb747f2498c89f2219462752f60679d27933af3039d92ab85
3
+ size 1312753844
models/SGMSE/icassp2025_gense/checkpoints/m4.ckpt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:d3560126111a27291ecad4ed96e7f7a4af9139beebcb02b2faccaf9223accf4d
3
+ size 1275872898
models/SGMSE/icassp2025_gense/checkpoints/m5.ckpt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e8dc9cae6a5eaeea25ef71b0988d4b89e6452bed1930577c6444cac3dc4e898b
3
+ size 1312753908
models/SGMSE/icassp2025_gense/checkpoints/m6.ckpt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:9481a8c0c96795500724f050557c23de5e06810f4cd635e2156a4cb61c8cf751
3
+ size 1312811384
models/SGMSE/icassp2025_gense/checkpoints/m7.ckpt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ff557ed5e784d9ae7c7505bc48838fb6b2e265566688f031244ea7bb7d69bc30
3
+ size 1312804216
models/SGMSE/icassp2025_gense/checkpoints/m8.ckpt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:041329eb879c714efc9da57336b7b47b8f983b3a5374e7f92f9f54ead431e03e
3
+ size 1312804216
models/SGMSE/icassp2025_gense/source.txt ADDED
@@ -0,0 +1 @@
1
+ https://www2.informatik.uni-hamburg.de/sp/audio/publications/icassp2025_gense/checkpoints
models/SGMSE/itg2025-reverbfx/checkpoints/sb_artificial_rir_350k.ckpt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:9507953339e8b7ce8b08568b223a60fd7c86c83d033a91f6c4180df66fda6890
3
+ size 1297096851
models/SGMSE/itg2025-reverbfx/checkpoints/sgmse_artificial_rir_350k.ckpt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:33a7ae9637dbf07c10e62d9c7e1296d48f7904157adb7c44eb050c9389c03105
3
+ size 1295834987
models/SGMSE/itg2025-reverbfx/checkpoints/sgmse_natural_rir_350k.ckpt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:86f9c2a07efcf7d7527f2e2723d01e3d852cf009fdda73ab5018e32441b52330
3
+ size 1295835051
models/SGMSE/itg2025-reverbfx/source.txt ADDED
@@ -0,0 +1 @@
1
+ https://www2.informatik.uni-hamburg.de/sp/audio/publications/itg2025-reverbfx/checkpoints
models/SGMSE/source.txt ADDED
@@ -0,0 +1 @@
1
+ https://github.com/sp-uhh/sgmse
models/SGMSE_BBED/epoch=222-pesq=3.04.ckpt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e7b5aeb6981ff81e7b1fa88041ab002ee547939d0c96cc9f9c17982b89f77127
3
+ size 1312968045
models/SGMSE_BBED/source.txt ADDED
@@ -0,0 +1,2 @@
1
+ https://github.com/sp-uhh/sgmse-bbed
2
+ https://drive.google.com/file/d/1_h7pH6o-j7GV_E69SbRQF2BMRlC8tmz_/view?usp=share_link
models/SGMSE_CRP/epoch=184-pesq=2.71.ckpt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:3766f90101e19e0e86f39512aa0b5064158066d9c6500aa4ca458243c193cebb
3
+ size 1312968045
models/SGMSE_CRP/epoch=222-pesq=3.04.ckpt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e7b5aeb6981ff81e7b1fa88041ab002ee547939d0c96cc9f9c17982b89f77127
3
+ size 1312968045
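The pointer above has the same `oid` and `size` as `models/SGMSE_BBED/epoch=222-pesq=3.04.ckpt`, so both paths reference identical content. Equal size alone is not enough: `models/SGMSE_CRP/epoch=184-pesq=2.71.ckpt` has the same size but a different `oid`. A minimal sketch for spotting such duplicates, with the triples copied from this commit and `duplicate_groups` as a hypothetical helper:

```python
from collections import defaultdict

# (path, oid, size) triples copied from the LFS pointers in this commit.
POINTERS = [
    ("models/SGMSE_BBED/epoch=222-pesq=3.04.ckpt",
     "e7b5aeb6981ff81e7b1fa88041ab002ee547939d0c96cc9f9c17982b89f77127", 1312968045),
    ("models/SGMSE_CRP/epoch=184-pesq=2.71.ckpt",
     "3766f90101e19e0e86f39512aa0b5064158066d9c6500aa4ca458243c193cebb", 1312968045),
    ("models/SGMSE_CRP/epoch=222-pesq=3.04.ckpt",
     "e7b5aeb6981ff81e7b1fa88041ab002ee547939d0c96cc9f9c17982b89f77127", 1312968045),
]


def duplicate_groups(pointers):
    """Group paths by oid and keep only oids shared by two or more paths."""
    by_oid = defaultdict(list)
    for path, oid, _size in pointers:
        by_oid[oid].append(path)
    return {oid: paths for oid, paths in by_oid.items() if len(paths) > 1}


for oid, paths in duplicate_groups(POINTERS).items():
    print(oid[:12], paths)
```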
models/SGMSE_CRP/epoch=3-pesq=2.80.ckpt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ac1508495863e9e0dbc7a637b2876cf56c3c603aaab1f7401371af446b524c4f
3
+ size 1312968109
models/SGMSE_CRP/source.txt ADDED
@@ -0,0 +1,6 @@
1
+ https://github.com/sp-uhh/sgmse_crp
2
+ https://drive.google.com/file/d/1_h7pH6o-j7GV_E69SbRQF2BMRlC8tmz_/view?usp=share_link
3
+ https://drive.google.com/file/d/1AJmEJalqJyrgZEVh-NZ2mgHdIeu-XgMz/view?usp=drive_link.
4
+ https://drive.google.com/file/d/1E0-Cr5CX7xNr_T53eVZP-1-dvlmBAJW6/view?usp=drive_link
5
+ https://drive.google.com/file/d/1_h7pH6o-j7GV_E69SbRQF2BMRlC8tmz_/view?usp=share_link
6
+ https://drive.google.com/file/d/1E0-Cr5CX7xNr_T53eVZP-1-dvlmBAJW6/view?usp=drive_link
models/SR105-SGMSE-NEW/.gitattributes ADDED
@@ -0,0 +1 @@
1
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
models/SR105-SGMSE-NEW/.gitignore ADDED
@@ -0,0 +1,162 @@
1
+ # Byte-compiled / optimized / DLL files
2
+ __pycache__/
3
+ *.py[cod]
4
+ *$py.class
5
+
6
+ # C extensions
7
+ *.so
8
+
9
+ # Distribution / packaging
10
+ .Python
11
+ build/
12
+ develop-eggs/
13
+ dist/
14
+ downloads/
15
+ eggs/
16
+ .eggs/
17
+ lib/
18
+ lib64/
19
+ parts/
20
+ sdist/
21
+ var/
22
+ wheels/
23
+ share/python-wheels/
24
+ *.egg-info/
25
+ .installed.cfg
26
+ *.egg
27
+ MANIFEST
28
+
29
+ # PyInstaller
30
+ # Usually these files are written by a python script from a template
31
+ # before PyInstaller builds the exe, so as to inject date/other infos into it.
32
+ *.manifest
33
+ *.spec
34
+
35
+ # Installer logs
36
+ pip-log.txt
37
+ pip-delete-this-directory.txt
38
+
39
+ # Unit test / coverage reports
40
+ htmlcov/
41
+ .tox/
42
+ .nox/
43
+ .coverage
44
+ .coverage.*
45
+ .cache
46
+ nosetests.xml
47
+ coverage.xml
48
+ *.cover
49
+ *.py,cover
50
+ .hypothesis/
51
+ .pytest_cache/
52
+ cover/
53
+
54
+ # Translations
55
+ *.mo
56
+ *.pot
57
+
58
+ # Django stuff:
59
+ *.log
60
+ local_settings.py
61
+ db.sqlite3
62
+ db.sqlite3-journal
63
+
64
+ # Flask stuff:
65
+ instance/
66
+ .webassets-cache
67
+
68
+ # Scrapy stuff:
69
+ .scrapy
70
+
71
+ # Sphinx documentation
72
+ docs/_build/
73
+
74
+ # PyBuilder
75
+ .pybuilder/
76
+ target/
77
+
78
+ # Jupyter Notebook
79
+ .ipynb_checkpoints
80
+
81
+ # IPython
82
+ profile_default/
83
+ ipython_config.py
84
+
85
+ # pyenv
86
+ # For a library or package, you might want to ignore these files since the code is
87
+ # intended to run in multiple environments; otherwise, check them in:
88
+ # .python-version
89
+
90
+ # pipenv
91
+ # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
92
+ # However, in case of collaboration, if having platform-specific dependencies or dependencies
93
+ # having no cross-platform support, pipenv may install dependencies that don't work, or not
94
+ # install all needed dependencies.
95
+ #Pipfile.lock
96
+
97
+ # poetry
98
+ # Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
99
+ # This is especially recommended for binary packages to ensure reproducibility, and is more
100
+ # commonly ignored for libraries.
101
+ # https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
102
+ #poetry.lock
103
+
104
+ # pdm
105
+ # Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
106
+ #pdm.lock
107
+ # pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
108
+ # in version control.
109
+ # https://pdm.fming.dev/latest/usage/project/#working-with-version-control
110
+ .pdm.toml
111
+ .pdm-python
112
+ .pdm-build/
113
+
114
+ # PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
115
+ __pypackages__/
116
+
117
+ # Celery stuff
118
+ celerybeat-schedule
119
+ celerybeat.pid
120
+
121
+ # SageMath parsed files
122
+ *.sage.py
123
+
124
+ # Environments
125
+ .env
126
+ .venv
127
+ env/
128
+ venv/
129
+ ENV/
130
+ env.bak/
131
+ venv.bak/
132
+
133
+ # Spyder project settings
134
+ .spyderproject
135
+ .spyproject
136
+
137
+ # Rope project settings
138
+ .ropeproject
139
+
140
+ # mkdocs documentation
141
+ /site
142
+
143
+ # mypy
144
+ .mypy_cache/
145
+ .dmypy.json
146
+ dmypy.json
147
+
148
+ # Pyre type checker
149
+ .pyre/
150
+
151
+ # pytype static type analyzer
152
+ .pytype/
153
+
154
+ # Cython debug symbols
155
+ cython_debug/
156
+
157
+ # PyCharm
158
+ # JetBrains specific template is maintained in a separate JetBrains.gitignore that can
159
+ # be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
160
+ # and can be added to the global gitignore or merged into this file. For a more nuclear
161
+ # option (not recommended) you can uncomment the following to ignore the entire idea folder.
162
+ #.idea/
models/SR105-SGMSE-NEW/Dockerfile ADDED
@@ -0,0 +1,33 @@
1
+ FROM python:3.10.14-bookworm
2
+
3
+ ARG USER_UID=10002
4
+ ARG USER_GID=$USER_UID
5
+ ARG USERNAME=modelapi
6
+
7
+ RUN groupadd --gid $USER_GID $USERNAME \
8
+ && useradd --uid $USER_UID --gid $USER_GID -m $USERNAME
9
+
10
+ # Copy required files
11
+ RUN mkdir -p /modelapi && mkdir -p /home/$USERNAME/.modelapi
12
+ COPY app /modelapi/app
13
+ COPY sgmse /modelapi/sgmse
14
+ COPY pyproject.toml /modelapi/pyproject.toml
15
+
16
+ ENV CUDA_HOME=/usr/local/cuda-12.6
17
+
18
+ # Setup permissions
19
+ RUN chown -R $USER_UID:$USER_GID /modelapi \
20
+ && chown -R $USER_UID:$USER_GID /home/$USERNAME/.modelapi \
21
+ && chown -R $USER_UID:$USER_GID /home/$USERNAME \
22
+ && chmod -R 755 /home/$USERNAME \
23
+ && chmod -R 755 /modelapi \
24
+ && chmod -R 755 /home/$USERNAME/.modelapi
25
+
26
+ # Change to the user and do subnet installation
27
+ USER $USERNAME
28
+
29
+ RUN /bin/bash -c "python3 -m venv /modelapi/.venv && source /modelapi/.venv/bin/activate && pip3 install -e /modelapi/."
30
+
31
+ EXPOSE 6500
32
+
33
+ CMD ["/bin/bash", "-c", "source /modelapi/.venv/bin/activate && python3 /modelapi/app/run.py"]
models/SR105-SGMSE-NEW/LICENSE ADDED
@@ -0,0 +1,34 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2024 synapsec.ai
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
22
+
23
+ ---
24
+
25
+ ### Third-Party Code:
26
+
27
+ Portions of this software are derived from code in the following project(s):
28
+
29
+ - speech-enhancement-sgmse by sp-uhh (MIT License)
30
+ - Repository: https://huggingface.co/sp-uhh/speech-enhancement-sgmse
31
+ - Copyright (c) 2022 Signal Processing (SP), Universität Hamburg
32
+ - Licensed under the MIT License (included in the `THIRD_PARTY_LICENSES` file)
33
+
34
+ ---
models/SR105-SGMSE-NEW/README.md ADDED
@@ -0,0 +1,77 @@
1
+ # Container Template for SoundsRight Subnet Miners
2
+
3
+ This repository contains a containerized version of [SGMSE+](https://huggingface.co/sp-uhh/speech-enhancement-sgmse) and serves as a tutorial for miners to format their models on [Bittensor's](https://bittensor.com/) [SoundsRight Subnet](https://github.com/synapsec-ai/SoundsRightSubnet). The branches `DENOISING_16000HZ` and `DEREVERBERATION_16000HZ` contain SGMSE fitted with the appropriate checkpoints for denoising and dereverberation tasks at 16 kHz, respectively.
4
+
5
+ This container has only been tested with **Ubuntu 24.04** and **CUDA 12.6**. It may run on other configurations, but this is not guaranteed.
6
+
7
+ To run the container, first configure the NVIDIA Container Toolkit and generate a CDI specification. Follow the instructions to install the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html) with Apt.
8
+
9
+ Next, follow the instructions for [generating a CDI specification](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/cdi-support.html).
10
+
11
+ Verify that the CDI specification was generated correctly with:
12
+ ```
13
+ $ nvidia-ctk cdi list
14
+ ```
15
+ You should see this in your output:
16
+ ```
17
+ nvidia.com/gpu=all
18
+ nvidia.com/gpu=0
19
+ ```
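The check above can also be scripted. The sketch below looks for `nvidia.com/gpu=all`, the device name the README's run commands pass to `--device`, in the listing. It assumes device names appear as whitespace-separated tokens in the output of `nvidia-ctk cdi list`; the helper names are hypothetical.

```python
import subprocess


def cdi_devices(listing: str) -> set:
    """Collect tokens of the form vendor/class=name from the listing."""
    return {tok for line in listing.splitlines()
            for tok in line.split() if "/" in tok and "=" in tok}


def gpu_all_available(listing: str) -> bool:
    """True if the nvidia.com/gpu=all device is listed."""
    return "nvidia.com/gpu=all" in cdi_devices(listing)


def read_cdi_listing() -> str:
    """Run `nvidia-ctk cdi list` (requires the NVIDIA Container Toolkit)."""
    return subprocess.run(["nvidia-ctk", "cdi", "list"],
                          capture_output=True, text=True, check=True).stdout


# Sample text taken from the expected output shown in the README.
SAMPLE = "nvidia.com/gpu=all\nnvidia.com/gpu=0\n"
print(gpu_all_available(SAMPLE))
```

On a configured host, `gpu_all_available(read_cdi_listing())` performs the same check against the live tool output.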
20
+
21
+ If you are running Podman as root:
22
+
23
+ Run the container with:
24
+ ```
25
+ podman build -t modelapi . && podman run -d --device nvidia.com/gpu=all --user root --name modelapi -p 6500:6500 modelapi
26
+ ```
27
+ Access logs with:
28
+ ```
29
+ podman logs -f modelapi
30
+ ```
31
+ If you are running the container rootless, there are a few more changes to make:
32
+
33
+ First, modify `/etc/nvidia-container-runtime/config.toml` and set the following parameters:
34
+ ```
35
+ [nvidia-container-cli]
36
+ no-cgroups = true
37
+
38
+ [nvidia-container-runtime]
39
+ debug = "/tmp/nvidia-container-runtime.log"
40
+ ```
41
+ You can also run the following command to achieve the same result:
42
+ ```
43
+ $ sudo nvidia-ctk config --set nvidia-container-cli.no-cgroups --in-place
44
+ ```
45
+
46
+ Run the container with:
47
+ ```
48
+ podman build -t modelapi . && podman run -d --device nvidia.com/gpu=all --volume /usr/local/cuda-12.6:/usr/local/cuda-12.6 --user 10002:10002 --name modelapi -p 6500:6500 modelapi
49
+ ```
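The root and rootless `podman run` commands in this README differ only in the `--user` value and the extra CUDA volume mount. A minimal sketch that assembles either argument list follows; `podman_run_args` is a hypothetical helper, and every flag and value is copied from the README commands.

```python
def podman_run_args(rootless: bool, image: str = "modelapi",
                    name: str = "modelapi", port: int = 6500,
                    cuda_dir: str = "/usr/local/cuda-12.6") -> list:
    """Return the `podman run` argument list for the root or rootless case."""
    args = ["podman", "run", "-d", "--device", "nvidia.com/gpu=all"]
    if rootless:
        # Rootless: mount the host CUDA directory and run as the UID:GID
        # that the Dockerfile creates (USER_UID=10002).
        args += ["--volume", f"{cuda_dir}:{cuda_dir}", "--user", "10002:10002"]
    else:
        args += ["--user", "root"]
    args += ["--name", name, "-p", f"{port}:{port}", image]
    return args


print(" ".join(podman_run_args(rootless=True)))
```

The list form can be passed to `subprocess.run` after `podman build -t modelapi .` has completed.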
50
+ Access logs with:
51
+ ```
52
+ podman logs -f modelapi
53
+ ```
54
+ Running the container will spin up an API with the following endpoints:
55
+ 1. `/status/` : Communicates API status
56
+ 2. `/prepare/` : Download model checkpoint and initialize model
57
+ 3. `/upload-audio/` : Upload audio files, save to noisy audio directory
58
+ 4. `/enhance/` : Initialize model, enhance audio files, save to enhanced audio directory
59
+ 5. `/download-enhanced/` : Download enhanced audio files
60
+
61
+ By default the API will use host `0.0.0.0` and port `6500`.
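The endpoints listed above suggest a status, prepare, upload, enhance, download workflow. The sketch below only builds the endpoint URLs in that order. The README does not state HTTP methods or request payloads, so none are shown; the `http` scheme and the `127.0.0.1` client address (reaching the port published with `-p 6500:6500`) are assumptions.

```python
from urllib.parse import urljoin

# Endpoint paths in the order the README lists them.
WORKFLOW = ["/status/", "/prepare/", "/upload-audio/",
            "/enhance/", "/download-enhanced/"]


def endpoint_urls(host: str = "127.0.0.1", port: int = 6500) -> list:
    """Build the full URL of each endpoint for a locally published container."""
    base = f"http://{host}:{port}"
    return [urljoin(base, path) for path in WORKFLOW]


for url in endpoint_urls():
    print(url)
```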
62
+
63
+ ### References
64
+
65
+ 1. **Welker, Simon; Richter, Julius; Gerkmann, Timo**
66
+ *Speech Enhancement with Score-Based Generative Models in the Complex STFT Domain*.
67
+ Proceedings of *Interspeech 2022*, 2022, pp. 2928–2932.
68
+ [DOI: 10.21437/Interspeech.2022-10653](https://doi.org/10.21437/Interspeech.2022-10653)
69
+
70
+ 2. **Richter, Julius; Welker, Simon; Lemercier, Jean-Marie; Lay, Bunlong; Gerkmann, Timo**
71
+ *Speech Enhancement and Dereverberation with Diffusion-based Generative Models*.
72
+ *IEEE/ACM Transactions on Audio, Speech, and Language Processing*, Vol. 31, 2023, pp. 2351–2364.
73
+ [DOI: 10.1109/TASLP.2023.3285241](https://doi.org/10.1109/TASLP.2023.3285241)
74
+
75
+ 3. **Richter, Julius; Wu, Yi-Chiao; Krenn, Steven; Welker, Simon; Lay, Bunlong; Watanabe, Shinji; Richard, Alexander; Gerkmann, Timo**
76
+ *EARS: An Anechoic Fullband Speech Dataset Benchmarked for Speech Enhancement and Dereverberation*.
77
+ Proceedings of *ISCA Interspeech*, 2024, pp. 4873–4877.
models/SR105-SGMSE-NEW/THIRD_PARTY_LICENSES ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2022 Signal Processing (SP), Universität Hamburg
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.