File size: 3,703 Bytes
561a1cf
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
# Container Template for SoundsRight Subnet Miners

This repository contains a contanierized version of [SGMSE+](https://huggingface.co/sp-uhh/speech-enhancement-sgmse) and serves as a tutorial for miners to format their models on [Bittensor's](https://bittensor.com/) [SoundsRight Subnet](https://github.com/synapsec-ai/SoundsRightSubnet). The branches `DENOISING_16000HZ` and `DEREVERBERATION_16000HZ` contain SGMSE fitted with the approrpriate checkpoints for denoising and dereverberation tasks at 16kHz, respectively. 

This container has only been tested with **Ubuntu 24.04** and **CUDA 12.6**. It may run on other configurations, but it is not guaranteed.

To run the container, first configure NVIDIA Container Toolkit and generate a CDI specification. Follow the instructions to download the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html) with Apt.

Next, follow the instructions for [generating a CDI specification](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/cdi-support.html).

Verify that the CDI specification was done correctly with:
```
$ nvidia-ctk cdi list
```
You should see this in your output:
```
nvidia.com/gpu=all
nvidia.com/gpu=0
```

If you are running podman as root, run the following command to start the container:

Run the container with:
```
podman build -t modelapi . && podman run -d --device nvidia.com/gpu=all --user root --name modelapi -p 6500:6500 modelapi
```
Access logs with: 
```
podman logs -f modelapi
```
If you are running the container rootless, there are a few more changes to make:

First, modify `/etc/nvidia-container-runtime/config.toml` and set the following parameters:
```
[nvidia-container-cli]
no-cgroups = true

[nvidia-container-runtime]
debug = "/tmp/nvidia-container-runtime.log"
```
You can also run the following command to achieve the same result:
```
$ sudo nvidia-ctk config --set nvidia-container-cli.no-cgroups --in-place
```

Run the container with:
```
podman build -t modelapi . && podman run -d --device nvidia.com/gpu=all --volume /usr/local/cuda-12.6:/usr/local/cuda-12.6 --user 10002:10002 --name modelapi -p 6500:6500 modelapi
```
Access logs with: 
```
podman logs -f modelapi
```
Running the container will spin up an API with the following endpoints:
1. `/status/` : Communicates API status
2. `/prepare/` : Download model checkpoint and initialize model
3. `/upload-audio/` : Upload audio files, save to noisy audio directory
4. `/enhance/` : Initialize model, enhance audio files, save to enhanced audio directory
5. `/download-enhanced/` : Download enhanced audio files

By default the API will use host `0.0.0.0` and port `6500`.

### References

1. **Welker, Simon; Richter, Julius; Gerkmann, Timo**  
   *Speech Enhancement with Score-Based Generative Models in the Complex STFT Domain*.  
   Proceedings of *Interspeech 2022*, 2022, pp. 2928–2932.  
   [DOI: 10.21437/Interspeech.2022-10653](https://doi.org/10.21437/Interspeech.2022-10653)

2. **Richter, Julius; Welker, Simon; Lemercier, Jean-Marie; Lay, Bunlong; Gerkmann, Timo**  
   *Speech Enhancement and Dereverberation with Diffusion-based Generative Models*.  
   *IEEE/ACM Transactions on Audio, Speech, and Language Processing*, Vol. 31, 2023, pp. 2351–2364.  
   [DOI: 10.1109/TASLP.2023.3285241](https://doi.org/10.1109/TASLP.2023.3285241)

3. **Richter, Julius; Wu, Yi-Chiao; Krenn, Steven; Welker, Simon; Lay, Bunlong; Watanabe, Shinjii; Richard, Alexander; Gerkmann, Timo**  
   *EARS: An Anechoic Fullband Speech Dataset Benchmarked for Speech Enhancement and Dereverberation*.  
   Proceedings of *ISCA Interspeech*, 2024, pp. 4873–4877.