Update README and requirements.txt: add gradio==3.39.0 and improve docs
- README.md: +37 −82
- requirements.txt: +4 −4
README.md
CHANGED

@@ -1,96 +1,51 @@
-
-
-
 
 A minimal speech-denoising project built around a SEGAN-style U-Net generator. It includes:
 - Training script to learn on paired noisy/clean audio.
 - Inference pipeline that denoises long clips in chunks and can pack output audio losslessly into PNG.
 - FastAPI service to expose denoise + PNG pack/restore endpoints.
 
-
--------------
-- `SEGAN.py` – training components: config, dataset, U-Net generator, PatchGAN discriminator, training loop.
-- `pipeline.py` – inference utilities: chunked denoiser, spectral gating cleanup, PNG pack/restore helpers.
-- `app.py` – FastAPI app wiring the pipeline for HTTP use.
-- `seagan_final.pt` – example checkpoint (place your own if different).
-- `requirements.txt` – Python dependencies.
-
-Prerequisites
--------------
-- Python 3.9+ (tested with PyTorch CPU/GPU builds).
-- For GPU inference/training, install the matching CUDA-enabled `torch`/`torchaudio`.
-- FFmpeg is not required; `torchaudio` handles WAV I/O.
-
-Install
--------
-```bash
-python -m venv .venv
-source .venv/Scripts/activate  # on Windows PowerShell: .\.venv\Scripts\activate
-pip install -r requirements.txt
-```
-If you need a specific CUDA wheel, install `torch`/`torchaudio` first, then run `pip install -r requirements.txt` with `--no-deps`.
-
-Quick Inference (CLI)
----------------------
-Use the chunked denoiser directly:
-```bash
-python pipeline.py --input path/to/noisy.wav --output path/to/denoised.wav --checkpoint seagan_final.pt
-```
-Notes:
-- `--png-width` controls width when packing to PNG; omit `--no-pack` to also write `*_packed.png` and a reconstructed WAV check.
-- The denoiser mirrors/overlaps chunks to reduce seams and optionally runs a spectral subtraction cleanup.
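The seam-reducing overlap strategy described in the note above can be sketched as an overlap-add loop. This is an illustrative sketch, not the actual `denoise_chunked_final` from `pipeline.py`: the function name `denoise_chunked`, the Hann cross-fade, and the `process` callback are assumptions; only the `chunk_seconds`/`overlap` knobs come from the README.

```python
import numpy as np

def denoise_chunked(audio, process, chunk_seconds=2.0, overlap=0.25, sr=16000):
    """Overlap-add sketch: split `audio` into chunks of `chunk_seconds`
    seconds that overlap by `overlap` seconds, run `process` on each chunk,
    and cross-fade the overlaps with a Hann window to hide chunk seams."""
    chunk = int(chunk_seconds * sr)
    hop = chunk - int(overlap * sr)
    out = np.zeros(len(audio))
    weight = np.zeros(len(audio))
    window = np.hanning(chunk)  # tapers each chunk so overlaps cross-fade
    for start in range(0, len(audio), hop):
        piece = audio[start:start + chunk]          # last piece may be short
        w = window[:len(piece)]
        out[start:start + len(piece)] += process(piece) * w
        weight[start:start + len(piece)] += w
    # Normalise by the accumulated window weight (guard divide-by-zero).
    return out / np.maximum(weight, 1e-8)
```

With an identity `process`, the interior samples round-trip exactly, which is a quick sanity check that the cross-fade bookkeeping is right.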
 
-
----------------
-Environment variables:
-- `CHECKPOINT_PATH` (default `/app/checkpoints/seagan_final.pt`)
-- `CHECKPOINT_URL` (optional download at startup)
-- `SAMPLE_RATE` (default `16000`)
-- `PNG_WIDTH` (default `2048`)
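A minimal sketch of how a service might consume the variables listed above. The `load_config` helper and the dictionary keys are illustrative, not the actual `app.py` code; only the environment variable names and defaults come from the list.

```python
import os

def load_config(env=os.environ):
    """Read the documented service settings, falling back to the defaults."""
    return {
        "checkpoint_path": env.get("CHECKPOINT_PATH", "/app/checkpoints/seagan_final.pt"),
        "checkpoint_url": env.get("CHECKPOINT_URL"),   # optional; None skips the download
        "sample_rate": int(env.get("SAMPLE_RATE", "16000")),
        "png_width": int(env.get("PNG_WIDTH", "2048")),
    }
```

Passing a plain dict instead of `os.environ` makes the defaults easy to test.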
 
-
-``
-
-``
 
-
-- `POST /denoise-and-pack` – form-data key `file` with WAV. Returns packed PNG of denoised audio.
-- `POST /restore-from-png` – form-data key `file` with packed PNG. Returns restored WAV.
-- `GET /health` – health check.
 
-
---------------
-`SEGAN.py` trains on paired noisy/clean WAVs. Update `Config.noisy_dir`, `Config.clean_dir`, and `Config.save_dir` to your paths, then run:
-```bash
-python SEGAN.py
-```
-Checkpoints are written every 5 epochs and as `seagan_final.pt` at the end. The inference pipeline expects a `G_state` entry inside the checkpoint.
 
-
---
-`
-- `save_audio_as_png_lossless(tensor, png_path, width)` – stores int16 PCM in a lossless PNG.
-- `load_audio_from_png_lossless(png_path, original_length)` – restores the tensor.
-- `write_wav_from_tensor(tensor, out_wav_path, sr)` – writes mono WAV.
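The pack/restore idea behind these helpers can be sketched without the real code: quantize float audio to int16 PCM, pad to a multiple of the image width, and reshape into a 2-D grid that a lossless 16-bit grayscale PNG can hold (e.g. via Pillow's `Image.fromarray`). The function names below are illustrative, not the actual `pipeline.py` API, and they operate on NumPy arrays rather than tensors.

```python
import numpy as np

def pack_audio_to_grid(samples, width=2048):
    """Quantize float audio in [-1, 1] to int16 PCM, pad to a multiple of
    `width`, and reshape into a (height, width) uint16 grid suitable for
    a 16-bit grayscale PNG."""
    pcm = (np.clip(samples, -1.0, 1.0) * 32767.0).astype(np.int16)
    pad = (-len(pcm)) % width
    pcm = np.pad(pcm, (0, pad))
    # Reinterpret the bits as uint16: PNG pixel values are non-negative.
    return pcm.view(np.uint16).reshape(-1, width)

def unpack_audio_from_grid(grid, original_length):
    """Inverse of pack_audio_to_grid; `original_length` trims the padding."""
    pcm = grid.reshape(-1).view(np.int16)[:original_length]
    return pcm.astype(np.float32) / 32767.0
```

Because PNG compression is lossless, the only error in the round trip is the int16 quantization step, bounded by one quantization level.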
 
-Tips
-----
-- Keep input WAVs mono or they will be averaged to mono.
-- Large files are chunked; adjust `chunk_seconds` and `overlap` in `denoise_chunked_final`.
-- Ensure the checkpoint matches the model architecture in `SEGAN.py`.
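The mono-averaging behaviour in the first tip amounts to a channel mean. A tiny illustrative helper (not the project's actual code; assumes a `(channels, samples)` NumPy layout):

```python
import numpy as np

def to_mono(wav):
    """Average a (channels, samples) array down to mono; 1-D
    (already-mono) input passes through unchanged."""
    wav = np.asarray(wav, dtype=np.float32)
    return wav if wav.ndim == 1 else wav.mean(axis=0)
```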
-=======
----
-title: SEGAN
-emoji: π’
-colorFrom: green
-colorTo: yellow
-sdk: gradio
-sdk_version: 6.1.0
-app_file: app.py
-pinned: false
-license: apache-2.0
-short_description: Remove BackgroundNoise and Generate Image from the Audio
 ---
 
-
-
+---
+title: SEGAN
+emoji: π’
+colorFrom: green
+colorTo: yellow
+sdk: gradio
+sdk_version: "6.1.0"
+app_file: app.py
+pinned: false
+license: apache-2.0
+short_description: Remove BackgroundNoise and Generate Image from the Audio
+---
+
+# SEAGAN Speech Enhancement & API
 
 A minimal speech-denoising project built around a SEGAN-style U-Net generator. It includes:
+
 - Training script to learn on paired noisy/clean audio.
 - Inference pipeline that denoises long clips in chunks and can pack output audio losslessly into PNG.
 - FastAPI service to expose denoise + PNG pack/restore endpoints.
+- Gradio demo for Hugging Face Spaces (`app.py`).
 
+---
 
+## Repo Contents
 
+- `SEGAN.py` – training components: config, dataset, U-Net generator, PatchGAN discriminator, training loop.
+- `pipeline.py` – inference utilities: chunked denoiser, spectral gating cleanup, PNG pack/restore helpers.
+- `app.py` – Gradio / FastAPI app wiring the pipeline for UI/API use.
+- `checkpoint/seagan_final.pt` – example checkpoint (place your own if different); tracked with Git LFS.
+- `requirements.txt` – Python dependencies.
 
+---
 
+## Prerequisites
 
+- Python 3.9+ (tested with PyTorch CPU/GPU builds).
+- For GPU inference/training, install the matching CUDA-enabled `torch`/`torchaudio`.
+- FFmpeg is not required; `torchaudio` handles WAV I/O.
 
 ---
 
+## Install
+
+```bash
+python -m venv .venv
+# Windows PowerShell:
+.\.venv\Scripts\Activate.ps1
+# or cmd:
+.\.venv\Scripts\activate.bat
+pip install -r requirements.txt
requirements.txt
CHANGED

@@ -1,9 +1,9 @@
 fastapi==0.95.2
 uvicorn[standard]==0.22.0
-torch==2.1.0
-torchaudio==2.1.0
+torch==2.1.0
+torchaudio==2.1.0
 numpy==1.26.4
 pillow==10.1.0
 requests==2.32.2
-
-
+huggingface-hub>=0.14.1
+gradio==3.39.0