91prince committed on
Commit d056f62 · 1 Parent(s): aa7981f

Update README and requirements.txt: add gradio==3.39.0 and improve docs

Files changed (2)
  1. README.md +37 -82
  2. requirements.txt +4 -4
README.md CHANGED
@@ -1,96 +1,51 @@
- <<<<<<< HEAD
- SEAGAN Speech Enhancement & API
- ===============================

  A minimal speech-denoising project built around a SEGAN-style U-Net generator. It includes:
  - Training script to learn on paired noisy/clean audio.
  - Inference pipeline that denoises long clips in chunks and can pack output audio losslessly into PNG.
  - FastAPI service to expose denoise + PNG pack/restore endpoints.

- Repo Contents
- -------------
- - `SEGAN.py` – training components: config, dataset, U-Net generator, PatchGAN discriminator, training loop.
- - `pipeline.py` – inference utilities: chunked denoiser, spectral gating cleanup, PNG pack/restore helpers.
- - `app.py` – FastAPI app wiring the pipeline for HTTP use.
- - `seagan_final.pt` – example checkpoint (place your own if different).
- - `requirements.txt` – Python dependencies.
-
- Prerequisites
- -------------
- - Python 3.9+ (tested with PyTorch CPU/GPU builds).
- - For GPU inference/training, install the matching CUDA-enabled `torch`/`torchaudio`.
- - FFmpeg is not required; `torchaudio` handles WAV I/O.
-
- Install
- -------
- ```bash
- python -m venv .venv
- source .venv/Scripts/activate # on Windows PowerShell: .\.venv\Scripts\activate
- pip install -r requirements.txt
- ```
- If you need a specific CUDA wheel, install torch/torchaudio first, then run `pip install -r requirements.txt` with `--no-deps`.
-
- Quick Inference (CLI)
- ---------------------
- Use the chunked denoiser directly:
- ```bash
- python pipeline.py --input path/to/noisy.wav --output path/to/denoised.wav --checkpoint seagan_final.pt
- ```
- Notes:
- - `--png-width` controls width when packing to PNG; omit `--no-pack` to also write `*_packed.png` and a reconstructed WAV check.
- - The denoiser mirrors/overlaps chunks to reduce seams and optionally runs a spectral subtraction cleanup.

- FastAPI Service
- ---------------
- Environment variables:
- - `CHECKPOINT_PATH` (default `/app/checkpoints/seagan_final.pt`)
- - `CHECKPOINT_URL` (optional download at startup)
- - `SAMPLE_RATE` (default `16000`)
- - `PNG_WIDTH` (default `2048`)

- Run locally:
- ```bash
- uvicorn app:app --host 0.0.0.0 --port 8000
- ```

- Endpoints:
- - `POST /denoise-and-pack` – form-data key `file` with WAV. Returns packed PNG of denoised audio.
- - `POST /restore-from-png` – form-data key `file` with packed PNG. Returns restored WAV.
- - `GET /health` – health check.

- Model Training
- --------------
- `SEGAN.py` trains on paired noisy/clean WAVs. Update `Config.noisy_dir`, `Config.clean_dir`, and `Config.save_dir` to your paths, then run:
- ```bash
- python SEGAN.py
- ```
- Checkpoints are written every 5 epochs and as `seagan_final.pt` at the end. The inference pipeline expects a `G_state` entry inside the checkpoint.

- PNG Packing/Restoration Utilities
- ---------------------------------
- `pipeline.py` exposes:
- - `save_audio_as_png_lossless(tensor, png_path, width)` – stores int16 PCM in a lossless PNG.
- - `load_audio_from_png_lossless(png_path, original_length)` – restores the tensor.
- - `write_wav_from_tensor(tensor, out_wav_path, sr)` – writes mono WAV.

- Tips
- ----
- - Keep input WAVs mono or they will be averaged to mono.
- - Large files are chunked; adjust `chunk_seconds` and `overlap` in `denoise_chunked_final`.
- - Ensure the checkpoint matches the model architecture in `SEGAN.py`.
- =======
- ---
- title: SEGAN
- emoji: 🏒
- colorFrom: green
- colorTo: yellow
- sdk: gradio
- sdk_version: 6.1.0
- app_file: app.py
- pinned: false
- license: apache-2.0
- short_description: Remove BackgroundNoise and Generate Image from the Audio
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
- >>>>>>> b15accf770b0a139b21a8b09501ce8fd93a23c44
+ ---
+ title: SEGAN
+ emoji: 🏒
+ colorFrom: green
+ colorTo: yellow
+ sdk: gradio
+ sdk_version: "6.1.0"
+ app_file: app.py
+ pinned: false
+ license: apache-2.0
+ short_description: Remove BackgroundNoise and Generate Image from the Audio
+ ---
+
+ # SEAGAN Speech Enhancement & API

  A minimal speech-denoising project built around a SEGAN-style U-Net generator. It includes:
+
  - Training script to learn on paired noisy/clean audio.
  - Inference pipeline that denoises long clips in chunks and can pack output audio losslessly into PNG.
  - FastAPI service to expose denoise + PNG pack/restore endpoints.
+ - Gradio demo for Hugging Face Spaces (app.py).

+ ---

+ ## Repo Contents

+ - `SEGAN.py` – training components: config, dataset, U-Net generator, PatchGAN discriminator, training loop.
+ - `pipeline.py` – inference utilities: chunked denoiser, spectral gating cleanup, PNG pack/restore helpers.
+ - `app.py` – Gradio / FastAPI app wiring the pipeline for UI/API use.
+ - `checkpoint/seagan_final.pt` – example checkpoint (place your own if different), tracked with git-lfs.
+ - `requirements.txt` – Python dependencies.

+ ---

+ ## Prerequisites

+ - Python 3.9+ (tested with PyTorch CPU/GPU builds).
+ - For GPU inference/training, install the matching CUDA-enabled `torch`/`torchaudio`.
+ - FFmpeg is not required; `torchaudio` handles WAV I/O.

  ---

+ ## Install
+
+ ```bash
+ python -m venv .venv
+ # Windows PowerShell:
+ .\.venv\Scripts\Activate.ps1
+ # or cmd:
+ .\.venv\Scripts\activate.bat
+ pip install -r requirements.txt
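The README describes packing denoised audio losslessly into a PNG, and the removed section names `save_audio_as_png_lossless(tensor, png_path, width)` as storing int16 PCM. The actual pixel layout in `pipeline.py` is not part of this diff, so the following is only a minimal sketch of one possible layout, assuming each int16 sample is split into two little-endian bytes stored as consecutive pixels of an 8-bit grayscale image. The names `pack_pcm16_to_png` and `unpack_png_to_pcm16` are hypothetical. The encoder uses only `zlib` and `struct` so the byte layout stays explicit; the project itself lists Pillow in `requirements.txt`, which would normally handle PNG encoding.

```python
import struct
import zlib
from typing import List

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def _chunk(tag: bytes, data: bytes) -> bytes:
    # PNG chunk layout: 4-byte length, 4-byte type, data, CRC-32 over type + data
    crc = zlib.crc32(tag + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + tag + data + struct.pack(">I", crc)


def pack_pcm16_to_png(samples: List[int], width: int) -> bytes:
    """Hypothetical helper: store int16 PCM as an 8-bit grayscale PNG.

    Each sample becomes two pixels (low byte, high byte). `width` counts samples
    per row, so the image is 2 * width pixels wide; the last row is zero-padded.
    """
    data = struct.pack(f"<{len(samples)}h", *samples)  # little-endian int16
    row_bytes = 2 * width
    height = max(1, -(-len(data) // row_bytes))  # ceiling division, at least one row
    data = data.ljust(height * row_bytes, b"\x00")
    # Every scanline starts with filter type 0 ("None"), so pixels are stored as-is
    scanlines = b"".join(
        b"\x00" + data[i : i + row_bytes] for i in range(0, len(data), row_bytes)
    )
    # IHDR: width, height, bit depth 8, color type 0 (grayscale), default methods
    ihdr = struct.pack(">IIBBBBB", row_bytes, height, 8, 0, 0, 0, 0)
    return (
        PNG_SIGNATURE
        + _chunk(b"IHDR", ihdr)
        + _chunk(b"IDAT", zlib.compress(scanlines, 9))
        + _chunk(b"IEND", b"")
    )


def unpack_png_to_pcm16(png: bytes, n_samples: int) -> List[int]:
    """Hypothetical inverse: recover the first n_samples int16 values."""
    if png[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos, idat, width, height = 8, b"", 0, 0
    while pos < len(png):
        (length,) = struct.unpack(">I", png[pos : pos + 4])
        tag = png[pos + 4 : pos + 8]
        body = png[pos + 8 : pos + 8 + length]
        pos += 12 + length  # skip length + type + data + CRC
        if tag == b"IHDR":
            width, height, depth, color_type = struct.unpack(">IIBB", body[:10])
            if (depth, color_type) != (8, 0):
                raise ValueError("expected 8-bit grayscale")
        elif tag == b"IDAT":
            idat += body  # image data may be split across several IDAT chunks
        elif tag == b"IEND":
            break
    raw = zlib.decompress(idat)
    stride = width + 1  # filter byte + one byte per pixel
    pixels = bytearray()
    for row in range(height):
        line = raw[row * stride : (row + 1) * stride]
        if line[0] != 0:
            raise ValueError("only filter type 0 is supported by this sketch")
        pixels += line[1:]
    return list(struct.unpack(f"<{n_samples}h", bytes(pixels[: 2 * n_samples])))
```

Because the last row is zero-padded, the restore step needs the original sample count, which is consistent with `load_audio_from_png_lossless(png_path, original_length)` taking an `original_length` argument. The reader only accepts filter type 0, so it is meant for PNGs written by this packer, not for images re-saved by another encoder.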
requirements.txt CHANGED
@@ -1,9 +1,9 @@
  fastapi==0.95.2
  uvicorn[standard]==0.22.0
- torch==2.1.0+cpu
- torchaudio==2.1.0+cpu
  numpy==1.26.4
  pillow==10.1.0
  requests==2.32.2
- boto3==1.30.24
- botocore==1.35.24

  fastapi==0.95.2
  uvicorn[standard]==0.22.0
+ torch==2.1.0
+ torchaudio==2.1.0
  numpy==1.26.4
  pillow==10.1.0
  requests==2.32.2
+ huggingface-hub>=0.14.1
+ gradio==3.39.0
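After this commit the README front matter declares `sdk_version: "6.1.0"` while `requirements.txt` pins `gradio==3.39.0`, so the two files name different Gradio versions. On Hugging Face Spaces, `sdk_version` is the field that selects the SDK version (see the configuration reference linked in the removed README text). Below is a minimal sketch of a pre-push check that flags such a mismatch. It uses a deliberately naive line-based front-matter reader instead of a YAML library, assuming flat `key: value` lines only, and the helper names are hypothetical.

```python
import re
from typing import List, Optional


def front_matter_value(readme_text: str, key: str) -> Optional[str]:
    """Return a top-level scalar from the leading '---' block.

    Naive line parser (assumption: flat `key: value` lines only), not a YAML library.
    """
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None  # no front matter at the top of the file
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of the front matter block
        name, sep, value = line.partition(":")
        if sep and name.strip() == key:
            return value.strip().strip("\"'")  # drop surrounding quotes
    return None


def pinned_version(requirements_text: str, package: str) -> Optional[str]:
    """Return X from a `package==X` line (extras allowed), else None."""
    pattern = re.compile(
        rf"^\s*{re.escape(package)}(\[[^\]]*\])?\s*==\s*([^\s;#]+)", re.IGNORECASE
    )
    for line in requirements_text.splitlines():
        match = pattern.match(line)
        if match:
            return match.group(2)
    return None


def check_gradio_pin(readme_text: str, requirements_text: str) -> List[str]:
    """List mismatches between the Space's Gradio sdk_version and the pip pin."""
    problems: List[str] = []
    sdk = front_matter_value(readme_text, "sdk")
    sdk_version = front_matter_value(readme_text, "sdk_version")
    pin = pinned_version(requirements_text, "gradio")
    if sdk == "gradio" and sdk_version and pin and sdk_version != pin:
        problems.append(
            f"sdk_version {sdk_version} != gradio=={pin} in requirements.txt"
        )
    return problems
```

For example, `check_gradio_pin(Path("README.md").read_text(encoding="utf-8"), Path("requirements.txt").read_text(encoding="utf-8"))` (with `from pathlib import Path`) returns one message for the files as committed here. Which of the two versions should win is a project decision that this diff does not settle.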