---
title: SEGAN
emoji: 🏒
colorFrom: green
colorTo: yellow
sdk: gradio
sdk_version: "6.1.0"
app_file: app.py
pinned: false
license: apache-2.0
short_description: Remove background noise and generate image from the audio
---
# SEGAN Speech Enhancement & API
A minimal speech-denoising project built around a SEGAN-style U-Net generator. It includes:
- Training script that learns from paired noisy/clean audio.
- Inference pipeline that denoises long clips in chunks and can pack output audio losslessly into PNG.
- FastAPI service to expose denoise + PNG pack/restore endpoints.
- Gradio demo for Hugging Face Spaces (app.py).
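
The chunked denoising mentioned in the list above can be sketched as follows. This is only an illustration of the idea, not the code in `pipeline.py`: the function name `denoise_in_chunks`, the `denoise_chunk` callback, and the default `chunk_size` are assumptions, and plain Python lists stand in for the tensors the real pipeline presumably uses.

```python
from typing import Callable, List, Sequence


def denoise_in_chunks(
    samples: Sequence[float],
    denoise_chunk: Callable[[List[float]], List[float]],
    chunk_size: int = 16384,  # hypothetical default; check the repo's config
) -> List[float]:
    """Run a fixed-size denoiser over an arbitrarily long clip."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    output: List[float] = []
    for start in range(0, len(samples), chunk_size):
        chunk = list(samples[start:start + chunk_size])
        valid = len(chunk)
        # Zero-pad the final chunk so the model always sees chunk_size samples.
        chunk.extend([0.0] * (chunk_size - valid))
        # Drop the padded tail again so the output length matches the input.
        output.extend(denoise_chunk(chunk)[:valid])
    return output
```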
---
## Repo Contents
- `SEGAN.py` – training components: config, dataset, U-Net generator, PatchGAN discriminator, training loop.
- `pipeline.py` – inference utilities: chunked denoiser, spectral gating cleanup, PNG pack/restore helpers.
- `app.py` – Gradio / FastAPI app wiring the pipeline for UI/API use.
- `checkpoint/seagan_final.pt` – example checkpoint, tracked with git-lfs (place your own if different).
- `requirements.txt` – Python dependencies.
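
To make the lossless PNG pack/restore idea concrete, here is a minimal standard-library sketch. It is not the implementation in `pipeline.py`: the function names, the 4-byte length header, and the 256-pixel row width are all assumptions chosen for illustration. Raw PCM bytes are stored as 8-bit grayscale pixels, and because PNG compression is lossless, the restored bytes are identical to the input.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def _png_chunk(kind: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def pack_pcm_to_png(pcm: bytes, width: int = 256) -> bytes:
    """Store PCM bytes as pixels of an 8-bit grayscale PNG (hypothetical layout)."""
    # Prefix the byte count so padding can be removed on restore.
    payload = struct.pack(">I", len(pcm)) + pcm
    height = -(-len(payload) // width)  # ceiling division
    payload = payload.ljust(width * height, b"\x00")
    # Each scanline starts with filter byte 0 ("no filter").
    rows = b"".join(
        b"\x00" + payload[r * width:(r + 1) * width] for r in range(height)
    )
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    return (
        PNG_SIGNATURE
        + _png_chunk(b"IHDR", ihdr)
        + _png_chunk(b"IDAT", zlib.compress(rows))
        + _png_chunk(b"IEND", b"")
    )


def restore_pcm_from_png(png: bytes) -> bytes:
    """Inverse of pack_pcm_to_png; only handles PNGs written by it."""
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos, width, idat = len(PNG_SIGNATURE), 0, b""
    while pos < len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        kind = png[pos + 4:pos + 8]
        data = png[pos + 8:pos + 8 + length]
        pos += 12 + length  # length + type + data + CRC
        if kind == b"IHDR":
            (width,) = struct.unpack(">I", data[:4])
        elif kind == b"IDAT":
            idat += data
        elif kind == b"IEND":
            break
    raw = zlib.decompress(idat)
    stride = width + 1  # one filter byte per scanline
    payload = b"".join(raw[i + 1:i + stride] for i in range(0, len(raw), stride))
    (count,) = struct.unpack(">I", payload[:4])
    return payload[4:4 + count]
```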
---
## Prerequisites
- Python 3.9+ (tested with PyTorch CPU/GPU builds).
- For GPU inference/training, install the matching CUDA-enabled `torch`/`torchaudio`.
- FFmpeg is not required; `torchaudio` handles WAV I/O.
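
A quick way to check these prerequisites before installing or running anything is sketched below. The helper name `describe_environment` is hypothetical and not part of this repo; it only reports the Python version requirement and, if `torch` is already installed, whether a CUDA device is visible.

```python
import importlib.util
import sys


def describe_environment() -> dict:
    """Report whether the interpreter and (optional) torch build look usable."""
    info = {
        "python_ok": sys.version_info >= (3, 9),
        "torch_installed": False,
        "cuda_available": False,
    }
    # Only import torch if it is present, so the check also works pre-install.
    if importlib.util.find_spec("torch") is not None:
        import torch

        info["torch_installed"] = True
        info["cuda_available"] = bool(torch.cuda.is_available())
    return info


print(describe_environment())
```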
---
## Install
```bash
python -m venv .venv
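# Activate the environment with the one command that matches your shell.
# Windows PowerShell:
.\.venv\Scripts\Activate.ps1
# or Windows cmd:
.\.venv\Scripts\activate.bat
# or macOS/Linux (bash/zsh):
source .venv/bin/activate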
pip install -r requirements.txt