---
title: SEGAN
emoji: 🏢
colorFrom: green
colorTo: yellow
sdk: gradio
sdk_version: "6.1.0"
app_file: app.py
pinned: false
license: apache-2.0
short_description: Remove background noise and generate an image from audio
---

# SEGAN Speech Enhancement & API

A minimal speech-denoising project built around a SEGAN-style U-Net generator. It includes:

- A training script that learns from paired noisy/clean audio.
- An inference pipeline that denoises long clips in chunks and can pack the output audio losslessly into a PNG image.
- A FastAPI service that exposes denoise and PNG pack/restore endpoints.
- A Gradio demo for Hugging Face Spaces (`app.py`).
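
To make the "lossless PNG" idea concrete, here is a minimal standard-library sketch of one way raw PCM bytes could be stored as pixels and read back unchanged. It is an illustration only: the function names `pack_pcm_to_png` and `restore_pcm_from_png`, the 8-bit grayscale layout, and the 4-byte length prefix are assumptions, not the scheme that `pipeline.py` actually uses.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def _chunk(kind: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def pack_pcm_to_png(pcm: bytes, width: int = 256) -> bytes:
    """Store raw PCM bytes as pixels of an 8-bit grayscale PNG (hypothetical layout)."""
    # Prefix the payload with its length so the padding can be dropped on restore.
    payload = struct.pack(">I", len(pcm)) + pcm
    height = -(-len(payload) // width)  # ceiling division
    payload = payload.ljust(width * height, b"\x00")
    # Every scanline starts with filter type 0 (no filtering).
    rows = b"".join(
        b"\x00" + payload[r * width:(r + 1) * width] for r in range(height)
    )
    # IHDR: width, height, bit depth 8, grayscale, default compression/filter, no interlace.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    return (
        PNG_SIGNATURE
        + _chunk(b"IHDR", ihdr)
        + _chunk(b"IDAT", zlib.compress(rows))
        + _chunk(b"IEND", b"")
    )


def restore_pcm_from_png(png: bytes) -> bytes:
    """Inverse of pack_pcm_to_png; only understands PNGs written by it."""
    if png[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos, width, idat = 8, 0, b""
    while pos < len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        kind = png[pos + 4:pos + 8]
        data = png[pos + 8:pos + 8 + length]
        if kind == b"IHDR":
            width = struct.unpack(">I", data[:4])[0]
        elif kind == b"IDAT":
            idat += data
        pos += 12 + length  # length field + type + data + CRC
    rows = zlib.decompress(idat)
    stride = width + 1  # one filter byte precedes each scanline
    payload = b"".join(rows[i + 1:i + stride] for i in range(0, len(rows), stride))
    (n,) = struct.unpack(">I", payload[:4])
    return payload[4:4 + n]


# Round trip on five 16-bit little-endian samples.
pcm = struct.pack("<5h", 0, 1000, -1000, 32767, -32768)
assert restore_pcm_from_png(pack_pcm_to_png(pcm, width=4)) == pcm
```

Because PNG compression is lossless, the bytes that come back are identical to the bytes that went in, which is what makes an image a safe container for audio samples.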

---

## Repo Contents

- `SEGAN.py` – training components: config, dataset, U-Net generator, PatchGAN discriminator, training loop.
- `pipeline.py` – inference utilities: chunked denoiser, spectral gating cleanup, PNG pack/restore helpers.
- `app.py` – Gradio / FastAPI app that wires the pipeline up for UI and API use.
- `checkpoint/seagan_final.pt` – example checkpoint, tracked with git-lfs (substitute your own if it differs).
- `requirements.txt` – Python dependencies.
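
As a rough picture of what a chunked denoiser does, the sketch below splits a long signal into fixed-size windows, zero-pads the last one, runs a model on each window, and stitches the results back together. The names `denoise_in_chunks` and `denoise_chunk`, and the default `chunk_size`, are assumptions made for illustration; the real logic lives in `pipeline.py` and may differ (for example by overlapping windows).

```python
from typing import Callable, List, Sequence


def denoise_in_chunks(
    samples: Sequence[float],
    denoise_chunk: Callable[[List[float]], List[float]],
    chunk_size: int = 16384,  # assumed window length, not read from the repo
) -> List[float]:
    """Run denoise_chunk over fixed-size windows and stitch the results."""
    out: List[float] = []
    for start in range(0, len(samples), chunk_size):
        chunk = list(samples[start:start + chunk_size])
        valid = len(chunk)
        # Zero-pad the final chunk so the model always sees chunk_size samples.
        chunk += [0.0] * (chunk_size - valid)
        enhanced = denoise_chunk(chunk)
        out.extend(enhanced[:valid])  # drop the padded tail
    return out


# Stand-in "model" that halves the amplitude; a real generator would go here.
halve = lambda chunk: [0.5 * x for x in chunk]
result = denoise_in_chunks([1.0] * 10, halve, chunk_size=4)
assert len(result) == 10
```

Processing window by window keeps memory use bounded regardless of clip length, at the cost of possible discontinuities at chunk borders.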

---

## Prerequisites

- Python 3.9+ (tested with PyTorch CPU/GPU builds).  
- For GPU inference or training, install the `torch`/`torchaudio` builds that match your CUDA version.
- FFmpeg is not required; `torchaudio` handles WAV I/O.
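
A quick way to confirm these prerequisites before installing anything heavier is a small standard-library check like the one below. The helper name `check_prerequisites` is hypothetical and not part of this repo; it only reports whether the interpreter is new enough and whether the named packages can be found.

```python
import importlib.util
import sys


def check_prerequisites(min_python=(3, 9), packages=("torch", "torchaudio")):
    """Report whether Python is new enough and the given packages are importable."""
    report = {"python_ok": sys.version_info >= min_python}
    for name in packages:
        # find_spec returns None when a top-level package is not installed.
        report[name] = importlib.util.find_spec(name) is not None
    return report


print(check_prerequisites())
```

Once PyTorch is installed, `torch.cuda.is_available()` tells you whether the GPU build can see a CUDA device.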

---

## Install

```bash
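python -m venv .venv
# Activate the environment (run only the line that matches your shell):
# Linux/macOS:
source .venv/bin/activate
# Windows PowerShell:
.\.venv\Scripts\Activate.ps1
# Windows cmd:
.\.venv\Scripts\activate.bat
pip install -r requirements.txt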