	---
	title: Representation Chizzler
	sdk: gradio
	sdk_version: 5.16.1
	app_file: app.py
	python_version: "3.10"
	packages:
	- ffmpeg
	hf_oauth: true
	---

	# 🎧 Representation Chizzler™

	A two-stage audio processing tool that combines Voice Activity Detection (VAD) and speech enhancement to clean audio files: VAD first isolates the speech, then a deep learning model removes the remaining noise.

	## 🌟 Features

	1. Two-Stage Processing Pipeline:
	   - Stage 1: Silero VAD detects and extracts speech segments
	   - Stage 2: the MP-SENet deep learning model removes noise

	2. Memory-Efficient Processing:
	   - Processes audio in chunks to avoid running out of memory
	   - Automatically converts audio to the required format (16 kHz mono WAV)

	3. User-Friendly Interface:
	   - Gradio web interface
	   - Real-time progress reporting
	   - Compare the original, VAD-processed, and denoised versions

	4. Dataset Cleaning to Hub:
	   - Load any Hugging Face audio dataset (wav/mp3/flac)
	   - Process every audio file with Representation Chizzler
	   - Upload the cleaned dataset under a name ending in `-representation-chizzler`
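
The chunked processing in item 2 can be pictured with a minimal sketch. It assumes the audio has already been converted to a 16 kHz mono NumPy array; the helper names (`iter_chunks`, `process_in_chunks`) and the 30-second chunk length are illustrative choices, not values taken from `app.py`.

```python
import numpy as np

SAMPLE_RATE = 16_000  # target rate the README says audio is converted to


def iter_chunks(audio: np.ndarray, chunk_seconds: float = 30.0, sr: int = SAMPLE_RATE):
    """Yield consecutive, non-overlapping slices of a 1-D signal."""
    chunk_len = int(chunk_seconds * sr)
    if chunk_len <= 0:
        raise ValueError("chunk_seconds must be positive")
    for start in range(0, len(audio), chunk_len):
        yield audio[start:start + chunk_len]


def process_in_chunks(audio: np.ndarray, fn, chunk_seconds: float = 30.0,
                      sr: int = SAMPLE_RATE) -> np.ndarray:
    """Apply fn to each chunk and stitch the results back together.

    The model (fn) only ever sees one chunk, so its working memory stays
    bounded no matter how long the input file is.
    """
    outputs = [fn(chunk) for chunk in iter_chunks(audio, chunk_seconds, sr)]
    if not outputs:  # empty input: return an empty array of the same dtype
        return audio[:0]
    return np.concatenate(outputs)
```

Non-overlapping chunks keep the sketch simple, but a denoiser can leave small artifacts at chunk boundaries; overlapping the chunks and cross-fading them is a common refinement.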

	## 🚀 Installation

	1. Clone this repository:
	```bash
	git clone https://github.com/Reza2kn/RepresentationChizzler.git
	cd RepresentationChizzler
	```

	2. Create and activate a virtual environment:
	```bash
	python -m venv .venv
	source .venv/bin/activate # On Windows, use: .venv\Scripts\activate
	```

	3. Install dependencies:
	```bash
	pip install -r requirements.txt
	```

	4. Set up environment variables:
	- Create a `.env` file in the project root
	- Add your Hugging Face token:
	```
	HF_TOKEN=your_huggingface_token_here
	```

	5. MP-SENet files and weights:
	- This repo bundles a minimal copy of MP-SENet plus the pretrained checkpoint
	`MP-SENet/best_ckpt/g_best_dns` and its `MP-SENet/best_ckpt/config.json`,
	both taken from the official MP-SENet GitHub repository.

	Optional alternative (to swap in different weights):
	- Set `MPSENET_REPO` to a Hugging Face repo that contains `g_best_dns` and
	`config.json`. If the filenames differ, set `MPSENET_CKPT_FILENAME` and
	`MPSENET_CONFIG_FILENAME` to the actual names.
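
A minimal sketch of how steps 4 and 5 might be consumed from Python. Assumptions: `python-dotenv` is installed for the `.env` file (its `load_dotenv` copies entries into `os.environ` without overriding variables that are already set), and alternative weights are fetched with `huggingface_hub.hf_hub_download`. The helper names are hypothetical; `app.py` may resolve these files differently.

```python
import os
from pathlib import Path
from typing import Optional, Tuple

try:
    from dotenv import load_dotenv  # provided by the python-dotenv package
except ImportError:
    load_dotenv = None  # fall back to variables already in the environment

# Bundled location of the MP-SENet files, as listed in this README.
BUNDLED_DIR = Path("MP-SENet") / "best_ckpt"


def get_hf_token() -> Optional[str]:
    """Hypothetical helper: return HF_TOKEN, reading ./.env first if possible."""
    if load_dotenv is not None:
        # Reads .env from the current directory; existing variables win.
        load_dotenv(".env")
    token = os.environ.get("HF_TOKEN", "").strip()
    return token or None


def resolve_mpsenet_files() -> Tuple[str, str]:
    """Hypothetical helper: return (checkpoint_path, config_path)."""
    repo_id = os.environ.get("MPSENET_REPO")
    if not repo_id:
        # Default: the checkpoint and config bundled with this repo.
        return str(BUNDLED_DIR / "g_best_dns"), str(BUNDLED_DIR / "config.json")
    ckpt_name = os.environ.get("MPSENET_CKPT_FILENAME", "g_best_dns")
    config_name = os.environ.get("MPSENET_CONFIG_FILENAME", "config.json")
    # Imported lazily so the bundled path works without huggingface_hub.
    from huggingface_hub import hf_hub_download
    token = get_hf_token()
    ckpt = hf_hub_download(repo_id=repo_id, filename=ckpt_name, token=token)
    config = hf_hub_download(repo_id=repo_id, filename=config_name, token=token)
    return ckpt, config
```

Because Space secrets are exposed to the app as environment variables, the same `get_hf_token` lookup would also work on a Hugging Face Space, where no `.env` file exists.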

	## 🎮 Usage

	1. Run the app:
	```bash
	python app.py
	```

	2. Open the URL printed in the terminal in your web browser (by default Gradio serves the app at http://127.0.0.1:7860)

	3. Upload an audio file and adjust the parameters:
	- VAD Threshold: Controls voice detection sensitivity (0.1-0.9)
	- Max Silence Gap: Controls merging of close speech segments (1-10s)
	- Normalize volume: Boosts quiet clips and gently attenuates loud ones

	4. Compare the results:
	- Original Audio
	- VAD Processed (Speech Only)
	- Final Denoised
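
This README does not say how "Normalize volume" is implemented. As a hypothetical illustration of the described behavior (large boosts for quiet audio, gentle cuts for loud audio), an RMS-based normalizer with an asymmetric gain limit could look like this; the target level, gain limits, and peak ceiling are assumed values.

```python
import numpy as np


def normalize_volume(audio: np.ndarray,
                     target_dbfs: float = -20.0,
                     max_boost_db: float = 20.0,
                     max_cut_db: float = 6.0,
                     peak_ceiling: float = 0.99) -> np.ndarray:
    """Hypothetical sketch: move a float waveform in [-1, 1] toward target_dbfs RMS."""
    rms = float(np.sqrt(np.mean(np.square(audio)))) if audio.size else 0.0
    if rms < 1e-8:  # silence: nothing sensible to scale
        return audio.copy()
    current_dbfs = 20.0 * np.log10(rms)
    gain_db = target_dbfs - current_dbfs
    # Asymmetric clamp: allow large boosts but only gentle cuts.
    gain_db = float(np.clip(gain_db, -max_cut_db, max_boost_db))
    out = audio * (10.0 ** (gain_db / 20.0))
    # Scale down if the boost would push peaks past the ceiling (avoids clipping).
    peak = float(np.max(np.abs(out)))
    if peak > peak_ceiling:
        out = out * (peak_ceiling / peak)
    return out
```

With these defaults a clip at about -43 dBFS RMS is raised by the full 20 dB, while a clip at about -4 dBFS is lowered by only 6 dB.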

	## 📦 Dataset Cleaning (Hugging Face Hub)

	Use the "Dataset to Hub" tab to process any HF dataset that includes audio
	files (wav, mp3, flac).

	Inputs:
	- Dataset ID or URL (defaults to `kiarashQ/farsi-asr-unified-cleaned`)
	- Optional config and split (use `all` to process every split)
	- Optional audio column (auto-detected if left empty)
	- Optional output dataset repo (defaults to `{username}/{dataset}-representation-chizzler`)
	- Resume from cached shards to continue long runs without restarting
	- Normalize volume to raise quiet clips and reduce overly loud clips
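
The default output repo name can be derived from either accepted input form. This sketch assumes `{dataset}` in the pattern above means the dataset's name without its owner, and that URLs follow the `https://huggingface.co/datasets/<owner>/<name>` layout; the function names are hypothetical.

```python
import re

SUFFIX = "-representation-chizzler"


def parse_dataset_id(dataset: str) -> str:
    """Accept 'owner/name' or a https://huggingface.co/datasets/... URL."""
    dataset = dataset.strip().rstrip("/")
    match = re.match(r"^https?://huggingface\.co/datasets/([^/]+/[^/]+)", dataset)
    if match:
        return match.group(1)  # drop any trailing /tree/main etc.
    return dataset


def default_output_repo(dataset: str, username: str) -> str:
    """Build '{username}/{dataset}-representation-chizzler'."""
    name = parse_dataset_id(dataset).split("/")[-1]
    return f"{username}/{name}{SUFFIX}"
```

For example, `default_output_repo("kiarashQ/farsi-asr-unified-cleaned", "alice")` gives `alice/farsi-asr-unified-cleaned-representation-chizzler`.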

	Requirements:
	- `HF_TOKEN` must be set so the app can download private datasets and push the
	cleaned dataset to your account.

	Output notes:
	- The cleaned dataset adds `chizzler_ok` (bool) and `chizzler_error` (string)
	columns for per-row error tracking.
	- Cached shards are stored under `chizzler_cache/` (configurable via
	`CHIZZLER_CACHE_DIR`).
	- On ZeroGPU, enable "Cache shards on Hub" so resume works across GPU workers.
	ZeroGPU is preemptible; for uninterrupted runs use a dedicated GPU Space.
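
Resuming from shards and per-row error tracking can be sketched together. Assumptions: rows are processed in fixed-size shards, each finished shard is saved as a JSON file in the cache directory, and `clean_fn` stands in for the real VAD and denoising pipeline. Only `CHIZZLER_CACHE_DIR`, `chizzler_cache/`, `chizzler_ok`, and `chizzler_error` come from this README; the file layout, the empty-string error value, and the helper names are hypothetical.

```python
import json
import os
from pathlib import Path
from typing import Callable, Dict, List

Row = Dict[str, object]


def cache_dir() -> Path:
    """Cache location: CHIZZLER_CACHE_DIR if set, else ./chizzler_cache."""
    return Path(os.environ.get("CHIZZLER_CACHE_DIR", "chizzler_cache"))


def process_row(row: Row, clean_fn: Callable[[Row], Row]) -> Row:
    """Run clean_fn on one row, recording success or the error message."""
    try:
        out = dict(clean_fn(row))
        out["chizzler_ok"] = True
        out["chizzler_error"] = ""
    except Exception as exc:  # one bad file must not stop the whole run
        out = dict(row)
        out["chizzler_ok"] = False
        out["chizzler_error"] = f"{type(exc).__name__}: {exc}"
    return out


def process_with_resume(rows: List[Row], clean_fn: Callable[[Row], Row],
                        shard_size: int = 100) -> List[Row]:
    """Process rows shard by shard, reusing shards already cached on disk."""
    directory = cache_dir()
    directory.mkdir(parents=True, exist_ok=True)
    results: List[Row] = []
    for shard_idx, start in enumerate(range(0, len(rows), shard_size)):
        shard_path = directory / f"shard_{shard_idx:05d}.json"
        if shard_path.exists():  # resume: skip work finished in an earlier run
            results.extend(json.loads(shard_path.read_text()))
            continue
        shard = [process_row(r, clean_fn) for r in rows[start:start + shard_size]]
        shard_path.write_text(json.dumps(shard))
        results.extend(shard)
    return results
```

Because shards are keyed only by position, this sketch assumes the same dataset, split, and shard size on every run.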

	## ☁️ Hugging Face Space (ZeroGPU)

	This repo is Space-ready. Create a ZeroGPU Space, then:
	1. Set the Space secret `HF_TOKEN`.
	2. The MP-SENet files are already bundled in the repo:
	   - `MP-SENet/best_ckpt/g_best_dns`
	   - `MP-SENet/best_ckpt/config.json`
	3. The Space launches `app.py` automatically, as set by `app_file` in the front matter.

	## 🛠️ Parameters

	- VAD Threshold (0.1-0.9):
	  - Higher values = stricter voice detection
	  - Lower values = more lenient detection
	  - Default: 0.5

	- Max Silence Gap (1-10 s):
	  - Longest silence between two speech segments for them to still be merged into one
	  - Higher values = fewer segments, but they may include more silence
	  - Default: 4.0 s
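
How the two parameters interact can be shown with a simplified sketch. It assumes per-frame speech probabilities are already available; in the app they come from Silero VAD, whose `get_speech_timestamps` function also accepts a `threshold` argument. The 32 ms frame length (512 samples at 16 kHz) and the helper names are assumptions, and Silero's own segmentation adds padding and minimum-duration rules that are omitted here.

```python
from typing import List, Optional, Tuple

Segment = Tuple[float, float]  # (start_s, end_s)


def probs_to_segments(probs: List[float], threshold: float = 0.5,
                      frame_s: float = 0.032) -> List[Segment]:
    """Turn per-frame speech probabilities into (start, end) segments in seconds."""
    segments: List[Segment] = []
    start: Optional[float] = None
    for i, p in enumerate(probs):
        if p >= threshold and start is None:  # speech begins
            start = i * frame_s
        elif p < threshold and start is not None:  # speech ends
            segments.append((start, i * frame_s))
            start = None
    if start is not None:  # speech runs to the end of the audio
        segments.append((start, len(probs) * frame_s))
    return segments


def merge_segments(segments: List[Segment], max_gap_s: float = 4.0) -> List[Segment]:
    """Merge segments separated by at most max_gap_s seconds of silence."""
    merged: List[Segment] = []
    for start, end in sorted(segments):
        if merged and start - merged[-1][1] <= max_gap_s:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

Raising `threshold` drops low-confidence frames, while a larger `max_gap_s` joins neighboring segments into fewer, longer ones that may contain more silence, matching the trade-offs listed above.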

	## 🙏 Credits

	This project combines two powerful models:
	- [Silero VAD](https://github.com/snakers4/silero-vad) for Voice Activity Detection
	- [MP-SENet](https://github.com/yxlu-0102/MP-SENet) for Speech Enhancement

	## 📝 License

	This project is licensed under the terms specified in the MP-SENet repository.