title: Representation Chizzler
sdk: gradio
sdk_version: 5.16.1
app_file: app.py
python_version: '3.10'
packages:
- ffmpeg
hf_oauth: true
Representation Chizzler™
A powerful two-stage audio processing tool that combines Voice Activity Detection (VAD) and Speech Enhancement to clean and denoise audio files.
Features
Two-Stage Processing Pipeline:
- Stage 1: Uses Silero VAD to detect and extract speech segments
- Stage 2: Applies MP-SENet deep learning model to remove noise
Memory-Efficient Processing:
- Processes audio in chunks to prevent memory issues
- Automatically converts audio to the required format (16kHz mono WAV)
User-Friendly Interface:
- Beautiful Gradio web interface
- Real-time progress reporting
- Compare original, VAD-processed, and denoised versions
Dataset Cleaning to Hub:
- Load any HF audio dataset (wav/mp3/flac)
- Process every audio file with Representation Chizzler
- Upload the cleaned dataset to the Hub under a repo name ending in -representation-chizzler
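The format conversion and chunked processing listed above can be sketched roughly as follows. This is only an illustration, not the app's actual code: the helper names `ffmpeg_convert_command` and `iter_chunks` are hypothetical, the 30-second chunk length is an assumed value, and whether the app invokes ffmpeg this way is an assumption. The `-ar` and `-ac` flags are standard ffmpeg options for output sample rate and channel count.

```python
from typing import Iterator, List, Sequence

TARGET_SAMPLE_RATE = 16_000  # the README states 16 kHz mono WAV


def ffmpeg_convert_command(src: str, dst: str) -> List[str]:
    """Build (but do not run) an ffmpeg command that outputs 16 kHz mono audio."""
    return [
        "ffmpeg", "-y",                   # overwrite dst if it already exists
        "-i", src,                        # input file (wav/mp3/flac/...)
        "-ar", str(TARGET_SAMPLE_RATE),   # output sample rate
        "-ac", "1",                       # one output channel (mono)
        dst,
    ]


def iter_chunks(
    samples: Sequence[float],
    chunk_seconds: float = 30.0,          # assumed chunk length, not from the repo
    sample_rate: int = TARGET_SAMPLE_RATE,
) -> Iterator[Sequence[float]]:
    """Yield consecutive slices so only one chunk needs processing at a time."""
    chunk_len = int(chunk_seconds * sample_rate)
    if chunk_len <= 0:
        raise ValueError("chunk_seconds must be positive")
    for start in range(0, len(samples), chunk_len):
        yield samples[start:start + chunk_len]
```

The command list could be passed to `subprocess.run(cmd, check=True)`; each chunk from `iter_chunks` would then go through VAD and enhancement in turn.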
Installation
Clone this repository:
git clone https://github.com/Reza2kn/RepresentationChizzler.git
cd RepresentationChizzler
Create and activate a virtual environment:
python -m venv .venv
source .venv/bin/activate # On Windows, use: .venv\Scripts\activate
Install dependencies:
pip install -r requirements.txt
Set up environment variables:
- Create a .env file in the project root
- Add your Hugging Face token:
HF_TOKEN=your_huggingface_token_here
MP-SENet files and weights:
- This repo includes a minimal MP-SENet copy plus the pretrained best_ckpt/g_best_dns and best_ckpt/config.json downloaded from the official MP-SENet GitHub repository.
Optional alternatives (if you want to swap weights):
- Set MPSENET_REPO to a Hugging Face repo that contains g_best_dns and config.json (use MPSENET_CKPT_FILENAME / MPSENET_CONFIG_FILENAME if the filenames differ).
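As a rough sketch of how the optional weight-swapping variables could be resolved, the snippet below reads the three environment variables named above and falls back to the bundled filenames. The `MPSENetSource` class and `resolve_mpsenet_source` function are hypothetical names used for illustration; the app's real lookup logic may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class MPSENetSource:
    repo: Optional[str]      # None means: use the bundled best_ckpt/ files
    ckpt_filename: str
    config_filename: str


def resolve_mpsenet_source(env: Mapping[str, str] = os.environ) -> MPSENetSource:
    """Read the README's variables, defaulting to the bundled filenames."""
    return MPSENetSource(
        repo=env.get("MPSENET_REPO") or None,
        ckpt_filename=env.get("MPSENET_CKPT_FILENAME") or "g_best_dns",
        config_filename=env.get("MPSENET_CONFIG_FILENAME") or "config.json",
    )
```

Passing the environment as a mapping keeps the function easy to test without touching the real process environment.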
Usage
Run the app:
python app.py
Open your web browser and navigate to the provided URL
Upload an audio file and adjust the parameters:
- VAD Threshold: Controls voice detection sensitivity (0.1-0.9)
- Max Silence Gap: Controls merging of close speech segments (1-10s)
- Normalize volume: Boosts quiet samples and gently attenuates loud ones
Compare the results:
- Original Audio
- VAD Processed (Speech Only)
- Final Denoised
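The README does not say how the "Normalize volume" option is implemented. One plausible approach, shown here purely as an assumption, is RMS-based gain with a cap on amplification and a peak limit. The function name `normalize_volume` and the values for `target_rms`, `max_gain`, and `peak_limit` are hypothetical.

```python
import math
from typing import List, Sequence


def normalize_volume(
    samples: Sequence[float],
    target_rms: float = 0.1,   # assumed target loudness, not from the repo
    max_gain: float = 10.0,    # assumed cap so near-silent clips are not blown up
    peak_limit: float = 0.99,  # keep scaled peaks below full scale
) -> List[float]:
    """Scale samples toward target_rms: quiet clips go up, loud clips go down."""
    if not samples:
        return []
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0.0:
        return list(samples)   # pure silence: nothing to scale
    gain = min(target_rms / rms, max_gain)
    peak = max(abs(s) for s in samples)
    gain = min(gain, peak_limit / peak)  # never push the peak past peak_limit
    return [s * gain for s in samples]
```

With these defaults, a clip at RMS 0.01 is boosted tenfold, while a clip at RMS 0.9 is scaled down to about 0.1.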
Dataset Cleaning (Hugging Face Hub)
Use the "Dataset to Hub" tab to process any HF dataset that includes audio files (wav, mp3, flac).
Inputs:
- Dataset ID or URL (defaults to kiarashQ/farsi-asr-unified-cleaned)
- Optional config and split (use all to process every split)
- Optional audio column (auto-detected if left empty)
- Optional output dataset repo (defaults to {username}/{dataset}-representation-chizzler)
- Resume from cached shards to continue long runs without restarting
- Normalize volume to raise quiet clips and reduce overly loud clips
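To illustrate the default output repo pattern, here is a minimal sketch that accepts either a dataset ID or a Hub URL and builds `{username}/{dataset}-representation-chizzler`. Both helper names are hypothetical, and it assumes `{dataset}` means the dataset name without its owner prefix; the app may parse inputs differently.

```python
def dataset_id_from_input(value: str) -> str:
    """Accept 'owner/name' or a huggingface.co/datasets URL; return 'owner/name'."""
    value = value.strip().rstrip("/")
    marker = "huggingface.co/datasets/"
    if marker in value:
        value = value.split(marker, 1)[1]   # drop scheme and host
    parts = value.split("/")
    if len(parts) < 2 or not all(parts[:2]):
        raise ValueError(f"expected 'owner/name', got {value!r}")
    return "/".join(parts[:2])              # ignore trailing path such as /tree/main


def default_output_repo(username: str, dataset_input: str) -> str:
    """Build the default target repo name described in the README."""
    dataset_name = dataset_id_from_input(dataset_input).split("/", 1)[1]
    return f"{username}/{dataset_name}-representation-chizzler"
```

For a hypothetical user `alice` and the default dataset, this yields `alice/farsi-asr-unified-cleaned-representation-chizzler`.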
Requirements:
- HF_TOKEN must be set so the app can download private datasets and push the cleaned dataset to your account.
Output notes:
- The cleaned dataset adds chizzler_ok (bool) and chizzler_error (string) columns for per-row error tracking.
- Cached shards are stored under chizzler_cache/ (configurable via CHIZZLER_CACHE_DIR).
- On ZeroGPU, enable "Cache shards on Hub" so resume works across GPU workers. ZeroGPU is preemptible; for uninterrupted runs use a dedicated GPU Space.
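The per-row error tracking could look something like the sketch below, where a failure on one row is recorded in the `chizzler_ok` and `chizzler_error` columns instead of aborting the run. The `process_row` helper, the `clean_audio` callback, the default column name `audio`, and the error string format are all assumptions for illustration.

```python
from typing import Any, Callable, Dict


def process_row(
    row: Dict[str, Any],
    clean_audio: Callable[[Any], Any],   # hypothetical stand-in for the VAD + denoise pipeline
    audio_column: str = "audio",         # assumed column name
) -> Dict[str, Any]:
    """Clean one row; on failure keep the original audio and record the error."""
    out = dict(row)
    try:
        out[audio_column] = clean_audio(row[audio_column])
        out["chizzler_ok"] = True
        out["chizzler_error"] = ""
    except Exception as exc:  # one bad file should not stop the whole dataset
        out["chizzler_ok"] = False
        out["chizzler_error"] = f"{type(exc).__name__}: {exc}"
    return out
```

A function of this shape could be applied row by row, for example through a dataset's map operation, and failed rows filtered afterwards on `chizzler_ok`.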
Hugging Face Space (ZeroGPU)
This repo is Space-ready. Create a ZeroGPU Space and:
- Set the Space secret HF_TOKEN.
- MP-SENet files are already bundled in the repo:
  - MP-SENet/best_ckpt/g_best_dns
  - MP-SENet/best_ckpt/config.json
- Launch uses app.py automatically.
Parameters
VAD Threshold (0.1-0.9):
- Higher values = stricter voice detection
- Lower values = more lenient detection
- Default: 0.5
Max Silence Gap (1-10s):
- Maximum silence duration to consider segments as continuous
- Higher values = fewer segments but may include more silence
- Default: 4.0s
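To make the effect of the two parameters concrete, here is a simplified sketch: per-frame speech probabilities are thresholded into segments, then segments separated by at most the max silence gap are merged. This is not Silero VAD's actual algorithm (which also applies minimum durations and padding); the function names and the 32 ms frame length are assumptions.

```python
from typing import List, Optional, Sequence, Tuple

Segment = Tuple[float, float]  # (start_seconds, end_seconds)


def speech_segments(
    probs: Sequence[float],
    threshold: float = 0.5,        # README default
    frame_seconds: float = 0.032,  # assumed frame length
) -> List[Segment]:
    """Turn per-frame speech probabilities into (start, end) segments."""
    segments: List[Segment] = []
    start: Optional[float] = None
    for i, p in enumerate(probs):
        if p >= threshold and start is None:
            start = i * frame_seconds                   # speech begins
        elif p < threshold and start is not None:
            segments.append((start, i * frame_seconds)) # speech ends
            start = None
    if start is not None:                               # speech runs to the end
        segments.append((start, len(probs) * frame_seconds))
    return segments


def merge_segments(
    segments: Sequence[Segment],
    max_silence_gap: float = 4.0,  # README default, in seconds
) -> List[Segment]:
    """Merge segments whose silence gap is at most max_silence_gap."""
    merged: List[Segment] = []
    for start, end in sorted(segments):
        if merged and start - merged[-1][1] <= max_silence_gap:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

Raising the threshold drops borderline frames, and raising the gap yields fewer, longer segments that may include more silence.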
Credits
This project combines two powerful models:
- Silero VAD for Voice Activity Detection
- MP-SENet for Speech Enhancement
License
This project is licensed under the terms specified in the MP-SENet repository.