---
title: Representation Chizzler
sdk: gradio
sdk_version: 5.16.1
app_file: app.py
python_version: "3.10"
packages:
  - ffmpeg
hf_oauth: true
---

# 🎧 Representation Chizzler™

A powerful two-stage audio processing tool that combines Voice Activity Detection (VAD) and speech enhancement to clean and denoise audio files.

## 🌟 Features

1. **Two-Stage Processing Pipeline**:
   - Stage 1: Uses Silero VAD to detect and extract speech segments
   - Stage 2: Applies the MP-SENet deep learning model to remove noise
2. **Memory-Efficient Processing**:
   - Processes audio in chunks to prevent memory issues
   - Automatically converts audio to the required format (16 kHz mono WAV)
3. **User-Friendly Interface**:
   - Beautiful Gradio web interface
   - Real-time progress reporting
   - Compare original, VAD-processed, and denoised versions
4. **Dataset Cleaning to Hub**:
   - Load any Hugging Face audio dataset (wav/mp3/flac)
   - Process every audio file with Representation Chizzler
   - Upload a cleaned dataset with a Representation Chizzler suffix

## 🚀 Installation

1. Clone this repository:

   ```bash
   git clone https://github.com/Reza2kn/RepresentationChizzler.git
   cd RepresentationChizzler
   ```

2. Create and activate a virtual environment:

   ```bash
   python -m venv .venv
   source .venv/bin/activate  # On Windows, use: .venv\Scripts\activate
   ```

3. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

4. Set up environment variables:
   - Create a `.env` file in the project root
   - Add your Hugging Face token:

     ```
     HF_TOKEN=your_huggingface_token_here
     ```

5. MP-SENet files and weights:
   - This repo includes a minimal MP-SENet copy plus the pretrained `best_ckpt/g_best_dns` and `best_ckpt/config.json` downloaded from the official MP-SENet GitHub repository.
   - Optional alternatives (if you want to swap weights): set `MPSENET_REPO` to a Hugging Face repo that contains `g_best_dns` and `config.json` (use `MPSENET_CKPT_FILENAME` / `MPSENET_CONFIG_FILENAME` if the filenames differ).

## 🎮 Usage
1. Run the app:

   ```bash
   python app.py
   ```

2. Open your web browser and navigate to the provided URL
3. Upload an audio file and adjust the parameters:
   - VAD Threshold: Controls voice detection sensitivity (0.1-0.9)
   - Max Silence Gap: Controls merging of close speech segments (1-10s)
   - Normalize volume: Boosts quiet samples and gently attenuates loud ones
4. Compare the results:
   - Original Audio
   - VAD Processed (Speech Only)
   - Final Denoised

## 📦 Dataset Cleaning (Hugging Face Hub)

Use the "Dataset to Hub" tab to process any HF dataset that includes audio files (wav, mp3, flac).

Inputs:

- Dataset ID or URL (defaults to `kiarashQ/farsi-asr-unified-cleaned`)
- Optional config and split (use `all` to process every split)
- Optional audio column (auto-detected if left empty)
- Optional output dataset repo (defaults to `{username}/{dataset}-representation-chizzler`)
- Resume from cached shards to continue long runs without restarting
- Normalize volume to raise quiet clips and reduce overly loud clips

Requirements:

- `HF_TOKEN` must be set so the app can download private datasets and push the cleaned dataset to your account.

Output notes:

- The cleaned dataset adds `chizzler_ok` (bool) and `chizzler_error` (string) columns for per-row error tracking.
- Cached shards are stored under `chizzler_cache/` (configurable via `CHIZZLER_CACHE_DIR`).
- On ZeroGPU, enable "Cache shards on Hub" so resume works across GPU workers. ZeroGPU is preemptible; for uninterrupted runs, use a dedicated GPU Space.

## ☁️ Hugging Face Space (Zero GPU)

This repo is Space-ready. Create a Zero GPU Space and:

1. Set the Space secret `HF_TOKEN`.
2. MP-SENet files are already bundled in the repo:
   - `MP-SENet/best_ckpt/g_best_dns`
   - `MP-SENet/best_ckpt/config.json`
3. Launch uses `app.py` automatically.
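The Max Silence Gap option described above merges speech segments whose silent gap is short enough. A minimal sketch of that merging logic (a hypothetical helper for illustration, not the app's actual code; times in seconds):

```python
def merge_segments(segments, max_gap=4.0):
    """Merge (start, end) speech segments separated by <= max_gap seconds."""
    merged = []
    for start, end in sorted(segments):
        if merged and start - merged[-1][1] <= max_gap:
            # Gap is small enough: extend the previous segment.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

segments = [(0.0, 1.0), (2.5, 4.0), (10.0, 12.0)]  # (start, end) in seconds
print(merge_segments(segments, max_gap=4.0))  # [(0.0, 4.0), (10.0, 12.0)]
```

With the default 4.0 s gap, the 1.5 s pause between the first two segments is bridged, while the 6.0 s pause before the last segment keeps it separate; raising `max_gap` yields fewer segments but may include more silence.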
## 🛠️ Parameters

- **VAD Threshold** (0.1-0.9):
  - Higher values = stricter voice detection
  - Lower values = more lenient detection
  - Default: 0.5
- **Max Silence Gap** (1-10s):
  - Maximum silence duration to consider segments as continuous
  - Higher values = fewer segments, but may include more silence
  - Default: 4.0s

## 🙏 Credits

This project combines two powerful models:

- [Silero VAD](https://github.com/snakers4/silero-vad) for Voice Activity Detection
- [MP-SENet](https://github.com/yxlu-0102/MP-SENet) for Speech Enhancement

## 📝 License

This project is licensed under the terms specified in the MP-SENet repository.
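As an illustration of how the VAD Threshold parameter behaves, here is a toy sketch of turning per-frame speech probabilities into segments. This is illustrative only; Silero VAD's real API and frame handling differ, and `frames_to_segments` is a hypothetical helper:

```python
def frames_to_segments(probs, threshold=0.5, frame_s=0.032):
    """Group consecutive frames with speech probability >= threshold
    into (start, end) segments, with times in seconds."""
    segments, start = [], None
    for i, p in enumerate(probs):
        if p >= threshold and start is None:
            start = i * frame_s          # speech run begins
        elif p < threshold and start is not None:
            segments.append((start, i * frame_s))  # speech run ends
            start = None
    if start is not None:                # audio ended mid-speech
        segments.append((start, len(probs) * frame_s))
    return segments

probs = [0.1, 0.8, 0.9, 0.2, 0.7, 0.6]  # one probability per audio frame
print(frames_to_segments(probs, threshold=0.5, frame_s=1.0))   # [(1.0, 3.0), (4.0, 6.0)]
print(frames_to_segments(probs, threshold=0.85, frame_s=1.0))  # [(2.0, 3.0)] (stricter)
```

Raising the threshold from 0.5 to 0.85 drops the borderline frames, which is why higher values give stricter voice detection and lower values more lenient detection.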