---
title: Representation Chizzler
sdk: gradio
sdk_version: 5.16.1
app_file: app.py
python_version: "3.10"
packages:
  - ffmpeg
hf_oauth: true
---
# 🎧 Representation Chizzler™

A powerful two-stage audio processing tool that combines Voice Activity Detection (VAD) and Speech Enhancement to clean and denoise audio files.
## 🌟 Features

1. **Two-Stage Processing Pipeline**:
   - Stage 1: Uses Silero VAD to detect and extract speech segments
   - Stage 2: Applies the MP-SENet deep learning model to remove noise
2. **Memory-Efficient Processing**:
   - Processes audio in chunks to prevent memory issues
   - Automatically converts audio to the required format (16 kHz mono WAV)
3. **User-Friendly Interface**:
   - Beautiful Gradio web interface
   - Real-time progress reporting
   - Compare original, VAD-processed, and denoised versions
4. **Dataset Cleaning to Hub**:
   - Load any HF audio dataset (wav/mp3/flac)
   - Process every audio file with Representation Chizzler
   - Upload the cleaned dataset under a Representation Chizzler suffix
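The memory-efficient chunking described above can be sketched roughly like this; the chunk size and function name are illustrative, not the app's actual API:

```python
import numpy as np

def iter_chunks(samples: np.ndarray, chunk_seconds: float = 30.0,
                sample_rate: int = 16000):
    """Yield fixed-size views of a mono signal so the whole file
    never has to be held in a processing buffer at once."""
    chunk_len = int(chunk_seconds * sample_rate)
    for start in range(0, len(samples), chunk_len):
        yield samples[start:start + chunk_len]

# Example: a 65-second clip at 16 kHz splits into 30 s + 30 s + 5 s.
audio = np.zeros(65 * 16000, dtype=np.float32)
chunks = list(iter_chunks(audio))
```

Each stage then processes one chunk at a time, keeping peak memory roughly constant regardless of input length.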
## 🚀 Installation

1. Clone this repository:

   ```bash
   git clone https://github.com/Reza2kn/RepresentationChizzler.git
   cd RepresentationChizzler
   ```

2. Create and activate a virtual environment:

   ```bash
   python -m venv .venv
   source .venv/bin/activate  # On Windows, use: .venv\Scripts\activate
   ```

3. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```
4. Set up environment variables:
   - Create a `.env` file in the project root
   - Add your Hugging Face token:

     ```
     HF_TOKEN=your_huggingface_token_here
     ```
5. MP-SENet files and weights:
   - This repo includes a minimal MP-SENet copy plus the pretrained `best_ckpt/g_best_dns` and `best_ckpt/config.json`, downloaded from the official MP-SENet GitHub repository.
   - Optional alternatives (if you want to swap weights): set `MPSENET_REPO` to a Hugging Face repo that contains `g_best_dns` and `config.json` (use `MPSENET_CKPT_FILENAME` / `MPSENET_CONFIG_FILENAME` if the filenames differ).
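The weight-selection logic implied by these environment variables might look roughly like the sketch below. The defaults mirror the bundled filenames above, but the helper itself is illustrative, not the app's actual code:

```python
import os

def resolve_mpsenet_files():
    """Pick the MP-SENet checkpoint/config names from the environment,
    falling back to the filenames bundled with this repo."""
    repo = os.environ.get("MPSENET_REPO")  # None -> use the local best_ckpt copy
    ckpt = os.environ.get("MPSENET_CKPT_FILENAME", "g_best_dns")
    config = os.environ.get("MPSENET_CONFIG_FILENAME", "config.json")
    return repo, ckpt, config
```

When `MPSENET_REPO` is set, the app would download `ckpt` and `config` from that Hub repo instead of using the local copy.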
## 🎮 Usage

1. Run the app:

   ```bash
   python app.py
   ```

2. Open your web browser and navigate to the provided URL
3. Upload an audio file and adjust the parameters:
   - VAD Threshold: Controls voice detection sensitivity (0.1-0.9)
   - Max Silence Gap: Controls merging of close speech segments (1-10s)
   - Normalize volume: Boosts quiet samples and gently attenuates loud ones
4. Compare the results:
   - Original Audio
   - VAD Processed (Speech Only)
   - Final Denoised
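The "Normalize volume" option can be approximated by simple peak normalization, as in the sketch below; the app's actual normalization strategy may differ:

```python
import numpy as np

def normalize_peak(samples: np.ndarray, target_peak: float = 0.9) -> np.ndarray:
    """Scale a float signal so its loudest sample reaches target_peak:
    quiet clips are boosted, overly loud clips are attenuated."""
    peak = np.max(np.abs(samples))
    if peak == 0:
        return samples  # silence stays silent
    return samples * (target_peak / peak)
```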
## 📦 Dataset Cleaning (Hugging Face Hub)

Use the "Dataset to Hub" tab to process any HF dataset that includes audio files (wav, mp3, flac).

Inputs:

- Dataset ID or URL (defaults to `kiarashQ/farsi-asr-unified-cleaned`)
- Optional config and split (use `all` to process every split)
- Optional audio column (auto-detected if left empty)
- Optional output dataset repo (defaults to `{username}/{dataset}-representation-chizzler`)
- Resume from cached shards to continue long runs without restarting
- Normalize volume to raise quiet clips and reduce overly loud clips

Requirements:

- `HF_TOKEN` must be set so the app can download private datasets and push the cleaned dataset to your account.

Output notes:

- The cleaned dataset adds `chizzler_ok` (bool) and `chizzler_error` (string) columns for per-row error tracking.
- Cached shards are stored under `chizzler_cache/` (configurable via `CHIZZLER_CACHE_DIR`).
- On ZeroGPU, enable "Cache shards on Hub" so resume works across GPU workers. ZeroGPU is preemptible; for uninterrupted runs, use a dedicated GPU Space.
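The per-row error tracking could be implemented along these lines. This is a hypothetical sketch: `clean_audio` stands in for the real processing function, and the row layout is simplified:

```python
def process_row(row: dict, clean_audio) -> dict:
    """Run the cleaning function on one dataset row, recording
    success or failure in the chizzler_ok / chizzler_error columns."""
    try:
        row["audio"] = clean_audio(row["audio"])
        row["chizzler_ok"] = True
        row["chizzler_error"] = ""
    except Exception as exc:  # keep the run going when one row fails
        row["chizzler_ok"] = False
        row["chizzler_error"] = str(exc)
    return row
```

Catching per-row failures this way lets a long dataset run finish and still report exactly which files were skipped.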
## ☁️ Hugging Face Space (Zero GPU)

This repo is Space-ready. Create a Zero GPU Space and:

1. Set the Space secret `HF_TOKEN`.
2. MP-SENet files are already bundled in the repo:
   - `MP-SENet/best_ckpt/g_best_dns`
   - `MP-SENet/best_ckpt/config.json`
3. The Space launches `app.py` automatically.
## 🛠️ Parameters

- **VAD Threshold** (0.1-0.9):
  - Higher values = stricter voice detection
  - Lower values = more lenient detection
  - Default: 0.5
- **Max Silence Gap** (1-10s):
  - Maximum silence duration for two segments to be treated as continuous
  - Higher values = fewer segments, but may include more silence
  - Default: 4.0s
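How Max Silence Gap merges nearby speech segments can be sketched as follows; the `(start, end)` segment format and function name are illustrative:

```python
def merge_segments(segments, max_gap: float = 4.0):
    """Merge (start, end) speech segments (in seconds) whose silence
    gap is at most max_gap, mirroring the Max Silence Gap parameter."""
    merged = []
    for start, end in sorted(segments):
        if merged and start - merged[-1][1] <= max_gap:
            merged[-1][1] = max(merged[-1][1], end)  # extend previous segment
        else:
            merged.append([start, end])
    return [tuple(seg) for seg in merged]
```

With the default gap of 4.0 s, segments at 0-2 s and 3-5 s merge into one 0-5 s segment, while a segment starting at 12 s stays separate.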
## 🙏 Credits

This project combines two powerful models:

- [Silero VAD](https://github.com/snakers4/silero-vad) for Voice Activity Detection
- [MP-SENet](https://github.com/yxlu-0102/MP-SENet) for Speech Enhancement

## 📝 License

This project is licensed under the terms specified in the MP-SENet repository.