---
title: Hindi Voice Cloning (VibeVoice)
emoji: 🎙️
colorFrom: red
colorTo: purple
sdk: gradio
sdk_version: "4.44.0"
app_file: app.py
pinned: false
---

# 🇮🇳 Hindi Voice Cloning with Emotion

This Hugging Face Space provides **high-quality Hindi Text-to-Speech with voice cloning and expressive emotion**.

Users can upload a short reference voice sample and generate Hindi speech in the **same voice, tone, and emotional style**.

The system is powered by **VibeVoice-7B** with **Hindi LoRA fine-tuning**, optimized for natural prosody and long-form speech.

---

## ✨ Features

- 🎙️ Voice cloning from uploaded reference audio
- 🎭 Emotion & speaking style transfer
- 🗣️ Natural-sounding Hindi TTS
- Long-form narration support
- GPU-accelerated inference
- 🎛️ Expression strength control (CFG scale)

---

## 🧪 How to Use

1. Enter Hindi text in the text box
2. Upload a **reference voice (WAV format)**
3. Adjust **Expression Strength (CFG Scale)**
4. Click **Generate Voice**
5. Listen to or download the generated audio

---

## 🎧 Reference Voice Guidelines (Very Important)

For best quality voice cloning:

- WAV format only
- 10–30 seconds duration recommended
- Single speaker
- Clear audio, minimal background noise
- Natural emotion (happy, calm, sad, etc.)

> ⚠️ Emotion is copied from the **reference voice**, not from the text.
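
The format and duration guidelines above can be checked locally before uploading. The following is a minimal sketch using only Python's standard `wave` module. `check_reference_wav` and the two constants are hypothetical names introduced for illustration, not part of this Space, and the check cannot judge background noise, speaker count, or emotion.

```python
import wave

MIN_SECONDS = 10.0  # lower bound of the recommended duration
MAX_SECONDS = 30.0  # upper bound of the recommended duration


def check_reference_wav(path: str) -> list[str]:
    """Return a list of warnings for a reference clip; an empty list means no issue was found."""
    try:
        with wave.open(path, "rb") as wav:
            frames = wav.getnframes()
            rate = wav.getframerate()
    except (wave.Error, EOFError) as exc:
        # Raised when the file is not an uncompressed PCM WAV the stdlib can parse.
        return [f"not a readable PCM WAV file: {exc}"]

    warnings = []
    duration = frames / rate if rate else 0.0
    if not MIN_SECONDS <= duration <= MAX_SECONDS:
        warnings.append(f"duration {duration:.1f} s is outside the recommended 10-30 s")
    return warnings
```

Calling `check_reference_wav("my_voice.wav")` (a placeholder file name) returns an empty list when the clip is a readable WAV of recommended length, and one or more human-readable warnings otherwise.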

---

## 🎭 Expression Control (CFG Scale)

| CFG Scale | Effect |
|-----------|--------|
| 0.8–1.0 | Calm / neutral |
| 1.2–1.4 | Natural & expressive (recommended) |
| 1.5–2.0 | Strong emotion (may distort if too high) |

---

## ⚠️ System Requirements

- ✅ GPU required
- Recommended: A10 / A100 / H100
- ❌ CPU-only Spaces will not work
- ⏳ First run may take time due to model loading
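
When running a copy of this Space on other hardware, the GPU requirement can be checked up front. This is a minimal sketch that assumes the app runs on PyTorch with a CUDA device; `cuda_gpu_available` is a hypothetical helper name and is not taken from `app.py`.

```python
def cuda_gpu_available() -> bool:
    """Return True if PyTorch is installed and reports a usable CUDA device."""
    try:
        import torch  # assumption: the app's inference stack is PyTorch
    except ImportError:
        return False
    return torch.cuda.is_available()


if __name__ == "__main__":
    if cuda_gpu_available():
        print("CUDA GPU detected")
    else:
        print("No CUDA GPU detected; inference is not expected to work on this hardware")
```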

---

## Privacy & Data Handling

- Uploaded voice files are used **only for generation**
- Voice files are overwritten per request
- No permanent storage or reuse of user voices

---

## 🚫 Responsible Use Policy

This Space is intended for **research and demonstration purposes only**.

- ❌ Do NOT clone voices of real individuals without **explicit consent**
- ❌ Do NOT use for impersonation, fraud, or misinformation
- ❌ Do NOT present generated audio as real recordings

✔ Always disclose AI-generated audio when sharing publicly

---

## 🧠 Model Information

- **Base Model:** VibeVoice-7B
- **Hindi Fine-Tuning:** Hindi LoRA adapters
- **Architecture:** LLM + acoustic & semantic tokenizers + diffusion head
- **Technique:** LoRA (parameter-efficient fine-tuning)
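
To give a sense of why LoRA counts as parameter-efficient: instead of updating a full `d_out x d_in` weight matrix, it trains two low-rank factors of shapes `d_out x r` and `r x d_in`. The sketch below compares the two parameter counts. The layer size and rank are hypothetical values chosen for illustration and are not the actual configuration of the Hindi adapters.

```python
def full_params(d_out: int, d_in: int) -> int:
    """Number of parameters in a dense d_out x d_in weight matrix."""
    return d_out * d_in


def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Number of parameters in the low-rank factors B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


# Hypothetical sizes for illustration only.
d_out, d_in, rank = 4096, 4096, 8
lora = lora_trainable_params(d_out, d_in, rank)
full = full_params(d_out, d_in)
print(f"LoRA: {lora:,} trainable vs {full:,} in the frozen matrix ({100 * lora / full:.2f}%)")
```

With these example sizes the adapter trains 65,536 parameters against 16,777,216 in the frozen matrix, roughly 0.39%.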

---

## License

MIT License
(Same as the base VibeVoice model and adapters)

---

## Acknowledgements

- Microsoft Research – VibeVoice
- VibeVoice Community
- Hugging Face Open-Source Ecosystem

---

### ⚡ Note
This is a **research/demo Space**, not recommended for production or real-time applications.