---
title: Hindi Voice Cloning (VibeVoice)
emoji: 🎙️
colorFrom: red
colorTo: purple
sdk: gradio
sdk_version: "4.44.0"
app_file: app.py
pinned: false
---
# 🇮🇳 Hindi Voice Cloning with Emotion
This Hugging Face Space provides **high-quality Hindi Text-to-Speech with voice cloning and expressive emotion**.
Users can upload a short reference voice sample and generate Hindi speech in the **same voice, tone, and emotional style**.
The system is powered by **VibeVoice-7B** with **Hindi LoRA fine-tuning**, optimized for natural prosody and long-form speech.

---
## ✨ Features
- 🎙️ Voice cloning from uploaded reference audio
- 🎭 Emotion & speaking style transfer
- 🗣️ Natural-sounding Hindi TTS
- 📖 Long-form narration support
- 🚀 GPU-accelerated inference
- 🎛️ Expression strength control (CFG scale)
---
## 🧪 How to Use
1. Enter Hindi text in the text box
2. Upload a **reference voice (WAV format)**
3. Adjust **Expression Strength (CFG Scale)**
4. Click **Generate Voice**
5. Listen to or download the generated audio
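
The same steps can in principle be driven from Python with the `gradio_client` package. The sketch below is only an illustration: the Space ID, the `api_name`, and the argument order (text, reference audio, CFG scale) are assumptions, since this README does not document the Space's API. Check the Space's "Use via API" panel for the real endpoint before relying on it.

```python
def generate_hindi_speech(space_id, text, reference_wav, cfg_scale=1.3,
                          api_name="/predict"):
    """Send text and a reference WAV to a Space and return its result.

    ASSUMPTIONS: `api_name` and the positional argument order are
    hypothetical; they are not taken from this Space's app.py.
    """
    if not text.strip():
        raise ValueError("text must not be empty")
    if not reference_wav.lower().endswith(".wav"):
        raise ValueError("reference voice must be a WAV file")

    # Imported lazily so the input checks above work without the dependency.
    from gradio_client import Client, handle_file

    client = Client(space_id)
    return client.predict(
        text,
        handle_file(reference_wav),  # uploads the local file to the Space
        cfg_scale,
        api_name=api_name,
    )
```

A call would look like `generate_hindi_speech("your-username/your-space", "नमस्ते", "reference.wav")`, where the Space ID is a placeholder.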
---
## 🎧 Reference Voice Guidelines (Very Important)
For best-quality voice cloning:
- WAV format only
- 10–30 seconds duration recommended
- Single speaker
- Clear audio, minimal background noise
- Natural emotion (happy, calm, sad, etc.)

> ⚠️ Emotion is copied from the **reference voice**, not from the text.
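
The format and duration guidelines above can be checked locally before uploading. This is a minimal sketch using Python's standard `wave` module; the function name and the default limits (taken from the 10–30 second recommendation) are illustrative and not part of the Space. It cannot verify speaker count, noise level, or emotion.

```python
import wave


def check_reference_wav(path, min_seconds=10.0, max_seconds=30.0):
    """Return a list of problems found; an empty list means the file looks fine."""
    try:
        with wave.open(path, "rb") as wav:
            frames = wav.getnframes()
            rate = wav.getframerate()
    except (wave.Error, EOFError) as exc:
        # Raised when the file is not an uncompressed PCM WAV the module can parse.
        return [f"not a readable WAV file: {exc}"]

    problems = []
    duration = frames / rate if rate else 0.0
    if duration < min_seconds:
        problems.append(f"too short: {duration:.1f}s (recommended >= {min_seconds:.0f}s)")
    elif duration > max_seconds:
        problems.append(f"too long: {duration:.1f}s (recommended <= {max_seconds:.0f}s)")
    return problems
```

An empty result only means the container and length look reasonable; audio quality still has to be judged by ear.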
---
## 🎭 Expression Control (CFG Scale)
| CFG Scale | Effect |
|---------|------|
| 0.8 – 1.0 | Calm / neutral |
| 1.2 – 1.4 | Natural & expressive (recommended) |
| 1.5 – 2.0 | Strong emotion (may distort if too high) |
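
For intuition about why higher values give stronger but riskier output, here is a toy sketch of the usual classifier-free guidance (CFG) combination, in which the model's conditional prediction is pushed away from its unconditional one. It assumes the standard formulation; how this Space applies the scale internally is not shown in this README, and the plain lists stand in for real model outputs.

```python
def apply_cfg(unconditional, conditional, cfg_scale):
    """Blend two predictions with classifier-free guidance.

    cfg_scale == 1.0 returns the conditional prediction unchanged;
    values above 1.0 extrapolate past it, values below 1.0 pull the
    result back toward the unconditional prediction.
    """
    return [u + cfg_scale * (c - u) for u, c in zip(unconditional, conditional)]
```

Because values above 1.0 extrapolate beyond what the model actually predicted, very large scales can overshoot, which is consistent with the distortion warning in the table.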
---
## ⚠️ System Requirements
- ✅ GPU required
- Recommended: A10 / A100 / H100
- ❌ CPU-only Spaces will not work
- ⏳ The first run may be slow while the model loads
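
When duplicating the Space or running it elsewhere, a quick check for a CUDA device can save a long model download on unsuitable hardware. The helper below is a sketch that assumes PyTorch is the runtime (the README does not list dependencies); the function name is illustrative.

```python
def has_cuda_gpu():
    """Return True if PyTorch is installed and can see a CUDA device."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing to run the model on.
        return False
    return bool(torch.cuda.is_available())


if __name__ == "__main__":
    print("CUDA GPU available:", has_cuda_gpu())
```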
---
## 🔒 Privacy & Data Handling
- Uploaded voice files are used **only for generation**
- Voice files are overwritten per request
- No permanent storage or reuse of user voices
---
## 🚫 Responsible Use Policy
This Space is intended for **research and demonstration purposes only**.
- ❌ Do NOT clone voices of real individuals without **explicit consent**
- ❌ Do NOT use for impersonation, fraud, or misinformation
- ❌ Do NOT present generated audio as real recordings
- ✔ Always disclose AI-generated audio when sharing publicly
---
## 🧠 Model Information
- **Base Model:** VibeVoice-7B
- **Hindi Fine-Tuning:** Hindi LoRA adapters
- **Architecture:** LLM + acoustic & semantic tokenizers + diffusion head
- **Technique:** LoRA (parameter-efficient fine-tuning)
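
To make "parameter-efficient" concrete: LoRA freezes a weight matrix of shape `d_out × d_in` and trains two small factors of shapes `d_out × r` and `r × d_in` instead. The arithmetic below uses made-up dimensions for illustration; they are not VibeVoice's actual layer sizes or the rank used for the Hindi adapters.

```python
def full_params(d_out, d_in):
    """Number of weights in a dense d_out x d_in matrix."""
    return d_out * d_in


def lora_trainable_params(d_out, d_in, rank):
    """Number of trainable weights in the two LoRA factors (B: d_out x r, A: r x d_in)."""
    return rank * (d_out + d_in)


# Illustrative sizes only (hypothetical layer and rank).
d_out, d_in, rank = 4096, 4096, 8
ratio = lora_trainable_params(d_out, d_in, rank) / full_params(d_out, d_in)
print(f"trainable fraction for this layer: {ratio:.4%}")
```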
---
## 📜 License
MIT License (same as the base VibeVoice model and adapters)

---
## 🙏 Acknowledgements
- Microsoft Research – VibeVoice
- VibeVoice Community
- Hugging Face Open-Source Ecosystem
---
### ⚡ Note
This is a **research/demo Space** and is not recommended for production or real-time applications.