---
title: Hindi Voice Cloning (VibeVoice)
emoji: 🎙️
colorFrom: red
colorTo: purple
sdk: gradio
sdk_version: "4.44.0"
app_file: app.py
pinned: false
---
# 🇮🇳 Hindi Voice Cloning with Emotion

This Hugging Face Space provides **high-quality Hindi Text-to-Speech with voice cloning and expressive emotion**.
Users can upload a short reference voice sample and generate Hindi speech in the **same voice, tone, and emotional style**.
The system is powered by **VibeVoice-7B** with **Hindi LoRA fine-tuning**, optimized for natural prosody and long-form speech.

---
## ✨ Features

- 🎙️ Voice cloning from an uploaded reference audio clip
- 🎭 Emotion and speaking-style transfer
- 🗣️ Natural-sounding Hindi TTS
- 📄 Long-form narration support
- 🚀 GPU-accelerated inference
- 🎚️ Expression strength control (CFG scale)
---
## 🧪 How to Use

1. Enter Hindi text in the text box.
2. Upload a **reference voice (WAV format)**.
3. Adjust **Expression Strength (CFG Scale)**.
4. Click **🚀 Generate Voice**.
5. Listen to or download the generated audio.
---
## 🎧 Reference Voice Guidelines (Very Important)

For the best voice-cloning quality:

- WAV format only
- Duration of 10–30 seconds (recommended)
- A single speaker
- Clear audio with minimal background noise
- Natural emotion (happy, calm, sad, etc.)

> ⚠️ Emotion is copied from the **reference voice**, not from the text.
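
The format and duration guidelines can be checked locally before uploading. This is a minimal sketch using only Python's standard-library `wave` module; the function name `check_reference_wav` and its messages are hypothetical and not part of this Space. It cannot judge single-speaker content, background noise, or emotion.

```python
import wave

# Recommended duration range from the guidelines above (seconds).
MIN_SECONDS = 10.0
MAX_SECONDS = 30.0


def check_reference_wav(path):
    """Hypothetical pre-upload check for a reference voice file.

    Returns a list of warning strings. An empty list means the file opened
    as a WAV and its duration falls in the recommended range.
    """
    try:
        with wave.open(path, "rb") as wav:
            frames = wav.getnframes()
            rate = wav.getframerate()
    except (wave.Error, EOFError) as exc:
        # wave raises wave.Error for non-RIFF/WAVE or unsupported data,
        # and EOFError for empty or truncated files.
        return [f"not a readable WAV file: {exc}"]

    duration = frames / rate if rate else 0.0
    limits = f"recommended {MIN_SECONDS:.0f}-{MAX_SECONDS:.0f} s"
    if duration < MIN_SECONDS:
        return [f"too short: {duration:.1f} s ({limits})"]
    if duration > MAX_SECONDS:
        return [f"too long: {duration:.1f} s ({limits})"]
    return []
```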
---
## 🎭 Expression Control (CFG Scale)

| CFG Scale | Effect |
|-----------|--------|
| 0.8–1.0 | Calm / neutral |
| 1.2–1.4 | Natural and expressive (recommended) |
| 1.5–2.0 | Strong emotion (may distort if set too high) |
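
For intuition about the CFG scale: in standard classifier-free guidance, the model produces a conditional prediction (using the text and reference voice) and an unconditional one, and the scale `s` extrapolates from the unconditional prediction toward the conditional one, `guided = uncond + s * (cond - uncond)`. This is a minimal sketch of that generic formula, assuming the Space follows the standard formulation; `apply_cfg` is a hypothetical name and the vectors are illustrative numbers, not model outputs. At `s = 1.0` the result equals the conditional prediction, and larger values push further in the conditioned direction, which is consistent with the table's note that very high values may distort.

```python
def apply_cfg(cond, uncond, scale):
    """Generic classifier-free guidance on two equal-length prediction vectors.

    guided = uncond + scale * (cond - uncond)
    scale == 1.0 reproduces `cond`; scale > 1.0 extrapolates beyond it.
    """
    if len(cond) != len(uncond):
        raise ValueError("cond and uncond must have the same length")
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


# Illustrative numbers only (not real model outputs).
cond = [0.50, -0.20]
uncond = [0.30, 0.10]
for s in (1.0, 1.3, 2.0):
    print(s, [round(x, 3) for x in apply_cfg(cond, uncond, s)])
```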
---
## ⚠️ System Requirements

- ✅ GPU required
- Recommended: A10 / A100 / H100
- ❌ CPU-only Spaces will not work
- ⏳ First run may take time due to model loading
---
## 🔐 Privacy & Data Handling

- Uploaded voice files are used **only for generation**
- Voice files are overwritten on each request
- No permanent storage or reuse of user voices
---
## 🚫 Responsible Use Policy

This Space is intended for **research and demonstration purposes only**.

- ❌ Do NOT clone the voices of real individuals without their **explicit consent**
- ❌ Do NOT use it for impersonation, fraud, or misinformation
- ❌ Do NOT present generated audio as real recordings
- ✔ Always disclose AI-generated audio when sharing it publicly
---
## 🧠 Model Information

- **Base Model:** VibeVoice-7B
- **Hindi Fine-Tuning:** Hindi LoRA adapters
- **Architecture:** LLM + acoustic & semantic tokenizers + diffusion head
- **Technique:** LoRA (parameter-efficient fine-tuning)
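
To illustrate why LoRA counts as parameter-efficient: instead of updating a full weight matrix `W` of shape d × k, LoRA freezes `W` and trains two small matrices `B` (d × r) and `A` (r × k) with rank r much smaller than d and k, so the effective weight becomes `W + (alpha / r) * B A`. The plain-Python sketch below computes the trainable-parameter counts and applies the low-rank update to a vector. The 4096 × 4096 size, rank 16, and `alpha` are illustrative assumptions, not the actual configuration of the Hindi adapters.

```python
def lora_trainable_params(d, k, r):
    """Trainable parameters: full fine-tuning of a d x k matrix vs. a rank-r LoRA adapter."""
    full = d * k
    lora = r * (d + k)  # B is d x r, A is r x k
    return full, lora


def matvec(m, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def lora_forward(w, a, b, x, alpha, r):
    """y = W x + (alpha / r) * B (A x); W stays frozen, only A and B are trained."""
    base = matvec(w, x)
    delta = matvec(b, matvec(a, x))
    scale = alpha / r
    return [y0 + scale * dy for y0, dy in zip(base, delta)]


# Illustrative sizes (assumed): one 4096 x 4096 projection with rank-16 adapters.
full, lora = lora_trainable_params(4096, 4096, 16)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.3%}")
```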
---
## 📜 License

MIT License (same as the base VibeVoice model and adapters)

---
## 🙏 Acknowledgements

- Microsoft Research (VibeVoice)
- VibeVoice Community
- Hugging Face Open-Source Ecosystem
---
### ⚡ Note

This is a **research/demo Space**, not recommended for production or real-time applications.