---
title: Hindi Voice Cloning (VibeVoice)
emoji: 🎙️
colorFrom: red
colorTo: purple
sdk: gradio
sdk_version: "4.44.0"
app_file: app.py
pinned: false
---

# 🇮🇳 Hindi Voice Cloning with Emotion

This Hugging Face Space provides **high-quality Hindi text-to-speech (TTS) with voice cloning and expressive emotion**.

Users can upload a short reference voice sample and generate Hindi speech in the **same voice, tone, and emotional style**.

The system is powered by **VibeVoice-7B** with **Hindi LoRA fine-tuning**, optimized for natural prosody and long-form speech.

---

## ✨ Features

- 🎙️ Voice cloning from uploaded reference audio  
- 🎭 Emotion & speaking style transfer  
- 🗣️ Natural-sounding Hindi TTS  
- 📄 Long-form narration support  
- 🚀 GPU-accelerated inference  
- 🎚️ Expression strength control (CFG scale)

---

## 🧪 How to Use

1. Enter Hindi text in the text box  
2. Upload a **reference voice (WAV format)**  
3. Adjust **Expression Strength (CFG Scale)**  
4. Click **🚀 Generate Voice**  
5. Listen to or download the generated audio  

---

## 🎧 Reference Voice Guidelines (Very Important)

For the best voice-cloning quality, the reference audio should meet these guidelines:

- WAV format only  
- 10–30 seconds duration recommended  
- Single speaker  
- Clear audio, minimal background noise  
- Natural emotion (happy, calm, sad, etc.)

> ⚠️ Emotion is copied from the **reference voice**, not from the text.
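
The format and duration guidelines above can be checked before uploading. The following is a minimal sketch using only Python's standard `wave` module; `check_reference_wav` and its thresholds are hypothetical names chosen for illustration and are not part of this Space's `app.py`. It cannot judge speaker count, noise, or emotion, which still need a listen.

```python
import wave

MIN_SECONDS = 10.0  # lower bound of the recommended duration
MAX_SECONDS = 30.0  # upper bound of the recommended duration


def check_reference_wav(path):
    """Return (ok, message) for a candidate reference clip.

    Hypothetical helper: it only checks what the file header reveals
    (readable PCM WAV, duration), not audio quality.
    """
    try:
        with wave.open(path, "rb") as wav:
            frames = wav.getnframes()
            rate = wav.getframerate()
    except (wave.Error, EOFError):
        return False, "not a readable PCM WAV file"

    duration = frames / rate if rate else 0.0
    if duration < MIN_SECONDS:
        return False, f"too short: {duration:.1f} s"
    if duration > MAX_SECONDS:
        return False, f"too long: {duration:.1f} s"
    return True, f"ok: {duration:.1f} s"
```

Because 10–30 seconds is a recommendation rather than a hard limit, a clip flagged as too short or too long may still work, possibly with lower cloning quality.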

---

## 🎭 Expression Control (CFG Scale)

| CFG Scale | Effect |
|---------|------|
| 0.8 – 1.0 | Calm / neutral |
| 1.2 – 1.4 | Natural & expressive (recommended) |
| 1.5 – 2.0 | Strong emotion (may distort if too high) |
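
As a rough illustration, the table can be expressed as a lookup. This is a sketch only: `CFG_BANDS` and `describe_cfg` are hypothetical names, and the values between the listed ranges (for example 1.1) are treated as unlisted because the table does not describe them.

```python
# Bands copied from the table above as (low, high, effect).
CFG_BANDS = [
    (0.8, 1.0, "Calm / neutral"),
    (1.2, 1.4, "Natural & expressive (recommended)"),
    (1.5, 2.0, "Strong emotion (may distort if too high)"),
]


def describe_cfg(scale):
    """Return the table's effect for a CFG value, or None if unlisted."""
    for low, high, effect in CFG_BANDS:
        if low <= scale <= high:
            return effect
    return None
```

For example, `describe_cfg(1.3)` falls in the recommended band, while `describe_cfg(1.1)` returns `None` because it sits between two listed ranges.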

---

## ⚠️ System Requirements

- ✅ GPU required  
  - Recommended: A10 / A100 / H100  
- ❌ CPU-only Spaces will not work  
- ⏳ The first run may take noticeably longer while the model loads

---

## 🔐 Privacy & Data Handling

- Uploaded voice files are used **only for generation**
- Voice files are overwritten per request
- No permanent storage or reuse of user voices
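
One way a per-request overwrite could work is to copy each upload to a single fixed path, so a new upload replaces the previous one. The sketch below is an assumption about the general pattern, not this Space's actual code; `REFERENCE_PATH` and `store_reference` are hypothetical names.

```python
import shutil
from pathlib import Path

# Hypothetical fixed location; the real app may use a different path.
REFERENCE_PATH = Path("reference_voice.wav")


def store_reference(uploaded_path, target=REFERENCE_PATH):
    """Copy the upload to a fixed path, replacing any earlier upload."""
    shutil.copyfile(uploaded_path, target)
    return target
```

With this pattern only the most recent reference file exists on disk at any time. Note that a single shared path is not safe for concurrent requests, so a real deployment would need per-request isolation as well.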

---

## 🚫 Responsible Use Policy

This Space is intended for **research and demonstration purposes only**.

❌ Do NOT clone voices of real individuals without **explicit consent**  
❌ Do NOT use for impersonation, fraud, or misinformation  
❌ Do NOT present generated audio as real recordings  

✔ Always disclose AI-generated audio when sharing publicly

---

## 🧠 Model Information

- **Base Model:** VibeVoice-7B  
- **Hindi Fine-Tuning:** Hindi LoRA adapters  
- **Architecture:** LLM + acoustic & semantic tokenizers + diffusion head  
- **Technique:** LoRA (parameter-efficient fine-tuning)

---

## 📜 License

MIT License  
(Same as the base VibeVoice model and adapters)

---

## 🙏 Acknowledgements

- Microsoft Research – VibeVoice  
- VibeVoice Community  
- Hugging Face Open-Source Ecosystem  

---

### ⚡ Note
This is a **research/demo Space**. It is not recommended for production or real-time applications.