---
title: Hindi Voice Cloning (VibeVoice)
emoji: 🎙️
colorFrom: red
colorTo: purple
sdk: gradio
sdk_version: "4.44.0"
app_file: app.py
pinned: false
---

# 🇮🇳 Hindi Voice Cloning with Emotion

This Hugging Face Space provides **high-quality Hindi Text-to-Speech with voice cloning and expressive emotion**.

Users can upload a short reference voice sample and generate Hindi speech in the **same voice, tone, and emotional style**.

The system is powered by **VibeVoice-7B** with **Hindi LoRA fine-tuning**, optimized for natural prosody and long-form speech.

---

## ✨ Features

- ๐ŸŽ™๏ธ Voice cloning from uploaded reference audio  
- ๐ŸŽญ Emotion & speaking style transfer  
- ๐Ÿ—ฃ๏ธ Natural-sounding Hindi TTS  
- ๐Ÿ“„ Long-form narration support  
- ๐Ÿš€ GPU-accelerated inference  
- ๐ŸŽš๏ธ Expression strength control (CFG scale)

---

## 🧪 How to Use

1. Enter Hindi text in the text box  
2. Upload a **reference voice (WAV format)**  
3. Adjust **Expression Strength (CFG Scale)**  
4. Click **🚀 Generate Voice**  
5. Listen to or download the generated audio  

---

## 🎧 Reference Voice Guidelines (Very Important)

For the best voice-cloning quality:

- WAV format only  
- 10–30 seconds long (recommended)  
- Single speaker  
- Clear audio with minimal background noise  
- Natural emotion (e.g., happy, calm, sad)

> โš ๏ธ Emotion is copied from the **reference voice**, not from the text.

---

## 🎭 Expression Control (CFG Scale)

| CFG Scale | Effect |
|-----------|--------|
| 0.8 – 1.0 | Calm / neutral |
| 1.2 – 1.4 | Natural & expressive (recommended) |
| 1.5 – 2.0 | Strong emotion (may distort at the higher end) |
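CFG stands for classifier-free guidance. Assuming this Space follows the standard formulation (not verified against VibeVoice's source), the model makes a conditional and an unconditional prediction at each diffusion step and combines them as `uncond + scale * (cond - uncond)`. The sketch below (`apply_cfg` is a hypothetical name) shows that combination on plain lists: a scale of 1.0 returns the conditional prediction unchanged, values below 1.0 pull toward the unconditional one, and values above 1.0 extrapolate past it, which is one plausible reason very high scales can distort the audio.

```python
def apply_cfg(cond: list[float], uncond: list[float], scale: float) -> list[float]:
    """Classifier-free guidance, element-wise: uncond + scale * (cond - uncond).

    scale == 1.0 returns `cond` unchanged; scale < 1.0 moves toward `uncond`;
    scale > 1.0 extrapolates beyond `cond`, amplifying the conditioning signal.
    """
    if len(cond) != len(uncond):
        raise ValueError("cond and uncond must have the same length")
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]
```

For example, with `cond = [1.0, 2.0]` and `uncond = [0.0, 1.0]`, a scale of 2.0 gives `[2.0, 3.0]`, overshooting the conditional prediction by the same distance it lies from the unconditional one.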

---

## โš ๏ธ System Requirements

- ✅ GPU required  
  - Recommended: A10 / A100 / H100  
- ❌ CPU-only Spaces will not work  
- ⏳ The first run may be slow while the model loads

---

## ๐Ÿ” Privacy & Data Handling

- Uploaded voice files are used **only for generation**
- Uploaded voice files are overwritten on each new request
- No permanent storage or reuse of user voices

---

## 🚫 Responsible Use Policy

This Space is intended for **research and demonstration purposes only**.

โŒ Do NOT clone voices of real individuals without **explicit consent**  
โŒ Do NOT use for impersonation, fraud, or misinformation  
โŒ Do NOT present generated audio as real recordings  

โœ” Always disclose AI-generated audio when sharing publicly

---

## 🧠 Model Information

- **Base Model:** VibeVoice-7B  
- **Hindi Fine-Tuning:** Hindi LoRA adapters  
- **Architecture:** LLM + acoustic & semantic tokenizers + diffusion head  
- **Technique:** LoRA (parameter-efficient fine-tuning)
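LoRA keeps the base weight matrix W frozen and learns a low-rank update, so an adapted layer computes W x + (alpha / r) B A x, where A is r × k, B is d × r, and the rank r is much smaller than d and k. The plain-Python sketch below illustrates the idea only; it is not the adapter code used by this Space, and the rank and `alpha` of the Hindi adapters are not stated in this README. In standard LoRA, B is initialized to zero, so fine-tuning starts from the base model's exact behavior.

```python
Matrix = list[list[float]]


def matvec(m: Matrix, x: list[float]) -> list[float]:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def lora_forward(W: Matrix, A: Matrix, B: Matrix, x: list[float], alpha: float) -> list[float]:
    """One LoRA-adapted linear layer: W x + (alpha / r) * B (A x).

    W: frozen base weight, d x k (never updated during fine-tuning)
    A: trainable down-projection, r x k
    B: trainable up-projection, d x r
    """
    r = len(A)  # LoRA rank = number of rows in A
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + (alpha / r) * u for b, u in zip(base, update)]


def lora_param_counts(d: int, k: int, r: int) -> tuple[int, int]:
    """Trainable parameters for one d x k weight: (full fine-tuning, LoRA)."""
    return d * k, r * (d + k)
```

For an illustrative 4096 × 4096 weight (not VibeVoice's actual layer size) with r = 16, `lora_param_counts(4096, 4096, 16)` returns `(16777216, 131072)`: the adapter trains under 1% of that matrix's parameters, which is what makes LoRA parameter-efficient.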

---

## 📜 License

MIT License  
(Same as the base VibeVoice model and adapters)

---

## ๐Ÿ™ Acknowledgements

- Microsoft Research – VibeVoice  
- VibeVoice Community  
- Hugging Face Open-Source Ecosystem  

---

### ⚡ Note
This is a **research/demo Space**, not recommended for production or real-time applications.