---
title: Hindi Voice Cloning (VibeVoice)
emoji: 🎙️
colorFrom: red
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
---
🇮🇳 Hindi Voice Cloning with Emotion
This Hugging Face Space provides high-quality Hindi Text-to-Speech with voice cloning and expressive emotion.
Users can upload a short reference voice sample and generate Hindi speech in the same voice, tone, and emotional style.
The system is powered by VibeVoice-7B with Hindi LoRA fine-tuning, optimized for natural prosody and long-form speech.
✨ Features
- Voice cloning from uploaded reference audio
- 🎭 Emotion & speaking style transfer
- 🗣️ Natural-sounding Hindi TTS
- Long-form narration support
- GPU-accelerated inference
- Expression strength control (CFG scale)
🧪 How to Use
- Enter Hindi text in the text box
- Upload a reference voice (WAV format)
- Adjust Expression Strength (CFG Scale)
- Click "Generate Voice"
- Listen to or download the generated audio
Reference Voice Guidelines (Very Important)
For best quality voice cloning:
- WAV format only
- 10–30 seconds duration recommended
- Single speaker
- Clear audio, minimal background noise
- Natural emotion (happy, calm, sad, etc.)
⚠️ Emotion is copied from the reference voice, not from the text.
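
The format and duration guidelines above can be checked before uploading. The following is a minimal sketch using Python's standard `wave` module; the function name `check_reference_wav` and its defaults are illustrative assumptions, not part of this Space's code. It cannot verify the single-speaker or low-noise guidelines.

```python
import wave


def check_reference_wav(path, min_seconds=10.0, max_seconds=30.0):
    """Return (duration_in_seconds, list_of_problems) for a reference file.

    Hypothetical helper: the 10-30 s bounds mirror the guideline above.
    """
    try:
        with wave.open(path, "rb") as wav:
            frames = wav.getnframes()
            rate = wav.getframerate()
    except (wave.Error, EOFError) as exc:
        # wave.open rejects files that are not readable PCM WAV.
        return 0.0, [f"not a readable WAV file: {exc}"]

    duration = frames / rate if rate else 0.0
    problems = []
    if duration < min_seconds:
        problems.append(f"too short: {duration:.1f}s (recommended >= {min_seconds}s)")
    elif duration > max_seconds:
        problems.append(f"too long: {duration:.1f}s (recommended <= {max_seconds}s)")
    return duration, problems
```

An empty problem list only means the file is a readable WAV within the recommended length.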
🎭 Expression Control (CFG Scale)
| CFG Scale | Effect |
|---|---|
| 0.8 – 1.0 | Calm / neutral |
| 1.2 – 1.4 | Natural & expressive (recommended) |
| 1.5 – 2.0 | Strong emotion (may distort if too high) |
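
As a quick reference, the table can be expressed as a lookup. This is an illustrative sketch only; `describe_cfg_scale` is a hypothetical helper, and values that fall between the listed ranges are reported as not covered rather than guessed.

```python
def describe_cfg_scale(cfg_scale):
    """Map a CFG scale value to the effect listed in the table above."""
    bands = [
        (0.8, 1.0, "calm / neutral"),
        (1.2, 1.4, "natural & expressive (recommended)"),
        (1.5, 2.0, "strong emotion (may distort if too high)"),
    ]
    for low, high, label in bands:
        if low <= cfg_scale <= high:
            return label
    # The table does not describe values outside or between its ranges.
    return "not covered by the table"
```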
⚠️ System Requirements
- ✅ GPU required
- Recommended: A10 / A100 / H100
- ❌ CPU-only Spaces will not work
- ⏳ First run may take time due to model loading
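
If you run a copy of this Space on your own hardware, you can check for a GPU up front. The sketch below assumes the app runs on PyTorch, which this README does not state; `gpu_available` is a hypothetical helper.

```python
import importlib.util


def gpu_available():
    """Return True if PyTorch is installed and can see a CUDA device."""
    if importlib.util.find_spec("torch") is None:
        # PyTorch is not installed, so no CUDA device can be used through it.
        return False
    import torch

    return torch.cuda.is_available()


if __name__ == "__main__":
    print("GPU available:", gpu_available())
```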
Privacy & Data Handling
- Uploaded voice files are used only for generation
- Voice files are overwritten per request
- No permanent storage or reuse of user voices
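
One way to get "overwritten per request" behaviour is to copy every upload to the same fixed path. The sketch below illustrates that idea only; the path and the `store_reference` helper are assumptions, and the Space's actual implementation is not documented here.

```python
import os
import shutil
import tempfile

# Hypothetical fixed location; the Space's real path is not documented.
REFERENCE_PATH = os.path.join(tempfile.gettempdir(), "reference_voice.wav")


def store_reference(uploaded_path, dest=REFERENCE_PATH):
    """Copy the uploaded file to a fixed path, replacing any earlier upload."""
    shutil.copyfile(uploaded_path, dest)
    return dest
```

Because the destination never changes, each new upload replaces the previous one and no per-user history builds up.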
🚫 Responsible Use Policy
This Space is intended for research and demonstration purposes only.
- ❌ Do NOT clone voices of real individuals without explicit consent
- ❌ Do NOT use for impersonation, fraud, or misinformation
- ❌ Do NOT present generated audio as real recordings
- ✅ Always disclose AI-generated audio when sharing publicly
Model Information
- Base Model: VibeVoice-7B
- Hindi Fine-Tuning: Hindi LoRA adapters
- Architecture: LLM + acoustic & semantic tokenizers + diffusion head
- Technique: LoRA (parameter-efficient fine-tuning)
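
For readers unfamiliar with LoRA, the standard formulation is sketched below. The base weight matrix stays frozen and only two small low-rank matrices are trained. The rank and scaling values used by this Space's adapters are not stated in this README, so the numbers shown are illustrative only.

```latex
W' = W_0 + \frac{\alpha}{r} B A, \qquad
B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k)

\text{trainable parameters: } r(d + k) \quad \text{instead of} \quad dk

% Illustrative numbers only (not taken from this adapter):
d = k = 4096,\; r = 16: \quad
r(d + k) = 131{,}072 \quad \text{vs.} \quad dk = 16{,}777{,}216 \;(\approx 0.78\%)
```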
License
MIT License
(Same as the base VibeVoice model and adapters)
Acknowledgements
- Microsoft Research – VibeVoice
- VibeVoice Community
- Hugging Face Open-Source Ecosystem
⚡ Note
This is a research/demo Space. It is not recommended for production or real-time applications.