metadata

title: Hindi Voice Cloning (VibeVoice)
emoji: 🎙️
colorFrom: red
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false

🇮🇳 Hindi Voice Cloning with Emotion

This Hugging Face Space provides high-quality Hindi Text-to-Speech with voice cloning and expressive emotion.

Users can upload a short reference voice sample and generate Hindi speech in the same voice, tone, and emotional style.

The system is powered by VibeVoice-7B with Hindi LoRA fine-tuning, optimized for natural prosody and long-form speech.

✨ Features

🎙️ Voice cloning from uploaded reference audio
🎭 Emotion & speaking style transfer
🗣️ Natural-sounding Hindi TTS
📄 Long-form narration support
🚀 GPU-accelerated inference
🎚️ Expression strength control (CFG scale)

🧪 How to Use

Enter Hindi text in the text box
Upload a reference voice (WAV format)
Adjust Expression Strength (CFG Scale)
Click 🚀 Generate Voice
Listen to or download the generated audio

🎧 Reference Voice Guidelines (Very Important)

For the best voice-cloning quality:

WAV format only
10–30 seconds duration recommended
Single speaker
Clear audio, minimal background noise
Natural emotion (happy, calm, sad, etc.)

⚠️ Emotion is copied from the reference voice, not from the text.
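The format and duration guidelines above can be checked locally before uploading. The following is a minimal sketch using only Python's standard `wave` module; the function name `check_reference_wav` and the two constants are hypothetical and are not part of this Space. It cannot judge speaker count, noise, or emotion, which still need a listen.

```python
import wave

# Recommended duration range from the guidelines above (seconds).
MIN_SECONDS = 10.0
MAX_SECONDS = 30.0


def check_reference_wav(path):
    """Return (duration_seconds, warnings) for a PCM WAV file.

    Hypothetical helper: it only checks that the file parses as WAV
    and that its length falls in the recommended range.
    """
    try:
        with wave.open(path, "rb") as wav:
            frames = wav.getnframes()
            rate = wav.getframerate()
    except (wave.Error, EOFError) as exc:
        # Raised when the file is not a readable PCM WAV file.
        return 0.0, [f"not a readable PCM WAV file: {exc}"]

    duration = frames / float(rate) if rate else 0.0
    warnings = []
    if duration < MIN_SECONDS:
        warnings.append(
            f"{duration:.1f} s is shorter than the recommended {MIN_SECONDS:.0f} s"
        )
    elif duration > MAX_SECONDS:
        warnings.append(
            f"{duration:.1f} s is longer than the recommended {MAX_SECONDS:.0f} s"
        )
    return duration, warnings
```

Calling `check_reference_wav("my_voice.wav")` returns the clip length and a list of warnings; an empty list means the file passed both checks.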

🎭 Expression Control (CFG Scale)

CFG Scale	Effect
0.8 – 1.0	Calm / neutral
1.2 – 1.4	Natural & expressive (recommended)
1.5 – 2.0	Strong emotion (may distort if too high)
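To give some intuition for the table above, here is a small sketch of classifier-free guidance (CFG) in its usual form. It assumes "CFG Scale" in this Space follows the standard formulation, which this README does not confirm; `apply_cfg` and the toy numbers are purely illustrative and are not taken from the app's code.

```python
def apply_cfg(unconditional, conditional, cfg_scale):
    """Blend two model predictions with classifier-free guidance.

    guided = unconditional + cfg_scale * (conditional - unconditional)
    A scale of 1.0 returns the conditional prediction unchanged;
    larger values push further in the conditioned direction.
    """
    return [u + cfg_scale * (c - u) for u, c in zip(unconditional, conditional)]


# Toy predictions standing in for real model outputs.
uncond = [0.0, 2.0]
cond = [1.0, 4.0]

print(apply_cfg(uncond, cond, 1.0))  # [1.0, 4.0]
print(apply_cfg(uncond, cond, 2.0))  # [2.0, 6.0]
```

Under this assumption, higher scales amplify whatever the conditioning contributes, which fits the table's warning that very high values may distort the output.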

⚠️ System Requirements

✅ GPU required
- Recommended: A10 / A100 / H100
❌ CPU-only Spaces will not work
⏳ The first run may be slow while the model loads
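When duplicating the Space or running it elsewhere, it can help to confirm that a GPU is actually visible before loading the model. The sketch below assumes the app runs on PyTorch with CUDA, which this README does not state explicitly; `gpu_summary` is a hypothetical helper name.

```python
def gpu_summary():
    """Return a short description of the first CUDA GPU, or None if none is usable."""
    try:
        import torch  # assumption: the Space's runtime is PyTorch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    props = torch.cuda.get_device_properties(0)
    # total_memory is reported in bytes; convert to GiB for readability.
    return f"{props.name}, {props.total_memory / 1024**3:.1f} GiB"


summary = gpu_summary()
print(summary if summary else "No CUDA GPU detected; this Space will not run on CPU-only hardware.")
```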

🔐 Privacy & Data Handling

Uploaded voice files are used only for generation
Voice files are overwritten per request
No permanent storage or reuse of user voices

🚫 Responsible Use Policy

This Space is intended for research and demonstration purposes only.

❌ Do NOT clone voices of real individuals without explicit consent
❌ Do NOT use for impersonation, fraud, or misinformation
❌ Do NOT present generated audio as real recordings

✔ Always disclose AI-generated audio when sharing publicly

🧠 Model Information

Base Model: VibeVoice-7B
Hindi Fine-Tuning: Hindi LoRA adapters
Architecture: LLM + acoustic & semantic tokenizers + diffusion head
Technique: LoRA (parameter-efficient fine-tuning)
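As a reminder of what LoRA does, the sketch below shows the general technique on a toy matrix: a frozen weight `W` is adjusted by a low-rank product `B @ A` scaled by `alpha / rank`. The rank, alpha, and target layers used for the Hindi adapters are not stated in this README, so every value here is illustrative only.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, a, b, alpha, rank):
    """Return W + (alpha / rank) * (B @ A).

    w: frozen base weight, shape (d_out, d_in)
    b: trainable matrix, shape (d_out, rank)
    a: trainable matrix, shape (rank, d_in)
    """
    scale = alpha / rank
    delta = matmul(b, a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Toy 2x2 base weight with a rank-1 adapter.
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [0.0]]
A = [[0.0, 2.0]]

print(lora_effective_weight(W, A, B, alpha=2.0, rank=1))  # [[1.0, 4.0], [0.0, 1.0]]
```

Only `A` and `B` are trained, which is why adapters of this kind stay small relative to the base model.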

📜 License

MIT License
(Same as the base VibeVoice model and adapters)

🙏 Acknowledgements

Microsoft Research – VibeVoice
VibeVoice Community
Hugging Face Open-Source Ecosystem

⚡ Note

This is a research/demo Space and is not recommended for production or real-time applications.