	---
	title: Hindi Voice Cloning (VibeVoice)
	emoji: 🎙️
	colorFrom: red
	colorTo: purple
	sdk: gradio
	sdk_version: "4.44.0"
	app_file: app.py
	pinned: false
	---

	# 🇮🇳 Hindi Voice Cloning with Emotion

	This Hugging Face Space provides high-quality Hindi Text-to-Speech with voice cloning and expressive emotion.

	Users can upload a short reference voice sample and generate Hindi speech in the same voice, tone, and emotional style.

	The system is powered by VibeVoice-7B with Hindi LoRA fine-tuning, optimized for natural prosody and long-form speech.

	---

	## ✨ Features

	- 🎙️ Voice cloning from uploaded reference audio
	- 🎭 Emotion & speaking style transfer
	- 🗣️ Natural-sounding Hindi TTS
	- 📄 Long-form narration support
	- 🚀 GPU-accelerated inference
	- 🎚️ Expression strength control (CFG scale)

	---

	## 🧪 How to Use

	1. Enter Hindi text in the text box
	2. Upload a reference voice (WAV format)
	3. Adjust Expression Strength (CFG Scale)
	4. Click 🚀 Generate Voice
	5. Listen to or download the generated audio
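
The same steps can in principle be driven from Python with the `gradio_client` package. The sketch below is only an illustration: the function name `generate_hindi_speech`, the default endpoint name `/predict`, and the input order (text, reference audio, CFG scale) are assumptions, not taken from this Space's `app.py`.

```python
def generate_hindi_speech(space_id, text, reference_wav, cfg_scale=1.3,
                          api_name="/predict"):
    """Send text, a reference WAV, and a CFG scale to a Gradio Space.

    Returns whatever the Space's endpoint returns (typically a path to the
    generated audio file).
    """
    if not text.strip():
        raise ValueError("Enter some Hindi text before generating.")

    # Imported lazily so the input check above works even when
    # gradio_client is not installed.
    from gradio_client import Client, handle_file

    client = Client(space_id)
    # ASSUMPTION: the endpoint name and the positional input order are
    # hypothetical; check the Space's actual API before relying on them.
    return client.predict(
        text,
        handle_file(reference_wav),
        cfg_scale,
        api_name=api_name,
    )
```

Calling `Client(space_id).view_api()` lists the endpoints and parameters a running Space actually exposes, which is the reliable way to replace the assumed values above.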

	---

	## 🎧 Reference Voice Guidelines (Very Important)

	For best quality voice cloning:

	- WAV format only
	- 10–30 seconds duration recommended
	- Single speaker
	- Clear audio, minimal background noise
	- Natural emotion (happy, calm, sad, etc.)

	> ⚠️ Emotion is copied from the reference voice, not from the text.
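
A reference file can be checked against the format and duration guidelines before uploading. This is a minimal sketch using Python's standard `wave` module, which reads PCM WAV files only; the helper name `check_reference_wav` and the choice to return warnings rather than raise are illustrative assumptions. Speaker count, noise level, and emotion cannot be checked this way.

```python
import wave

MIN_SECONDS = 10.0  # lower end of the recommended duration
MAX_SECONDS = 30.0  # upper end of the recommended duration


def check_reference_wav(path):
    """Return a list of warning strings; an empty list means no issues found."""
    try:
        with wave.open(path, "rb") as wav:
            frames = wav.getnframes()
            rate = wav.getframerate()
    except (wave.Error, EOFError) as exc:
        # Not a WAV file, or a WAV encoding the wave module cannot read.
        return [f"not a readable PCM WAV file: {exc}"]

    warnings = []
    duration = frames / rate if rate else 0.0
    if not MIN_SECONDS <= duration <= MAX_SECONDS:
        warnings.append(
            f"duration is {duration:.1f}s; "
            f"{MIN_SECONDS:.0f}-{MAX_SECONDS:.0f}s is recommended"
        )
    return warnings
```

The duration range is a recommendation rather than a hard limit, so the helper reports it as a warning instead of rejecting the file.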

	---

	## 🎭 Expression Control (CFG Scale)

	| CFG Scale | Effect |
	|-----------|--------|
	| 0.8 – 1.0 | Calm / neutral |
	| 1.2 – 1.4 | Natural & expressive (recommended) |
	| 1.5 – 2.0 | Strong emotion (may distort if too high) |
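
The table can be expressed as a small lookup, for example to label a slider value in a UI. This is a sketch only: the function name `describe_cfg_scale` is hypothetical, and because the table leaves 1.0–1.2 and 1.4–1.5 unspecified, folding those gaps into the middle band is an assumption.

```python
def describe_cfg_scale(cfg_scale: float) -> str:
    """Map a CFG scale value to the effect listed in the table."""
    if not 0.8 <= cfg_scale <= 2.0:
        raise ValueError("CFG scale should be between 0.8 and 2.0")
    if cfg_scale <= 1.0:
        return "calm / neutral"
    # ASSUMPTION: values in the unlisted gaps (1.0-1.2 and 1.4-1.5)
    # are treated as part of the middle band.
    if cfg_scale < 1.5:
        return "natural & expressive"
    return "strong emotion (may distort)"
```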

	---

	## ⚠️ System Requirements

	- ✅ GPU required
	- Recommended: A10 / A100 / H100
	- ❌ CPU-only Spaces will not work
	- ⏳ First run may take time due to model loading
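
When running the app outside a GPU Space, a startup check can fail early with a clear message instead of stalling during model loading. The sketch below assumes the app runs on PyTorch with CUDA; the helper names `gpu_available` and `require_gpu` are illustrative and not part of this Space's code.

```python
import importlib.util


def gpu_available() -> bool:
    """Return True if PyTorch is installed and reports a CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch  # imported only after confirming it is installed

    return torch.cuda.is_available()


def require_gpu() -> None:
    """Raise a descriptive error when no CUDA GPU is visible."""
    if not gpu_available():
        raise RuntimeError(
            "No CUDA GPU detected; this app requires GPU hardware."
        )
```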

	---

	## 🔐 Privacy & Data Handling

	- Uploaded voice files are used only for generation
	- Voice files are overwritten per request
	- No permanent storage or reuse of user voices

	---

	## 🚫 Responsible Use Policy

	This Space is intended for research and demonstration purposes only.

	- ❌ Do NOT clone voices of real individuals without explicit consent
	- ❌ Do NOT use for impersonation, fraud, or misinformation
	- ❌ Do NOT present generated audio as real recordings

	✔ Always disclose AI-generated audio when sharing publicly

	---

	## 🧠 Model Information

	- Base Model: VibeVoice-7B
	- Hindi Fine-Tuning: Hindi LoRA adapters
	- Architecture: LLM + acoustic & semantic tokenizers + diffusion head
	- Technique: LoRA (parameter-efficient fine-tuning)

	---

	## 📜 License

	MIT License
	(Same as the base VibeVoice model and adapters)

	---

	## 🙏 Acknowledgements

	- Microsoft Research – VibeVoice
	- VibeVoice Community
	- Hugging Face Open-Source Ecosystem

	---

	### ⚡ Note
	This is a research/demo Space and is not recommended for production or real-time applications.