metadata
title: Hindi Voice Cloning (VibeVoice)
emoji: 🎙️
colorFrom: red
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false

🇮🇳 Hindi Voice Cloning with Emotion

This Hugging Face Space provides high-quality Hindi Text-to-Speech with voice cloning and expressive emotion.

Users can upload a short reference voice sample and generate Hindi speech in the same voice, tone, and emotional style.

The system is powered by VibeVoice-7B with Hindi LoRA fine-tuning, optimized for natural prosody and long-form speech.


✨ Features

  • 🎙️ Voice cloning from uploaded reference audio
  • 🎭 Emotion & speaking style transfer
  • 🗣️ Natural-sounding Hindi TTS
  • 📄 Long-form narration support
  • 🚀 GPU-accelerated inference
  • 🎚️ Expression strength control (CFG scale)

🧪 How to Use

  1. Enter Hindi text in the text box
  2. Upload a reference voice (WAV format)
  3. Adjust Expression Strength (CFG Scale)
  4. Click 🚀 Generate Voice
  5. Listen to or download the generated audio

🎧 Reference Voice Guidelines (Very Important)

For best quality voice cloning:

  • WAV format only
  • 10–30 seconds duration recommended
  • Single speaker
  • Clear audio, minimal background noise
  • Natural emotion (happy, calm, sad, etc.)

⚠️ Emotion is copied from the reference voice, not from the text.
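
The format and duration guidelines above can be checked before uploading. The following is a minimal sketch using only Python's standard `wave` module; the helper name `check_reference_wav` is hypothetical and not part of this Space. It cannot judge speaker count, noise, or emotion, which still need a listen.

```python
import wave

# Limits taken from the guidelines above (10-30 seconds recommended).
MIN_SECONDS = 10.0
MAX_SECONDS = 30.0


def check_reference_wav(path):
    """Return a list of warnings for a reference clip; empty means no issue found.

    Hypothetical helper for illustration, not part of the Space's code.
    """
    try:
        with wave.open(path, "rb") as wav:
            # Duration = number of audio frames / frames per second.
            duration = wav.getnframes() / wav.getframerate()
    except (wave.Error, EOFError):
        # The wave module only reads uncompressed PCM WAV files.
        return ["not a readable PCM WAV file"]

    warnings = []
    if not MIN_SECONDS <= duration <= MAX_SECONDS:
        warnings.append(f"duration is {duration:.1f} s; 10-30 s is recommended")
    return warnings
```

Calling `check_reference_wav("my_voice.wav")` on a suitable clip returns an empty list; any returned strings describe what to fix before uploading.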


🎭 Expression Control (CFG Scale)

| CFG Scale | Effect |
|-----------|--------|
| 0.8 – 1.0 | Calm / neutral |
| 1.2 – 1.4 | Natural & expressive (recommended) |
| 1.5 – 2.0 | Strong emotion (may distort if too high) |
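
As a quick reference, the documented ranges can be expressed as a lookup. This is only an illustrative sketch: `describe_cfg_scale` is a hypothetical helper, and values that fall between the documented ranges (for example 1.1) are reported as undocumented rather than guessed.

```python
# Ranges copied from the CFG Scale guidance above.
CFG_RANGES = [
    (0.8, 1.0, "Calm / neutral"),
    (1.2, 1.4, "Natural & expressive (recommended)"),
    (1.5, 2.0, "Strong emotion (may distort if too high)"),
]


def describe_cfg_scale(value):
    """Return the documented effect for a CFG scale value (hypothetical helper)."""
    for low, high, effect in CFG_RANGES:
        if low <= value <= high:
            return effect
    # The README does not say what happens outside or between these ranges.
    return "Not covered by the documented ranges"


print(describe_cfg_scale(1.3))
```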

⚠️ System Requirements

  • ✅ GPU required
    • Recommended: A10 / A100 / H100
  • ❌ CPU-only Spaces will not work
  • ⏳ First run may take time due to model loading

🔐 Privacy & Data Handling

  • Uploaded voice files are used only for generation
  • Voice files are overwritten per request
  • No permanent storage or reuse of user voices
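
To illustrate what "overwritten per request" can mean in practice, here is a minimal sketch of one possible approach: every upload is copied to the same fixed path, so the previous request's file is replaced. The path `reference_voice.wav` and the function `store_reference` are assumptions for illustration; the Space's actual implementation is not documented here.

```python
import shutil
from pathlib import Path

# Hypothetical location; the Space's real path is not documented.
REFERENCE_PATH = Path("reference_voice.wav")


def store_reference(uploaded_path, target=REFERENCE_PATH):
    """Copy an upload to one fixed path, replacing any earlier file there."""
    # copyfile overwrites the destination if it already exists.
    shutil.copyfile(uploaded_path, target)
    return target
```

Because only one target path is ever written, at most one reference clip exists on disk at a time.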

🚫 Responsible Use Policy

This Space is intended for research and demonstration purposes only.

❌ Do NOT clone voices of real individuals without explicit consent
❌ Do NOT use for impersonation, fraud, or misinformation
❌ Do NOT present generated audio as real recordings

✔ Always disclose AI-generated audio when sharing publicly


🧠 Model Information

  • Base Model: VibeVoice-7B
  • Hindi Fine-Tuning: Hindi LoRA adapters
  • Architecture: LLM + acoustic & semantic tokenizers + diffusion head
  • Technique: LoRA (parameter-efficient fine-tuning)
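
For readers unfamiliar with LoRA, the general technique keeps each pretrained weight matrix frozen and learns a low-rank update to it. The formula below is the standard LoRA formulation, not a detail specific to these Hindi adapters; their rank $r$ and scaling factor $\alpha$ are not stated in this README.

```latex
W' = W_0 + \frac{\alpha}{r}\, B A,
\qquad B \in \mathbb{R}^{d \times r},\quad
A \in \mathbb{R}^{r \times k},\quad
r \ll \min(d, k)
```

Only $A$ and $B$ are trained, so the adapter stores $r(d + k)$ parameters per adapted matrix instead of $d \cdot k$.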

📜 License

MIT License
(Same as the base VibeVoice model and adapters)


🙏 Acknowledgements

  • Microsoft Research – VibeVoice
  • VibeVoice Community
  • Hugging Face Open-Source Ecosystem

⚡ Note

This is a research/demo Space, not recommended for production or real-time applications.