---
title: Voice Cloning Studio
emoji: 🎤
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: "4.44.0"
app_file: app.py
pinned: false
preload_from_hub:
  - coqui/XTTS-v2
  - openai/whisper-base
---

# Voice Cloning Studio

Voice-to-voice and text-to-speech voice cloning built on XTTS-v2 and Whisper.

## ✨ Features

- **🎤 Voice-to-Voice Cloning**: Re-speak the content of an input recording in the reference speaker's voice
- **Text-to-Speech**: Generate speech from typed text in any cloned voice
- **Multi-language Support**: 8 languages, listed under Technical Details
- **🎵 High Quality**: 24 kHz audio output
- **⚡ Real-time Processing**: Fast voice cloning with XTTS-v2

## How to Use

### Voice-to-Voice Cloning

1. **Upload Reference Voice** - 6+ seconds of clear speech from the person whose voice you want to clone
2. **Upload Input Audio** - The speech whose content you want to transform
3. **Select Language** - Choose the target language
4. **Click "Clone Voice"** - The app extracts the spoken content from the input audio and re-synthesizes it in the reference voice
5. **Download Result** - New audio with the same content in a different voice
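
The steps above can be sketched as a two-stage pipeline: transcribe the input audio, then synthesize the transcript in the reference voice. This is a minimal illustration, not the code in `app.py`. The names `clone_voice` and `make_backends` are hypothetical, and `make_backends` assumes the `openai-whisper` and Coqui `TTS` packages are installed.

```python
from typing import Callable, Tuple


def clone_voice(
    reference_wav: str,
    input_wav: str,
    language: str,
    out_path: str,
    transcribe: Callable[[str], str],
    synthesize: Callable[[str, str, str, str], None],
) -> str:
    """Transcribe input_wav, then re-speak the text in the reference voice.

    Returns the transcript that was synthesized.
    """
    text = transcribe(input_wav).strip()
    if not text:
        raise ValueError("no speech recognized in the input audio")
    synthesize(text, reference_wav, language, out_path)
    return text


def make_backends() -> Tuple[Callable[[str], str], Callable[[str, str, str, str], None]]:
    """Build Whisper and XTTS-v2 backends (assumed setup, loaded lazily)."""
    import whisper            # heavy import, only needed for real runs
    from TTS.api import TTS   # Coqui TTS

    asr = whisper.load_model("base")
    tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")

    def transcribe(path: str) -> str:
        return asr.transcribe(path)["text"]

    def synthesize(text: str, speaker_wav: str, language: str, out_path: str) -> None:
        tts.tts_to_file(
            text=text, speaker_wav=speaker_wav, language=language, file_path=out_path
        )

    return transcribe, synthesize
```

With the packages installed, a call might look like `clone_voice("reference.wav", "input.wav", "en", "cloned.wav", *make_backends())`. Passing the backends in as plain callables keeps the pipeline logic easy to test without loading either model.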

### Text-to-Speech Cloning

1. **Upload Reference Voice** - Voice sample to clone
2. **Enter Text** - Type what you want the cloned voice to say
3. **Generate Speech** - Create natural speech in the cloned voice
4. **Download Result** - High-quality synthesized audio
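
The text-to-speech flow above reduces to a single synthesis call once the inputs are validated. The sketch below assumes a `tts` object exposing Coqui's `tts_to_file` method. The helper name `synthesize_speech` and the default output path are hypothetical, not taken from `app.py`.

```python
from pathlib import Path


def synthesize_speech(
    tts,
    text: str,
    reference_wav: str,
    language: str = "en",
    out_path: str = "output.wav",
) -> str:
    """Speak `text` in the voice of `reference_wav` and return the output path."""
    # Collapse runs of whitespace so pasted text with line breaks reads as one passage.
    text = " ".join(text.split())
    if not text:
        raise ValueError("text is empty")
    if not Path(reference_wav).is_file():
        raise FileNotFoundError(reference_wav)
    tts.tts_to_file(
        text=text, speaker_wav=reference_wav, language=language, file_path=out_path
    )
    return out_path


# Example with the real model (assumes the Coqui `TTS` package is installed):
#   from TTS.api import TTS
#   tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")
#   synthesize_speech(tts, "Hello there.", "reference.wav")
```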

## 🔧 Technical Details

- **TTS Model**: XTTS-v2 (Coqui AI) - Voice cloning from a short reference clip
- **Speech Recognition**: Whisper (OpenAI) - Transcribes the input audio
- **Languages**: English, Spanish, French, German, Italian, Portuguese, Chinese, Japanese
- **Quality**: 24 kHz audio output
- **Processing**: Runs on GPU when one is available and falls back to CPU automatically
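
Two of the details above lend themselves to a short sketch: mapping the listed languages to model language codes, and choosing a device with a CPU fallback. The codes below follow the identifiers XTTS-v2 uses to my knowledge (note `zh-cn` for Chinese). The helper names `language_code` and `pick_device` are hypothetical, and the fallback logic is an assumption about what "automatic fallbacks" means here.

```python
# Display name -> language code passed to the TTS model (assumed mapping).
LANGUAGE_CODES = {
    "English": "en",
    "Spanish": "es",
    "French": "fr",
    "German": "de",
    "Italian": "it",
    "Portuguese": "pt",
    "Chinese": "zh-cn",
    "Japanese": "ja",
}


def language_code(name: str) -> str:
    """Return the model code for a display name, ignoring case and padding."""
    key = name.strip().title()
    if key not in LANGUAGE_CODES:
        raise ValueError(f"unsupported language: {name!r}")
    return LANGUAGE_CODES[key]


def pick_device() -> str:
    """Prefer CUDA when PyTorch can see a GPU, otherwise fall back to CPU."""
    try:
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"
```

The result of `pick_device()` can be passed to a loaded Coqui model with `tts.to(device)`.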

## 💡 Tips for Best Results

- **Reference Audio**: Use clear, single-speaker recordings with minimal background noise
- **Length**: 6-10 seconds of reference audio works best
- **Quality**: Higher-quality input leads to better cloning results
- **Language**: When possible, use a reference recording in the same language as the target output
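
The length tip can be checked before a file is sent to the model. This is a rough sketch using only the standard library, so it handles uncompressed PCM WAV files only. The function name `check_reference` and its thresholds are illustrative and simply mirror the 6-10 second guidance above.

```python
import wave
from typing import List, Tuple


def check_reference(
    path: str, min_seconds: float = 6.0, max_seconds: float = 10.0
) -> Tuple[float, List[str]]:
    """Return (duration in seconds, warnings) for a PCM WAV reference clip."""
    with wave.open(path, "rb") as wav:
        duration = wav.getnframes() / wav.getframerate()

    warnings = []
    if duration < min_seconds:
        warnings.append(
            f"reference is short ({duration:.1f}s); aim for at least {min_seconds:.0f}s"
        )
    elif duration > max_seconds:
        warnings.append(
            f"reference is long ({duration:.1f}s); {min_seconds:.0f}-{max_seconds:.0f}s works best"
        )
    return duration, warnings
```

An empty warnings list means the clip falls inside the recommended range. Noise level and speaker count are not checked here and still need a listen.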

## 🛠️ Built With

- [XTTS-v2](https://huggingface.co/coqui/XTTS-v2) - Voice cloning model
- [Whisper](https://github.com/openai/whisper) - Speech recognition
- [Gradio](https://gradio.app/) - Web interface
- [HuggingFace Spaces](https://huggingface.co/spaces) - Hosting platform

---

**Note**: This Space implements real voice cloning technology. Please use it responsibly and respect others' voice rights and privacy.