---
title: Voice Cloning Studio
emoji: 🎤
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: "4.44.0"
app_file: app.py
pinned: false
preload_from_hub:
- coqui/XTTS-v2
- openai/whisper-base
---
# Voice Cloning Studio
Voice-to-voice and text-to-speech cloning using XTTS-v2 (Coqui) and Whisper (OpenAI).
## ✨ Features
- **🎤 Voice-to-Voice Cloning**: Transform input audio using reference voice characteristics
- **Text-to-Speech**: Generate speech in any cloned voice
- **Multi-language Support**: 8+ languages supported
- **🎵 High Quality**: 24 kHz audio output
- **⚡ Real-time Processing**: Fast voice cloning with XTTS-v2
## How to Use
### Voice-to-Voice Cloning
1. **Upload Reference Voice** - 6+ seconds of clear speech from the person to clone
2. **Upload Input Audio** - Speech content you want to transform
3. **Select Language** - Choose target language
4. **Click "Clone Voice"** - Whisper transcribes the input audio and XTTS-v2 speaks that text in the reference voice
5. **Download Result** - New audio with same content, different voice
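
The voice-to-voice steps above can be sketched as a two-stage pipeline: transcribe the input audio, then synthesize the transcript in the reference voice. This is a minimal sketch, not the Space's actual `app.py`. The names `clone_voice`, `load_whisper_transcriber`, `Transcriber`, and `Synthesizer` are hypothetical, and the Whisper loader assumes the `openai-whisper` package is installed.

```python
from typing import Callable

# Hypothetical type aliases for this sketch; they are not taken from app.py.
Transcriber = Callable[[str], str]            # audio path -> transcript
Synthesizer = Callable[[str, str, str], str]  # (text, reference path, language) -> output path


def load_whisper_transcriber(model_size: str = "base") -> Transcriber:
    """Build a transcriber backed by Whisper (imported lazily)."""
    import whisper  # assumption: installed via `pip install openai-whisper`

    model = whisper.load_model(model_size)

    def transcribe(audio_path: str) -> str:
        # model.transcribe returns a dict; the full transcript is under "text".
        return model.transcribe(audio_path)["text"].strip()

    return transcribe


def clone_voice(
    input_audio: str,
    reference_audio: str,
    language: str,
    transcribe: Transcriber,
    synthesize: Synthesizer,
) -> str:
    """Extract the spoken content, then re-speak it in the reference voice."""
    text = transcribe(input_audio)
    if not text:
        raise ValueError("No speech was recognized in the input audio.")
    return synthesize(text, reference_audio, language)
```

Passing the transcriber and synthesizer in as callables keeps the orchestration independent of how the models are loaded, so each stage can be swapped or stubbed out.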
### Text-to-Speech Cloning
1. **Upload Reference Voice** - Voice sample to clone
2. **Enter Text** - Type what you want the cloned voice to say
3. **Generate Speech** - Create natural speech in the cloned voice
4. **Download Result** - High-quality synthesized audio
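
As a rough illustration of the text-to-speech steps above, the sketch below calls XTTS-v2 through the Coqui `TTS` Python package. It assumes that package is installed and that the model is addressed as `tts_models/multilingual/multi-dataset/xtts_v2`. The function name `synthesize_speech` and its parameters are hypothetical and may not match what this Space does internally.

```python
def synthesize_speech(
    text: str,
    reference_audio: str,
    language: str = "en",
    output_path: str = "output.wav",
    tts=None,
) -> str:
    """Speak `text` in the voice from `reference_audio` and return the output path."""
    text = text.strip()
    if not text:
        raise ValueError("Enter some text to synthesize.")

    if tts is None:
        # Assumption: the Coqui TTS package is installed. Loading the model is
        # slow, so a real app would load it once and pass it in via `tts`.
        from TTS.api import TTS

        tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")

    tts.tts_to_file(
        text=text,
        speaker_wav=reference_audio,  # voice sample to clone
        language=language,            # XTTS language code, e.g. "en"
        file_path=output_path,
    )
    return output_path
```

A call such as `synthesize_speech("Hello there.", "reference.wav", language="en")` would write `output.wav` next to the script, assuming the model downloads and loads successfully.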
## 🔧 Technical Details
- **TTS Model**: XTTS-v2 (Coqui AI) - State-of-the-art voice cloning
- **Speech Recognition**: Whisper (OpenAI) - Accurate transcription
- **Languages**: English, Spanish, French, German, Italian, Portuguese, Chinese, Japanese
- **Quality**: 24 kHz audio output
- **Processing**: CPU/GPU optimized with automatic fallbacks
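
Two of the details above lend themselves to a short sketch: mapping the listed language names to the codes XTTS-v2 expects, and falling back to CPU when no GPU is available. The helper names `language_code` and `pick_device` are hypothetical, and the fallback shown is one plausible approach using PyTorch, not necessarily the one this Space uses.

```python
# Language names from the list above, mapped to XTTS-v2 language codes.
XTTS_LANGUAGE_CODES = {
    "English": "en",
    "Spanish": "es",
    "French": "fr",
    "German": "de",
    "Italian": "it",
    "Portuguese": "pt",
    "Chinese": "zh-cn",
    "Japanese": "ja",
}


def language_code(name: str) -> str:
    """Return the XTTS-v2 code for a language name, ignoring case and padding."""
    key = name.strip().title()
    if key not in XTTS_LANGUAGE_CODES:
        raise ValueError(f"Unsupported language: {name!r}")
    return XTTS_LANGUAGE_CODES[key]


def pick_device() -> str:
    """Prefer a CUDA GPU when PyTorch can see one; otherwise use the CPU."""
    try:
        import torch  # assumption: PyTorch is installed alongside the models
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"
```

The returned device string can then be passed to the model, for example with `.to(pick_device())` on a PyTorch-backed model object.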
## 💡 Tips for Best Results
- **Reference Audio**: Use clear, single-speaker recordings with minimal background noise
- **Length**: 6-10 seconds of reference audio works best
- **Quality**: Higher quality input leads to better cloning results
- **Language**: Match reference voice language when possible for optimal results
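
The length tip above can be checked before uploading. The sketch below uses only Python's standard `wave` module, so it handles uncompressed PCM WAV files only. The 6 and 10 second thresholds come from the tips above, while the function names and status strings are hypothetical.

```python
import wave

MIN_SECONDS = 6.0               # shortest reference clip suggested above
MAX_RECOMMENDED_SECONDS = 10.0  # upper end of the suggested range


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def check_reference_length(path: str) -> str:
    """Classify a reference clip as 'too short', 'ok', or 'longer than needed'."""
    duration = wav_duration_seconds(path)
    if duration < MIN_SECONDS:
        return "too short"
    if duration > MAX_RECOMMENDED_SECONDS:
        return "longer than needed"
    return "ok"
```

A clip flagged as "longer than needed" is not necessarily unusable; trimming it to its clearest 6-10 seconds is one option.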
## 🛠️ Built With
- [XTTS-v2](https://huggingface.co/coqui/XTTS-v2) - Voice cloning model
- [Whisper](https://github.com/openai/whisper) - Speech recognition
- [Gradio](https://gradio.app/) - Web interface
- [HuggingFace Spaces](https://huggingface.co/spaces) - Hosting platform
---
**Note**: This Space implements real voice cloning technology. Please use it responsibly and respect others' voice rights and privacy.