---
title: Voice Cloning Studio
emoji: 🎤
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: "4.44.0"
app_file: app.py
pinned: false
preload_from_hub:
- coqui/XTTS-v2
- openai/whisper-base
---

# 🎭 Voice Cloning Studio

Real voice-to-voice and text-to-speech cloning using XTTS-v2 and Whisper AI.

## ✨ Features

- **🎤 Voice-to-Voice Cloning**: Transform input audio using reference voice characteristics
- **📝 Text-to-Speech**: Generate speech in any cloned voice
- **🌍 Multi-language Support**: 8+ languages supported
- **🎵 High Quality**: Professional 24kHz audio output
- **⚡ Real-time Processing**: Fast voice cloning with XTTS-v2

## 🚀 How to Use

### Voice-to-Voice Cloning
1. **Upload Reference Voice** - 6+ seconds of clear speech from the person to clone
2. **Upload Input Audio** - Speech content you want to transform
3. **Select Language** - Choose target language
4. **Click "Clone Voice"** - Whisper transcribes the input audio and XTTS-v2 re-speaks that transcript in the reference voice
5. **Download Result** - New audio with same content, different voice
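
The steps above amount to a two-stage pipeline: transcribe the input audio, then synthesize the transcript in the reference voice. Below is a minimal sketch of that flow, not the code in `app.py`. The names `voice_to_voice` and `load_default_backends` are hypothetical, and the sketch assumes the `openai-whisper` and Coqui `TTS` packages; the Space itself may load its models differently.

```python
from typing import Callable, Tuple

# Hypothetical aliases for the two stages of the pipeline.
Transcriber = Callable[[str], str]                   # audio path -> transcript
Synthesizer = Callable[[str, str, str, str], None]   # text, reference wav, language, output path


def voice_to_voice(reference_wav: str, input_wav: str, language: str,
                   out_path: str, transcribe: Transcriber,
                   synthesize: Synthesizer) -> str:
    """Re-speak the content of input_wav in the voice of reference_wav."""
    text = transcribe(input_wav).strip()
    if not text:
        raise ValueError("No speech was recognized in the input audio")
    synthesize(text, reference_wav, language, out_path)
    return out_path


def load_default_backends() -> Tuple[Transcriber, Synthesizer]:
    """Build both stages from Whisper and XTTS-v2 (assumed packages)."""
    import whisper             # assumption: pip install openai-whisper
    from TTS.api import TTS    # assumption: pip install TTS

    asr = whisper.load_model("base")
    tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")

    def transcribe(path: str) -> str:
        return asr.transcribe(path)["text"]

    def synthesize(text: str, speaker_wav: str, language: str, file_path: str) -> None:
        tts.tts_to_file(text=text, speaker_wav=speaker_wav,
                        language=language, file_path=file_path)

    return transcribe, synthesize
```

With both packages installed, usage would look like `transcribe, synthesize = load_default_backends()` followed by `voice_to_voice("reference.wav", "input.wav", "en", "cloned.wav", transcribe, synthesize)`. Passing the stages in as callables keeps the pipeline logic testable without loading either model.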

### Text-to-Speech Cloning
1. **Upload Reference Voice** - Voice sample to clone
2. **Enter Text** - Type what you want the cloned voice to say
3. **Generate Speech** - Create natural speech in the cloned voice
4. **Download Result** - High-quality synthesized audio
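
For the text-to-speech path, the typed text and the reference clip go straight to the synthesizer with no transcription step. A minimal sketch, assuming the Coqui `TTS` package and its `tts_to_file` method; the helper name `speak_as` and its defaults are hypothetical and may not match `app.py`.

```python
def speak_as(tts, text: str, reference_wav: str, language: str = "en",
             out_path: str = "speech.wav") -> str:
    """Synthesize text in the voice of reference_wav and return the output path.

    `tts` is any object exposing Coqui's
    tts_to_file(text=..., speaker_wav=..., language=..., file_path=...),
    for example TTS("tts_models/multilingual/multi-dataset/xtts_v2").
    """
    text = text.strip()
    if not text:
        raise ValueError("Enter some text to synthesize")
    tts.tts_to_file(text=text, speaker_wav=reference_wav,
                    language=language, file_path=out_path)
    return out_path
```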

## 🔧 Technical Details

- **TTS Model**: XTTS-v2 (Coqui AI) - State-of-the-art voice cloning
- **Speech Recognition**: Whisper (OpenAI) - Accurate transcription
- **Languages**: English, Spanish, French, German, Italian, Portuguese, Chinese, Japanese
- **Quality**: 24kHz professional audio generation
- **Processing**: CPU/GPU optimized with automatic fallbacks

## 💡 Tips for Best Results

- **Reference Audio**: Use clear, single-speaker recordings with minimal background noise
- **Length**: 6-10 seconds of reference audio works best
- **Quality**: Higher quality input leads to better cloning results
- **Language**: Where possible, pick the same target language as the one spoken in the reference clip

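The length tip can be checked before uploading. This is a minimal sketch using only the Python standard library; it handles uncompressed WAV files only, and the function name `check_reference_wav` and its warning messages are hypothetical, not part of the Space.

```python
import wave


def check_reference_wav(path: str, min_seconds: float = 6.0,
                        max_seconds: float = 10.0) -> list:
    """Return a list of warnings about the length of a WAV reference clip."""
    with wave.open(path, "rb") as wav:
        duration = wav.getnframes() / wav.getframerate()

    warnings = []
    if duration < min_seconds:
        warnings.append(
            f"Clip is {duration:.1f}s long; at least {min_seconds:.0f}s is recommended"
        )
    elif duration > max_seconds:
        warnings.append(
            f"Clip is {duration:.1f}s long; {min_seconds:.0f}-{max_seconds:.0f}s works best"
        )
    return warnings
```

An empty list means the clip falls inside the recommended 6-10 second range. Background noise and multiple speakers are not detected by this check.
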
## 🛠️ Built With

- [XTTS-v2](https://huggingface.co/coqui/XTTS-v2) - Voice cloning model
- [Whisper](https://github.com/openai/whisper) - Speech recognition
- [Gradio](https://gradio.app/) - Web interface
- [HuggingFace Spaces](https://huggingface.co/spaces) - Hosting platform

---

**Note**: This Space implements real voice cloning technology. Please use it responsibly and respect others' voice rights and privacy.