---
title: Voice Cloning Studio
emoji: 🎀
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: "4.44.0"
app_file: app.py
pinned: false
preload_from_hub:
- coqui/XTTS-v2
- openai/whisper-base
---

# 🎭 Voice Cloning Studio

Voice-to-voice and text-to-speech voice cloning built on Coqui XTTS-v2 and OpenAI Whisper.

## ✨ Features

- **🎀 Voice-to-Voice Cloning**: Re-speak the words of an input recording in the voice of a reference speaker
- **📝 Text-to-Speech**: Generate speech in any cloned voice
- **🌍 Multi-language Support**: 8 languages (listed under Technical Details)
- **🎡 High Quality**: 24 kHz audio output
- **⚡ Real-time Processing**: Fast voice cloning with XTTS-v2

## 🚀 How to Use

### Voice-to-Voice Cloning
1. **Upload Reference Voice** - 6+ seconds of clear speech from the person to clone
2. **Upload Input Audio** - Speech content you want to transform
3. **Select Language** - Choose target language
4. **Click "Clone Voice"** - Whisper transcribes the input audio, then XTTS-v2 speaks the transcript in the reference voice
5. **Download Result** - New audio with same content, different voice
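The steps above amount to a two-stage pipeline: Whisper turns the input audio into text, and XTTS-v2 speaks that text in the reference voice. The sketch below is an assumption about how such a pipeline can be wired, not this Space's `app.py`. The names `voice_to_voice`, `make_whisper_transcriber` and `make_xtts_synthesizer` are hypothetical, and both model backends are passed in as plain functions so the orchestration can be exercised without downloading either model. The backends use the `openai-whisper` package (`whisper.load_model`, `model.transcribe`) and Coqui's `TTS.api.TTS` class.

```python
from typing import Callable

# Hypothetical interfaces (not this Space's actual code):
#   Transcriber(audio_path, language) -> text
#   Synthesizer(text, reference_wav, language, out_path) -> out_path
Transcriber = Callable[[str, str], str]
Synthesizer = Callable[[str, str, str, str], str]


def make_whisper_transcriber(model_name: str = "base") -> Transcriber:
    """Transcriber backed by openai-whisper; imported lazily so it stays optional."""
    import whisper  # pip install openai-whisper

    model = whisper.load_model(model_name)

    def transcribe(audio_path: str, language: str) -> str:
        # Whisper uses "zh" where XTTS-v2 uses "zh-cn", so keep only the base code.
        result = model.transcribe(audio_path, language=language.split("-")[0])
        return result["text"].strip()

    return transcribe


def make_xtts_synthesizer() -> Synthesizer:
    """Synthesizer backed by Coqui TTS's XTTS-v2; also imported lazily."""
    from TTS.api import TTS  # pip install TTS

    tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")

    def synthesize(text: str, reference_wav: str, language: str, out_path: str) -> str:
        # XTTS-v2 conditions on the reference clip to reproduce that speaker's voice.
        tts.tts_to_file(text=text, speaker_wav=reference_wav,
                        language=language, file_path=out_path)
        return out_path

    return synthesize


def voice_to_voice(input_audio: str, reference_wav: str, language: str, out_path: str,
                   transcribe: Transcriber, synthesize: Synthesizer) -> str:
    """Keep the words of `input_audio`, but say them in the voice of `reference_wav`."""
    text = transcribe(input_audio, language)  # step 1: speech -> text
    if not text:
        raise ValueError("no speech detected in the input audio")
    return synthesize(text, reference_wav, language, out_path)  # step 2: text -> cloned speech
```

With both packages installed, a call would look like `voice_to_voice("input.wav", "reference.wav", "en", "cloned.wav", make_whisper_transcriber(), make_xtts_synthesizer())`. Injecting the two stages keeps model loading out of the request path, so each model can be loaded once at startup.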

### Text-to-Speech Cloning
1. **Upload Reference Voice** - Voice sample to clone
2. **Enter Text** - Type what you want the cloned voice to say
3. **Generate Speech** - Create natural speech in the cloned voice
4. **Download Result** - High-quality synthesized audio
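Text-to-speech cloning is the second stage of the pipeline on its own: the typed text goes straight to XTTS-v2 together with the reference clip. The sketch below assumes Coqui's `TTS.api.TTS` high-level API. `load_xtts` and `text_to_speech` are hypothetical helper names, and the optional `tts` argument is an assumption added so any object with a compatible `tts_to_file` method can stand in for the real model.

```python
from typing import Any, Optional

XTTS_MODEL = "tts_models/multilingual/multi-dataset/xtts_v2"


def load_xtts(use_gpu: bool = False) -> Any:
    """Load XTTS-v2 through Coqui's high-level API (imported lazily)."""
    from TTS.api import TTS  # pip install TTS

    tts = TTS(XTTS_MODEL)
    # The TTS object is a torch module, so .to() moves it and returns it.
    return tts.to("cuda") if use_gpu else tts


def text_to_speech(text: str, reference_wav: str, language: str,
                   out_path: str, tts: Optional[Any] = None) -> str:
    """Say `text` in the voice heard in `reference_wav` and write it to `out_path`."""
    text = text.strip()
    if not text:
        raise ValueError("enter some text to synthesize")
    tts = tts if tts is not None else load_xtts()
    # speaker_wav is the cloning reference; language is an XTTS code such as "en".
    tts.tts_to_file(text=text, speaker_wav=reference_wav,
                    language=language, file_path=out_path)
    return out_path
```

For example, `text_to_speech("Hola, ¿qué tal?", "reference.wav", "es", "hola.wav")` would load the model and write the cloned speech to `hola.wav`.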

## 🔧 Technical Details

- **TTS Model**: XTTS-v2 (Coqui AI) - State-of-the-art voice cloning
- **Speech Recognition**: Whisper (OpenAI) - Accurate transcription
- **Languages**: English, Spanish, French, German, Italian, Portuguese, Chinese, Japanese
- **Quality**: 24 kHz output sample rate
- **Processing**: Runs on a GPU when one is available, with automatic fallback to CPU
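Two details from this list need a little glue code in practice: the two models do not spell every language code the same way, and the device has to be chosen at startup. The sketch below is illustrative only. `LanguageCodes`, `LANGUAGES` and `pick_device` are hypothetical names, and reading "automatic fallbacks" as "use CUDA if PyTorch sees a GPU, otherwise CPU" is an assumption about what this Space does.

```python
from typing import Dict, NamedTuple


class LanguageCodes(NamedTuple):
    whisper: str  # code passed to Whisper's transcribe(language=...)
    xtts: str     # code passed to XTTS-v2's tts_to_file(language=...)


# The eight languages listed above. Most codes are shared ISO 639-1 codes,
# but XTTS-v2 spells Chinese as "zh-cn" while Whisper expects "zh".
LANGUAGES: Dict[str, LanguageCodes] = {
    "English": LanguageCodes("en", "en"),
    "Spanish": LanguageCodes("es", "es"),
    "French": LanguageCodes("fr", "fr"),
    "German": LanguageCodes("de", "de"),
    "Italian": LanguageCodes("it", "it"),
    "Portuguese": LanguageCodes("pt", "pt"),
    "Chinese": LanguageCodes("zh", "zh-cn"),
    "Japanese": LanguageCodes("ja", "ja"),
}


def pick_device() -> str:
    """Prefer a CUDA GPU; fall back to CPU, also when PyTorch is not installed."""
    try:
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"
```

Keeping both codes in one table means a UI dropdown can show the English language name while each model receives the code it expects.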

## 💡 Tips for Best Results

- **Reference Audio**: Use clear, single-speaker recordings with minimal background noise
- **Length**: 6-10 seconds of reference audio works best
- **Quality**: Higher quality input leads to better cloning results  
- **Language**: When possible, use a reference recording in the same language as the speech you want to generate
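The length tip is easy to check before uploading. The sketch below uses only Python's standard `wave` module, so it assumes the reference is an uncompressed PCM WAV file (MP3 or other formats would need a decoder such as librosa). `reference_duration`, `check_reference` and the two threshold constants are hypothetical names, with the thresholds taken from the 6-10 second guidance above.

```python
import wave

# Thresholds from the tips above: at least 6 s, and 6-10 s usually works best.
MIN_SECONDS = 6.0
MAX_SECONDS = 10.0


def reference_duration(path: str) -> float:
    """Length of a PCM WAV file in seconds: frame count divided by frame rate."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_reference(path: str) -> str:
    """Return a short verdict on whether a reference clip has a suitable length."""
    seconds = reference_duration(path)
    if seconds < MIN_SECONDS:
        return f"too short ({seconds:.1f} s): record at least {MIN_SECONDS:.0f} s of speech"
    if seconds > MAX_SECONDS:
        return (f"long ({seconds:.1f} s): {MIN_SECONDS:.0f}-{MAX_SECONDS:.0f} s "
                "is usually enough; consider trimming")
    return f"ok ({seconds:.1f} s)"
```

Duration is only one of the tips; background noise and multiple speakers cannot be detected this way and still need a listen.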

## 🛠️ Built With

- [XTTS-v2](https://huggingface.co/coqui/XTTS-v2) - Voice cloning model
- [Whisper](https://github.com/openai/whisper) - Speech recognition
- [Gradio](https://gradio.app/) - Web interface
- [HuggingFace Spaces](https://huggingface.co/spaces) - Hosting platform

---

**Note**: This space implements real voice cloning technology. Please use responsibly and respect others' voice rights and privacy.