Update README.md
README.md
CHANGED
@@ -1,12 +1,13 @@
 ---
-title: Voice Conversion
 emoji: 🎤
 colorFrom: indigo
 colorTo: purple
 sdk: gradio
-sdk_version:
 app_file: app.py
 pinned: false
 ---

 # Amphion's Vevo - Voice Conversion & TTS
@@ -19,27 +20,44 @@ This is a Gradio web interface for the Vevo voice conversion model from the Amph

 ## Usage

-1. Select
-
--
--
-
 3. For TTS mode:
 - Enter the text you want to convert to speech
-- Optionally provide reference text
-
 5. Click "Generate" to create the converted audio

 ## Models

-The application
 - Content Tokenizer (vq32)
 - Content-Style Tokenizer (vq8192)
 - Autoregressive Transformer
 - Flow Matching Transformer
-- Vocoder
-
-## Technical Requirements
-
-- Python 3.8+
-- CUDA-capable GPU recommended for faster inference
@@ -1,12 +1,13 @@
 ---
+title: Amphion Vevo Voice Conversion
 emoji: 🎤
 colorFrom: indigo
 colorTo: purple
 sdk: gradio
+sdk_version: 4.8.0
 app_file: app.py
 pinned: false
+python_version: "3.10"
 ---

 # Amphion's Vevo - Voice Conversion & TTS
@@ -19,27 +20,44 @@ This is a Gradio web interface for the Vevo voice conversion model from the Amph

 ## Usage

+1. Select mode:
+- **Voice**: Convert voice with both style and timbre transfer
+- **Timbre**: Convert only the timbre of the voice
+- **TTS**: Generate speech from text with voice cloning
+
+2. Upload audio files based on mode:
+- Source Audio: Your input audio (for voice and timbre modes)
+- Reference Style: Style reference (for voice and TTS modes)
+- Reference Timbre: Voice reference (required for all modes)
+
 3. For TTS mode:
 - Enter the text you want to convert to speech
+- Optionally provide reference text
+- Select source and reference languages
+
+4. Adjust Flow Matching Steps (1-64, default: 32)
+- Higher values give better quality but take longer
+- Lower values are faster but may reduce quality
+
 5. Click "Generate" to create the converted audio

+## Sample Files
+
+Sample audio files are available in the `Amphion/models/vc/vevo/wav/` directory:
+- arabic_male.wav
+- source.wav
+
+## Technical Requirements
+
+- Python 3.10+
+- CUDA-capable GPU recommended for faster inference
+- Minimum 12GB storage space for models
+
 ## Models

+The application automatically downloads required models from Hugging Face:
 - Content Tokenizer (vq32)
 - Content-Style Tokenizer (vq8192)
 - Autoregressive Transformer
 - Flow Matching Transformer
+- Vocoder
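
The per-mode input rules and the Flow Matching Steps range added to the Usage section can be summarized in a few lines of Python. This is only an illustrative sketch: the names `REQUIRED_INPUTS`, `missing_inputs`, and `clamp_steps`, along with the input keys, are hypothetical and are not taken from `app.py`.

```python
from typing import Dict, List, Optional

# Inputs each mode needs, following the README's Usage section.
# Key names are hypothetical labels, not identifiers from app.py.
REQUIRED_INPUTS: Dict[str, List[str]] = {
    "voice": ["source_audio", "reference_style", "reference_timbre"],
    "timbre": ["source_audio", "reference_timbre"],
    "tts": ["text", "reference_style", "reference_timbre"],
}

# Documented range for Flow Matching Steps: 1-64, default 32.
MIN_STEPS, MAX_STEPS, DEFAULT_STEPS = 1, 64, 32


def missing_inputs(mode: str, provided: Dict[str, Optional[str]]) -> List[str]:
    """Return the required inputs that are absent or empty for this mode."""
    if mode not in REQUIRED_INPUTS:
        raise ValueError(f"unknown mode: {mode!r}")
    return [name for name in REQUIRED_INPUTS[mode] if not provided.get(name)]


def clamp_steps(steps: Optional[int] = None) -> int:
    """Fall back to the default, otherwise clamp to the documented range."""
    if steps is None:
        return DEFAULT_STEPS
    return max(MIN_STEPS, min(MAX_STEPS, int(steps)))
```

For example, `missing_inputs("timbre", {"source_audio": "source.wav"})` reports that the timbre reference is still needed before "Generate" can succeed.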
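
The updated Technical Requirements (Python 3.10+ and at least 12GB of storage for models) could be checked before launching the app. The sketch below uses only the standard library; the helper names are hypothetical, and reading "12GB" as 12 GiB is an assumption.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)
# Assumption: "12GB" from the README is interpreted as 12 GiB.
MIN_FREE_BYTES = 12 * 1024**3


def python_ok(version=None) -> bool:
    """True if the interpreter (or a supplied version tuple) is 3.10 or newer."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) >= MIN_PYTHON


def storage_ok(path: str = ".", free_bytes=None) -> bool:
    """True if the filesystem holding `path` has room for the model downloads."""
    if free_bytes is None:
        free_bytes = shutil.disk_usage(path).free
    return free_bytes >= MIN_FREE_BYTES


if __name__ == "__main__":
    print("Python version OK:", python_ok())
    print("Free storage OK:", storage_ok())
```

A GPU check is left out here because it depends on which deep learning framework the app actually imports.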