Space: naonauno/dialogs2-factory

Commit 1b37547 (verified), committed by naonauno on Jan 15, 2025
Parent: 4ae4a65

Update README.md

Files changed (1): README.md +34 -16

@@ -1,12 +1,13 @@
 ---
-title: Voice Conversion
+title: Amphion Vevo Voice Conversion
 emoji: 🎤
 colorFrom: indigo
 colorTo: purple
 sdk: gradio
-sdk_version: 5.12.0
+sdk_version: 4.8.0
 app_file: app.py
 pinned: false
+python_version: "3.10"
 ---
 # Amphion's Vevo - Voice Conversion & TTS
@@ -19,27 +20,44 @@ This is a Gradio web interface for the Vevo voice conversion model from the Amph
 ## Usage
-1. Select the mode you want to use (voice, timbre, or TTS)
-2. Upload the required audio files:
-   - Source audio (for voice and timbre modes)
-   - Reference style audio (for voice and TTS modes)
-   - Reference timbre audio (for all modes)
+1. Select mode:
+   - **Voice**: Convert voice with both style and timbre transfer
+   - **Timbre**: Convert only the timbre of the voice
+   - **TTS**: Generate speech from text with voice cloning
+2. Upload audio files based on mode:
+   - Source Audio: Your input audio (for voice and timbre modes)
+   - Reference Style: Style reference (for voice and TTS modes)
+   - Reference Timbre: Voice reference (required for all modes)
 3. For TTS mode:
    - Enter the text you want to convert to speech
-   - Optionally provide reference text and select languages
-4. Adjust the Flow Matching Steps if needed (default: 32)
+   - Optionally provide reference text
+   - Select source and reference languages
+4. Adjust Flow Matching Steps (1-64, default: 32)
+   - Higher values give better quality but take longer
+   - Lower values are faster but may reduce quality
 5. Click "Generate" to create the converted audio
+## Sample Files
+Sample audio files are available in the `Amphion/models/vc/vevo/wav/` directory:
+- arabic_male.wav
+- source.wav
+## Technical Requirements
+- Python 3.10+
+- CUDA-capable GPU recommended for faster inference
+- Minimum 12GB storage space for models
 ## Models
-The application uses the following models from Hugging Face:
+The application automatically downloads required models from Hugging Face:
 - Content Tokenizer (vq32)
 - Content-Style Tokenizer (vq8192)
 - Autoregressive Transformer
 - Flow Matching Transformer
-- Vocoder
-## Technical Requirements
-- Python 3.8+
-- CUDA-capable GPU recommended for faster inference
+- Vocoder
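The updated Usage section implies a per-mode rule for which inputs a Generate click needs: Voice needs source, style, and timbre audio; Timbre needs source and timbre audio; TTS needs text plus style and timbre audio; and Flow Matching Steps must stay within 1-64. A minimal stdlib-only sketch of that rule follows. The `Request` class, its field names, and `validate` are hypothetical illustrations and are not taken from the Space's `app.py`.

```python
from dataclasses import dataclass
from typing import Optional

# Required inputs per mode, as listed in the updated README's Usage section.
# Field names are hypothetical; the real app may name its inputs differently.
REQUIRED_INPUTS = {
    "voice": {"source_audio", "reference_style", "reference_timbre"},
    "timbre": {"source_audio", "reference_timbre"},
    "tts": {"text", "reference_style", "reference_timbre"},
}

# Slider bounds and default from step 4 of the Usage section.
MIN_STEPS, MAX_STEPS, DEFAULT_STEPS = 1, 64, 32


@dataclass
class Request:
    """Hypothetical container for the inputs of one Generate click."""
    mode: str
    source_audio: Optional[str] = None      # path to the input audio
    reference_style: Optional[str] = None   # path to the style reference
    reference_timbre: Optional[str] = None  # path to the timbre (voice) reference
    text: Optional[str] = None              # text to synthesize (TTS only)
    flow_matching_steps: int = DEFAULT_STEPS


def validate(req: Request) -> list[str]:
    """Return a list of problems; an empty list means the request is complete."""
    mode = req.mode.lower()
    if mode not in REQUIRED_INPUTS:
        return [f"unknown mode: {req.mode!r}"]
    problems = []
    # Sorted so the messages come out in a stable order.
    for field in sorted(REQUIRED_INPUTS[mode]):
        value = getattr(req, field)
        # Treat None and whitespace-only strings as missing.
        if value is None or (isinstance(value, str) and not value.strip()):
            problems.append(f"missing {field} for {mode} mode")
    if not MIN_STEPS <= req.flow_matching_steps <= MAX_STEPS:
        problems.append(
            f"flow_matching_steps must be in [{MIN_STEPS}, {MAX_STEPS}], "
            f"got {req.flow_matching_steps}"
        )
    return problems


if __name__ == "__main__":
    # Timbre mode without a timbre reference is rejected.
    print(validate(Request(mode="timbre", source_audio="source.wav")))
    # → ['missing reference_timbre for timbre mode']
```

Keeping the per-mode requirements in a single table means the check, the error messages, and any UI logic that shows or hides inputs can all read from one place.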
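The quality/speed trade-off in step 4 follows from how flow matching models usually sample. They generate output by numerically integrating a learned velocity field, and each solver step costs one forward pass of the Flow Matching Transformer. The README does not say which solver the app uses. The toy below assumes a fixed-step Euler solver and replaces the network with a hand-picked velocity field, dx/dt = x, whose exact value at t = 1 is e. The names `euler_sample` and `toy_velocity` are illustrative only.

```python
import math
from typing import Callable

# v(x, t) -> dx/dt; in a real model this would be a network forward pass.
Velocity = Callable[[float, float], float]


def euler_sample(x0: float, velocity: Velocity, steps: int) -> tuple[float, int]:
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed Euler steps.

    Returns the final state and the number of velocity evaluations.
    """
    if steps < 1:
        raise ValueError("steps must be >= 1")
    dt = 1.0 / steps
    x, t, evals = x0, 0.0, 0
    for _ in range(steps):
        x += dt * velocity(x, t)  # one Euler update
        t += dt
        evals += 1                # one (expensive) velocity evaluation per step
    return x, evals


def toy_velocity(x: float, t: float) -> float:
    # Toy stand-in for a learned field: dx/dt = x, so x(1) = x0 * e.
    return x


if __name__ == "__main__":
    exact = math.e  # exact x(1) for x0 = 1.0
    for steps in (1, 8, 32, 64):
        x1, evals = euler_sample(1.0, toy_velocity, steps)
        print(f"steps={steps:2d} evals={evals:2d} error={abs(x1 - exact):.4f}")
```

Running it shows the error shrinking roughly in proportion to 1/steps (Euler is a first-order method) while the number of velocity evaluations, and therefore the runtime, grows linearly with the step count.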