Audio-to-Audio
Transformers
English
music
art
voice-cloning
so-vits-svc
so-vits-svc-fork
quevedo
spanish
Upload folder using huggingface_hub
- README.md +80 -83
- app.py +28 -32
- assets/banner.png +2 -2
README.md
CHANGED
|
@@ -1,6 +1,6 @@
|
|
| 1 |
---
|
| 2 |
pipeline_tag: audio-to-audio
|
| 3 |
-
language:
|
| 4 |
license: creativeml-openrail-m
|
| 5 |
tags:
|
| 6 |
- music
|
|
@@ -18,104 +18,102 @@ tags:
|
|
| 18 |
|
| 19 |
# 🗣️ Quevedo Voice Model (`so-vits-svc-fork`)
|
| 20 |
|
| 21 |
-
This repository contains the voice model of the Spanish singer **Quevedo**, trained for use with the **`so-vits-svc-fork`** library (version 3.10.3+ / 4.0.0+).
|
| 22 |
-
|
| 23 |
This repository contains the voice model of the Spanish singer **Quevedo**, trained for use with the **`so-vits-svc-fork`** library (version 3.10.3+ / 4.0.0+).
|
| 24 |
|
| 25 |
---
|
| 26 |
|
| 27 |
-
## 📋
|
| 28 |
-
- [
|
| 29 |
-
- [
|
| 30 |
-
- [
|
| 31 |
-
- [
|
| 32 |
-
- [
|
| 33 |
-
- [
|
| 34 |
-
- [
|
| 35 |
-
- [
|
| 36 |
-
- [
|
| 37 |
|
| 38 |
---
|
| 39 |
|
| 40 |
-
## 📊
|
| 41 |
|
| 42 |
-
|
|
| 43 |
| --- | --- |
|
| 44 |
-
| **Speaker ID
|
| 45 |
-
| **
|
| 46 |
-
| **
|
| 47 |
-
| **
|
| 48 |
-
| **
|
| 49 |
|
| 50 |
---
|
| 51 |
|
| 52 |
-
## 📁
|
| 53 |
-
- `G_777.pth`:
|
| 54 |
-
- `config.json`:
|
| 55 |
-
- `app.py`:
|
| 56 |
-
- `requirements.txt`:
|
| 57 |
-
- `assets/banner.png`:
|
| 58 |
|
| 59 |
---
|
| 60 |
|
| 61 |
-
## 🛠️
|
| 62 |
|
| 63 |
-
|
| 64 |
|
| 65 |
```bash
|
| 66 |
-
# 1.
|
| 67 |
git clone https://huggingface.co/lagosproject/quevedo
|
| 68 |
cd quevedo
|
| 69 |
|
| 70 |
-
# 2.
|
| 71 |
python3 -m venv venv
|
| 72 |
-
source venv/bin/activate #
|
| 73 |
|
| 74 |
-
# 3.
|
| 75 |
pip install -r requirements.txt
|
| 76 |
```
|
| 77 |
|
| 78 |
> [!IMPORTANT]
|
| 79 |
-
>
|
| 80 |
|
| 81 |
---
|
| 82 |
|
| 83 |
-
## 💻
|
| 84 |
|
| 85 |
-
|
| 86 |
|
| 87 |
```bash
|
| 88 |
-
#
|
| 89 |
svc infer path/to/input.wav -m G_777.pth -c config.json -s quevedo -o output.wav
|
| 90 |
|
| 91 |
-
#
|
| 92 |
svc infer path/to/input.wav -m G_777.pth -c config.json -s quevedo -t 3 -fm crepe -o output.wav
|
| 93 |
```
|
| 94 |
|
| 95 |
-
###
|
| 96 |
-
* `-m` / `--model-path`:
|
| 97 |
-
* `-c` / `--config-path`:
|
| 98 |
-
* `-s` / `--spk-list`:
|
| 99 |
-
* `-t` / `--trans`:
|
| 100 |
-
* `-fm` / `--f0-method`:
|
| 101 |
|
| 102 |
---
|
| 103 |
|
| 104 |
-
## 🐍
|
| 105 |
|
| 106 |
-
|
| 107 |
|
| 108 |
```python
|
| 109 |
from pathlib import Path
|
| 110 |
from so_vits_svc_fork.inference.main import infer
|
| 111 |
|
| 112 |
-
#
|
| 113 |
-
input_audio = Path("
|
| 114 |
output_audio = Path("quevedo_output.wav")
|
| 115 |
model_path = Path("G_777.pth")
|
| 116 |
config_path = Path("config.json")
|
| 117 |
|
| 118 |
-
#
|
| 119 |
infer(
|
| 120 |
input_path=input_audio,
|
| 121 |
output_path=output_audio,
|
|
@@ -123,65 +121,64 @@ infer(
|
|
| 123 |
config_path=config_path,
|
| 124 |
recursive=False,
|
| 125 |
speaker="quevedo",
|
| 126 |
-
transpose=0, #
|
| 127 |
-
auto_predict_f0=False, # False
|
| 128 |
-
f0_method="crepe", # Crepe
|
| 129 |
noise_scale=0.4
|
| 130 |
)
|
| 131 |
|
| 132 |
-
print(f"
|
| 133 |
```
|
| 134 |
|
| 135 |
---
|
| 136 |
|
| 137 |
-
## 🎨
|
| 138 |
|
| 139 |
-
|
| 140 |
|
| 141 |
```bash
|
| 142 |
python app.py
|
| 143 |
```
|
| 144 |
-
|
| 145 |
|
| 146 |
-
###
|
| 147 |
-
- **
|
| 148 |
-
- **
|
| 149 |
-
- **
|
| 150 |
-
- **Modern Style:** Designed with a high-quality dark "glassmorphism" theme in blue and purple tones.
|
| 151 |
|
| 152 |
---
|
| 153 |
|
| 154 |
-
## 🚀
|
| 155 |
|
| 156 |
-
|
| 157 |
|
| 158 |
-
1.
|
| 159 |
-
2.
|
| 160 |
-
3.
|
| 161 |
-
4.
|
| 162 |
-
5.
|
| 163 |
|
| 164 |
---
|
| 165 |
|
| 166 |
-
## 💡
|
| 167 |
|
| 168 |
-
|
| 169 |
|
| 170 |
-
* **
|
| 171 |
-
*
|
| 172 |
-
*
|
| 173 |
-
*
|
| 174 |
-
* **
|
| 175 |
-
*
|
| 176 |
-
*
|
| 177 |
-
* **
|
| 178 |
-
*
|
| 179 |
-
*
|
| 180 |
|
| 181 |
---
|
| 182 |
|
| 183 |
-
## ⚠️
|
| 184 |
|
| 185 |
-
|
| 186 |
-
*
|
| 187 |
-
*
|
|
|
|
| 1 |
---
|
| 2 |
pipeline_tag: audio-to-audio
|
| 3 |
+
language: en
|
| 4 |
license: creativeml-openrail-m
|
| 5 |
tags:
|
| 6 |
- music
|
|
|
|
| 18 |
|
| 19 |
# 🗣️ Quevedo Voice Model (`so-vits-svc-fork`)
|
| 20 |
|
|
|
|
|
|
|
| 21 |
This repository contains the voice model of the Spanish singer **Quevedo**, trained for use with the **`so-vits-svc-fork`** library (version 3.10.3+ / 4.0.0+).
|
| 22 |
|
| 23 |
---
|
| 24 |
|
| 25 |
+
## 📋 Table of Contents
|
| 26 |
+
- [Model Specifications](#-model-specifications)
|
| 27 |
+
- [Repository Structure](#-repository-structure)
|
| 28 |
+
- [Quick Installation](#-quick-installation)
|
| 29 |
+
- [CLI Usage](#-cli-usage)
|
| 30 |
+
- [Python API Usage](#-python-api-usage)
|
| 31 |
+
- [Gradio WebUI Interface](#-gradio-webui-interface)
|
| 32 |
+
- [Hugging Face Spaces Deployment](#-hugging-face-spaces-deployment)
|
| 33 |
+
- [Optimization & Tuning Tips](#-optimization--tuning-tips)
|
| 34 |
+
- [Ethical Disclaimer](#-ethical-disclaimer)
|
| 35 |
|
| 36 |
---
|
| 37 |
|
| 38 |
+
## 📊 Model Specifications
|
| 39 |
|
| 40 |
+
| Feature | Value |
|
| 41 |
| --- | --- |
|
| 42 |
+
| **Speaker ID** | `quevedo` (Index: `0`) |
|
| 43 |
+
| **Sampling Rate** | `44100 Hz` (44.1 kHz) |
|
| 44 |
+
| **Base Architecture** | VITS with SoftVC content encoder (HuBERT) |
|
| 45 |
+
| **Fork Target Version** | `so-vits-svc-fork` v3.x / v4.x |
|
| 46 |
+
| **Pipeline Tag** | Audio-to-Audio (Singing/Speech Voice Conversion) |
|
| 47 |
|
| 48 |
---
|
| 49 |
|
| 50 |
+
## 📁 Repository Structure
|
| 51 |
+
- `G_777.pth`: Generator model weight file (Git LFS).
|
| 52 |
+
- `config.json`: Model configuration file detailing training hyperparameters and speaker metadata.
|
| 53 |
+
- `app.py`: Custom-themed interactive web interface built with **Gradio**.
|
| 54 |
+
- `requirements.txt`: Package requirements to run the inference and the Web UI.
|
| 55 |
+
- `assets/banner.png`: Cover image representing the model repository.
|
| 56 |
|
| 57 |
---
|
| 58 |
|
| 59 |
+
## 🛠️ Quick Installation
|
| 60 |
|
| 61 |
+
To run this model on your local machine, set up a Python environment first (Python 3.10 or 3.11 is recommended):
|
| 62 |
|
| 63 |
```bash
|
| 64 |
+
# 1. Clone the repository
|
| 65 |
git clone https://huggingface.co/lagosproject/quevedo
|
| 66 |
cd quevedo
|
| 67 |
|
| 68 |
+
# 2. Create and activate a virtual environment
|
| 69 |
python3 -m venv venv
|
| 70 |
+
source venv/bin/activate # On Windows use: venv\Scripts\activate
|
| 71 |
|
| 72 |
+
# 3. Install dependencies
|
| 73 |
pip install -r requirements.txt
|
| 74 |
```
|
| 75 |
|
| 76 |
> [!IMPORTANT]
|
| 77 |
+
> You must have **FFmpeg** installed on your system for audio file processing. If you are on Ubuntu/Debian, run `sudo apt install ffmpeg`. On macOS/Windows, install it via your preferred package manager (e.g. `brew install ffmpeg` or `choco install ffmpeg`).
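
As a quick pre-flight check for the FFmpeg requirement above, the following minimal sketch looks for an `ffmpeg` executable on the `PATH`. The helper name `ffmpeg_available` is hypothetical and not part of this repository.

```python
import shutil


def ffmpeg_available():
    """Return True if an `ffmpeg` executable can be found on the PATH."""
    return shutil.which("ffmpeg") is not None


if not ffmpeg_available():
    print("FFmpeg was not found on PATH; install it before running inference.")
```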
|
| 78 |
|
| 79 |
---
|
| 80 |
|
| 81 |
+
## 💻 CLI Usage
|
| 82 |
|
| 83 |
+
Perform voice conversions directly from your terminal using the `svc` console script:
|
| 84 |
|
| 85 |
```bash
|
| 86 |
+
# Basic inference
|
| 87 |
svc infer path/to/input.wav -m G_777.pth -c config.json -s quevedo -o output.wav
|
| 88 |
|
| 89 |
+
# Transposed inference (+3 semitones, using the crepe pitch tracker)
|
| 90 |
svc infer path/to/input.wav -m G_777.pth -c config.json -s quevedo -t 3 -fm crepe -o output.wav
|
| 91 |
```
|
| 92 |
|
| 93 |
+
### Useful CLI arguments:
|
| 94 |
+
* `-m` / `--model-path`: Path to the generator checkpoint (`G_777.pth`).
|
| 95 |
+
* `-c` / `--config-path`: Path to the configuration file (`config.json`).
|
| 96 |
+
* `-s` / `--speaker`: Speaker name (`quevedo`).
|
| 97 |
+
* `-t` / `--transpose`: Pitch shift in semitones (negative values shift the pitch down, positive values shift it up).
|
| 98 |
+
* `-fm` / `--f0-method`: Pitch tracking algorithm. Recommended choices: `crepe` (highest accuracy) or `dio` (fastest).
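
If you want to script the CLI, for example to convert several files with different pitch shifts, one possible approach is to assemble the argument list in Python and hand it to `subprocess`. This is a minimal sketch that only uses the short flags listed above; the helper names `build_infer_command` and `run_infer` are hypothetical and not part of `so-vits-svc-fork`.

```python
import subprocess
from pathlib import Path


def build_infer_command(input_wav, output_wav, transpose=0, f0_method="crepe",
                        model="G_777.pth", config="config.json", speaker="quevedo"):
    """Assemble the `svc infer` argument list from the short flags shown above."""
    return [
        "svc", "infer", str(Path(input_wav)),
        "-m", str(model),
        "-c", str(config),
        "-s", speaker,
        "-t", str(transpose),
        "-fm", f0_method,
        "-o", str(Path(output_wav)),
    ]


def run_infer(input_wav, output_wav, **kwargs):
    """Run the conversion; raises CalledProcessError if `svc` exits non-zero."""
    subprocess.run(build_infer_command(input_wav, output_wav, **kwargs), check=True)
```

Calling `run_infer("input.wav", "output.wav", transpose=3)` would run the same command as the transposed example above, assuming the `svc` console script is on your `PATH`.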
|
| 99 |
|
| 100 |
---
|
| 101 |
|
| 102 |
+
## 🐍 Python API Usage
|
| 103 |
|
| 104 |
+
To run voice conversion programmatically inside a custom Python script:
|
| 105 |
|
| 106 |
```python
|
| 107 |
from pathlib import Path
|
| 108 |
from so_vits_svc_fork.inference.main import infer
|
| 109 |
|
| 110 |
+
# Configure paths
|
| 111 |
+
input_audio = Path("vocals_input.wav")
|
| 112 |
output_audio = Path("quevedo_output.wav")
|
| 113 |
model_path = Path("G_777.pth")
|
| 114 |
config_path = Path("config.json")
|
| 115 |
|
| 116 |
+
# Execute inference
|
| 117 |
infer(
|
| 118 |
input_path=input_audio,
|
| 119 |
output_path=output_audio,
|
|
|
|
| 121 |
config_path=config_path,
|
| 122 |
recursive=False,
|
| 123 |
speaker="quevedo",
|
| 124 |
+
transpose=0, # Adjust if input vocals are in a different octave
|
| 125 |
+
auto_predict_f0=False, # Keep False for singing (preserves melody), True for speaking
|
| 126 |
+
f0_method="crepe", # Crepe offers the highest quality pitch extraction
|
| 127 |
noise_scale=0.4
|
| 128 |
)
|
| 129 |
|
| 130 |
+
print(f"Conversion complete: {output_audio}")
|
| 131 |
```
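
Before calling `infer`, it can help to confirm that the speaker name you pass actually exists in `config.json`. The sketch below assumes the config stores speakers in a top-level `"spk"` mapping of name to index (the usual so-vits-svc layout; check your own file). The helper names `list_speakers` and `require_speaker` are hypothetical.

```python
import json
from pathlib import Path


def list_speakers(config_path):
    """Return speaker names from config.json, ordered by speaker index.

    Assumes a top-level "spk" mapping such as {"quevedo": 0}.
    """
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    spk = config.get("spk", {})
    return sorted(spk, key=spk.get)


def require_speaker(config_path, speaker):
    """Return `speaker` if it is defined in the config, otherwise raise ValueError."""
    speakers = list_speakers(config_path)
    if speaker not in speakers:
        raise ValueError(f"Speaker {speaker!r} not found; available: {speakers}")
    return speaker
```

A call such as `require_speaker("config.json", "quevedo")` fails early with a readable message instead of surfacing an error from inside the inference code.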
|
| 132 |
|
| 133 |
---
|
| 134 |
|
| 135 |
+
## 🎨 Gradio WebUI Interface
|
| 136 |
|
| 137 |
+
The repository includes a modern web interface built with Gradio. To run it locally:
|
| 138 |
|
| 139 |
```bash
|
| 140 |
python app.py
|
| 141 |
```
|
| 142 |
+
Once it starts, navigate to `http://localhost:7860` in your web browser.
|
| 143 |
|
| 144 |
+
### UI Highlights:
|
| 145 |
+
- **Drag & Drop Upload**: Upload WAV or MP3 files, or record directly from your microphone.
|
| 146 |
+
- **Visual Parameter Controls**: Adjust Pitch Shift, F0 Predictor (`crepe`, `crepe-tiny`, `parselmouth`, `dio`, `harvest`), and Noise Scale interactively.
|
| 147 |
+
- **Responsive Layout**: Designed with a clean glassmorphism dark-mode theme using customized indigo and purple gradients.
|
|
|
|
| 148 |
|
| 149 |
---
|
| 150 |
|
| 151 |
+
## 🚀 Hugging Face Spaces Deployment
|
| 152 |
|
| 153 |
+
To offer the model as a public online demo that requires no local installation:
|
| 154 |
|
| 155 |
+
1. Create a new **Space** on your Hugging Face account.
|
| 156 |
+
2. Select **Gradio** as the Space SDK.
|
| 157 |
+
3. Choose your hardware (a free CPU basic instance is fine, but GPU hardware speeds up inference considerably).
|
| 158 |
+
4. Upload all files from this repository to the Space (including `app.py`, `requirements.txt`, `config.json`, `G_777.pth` and the `assets/` folder).
|
| 159 |
+
5. The Space will build and deploy the WebUI automatically.
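
The steps above can also be scripted with the `huggingface_hub` library. The following is a minimal sketch, not the author's deployment method: `space_id` (for example `your-username/quevedo-demo`) and the helper names `missing_files` and `deploy_space` are hypothetical, and it assumes you have already authenticated, for instance with `huggingface-cli login`.

```python
from pathlib import Path

# File list taken from step 4 above.
REQUIRED_FILES = ["app.py", "requirements.txt", "config.json", "G_777.pth"]


def missing_files(folder):
    """Return the required files that are absent from `folder`."""
    root = Path(folder)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


def deploy_space(folder, space_id):
    """Create a Gradio Space (if it does not exist) and upload the folder to it."""
    # Imported lazily so the check above works without huggingface_hub installed.
    from huggingface_hub import HfApi

    missing = missing_files(folder)
    if missing:
        raise FileNotFoundError(f"Missing files: {missing}")

    api = HfApi()  # uses the locally stored access token
    api.create_repo(repo_id=space_id, repo_type="space",
                    space_sdk="gradio", exist_ok=True)
    api.upload_folder(folder_path=str(folder), repo_id=space_id,
                      repo_type="space", ignore_patterns=["venv/*"])
```

The `ignore_patterns` argument keeps a local virtual environment out of the upload; adjust it if your environment folder has a different name.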
|
| 160 |
|
| 161 |
---
|
| 162 |
|
| 163 |
+
## 💡 Optimization & Tuning Tips
|
| 164 |
|
| 165 |
+
Follow these guidelines to achieve the best output vocal quality for Quevedo:
|
| 166 |
|
| 167 |
+
* **Pitch Adjustments**: Quevedo has a deep, resonant baritone singing range.
|
| 168 |
+
* If the source vocals are from a **female singer**, apply a negative pitch shift (typically **-8 to -12 semitones**).
|
| 169 |
+
* If the source vocals are from a **male tenor singer**, shift down by **3 to 6 semitones** (a transpose value of `-3` to `-6`).
|
| 170 |
+
* If the source vocals are already in a **deep baritone range**, keep the transposition at **0**.
|
| 171 |
+
* **Singing vs. Speech**:
|
| 172 |
+
* For **songs**, disable `Auto Predict F0` to maintain the precise pitch notes of the original track.
|
| 173 |
+
* For **speech/voice acting**, enable `Auto Predict F0` so the model generates natural speech intonation.
|
| 174 |
+
* **Vocal Preparation**:
|
| 175 |
+
* Input audio files must be clean, dry acapellas. Background instruments, beats, reverb, noise, or echo will distort the output audio.
|
| 176 |
+
* For long inputs (more than 45 seconds), slice the audio into smaller files to avoid running out of memory (OOM).
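
Two of the tips above can be sketched in a few lines of standard-library Python: the frequency ratio that a semitone shift corresponds to, and splitting a long WAV file into chunks of at most 45 seconds. The helper names `semitone_ratio` and `slice_wav` are hypothetical, and the slicer assumes an uncompressed PCM WAV input. It cuts at fixed intervals, so cutting at silences instead may give cleaner chunk boundaries.

```python
import wave
from pathlib import Path


def semitone_ratio(semitones):
    """Frequency ratio of a pitch shift of `semitones` in equal temperament."""
    return 2.0 ** (semitones / 12.0)


def slice_wav(src, out_dir, max_seconds=45.0):
    """Split a PCM WAV file into consecutive chunks of at most `max_seconds`."""
    src = Path(src)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunks = []
    with wave.open(str(src), "rb") as reader:
        params = reader.getparams()
        frames_per_chunk = int(params.framerate * max_seconds)
        index = 0
        while True:
            frames = reader.readframes(frames_per_chunk)
            if not frames:
                break
            chunk_path = out_dir / f"{src.stem}_{index:03d}.wav"
            with wave.open(str(chunk_path), "wb") as writer:
                # Copy channels, sample width and rate; the frame count in the
                # header is corrected automatically when the file is closed.
                writer.setparams(params)
                writer.writeframes(frames)
            chunks.append(chunk_path)
            index += 1
    return chunks
```

For example, a shift of `-12` semitones halves every frequency (one octave down), which is why female-to-baritone conversions sit near that end of the range.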
|
| 177 |
|
| 178 |
---
|
| 179 |
|
| 180 |
+
## ⚠️ Ethical Disclaimer
|
| 181 |
|
| 182 |
+
This model is intended for artistic, research, and educational purposes. **It should not be used to impersonate individuals for fraudulent, misleading, or defamatory purposes.**
|
| 183 |
+
* If you share covers or musical works created using this model, please label them clearly as AI-generated (e.g., "AI Cover").
|
| 184 |
+
* Respect local regulations and the moral rights of the original artist. The author of this repository is not responsible for malicious usage by third parties.
|
app.py
CHANGED
|
@@ -22,7 +22,7 @@ if CONFIG_PATH.exists():
|
|
| 22 |
# Inference function
|
| 23 |
def convert_voice(input_audio, speaker, transpose, auto_predict_f0, f0_method, noise_scale):
|
| 24 |
if input_audio is None:
|
| 25 |
-
return None, "
|
| 26 |
|
| 27 |
input_path = Path(input_audio)
|
| 28 |
output_path = input_path.parent / f"{input_path.stem}_quevedo.wav"
|
|
@@ -32,16 +32,14 @@ def convert_voice(input_audio, speaker, transpose, auto_predict_f0, f0_method, n
|
|
| 32 |
from so_vits_svc_fork.inference.main import infer
|
| 33 |
except ImportError:
|
| 34 |
return None, (
|
| 35 |
-
"Error: 'so-vits-svc-fork' is not installed in this environment.\n"
|
| 36 |
-
"Run: pip install so-vits-svc-fork\n\n"
|
| 37 |
"Error: 'so-vits-svc-fork' is not installed in this environment.\n"
|
| 38 |
-
"
|
| 39 |
)
|
| 40 |
|
| 41 |
if not MODEL_PATH.exists():
|
| 42 |
-
return None, f"Error:
|
| 43 |
if not CONFIG_PATH.exists():
|
| 44 |
-
return None, f"Error:
|
| 45 |
|
| 46 |
try:
|
| 47 |
# Perform inference using the fork's main infer function
|
|
@@ -59,12 +57,12 @@ def convert_voice(input_audio, speaker, transpose, auto_predict_f0, f0_method, n
|
|
| 59 |
)
|
| 60 |
|
| 61 |
if output_path.exists():
|
| 62 |
-
return str(output_path), "
|
| 63 |
else:
|
| 64 |
-
return None, "Error
|
| 65 |
|
| 66 |
except Exception as e:
|
| 67 |
-
return None, f"Error
|
| 68 |
|
| 69 |
# Custom CSS for premium styling matching the blue-purple theme
|
| 70 |
custom_css = """
|
|
@@ -127,26 +125,24 @@ with gr.Blocks(css=custom_css, theme=gr.themes.Soft()) as demo:
|
|
| 127 |
with gr.Row(elem_classes=["header-area"]):
|
| 128 |
gr.HTML(
|
| 129 |
"<h1 class='header-title'>🗣️ Quevedo Voice Model (so-vits-svc-fork)</h1>"
|
| 130 |
-
"<p class='header-desc'>
|
| 131 |
-
"<p style='color: #6b7280; font-size: 0.9rem; margin-top: -10px;'>"
|
| 132 |
-
"Convert any voice or singing file into the voice of Quevedo.</p>"
|
| 133 |
)
|
| 134 |
|
| 135 |
# Main conversion section
|
| 136 |
with gr.Row(elem_classes=["main-box"]):
|
| 137 |
with gr.Column(scale=1):
|
| 138 |
-
gr.Markdown("### 📥 1.
|
| 139 |
input_audio = gr.Audio(
|
| 140 |
-
label="Audio
|
| 141 |
type="filepath",
|
| 142 |
sources=["upload", "microphone"]
|
| 143 |
)
|
| 144 |
|
| 145 |
-
gr.Markdown("### ⚙️ 2.
|
| 146 |
speaker = gr.Dropdown(
|
| 147 |
choices=speakers,
|
| 148 |
value=speakers[0],
|
| 149 |
-
label="Speaker
|
| 150 |
)
|
| 151 |
|
| 152 |
transpose = gr.Slider(
|
|
@@ -154,22 +150,22 @@ with gr.Blocks(css=custom_css, theme=gr.themes.Soft()) as demo:
|
|
| 154 |
maximum=12,
|
| 155 |
value=0,
|
| 156 |
step=1,
|
| 157 |
-
label="
|
| 158 |
-
info="
|
| 159 |
)
|
| 160 |
|
| 161 |
-
with gr.Accordion("
|
| 162 |
auto_predict_f0 = gr.Checkbox(
|
| 163 |
value=False,
|
| 164 |
-
label="Auto
|
| 165 |
-
info="
|
| 166 |
)
|
| 167 |
|
| 168 |
f0_method = gr.Dropdown(
|
| 169 |
choices=["crepe", "crepe-tiny", "parselmouth", "dio", "harvest"],
|
| 170 |
value="crepe",
|
| 171 |
-
label="F0 Predictor
|
| 172 |
-
info="crepe
|
| 173 |
)
|
| 174 |
|
| 175 |
noise_scale = gr.Slider(
|
|
@@ -177,25 +173,25 @@ with gr.Blocks(css=custom_css, theme=gr.themes.Soft()) as demo:
|
|
| 177 |
maximum=1.0,
|
| 178 |
value=0.4,
|
| 179 |
step=0.05,
|
| 180 |
-
label="
|
| 181 |
-
info="
|
| 182 |
)
|
| 183 |
|
| 184 |
with gr.Column(scale=1):
|
| 185 |
-
gr.Markdown("### 📤 3.
|
| 186 |
output_audio = gr.Audio(
|
| 187 |
-
label="
|
| 188 |
type="filepath"
|
| 189 |
)
|
| 190 |
|
| 191 |
status_output = gr.Textbox(
|
| 192 |
-
label="
|
| 193 |
-
value="
|
| 194 |
interactive=False
|
| 195 |
)
|
| 196 |
|
| 197 |
submit_btn = gr.Button(
|
| 198 |
-
"
|
| 199 |
variant="primary",
|
| 200 |
elem_classes=["convert-btn"]
|
| 201 |
)
|
|
@@ -209,8 +205,8 @@ with gr.Blocks(css=custom_css, theme=gr.themes.Soft()) as demo:
|
|
| 209 |
# Footer
|
| 210 |
gr.HTML(
|
| 211 |
"<div style='text-align: center; color: #4b5563; font-size: 0.85rem; padding: 20px 0;'>"
|
| 212 |
-
"
|
| 213 |
-
"
|
| 214 |
"Developed with 💜 for the open voice community.</div>"
|
| 215 |
)
|
| 216 |
|
|
|
|
| 22 |
# Inference function
|
| 23 |
def convert_voice(input_audio, speaker, transpose, auto_predict_f0, f0_method, noise_scale):
|
| 24 |
if input_audio is None:
|
| 25 |
+
return None, "Please upload an audio file or use the microphone."
|
| 26 |
|
| 27 |
input_path = Path(input_audio)
|
| 28 |
output_path = input_path.parent / f"{input_path.stem}_quevedo.wav"
|
|
|
|
| 32 |
from so_vits_svc_fork.inference.main import infer
|
| 33 |
except ImportError:
|
| 34 |
return None, (
|
|
|
|
|
|
|
| 35 |
"Error: 'so-vits-svc-fork' is not installed in this environment.\n"
|
| 36 |
+
"Please run: pip install so-vits-svc-fork"
|
| 37 |
)
|
| 38 |
|
| 39 |
if not MODEL_PATH.exists():
|
| 40 |
+
return None, f"Error: Model file {MODEL_PATH} not found."
|
| 41 |
if not CONFIG_PATH.exists():
|
| 42 |
+
return None, f"Error: Config file {CONFIG_PATH} not found."
|
| 43 |
|
| 44 |
try:
|
| 45 |
# Perform inference using the fork's main infer function
|
|
|
|
| 57 |
)
|
| 58 |
|
| 59 |
if output_path.exists():
|
| 60 |
+
return str(output_path), "Conversion completed successfully!"
|
| 61 |
else:
|
| 62 |
+
return None, "Error: Output file was not generated."
|
| 63 |
|
| 64 |
except Exception as e:
|
| 65 |
+
return None, f"Error during inference: {str(e)}"
|
| 66 |
|
| 67 |
# Custom CSS for premium styling matching the blue-purple theme
|
| 68 |
custom_css = """
|
|
|
|
| 125 |
with gr.Row(elem_classes=["header-area"]):
|
| 126 |
gr.HTML(
|
| 127 |
"<h1 class='header-title'>🗣️ Quevedo Voice Model (so-vits-svc-fork)</h1>"
|
| 128 |
+
"<p class='header-desc'>Convert any voice or singing file into the voice of the Spanish singer Quevedo.</p>"
|
|
|
|
|
|
|
| 129 |
)
|
| 130 |
|
| 131 |
# Main conversion section
|
| 132 |
with gr.Row(elem_classes=["main-box"]):
|
| 133 |
with gr.Column(scale=1):
|
| 134 |
+
gr.Markdown("### 📥 1. Audio Input")
|
| 135 |
input_audio = gr.Audio(
|
| 136 |
+
label="Audio to Convert (Clean Vocals / Acapella)",
|
| 137 |
type="filepath",
|
| 138 |
sources=["upload", "microphone"]
|
| 139 |
)
|
| 140 |
|
| 141 |
+
gr.Markdown("### ⚙️ 2. Conversion Parameters")
|
| 142 |
speaker = gr.Dropdown(
|
| 143 |
choices=speakers,
|
| 144 |
value=speakers[0],
|
| 145 |
+
label="Speaker Name"
|
| 146 |
)
|
| 147 |
|
| 148 |
transpose = gr.Slider(
|
|
|
|
| 150 |
maximum=12,
|
| 151 |
value=0,
|
| 152 |
step=1,
|
| 153 |
+
label="Pitch Shift (Semitones)",
|
| 154 |
+
info="Use negative values to lower the pitch (e.g. -5 to -12 for female-to-male) and positive values to raise it (male-to-female)."
|
| 155 |
)
|
| 156 |
|
| 157 |
+
with gr.Accordion("Advanced Options", open=False):
|
| 158 |
auto_predict_f0 = gr.Checkbox(
|
| 159 |
value=False,
|
| 160 |
+
label="Auto Predict F0",
|
| 161 |
+
info="Recommended for speech/narration. UNCHECK for singing to preserve notes."
|
| 162 |
)
|
| 163 |
|
| 164 |
f0_method = gr.Dropdown(
|
| 165 |
choices=["crepe", "crepe-tiny", "parselmouth", "dio", "harvest"],
|
| 166 |
value="crepe",
|
| 167 |
+
label="F0 Predictor Algorithm",
|
| 168 |
+
info="crepe offers the best quality but is slower; dio is the fastest."
|
| 169 |
)
|
| 170 |
|
| 171 |
noise_scale = gr.Slider(
|
|
|
|
| 173 |
maximum=1.0,
|
| 174 |
value=0.4,
|
| 175 |
step=0.05,
|
| 176 |
+
label="Noise Scale",
|
| 177 |
+
info="Controls the amount of random variation in the generated voice (0.4 is the default used here)."
|
| 178 |
)
|
| 179 |
|
| 180 |
with gr.Column(scale=1):
|
| 181 |
+
gr.Markdown("### 📤 3. Output Audio")
|
| 182 |
output_audio = gr.Audio(
|
| 183 |
+
label="Converted Audio",
|
| 184 |
type="filepath"
|
| 185 |
)
|
| 186 |
|
| 187 |
status_output = gr.Textbox(
|
| 188 |
+
label="Status",
|
| 189 |
+
value="Ready",
|
| 190 |
interactive=False
|
| 191 |
)
|
| 192 |
|
| 193 |
submit_btn = gr.Button(
|
| 194 |
+
"Convert Voice 🚀",
|
| 195 |
variant="primary",
|
| 196 |
elem_classes=["convert-btn"]
|
| 197 |
)
|
|
|
|
| 205 |
# Footer
|
| 206 |
gr.HTML(
|
| 207 |
"<div style='text-align: center; color: #4b5563; font-size: 0.85rem; padding: 20px 0;'>"
|
| 208 |
+
"This model is for artistic demonstration and research purposes only. "
|
| 209 |
+
"Uses so-vits-svc-fork for inference.<br>"
|
| 210 |
"Developed with 💜 for the open voice community.</div>"
|
| 211 |
)
|
| 212 |
|
assets/banner.png
CHANGED