---
pipeline_tag: audio-to-audio
language: en
license: creativeml-openrail-m
tags:
- music
- art
- voice-cloning
- so-vits-svc
- so-vits-svc-fork
- quevedo
- spanish
---

<p align="center">
  <img src="https://huggingface.co/lagosproject/quevedo/resolve/main/assets/banner.png" alt="Quevedo Voice Model Banner" width="100%">
</p>

# 🗣️ Quevedo Voice Model (`so-vits-svc-fork`)

This repository contains the voice model of the Spanish singer **Quevedo**, trained for use with the **`so-vits-svc-fork`** library (version 3.10.3+ / 4.0.0+).

---

## 📋 Table of Contents
- [Model Specifications](#-model-specifications)
- [Repository Structure](#-repository-structure)
- [Quick Installation](#-quick-installation)
- [CLI Usage](#-cli-usage)
- [Python API Usage](#-python-api-usage)
- [Gradio WebUI Interface](#-gradio-webui-interface)
- [Hugging Face Spaces Deployment](#-hugging-face-spaces-deployment)
- [Optimization & Tuning Tips](#-optimization--tuning-tips)
- [Ethical Disclaimer](#-ethical-disclaimer)

---

## 📊 Model Specifications

| Feature | Value |
| --- | --- |
| **Speaker ID** | `quevedo` (Index: `0`) |
| **Sampling Rate** | `44100 Hz` (44.1 kHz) |
| **Base Architecture** | VITS with SoftVC content encoder (HuBERT) |
| **Fork Target Version** | `so-vits-svc-fork` v3.x / v4.x |
| **Pipeline Tag** | Audio-to-Audio (Singing/Speech Voice Conversion) |

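One way to confirm that a downloaded `config.json` matches the table above is to read it directly. The sketch below assumes the usual so-vits-svc config layout, with the sampling rate under `data.sampling_rate` and the speaker map under `spk`. Neither key is shown in this README, so treat both as assumptions; `read_model_specs` is a hypothetical helper name. For this model, the expected result of `read_model_specs("config.json")` would be a rate of `44100` and a speaker map containing `quevedo`.

```python
import json
from pathlib import Path


def read_model_specs(config_path):
    """Return (sampling_rate, speaker_map) from a so-vits-svc style config.

    Assumed layout (not confirmed by this README):
    {"data": {"sampling_rate": 44100}, "spk": {"quevedo": 0}}
    Missing keys yield None for the rate and an empty dict for the speakers.
    """
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    sampling_rate = config.get("data", {}).get("sampling_rate")
    speaker_map = config.get("spk", {})
    return sampling_rate, speaker_map
```
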
---

## 📁 Repository Structure
- `G_777.pth`: Generator model weight file (Git LFS).
- `config.json`: Model configuration file detailing training hyperparameters and speaker metadata.
- `app.py`: Custom-themed interactive web interface built with **Gradio**.
- `requirements.txt`: Package requirements to run the inference and the Web UI.
- `assets/banner.png`: Cover image representing the model repository.

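Because `G_777.pth` is stored with Git LFS, a clone made without LFS support leaves a small text pointer in place of the real weights. A minimal sketch for detecting that case follows; `is_lfs_pointer` is a hypothetical helper, and the check relies only on the standard first line of a Git LFS pointer file. If it returns `True` for `G_777.pth`, running `git lfs pull` inside the clone should fetch the actual checkpoint.

```python
from pathlib import Path

# Every Git LFS pointer file starts with this line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file is a Git LFS pointer rather than real content."""
    with Path(path).open("rb") as handle:
        return handle.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX
```
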
---

## 🛠️ Quick Installation

To run this model on your local machine, set up a Python environment first (Python 3.10 or 3.11 is recommended):

```bash
# 1. Clone the repository
git clone https://huggingface.co/lagosproject/quevedo
cd quevedo

# 2. Create and activate a virtual environment
python3 -m venv venv
source venv/bin/activate  # On Windows use: venv\Scripts\activate

# 3. Install dependencies
pip install -r requirements.txt
```

> [!IMPORTANT]
> You must have **FFmpeg** installed on your system for audio file processing. If you are on Ubuntu/Debian, run `sudo apt install ffmpeg`. On macOS/Windows, install it via your preferred package manager (e.g. `brew install ffmpeg` or `choco install ffmpeg`).

---

## 💻 CLI Usage

Perform voice conversions directly from your terminal using the `svc` console script:

```bash
# Basic inference
svc infer path/to/input.wav -m G_777.pth -c config.json -s quevedo -o output.wav

# Transposed inference (shift pitch up by 3 semitones, using crepe pitch tracking)
svc infer path/to/input.wav -m G_777.pth -c config.json -s quevedo -t 3 -fm crepe -o output.wav
```

### Useful CLI arguments:
*   `-m` / `--model-path`: Path to the generator checkpoint (`G_777.pth`).
*   `-c` / `--config-path`: Path to the configuration file (`config.json`).
*   `-s` / `--speaker`: Speaker name (`quevedo`).
*   `-t` / `--transpose`: Pitch shift in semitones (negative values shift the pitch down, positive values shift it up).
*   `-fm` / `--f0-method`: Pitch tracking algorithm. Recommended choices: `crepe` (highest accuracy) or `dio` (fastest).

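When the same conversion has to be scripted, the command line can be assembled in Python and passed to `subprocess`. This is a sketch that uses only the short flags shown in the examples above; `build_infer_command` and `run_infer` are hypothetical helper names, and the defaults assume the script runs from the repository root. Calling `run_infer("input.wav", "output.wav", transpose=3, f0_method="crepe")` would reproduce the second example.

```python
import subprocess


def build_infer_command(input_path, output_path, transpose=0, f0_method=None,
                        model_path="G_777.pth", config_path="config.json",
                        speaker="quevedo"):
    """Build the argument list for `svc infer` using the flags listed above."""
    command = [
        "svc", "infer", str(input_path),
        "-m", str(model_path),
        "-c", str(config_path),
        "-s", speaker,
        "-o", str(output_path),
    ]
    if transpose:
        # Only pass -t when a non-zero shift is requested.
        command += ["-t", str(transpose)]
    if f0_method:
        # Leave the pitch tracker at the tool's default unless one is given.
        command += ["-fm", f0_method]
    return command


def run_infer(input_path, output_path, **options):
    """Run the conversion and raise CalledProcessError if `svc` fails."""
    subprocess.run(build_infer_command(input_path, output_path, **options), check=True)
```
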
---

## 🐍 Python API Usage

To run voice conversion programmatically inside a custom Python script:

```python
from pathlib import Path
from so_vits_svc_fork.inference.main import infer

# Configure paths
input_audio = Path("vocals_input.wav")
output_audio = Path("quevedo_output.wav")
model_path = Path("G_777.pth")
config_path = Path("config.json")

# Execute inference
infer(
    input_path=input_audio,
    output_path=output_audio,
    model_path=model_path,
    config_path=config_path,
    recursive=False,
    speaker="quevedo",
    transpose=0,              # Semitones; adjust if the input vocals sit in a different range
    auto_predict_f0=False,    # Keep False for singing (preserves melody), True for speaking
    f0_method="crepe",        # Crepe offers the highest quality pitch extraction
    noise_scale=0.4
)

print(f"Conversion complete: {output_audio}")
```

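To convert a whole folder, the call above can be wrapped in a loop. The sketch below reuses the same `infer` arguments as the example and adds two hypothetical helpers, `plan_conversions` and `convert_folder`; the `_quevedo` output suffix is likewise an arbitrary choice for illustration. The library import is deferred so that the planning step can be checked without `so-vits-svc-fork` installed.

```python
from pathlib import Path


def plan_conversions(input_dir, output_dir, suffix="_quevedo"):
    """Pair every WAV file in input_dir with an output path in output_dir."""
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    return [
        (source, output_dir / f"{source.stem}{suffix}.wav")
        for source in sorted(input_dir.glob("*.wav"))
    ]


def convert_folder(input_dir, output_dir, transpose=0, f0_method="crepe"):
    """Convert each planned file, one infer() call per input."""
    # Imported lazily so plan_conversions works without the library installed.
    from so_vits_svc_fork.inference.main import infer

    Path(output_dir).mkdir(parents=True, exist_ok=True)
    for source, target in plan_conversions(input_dir, output_dir):
        infer(
            input_path=source,
            output_path=target,
            model_path=Path("G_777.pth"),
            config_path=Path("config.json"),
            recursive=False,
            speaker="quevedo",
            transpose=transpose,
            auto_predict_f0=False,  # singing: keep the original melody
            f0_method=f0_method,
            noise_scale=0.4,
        )
```
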
---

## 🎨 Gradio WebUI Interface

The repository includes a web interface built with Gradio (`app.py`). To run it locally:

```bash
python app.py
```
Once it starts, navigate to `http://localhost:7860` in your web browser.

### UI Highlights:
- **Drag & Drop Upload**: Upload WAV or MP3 files, or record directly from your microphone.
- **Visual Parameters Control**: Adjust Pitch Shift, F0 Predictor (`crepe`, `dio`, `harvest`), and Noise Scale interactively.
- **Responsive Layout**: Designed with a clean glassmorphism dark-mode theme using customized indigo and purple gradients.

---

## 🚀 Hugging Face Spaces Deployment

To make this model interactive online for public use without requiring local installation:

1. Create a new **Space** on your Hugging Face account.
2. Select **Gradio** as the Space SDK.
3. Choose your hardware (a free CPU basic instance is fine, but GPU hardware speeds up inference considerably).
4. Upload all files from this repository to the Space (including `app.py`, `requirements.txt`, `config.json`, `G_777.pth` and the `assets/` folder).
5. The Space will build and deploy the WebUI automatically.

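Before uploading, it can help to confirm that everything named in step 4 is present in the local folder. The following is a minimal sketch using only the standard library; `missing_space_files` is a hypothetical helper, and the path list simply mirrors the files listed in step 4. An empty result from `missing_space_files(".")` means the folder is ready to upload.

```python
from pathlib import Path

# Files and folders that step 4 says the Space needs.
REQUIRED_PATHS = ["app.py", "requirements.txt", "config.json", "G_777.pth", "assets"]


def missing_space_files(repo_dir="."):
    """Return the required paths that do not exist under repo_dir."""
    root = Path(repo_dir)
    return [name for name in REQUIRED_PATHS if not (root / name).exists()]
```
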
---

## 💡 Optimization & Tuning Tips

Follow these guidelines to achieve the best output vocal quality for Quevedo:

*   **Pitch Adjustments**: Quevedo has a deep, resonant baritone singing range.
    *   If the source vocals are from a **female singer**, apply a negative pitch shift (typically **-8 to -12 semitones**).
    *   If the source vocals are from a **male tenor singer**, shift down by **3 to 6 semitones** (a transpose value of -3 to -6).
    *   If the source vocals are already in a **deep baritone range**, keep the transposition at **0**.
*   **Singing vs. Speech**:
    *   For **songs**, disable `Auto Predict F0` to maintain the precise pitch notes of the original track.
    *   For **speech/voice acting**, enable `Auto Predict F0` so the model generates natural speech intonation.
*   **Vocal Preparation**:
    *   Input audio files must be clean, dry acapellas. Background instruments, beats, reverb, noise, or echo will distort the output audio.
    *   For long inputs (more than 45 seconds), slice the audio into smaller files to avoid running out of memory (OOM).

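The slicing advice for long inputs can be sketched with Python's standard `wave` module. `slice_wav` is a hypothetical helper, and the 45-second default mirrors the threshold mentioned above. This version assumes uncompressed PCM WAV input and cuts at fixed time boundaries, so a cut may land in the middle of a sung note; splitting at silences would likely sound cleaner but needs more than the standard library. Each returned chunk can then be converted separately and the outputs joined in order.

```python
import wave
from pathlib import Path


def slice_wav(input_path, output_dir, max_seconds=45):
    """Split a PCM WAV file into consecutive chunks of at most max_seconds.

    Returns the list of chunk paths in playback order.
    """
    input_path = Path(input_path)
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    chunk_paths = []
    with wave.open(str(input_path), "rb") as source:
        params = source.getparams()
        frames_per_chunk = int(source.getframerate() * max_seconds)
        index = 0
        while True:
            frames = source.readframes(frames_per_chunk)
            if not frames:
                break  # end of file reached
            chunk_path = output_dir / f"{input_path.stem}_{index:03d}.wav"
            with wave.open(str(chunk_path), "wb") as chunk:
                # Copy channels, sample width, and rate; the frame count in
                # the header is corrected automatically when the file closes.
                chunk.setparams(params)
                chunk.writeframes(frames)
            chunk_paths.append(chunk_path)
            index += 1
    return chunk_paths
```
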
---

## ⚠️ Ethical Disclaimer

This model is intended for artistic, research, and educational purposes. **It should not be used to impersonate individuals for fraudulent, misleading, or defamatory purposes.**
*   If you share covers or musical works created using this model, please label them clearly as AI covers (e.g., "AI Cover").
*   Respect local regulations and the moral rights of the original artist. The author of this repository is not responsible for malicious usage by third parties.