| pipeline_tag: audio-to-audio | |
| language: en | |
| license: creativeml-openrail-m | |
| tags: | |
| - music | |
| - art | |
| - voice-cloning | |
| - so-vits-svc | |
| - so-vits-svc-fork | |
| - quevedo | |
| - spanish | |
| <p align="center"> | |
| <img src="https://huggingface.co/lagosproject/quevedo/resolve/main/assets/banner.png" alt="Quevedo Voice Model Banner" width="100%"> | |
| </p> | |
| # Quevedo Voice Model (`so-vits-svc-fork`) | |
| This repository contains the voice model of the Spanish singer **Quevedo**, trained for use with the **`so-vits-svc-fork`** library (version 3.10.3+ / 4.0.0+). | |
| --- | |
| ## Table of Contents | |
| - [Model Specifications](#model-specifications) | |
| - [Repository Structure](#repository-structure) | |
| - [Quick Installation](#quick-installation) | |
| - [CLI Usage](#cli-usage) | |
| - [Python API Usage](#python-api-usage) | |
| - [Gradio WebUI Interface](#gradio-webui-interface) | |
| - [Hugging Face Spaces Deployment](#hugging-face-spaces-deployment) | |
| - [Optimization & Tuning Tips](#optimization--tuning-tips) | |
| - [Ethical Disclaimer](#ethical-disclaimer) | |
| --- | |
| ## Model Specifications | |
| | Feature | Value | | |
| | --- | --- | | |
| | **Speaker ID** | `quevedo` (Index: `0`) | | |
| | **Sampling Rate** | `44100 Hz` (44.1 kHz) | | |
| | **Base Architecture** | VITS with SoftVC content encoder (HuBERT) | | |
| | **Fork Target Version** | `so-vits-svc-fork` v3.x / v4.x | | |
| | **Pipeline Tag** | Audio-to-Audio (Singing/Speech Voice Conversion) | | |
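
The values in this table can be cross-checked against `config.json`. The sketch below assumes the file follows the usual so-vits-svc layout, with a top-level `spk` mapping of speaker names to indices and a `data.sampling_rate` field. That layout is an assumption, not something this model card confirms, and `read_model_specs` is a hypothetical helper name.

```python
import json
from pathlib import Path


def read_model_specs(config_path):
    """Return (speakers, sampling_rate) from a so-vits-svc style config.

    Assumed layout (not confirmed by this model card):
        {"spk": {"<name>": <index>}, "data": {"sampling_rate": <Hz>}}
    Missing keys give an empty mapping or None instead of raising.
    """
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    speakers = config.get("spk", {})
    sampling_rate = config.get("data", {}).get("sampling_rate")
    return speakers, sampling_rate
```

If the assumed layout holds, `read_model_specs("config.json")` should return a speaker mapping containing `quevedo` at index `0` and a sampling rate of `44100`.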
| --- | |
| ## Repository Structure | |
| - `G_777.pth`: Generator model weight file (Git LFS). | |
| - `config.json`: Model configuration file detailing training hyperparameters and speaker metadata. | |
| - `app.py`: Custom-themed **Gradio** web interface for interactive inference. | |
| - `requirements.txt`: Package requirements to run the inference and the Web UI. | |
| - `assets/banner.png`: Cover image representing the model repository. | |
| --- | |
| ## Quick Installation | |
| To run this model on your local machine, set up a Python environment first (Python 3.10 or 3.11 is recommended): | |
| ```bash | |
| # 1. Clone the repository | |
| git clone https://huggingface.co/lagosproject/quevedo | |
| cd quevedo | |
| # 2. Create and activate a virtual environment | |
| python3 -m venv venv | |
| source venv/bin/activate # On Windows use: venv\Scripts\activate | |
| # 3. Install dependencies | |
| pip install -r requirements.txt | |
| ``` | |
| > [!IMPORTANT] | |
| > You must have **FFmpeg** installed on your system for audio file processing. If you are on Ubuntu/Debian, run `sudo apt install ffmpeg`. On macOS/Windows, install it via your preferred package manager (e.g. `brew install ffmpeg` or `choco install ffmpeg`). | |
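
A quick way to confirm the prerequisites are visible from the active environment is to look them up on `PATH`. This is a minimal sketch using only the standard library; `missing_tools` is a hypothetical helper, and the default tool names assume that `so-vits-svc-fork` installs the `svc` console script used in the CLI section.

```python
import shutil


def missing_tools(tools=("ffmpeg", "svc")):
    """Return the names from `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


missing = missing_tools()
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("ffmpeg and svc are both available.")
```

Run it inside the activated virtual environment, since `svc` is only on `PATH` while the environment is active.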
| --- | |
| ## CLI Usage | |
| Perform voice conversions directly from your terminal using the `svc` console script: | |
| ```bash | |
| # Basic inference | |
| svc infer path/to/input.wav -m G_777.pth -c config.json -s quevedo -o output.wav | |
| # Transposed inference (+3 semitones, using CREPE pitch tracking) | |
| svc infer path/to/input.wav -m G_777.pth -c config.json -s quevedo -t 3 -fm crepe -o output.wav | |
| ``` | |
| ### Useful CLI arguments: | |
| * `-m` / `--model-path`: Path to the generator checkpoint (`G_777.pth`). | |
| * `-c` / `--config-path`: Path to the configuration file (`config.json`). | |
| * `-s` / `--speaker`: Speaker name (`quevedo`). | |
| * `-t` / `--transpose`: Pitch shift in semitones (negative values shift the pitch down, positive values shift it up). | |
| * `-fm` / `--f0-method`: Pitch tracking algorithm. Recommended choices: `crepe` (highest accuracy) or `dio` (fastest). | |
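
When the CLI is driven from a script, building the argument list in one place keeps the flags consistent. The sketch below only assembles the command from the short flags listed above; `build_infer_command` is a hypothetical helper, not part of `so-vits-svc-fork`.

```python
def build_infer_command(
    input_path,
    output_path,
    model_path="G_777.pth",
    config_path="config.json",
    speaker="quevedo",
    transpose=0,
    f0_method=None,
):
    """Assemble an `svc infer` argument list from the flags documented above."""
    cmd = [
        "svc", "infer", str(input_path),
        "-m", str(model_path),
        "-c", str(config_path),
        "-s", speaker,
        "-o", str(output_path),
    ]
    if transpose:
        # Only pass -t when a non-zero shift is requested.
        cmd += ["-t", str(transpose)]
    if f0_method:
        # Omitting -fm leaves the CLI on its own default pitch tracker.
        cmd += ["-fm", f0_method]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Passing a list rather than a shell string avoids quoting problems with paths that contain spaces.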
| --- | |
| ## Python API Usage | |
| To run voice conversion programmatically inside a custom Python script: | |
| ```python | |
| from pathlib import Path | |
| from so_vits_svc_fork.inference.main import infer | |
| # Configure paths | |
| input_audio = Path("vocals_input.wav") | |
| output_audio = Path("quevedo_output.wav") | |
| model_path = Path("G_777.pth") | |
| config_path = Path("config.json") | |
| # Execute inference | |
| infer( | |
|     input_path=input_audio, | |
|     output_path=output_audio, | |
|     model_path=model_path, | |
|     config_path=config_path, | |
|     recursive=False, | |
|     speaker="quevedo", | |
|     transpose=0,  # Adjust if input vocals are in a different octave | |
|     auto_predict_f0=False,  # Keep False for singing (preserves melody), True for speaking | |
|     f0_method="crepe",  # Crepe offers the highest quality pitch extraction | |
|     noise_scale=0.4, | |
| ) | |
| print(f"Conversion complete: {output_audio}") | |
| ``` | |
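
The same call can be repeated over a folder of vocal stems. The following is a sketch under stated assumptions: `plan_batch` and `convert_batch` are hypothetical helpers, the model and config are assumed to sit in the working directory as in the example above, and the library import is deferred so the planning step works even where `so-vits-svc-fork` is not installed.

```python
from pathlib import Path


def plan_batch(input_dir, output_dir, suffix=".wav"):
    """Pair each matching input file with an output path of the same name."""
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    return [
        (src, output_dir / src.name)
        for src in sorted(input_dir.glob(f"*{suffix}"))
    ]


def convert_batch(input_dir, output_dir, **infer_kwargs):
    """Convert every planned file, forwarding extra options to `infer`."""
    # Deferred import: only needed when a conversion is actually run.
    from so_vits_svc_fork.inference.main import infer

    Path(output_dir).mkdir(parents=True, exist_ok=True)
    for src, dst in plan_batch(input_dir, output_dir):
        infer(
            input_path=src,
            output_path=dst,
            model_path=Path("G_777.pth"),
            config_path=Path("config.json"),
            speaker="quevedo",
            **infer_kwargs,
        )
```

For example, `convert_batch("stems", "converted", transpose=-3, f0_method="crepe")` would apply the same settings to every WAV file in a `stems` folder (the folder names here are placeholders).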
| --- | |
| ## Gradio WebUI Interface | |
| The repository includes a web interface built with Gradio (`app.py`). To run it locally: | |
| ```bash | |
| python app.py | |
| ``` | |
| Once it starts, navigate to `http://localhost:7860` in your web browser. | |
| ### UI Highlights: | |
| - **Drag & Drop Upload**: Easily upload any WAV/MP3 files or record directly from your microphone. | |
| - **Visual Parameters Control**: Adjust Pitch Shift, F0 Predictor (`crepe`, `dio`, `harvest`), and Noise Scale interactively. | |
| - **Responsive Layout**: Designed with a clean glassmorphism dark-mode theme using customized indigo and purple gradients. | |
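
For readers who want to build a similar interface themselves, the sketch below shows one way the controls above could be wired to the Python API. It is not the repository's `app.py`, whose contents are not shown here. `make_convert_fn` and `build_demo` are hypothetical names, the slider ranges are arbitrary illustrative choices, and the `gradio` and `so-vits-svc-fork` imports are deferred so the handler can be exercised without them.

```python
import tempfile
from pathlib import Path


def make_convert_fn(infer_fn, model_path="G_777.pth", config_path="config.json"):
    """Wrap an `infer`-style callable as a handler for the UI controls."""

    def convert(input_wav, transpose, f0_method, noise_scale):
        # Write each result to a fresh temporary directory.
        output = Path(tempfile.mkdtemp()) / "quevedo_output.wav"
        infer_fn(
            input_path=Path(input_wav),
            output_path=output,
            model_path=Path(model_path),
            config_path=Path(config_path),
            speaker="quevedo",
            transpose=int(transpose),  # sliders may deliver floats
            auto_predict_f0=False,
            f0_method=f0_method,
            noise_scale=float(noise_scale),
        )
        return str(output)

    return convert


def build_demo():
    """Build a Gradio interface; requires gradio and so-vits-svc-fork."""
    import gradio as gr
    from so_vits_svc_fork.inference.main import infer

    return gr.Interface(
        fn=make_convert_fn(infer),
        inputs=[
            gr.Audio(type="filepath", label="Input vocals"),
            gr.Slider(-12, 12, value=0, step=1, label="Pitch shift (semitones)"),
            gr.Dropdown(["crepe", "dio", "harvest"], value="crepe", label="F0 predictor"),
            gr.Slider(0.0, 1.0, value=0.4, step=0.05, label="Noise scale"),
        ],
        outputs=gr.Audio(type="filepath", label="Converted vocals"),
        title="Quevedo voice conversion",
    )
```

Calling `build_demo().launch()` starts the interface. Keeping the conversion handler separate from the UI definition makes it possible to test the parameter handling with a stand-in for `infer`.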
| --- | |
| ## Hugging Face Spaces Deployment | |
| To offer the model as a public online demo that requires no local installation: | |
| 1. Create a new **Space** on your Hugging Face account. | |
| 2. Select **Gradio** as the Space SDK. | |
| 3. Choose your hardware (a free CPU basic instance is fine, but GPU hardware speeds up inference considerably). | |
| 4. Upload all files from this repository to the Space (including `app.py`, `requirements.txt`, `config.json`, `G_777.pth` and the `assets/` folder). | |
| 5. The Space will build and deploy the WebUI automatically. | |
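
The steps above can also be scripted with the `huggingface_hub` client instead of the web UI. This is a sketch, assuming you are already authenticated (for example via `huggingface-cli login`). `missing_space_files` and `deploy_space` are hypothetical helpers, the `space_id` is a placeholder you choose, and the ignore patterns are an illustrative guess at what should stay out of the upload.

```python
from pathlib import Path

# Files that step 4 says the Space needs (the assets/ folder is uploaded too).
REQUIRED_FILES = ("app.py", "requirements.txt", "config.json", "G_777.pth")


def missing_space_files(repo_dir, required=REQUIRED_FILES):
    """Return the required files that are absent from `repo_dir`."""
    repo_dir = Path(repo_dir)
    return [name for name in required if not (repo_dir / name).is_file()]


def deploy_space(repo_dir, space_id):
    """Create a Gradio Space (if needed) and upload the repository folder."""
    missing = missing_space_files(repo_dir)
    if missing:
        raise FileNotFoundError(f"Missing files: {', '.join(missing)}")

    # Deferred import: only needed when an upload is actually attempted.
    from huggingface_hub import HfApi

    api = HfApi()
    api.create_repo(
        repo_id=space_id, repo_type="space", space_sdk="gradio", exist_ok=True
    )
    api.upload_folder(
        folder_path=str(repo_dir),
        repo_id=space_id,
        repo_type="space",
        ignore_patterns=["venv/*", ".git/*"],  # keep local-only folders out
    )
```

A call such as `deploy_space(".", "your-username/quevedo-demo")` would then cover steps 1, 2 and 4. Hardware selection (step 3) is still done in the Space settings.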
| --- | |
| ## Optimization & Tuning Tips | |
| Follow these guidelines to achieve the best output vocal quality for Quevedo: | |
| * **Pitch Adjustments**: Quevedo has a deep, resonant baritone singing range. | |
| * If the source vocals are from a **female singer**, apply a negative pitch shift (typically **-8 to -12 semitones**). | |
| * If the source vocals are from a **male tenor singer**, apply a pitch shift of **-3 to -6 semitones**. | |
| * If the source vocals are already in a **deep baritone range**, keep the transposition at **0**. | |
| * **Singing vs. Speech**: | |
| * For **songs**, disable `Auto Predict F0` to maintain the precise pitch notes of the original track. | |
| * For **speech/voice acting**, enable `Auto Predict F0` so the model generates natural speech intonation. | |
| * **Vocal Preparation**: | |
| * Input audio files must be clean, dry acapellas. Background instruments, beats, reverb, noise, or echo will distort the output audio. | |
| * For long inputs (more than 45 seconds), slice the audio into smaller files to avoid running out of memory (OOM). | |
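
For the slicing tip, a long WAV file can be split with the standard library alone. The sketch below cuts at fixed intervals, which is a simplification: a cut may land in the middle of a sung note, so cutting at silences by hand or with an audio editor may sound better. `slice_wav` is a hypothetical helper and handles uncompressed PCM WAV files only.

```python
import wave
from pathlib import Path


def slice_wav(input_path, output_dir, max_seconds=45.0):
    """Split a PCM WAV file into consecutive parts of at most `max_seconds`."""
    input_path, output_dir = Path(input_path), Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    parts = []
    with wave.open(str(input_path), "rb") as src:
        params = src.getparams()
        frames_per_part = int(max_seconds * src.getframerate())
        index = 0
        while True:
            frames = src.readframes(frames_per_part)
            if not frames:
                break  # end of file reached
            part_path = output_dir / f"{input_path.stem}_{index:03d}.wav"
            with wave.open(str(part_path), "wb") as dst:
                # Copy channels, sample width and rate; the frame count in
                # the header is corrected when the file is closed.
                dst.setparams(params)
                dst.writeframes(frames)
            parts.append(part_path)
            index += 1
    return parts
```

Each returned path can then be converted separately and the results joined back together in the same order.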
| --- | |
| ## Ethical Disclaimer | |
| This model is intended for artistic, research, and educational purposes. **It should not be used to impersonate individuals for fraudulent, misleading, or defamatory purposes.** | |
| * If you share covers or musical works created using this model, please label them clearly as AI covers (e.g., "AI Cover"). | |
| * Respect local regulations and the moral rights of the original artist. The author of this repository is not responsible for misuse by third parties. | |