Upload folder using huggingface_hub


Files changed (3)

README.md +80 -83
app.py +28 -32
assets/banner.png +2 -2

README.md CHANGED

@@ -1,6 +1,6 @@
 ---
 pipeline_tag: audio-to-audio
-language: es
 license: creativeml-openrail-m
 tags:
 - music
@@ -18,104 +18,102 @@ tags:
 # 🗣️ Quevedo Voice Model (`so-vits-svc-fork`)
-Este repositorio contiene el modelo de voz del cantante español **Quevedo**, entrenado para su uso con la biblioteca **`so-vits-svc-fork`** (versión 3.10.3+ / 4.0.0+).
 This repository contains the voice model of the Spanish singer **Quevedo**, trained for use with the **`so-vits-svc-fork`** library (version 3.10.3+ / 4.0.0+).
 ---
-## 📋 Tabla de Contenidos / Table of Contents
-- [Especificaciones del Modelo / Model Specifications](#-especificaciones-del-modelo--model-specifications)
-- [Estructura del Repositorio / Repository Structure](#-estructura-del-repositorio--repository-structure)
-- [Instalación Rápida / Quick Installation](#-instalación-rápida--quick-installation)
-- [Uso por Consola (CLI) / CLI Usage](#-uso-por-consola-cli--cli-usage)
-- [Uso por Python API / Python API Usage](#-uso-por-python-api--python-api-usage)
-- [Interfaz Web (Gradio WebUI) / Web Interface](#-interfaz-web-gradio-webui--web-interface)
-- [Despliegue en Hugging Face Spaces / Hugging Face Spaces Deployment](#-despliegue-en-hugging-face-spaces--hugging-face-spaces-deployment)
-- [Consejos de Configuración / Optimization & Tips](#-consejos-de-configuración--optimization--tips)
-- [Aviso de Uso Ético / Ethical Disclaimer](#-aviso-de-uso-ético--ethical-disclaimer)
 ---
-## 📊 Especificaciones del Modelo / Model Specifications
-| Característica / Feature | Valor / Value |
 | --- | --- |
-| **Speaker ID / Nombre** | `quevedo` (Index: `0`) |
-| **Frecuencia de Muestreo / Sampling Rate** | `44100 Hz` (44.1 kHz) |
-| **Algoritmo Base / Architecture** | VITS with SoftVC content encoder (HuBERT) |
-| **Versión del Fork / Fork Version** | `so-vits-svc-fork` v3.x / v4.x |
-| **Tipo de Modelo / Pipeline** | Audio-to-Audio (Singing/Speech Voice Conversion) |
 ---
-## 📁 Estructura del Repositorio / Repository Structure
-- `G_777.pth`: Pesos del generador del modelo (Git LFS).
-- `config.json`: Archivo de configuración del modelo que incluye los parámetros de entrenamiento y el índice del speaker.
-- `app.py`: Interfaz gráfica interactiva y profesional construida con **Gradio**.
-- `requirements.txt`: Dependencias requeridas para la ejecución de la UI y la inferencia.
-- `assets/banner.png`: Imagen de portada representativa para el repositorio.
 ---
-## 🛠️ Instalación Rápida / Quick Installation
-Para empezar a utilizar el modelo en tu ordenador local, primero debes configurar tu entorno de Python (se recomienda Python 3.10 o 3.11):
 ```bash
-# 1. Clonar el repositorio / Clone repository
 git clone https://huggingface.co/lagosproject/quevedo
 cd quevedo
-# 2. Crear y activar un entorno virtual / Create & activate virtual environment
 python3 -m venv venv
-source venv/bin/activate  # En Windows usa: venv\Scripts\activate
-# 3. Instalar dependencias / Install dependencies
 pip install -r requirements.txt
 ```
 > [!IMPORTANT]
-> Se requiere tener instalado **FFmpeg** en tu sistema para el procesamiento de archivos de audio. Si eres usuario de Ubuntu/Debian ejecuta `sudo apt install ffmpeg`, o en Windows/macOS instálalo usando tu gestor de paquetes favorito (ej: `brew install ffmpeg` o `choco install ffmpeg`).
 ---
-## 💻 Uso por Consola (CLI) / CLI Usage
-Puedes realizar conversiones de voz directamente desde tu terminal utilizando el comando unificado `svc`:
 ```bash
-# Inferencia básica / Basic inference
 svc infer path/to/input.wav -m G_777.pth -c config.json -s quevedo -o output.wav
-# Inferencia con ajuste de tono (+3 semitonos para voces más agudas) / Inference with +3 semitones pitch shift
 svc infer path/to/input.wav -m G_777.pth -c config.json -s quevedo -t 3 -fm crepe -o output.wav
 ```
-### Parámetros útiles de CLI:
-*   `-m` / `--model-path`: Ruta al modelo generador (`G_777.pth`).
-*   `-c` / `--config-path`: Ruta a la configuración (`config.json`).
-*   `-s` / `--spk-list`: Nombre del speaker (`quevedo`).
-*   `-t` / `--trans`: Ajuste de tono en semitones (usa números negativos para hacer la voz más grave, y positivos para hacerla más aguda).
-*   `-fm` / `--f0-method`: Método de pitch tracking. Opciones recomendadas: `crepe` (mejor calidad) o `dio` (más rápido).
 ---
-## 🐍 Uso por Python API / Python API Usage
-Si quieres integrar la conversión de voz directamente dentro de tus scripts en Python:
 ```python
 from pathlib import Path
 from so_vits_svc_fork.inference.main import infer
-# Definición de rutas
-input_audio = Path("mi_entrada.wav")
 output_audio = Path("quevedo_output.wav")
 model_path = Path("G_777.pth")
 config_path = Path("config.json")
-# Invocación de la inferencia
 infer(
     input_path=input_audio,
     output_path=output_audio,
@@ -123,65 +121,64 @@ infer(
     config_path=config_path,
     recursive=False,
     speaker="quevedo",
-    transpose=0,              # Cambiar si el tono original difiere mucho del de Quevedo
-    auto_predict_f0=False,    # False para cantar (mantiene la melodía), True para hablar
-    f0_method="crepe",        # Crepe ofrece la mejor precisión de tono
     noise_scale=0.4
 )
-print(f"Conversión finalizada: {output_audio}")
 ```
 ---
-## 🎨 Interfaz Web (Gradio WebUI) / Web Interface
-El repositorio incluye una WebUI moderna y estilizada construida con Gradio. Para lanzarla localmente, ejecuta:
 ```bash
 python app.py
 ```
-Una vez iniciado, abre tu navegador en `http://localhost:7860` para acceder a la aplicación web.
-### Captura de Características / Feature highlights:
-- **Arrastrar y soltar (Drag and Drop):** Permite subir archivos fácilmente o grabar directamente usando el micrófono.
-- **Acceso Avanzado:** Controla el Pitch Shift, el método F0 predictor (`crepe`, `dio`, `harvest`), y la Escala de Ruido de forma visual.
-- **Bilingüe:** Totalmente adaptado en inglés y español.
-- **Estilo Moderno:** Diseñado con un tema oscuro tipo "glassmorphism" en tonos azul y púrpura de alta calidad.
 ---
-## 🚀 Despliegue en Hugging Face Spaces / Hugging Face Spaces Deployment
-Si quieres hacer este modelo interactivo directamente en Hugging Face para que cualquier persona del mundo lo pruebe sin instalar nada localmente:
-1. Crea un nuevo **Space** en tu cuenta de Hugging Face.
-2. Selecciona **Gradio** como el SDK del Space.
-3. Elige el hardware correspondiente (CPU básica gratuita es suficiente, aunque una GPU aumentará drásticamente la velocidad).
-4. Sube todos los archivos de este repositorio a tu Space (incluyendo `app.py`, `requirements.txt`, `config.json`, `G_777.pth` y `assets/`).
-5. El Space se compilará automáticamente y lanzará la interfaz web de manera pública y gratuita.
 ---
-## 💡 Consejos de Configuración / Optimization & Tips
-Para obtener la mejor calidad de audio al clonar la voz de Quevedo:
-*   **Ajuste de Tono (Pitch Shift):** Quevedo tiene un registro de barítono medio/bajo muy característico y profundo.
-    *   Si tu audio de entrada es cantado por una **voz femenina**, debes aplicar un pitch shift negativo substancial (típicamente entre **-8 y -12 semitonos**) para que suene natural.
-    *   Si el origen es una **voz masculina aguda/tenor**, ajusta el pitch a **-3 a -6 semitonos**.
-    *   Si es una **voz masculina grave/barítono**, puedes mantenerlo en **0**.
-*   **Voz de Canto vs Voz Hablada:**
-    *   Para canciones, asegúrate de tener desactivada la opción `Auto Predicción de F0`. Esto preservará las notas musicales del cantante original.
-    *   Para diálogos o voz de doblaje, activa `Auto Predicción de F0` para que el modelo aplique la entonación y prosodia típica de Quevedo al texto hablado.
-*   **Preparación del Audio de Entrada:**
-    *   Usa siempre audios de entrada limpios (acapellas). Cualquier música de fondo, eco, reverberación o ruido ambiental se distorsionará y arruinará la calidad de conversión.
-    *   Si tus audios son muy largos (más de 30-45 segundos), divídelos en pequeños segmentos para evitar errores por falta de memoria (Out of Memory).
 ---
-## ⚠️ Aviso de Uso Ético / Ethical Disclaimer
-Este modelo ha sido desarrollado con fines artísticos, educativos y de investigación científica. **No está diseñado para ser usado en suplantaciones de identidad con fines fraudulentos o difamatorios.**
-*   Si publicas música o covers creados con este modelo, debes aclarar explícitamente en el título y descripción que se trata de un cover creado mediante Inteligencia Artificial (ej: "Cover IA").
-*   Respeta las directrices éticas locales y los derechos morales del artista original. El desarrollador de este repositorio no se hace responsable del uso indebido del modelo por parte de terceros.

 ---
 pipeline_tag: audio-to-audio
+language: en
 license: creativeml-openrail-m
 tags:
 - music
 # 🗣️ Quevedo Voice Model (`so-vits-svc-fork`)
 This repository contains the voice model of the Spanish singer **Quevedo**, trained for use with the **`so-vits-svc-fork`** library (version 3.10.3+ / 4.0.0+).
 ---
+## 📋 Table of Contents
+- [Model Specifications](#-model-specifications)
+- [Repository Structure](#-repository-structure)
+- [Quick Installation](#-quick-installation)
+- [CLI Usage](#-cli-usage)
+- [Python API Usage](#-python-api-usage)
+- [Gradio WebUI Interface](#-gradio-webui-interface)
+- [Hugging Face Spaces Deployment](#-hugging-face-spaces-deployment)
+- [Optimization & Tuning Tips](#-optimization--tuning-tips)
+- [Ethical Disclaimer](#-ethical-disclaimer)
 ---
+## 📊 Model Specifications
+| Feature | Value |
 | --- | --- |
+| **Speaker ID** | `quevedo` (Index: `0`) |
+| **Sampling Rate** | `44100 Hz` (44.1 kHz) |
+| **Base Architecture** | VITS with SoftVC content encoder (HuBERT) |
+| **Fork Target Version** | `so-vits-svc-fork` v3.x / v4.x |
+| **Pipeline Tag** | Audio-to-Audio (Singing/Speech Voice Conversion) |
 ---
+## 📁 Repository Structure
+- `G_777.pth`: Generator model weight file (Git LFS).
+- `config.json`: Model configuration file detailing training hyperparameters and speaker metadata.
+- `app.py`: Interactive web interface built with **Gradio** (custom theme).
+- `requirements.txt`: Package requirements to run the inference and the Web UI.
+- `assets/banner.png`: Cover image representing the model repository.
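`config.json` is plain JSON, so you can inspect the speaker list and sampling rate before running inference. The sketch below assumes the usual so-vits-svc layout: a top-level `spk` mapping such as `{"quevedo": 0}` and a `data.sampling_rate` field. `read_model_info` is a hypothetical helper, not part of `so-vits-svc-fork`.

```python
import json
from pathlib import Path


# Hypothetical helper (not part of so-vits-svc-fork).
def read_model_info(config_path: Path) -> tuple[dict[str, int], int]:
    """Return (speaker name -> index map, sampling rate) from a so-vits-svc config.

    Assumes the common layout: a top-level "spk" mapping and data.sampling_rate.
    """
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    speakers = config.get("spk", {})  # e.g. {"quevedo": 0}
    sampling_rate = int(config["data"]["sampling_rate"])  # e.g. 44100
    return speakers, sampling_rate
```

If the config follows that layout, `read_model_info(Path("config.json"))` should agree with the specifications table: speaker `quevedo` at index `0`, sampled at 44100 Hz.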
 ---
+## 🛠️ Quick Installation
+To run this model on your local machine, set up a Python environment first (Python 3.10 or 3.11 is recommended):
 ```bash
+# 1. Clone the repository
 git clone https://huggingface.co/lagosproject/quevedo
 cd quevedo
+# 2. Create and activate a virtual environment
 python3 -m venv venv
+source venv/bin/activate  # On Windows use: venv\Scripts\activate
+# 3. Install dependencies
 pip install -r requirements.txt
 ```
 > [!IMPORTANT]
+> You must have **FFmpeg** installed on your system for audio file processing. If you are on Ubuntu/Debian, run `sudo apt install ffmpeg`. On macOS/Windows, install it via your preferred package manager (e.g. `brew install ffmpeg` or `choco install ffmpeg`).
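Before converting anything, you can check from Python that FFmpeg is reachable on `PATH`. This is a minimal sketch that uses only the standard library; `ffmpeg_version` is a hypothetical helper name.

```python
import shutil
import subprocess
from typing import Optional


# Hypothetical helper: report the installed FFmpeg build, if any.
def ffmpeg_version() -> Optional[str]:
    """Return the first line of `ffmpeg -version`, or None if FFmpeg is not on PATH."""
    executable = shutil.which("ffmpeg")
    if executable is None:
        return None
    result = subprocess.run(
        [executable, "-version"], capture_output=True, text=True, check=False
    )
    first_line = result.stdout.splitlines()[:1]
    return first_line[0] if first_line else None
```

For example, `if ffmpeg_version() is None: print("FFmpeg not found")` fails fast instead of failing partway through inference.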
 ---
+## 💻 CLI Usage
+Perform voice conversions directly from your terminal using the `svc` console script:
 ```bash
+# Basic inference
 svc infer path/to/input.wav -m G_777.pth -c config.json -s quevedo -o output.wav
+# Inference shifted up 3 semitones, with CREPE pitch tracking
 svc infer path/to/input.wav -m G_777.pth -c config.json -s quevedo -t 3 -fm crepe -o output.wav
 ```
+### Useful CLI arguments:
+*   `-m` / `--model-path`: Path to the generator checkpoint (`G_777.pth`).
+*   `-c` / `--config-path`: Path to the configuration file (`config.json`).
+*   `-s` / `--speaker`: Speaker name (`quevedo`).
+*   `-t` / `--transpose`: Pitch shift in semitones (negative values lower the pitch, positive values raise it).
+*   `-o` / `--output-path`: Path of the converted output file (`output.wav`).
+*   `-fm` / `--f0-method`: Pitch tracking algorithm. Recommended choices: `crepe` (highest accuracy) or `dio` (fastest).
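When you script many conversions, it can help to assemble the `svc infer` arguments programmatically. The sketch below uses only the short flags from the examples above and runs one conversion per call through `subprocess`. `build_infer_command` and `run_infer` are hypothetical helpers, and the defaults assume the repository files sit in the working directory.

```python
import subprocess
from pathlib import Path


# Hypothetical helper: build the argv for one `svc infer` call.
def build_infer_command(
    input_path: Path,
    output_path: Path,
    transpose: int = 0,
    f0_method: str = "crepe",
    model_path: Path = Path("G_777.pth"),
    config_path: Path = Path("config.json"),
    speaker: str = "quevedo",
) -> list[str]:
    """Assemble an `svc infer` argument list from the documented short flags."""
    return [
        "svc", "infer", str(input_path),
        "-m", str(model_path),
        "-c", str(config_path),
        "-s", speaker,
        "-t", str(transpose),  # semitones; negative lowers the pitch
        "-fm", f0_method,
        "-o", str(output_path),
    ]


# Hypothetical helper: run a single conversion.
def run_infer(input_path: Path, output_path: Path, transpose: int = 0) -> None:
    """Run one conversion; raises CalledProcessError if `svc` exits non-zero."""
    subprocess.run(build_infer_command(input_path, output_path, transpose), check=True)
```

The command is passed as a list rather than a shell string, which avoids quoting problems with paths that contain spaces.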
 ---
+## 🐍 Python API Usage
+To run voice conversion programmatically inside a custom Python script:
 ```python
 from pathlib import Path
 from so_vits_svc_fork.inference.main import infer
+# Configure paths
+input_audio = Path("vocals_input.wav")
 output_audio = Path("quevedo_output.wav")
 model_path = Path("G_777.pth")
 config_path = Path("config.json")
+# Execute inference
 infer(
     input_path=input_audio,
     output_path=output_audio,
     model_path=model_path,
     config_path=config_path,
     recursive=False,
     speaker="quevedo",
+    transpose=0,              # Semitones; change it if the source key differs a lot from Quevedo's range
+    auto_predict_f0=False,    # Keep False for singing (preserves melody), True for speaking
+    f0_method="crepe",        # Crepe offers the highest quality pitch extraction
     noise_scale=0.4
 )
+print(f"Conversion complete: {output_audio}")
 ```
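You can also loop the same `infer` call over a folder of a cappella files. In this sketch, `plan_outputs` and `convert_batch` are hypothetical helpers. The keyword arguments passed to `infer` are exactly those used in the example above, and the output naming (`<stem>_quevedo.wav`) mirrors what `app.py` does.

```python
from pathlib import Path


# Hypothetical helper: decide where each converted file will go.
def plan_outputs(input_dir: Path, output_dir: Path) -> list[tuple[Path, Path]]:
    """Pair each .wav file in input_dir with `<stem>_quevedo.wav` in output_dir."""
    return [
        (wav, output_dir / f"{wav.stem}_quevedo.wav")
        for wav in sorted(input_dir.glob("*.wav"))
    ]


# Hypothetical helper: convert a whole folder with the Python API.
def convert_batch(input_dir: Path, output_dir: Path, transpose: int = 0) -> list[Path]:
    """Convert every planned pair with so-vits-svc-fork; return the files written."""
    # Imported lazily so that planning works without so-vits-svc-fork installed.
    from so_vits_svc_fork.inference.main import infer

    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for src, dst in plan_outputs(input_dir, output_dir):
        infer(
            input_path=src,
            output_path=dst,
            model_path=Path("G_777.pth"),
            config_path=Path("config.json"),
            recursive=False,
            speaker="quevedo",
            transpose=transpose,
            auto_predict_f0=False,  # singing: keep the source melody
            f0_method="crepe",
            noise_scale=0.4,
        )
        if dst.exists():
            written.append(dst)
    return written
```

Calling `plan_outputs` on its own works as a dry run: it shows which files would be converted without loading the model.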
 ---
+## 🎨 Gradio WebUI Interface
+The repository includes a custom-styled web interface built with Gradio. To run it locally:
 ```bash
 python app.py
 ```
+Once it starts, navigate to `http://localhost:7860` in your web browser.
+### UI Highlights:
+- **Drag & Drop Upload**: Upload an audio file (e.g. WAV or MP3) or record directly from your microphone.
+- **Visual Parameter Controls**: Adjust Pitch Shift, F0 Predictor (e.g. `crepe`, `dio`, `harvest`), and Noise Scale interactively.
+- **Custom Theme**: Dark "glassmorphism" styling with indigo and purple gradients.
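If you launch `app.py` from another script (for example in a notebook or CI job), you may want to wait until the server answers before opening the page. This standard-library sketch polls the default address shown above; `wait_for_webui` is a hypothetical helper.

```python
import time
import urllib.request


# Hypothetical helper: block until the Gradio server responds.
def wait_for_webui(url: str = "http://localhost:7860", timeout_s: float = 30.0) -> bool:
    """Poll `url` until it answers with HTTP 200 or `timeout_s` seconds pass."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2) as response:
                if response.status == 200:
                    return True
        except OSError:
            # URLError and HTTPError subclass OSError: server not up or not ready yet.
            pass
        time.sleep(0.5)
    return False
```

For example, start the UI with `subprocess.Popen(["python", "app.py"])`, then call `wait_for_webui()` before sending requests to it.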
 ---
+## 🚀 Hugging Face Spaces Deployment
+To make this model interactive online for public use without requiring local installation:
+1. Create a new **Space** on your Hugging Face account.
+2. Select **Gradio** as the Space SDK.
+3. Choose your hardware (a free CPU basic instance is fine, but GPU hardware speeds up inference considerably).
+4. Upload all files from this repository to the Space (including `app.py`, `requirements.txt`, `config.json`, `G_777.pth` and the `assets/` folder).
+5. The Space will build and deploy the WebUI automatically.
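Before uploading, check that every file listed in step 4 is present. Also check that `G_777.pth` contains real weights rather than a Git LFS pointer stub, which is what you get when the repository is cloned without Git LFS installed. This is a minimal sketch: `check_space_files` and `is_lfs_pointer` are hypothetical helpers, and the pointer check relies on the standard first line of an LFS pointer file.

```python
from pathlib import Path

# Files that the deployment steps above list as required for the Space.
REQUIRED = ("app.py", "requirements.txt", "config.json", "G_777.pth", "assets")
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


# Hypothetical helper: detect an un-fetched Git LFS pointer file.
def is_lfs_pointer(path: Path) -> bool:
    """True if the file is a Git LFS pointer stub rather than the real content."""
    with path.open("rb") as f:
        return f.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


# Hypothetical helper: list problems that would break the Space build.
def check_space_files(repo_dir: Path) -> list[str]:
    """Return a list of problems; an empty list means the folder looks ready."""
    problems = [f"missing: {name}" for name in REQUIRED if not (repo_dir / name).exists()]
    weights = repo_dir / "G_777.pth"
    if weights.is_file() and is_lfs_pointer(weights):
        problems.append("G_777.pth is a Git LFS pointer; run `git lfs pull` first")
    return problems
```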
 ---
+## 💡 Optimization & Tuning Tips
+Follow these guidelines to get the best conversion quality:
+*   **Pitch Adjustments**: Quevedo has a deep, resonant baritone singing range.
+    *   If the source vocals are from a **female singer**, apply a negative pitch shift (typically **-8 to -12 semitones**).
+    *   If the source vocals are from a **male tenor singer**, apply a shift of **-3 to -6 semitones**.
+    *   If the source vocals are already in a **deep baritone range**, keep the transposition at **0**.
+*   **Singing vs. Speech**:
+    *   For **songs**, disable `Auto Predict F0` to preserve the melody (the exact notes) of the original track.
+    *   For **speech/voice acting**, enable `Auto Predict F0` so the model generates natural speech intonation.
+*   **Vocal Preparation**:
+    *   Input audio must be clean, dry a cappella vocals. Background instruments, beats, reverb, echo, or noise will distort the output.
+    *   For long inputs (more than 45 seconds), slice the audio into smaller files to avoid running out of memory (OOM).
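Two of these tips reduce to a little arithmetic and a little file handling. A shift of `n` semitones multiplies the fundamental frequency (F0) by `2 ** (n / 12)`. So -12 semitones halves F0 (one octave down), and -6 multiplies it by about 0.71. Long inputs can be cut into fixed-length chunks with the standard `wave` module. The sketch below assumes uncompressed PCM WAV input, and `f0_ratio` and `split_wav` are hypothetical helpers. Cutting at fixed boundaries may split a word or note, so trimming at pauses usually sounds better.

```python
import wave
from pathlib import Path


# Hypothetical helper: frequency multiplier for a pitch shift.
def f0_ratio(semitones: float) -> float:
    """Frequency multiplier for a shift of `semitones` (12 semitones = one octave)."""
    return 2.0 ** (semitones / 12.0)


# Hypothetical helper: fixed-length slicing of a long PCM WAV file.
def split_wav(src: Path, out_dir: Path, max_seconds: float = 45.0) -> list[Path]:
    """Split a PCM WAV file into consecutive chunks of at most max_seconds each."""
    out_dir.mkdir(parents=True, exist_ok=True)
    chunks = []
    with wave.open(str(src), "rb") as reader:
        params = reader.getparams()
        frames_per_chunk = int(max_seconds * params.framerate)
        index = 0
        while True:
            frames = reader.readframes(frames_per_chunk)
            if not frames:
                break
            dst = out_dir / f"{src.stem}_{index:03d}.wav"
            with wave.open(str(dst), "wb") as writer:
                writer.setparams(params)  # frame count in the header is fixed on write
                writer.writeframes(frames)
            chunks.append(dst)
            index += 1
    return chunks
```

Each chunk can then be converted on its own and the outputs joined in order.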
 ---
+## ⚠️ Ethical Disclaimer
+This model is intended for artistic, research, and educational purposes. **It should not be used to impersonate individuals for fraudulent, misleading, or defamatory purposes.**
+*   If you share covers or musical works created using this model, please label them clearly as AI covers (e.g., "AI Cover").
+*   Respect local regulations and the moral rights of the original artist. The author of this repository is not responsible for malicious usage by third parties.

app.py CHANGED

@@ -22,7 +22,7 @@ if CONFIG_PATH.exists():
 # Inference function
 def convert_voice(input_audio, speaker, transpose, auto_predict_f0, f0_method, noise_scale):
     if input_audio is None:
-        return None, "Por favor, sube un archivo de audio o usa el micrófono / Please upload an audio file or use the microphone."
     input_path = Path(input_audio)
     output_path = input_path.parent / f"{input_path.stem}_quevedo.wav"
@@ -32,16 +32,14 @@ def convert_voice(input_audio, speaker, transpose, auto_predict_f0, f0_method, n
         from so_vits_svc_fork.inference.main import infer
     except ImportError:
         return None, (
-            "Error: 'so-vits-svc-fork' no está instalado en este entorno.\n"
-            "Ejecuta: pip install so-vits-svc-fork\n\n"
             "Error: 'so-vits-svc-fork' is not installed in this environment.\n"
-            "Run: pip install so-vits-svc-fork"
         )
     if not MODEL_PATH.exists():
-        return None, f"Error: No se encontró el archivo del modelo {MODEL_PATH} / Model file {MODEL_PATH} not found."
     if not CONFIG_PATH.exists():
-        return None, f"Error: No se encontró el archivo de configuración {CONFIG_PATH} / Config file {CONFIG_PATH} not found."
     try:
         # Perform inference using the fork's main infer function
@@ -59,12 +57,12 @@ def convert_voice(input_audio, speaker, transpose, auto_predict_f0, f0_method, n
         )
         if output_path.exists():
-            return str(output_path), "¡Conversión completada con éxito! / Conversion completed successfully!"
         else:
-            return None, "Error al generar el archivo de salida / Output file was not generated."
     except Exception as e:
-        return None, f"Error durante la inferencia / Error during inference: {str(e)}"
 # Custom CSS for premium styling matching the blue-purple theme
 custom_css = """
@@ -127,26 +125,24 @@ with gr.Blocks(css=custom_css, theme=gr.themes.Soft()) as demo:
     with gr.Row(elem_classes=["header-area"]):
         gr.HTML(
             "<h1 class='header-title'>🗣️ Quevedo Voice Model (so-vits-svc-fork)</h1>"
-            "<p class='header-desc'>Convierte cualquier archivo de voz o canto en la voz de Quevedo.</p>"
-            "<p style='color: #6b7280; font-size: 0.9rem; margin-top: -10px;'>"
-            "Convert any voice or singing file into the voice of Quevedo.</p>"
         )
     # Main conversion section
     with gr.Row(elem_classes=["main-box"]):
         with gr.Column(scale=1):
-            gr.Markdown("### 📥 1. Entrada de Audio / Audio Input")
             input_audio = gr.Audio(
-                label="Audio a Convertir (Voz limpia / Acapella)",
                 type="filepath",
                 sources=["upload", "microphone"]
             )
-            gr.Markdown("### ⚙️ 2. Parámetros de Conversión / Parameters")
             speaker = gr.Dropdown(
                 choices=speakers,
                 value=speakers[0],
-                label="Speaker (Personaje)"
             )
             transpose = gr.Slider(
@@ -154,22 +150,22 @@ with gr.Blocks(css=custom_css, theme=gr.themes.Soft()) as demo:
                 maximum=12,
                 value=0,
                 step=1,
-                label="Ajuste de Tono (Semitonos) / Pitch Shift (Semitones)",
-                info="Aumenta para voces femeninas a masculinas (ej: -5 a -12), o disminuye para masculinas a femeninas."
             )
-            with gr.Accordion("Parámetros Avanzados / Advanced Options", open=False):
                 auto_predict_f0 = gr.Checkbox(
                     value=False,
-                    label="Auto Predicción de F0 / Auto Predict F0",
-                    info="Recomendado para voz hablada. DESMARCAR para canto / Recommended for speech, UNCHECK for singing."
                 )
                 f0_method = gr.Dropdown(
                     choices=["crepe", "crepe-tiny", "parselmouth", "dio", "harvest"],
                     value="crepe",
-                    label="F0 Predictor (Algoritmo de Tono)",
-                    info="crepe es el de mayor calidad pero más lento; dio es el más rápido."
                 )
                 noise_scale = gr.Slider(
@@ -177,25 +173,25 @@ with gr.Blocks(css=custom_css, theme=gr.themes.Soft()) as demo:
                     maximum=1.0,
                     value=0.4,
                     step=0.05,
-                    label="Escala de Ruido / Noise Scale",
-                    info="Controla la vibración/varianza del tono (0.4 es estándar)."
                 )
         with gr.Column(scale=1):
-            gr.Markdown("### 📤 3. Resultado / Output")
             output_audio = gr.Audio(
-                label="Audio Convertido (Quevedo)",
                 type="filepath"
             )
             status_output = gr.Textbox(
-                label="Estado / Status",
-                value="Listo / Ready",
                 interactive=False
             )
             submit_btn = gr.Button(
-                "Convertir Voz / Convert 🚀",
                 variant="primary",
                 elem_classes=["convert-btn"]
             )
@@ -209,8 +205,8 @@ with gr.Blocks(css=custom_css, theme=gr.themes.Soft()) as demo:
     # Footer
     gr.HTML(
         "<div style='text-align: center; color: #4b5563; font-size: 0.85rem; padding: 20px 0;'>"
-        "Modelo para fines de demostración artística y de investigación. "
-        "Usa so-vits-svc-fork para la inferencia.<br>"
         "Developed with 💜 for the open voice community.</div>"
     )

 # Inference function
 def convert_voice(input_audio, speaker, transpose, auto_predict_f0, f0_method, noise_scale):
     if input_audio is None:
+        return None, "Please upload an audio file or use the microphone."
     input_path = Path(input_audio)
     output_path = input_path.parent / f"{input_path.stem}_quevedo.wav"
         from so_vits_svc_fork.inference.main import infer
     except ImportError:
         return None, (
             "Error: 'so-vits-svc-fork' is not installed in this environment.\n"
+            "Please run: pip install so-vits-svc-fork"
         )
     if not MODEL_PATH.exists():
+        return None, f"Error: Model file {MODEL_PATH} not found."
     if not CONFIG_PATH.exists():
+        return None, f"Error: Config file {CONFIG_PATH} not found."
     try:
         # Perform inference using the fork's main infer function
         )
         if output_path.exists():
+            return str(output_path), "Conversion completed successfully!"
         else:
+            return None, "Error: Output file was not generated."
     except Exception as e:
+        return None, f"Error during inference: {str(e)}"
 # Custom CSS for premium styling matching the blue-purple theme
 custom_css = """
     with gr.Row(elem_classes=["header-area"]):
         gr.HTML(
             "<h1 class='header-title'>🗣️ Quevedo Voice Model (so-vits-svc-fork)</h1>"
+            "<p class='header-desc'>Convert any voice or singing file into the voice of the Spanish singer Quevedo.</p>"
         )
     # Main conversion section
     with gr.Row(elem_classes=["main-box"]):
         with gr.Column(scale=1):
+            gr.Markdown("### 📥 1. Audio Input")
             input_audio = gr.Audio(
+                label="Audio to Convert (Clean Vocals / A Cappella)",
                 type="filepath",
                 sources=["upload", "microphone"]
             )
+            gr.Markdown("### ⚙️ 2. Conversion Parameters")
             speaker = gr.Dropdown(
                 choices=speakers,
                 value=speakers[0],
+                label="Speaker Name"
             )
             transpose = gr.Slider(
                 maximum=12,
                 value=0,
                 step=1,
+                label="Pitch Shift (Semitones)",
+                info="Decrease for female-to-male conversion (e.g. -5 to -12), or increase for male-to-female."
             )
+            with gr.Accordion("Advanced Options", open=False):
                 auto_predict_f0 = gr.Checkbox(
                     value=False,
+                    label="Auto Predict F0",
+                    info="Recommended for speech/narration. UNCHECK for singing to preserve notes."
                 )
                 f0_method = gr.Dropdown(
                     choices=["crepe", "crepe-tiny", "parselmouth", "dio", "harvest"],
                     value="crepe",
+                    label="F0 Predictor Algorithm",
+                    info="crepe offers the best quality but is slower; dio is the fastest."
                 )
                 noise_scale = gr.Slider(
                     maximum=1.0,
                     value=0.4,
                     step=0.05,
+                    label="Noise Scale",
+                    info="Scales the random noise used during synthesis; higher values add variation (0.4 is standard)."
                 )
         with gr.Column(scale=1):
+            gr.Markdown("### 📤 3. Output Audio")
             output_audio = gr.Audio(
+                label="Converted Audio",
                 type="filepath"
             )
             status_output = gr.Textbox(
+                label="Status",
+                value="Ready",
                 interactive=False
             )
             submit_btn = gr.Button(
+                "Convert Voice 🚀",
                 variant="primary",
                 elem_classes=["convert-btn"]
             )
     # Footer
     gr.HTML(
         "<div style='text-align: center; color: #4b5563; font-size: 0.85rem; padding: 20px 0;'>"
+        "This model is for artistic demonstration and research purposes only. "
+        "Uses so-vits-svc-fork for inference.<br>"
         "Developed with 💜 for the open voice community.</div>"
     )

assets/banner.png CHANGED

Git LFS Details

SHA256: f5b1a24adac630a6dd86b26eda89683e2cb7ee9234d317ea4b09baabf0dcc10d
Pointer size: 131 Bytes
Size of remote file: 913 kB

Git LFS Details

SHA256: 5dc3064d6848675057578fa442c526bb9ef846570210c212b2cb4a74943130be
Pointer size: 128 Bytes
Size of remote file: 131 Bytes