Spaces: IAsistemofinteres/test

Build error

IAsistemofinteres committed on Jan 18

Commit 7358212 (verified) · 1 parent: 407cefd

Upload 7 files

Files changed (8)

.gitattributes +1 -0
.gitignore +48 -0
COLAB_SETUP.md +254 -0
Chatterbox_TTS_Colab.ipynb +614 -0
Run Chatterbox TTS.bat +10 -0
VC_redist.x64.exe +3 -0
app.py +505 -0
requirements.txt +14 -0

.gitattributes CHANGED

@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+VC_redist.x64.exe filter=lfs diff=lfs merge=lfs -text

.gitignore ADDED

	@@ -0,0 +1,48 @@

+.vscode
+# Pylance
+pyrightconfig.json
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*$py.class
+# C extensions
+*.so
+# Distribution / packaging
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+MANIFEST
+# PyInstaller
+#  Usually these files are written by a python script from a template
+#  before PyInstaller builds the exe, so as to inject date/other infos into it.
+*.manifest
+*.spec
+# Installer logs
+pip-log.txt
+pip-delete-this-directory.txt
+syn_out/
+checkpoints/
+.gradio
+# Ignore generated sample .wav files
+**/*.wav

COLAB_SETUP.md ADDED

	@@ -0,0 +1,254 @@

+# 🚀 Google Colab Setup Guide for Chatterbox TTS
+This guide will help you run Chatterbox TTS on Google Colab for free!
+## 📋 Prerequisites
+- Google account (for Google Colab access)
+- Your Chatterbox TTS project files
+## 🎯 Quick Start
+### Option 1: Using GitHub Repository (Recommended)
+1. **Create an empty `chatterbox-tts` repository on GitHub, then push your code to it:**
+   ```bash
+   git init
+   git add .
+   git commit -m "Initial commit"
+   git branch -M main
+   git remote add origin https://github.com/YOUR_USERNAME/chatterbox-tts.git
+   git push -u origin main
+   ```
+2. **Upload the notebook to Google Colab:**
+   - Go to [Google Colab](https://colab.research.google.com/)
+   - Click `File` → `Upload notebook`
+   - Upload `Chatterbox_TTS_Colab.ipynb`
+3. **Update the repository URL:**
+   - In the notebook, find Step 2 (Clone Repository)
+   - Replace `YOUR_USERNAME` with your actual GitHub username
+4. **Run all cells:**
+   - Click `Runtime` → `Run all`
+   - Wait for setup to complete (~5-10 minutes)
+   - Click the public URL when it appears
+### Option 2: Manual Upload (No GitHub Required)
+1. **Prepare your project:**
+   - Create a ZIP file of your entire project folder
+   - Include: `app.py`, `modules/`, `src/`, `requirements.txt`
+2. **Upload to Colab:**
+   - Go to [Google Colab](https://colab.research.google.com/)
+   - Upload `Chatterbox_TTS_Colab.ipynb`
+   - In Step 2, skip the clone cell and uncomment the "Alternative: Upload Files Manually" cell instead
+   - Run that cell, upload your ZIP file, then `%cd` into the extracted folder that contains `app.py`
+3. **Run the notebook:**
+   - Run all remaining cells
+   - Wait for the Gradio interface to launch
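A ZIP of a whole project folder often extracts into a subfolder, in which case Step 3's relative checks (`app.py`, `src`, `modules`) report everything as missing until you change into it. A minimal sketch for locating that folder, assuming the extracted tree has its main `app.py` at the project root; `find_project_root` is a hypothetical helper, not part of the notebook.

```python
from pathlib import Path
from typing import Optional


def find_project_root(start: str = ".") -> Optional[Path]:
    """Return the shallowest directory under `start` that contains app.py, or None."""
    base = Path(start)
    if (base / "app.py").is_file():
        return base
    # rglob walks the whole tree; sorting by depth makes the top-level app.py win
    candidates = sorted(base.rglob("app.py"), key=lambda p: (len(p.parts), str(p)))
    return candidates[0].parent if candidates else None
```

After the upload cell, `os.chdir(find_project_root("."))` has the same effect as the `%cd` in the clone cell. Run it before mounting Google Drive (or pass a narrower start directory), since `rglob` would otherwise walk the mounted Drive as well.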
+## ⚙️ Configuration
+### Enable GPU (Highly Recommended)
+1. Click `Runtime` → `Change runtime type`
+2. Select `T4 GPU` under Hardware accelerator
+3. Click `Save`
+Inference is much faster on a GPU than on the default CPU runtime.
+### Adjust Settings
+You can modify these in the notebook:
+- **Model download location**: Models are cached in `/root/.cache/huggingface/`
+- **Gradio share link**: `share=True` (already set in the launch cell) creates the public URL
+- **Queue settings**: Adjust `max_size` and `default_concurrency_limit` in `demo.queue(...)` as needed
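To keep downloaded models between sessions, the Hugging Face cache can be moved off the VM's disk by setting the `HF_HOME` environment variable. A minimal sketch, assuming Google Drive is mounted at `/content/drive` (as in the Pro Tips section) and that `hf_cache` is a folder name of your choosing; `use_persistent_hf_cache` is a hypothetical helper. It has to run before `huggingface_hub` or `transformers` is imported (so before Step 4), because those libraries read `HF_HOME` when they are first imported.

```python
import os


def use_persistent_hf_cache(cache_dir: str = "/content/drive/MyDrive/hf_cache") -> str:
    """Point the Hugging Face cache at `cache_dir` and return the new HF_HOME value.

    Call this before importing huggingface_hub or transformers; they read
    HF_HOME once, at import time.
    """
    os.makedirs(cache_dir, exist_ok=True)  # create the folder on Drive if needed
    os.environ["HF_HOME"] = cache_dir
    return os.environ["HF_HOME"]
```

Reading from Drive is slower than the local disk, so this trades load speed for not re-downloading the models every session.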
+## 📦 What Gets Installed
+The notebook automatically installs:
+- PyTorch 2.7.1 and torchaudio 2.7.1 (CUDA 11.8 builds)
+- Gradio 5.44.1
+- All Chatterbox TTS dependencies
+- System packages (ffmpeg)
+Total installation time: ~3-5 minutes
+## 🎮 Using the Application
+Once launched, you'll see a public URL like:
+```
+Running on public URL: https://xxxxx.gradio.live
+```
+Click this URL to access your TTS application from anywhere!
+### Features Available:
+1. **⚡ Turbo TTS** - Fast synthesis with paralinguistic tags
+2. **🎤 TTS Main** - Advanced English TTS
+3. **🌍 Multilingual TTS** - Multiple language support
+4. **🔄 Voice Conversion** - Convert voices
+5. **🧬 Clone Voice** - Clone custom voices
+## ⚠️ Important Notes
+### Session Limits
+- **Free Colab**: ~12 hours max session time
+- **GPU usage**: Limited; free-tier quotas vary over time and are not guaranteed
+- **Files are temporary**: Download outputs before session ends!
+### Saving Your Work
+To save generated audio, either:
+- Use the download button on the Gradio audio player, or
+- Right-click the audio player and select "Download"
+### Keeping Session Alive
+Colab disconnects after inactivity. To prevent this:
+- Install [Colab Keep Alive](https://chrome.google.com/webstore/detail/colab-alive/eookkckfbbgnhdgcbfbicoahejkdoele) extension
+- Or periodically interact with the notebook
+## 🐛 Troubleshooting
+### "Out of Memory" Error
+**Solution:**
+1. Restart the runtime: `Runtime` → `Restart runtime`
+2. Or upgrade to Colab Pro for more RAM
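Before restarting, it can be worth trying to release memory in place. A minimal sketch (the helper name `free_memory` is hypothetical): it only helps once the Python references to a loaded model have been dropped, for example with `del`, because `torch.cuda.empty_cache()` returns cached but unused blocks to the driver and cannot free memory that live tensors still occupy.

```python
import gc


def free_memory() -> bool:
    """Collect garbage and release PyTorch's cached CUDA memory if possible.

    Returns True when a CUDA cache was cleared, False otherwise
    (PyTorch not installed, or no GPU available).
    """
    gc.collect()  # drop unreachable Python objects first, including orphaned tensors
    try:
        import torch
    except ImportError:
        return False
    if torch.cuda.is_available():
        torch.cuda.empty_cache()  # hand unused cached blocks back to the driver
        return True
    return False
```

If `!nvidia-smi` still shows high usage afterwards, something still holds the model, and a runtime restart is the reliable fix.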
+### "Module Not Found" Error
+**Solution:**
+1. Re-run Step 1 (Install Dependencies)
+2. Make sure every cell completes without errors; Step 1 starts with `%%capture`, so remove that line temporarily to see pip's error messages
+### Models Won't Download
+**Solution:**
+1. Check internet connection
+2. Verify HuggingFace Hub is accessible
+3. Try running Step 4 again
+4. Check if `token=False` is set (no authentication required)
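To see which model repositories already reached the local cache, list the cache folder. A minimal sketch that relies on the `huggingface_hub` cache layout, where a repo such as `ResembleAI/chatterbox` is stored in a folder named `models--ResembleAI--chatterbox`; the default path resolution below checks `HF_HUB_CACHE` and `HF_HOME` but ignores `XDG_CACHE_HOME`, and both helper names are hypothetical. A folder's presence only shows that a download started, not that every file finished.

```python
import os
from pathlib import Path
from typing import List, Optional


def default_hub_cache() -> Path:
    """HF_HUB_CACHE if set, else $HF_HOME/hub, else ~/.cache/huggingface/hub."""
    if os.environ.get("HF_HUB_CACHE"):
        return Path(os.environ["HF_HUB_CACHE"])
    hf_home = os.environ.get(
        "HF_HOME", os.path.join(os.path.expanduser("~"), ".cache", "huggingface")
    )
    return Path(hf_home) / "hub"


def cached_model_repos(hub_cache: Optional[str] = None) -> List[str]:
    """Return repo ids of cached model repos, e.g. ["ResembleAI/chatterbox"]."""
    root = Path(hub_cache) if hub_cache else default_hub_cache()
    if not root.is_dir():
        return []
    repos = []
    for entry in sorted(root.iterdir()):
        if entry.is_dir() and entry.name.startswith("models--"):
            # "models--ResembleAI--chatterbox" -> "ResembleAI/chatterbox"
            repos.append(entry.name[len("models--"):].replace("--", "/", 1))
    return repos
```

After Step 4 succeeds, both `ResembleAI/chatterbox-turbo` and `ResembleAI/chatterbox` should appear in the list.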
+### Import Errors
+**Solution:**
+Make sure the project structure is correct:
+```
+├── app.py
+├── modules/
+│   ├── config.py
+│   ├── generation_functions.py
+│   ├── model_manager.py
+│   ├── ui_components.py
+│   └── voice_manager.py
+└── src/
+    └── chatterbox/
+        ├── tts.py
+        └── tts_turbo.py
+```
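Step 3 only checks top-level entries, so a missing module file surfaces later as an import error in the launch cell. A minimal sketch that checks the tree above file by file; `missing_project_files` is a hypothetical helper, and the list is not exhaustive (`app.py` also imports `chatterbox.vc` and `chatterbox.mtl_tts`).

```python
from pathlib import Path
from typing import Iterable, List

# Relative paths taken from the project tree above
EXPECTED_FILES = [
    "app.py",
    "modules/config.py",
    "modules/generation_functions.py",
    "modules/model_manager.py",
    "modules/ui_components.py",
    "modules/voice_manager.py",
    "src/chatterbox/tts.py",
    "src/chatterbox/tts_turbo.py",
]


def missing_project_files(root: str = ".", expected: Iterable[str] = EXPECTED_FILES) -> List[str]:
    """Return the expected relative paths that are not files under `root`."""
    base = Path(root)
    return [rel for rel in expected if not (base / rel).is_file()]
```

An empty list means every listed file is present; run it from the project directory (after the `%cd` in Step 2).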
+### Gradio Won't Launch
+**Solution:**
+1. Check whether the port is already in use
+2. Restart the runtime and try again
+3. Make sure all previous cells ran successfully
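Gradio serves on port 7860 by default, and an app left running from an earlier launch can hold that port. A minimal check, where `port_in_use` is a hypothetical helper: it simply tries a TCP connection and reports whether something accepted it.

```python
import socket


def port_in_use(port: int = 7860, host: str = "127.0.0.1") -> bool:
    """Return True if something is already accepting TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)  # don't hang on an unresponsive host
        return sock.connect_ex((host, port)) == 0  # 0 means the connection succeeded
```

If it returns `True`, restart the runtime, or pass a different port to `launch(server_port=...)`.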
+## 🚀 Performance Optimization
+### For Faster Loading:
+1. **Pre-download models** (Step 4)
+   - Run it before launching the app
+   - Models stay cached on the VM's disk until the runtime is deleted or recycled
+2. **Use GPU runtime**
+   - T4 GPU is free and fast
+   - Significantly faster than CPU
+3. **Reduce queue size**
+   ```python
+   demo.queue(max_size=10)  # Lower = less memory
+   ```
+### For Better Quality:
+1. **Adjust generation parameters**
+   - Temperature: 0.7-0.9 for more natural speech
+   - CFG weight: 0.3-0.7 for better control
+2. **Use high-quality voice samples**
+   - 5+ seconds of clear audio
+   - Minimal background noise
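The 5-second guideline for voice samples is easy to check before cloning. A minimal sketch using the standard-library `wave` module, which reads plain PCM WAV files only; MP3 or other formats would need a decoder such as `librosa` (installed in Step 1). Both helper names are hypothetical.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Duration of an uncompressed PCM WAV file, in seconds."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()  # frames / (frames per second)


def long_enough(path: str, min_seconds: float = 5.0) -> bool:
    """True if the reference clip meets the 5-second minimum suggested above."""
    return wav_duration_seconds(path) >= min_seconds
```

Length is only half the criterion: a long clip with background noise can still clone poorly.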
+## 📊 Resource Usage
+Typical resource consumption:
+| Component | RAM | GPU Memory | Time |
+|-----------|-----|------------|------|
+| Setup | 2 GB | 0 GB | 3-5 min |
+| Model Load | 4 GB | 3-4 GB | 1-2 min |
+| Inference | 6 GB | 4-5 GB | 5-15 sec |
+**Recommended**: Colab Pro for heavy usage
+## 🔗 Useful Links
+- [Google Colab](https://colab.research.google.com/)
+- [Colab FAQ](https://research.google.com/colaboratory/faq.html)
+- [HuggingFace Hub](https://huggingface.co/)
+- [Gradio Documentation](https://gradio.app/docs/)
+## 💡 Pro Tips
+1. **Save notebook to Google Drive**
+   - File → Save a copy in Drive
+   - Your changes will persist
+2. **Mount Google Drive for persistent storage**
+   ```python
+   from google.colab import drive
+   drive.mount('/content/drive')
+   ```
+3. **Monitor GPU usage**
+   ```python
+   !nvidia-smi
+   ```
+4. **Clear outputs to save space**
+   - Edit → Clear all outputs
+5. **Use keyboard shortcuts**
+   - `Ctrl+Enter`: Run cell
+   - `Shift+Enter`: Run cell and select next
+   - `Ctrl+M B`: Insert cell below
+## 🎉 You're All Set!
+Enjoy using Chatterbox TTS on Google Colab!
+For issues or questions, please open an issue on GitHub.
+---
+**Happy Synthesizing! 🎙️**

Chatterbox_TTS_Colab.ipynb ADDED

	@@ -0,0 +1,614 @@

+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "header"
+   },
+   "source": [
+    "# 🎙️ Chatterbox TTS - Google Colab Edition\n",
+    "\n",
+    "High-quality voice cloning, text-to-speech & voice conversion powered by Chatterbox.\n",
+    "\n",
+    "**Features:**\n",
+    "- ⚡ **Turbo TTS** - Fast, high-quality speech synthesis with paralinguistic tags\n",
+    "- 🎤 **TTS Main** - Advanced English TTS with fine-tuned controls\n",
+    "- 🌍 **Multilingual TTS** - Support for multiple languages\n",
+    "- 🔄 **Voice Conversion** - Convert any voice to your target voice\n",
+    "- 🧬 **Voice Cloning** - Clone voices from audio samples\n",
+    "\n",
+    "---"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "setup-header"
+   },
+   "source": [
+    "## 📦 Step 1: Setup Environment\n",
+    "\n",
+    "This will install all required dependencies. Takes ~3-5 minutes."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "id": "install-dependencies"
+   },
+   "outputs": [],
+   "source": [
+    "%%capture\n",
+    "# Install system dependencies\n",
+    "!apt-get update -qq\n",
+    "!apt-get install -y -qq ffmpeg\n",
+    "\n",
+    "# Install Python packages\n",
+    "!pip install -q 'numpy>=1.24.0,<1.26.0'\n",
+    "!pip install -q librosa==0.11.0\n",
+    "!pip install -q s3tokenizer\n",
+    "!pip install -q torch==2.7.1 torchaudio==2.7.1 --index-url https://download.pytorch.org/whl/cu118\n",
+    "!pip install -q transformers==4.46.3\n",
+    "!pip install -q diffusers==0.29.0\n",
+    "!pip install -q resemble-perth==1.0.1\n",
+    "!pip install -q conformer==0.3.2\n",
+    "!pip install -q safetensors==0.5.3\n",
+    "!pip install -q pykakasi==2.3.0\n",
+    "!pip install -q gradio==5.44.1\n",
+    "!pip install -q pyloudnorm\n",
+    "!pip install -q omegaconf\n",
+    "!pip install -q huggingface_hub\n",
+    "\n",
+    "print(\"✅ All dependencies installed successfully!\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "clone-header"
+   },
+   "source": [
+    "## 📥 Step 2: Clone Repository\n",
+    "\n",
+    "Download the Chatterbox TTS application code."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "id": "clone-repo"
+   },
+   "outputs": [],
+   "source": [
+    "import os\n",
+    "\n",
+    "# Clone the repository (replace with your actual repo URL)\n",
+    "if not os.path.exists('chatterbox-tts'):\n",
+    "    !git clone https://github.com/YOUR_USERNAME/chatterbox-tts.git\n",
+    "    print(\"✅ Repository cloned!\")\n",
+    "else:\n",
+    "    print(\"ℹ️ Repository already exists\")\n",
+    "\n",
+    "# Change to project directory\n",
+    "%cd chatterbox-tts"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "alternative-setup"
+   },
+   "source": [
+    "### Alternative: Upload Files Manually\n",
+    "\n",
+    "If you don't have a GitHub repo, run this cell to upload your project files:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "id": "upload-files"
+   },
+   "outputs": [],
+   "source": [
+    "# Uncomment and run this if you want to upload files manually\n",
+    "# from google.colab import files\n",
+    "# import zipfile\n",
+    "\n",
+    "# print(\"Please upload your project as a ZIP file:\")\n",
+    "# uploaded = files.upload()\n",
+    "\n",
+    "# for filename in uploaded.keys():\n",
+    "#     print(f\"Extracting {filename}...\")\n",
+    "#     with zipfile.ZipFile(filename, 'r') as zip_ref:\n",
+    "#         zip_ref.extractall('.')\n",
+    "#     print(\"✅ Files extracted!\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "verify-header"
+   },
+   "source": [
+    "## 🔍 Step 3: Verify Installation\n",
+    "\n",
+    "Check if everything is set up correctly."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "id": "verify-setup"
+   },
+   "outputs": [],
+   "source": [
+    "import os\n",
+    "import sys\n",
+    "import torch\n",
+    "\n",
+    "# Report Python, PyTorch and CUDA details\n",
+    "print(f\"🔧 Python version: {sys.version}\")\n",
+    "print(f\"🔥 PyTorch version: {torch.__version__}\")\n",
+    "print(f\"🎮 CUDA available: {torch.cuda.is_available()}\")\n",
+    "if torch.cuda.is_available():\n",
+    "    print(f\"🎮 CUDA device: {torch.cuda.get_device_name(0)}\")\n",
+    "    print(f\"💾 CUDA memory: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.2f} GB\")\n",
+    "\n",
+    "# Verify project structure\n",
+    "required_files = ['app.py', 'requirements.txt', 'src', 'modules']\n",
+    "for item in required_files:\n",
+    "    if os.path.exists(item):\n",
+    "        print(f\"✅ Found: {item}\")\n",
+    "    else:\n",
+    "        print(f\"❌ Missing: {item}\")\n",
+    "\n",
+    "print(\"\\n✅ Setup verification complete!\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "download-models-header"
+   },
+   "source": [
+    "## 🤖 Step 4: Download Models (Optional Pre-download)\n",
+    "\n",
+    "Pre-download the models to speed up first use. This is optional - models will auto-download when first used."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "id": "download-models"
+   },
+   "outputs": [],
+   "source": [
+    "from huggingface_hub import snapshot_download\n",
+    "\n",
+    "print(\"📥 Downloading Chatterbox Turbo model...\")\n",
+    "snapshot_download(\n",
+    "    repo_id=\"ResembleAI/chatterbox-turbo\",\n",
+    "    token=False,  # Use public access without authentication\n",
+    "    allow_patterns=[\"*.safetensors\", \"*.json\", \"*.txt\", \"*.pt\", \"*.model\"]\n",
+    ")\n",
+    "print(\"✅ Turbo model downloaded!\")\n",
+    "\n",
+    "print(\"\\n📥 Downloading Chatterbox main model...\")\n",
+    "from huggingface_hub import hf_hub_download\n",
+    "for fpath in [\"ve.safetensors\", \"t3_cfg.safetensors\", \"s3gen.safetensors\", \"tokenizer.json\", \"conds.pt\"]:\n",
+    "    hf_hub_download(repo_id=\"ResembleAI/chatterbox\", filename=fpath)\n",
+    "print(\"✅ Main model downloaded!\")\n",
+    "\n",
+    "print(\"\\n🎉 All models ready!\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "launch-header"
+   },
+   "source": [
+    "## 🚀 Step 5: Launch Application\n",
+    "\n",
+    "Start the Gradio interface. Click the public URL to access the app!"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "id": "launch-app"
+   },
+   "outputs": [],
+   "source": [
+    "# Add project paths\n",
+    "import sys\n",
+    "import os\n",
+    "\n",
+    "project_root = os.getcwd()\n",
+    "if project_root not in sys.path:\n",
+    "    sys.path.append(project_root)\n",
+    "\n",
+    "src_path = os.path.join(project_root, \"src\")\n",
+    "if src_path not in sys.path:\n",
+    "    sys.path.append(src_path)\n",
+    "\n",
+    "# Import and launch\n",
+    "import gradio as gr\n",
+    "from modules.config import LANGUAGE_CONFIG, SUPPORTED_LANGUAGES\n",
+    "from modules.voice_manager import (\n",
+    "    load_voices, \n",
+    "    get_voices_for_language, \n",
+    "    get_all_voices_with_gender,\n",
+    "    resolve_voice_path,\n",
+    "    clone_voice,\n",
+    "    delete_voice\n",
+    ")\n",
+    "from modules.generation_functions import (\n",
+    "    generate_speech,\n",
+    "    generate_multilingual_speech,\n",
+    "    convert_voice,\n",
+    "    generate_turbo_speech\n",
+    ")\n",
+    "from modules.ui_components import (\n",
+    "    create_header,\n",
+    "    create_tts_tab,\n",
+    "    create_multilingual_tab,\n",
+    "    create_voice_conversion_tab,\n",
+    "    create_clone_voice_tab,\n",
+    "    create_turbo_tab\n",
+    ")\n",
+    "\n",
+    "# Load voices\n",
+    "available_voices = load_voices()\n",
+    "\n",
+    "# Custom CSS\n",
+    "CUSTOM_CSS = \"\"\"\n",
+    ".tag-container {\n",
+    "    display: flex !important;\n",
+    "    flex-wrap: wrap !important;\n",
+    "    gap: 8px !important;\n",
+    "    margin-top: 5px !important;\n",
+    "    margin-bottom: 10px !important;\n",
+    "    border: none !important;\n",
+    "    background: transparent !important;\n",
+    "}\n",
+    ".tag-btn {\n",
+    "    min-width: fit-content !important;\n",
+    "    width: auto !important;\n",
+    "    height: 32px !important;\n",
+    "    font-size: 13px !important;\n",
+    "    background: #eef2ff !important;\n",
+    "    border: 1px solid #c7d2fe !important;\n",
+    "    color: #3730a3 !important;\n",
+    "    border-radius: 6px !important;\n",
+    "    padding: 0 10px !important;\n",
+    "    margin: 0 !important;\n",
+    "    box-shadow: none !important;\n",
+    "}\n",
+    ".tag-btn:hover {\n",
+    "    background: #c7d2fe !important;\n",
+    "    transform: translateY(-1px);\n",
+    "}\n",
+    "\"\"\"\n",
+    "\n",
+    "# Create Gradio app\n",
+    "with gr.Blocks(title=\"Chatterbox TTS Enhanced\", theme=gr.themes.Soft(), css=CUSTOM_CSS) as demo:\n",
+    "    # State variables\n",
+    "    tts_model_state = gr.State(None)\n",
+    "    vc_model_state = gr.State(None)\n",
+    "    mtl_model_state = gr.State(None)\n",
+    "    \n",
+    "    # Header\n",
+    "    create_header()\n",
+    "    \n",
+    "    # Create tabs\n",
+    "    with gr.Tab(\"⚡ Turbo TTS\"):\n",
+    "        turbo_components = create_turbo_tab()\n",
+    "\n",
+    "    with gr.Tab(\"🎤 TTS Main (English)\"):\n",
+    "        tts_components = create_tts_tab()\n",
+    "    \n",
+    "    with gr.Tab(\"🌍 Multilingual TTS\"):\n",
+    "        mtl_components = create_multilingual_tab()\n",
+    "    \n",
+    "    with gr.Tab(\"🔄 Voice Conversion\"):\n",
+    "        vc_components = create_voice_conversion_tab()\n",
+    "    \n",
+    "    with gr.Tab(\"🧬 Clone Voice\"):\n",
+    "        clone_components = create_clone_voice_tab()\n",
+    "    \n",
+    "    # Event Handlers - TTS Tab\n",
+    "    tts_components['generate_btn'].click(\n",
+    "        fn=generate_speech,\n",
+    "        inputs=[\n",
+    "            tts_components['text'],\n",
+    "            tts_components['voice_select'],\n",
+    "            tts_components['exaggeration'],\n",
+    "            tts_components['temp'],\n",
+    "            tts_components['seed_num'],\n",
+    "            tts_components['cfg_weight'],\n",
+    "            tts_components['min_p'],\n",
+    "            tts_components['top_p'],\n",
+    "            tts_components['repetition_penalty']\n",
+    "        ],\n",
+    "        outputs=[\n",
+    "            tts_components['progress_bar'],\n",
+    "            tts_components['audio_output'],\n",
+    "            tts_components['status_box']\n",
+    "        ]\n",
+    "    )\n",
+    "    \n",
+    "    def update_tts_preview(voice_name):\n",
+    "        path = resolve_voice_path(voice_name, \"en\")\n",
+    "        return path\n",
+    "\n",
+    "    tts_components['voice_select'].change(\n",
+    "        fn=update_tts_preview,\n",
+    "        inputs=[tts_components['voice_select']],\n",
+    "        outputs=[tts_components['preview_audio']]\n",
+    "    )\n",
+    "    \n",
+    "    # Event Handlers - Turbo Tab\n",
+    "    turbo_components['generate_btn'].click(\n",
+    "        fn=generate_turbo_speech,\n",
+    "        inputs=[\n",
+    "            turbo_components['text'],\n",
+    "            turbo_components['voice_select']\n",
+    "        ],\n",
+    "        outputs=[\n",
+    "            turbo_components['progress_bar'],\n",
+    "            turbo_components['audio_output'],\n",
+    "            turbo_components['status_box']\n",
+    "        ]\n",
+    "    )\n",
+    "    \n",
+    "    def update_turbo_preview(voice_name):\n",
+    "        path = resolve_voice_path(voice_name, \"en\")\n",
+    "        return path\n",
+    "\n",
+    "    turbo_components['voice_select'].change(\n",
+    "        fn=update_turbo_preview,\n",
+    "        inputs=[turbo_components['voice_select']],\n",
+    "        outputs=[turbo_components['preview_audio']]\n",
+    "    )\n",
+    "    \n",
+    "    # Tag insertion buttons (Turbo)\n",
+    "    INSERT_TAG_JS = \"\"\"\n",
+    "    (tag_val, current_text) => {\n",
+    "        const textarea = document.querySelector('#turbo_textbox textarea');\n",
+    "        if (!textarea) return current_text + \" \" + tag_val;\n",
+    "        const start = textarea.selectionStart;\n",
+    "        const end = textarea.selectionEnd;\n",
+    "        let prefix = \" \";\n",
+    "        let suffix = \" \";\n",
+    "        if (start === 0) prefix = \"\";\n",
+    "        else if (current_text[start - 1] === ' ') prefix = \"\";\n",
+    "        if (end < current_text.length && current_text[end] === ' ') suffix = \"\";\n",
+    "        return current_text.slice(0, start) + prefix + tag_val + suffix + current_text.slice(end);\n",
+    "    }\n",
+    "    \"\"\"\n",
+    "    \n",
+    "    # Tag button handlers\n",
+    "    for tag_name in ['clear_throat', 'sigh', 'shush', 'cough', 'groan', 'sniff', 'gasp', 'chuckle', 'laugh']:\n",
+    "        btn_key = f'btn_{tag_name}'\n",
+    "        if btn_key in turbo_components:\n",
+    "            turbo_components[btn_key].click(\n",
+    "                fn=None,\n",
+    "                inputs=[turbo_components[btn_key], turbo_components['text']],\n",
+    "                outputs=turbo_components['text'],\n",
+    "                js=INSERT_TAG_JS\n",
+    "            )\n",
+    "    \n",
+    "    # Event Handlers - Multilingual Tab\n",
+    "    mtl_components['generate_btn'].click(\n",
+    "        fn=generate_multilingual_speech,\n",
+    "        inputs=[\n",
+    "            mtl_components['text'],\n",
+    "            mtl_components['voice_select'],\n",
+    "            mtl_components['language_select'],\n",
+    "            mtl_components['exaggeration'],\n",
+    "            mtl_components['temp'],\n",
+    "            mtl_components['seed_num'],\n",
+    "            mtl_components['cfg_weight']\n",
+    "        ],\n",
+    "        outputs=[\n",
+    "            mtl_components['progress_bar'],\n",
+    "            mtl_components['audio_output'],\n",
+    "            mtl_components['status_box']\n",
+    "        ]\n",
+    "    )\n",
+    "    \n",
+    "    mtl_components['language_select'].change(\n",
+    "        fn=lambda lang: (\n",
+    "            LANGUAGE_CONFIG.get(lang, {}).get(\"text\", \"\"),\n",
+    "            gr.update(choices=get_voices_for_language(lang), value=f\"Default ({SUPPORTED_LANGUAGES.get(lang, lang)})\")\n",
+    "        ),\n",
+    "        inputs=[mtl_components['language_select']],\n",
+    "        outputs=[mtl_components['text'], mtl_components['voice_select']]\n",
+    "    )\n",
+    "    \n",
+    "    def update_mtl_preview(voice_name, language_code):\n",
+    "        path = resolve_voice_path(voice_name, language_code)\n",
+    "        return path\n",
+    "\n",
+    "    mtl_components['voice_select'].change(\n",
+    "        fn=update_mtl_preview,\n",
+    "        inputs=[mtl_components['voice_select'], mtl_components['language_select']],\n",
+    "        outputs=[mtl_components['sample_audio']]\n",
+    "    )\n",
+    "    \n",
+    "    # Event Handlers - Voice Conversion Tab\n",
+    "    vc_components['convert_btn'].click(\n",
+    "        fn=convert_voice,\n",
+    "        inputs=[vc_components['input_audio'], vc_components['target_voice_select']],\n",
+    "        outputs=[vc_components['progress_bar'], vc_components['audio_output'], vc_components['status_box']]\n",
+    "    )\n",
+    "    \n",
+    "    def update_vc_preview(voice_name):\n",
+    "        if voice_name == \"None\": \n",
+    "            return None\n",
+    "        \n",
+    "        clean_name = voice_name.replace(\" ♂️\", \"\").replace(\" ♀️\", \"\")\n",
+    "        \n",
+    "        if clean_name.startswith(\"Default (\"):\n",
+    "            lang_name = clean_name.split(\"(\")[1].split(\")\")[0]\n",
+    "            for code, name in SUPPORTED_LANGUAGES.items():\n",
+    "                if name == lang_name:\n",
+    "                    return LANGUAGE_CONFIG.get(code, {}).get(\"audio\")\n",
+    "        \n",
+    "        from modules.voice_manager import VOICES\n",
+    "        possible_names = [\n",
+    "            clean_name,\n",
+    "            f\"{clean_name}_male\",\n",
+    "            f\"{clean_name}_female\"\n",
+    "        ]\n",
+    "        \n",
+    "        for name in possible_names:\n",
+    "            if name in VOICES[\"samples\"]:\n",
+    "                return VOICES[\"samples\"][name]\n",
+    "        \n",
+    "        for code in SUPPORTED_LANGUAGES:\n",
+    "            for name in possible_names:\n",
+    "                full_name = f\"{name}_{code}\"\n",
+    "                if full_name in VOICES[\"samples\"]:\n",
+    "                    return VOICES[\"samples\"][full_name]\n",
+    "        \n",
+    "        return None\n",
+    "\n",
+    "    vc_components['target_voice_select'].change(\n",
+    "        fn=update_vc_preview,\n",
+    "        inputs=[vc_components['target_voice_select']],\n",
+    "        outputs=[vc_components['preview_audio']]\n",
+    "    )\n",
+    "    \n",
+    "    # Event Handlers - Clone Voice Tab\n",
+    "    clone_components['clone_btn'].click(\n",
+    "        fn=clone_voice,\n",
+    "        inputs=[\n",
+    "            clone_components['ref_audio_input'],\n",
+    "            clone_components['new_voice_name'],\n",
+    "            clone_components['voice_language'],\n",
+    "            clone_components['voice_gender']\n",
+    "        ],\n",
+    "        outputs=[clone_components['clone_status'], tts_components['voice_select']]\n",
+    "    ).then(\n",
+    "        fn=lambda: gr.update(choices=get_voices_for_language(\"en\")),\n",
+    "        outputs=[tts_components['voice_select']]\n",
+    "    ).then(\n",
+    "        fn=lambda: gr.update(choices=get_voices_for_language(\"en\")),\n",
+    "        outputs=[turbo_components['voice_select']]\n",
+    "    ).then(\n",
+    "        fn=lambda lang: gr.update(choices=get_voices_for_language(lang)),\n",
+    "        inputs=[mtl_components['language_select']],\n",
+    "        outputs=[mtl_components['voice_select']]\n",
+    "    ).then(\n",
+    "        fn=lambda: gr.update(choices=[\"None\"] + get_all_voices_with_gender()),\n",
+    "        outputs=[vc_components['target_voice_select']]\n",
+    "    ).then(\n",
+    "        fn=lambda: \"\\n\".join(load_voices()) if load_voices() else \"No voices cloned yet\",\n",
+    "        outputs=[clone_components['current_voices_display']]\n",
+    "    ).then(\n",
+    "        fn=lambda: gr.update(choices=[\"None\"] + get_all_voices_with_gender(), value=\"None\"),\n",
+    "        outputs=[clone_components['voice_to_delete']]\n",
+    "    )\n",
+    "    \n",
+    "    clone_components['delete_btn'].click(\n",
+    "        fn=delete_voice,\n",
+    "        inputs=[clone_components['voice_to_delete']],\n",
+    "        outputs=[clone_components['delete_status'], clone_components['voice_to_delete']]\n",
+    "    ).then(\n",
+    "        fn=lambda: gr.update(choices=get_voices_for_language(\"en\")),\n",
+    "        outputs=[tts_components['voice_select']]\n",
+    "    ).then(\n",
+    "        fn=lambda: gr.update(choices=get_voices_for_language(\"en\")),\n",
+    "        outputs=[turbo_components['voice_select']]\n",
+    "    ).then(\n",
+    "        fn=lambda lang: gr.update(choices=get_voices_for_language(lang)),\n",
+    "        inputs=[mtl_components['language_select']],\n",
+    "        outputs=[mtl_components['voice_select']]\n",
+    "    ).then(\n",
+    "        fn=lambda: gr.update(choices=[\"None\"] + get_all_voices_with_gender()),\n",
+    "        outputs=[vc_components['target_voice_select']]\n",
+    "    ).then(\n",
+    "        fn=lambda: \"\\n\".join(load_voices()) if load_voices() else \"No voices cloned yet\",\n",
+    "        outputs=[clone_components['current_voices_display']]\n",
+    "    )\n",
+    "\n",
+    "# Launch with public sharing enabled\n",
+    "print(\"\\n🚀 Launching Chatterbox TTS...\\n\")\n",
+    "demo.queue(\n",
+    "    max_size=50,\n",
+    "    default_concurrency_limit=1,\n",
+    ").launch(\n",
+    "    share=True,  # Create public URL\n",
+    "    debug=True,\n",
+    "    show_error=True\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "tips-header"
+   },
+   "source": [
+    "## 💡 Tips & Troubleshooting\n",
+    "\n",
+    "### Common Issues:\n",
+    "\n",
+    "1. **Out of Memory Error**\n",
+    "   - Use a smaller batch size\n",
+    "   - Restart runtime and clear outputs\n",
+    "   - Use Colab Pro for more RAM/GPU\n",
+    "\n",
+    "2. **Model Download Fails**\n",
+    "   - Check internet connection\n",
+    "   - Verify HuggingFace is accessible\n",
+    "   - Try running Step 4 again\n",
+    "\n",
+    "3. **Import Errors**\n",
+    "   - Restart runtime\n",
+    "   - Re-run Step 1 (dependencies)\n",
+    "   - Check if all files are uploaded\n",
+    "\n",
+    "### Performance Tips:\n",
+    "\n",
+    "- **Use GPU**: Runtime → Change runtime type → GPU (T4)\n",
+    "- **Keep session alive**: Install Colab Keep Alive extension\n",
+    "- **Save outputs**: Download generated audio before session expires\n",
+    "\n",
+    "### Colab Limitations:\n",
+    "\n",
+    "- Sessions last at most ~12 hours and disconnect earlier when idle\n",
+    "- GPU usage is limited (use wisely)\n",
+    "- Files are temporary (download important outputs)\n",
+    "\n",
+    "---\n",
+    "\n",
+    "**Enjoy using Chatterbox TTS! 🎉**"
+   ]
+  }
+ ],
+ "metadata": {
+  "accelerator": "GPU",
+  "colab": {
+   "gpuType": "T4",
+   "provenance": []
+  },
+  "kernelspec": {
+   "display_name": "Python 3",
+   "name": "python3"
+  },
+  "language_info": {
+   "name": "python"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}
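Because the Step 1 cell starts with `%%capture`, a failed `pip install` prints nothing and the cell still reports success. A minimal sketch for confirming the exact pins afterwards with the standard-library `importlib.metadata`; the `PINNED` mapping is copied from the install cell, `check_pins` is a hypothetical helper, and local build tags such as `+cu118` are ignored when comparing.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional

# Exact pins from the Step 1 install cell (distribution name -> version)
PINNED = {
    "torch": "2.7.1",
    "torchaudio": "2.7.1",
    "transformers": "4.46.3",
    "diffusers": "0.29.0",
    "resemble-perth": "1.0.1",
    "conformer": "0.3.2",
    "safetensors": "0.5.3",
    "pykakasi": "2.3.0",
    "librosa": "0.11.0",
    "gradio": "5.44.1",
}


def check_pins(pins: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Map each mismatched package to its installed version (None if not installed)."""
    problems: Dict[str, Optional[str]] = {}
    for name, wanted in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            problems[name] = None
            continue
        # "2.7.1+cu118" -> "2.7.1": drop the local build tag before comparing
        if installed.split("+", 1)[0] != wanted:
            problems[name] = installed
    return problems
```

Running `print(check_pins(PINNED) or "All pinned versions installed")` in a new cell after Step 1 shows any package that did not install as pinned.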

Run Chatterbox TTS.bat ADDED

	@@ -0,0 +1,10 @@

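+@echo off
+echo Starting Chatterbox Turbo TTS Enhanced App...
+echo.
+REM Use the portable Python interpreter bundled next to this script
+set PYTHON_DIR=%~dp0python
+set PYTHON_EXE=%PYTHON_DIR%\python.exe
+REM %~dp0 expands to this script's folder, including the trailing backslash
+set PROJECT_DIR=%~dp0
+if not exist "%PYTHON_EXE%" (
+    echo ERROR: "%PYTHON_EXE%" not found.
+    pause
+    exit /b 1
+)
+"%PYTHON_EXE%" "%PROJECT_DIR%app.py"
+REM Keep the window open so any error output stays visible
+pause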

VC_redist.x64.exe ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8995548dfffcde7c49987029c764355612ba6850ee09a7b6f0fddc85bdc5c280
+size 18558944
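These three lines are a Git LFS pointer (spec v1): because `.gitattributes` routes `VC_redist.x64.exe` through LFS, the repository stores only this stub, while the 18,558,944-byte installer (about 17.7 MiB) lives in LFS storage. A small parser makes the `key value` line format explicit; `parse_lfs_pointer` is a hypothetical helper, not part of git-lfs.

```python
from typing import Dict

POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:8995548dfffcde7c49987029c764355612ba6850ee09a7b6f0fddc85bdc5c280
size 18558944
"""


def parse_lfs_pointer(text: str) -> Dict[str, str]:
    """Split each 'key value' line of a Git LFS pointer into a dict entry."""
    fields = {}
    for line in text.splitlines():
        if line.strip():
            key, _, value = line.partition(" ")
            fields[key] = value
    return fields


info = parse_lfs_pointer(POINTER)
algo, _, digest = info["oid"].partition(":")  # "sha256", 64 hex characters
size_mib = int(info["size"]) / 1024**2
print(algo, len(digest), f"{size_mib:.1f} MiB")  # prints: sha256 64 17.7 MiB
```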

app.py ADDED

	@@ -0,0 +1,505 @@

+"""
+Chatterbox TTS Enhanced - Monolithic Pro Edition (UI UPDATE)
+Fixes: Progress bar moved below audio output.
+"""
+import sys
+import os
+import glob
+import shutil
+import time
+import random
+import re
+import numpy as np
+import torch
+import gradio as gr
+from pathlib import Path
+import gc
+# ==============================================================================
+# 0. PATH AND LIBRARY SETUP
+# ==============================================================================
+PROJECT_ROOT = os.path.dirname(os.path.abspath(__file__))
+SRC_PATH = os.path.join(PROJECT_ROOT, "src")
+# Add 'src' to the module search path
+if os.path.exists(SRC_PATH):
+    if SRC_PATH not in sys.path:
+        sys.path.append(SRC_PATH)
+        print(f"✅ Path 'src' added: {SRC_PATH}")
+else:
+    if PROJECT_ROOT not in sys.path:
+        sys.path.append(PROJECT_ROOT)
+# Try to import the REAL library
+try:
+    from chatterbox.tts import ChatterboxTTS
+    from chatterbox.vc import ChatterboxVC
+    from chatterbox.mtl_tts import ChatterboxMultilingualTTS, SUPPORTED_LANGUAGES
+    from chatterbox.tts_turbo import ChatterboxTurboTTS
+    print("✅ 'chatterbox' library imported successfully.")
+except ImportError as e:
+    print(f"❌ CRITICAL ERROR: could not import 'chatterbox'. Details: {e}")
+    print("⚠️  RUNNING IN MOCK MODE (NO REAL AUDIO)")
+    SUPPORTED_LANGUAGES = {"en": "English", "es": "Spanish"}
+    class MockModel:
+        def __init__(self, *args, **kwargs): self.sr = 24000
+        @classmethod
+        def from_pretrained(cls, device): return cls()
+        def generate(self, *args, **kwargs):
+            time.sleep(1)
+            return torch.zeros(1, 48000)
+    ChatterboxTTS = ChatterboxVC = ChatterboxMultilingualTTS = ChatterboxTurboTTS = MockModel
+# ==============================================================================
+# 1. CONFIGURATION AND UI UTILITIES
+# ==============================================================================
+VOICE_WAV_ROOT = os.path.join(PROJECT_ROOT, "modules", "voice_wav")
+os.makedirs(VOICE_WAV_ROOT, exist_ok=True)
+DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
+# --- HTML PROGRESS BAR GENERATOR ---
+def make_progress_html(percentage, message="Ready"):
+    color = "linear-gradient(90deg, #f97316 0%, #fbbf24 100%)"
+    if percentage == 100: color = "#22c55e"
+    html = f"""
+    <div style="display: flex; flex-direction: column; gap: 5px; width: 100%; margin-top: 10px;">
+        <div style="width: 100%; background-color: #334155; border-radius: 99px; height: 24px; border: 1px solid #475569; overflow: hidden; position: relative; box-shadow: inset 0 2px 4px rgba(0,0,0,0.3);">
+            <div style="width: {percentage}%; background: {color}; height: 100%; transition: width 0.3s ease-out, background 0.3s ease;"></div>
+            <div style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; display: flex; align-items: center; justify-content: center; color: white; font-size: 11px; font-weight: bold; text-shadow: 0 1px 2px rgba(0,0,0,0.5);">
+                {int(percentage)}%
+            </div>
+        </div>
+        <p style="color: #cbd5e1; font-size: 0.9em; margin: 0; padding: 0; white-space: pre-wrap; font-family: monospace;">{message}</p>
+    </div>
+    """
+    return html
+# ==============================================================================
+# 2. MODEL MANAGER & VOICE MANAGER
+# ==============================================================================
+class ModelManager:
+    def __init__(self):
+        self.tts_model = None; self.mtl_model = None; self.vc_model = None; self.turbo_model = None
+        self.current_model_type = None
+    def unload_all(self):
+        self.tts_model = None; self.mtl_model = None; self.vc_model = None; self.turbo_model = None
+        gc.collect()
+        if DEVICE == "cuda": torch.cuda.empty_cache()
+        self.current_model_type = None
+    def get_model(self, type_key):
+        if self.current_model_type != type_key:
+            print(f"🔄 Switching to {type_key.upper()} model...")
+            self.unload_all()
+            try:
+                if type_key == "tts": self.tts_model = ChatterboxTTS.from_pretrained(DEVICE)
+                elif type_key == "mtl": self.mtl_model = ChatterboxMultilingualTTS.from_pretrained(DEVICE)
+                elif type_key == "vc": self.vc_model = ChatterboxVC.from_pretrained(DEVICE)
+                elif type_key == "turbo": self.turbo_model = ChatterboxTurboTTS.from_pretrained(device=DEVICE)
+                self.current_model_type = type_key
+                print(f"✅ {type_key.upper()} loaded.")
+            except Exception as e:
+                print(f"❌ Error loading {type_key}: {e}")
+                return None
+        return getattr(self, f"{type_key}_model")
+model_manager = ModelManager()
+# --- VOICE DB ---
+VOICE_DB = {}; CATEGORY_LIST = []; FLAT_PATH_MAP = {}; ALL_VOICES_FLAT_LIST = []
+def scan_voice_wav_structure():
+    global VOICE_DB, CATEGORY_LIST, FLAT_PATH_MAP, ALL_VOICES_FLAT_LIST
+    VOICE_DB = {}; CATEGORY_LIST = []; FLAT_PATH_MAP = {}; ALL_VOICES_FLAT_LIST = []
+    if not os.path.exists(VOICE_WAV_ROOT): return
+    print(f"📂 Scanning for voices in: {VOICE_WAV_ROOT}...")
+    for root, dirs, files in os.walk(VOICE_WAV_ROOT):
+        for f in files:
+            if f.endswith((".wav", ".mp3")):
+                full_path = os.path.join(root, f)
+                rel_path = os.path.relpath(full_path, VOICE_WAV_ROOT)
+                parts = rel_path.split(os.sep)
+                if len(parts) >= 2:
+                    raw_cat = parts[0]
+                    clean_cat = raw_cat.replace("ElevenLabs_", "").replace("_female", "").replace("_male", "").replace("_", " ").strip().title()
+                    if clean_cat not in VOICE_DB:
+                        VOICE_DB[clean_cat] = {}
+                        if clean_cat not in CATEGORY_LIST: CATEGORY_LIST.append(clean_cat)
+                    lang = "en"
+                    if len(parts) >= 3:
+                        possible_lang = parts[1].lower()
+                        if len(possible_lang) == 2:
+                            lang = possible_lang
+                    icon = "♀️" if "_female" in raw_cat.lower() else ("♂️" if "_male" in raw_cat.lower() else "🎙️")
+                    raw_name = os.path.splitext(f)[0]
+                    display = f"{icon} {raw_name}"
+                    if lang not in VOICE_DB[clean_cat]: VOICE_DB[clean_cat][lang] = {}
+                    VOICE_DB[clean_cat][lang][display] = full_path
+                    FLAT_PATH_MAP[display] = full_path
+                    if display not in ALL_VOICES_FLAT_LIST: ALL_VOICES_FLAT_LIST.append(display)
+    CATEGORY_LIST.sort()
+    print(f"✅ Found {len(ALL_VOICES_FLAT_LIST)} voices.")
+def resolve_voice_path(voice_name, lang="en"):
+    if not voice_name or voice_name == "None": return None
+    if voice_name in FLAT_PATH_MAP: return FLAT_PATH_MAP[voice_name]
+    for name, path in FLAT_PATH_MAP.items():
+        if voice_name in name: return path
+    return None
+def get_available_languages(category):
+    if not category or category not in VOICE_DB: return []
+    return [(f"{SUPPORTED_LANGUAGES.get(c, c)} ({c})", c) for c in sorted(VOICE_DB[category].keys())]
+def get_voices_for_ui(category, lang):
+    if category in VOICE_DB and lang in VOICE_DB[category]:
+        return sorted(list(VOICE_DB[category][lang].keys()))
+    return []
+def get_all_voices_list():
+    return sorted(ALL_VOICES_FLAT_LIST)
+# Initialize
+scan_voice_wav_structure()
+# ==============================================================================
+# 3. GENERATION FUNCTIONS
+# ==============================================================================
+def set_seed(seed):
+    torch.manual_seed(seed)
+    if torch.cuda.is_available(): torch.cuda.manual_seed_all(seed)
+    random.seed(seed)
+    np.random.seed(seed)
+def format_time(seconds):
+    if seconds < 60: return f"{seconds:.1f}s"
+    return f"{int(seconds//60)}m {seconds%60:.1f}s"
+def estimate_generation_time(text_length):
+    return (text_length / 50) * 2 + 1
+def smart_chunk_text(text, max_words=40):
+    def has_cjk(text): return bool(re.search(r'[\u4e00-\u9fff\u3040-\u309f\u30a0-\u30ff]', text))
+    is_cjk = has_cjk(text)
+    sentences = re.split(r'(?<=[.!?。！？।؟])\s*|\n+', text)
+    chunks = []; current_chunk = []; current_count = 0
+    for sentence in sentences:
+        sentence = sentence.strip()
+        if not sentence: continue
+        count = len(re.sub(r'\s+', '', sentence)) if is_cjk else len(sentence.split())
+        if current_count + count > max_words:
+            if current_chunk:
+                chunks.append(''.join(current_chunk) if is_cjk else ' '.join(current_chunk))
+                current_chunk = []; current_count = 0
+        current_chunk.append(sentence); current_count += count
+    if current_chunk: chunks.append(''.join(current_chunk) if is_cjk else ' '.join(current_chunk))
+    return chunks if chunks else [text]
+def generate_speech(text, voice_name, exaggeration, temperature, seed_num, cfgw, min_p, top_p, repetition_penalty):
+    try:
+        start_time = time.time()
+        if not text.strip(): yield make_progress_html(0, "❌ Error: empty text"), None; return
+        path = resolve_voice_path(voice_name, "en")
+        if not path: yield make_progress_html(0, "❌ Error: voice not found"), None; return
+        yield make_progress_html(20, "⚙️ Loading TTS model..."), None
+        model = model_manager.get_model("tts")
+        if model is None: yield make_progress_html(0, "❌ Error loading model"), None; return
+        if seed_num != 0: set_seed(int(seed_num))
+        chunks = smart_chunk_text(text)
+        wavs = []
+        for i, chunk in enumerate(chunks):
+            pct = 30 + int((i / len(chunks)) * 60)
+            yield make_progress_html(pct, f"🎙️ Generating part {i+1}/{len(chunks)}..."), None
+            w = model.generate(chunk, audio_prompt_path=path, exaggeration=exaggeration, temperature=temperature, cfg_weight=cfgw, min_p=min_p, top_p=top_p, repetition_penalty=repetition_penalty)
+            wavs.append(w)
+        yield make_progress_html(95, "🎹 Merging audio..."), None
+        full_wav = torch.cat(wavs, dim=-1) if len(wavs) > 1 else wavs[0]
+        yield make_progress_html(100, f"✅ Done ({format_time(time.time()-start_time)})"), (model.sr, full_wav.squeeze(0).numpy())
+    except Exception as e:
+        yield make_progress_html(0, f"❌ Error: {str(e)}"), None
+def generate_turbo_speech(text, voice_name):
+    try:
+        start_time = time.time()
+        if not text.strip(): yield make_progress_html(0, "❌ Error: empty text"), None; return
+        path = resolve_voice_path(voice_name, "en")
+        if not path: yield make_progress_html(0, "❌ Error: voice not found"), None; return
+        yield make_progress_html(20, "⚡ Loading Turbo..."), None
+        model = model_manager.get_model("turbo")
+        if model is None: yield make_progress_html(0, "❌ Error loading Turbo model"), None; return
+        chunks = smart_chunk_text(text)
+        wavs = []
+        for i, chunk in enumerate(chunks):
+            pct = 30 + int((i / len(chunks)) * 60)
+            yield make_progress_html(pct, f"⚡ Turbo chunk {i+1}/{len(chunks)}..."), None
+            w = model.generate(chunk, audio_prompt_path=path)
+            wavs.append(w)
+        full_wav = torch.cat(wavs, dim=-1) if len(wavs) > 1 else wavs[0]
+        yield make_progress_html(100, f"✅ Turbo done ({format_time(time.time()-start_time)})"), (model.sr, full_wav.squeeze(0).numpy())
+    except Exception as e:
+        yield make_progress_html(0, f"❌ Error: {str(e)}"), None
+def generate_multilingual_speech(text, voice_name, lang_code, exaggeration, temperature, seed_num, cfgw):
+    try:
+        start_time = time.time()
+        if not text.strip(): yield make_progress_html(0, "❌ Error: empty text"), None; return
+        path = resolve_voice_path(voice_name, lang_code)
+        yield make_progress_html(20, "🌍 Loading multilingual TTS..."), None
+        model = model_manager.get_model("mtl")
+        if model is None: yield make_progress_html(0, "❌ Error loading model"), None; return
+        if seed_num != 0: set_seed(int(seed_num))
+        chunks = smart_chunk_text(text)
+        wavs = []
+        for i, chunk in enumerate(chunks):
+            pct = 30 + int((i / len(chunks)) * 60)
+            yield make_progress_html(pct, f"🌍 Generating ({lang_code}) {i+1}/{len(chunks)}..."), None
+            w = model.generate(chunk, language_id=lang_code, audio_prompt_path=path, exaggeration=exaggeration, temperature=temperature, cfg_weight=cfgw)
+            wavs.append(w)
+        full_wav = torch.cat(wavs, dim=-1) if len(wavs) > 1 else wavs[0]
+        yield make_progress_html(100, f"✅ Done ({format_time(time.time()-start_time)})"), (model.sr, full_wav.squeeze(0).numpy())
+    except Exception as e:
+        yield make_progress_html(0, f"❌ Error: {str(e)}"), None
+def convert_voice(audio, target_voice):
+    try:
+        start_time = time.time()
+        if not audio: yield make_progress_html(0, "❌ Error: no input audio"), None; return
+        path = resolve_voice_path(target_voice, "en")
+        if not path: yield make_progress_html(0, "❌ Error: invalid target voice"), None; return
+        yield make_progress_html(50, "🔄 Loading VC model..."), None
+        model = model_manager.get_model("vc")
+        if model is None: yield make_progress_html(0, "❌ Error loading VC model"), None; return
+        yield make_progress_html(70, "🔄 Converting..."), None
+        w = model.generate(audio, target_voice_path=path)
+        yield make_progress_html(100, f"✅ Done ({format_time(time.time()-start_time)})"), (model.sr, w.squeeze(0).numpy())
+    except Exception as e:
+        yield make_progress_html(0, f"❌ Error: {str(e)}"), None
+def clone_voice_wrapper(ref_audio, name, cat, lang, gender):
+    try:
+        if not name: return "❌ Name required", gr.update()
+        if not ref_audio: return "❌ Reference audio required", gr.update()
+        if not cat: return "❌ Category required", gr.update()
+        cat_slug = cat.lower().replace(" ", "_")
+        gender_slug = gender.lower()
+        target_dir = os.path.join(VOICE_WAV_ROOT, f"{cat_slug}_{gender_slug}", lang)
+        clean_name = "".join(x for x in name if x.isalnum() or x in " -_").strip()
+        if not clean_name: return "❌ Name has no valid characters", gr.update()
+        dest = os.path.join(target_dir, f"{clean_name}.wav")
+        if os.path.exists(dest): return "❌ Voice already exists", gr.update()
+        os.makedirs(target_dir, exist_ok=True)
+        shutil.copy(ref_audio, dest)
+        scan_voice_wav_structure()
+        return f"✅ Cloned: {clean_name}", gr.update(choices=get_all_voices_list())
+    except Exception as e:
+        return f"❌ Error: {e}", gr.update()
+def delete_voice_wrapper(voice_name):
+    try:
+        path = resolve_voice_path(voice_name, "en")
+        if path and os.path.exists(path):
+            os.remove(path)
+            scan_voice_wav_structure()
+            return f"✅ Deleted: {voice_name}", gr.update(choices=get_all_voices_list(), value=None)
+        return "❌ File not found", gr.update()
+    except Exception as e:
+        return f"❌ Error: {e}", gr.update()
+# ==============================================================================
+# 4. GRAPHICAL INTERFACE (UI)
+# ==============================================================================
+def create_header():
+    gr.HTML("""
+        <div style="text-align: center; margin-bottom: 1rem; background: linear-gradient(135deg, #667eea 0%, #764ba2 100%); padding: 25px; border-radius: 16px; color: white; box-shadow: 0 4px 15px rgba(0,0,0,0.2);">
+            <h1 style="font-size: 2.8em; margin: 0; font-weight: 800;">⚡ Chatterbox Turbo</h1>
+            <p style="font-size: 1.1em; opacity: 0.9; margin-top: 10px;">Pro Audio Synthesis Suite</p>
+        </div>
+    """)
+INITIAL_PROGRESS = make_progress_html(0, "Waiting to start...")
+def create_turbo_tab():
+    with gr.Row():
+        with gr.Column(scale=1):
+            text = gr.Textbox(label="Text to synthesize", value="Hello! [laugh] This is Turbo speed!", lines=4, elem_id="turbo_textbox")
+            with gr.Row(elem_classes="tag-container"):
+                tags = [gr.Button(t, size="sm", elem_classes="tag-btn") for t in ["[laugh]", "[sigh]", "[cough]", "[clear throat]", "[gasp]", "[chuckle]"]]
+            voice = gr.Dropdown(label="Select voice", choices=[], interactive=True)
+            preview = gr.Audio(label="Preview", interactive=False, visible=True, type="filepath")
+            btn = gr.Button("⚡ Generate Audio (Turbo)", variant="primary", size="lg")
+        with gr.Column(scale=1):
+            # CHANGED: audio output first, progress bar below it
+            audio_out = gr.Audio(label="Final result", autoplay=True, show_download_button=True)
+            progress_html = gr.HTML(value=INITIAL_PROGRESS, label="Status")
+    return {"text": text, "voice": voice, "prev": preview, "btn": btn, "prog": progress_html, "out": audio_out, "tags": tags}
+def create_tts_tab():
+    with gr.Row():
+        with gr.Column(scale=1):
+            text = gr.Textbox(label="Text", value="Hello world!", lines=4)
+            voice = gr.Dropdown(label="Cloned voice", choices=[], interactive=True)
+            preview = gr.Audio(label="Preview", interactive=False)
+            with gr.Accordion("⚙️ Advanced options", open=False):
+                exag = gr.Slider(0.25, 2, value=.5, step=0.05, label="Exaggeration")
+                cfg = gr.Slider(0, 1, value=.5, step=0.05, label="CFG")
+                temp = gr.Slider(0.05, 5, value=.8, step=0.05, label="Temp")
+                seed = gr.Number(0, label="Seed")
+                min_p = gr.Slider(0, 1, value=0.05, label="Min P")
+                top_p = gr.Slider(0, 1, value=1.0, label="Top P")
+                rep = gr.Slider(1, 2, value=1.2, label="Repetition")
+            btn = gr.Button("🎙️ Generate Audio", variant="primary", size="lg")
+        with gr.Column(scale=1):
+            # CHANGED: audio output first, progress bar below it
+            audio_out = gr.Audio(label="Result", autoplay=True)
+            progress_html = gr.HTML(value=INITIAL_PROGRESS)
+    return {"text": text, "voice": voice, "prev": preview, "btn": btn, "prog": progress_html, "out": audio_out, "opts": [exag, temp, seed, cfg, min_p, top_p, rep]}
+def create_mtl_tab():
+    with gr.Row():
+        with gr.Column(scale=1):
+            text = gr.Textbox(label="Text", value="Hola mundo", lines=4)
+            lang_choices = [(f"{v} ({k})", k) for k,v in SUPPORTED_LANGUAGES.items()]
+            lang = gr.Dropdown(label="Language", choices=lang_choices, value="es")
+            voice = gr.Dropdown(label="Voice", choices=[])
+            preview = gr.Audio(label="Preview", interactive=False)
+            with gr.Accordion("⚙️ Options", open=False):
+                exag = gr.Slider(0.25, 2, value=.5, label="Exaggeration")
+                temp = gr.Slider(0.05, 5, value=.8, label="Temp")
+                seed = gr.Number(0, label="Seed")
+                cfg = gr.Slider(0, 1, value=.5, label="CFG")
+            btn = gr.Button("🌍 Generate", variant="primary", size="lg")
+        with gr.Column(scale=1):
+            # CHANGED: audio output first, progress bar below it
+            audio_out = gr.Audio(label="Result", autoplay=True)
+            progress_html = gr.HTML(value=INITIAL_PROGRESS)
+    return {"text": text, "lang": lang, "voice": voice, "prev": preview, "btn": btn, "prog": progress_html, "out": audio_out, "opts": [exag, temp, seed, cfg]}
+def create_vc_tab():
+    with gr.Row():
+        with gr.Column(scale=1):
+            inp = gr.Audio(label="Input", sources=["upload", "microphone"], type="filepath")
+            voice = gr.Dropdown(label="Target voice", choices=[])
+            preview = gr.Audio(label="Preview", interactive=False)
+            btn = gr.Button("🔄 Convert", variant="primary", size="lg")
+        with gr.Column(scale=1):
+            # CHANGED: audio output first, progress bar below it
+            audio_out = gr.Audio(label="Result", autoplay=True)
+            progress_html = gr.HTML(value=INITIAL_PROGRESS)
+    return {"inp": inp, "voice": voice, "prev": preview, "btn": btn, "prog": progress_html, "out": audio_out}
+def create_clone_tab():
+    with gr.Row():
+        with gr.Column():
+            gr.Markdown("### 🧬 Clone Voice")
+            name = gr.Textbox(label="Name")
+            with gr.Row():
+                gender = gr.Radio(["Male", "Female"], value="Male", label="Gender")
+                lang_choices = [(f"{v} ({k})", k) for k,v in SUPPORTED_LANGUAGES.items()]
+                lang = gr.Dropdown(label="Language", choices=lang_choices, value="es")
+            cat = gr.Dropdown(label="Category", choices=CATEGORY_LIST, allow_custom_value=False)
+            ref = gr.Audio(label="Reference audio", type="filepath")
+            btn = gr.Button("💾 Clone", variant="primary")
+            status = gr.Textbox(label="Status")
+        with gr.Column():
+            gr.Markdown("### 🗑️ Delete")
+            del_sel = gr.Dropdown(label="Select voice", choices=[])
+            del_btn = gr.Button("🗑️ Delete", variant="stop")
+            del_stat = gr.Textbox(label="Status")
+    return {"name": name, "gender": gender, "cat": cat, "lang": lang, "ref": ref, "btn": btn, "stat": status, "del_sel": del_sel, "del_btn": del_btn, "del_stat": del_stat}
+CSS = """
+body, .gradio-container { background-color: #0f172a; font-family: 'Segoe UI', sans-serif; }
+.sidebar-container { background-color: #1e293b; padding: 20px; border-right: 1px solid #334155; }
+.nav-btn { background: transparent; border: none; color: #94a3b8; text-align: left; padding: 15px; font-weight: 600; width: 100%; border-radius: 8px; margin-bottom: 5px; transition: all 0.2s; }
+.nav-btn:hover { background: #334155; color: white; padding-left: 20px; }
+.active-btn { background: #334155; color: white; border-left: 4px solid #f97316; padding-left: 20px; box-shadow: 0 4px 6px rgba(0,0,0,0.1); }
+.content-panel { background: #1e293b; border: 1px solid #334155; border-radius: 16px; padding: 30px; margin-top: 20px; box-shadow: 0 10px 15px -3px rgba(0, 0, 0, 0.1); }
+input, textarea, select { background-color: #0f172a !important; border: 1px solid #334155 !important; color: white !important; }
+.tag-btn { background: #334155; color: #e2e8f0; border: 1px solid #475569; margin-right: 5px; }
+"""
+with gr.Blocks(title="Chatterbox Pro", css=CSS, theme=gr.themes.Base()) as demo:
+    with gr.Row(elem_classes="main-layout", equal_height=True):
+        with gr.Column(scale=1, min_width=250, elem_classes="sidebar-container"):
+            gr.Markdown("### 🎛️ CONTROL PANEL")
+            gr.Markdown("---")
+            initial_cat = CATEGORY_LIST[0] if CATEGORY_LIST else None
+            cat_filter = gr.Dropdown(label="📚 Library", choices=CATEGORY_LIST, value=initial_cat)
+            lang_filter = gr.Dropdown(label="🌐 Language", choices=[])
+            gr.Markdown("---")
+            btn_turbo = gr.Button("🚀 Turbo Mode", elem_classes=["nav-btn", "active-btn"])
+            btn_tts = gr.Button("🎤 TTS Pro", elem_classes=["nav-btn"])
+            btn_mtl = gr.Button("🌍 Multilingual", elem_classes=["nav-btn"])
+            btn_vc = gr.Button("🔄 Converter", elem_classes=["nav-btn"])
+            btn_clone = gr.Button("🧬 Cloning Lab", elem_classes=["nav-btn"])
+        with gr.Column(scale=4, elem_classes="content-area"):
+            create_header()
+            with gr.Column(visible=True, elem_classes="content-panel") as v_turbo: ui_turbo = create_turbo_tab()
+            with gr.Column(visible=False, elem_classes="content-panel") as v_tts: ui_tts = create_tts_tab()
+            with gr.Column(visible=False, elem_classes="content-panel") as v_mtl: ui_mtl = create_mtl_tab()
+            with gr.Column(visible=False, elem_classes="content-panel") as v_vc: ui_vc = create_vc_tab()
+            with gr.Column(visible=False, elem_classes="content-panel") as v_clone: ui_clone = create_clone_tab()
+    views = [v_turbo, v_tts, v_mtl, v_vc, v_clone]; btns = [btn_turbo, btn_tts, btn_mtl, btn_vc, btn_clone]
+    def switch_view(idx):
+        return [gr.update(visible=(i==idx)) for i in range(len(views))] + [gr.update(elem_classes=["nav-btn", "active-btn"] if i==idx else ["nav-btn"]) for i in range(len(btns))]
+    for i, b in enumerate(btns): b.click(lambda idx=i: switch_view(idx), outputs=views+btns)
+    def update_ui_lists(cat, lang_code=None):
+        langs = get_available_languages(cat)
+        valid_codes = [c[1] for c in langs]
+        curr_lang = lang_code if lang_code in valid_codes else (valid_codes[0] if valid_codes else None)
+        voices = get_voices_for_ui(cat, curr_lang)
+        v_val = voices[0] if voices else None
+        return (gr.update(choices=langs, value=curr_lang), gr.update(choices=voices, value=v_val), gr.update(choices=voices, value=v_val), gr.update(choices=voices, value=v_val), gr.update(choices=voices, value=v_val), gr.update(choices=CATEGORY_LIST, value=cat), gr.update(choices=get_all_voices_list()))
+    cat_filter.change(update_ui_lists, inputs=[cat_filter], outputs=[lang_filter, ui_turbo["voice"], ui_tts["voice"], ui_mtl["voice"], ui_vc["voice"], ui_clone["cat"], ui_clone["del_sel"]])
+    lang_filter.change(lambda c, l: update_ui_lists(c, l), inputs=[cat_filter, lang_filter], outputs=[lang_filter, ui_turbo["voice"], ui_tts["voice"], ui_mtl["voice"], ui_vc["voice"], ui_clone["cat"], ui_clone["del_sel"]])
+    demo.load(lambda: update_ui_lists(initial_cat), outputs=[lang_filter, ui_turbo["voice"], ui_tts["voice"], ui_mtl["voice"], ui_vc["voice"], ui_clone["cat"], ui_clone["del_sel"]])
+    ui_turbo["voice"].change(lambda v: resolve_voice_path(v, "en"), inputs=ui_turbo["voice"], outputs=ui_turbo["prev"])
+    ui_tts["voice"].change(lambda v: resolve_voice_path(v, "en"), inputs=ui_tts["voice"], outputs=ui_tts["prev"])
+    ui_mtl["voice"].change(lambda v, l: resolve_voice_path(v, l), inputs=[ui_mtl["voice"], ui_mtl["lang"]], outputs=ui_mtl["prev"])
+    ui_vc["voice"].change(lambda v: resolve_voice_path(v, "en"), inputs=ui_vc["voice"], outputs=ui_vc["prev"])
+    ui_turbo["btn"].click(generate_turbo_speech, inputs=[ui_turbo["text"], ui_turbo["voice"]], outputs=[ui_turbo["prog"], ui_turbo["out"]])
+    ui_tts["btn"].click(generate_speech, inputs=[ui_tts["text"], ui_tts["voice"]] + ui_tts["opts"], outputs=[ui_tts["prog"], ui_tts["out"]])
+    ui_mtl["btn"].click(generate_multilingual_speech, inputs=[ui_mtl["text"], ui_mtl["voice"], ui_mtl["lang"]] + ui_mtl["opts"], outputs=[ui_mtl["prog"], ui_mtl["out"]])
+    ui_vc["btn"].click(convert_voice, inputs=[ui_vc["inp"], ui_vc["voice"]], outputs=[ui_vc["prog"], ui_vc["out"]])
+    ui_clone["btn"].click(clone_voice_wrapper, inputs=[ui_clone["ref"], ui_clone["name"], ui_clone["cat"], ui_clone["lang"], ui_clone["gender"]], outputs=[ui_clone["stat"], ui_clone["del_sel"]])
+    ui_clone["del_btn"].click(delete_voice_wrapper, inputs=[ui_clone["del_sel"]], outputs=[ui_clone["del_stat"], ui_clone["del_sel"]])
+    JS_TAGS = """(tag, text) => { var el = document.querySelector('#turbo_textbox textarea'); if(el) { var start = el.selectionStart; var end = el.selectionEnd; return text.slice(0, start) + " " + tag + " " + text.slice(end); } return text + " " + tag; }"""
+    for btn in ui_turbo["tags"]: btn.click(None, inputs=[btn, ui_turbo["text"]], outputs=ui_turbo["text"], js=JS_TAGS)
+if __name__ == "__main__":
+    demo.queue().launch(inbrowser=True)
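`smart_chunk_text` splits the input after sentence-final punctuation or on newlines. The punctuation set includes CJK marks, the Devanagari danda and the Arabic question mark. It then greedily packs sentences into chunks of at most `max_words` words, counting characters instead of words for CJK text. A sentence longer than the limit is never split; it simply becomes its own oversized chunk. The standalone copy below mirrors the logic in `app.py` with comments added, so its behavior can be checked in isolation. It is a sketch for experimentation, and the sample strings in the usage are illustrative only.

```python
import re


def smart_chunk_text(text: str, max_words: int = 40) -> list[str]:
    """Greedy sentence packing, mirroring smart_chunk_text in app.py."""
    # CJK scripts have no spaces, so their length is measured in characters instead.
    is_cjk = bool(re.search(r"[\u4e00-\u9fff\u3040-\u309f\u30a0-\u30ff]", text))
    # Split right after sentence-final punctuation, or on runs of newlines.
    sentences = re.split(r"(?<=[.!?。！？।؟])\s*|\n+", text)
    joiner = "" if is_cjk else " "
    chunks, current, count = [], [], 0
    for sentence in sentences:
        sentence = sentence.strip()
        if not sentence:
            continue
        n = len(re.sub(r"\s+", "", sentence)) if is_cjk else len(sentence.split())
        # Flush the current chunk before adding a sentence would exceed the budget.
        if count + n > max_words and current:
            chunks.append(joiner.join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(joiner.join(current))
    # Whitespace-only input yields no chunks; fall back to the raw text.
    return chunks or [text]
```

For instance, `smart_chunk_text("One two. Three four five. Six.", max_words=4)` groups the text as `["One two.", "Three four five. Six."]`. Because chunk boundaries fall only on sentence boundaries, lowering `max_words` does not guarantee that every chunk is short. That is the trade-off for never cutting a sentence in half.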

requirements.txt ADDED

	@@ -0,0 +1,14 @@

+numpy>=1.24.0,<1.26.0
+librosa==0.11.0
+s3tokenizer
+torch==2.7.1
+torchaudio==2.7.1
+transformers==4.46.3
+diffusers==0.29.0
+resemble-perth==1.0.1
+conformer==0.3.2
+safetensors==0.5.3
+pykakasi==2.3.0
+gradio==5.44.1
+pyloudnorm
+omegaconf
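Several packages above are pinned exactly (for example `torch==2.7.1` and `gradio==5.44.1`), while others use a range (`numpy>=1.24.0,<1.26.0`) or are unpinned (`s3tokenizer`, `pyloudnorm`, `omegaconf`). The following is a minimal sketch for spotting an environment that drifts from the exact pins, using `importlib.metadata.version` from the standard library. `check_exact_pins` is a hypothetical helper. It only looks at `name==version` lines and skips ranges and bare names, so it is not a dependency resolver.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version


def check_exact_pins(requirements_text: str) -> dict[str, tuple[str, str | None]]:
    """Return {package: (pinned, installed)} for every '==' pin that does not match."""
    mismatches = {}
    for raw in requirements_text.splitlines():
        # Drop inline comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        # Only exact pins are checked; ranges and bare names are skipped.
        if "==" not in line:
            continue
        name, pinned = (part.strip() for part in line.split("==", 1))
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches
```

A typical call would be `check_exact_pins(Path("requirements.txt").read_text())`. The comparison is plain string equality. A CUDA build of torch may therefore report a local suffix such as `+cu126` and be flagged even though the release number matches.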