danneauxs committed on
Commit f3cff30 · 1 Parent(s): 67b64d0

vader check and batch processing
BATCH_IMPLEMENTATION_PLAN.md ADDED
@@ -0,0 +1,58 @@
+ # Plan for Implementing High-Performance Batch Processing
+
+ This document outlines the code modifications needed to implement a high-performance batch processing mode that can be toggled by the "Use VADER" checkbox in the GUI.
+
+ The goal is to create two distinct modes:
+ - **VADER On (Nuanced Mode):** Slower; processes chunks one by one with unique TTS parameters for nuanced delivery.
+ - **VADER Off (Batch Mode):** Significantly faster; processes chunks in batches with a single set of TTS parameters.
+
+ ---
+
+ ## 1. File to Modify: `src/chatterbox/tts.py`
+
+ * **Purpose:** Enable the core TTS model to handle batches of text.
+ * **Changes Needed:**
+     * A new method, `generate_batch(self, texts: list, **tts_params)`, needs to be created within the `ChatterboxTTS` class.
+     * This method must perform the following steps:
+         1. Accept a list of text strings (`texts`).
+         2. Tokenize each text string in the list.
+         3. Pad the tokenized sequences so they all have the same length, producing a single batch tensor. `torch.nn.utils.rnn.pad_sequence` is suitable for this.
+         4. Feed the complete batch tensor to the underlying model (`self.t3.inference` and `self.s3gen.inference`).
+         5. Return a list of the resulting audio waveforms.
+
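The padding step (3) is the crux of `generate_batch`. Below is a minimal, self-contained sketch of just that step, using dummy token tensors in place of real tokenizer output; the surrounding tokenizer and `t3`/`s3gen` calls are assumptions about the ChatterboxTTS internals and will differ in practice:

```python
import torch
from torch.nn.utils.rnn import pad_sequence

def pad_token_batch(token_seqs, pad_id=0):
    """Pad variable-length 1-D token tensors into one (batch, max_len) tensor."""
    return pad_sequence(token_seqs, batch_first=True, padding_value=pad_id)

# Stand-in for step 2: three "tokenized" texts of different lengths.
seqs = [torch.tensor([5, 6, 7]), torch.tensor([1, 2]), torch.tensor([9, 8, 7, 6])]
batch = pad_token_batch(seqs)
print(batch.shape)  # torch.Size([3, 4])
```

The resulting tensor can then be fed to the model in a single forward pass; the attention mask (real tokens vs. padding) would also need to be passed along in a real implementation.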
+ ---
+
+ ## 2. File to Modify: `modules/tts_engine.py`
+
+ * **Purpose:** Orchestrate the new batching workflow and choose the processing mode.
+ * **Changes Needed:**
+
+ ### a. Create a New Worker Function
+ * Add a new function: `process_batch(batch_of_chunks, model, ...)`
+ * This function will:
+     1. Accept a list of chunk objects (e.g., a batch of 16).
+     2. Extract the text from each chunk into a simple list.
+     3. Call the new `model.generate_batch()` with the list of texts and the shared TTS parameters.
+     4. Receive a list of audio waveforms back.
+     5. Loop through the waveforms, apply the existing silence trimming and padding logic to each one, and save them to their respective `chunk_...wav` files.
+
+ ### b. Modify the Main `process_book_folder` Function
+ * Locate the `use_vader` flag, which is determined from the GUI options.
+ * Wrap the core processing loop in an `if/else` block based on this flag.
+ * **`if use_vader:` (Nuanced Mode):**
+     * Keep the existing code that iterates through chunks one by one and submits them to the `process_one_chunk` function.
+ * **`else:` (Batch Mode):**
+     * Add the new logic here.
+     * Group the `all_chunks` list into fixed-size batches based on `TTS_BATCH_SIZE` from the config.
+     * Use the existing `ThreadPoolExecutor` to submit these new **batches** to the new `process_batch` worker function.
+
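The grouping-and-submission flow for Batch Mode can be sketched as follows. This is a self-contained illustration: `process_batch` here is a stand-in that only formats filenames, and the integers stand in for real chunk objects; the real worker would call `model.generate_batch()` and the trimming/saving logic described above:

```python
from concurrent.futures import ThreadPoolExecutor

TTS_BATCH_SIZE = 16  # would come from config/config.py in practice

def group_into_batches(chunks, batch_size):
    """Split a flat chunk list into fixed-size batches (the last may be short)."""
    return [chunks[i:i + batch_size] for i in range(0, len(chunks), batch_size)]

def process_batch(batch):
    # Stand-in worker: the real function would run batched TTS inference,
    # then trim/pad and save one chunk_...wav per item.
    return [f"chunk_{c:05d}.wav" for c in batch]

all_chunks = list(range(40))  # placeholder chunk objects
batches = group_into_batches(all_chunks, TTS_BATCH_SIZE)

with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(process_batch, batches))

print([len(b) for b in batches])  # [16, 16, 8]
```

Note that each executor task now carries a whole batch, so the per-task overhead is amortized over `TTS_BATCH_SIZE` chunks.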
+ ---
+
+ ## 3. Files to Modify: `config/config.py` and `chatterbox_gui.py`
+
+ * **Purpose:** Give the user control over the batch size for performance tuning.
+ * **Changes Needed:**
+     * **In `config/config.py`:**
+         * Add a new configuration variable: `TTS_BATCH_SIZE = 16` (or another sensible default).
+     * **In `chatterbox_gui.py`:**
+         * On the "Config" tab, add a new `QSpinBox` (numeric input field) linked to the `TTS_BATCH_SIZE` variable, so the user can change the batch size without editing code.
app.py CHANGED
@@ -495,10 +495,10 @@ def main():
     launcher.run()
 
 if __name__ == "__main__":
-    # Add current directory to Python path for HF Spaces
+    # Add src directory to Python path for HF Spaces
     import sys
     import os
-    sys.path.append(os.path.dirname(os.path.abspath(__file__)))
+    sys.path.insert(0, os.path.join(os.path.dirname(os.path.abspath(__file__)), 'src'))
 
     # Fix OpenMP environment variable for HuggingFace Spaces
     os.environ["OMP_NUM_THREADS"] = "1"
app.py.20250811-120000.bak ADDED
@@ -0,0 +1,523 @@
+ #!/usr/bin/env python3
+ """
+ Comprehensive Gradio Launcher for ChatterboxTTS
+ Automatically handles all requirements, installation, and setup
+ """
+
+ import sys
+ import os
+ import subprocess
+ import importlib
+ import pkg_resources
+ from pathlib import Path
+ import time
+
+ class GradioLauncher:
+     def __init__(self):
+         self.required_packages = {
+             # Core packages with fallbacks
+             'gradio': {'min_version': '4.0.0', 'install_name': 'gradio>=4.0.0'},
+             'torch': {'min_version': '2.0.0', 'install_name': 'torch>=2.0.0'},
+             'torchaudio': {'min_version': '2.0.0', 'install_name': 'torchaudio>=2.0.0'},
+             'transformers': {'min_version': '4.20.0', 'install_name': 'transformers>=4.20.0'},
+             'huggingface_hub': {'min_version': '0.15.0', 'install_name': 'huggingface_hub>=0.15.0'},
+             'safetensors': {'min_version': '0.3.0', 'install_name': 'safetensors>=0.3.0'},
+
+             # Audio processing
+             'soundfile': {'min_version': '0.12.0', 'install_name': 'soundfile>=0.12.0'},
+             'librosa': {'min_version': '0.10.0', 'install_name': 'librosa>=0.10.0'},
+             'pydub': {'min_version': '0.25.0', 'install_name': 'pydub>=0.25.0'},
+
+             # Voice Analysis (optional but recommended)
+             'parselmouth': {'min_version': '0.4.3', 'install_name': 'praat-parselmouth>=0.4.3', 'optional': True},
+             'matplotlib': {'min_version': '3.5.0', 'install_name': 'matplotlib>=3.5.0'},
+             'scipy': {'min_version': '1.8.0', 'install_name': 'scipy>=1.8.0'},
+             'numpy': {'min_version': '1.21.0', 'install_name': 'numpy>=1.21.0'},
+
+             # System utilities
+             'psutil': {'min_version': '5.8.0', 'install_name': 'psutil>=5.8.0'},
+             'vaderSentiment': {'min_version': '3.3.0', 'install_name': 'vaderSentiment>=3.3.0'},
+         }
+
+         self.chatterbox_git_url = 'git+https://github.com/resemble-ai/chatterbox-tts.git'
+         self.optional_packages = ['parselmouth', 'pynvml']
+
+     def print_header(self):
+         """Print launcher header"""
+         print("=" * 70)
+         print("🚀 ChatterboxTTS Gradio Launcher")
+         print("=" * 70)
+         print("🔧 Comprehensive setup and dependency manager")
+         print("📦 Automatically installs missing requirements")
+         print("🌐 Launches web interface when ready")
+         print("-" * 70)
+
+     def check_python_version(self):
+         """Check if Python version is compatible"""
+         print("🐍 Checking Python version...")
+
+         version_info = sys.version_info
+         if version_info.major < 3 or (version_info.major == 3 and version_info.minor < 8):
+             print("❌ Error: Python 3.8+ required")
+             print(f"   Current version: {version_info.major}.{version_info.minor}.{version_info.micro}")
+             print("   Please upgrade Python and try again")
+             sys.exit(1)
+
+         print(f"✅ Python {version_info.major}.{version_info.minor}.{version_info.micro} - Compatible")
+
+     def check_working_directory(self):
+         """Verify we're in the correct directory"""
+         print("📁 Checking working directory...")
+
+         # NOTE: reconstructed -- the diff as rendered lost the lines that
+         # populate `missing_files`; the required entries are inferred from
+         # the expected-structure message printed below.
+         required = ['gradio_main_interface.py', 'gradio_tabs', 'config', 'src']
+         missing_files = [name for name in required if not Path(name).exists()]
+
+         if missing_files:
+             print(f"❌ Error: Missing required files/directories: {', '.join(missing_files)}")
+             print("   Please run this script from the ChatterboxTTS root directory")
+             print("   Expected structure:")
+             print("   ├── gradio_main_interface.py")
+             print("   ├── gradio_tabs/")
+             print("   ├── config/")
+             print("   ├── src/")
+             print("   └── ...")
+             return False
+
+         print("✅ Working directory structure verified")
+         return True
+
+     def create_directories(self):
+         """Create required directories if they don't exist"""
+         print("📂 Creating required directories...")
+
+         directories = ['Voice_Samples', 'Text_Input', 'Audiobook', 'Output', 'voice_analyzer']
+         created = []
+
+         for dir_name in directories:
+             dir_path = Path(dir_name)
+             if not dir_path.exists():
+                 dir_path.mkdir(parents=True, exist_ok=True)
+                 created.append(dir_name)
+
+         if created:
+             print(f"✅ Created directories: {', '.join(created)}")
+         else:
+             print("✅ All required directories exist")
+
+     def check_package_installed(self, package_name):
+         """Check if a package is installed and get its version"""
+         # If we have a virtual environment, check there first
+         if hasattr(self, 'venv_python') and Path(self.venv_python).exists():
+             try:
+                 cmd = [self.venv_python, '-c', f'''
+ try:
+     import {package_name}
+     print("INSTALLED", getattr({package_name}, "__version__", "0.0.0"))
+ except ImportError:
+     print("NOT_INSTALLED")
+ ''']
+                 result = subprocess.run(cmd, capture_output=True, text=True, timeout=10)
+                 if result.returncode == 0:
+                     output = result.stdout.strip()
+                     if output.startswith("INSTALLED"):
+                         version = output.split(" ", 1)[1] if " " in output else "0.0.0"
+                         return True, version
+                     else:
+                         return False, None
+             except Exception:
+                 pass  # Fall back to local check
+
+         # Fallback to local Python environment check
+         try:
+             if package_name == 'parselmouth':
+                 # Special case for praat-parselmouth
+                 import parselmouth
+                 return True, getattr(parselmouth, '__version__', '0.0.0')
+             else:
+                 module = importlib.import_module(package_name)
+                 version = getattr(module, '__version__', '0.0.0')
+                 return True, version
+         except ImportError:
+             try:
+                 # Try with pkg_resources as fallback
+                 pkg = pkg_resources.get_distribution(package_name)
+                 return True, pkg.version
+             except (pkg_resources.DistributionNotFound, ImportError):
+                 return False, None
+
+     def compare_versions(self, current, required):
+         """Compare version strings"""
+         try:
+             current_parts = [int(x) for x in current.split('.')]
+             required_parts = [int(x) for x in required.split('.')]
+
+             # Pad shorter version with zeros
+             max_len = max(len(current_parts), len(required_parts))
+             current_parts.extend([0] * (max_len - len(current_parts)))
+             required_parts.extend([0] * (max_len - len(required_parts)))
+
+             return current_parts >= required_parts
+         except (ValueError, AttributeError):
+             # If we can't parse versions, assume it's okay
+             return True
+
+     def setup_virtual_environment(self):
+         """Set up virtual environment if in externally managed environment"""
+         venv_path = Path("venv")
+
+         if not venv_path.exists():
+             print("🔧 Creating virtual environment (externally managed Python detected)...")
+             try:
+                 result = subprocess.run(
+                     [sys.executable, '-m', 'venv', 'venv'],
+                     capture_output=True,
+                     text=True,
+                     timeout=60
+                 )
+                 if result.returncode != 0:
+                     print(f"   ❌ Failed to create virtual environment: {result.stderr}")
+                     return False
+                 print("   ✅ Virtual environment created")
+             except Exception as e:
+                 print(f"   ❌ Error creating virtual environment: {e}")
+                 return False
+         else:
+             print("🔧 Using existing virtual environment...")
+
+         # Update sys.executable to use venv python
+         if os.name == 'nt':  # Windows
+             self.venv_python = str(venv_path / "Scripts" / "python.exe")
+             self.venv_pip = str(venv_path / "Scripts" / "pip.exe")
+         else:  # Unix/Linux/Mac
+             self.venv_python = str(venv_path / "bin" / "python")
+             self.venv_pip = str(venv_path / "bin" / "pip")
+
+         # Verify venv python works
+         try:
+             result = subprocess.run([self.venv_python, '--version'], capture_output=True, text=True)
+             if result.returncode == 0:
+                 print(f"   ✅ Virtual environment Python: {result.stdout.strip()}")
+                 return True
+             else:
+                 print("   ❌ Virtual environment Python not working")
+                 return False
+         except Exception as e:
+             print(f"   ❌ Error testing virtual environment: {e}")
+             return False
+
+     def install_package(self, package_spec):
+         """Install a package using pip (with virtual environment support)"""
+         try:
+             print(f"   Installing {package_spec}...")
+
+             # Use venv pip if available, otherwise system pip
+             pip_executable = getattr(self, 'venv_pip', None)
+             if pip_executable and Path(pip_executable).exists():
+                 cmd = [pip_executable, 'install', package_spec]
+             else:
+                 cmd = [sys.executable, '-m', 'pip', 'install', package_spec]
+
+             result = subprocess.run(
+                 cmd,
+                 capture_output=True,
+                 text=True,
+                 timeout=300  # 5 minute timeout
+             )
+
+             if result.returncode == 0:
+                 print(f"   ✅ Successfully installed {package_spec}")
+                 return True
+             else:
+                 print(f"   ❌ Failed to install {package_spec}")
+                 print(f"   Error: {result.stderr}")
+
+                 # If we get externally-managed error, try setting up venv
+                 if "externally-managed-environment" in result.stderr and not hasattr(self, 'venv_python'):
+                     print("   🔄 Detected externally managed environment, setting up virtual environment...")
+                     if self.setup_virtual_environment():
+                         # Retry installation with venv
+                         return self.install_package(package_spec)
+
+                 return False
+
+         except subprocess.TimeoutExpired:
+             print(f"   ⏰ Installation of {package_spec} timed out")
+             return False
+         except Exception as e:
+             print(f"   ❌ Error installing {package_spec}: {str(e)}")
+             return False
+
+     def check_and_install_requirements(self):
+         """Check and install all required packages"""
+         print("📦 Checking package requirements...")
+
+         missing_packages = []
+         outdated_packages = []
+         optional_missing = []
+
+         # Check each required package
+         for package_name, info in self.required_packages.items():
+             is_installed, current_version = self.check_package_installed(package_name)
+             min_version = info['min_version']
+             is_optional = info.get('optional', False)
+
+             if not is_installed:
+                 if is_optional:
+                     optional_missing.append((package_name, info))
+                     print(f"   ⚠️ Optional package missing: {package_name}")
+                 else:
+                     missing_packages.append((package_name, info))
+                     print(f"   ❌ Missing required package: {package_name}")
+             elif current_version and not self.compare_versions(current_version, min_version):
+                 if is_optional:
+                     print(f"   ⚠️ Optional package outdated: {package_name} {current_version} < {min_version}")
+                 else:
+                     outdated_packages.append((package_name, info))
+                     print(f"   ❌ Outdated package: {package_name} {current_version} < {min_version}")
+             else:
+                 status = "✅" if not is_optional else "🔧"
+                 print(f"   {status} {package_name}: {current_version}")
+
+         # Install missing/outdated packages
+         if missing_packages or outdated_packages:
+             print(f"\n🔧 Installing {len(missing_packages + outdated_packages)} required packages...")
+
+             for package_name, info in missing_packages + outdated_packages:
+                 install_spec = info['install_name']
+                 if not self.install_package(install_spec):
+                     print(f"❌ Critical error: Failed to install {package_name}")
+                     return False
+
+         # Install ChatterboxTTS if not available
+         print("🎤 Checking ChatterboxTTS installation...")
+         try:
+             import chatterbox
+             print("   ✅ ChatterboxTTS already installed")
+         except ImportError:
+             print("   📥 Installing ChatterboxTTS from GitHub...")
+             if not self.install_package(self.chatterbox_git_url):
+                 print("   ⚠️ ChatterboxTTS installation failed - some features may not work")
+
+         # Try to install optional packages
+         if optional_missing:
+             print(f"\n🎯 Installing {len(optional_missing)} optional packages...")
+             for package_name, info in optional_missing:
+                 install_spec = info['install_name']
+                 if self.install_package(install_spec):
+                     print(f"   ✅ Optional package {package_name} installed successfully")
+                 else:
+                     print(f"   ⚠️ Optional package {package_name} failed - voice analysis may be limited")
+
+         return True
+
+     def check_gpu_availability(self):
+         """Check for GPU availability"""
+         print("🖥️ Checking GPU availability...")
+
+         try:
+             import torch
+             if torch.cuda.is_available():
+                 gpu_count = torch.cuda.device_count()
+                 gpu_name = torch.cuda.get_device_name(0)
+                 print(f"   ✅ CUDA GPU available: {gpu_name} ({gpu_count} device{'s' if gpu_count > 1 else ''})")
+                 return True
+             elif hasattr(torch.backends, 'mps') and torch.backends.mps.is_available():
+                 print("   ✅ Apple Metal Performance Shaders (MPS) available")
+                 return True
+             else:
+                 print("   ⚠️ No GPU acceleration available - using CPU")
+                 print("   💡 For better performance, consider using a GPU-enabled environment")
+                 return False
+         except Exception as e:
+             print(f"   ❌ Error checking GPU: {str(e)}")
+             return False
+
+     def verify_installation(self):
+         """Verify that all components can be imported"""
+         print("🔍 Verifying installation...")
+
+         critical_imports = [
+             ('gradio', 'Gradio web interface'),
+             ('torch', 'PyTorch machine learning'),
+             ('transformers', 'Hugging Face transformers'),
+             ('librosa', 'Audio processing'),
+             ('soundfile', 'Audio file I/O'),
+             ('numpy', 'Numerical computing'),
+             ('matplotlib', 'Plotting and visualization')
+         ]
+
+         optional_imports = [
+             ('parselmouth', 'Praat voice analysis'),
+             ('scipy', 'Scientific computing'),
+             ('psutil', 'System monitoring')
+         ]
+
+         failed_critical = []
+         failed_optional = []
+
+         # Check critical imports
+         for module_name, description in critical_imports:
+             try:
+                 importlib.import_module(module_name)
+                 print(f"   ✅ {description}")
+             except ImportError as e:
+                 print(f"   ❌ {description}: {str(e)}")
+                 failed_critical.append(module_name)
+
+         # Check optional imports
+         for module_name, description in optional_imports:
+             try:
+                 importlib.import_module(module_name)
+                 print(f"   🔧 {description}")
+             except ImportError:
+                 print(f"   ⚠️ {description}: Not available")
+                 failed_optional.append(module_name)
+
+         if failed_critical:
+             print(f"\n❌ Critical imports failed: {', '.join(failed_critical)}")
+             print("   The interface may not work properly")
+             return False
+
+         if failed_optional:
+             print(f"\n⚠️ Optional features unavailable: {', '.join(failed_optional)}")
+             print("   Voice analysis features may be limited")
+
+         print("✅ Installation verification complete")
+         return True
+
+     def launch_interface(self):
+         """Launch the Gradio interface"""
+         print("\n🚀 Launching ChatterboxTTS Gradio Interface...")
+         print("-" * 50)
+
+         # If we're using a virtual environment, launch with venv python
+         if hasattr(self, 'venv_python') and Path(self.venv_python).exists():
+             print("🔧 Using virtual environment Python...")
+             try:
+                 print("🌐 Starting web server...")
+                 print("📱 Interface will be available in your browser")
+                 print("🔗 Default URL: http://localhost:7860")
+
+                 if os.getenv("RUNPOD_POD_ID"):
+                     print("☁️ RunPod deployment detected")
+                 elif os.getenv("COLAB_GPU"):
+                     print("☁️ Google Colab detected - sharing link will be generated")
+
+                 print("\n" + "=" * 50)
+                 print("🎉 LAUNCHING CHATTERBOX TTS!")
+                 print("=" * 50)
+
+                 # Launch using virtual environment python
+                 subprocess.run([self.venv_python, "gradio_main_interface.py"])
+
+             except KeyboardInterrupt:
+                 print("\n\n👋 Shutdown requested by user")
+                 print("   Thanks for using ChatterboxTTS!")
+                 sys.exit(0)
+             except Exception as e:
+                 print(f"\n❌ Error launching with virtual environment: {str(e)}")
+                 print("   Falling back to direct import...")
+                 self._launch_direct()
+         else:
+             self._launch_direct()
+
+     def _launch_direct(self):
+         """Launch interface by direct import"""
+         try:
+             # Import and launch
+             from gradio_main_interface import launch_interface
+
+             print("🌐 Starting web server...")
+             print("📱 Interface will be available in your browser")
+             print("🔗 Default URL: http://localhost:7860")
+
+             if os.getenv("RUNPOD_POD_ID"):
+                 print("☁️ RunPod deployment detected")
+             elif os.getenv("COLAB_GPU"):
+                 print("☁️ Google Colab detected - sharing link will be generated")
+
+             print("\n" + "=" * 50)
+             print("🎉 LAUNCHING CHATTERBOX TTS!")
+             print("=" * 50)
+
+             # Small delay for user to read messages
+             time.sleep(2)
+
+             # Launch the interface
+             launch_interface()
+
+         except KeyboardInterrupt:
+             print("\n\n👋 Shutdown requested by user")
+             print("   Thanks for using ChatterboxTTS!")
+             sys.exit(0)
+         except Exception as e:
+             print(f"\n❌ Error launching interface: {str(e)}")
+             print("\nTroubleshooting tips:")
+             print("1. Check that all dependencies are installed")
+             print("2. Verify you're in the correct directory")
+             if hasattr(self, 'venv_python'):
+                 print(f"3. Try running: {self.venv_python} gradio_main_interface.py")
+             else:
+                 print("3. Try running: python3 gradio_main_interface.py")
+             sys.exit(1)
+
+     def run(self):
+         """Run the complete launcher process"""
+         self.print_header()
+
+         # Step 1: Check Python version
+         self.check_python_version()
+
+         # Step 2: Check working directory
+         if not self.check_working_directory():
+             sys.exit(1)
+
+         # Step 3: Create required directories
+         self.create_directories()
+
+         # Step 4: Check and install requirements
+         if not self.check_and_install_requirements():
+             print("\n❌ Failed to install required packages")
+             sys.exit(1)
+
+         # Step 5: Check GPU availability
+         self.check_gpu_availability()
+
+         # Step 6: Verify installation
+         if not self.verify_installation():
+             print("\n⚠️ Installation verification failed")
+             print("   Proceeding anyway - some features may not work")
+
+         # Step 7: Launch interface
+         self.launch_interface()
+
+ def main():
+     """Main entry point"""
+     launcher = GradioLauncher()
+     launcher.run()
+
+ if __name__ == "__main__":
+     # Add current directory to Python path for HF Spaces
+     import sys
+     import os
+     sys.path.append(os.path.dirname(os.path.abspath(__file__)))
+
+     # Fix OpenMP environment variable for HuggingFace Spaces
+     os.environ["OMP_NUM_THREADS"] = "1"
+
+     # Skip launcher logic for HF Spaces, run interface directly
+     try:
+         # Import the actual Gradio interface
+         import gradio_main_interface
+
+         # Create and launch the interface
+         demo = gradio_main_interface.create_main_interface()
+         demo.launch(
+             server_name="0.0.0.0",
+             server_port=7860,
+             share=False,
+             show_error=True
+         )
+     except ImportError as e:
+         print(f"❌ Failed to import gradio_main_interface: {e}")
+         # Fallback to launcher if needed
+         launcher = GradioLauncher()
+         launcher.launch_interface()
config/config.py ADDED
@@ -0,0 +1,252 @@
+ """
+ GenTTS Configuration Module
+ Central location for all settings, paths, and feature toggles
+ """
+
+ import os
+ from pathlib import Path
+
+ # ============================================================================
+ # CORE DIRECTORIES
+ # ============================================================================
+ TEXT_INPUT_ROOT = Path("Text_Input")
+ AUDIOBOOK_ROOT = Path("Audiobook")
+ VOICE_SAMPLES_DIR = Path("Voice_Samples")
+
+ # ============================================================================
+ # TEXT PROCESSING SETTINGS
+ # ============================================================================
+ MAX_CHUNK_WORDS = 28
+ MIN_CHUNK_WORDS = 4
+
+ # ============================================================================
+ # WORKER AND PERFORMANCE SETTINGS
+ # ============================================================================
+ MAX_WORKERS = 2
+ TEST_MAX_WORKERS = 6         # For experimentation
+ USE_DYNAMIC_WORKERS = False  # Toggle for testing
+ VRAM_SAFETY_THRESHOLD = 6.5  # GB
+
+ # ============================================================================
+ # AUDIO QUALITY SETTINGS
+ # ============================================================================
+ ENABLE_MID_DROP_CHECK = False
+ ENABLE_ASR = False          # Disabled by default due to tensor dimension errors
+ ASR_WORKERS = 4             # Parallel ASR on CPU threads
+ DEFAULT_ASR_MODEL = "base"  # Default Whisper model for ASR validation
+
+ # ASR Model Memory Requirements (approximate)
+ ASR_MODEL_VRAM_MB = {
+     "tiny": 39,
+     "base": 74,
+     "small": 244,
+     "medium": 769,
+     "large": 1550,
+     "large-v2": 1550,
+     "large-v3": 1550
+ }
+
+ ASR_MODEL_RAM_MB = {
+     "tiny": 150,
+     "base": 300,
+     "small": 800,
+     "medium": 2000,
+     "large": 4000,
+     "large-v2": 4000,
+     "large-v3": 4000
+ }
+
+ # ============================================================================
+ # TTS HUM DETECTION SETTINGS
+ # ============================================================================
+ ENABLE_HUM_DETECTION = False
+ HUM_FREQ_MIN = 50           # Hz - Lower frequency bound for hum detection
+ HUM_FREQ_MAX = 200          # Hz - Upper frequency bound for hum detection
+ HUM_ENERGY_THRESHOLD = 0.3  # Ratio of hum energy to total energy (0.1-0.5 range)
+ HUM_STEADY_THRESHOLD = 0.6  # Ratio of segments with steady amplitude (0.5-0.8 range)
+ HUM_AMPLITUDE_MIN = 0.005   # Minimum RMS for steady hum detection
+ HUM_AMPLITUDE_MAX = 0.1     # Maximum RMS for steady hum detection
+
+ # ============================================================================
+ # AUDIO TRIMMING SETTINGS
+ # ============================================================================
+ ENABLE_AUDIO_TRIMMING = True
+ SPEECH_ENDPOINT_THRESHOLD = 0.006
+ TRIMMING_BUFFER_MS = 50
+
+ # ============================================================================
+ # SILENCE DURATION SETTINGS (milliseconds)
+ # ============================================================================
+ SILENCE_CHAPTER_START = 1195
+ SILENCE_CHAPTER_END = 1100
+ SILENCE_SECTION_BREAK = 700
+ SILENCE_PARAGRAPH_END = 1000
+
+ # Punctuation-specific silence settings (milliseconds)
+ SILENCE_COMMA = 150
+ SILENCE_SEMICOLON = 150  # Medium pause after semicolons
+ SILENCE_COLON = 150      # Pause after colons
+ SILENCE_PERIOD = 500
+ SILENCE_QUESTION_MARK = 500
+ SILENCE_EXCLAMATION = 200
+ SILENCE_DASH = 200       # Em dash pause
+ SILENCE_ELLIPSIS = 80    # Ellipsis pause (suspense)
+ SILENCE_QUOTE_END = 150  # End of quoted speech
+
+ # Chunk-level silence settings
+ ENABLE_CHUNK_END_SILENCE = False
+ CHUNK_END_SILENCE_MS = 200
+
+ # Content boundary silence settings (milliseconds)
+ SILENCE_PARAGRAPH_FALLBACK = 500  # Original paragraph logic fallback
+
+ # ============================================================================
+ # AUDIO NORMALIZATION SETTINGS
+ # ============================================================================
+ ENABLE_NORMALIZATION = True
+ NORMALIZATION_TYPE = "peak"
+ TARGET_LUFS = -16
+ TARGET_PEAK_DB = -1.5
+ TARGET_LRA = 11  # Target loudness range for consistency
+
+ # ============================================================================
+ # AUDIO PLAYBACK SPEED SETTINGS
+ # ============================================================================
+ ATEMPO_SPEED = 1.0
+
+ # ============================================================================
+ # ENVIRONMENT SETUP
+ # ============================================================================
+ os.environ["TRANSFORMERS_NO_ADVISORY_WARNINGS"] = "true"
+ os.environ["TRANSFORMERS_NO_PROGRESS_BAR"] = "1"
+ os.environ["HF_TRANSFORMERS_NO_TQDM"] = "1"
+ # Cache handling is now done by launcher scripts:
+ # - launch_gradio_local.sh: Sets shared cache for development
+ # - launch_gradio.sh: Uses PyTorch defaults for containers/deployment
+
+ # ============================================================================
+ # COLOR CODES FOR TERMINAL OUTPUT
+ # ============================================================================
+ RESET = "\033[0m"
+ BOLD = "\033[1m"
+ RED = "\033[91m"
+ GREEN = "\033[92m"
+ YELLOW = "\033[93m"
+ CYAN = "\033[96m"
+
+ # ============================================================================
+ # TTS MODEL PARAMETERS (DEFAULTS)
+ # ============================================================================
+ DEFAULT_EXAGGERATION = 0.5
+ DEFAULT_CFG_WEIGHT = 0.5
+ DEFAULT_TEMPERATURE = 0.85
+
+ # Advanced Sampling Parameters (Min_P Sampler Support)
+ DEFAULT_MIN_P = 0.05              # Min probability threshold (0.0 disables)
+ DEFAULT_TOP_P = 1.0               # Top-p sampling (1.0 disables)
+ DEFAULT_REPETITION_PENALTY = 1.2  # Repetition penalty (1.0 = no penalty)
+
+ # ============================================================================
+ # VADER SENTIMENT TO TTS PARAMETER MAPPING
+ # ============================================================================
+ # These settings control how VADER sentiment analysis dynamically adjusts TTS parameters.
+ # The formula used is: new_param = base_param + (compound_score * sensitivity)
+ # The result is then clamped within the defined MIN/MAX range.
+
+ # --- Base TTS Parameters (used as the starting point) ---
+ # These are the same as the main defaults, but listed here for clarity.
+ BASE_EXAGGERATION = DEFAULT_EXAGGERATION  # Default: 0.5
+ BASE_CFG_WEIGHT = DEFAULT_CFG_WEIGHT      # Default: 0.5
+ BASE_TEMPERATURE = DEFAULT_TEMPERATURE    # Default: 0.85
+
+ # --- Sensitivity ---
+ # How much VADER's compound score affects each parameter.
+ # Higher values mean more dramatic changes based on sentiment.
+ VADER_EXAGGERATION_SENSITIVITY = 0.33
+ VADER_CFG_WEIGHT_SENSITIVITY = 0.32
+ VADER_TEMPERATURE_SENSITIVITY = 0.3
+ VADER_MIN_P_SENSITIVITY = 0.01               # Reduced from 0.02 to prevent sampling issues
+ VADER_REPETITION_PENALTY_SENSITIVITY = 0.05  # Reduced from 0.1 to be more conservative
+
+ # --- Min/Max Clamps ---
+ # Hard limits to prevent extreme, undesirable audio artifacts.
+ TTS_PARAM_MIN_EXAGGERATION = 0.1
+ TTS_PARAM_MAX_EXAGGERATION = 0.65
+ TTS_PARAM_MIN_CFG_WEIGHT = 0.15
+ TTS_PARAM_MAX_CFG_WEIGHT = 0.8
+
+ TTS_PARAM_MIN_TEMPERATURE = 0.1
+ TTS_PARAM_MAX_TEMPERATURE = 2.35
+
+ TTS_PARAM_MIN_MIN_P = 0.02              # Increased from 0.0 to prevent sampling issues
+ TTS_PARAM_MAX_MIN_P = 0.3               # Reduced from 0.5 to prevent over-restriction
+ TTS_PARAM_MIN_TOP_P = 0.5               # Too low causes repetition
+ TTS_PARAM_MAX_TOP_P = 1.0               # 1.0 disables top_p
+ TTS_PARAM_MIN_REPETITION_PENALTY = 1.0  # 1.0 = no penalty
+ TTS_PARAM_MAX_REPETITION_PENALTY = 2.0  # Higher values are too restrictive
+
+ # ============================================================================
189
+ # BATCH PROCESSING SETTINGS
190
+ # ============================================================================
191
+ BATCH_SIZE = 400
192
+ TTS_BATCH_SIZE = 16 # Batch size for TTS inference when VADER is disabled
193
+ CLEANUP_INTERVAL = 500 # Deep cleanup every N chunks (reduced frequency for speed)
194
+
195
+ # ============================================================================
196
+ # QUALITY ENHANCEMENT SETTINGS (Phase 1)
197
+ # ============================================================================
198
+
199
+ # --- Regeneration Loop Settings ---
200
+ ENABLE_REGENERATION_LOOP = True # Enable automatic chunk regeneration on quality failure
201
+ MAX_REGENERATION_ATTEMPTS = 3 # Maximum retry attempts per chunk
202
+ QUALITY_THRESHOLD = 0.30 # TEMPORARILY LOWERED - Composite quality score threshold (0.0-1.0)
203
+
204
+ # --- Sentiment Smoothing Settings ---
205
+ ENABLE_SENTIMENT_SMOOTHING = True # Re-enabled - GUI controls now working properly
206
+ SENTIMENT_SMOOTHING_WINDOW = 3 # Number of previous chunks to consider
207
+ SENTIMENT_SMOOTHING_METHOD = "rolling" # "rolling" or "exp_decay"
208
+
209
+ # Exponential decay weights for smoothing (used if method is "exp_decay")
210
+ SENTIMENT_EXP_DECAY_WEIGHTS = [0.5, 0.3, 0.2] # Most recent to oldest
211
+
212
+ # --- Enhanced Anomaly Detection ---
213
+ SPECTRAL_ANOMALY_THRESHOLD = 0.6 # Spectral anomaly score threshold (0.0-1.0)
214
+ ENABLE_MFCC_VALIDATION = True # Enable MFCC-based spectral analysis
215
+ SPECTRAL_VARIANCE_LIMIT = 100.0 # Maximum spectral variance before flagging as artifact
216
+
217
+ # --- Output Validation Settings ---
218
+ ENABLE_OUTPUT_VALIDATION = True # Enable quality control clearinghouse (runs individual checks when enabled)
219
+ OUTPUT_VALIDATION_THRESHOLD = 0.6 # Minimum F1 score for output validation (reduced for punctuation tolerance)
220
+
221
+ # --- Parameter Adjustment for Regeneration ---
222
+ REGEN_TEMPERATURE_ADJUSTMENT = 0.1 # How much to adjust temperature per retry (increased for visibility)
223
+ REGEN_EXAGGERATION_ADJUSTMENT = 0.15 # How much to adjust exaggeration per retry (increased for visibility)
224
+ REGEN_CFG_ADJUSTMENT = 0.1 # How much to adjust cfg_weight per retry (increased for visibility)
225
+
226
+ # ============================================================================
227
+ # PERFORMANCE OPTIMIZATION SETTINGS
228
+ # ============================================================================
229
+ # Voice Embedding Caching - Cache voice embeddings to avoid recomputation
230
+ ENABLE_VOICE_EMBEDDING_CACHE = True # Enable voice embedding caching
231
+ VOICE_CACHE_MEMORY_LIMIT_MB = 500 # Maximum memory for voice cache (MB)
232
+ ENABLE_ADAPTIVE_VOICE_CACHE = True # Adapt cache based on system memory
233
+
234
+ # GPU Persistence Mode - Keep GPU in compute-ready state
235
+ ENABLE_GPU_PERSISTENCE_MODE = False # Try to enable GPU persistence mode
236
+ GPU_PERSISTENCE_RETRY_COUNT = 3 # Retry attempts for persistence mode
237
+
238
+ # CUDA Memory Pool - Advanced GPU memory management
239
+ ENABLE_CUDA_MEMORY_POOL = False # Enable CUDA memory pooling
240
+ CUDA_MEMORY_POOL_FRACTION = 0.9 # Fraction of GPU memory to pool
241
+ ENABLE_ADAPTIVE_MEMORY_POOL = True # Adapt pool size to system
242
+
243
+ # Producer-Consumer Pipeline - Eliminate chunk loading overhead
244
+ ENABLE_PRODUCER_CONSUMER_PIPELINE = False # Re-enabled with proper ETA tracking
245
+ PIPELINE_QUEUE_SIZE_MULTIPLIER = 3 # Queue size = workers * multiplier
246
+ PIPELINE_MAX_QUEUE_SIZE = 20 # Maximum queue size limit
247
+ ENABLE_PIPELINE_FALLBACK = True # Fall back to sequential if pipeline fails
248
+
249
+ # ============================================================================
250
+ # FEATURE TOGGLES
251
+ # ============================================================================
252
+ shutdown_requested = False # Global shutdown flag
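The sentiment-to-parameter mapping and `exp_decay` smoothing described in the comments above can be sketched in a few lines. This is an illustrative sketch only, not code from the repository: the helper names `clamp`, `smooth_compound`, and `vader_to_tts_params` are hypothetical, and the inlined constants mirror the `config.py` values.

```python
SENTIMENT_EXP_DECAY_WEIGHTS = [0.5, 0.3, 0.2]  # most recent to oldest

def clamp(value, lo, hi):
    """Clamp value into [lo, hi]."""
    return max(lo, min(hi, value))

def smooth_compound(history):
    """Weighted average of recent VADER compound scores (most recent first).

    zip() truncates to the shorter list, so early chunks with a short
    history are handled by renormalizing over the weights actually used.
    """
    pairs = list(zip(SENTIMENT_EXP_DECAY_WEIGHTS, history))
    total = sum(w for w, _ in pairs)
    return sum(w * s for w, s in pairs) / total

def vader_to_tts_params(compound):
    """Apply: new_param = base + compound * sensitivity, clamped to MIN/MAX."""
    return {
        "exaggeration": clamp(0.5 + compound * 0.33, 0.1, 0.65),
        "cfg_weight":   clamp(0.5 + compound * 0.32, 0.15, 0.8),
        "temperature":  clamp(0.85 + compound * 0.3, 0.1, 2.35),
    }

# A strongly positive chunk hits the exaggeration and cfg_weight ceilings:
print(vader_to_tts_params(1.0))
```

Note how the clamps, not the sensitivities, dominate at the extremes: with a compound score of ±1.0 both exaggeration and cfg_weight land on their MIN/MAX bounds.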
config/config.py.20250811-120000.bak ADDED
@@ -0,0 +1,251 @@
+"""
+GenTTS Configuration Module
+Central location for all settings, paths, and feature toggles
+"""
+
+import os
+from pathlib import Path
+
+# ============================================================================
+# CORE DIRECTORIES
+# ============================================================================
+TEXT_INPUT_ROOT = Path("Text_Input")
+AUDIOBOOK_ROOT = Path("Audiobook")
+VOICE_SAMPLES_DIR = Path("Voice_Samples")
+
+# ============================================================================
+# TEXT PROCESSING SETTINGS
+# ============================================================================
+MAX_CHUNK_WORDS = 32
+MIN_CHUNK_WORDS = 4
+
+# ============================================================================
+# WORKER AND PERFORMANCE SETTINGS
+# ============================================================================
+MAX_WORKERS = 2
+TEST_MAX_WORKERS = 6  # For experimentation
+USE_DYNAMIC_WORKERS = False  # Toggle for testing
+VRAM_SAFETY_THRESHOLD = 6.5  # GB
+
+# ============================================================================
+# AUDIO QUALITY SETTINGS
+# ============================================================================
+ENABLE_MID_DROP_CHECK = False
+ENABLE_ASR = False  # Disabled by default due to tensor dimension errors
+ASR_WORKERS = 4  # Parallel ASR on CPU threads
+DEFAULT_ASR_MODEL = "base"  # Default Whisper model for ASR validation
+
+# ASR Model Memory Requirements (approximate)
+ASR_MODEL_VRAM_MB = {
+    "tiny": 39,
+    "base": 74,
+    "small": 244,
+    "medium": 769,
+    "large": 1550,
+    "large-v2": 1550,
+    "large-v3": 1550
+}
+
+ASR_MODEL_RAM_MB = {
+    "tiny": 150,
+    "base": 300,
+    "small": 800,
+    "medium": 2000,
+    "large": 4000,
+    "large-v2": 4000,
+    "large-v3": 4000
+}
+
+# ============================================================================
+# TTS HUM DETECTION SETTINGS
+# ============================================================================
+ENABLE_HUM_DETECTION = False
+HUM_FREQ_MIN = 50  # Hz - Lower frequency bound for hum detection
+HUM_FREQ_MAX = 200  # Hz - Upper frequency bound for hum detection
+HUM_ENERGY_THRESHOLD = 0.3  # Ratio of hum energy to total energy (0.1-0.5 range)
+HUM_STEADY_THRESHOLD = 0.6  # Ratio of segments with steady amplitude (0.5-0.8 range)
+HUM_AMPLITUDE_MIN = 0.005  # Minimum RMS for steady hum detection
+HUM_AMPLITUDE_MAX = 0.1  # Maximum RMS for steady hum detection
+
+# ============================================================================
+# AUDIO TRIMMING SETTINGS
+# ============================================================================
+ENABLE_AUDIO_TRIMMING = True
+SPEECH_ENDPOINT_THRESHOLD = 0.006
+TRIMMING_BUFFER_MS = 50
+
+# ============================================================================
+# SILENCE DURATION SETTINGS (milliseconds)
+# ============================================================================
+SILENCE_CHAPTER_START = 1195
+SILENCE_CHAPTER_END = 1100
+SILENCE_SECTION_BREAK = 700
+SILENCE_PARAGRAPH_END = 1000
+
+# Punctuation-specific silence settings (milliseconds)
+SILENCE_COMMA = 150
+SILENCE_SEMICOLON = 150  # Medium pause after semicolons
+SILENCE_COLON = 150  # Pause after colons
+SILENCE_PERIOD = 500
+SILENCE_QUESTION_MARK = 500
+SILENCE_EXCLAMATION = 200
+SILENCE_DASH = 200  # Em dash pause
+SILENCE_ELLIPSIS = 80  # Ellipsis pause (suspense)
+SILENCE_QUOTE_END = 150  # End of quoted speech
+
+# Chunk-level silence settings
+ENABLE_CHUNK_END_SILENCE = False
+CHUNK_END_SILENCE_MS = 200
+
+# Content boundary silence settings (milliseconds)
+SILENCE_PARAGRAPH_FALLBACK = 500  # Original paragraph logic fallback
+
+# ============================================================================
+# AUDIO NORMALIZATION SETTINGS
+# ============================================================================
+ENABLE_NORMALIZATION = True
+NORMALIZATION_TYPE = "peak"
+TARGET_LUFS = -16
+TARGET_PEAK_DB = -1.5
+TARGET_LRA = 11  # Target loudness range for consistency
+
+# ============================================================================
+# AUDIO PLAYBACK SPEED SETTINGS
+# ============================================================================
+ATEMPO_SPEED = 1.0
+
+# ============================================================================
+# ENVIRONMENT SETUP
+# ============================================================================
+os.environ["TRANSFORMERS_NO_ADVISORY_WARNINGS"] = "true"
+os.environ["TRANSFORMERS_NO_PROGRESS_BAR"] = "1"
+os.environ["HF_TRANSFORMERS_NO_TQDM"] = "1"
+# Cache handling is now done by launcher scripts:
+#   - launch_gradio_local.sh: sets a shared cache for development
+#   - launch_gradio.sh: uses PyTorch defaults for containers/deployment
+
+# ============================================================================
+# COLOR CODES FOR TERMINAL OUTPUT
+# ============================================================================
+RESET = "\033[0m"
+BOLD = "\033[1m"
+RED = "\033[91m"
+GREEN = "\033[92m"
+YELLOW = "\033[93m"
+CYAN = "\033[96m"
+
+# ============================================================================
+# TTS MODEL PARAMETERS (DEFAULTS)
+# ============================================================================
+DEFAULT_EXAGGERATION = 0.5
+DEFAULT_CFG_WEIGHT = 0.5
+DEFAULT_TEMPERATURE = 0.85
+
+# Advanced Sampling Parameters (Min_P Sampler Support)
+DEFAULT_MIN_P = 0.05  # Min probability threshold (0.0 disables)
+DEFAULT_TOP_P = 1.0  # Top-p sampling (1.0 disables)
+DEFAULT_REPETITION_PENALTY = 1.2  # Repetition penalty (1.0 = no penalty)
+
+# ============================================================================
+# VADER SENTIMENT TO TTS PARAMETER MAPPING
+# ============================================================================
+# These settings control how VADER sentiment analysis dynamically adjusts TTS parameters.
+# The formula used is: new_param = base_param + (compound_score * sensitivity)
+# The result is then clamped within the defined MIN/MAX range.
+
+# --- Base TTS Parameters (used as the starting point) ---
+# These are the same as the main defaults, but listed here for clarity.
+BASE_EXAGGERATION = DEFAULT_EXAGGERATION  # Default: 1.0
+BASE_CFG_WEIGHT = DEFAULT_CFG_WEIGHT  # Default: 0.7
+BASE_TEMPERATURE = DEFAULT_TEMPERATURE  # Default: 0.7
+
+# --- Sensitivity ---
+# How much VADER's compound score affects each parameter.
+# Higher values mean more dramatic changes based on sentiment.
+VADER_EXAGGERATION_SENSITIVITY = 0.33
+VADER_CFG_WEIGHT_SENSITIVITY = 0.32
+VADER_TEMPERATURE_SENSITIVITY = 0.3
+VADER_MIN_P_SENSITIVITY = 0.01  # Reduced from 0.02 to prevent sampling issues
+VADER_REPETITION_PENALTY_SENSITIVITY = 0.05  # Reduced from 0.1 to be more conservative
+
+# --- Min/Max Clamps ---
+# Hard limits to prevent extreme, undesirable audio artifacts.
+TTS_PARAM_MIN_EXAGGERATION = 0.1
+TTS_PARAM_MAX_EXAGGERATION = 0.65
+TTS_PARAM_MIN_CFG_WEIGHT = 0.15
+TTS_PARAM_MAX_CFG_WEIGHT = 0.8
+
+TTS_PARAM_MIN_TEMPERATURE = 0.1
+TTS_PARAM_MAX_TEMPERATURE = 2.3499999999999988
+
+TTS_PARAM_MIN_MIN_P = 0.02  # Increased from 0.0 to prevent sampling issues
+TTS_PARAM_MAX_MIN_P = 0.3  # Reduced from MAX 0.5 to prevent over-restriction
+TTS_PARAM_MIN_TOP_P = 0.5  # Too low causes repetition
+TTS_PARAM_MAX_TOP_P = 1.0  # MAX 1.0 disables top_p
+TTS_PARAM_MIN_REPETITION_PENALTY = 1.0  # 1.0 = no penalty
+TTS_PARAM_MAX_REPETITION_PENALTY = 2.0  # Higher values too restrictive MAX 2
+
+# ============================================================================
+# BATCH PROCESSING SETTINGS
+# ============================================================================
+BATCH_SIZE = 400
+CLEANUP_INTERVAL = 500  # Deep cleanup every N chunks (reduced frequency for speed)
+
+# ============================================================================
+# QUALITY ENHANCEMENT SETTINGS (Phase 1)
+# ============================================================================
+
+# --- Regeneration Loop Settings ---
+ENABLE_REGENERATION_LOOP = True  # Enable automatic chunk regeneration on quality failure
+MAX_REGENERATION_ATTEMPTS = 3  # Maximum retry attempts per chunk
+QUALITY_THRESHOLD = 0.30  # TEMPORARILY LOWERED - Composite quality score threshold (0.0-1.0)
+
+# --- Sentiment Smoothing Settings ---
+ENABLE_SENTIMENT_SMOOTHING = True  # Re-enabled - GUI controls now working properly
+SENTIMENT_SMOOTHING_WINDOW = 3  # Number of previous chunks to consider
+SENTIMENT_SMOOTHING_METHOD = "rolling"  # "rolling" or "exp_decay"
+
+# Exponential decay weights for smoothing (used if method is "exp_decay")
+SENTIMENT_EXP_DECAY_WEIGHTS = [0.5, 0.3, 0.2]  # Most recent to oldest
+
+# --- Enhanced Anomaly Detection ---
+SPECTRAL_ANOMALY_THRESHOLD = 0.6  # Spectral anomaly score threshold (0.0-1.0)
+ENABLE_MFCC_VALIDATION = True  # Enable MFCC-based spectral analysis
+SPECTRAL_VARIANCE_LIMIT = 100.0  # Maximum spectral variance before flagging as artifact
+
+# --- Output Validation Settings ---
+ENABLE_OUTPUT_VALIDATION = True  # Enable quality control clearinghouse (runs individual checks when enabled)
+OUTPUT_VALIDATION_THRESHOLD = 0.6  # Minimum F1 score for output validation (reduced for punctuation tolerance)
+
+# --- Parameter Adjustment for Regeneration ---
+REGEN_TEMPERATURE_ADJUSTMENT = 0.1  # How much to adjust temperature per retry (increased for visibility)
+REGEN_EXAGGERATION_ADJUSTMENT = 0.15  # How much to adjust exaggeration per retry (increased for visibility)
+REGEN_CFG_ADJUSTMENT = 0.1  # How much to adjust cfg_weight per retry (increased for visibility)
+
+# ============================================================================
+# PERFORMANCE OPTIMIZATION SETTINGS
+# ============================================================================
+# Voice Embedding Caching - Cache voice embeddings to avoid recomputation
+ENABLE_VOICE_EMBEDDING_CACHE = True  # Enable voice embedding caching
+VOICE_CACHE_MEMORY_LIMIT_MB = 500  # Maximum memory for voice cache (MB)
+ENABLE_ADAPTIVE_VOICE_CACHE = True  # Adapt cache based on system memory
+
+# GPU Persistence Mode - Keep GPU in compute-ready state
+ENABLE_GPU_PERSISTENCE_MODE = False  # Try to enable GPU persistence mode
+GPU_PERSISTENCE_RETRY_COUNT = 3  # Retry attempts for persistence mode
+
+# CUDA Memory Pool - Advanced GPU memory management
+ENABLE_CUDA_MEMORY_POOL = True  # Enable CUDA memory pooling
+CUDA_MEMORY_POOL_FRACTION = 0.9  # Fraction of GPU memory to pool
+ENABLE_ADAPTIVE_MEMORY_POOL = True  # Adapt pool size to system
+
+# Producer-Consumer Pipeline - Eliminate chunk loading overhead
+ENABLE_PRODUCER_CONSUMER_PIPELINE = True  # Re-enabled with proper ETA tracking
+PIPELINE_QUEUE_SIZE_MULTIPLIER = 3  # Queue size = workers * multiplier
+PIPELINE_MAX_QUEUE_SIZE = 20  # Maximum queue size limit
+ENABLE_PIPELINE_FALLBACK = True  # Fall back to sequential if pipeline fails
+
+# ============================================================================
+# FEATURE TOGGLES
+# ============================================================================
+shutdown_requested = False  # Global shutdown flag
config/config.py~ ADDED
gradio_main_interface.py CHANGED
@@ -16,14 +16,15 @@ ARCHITECTURE:
 
 AVAILABLE TABS:
 1. Convert Book (Tab 1) - FUNCTIONAL: Main TTS conversion interface
-2. Quick Convert (Tab 2) - PLACEHOLDER: Fast conversion for small texts
+2. Configuration (Tab 2) - FUNCTIONAL: System configuration settings
 3. Voice Analysis (Tab 3) - PLACEHOLDER: Voice sample analysis tools
-4. Batch Processing (Tab 4) - PLACEHOLDER: Multi-book processing
-5. Audio Tools (Tab 5) - PLACEHOLDER: Audio editing and enhancement
+4. Combine Audio (Tab 4) - FUNCTIONAL: Audio file combination tools
+5. Prepare Text (Tab 5) - FUNCTIONAL: Text preparation and chunking
 6. Settings (Tab 6) - FUNCTIONAL: Configuration management
-7. Chunk Tools (Tab 7) - PLACEHOLDER: Chunk editing and repair
-8. Voice Training (Tab 8) - PLACEHOLDER: Voice cloning tools
-9. System Monitor (Tab 9) - PLACEHOLDER: Performance monitoring
+7. Chunk Tools (Tab 7) - FUNCTIONAL: Chunk editing and repair
+8. JSON Generate (Tab 8) - FUNCTIONAL: Direct JSON-to-audiobook generation
+9. Diagnostics (Tab 9) - FUNCTIONAL: Parallel processing performance diagnostics
+10. System Monitor (Tab 10) - PLACEHOLDER: Performance monitoring
 
 DEPLOYMENT MODES:
 - LOCAL: python3 gradio_main_interface.py (development)
@@ -96,6 +97,13 @@ except ImportError as e:
     print(f"⚠️ Tab 8 (JSON Generate) not available: {e}")
     TAB8_AVAILABLE = False
 
+try:
+    from gradio_tabs.tab_diagnostics import create_diagnostics_tab
+    TAB_DIAGNOSTICS_AVAILABLE = True
+except ImportError as e:
+    print(f"⚠️ Diagnostics tab not available: {e}")
+    TAB_DIAGNOSTICS_AVAILABLE = False
+
 def create_placeholder_tab(tab_name, tab_number):
     """Create a placeholder tab for future implementation"""
     with gr.Column():
@@ -185,11 +193,19 @@ def create_main_interface():
         with gr.Tab("8. JSON Generate"):
             create_placeholder_tab("JSON Generate", 8)
 
-        with gr.Tab("9. System Monitor"):
-            create_placeholder_tab("System Monitor", 9)
+        # Tab 9: Diagnostics (Working)
+        if TAB_DIAGNOSTICS_AVAILABLE:
+            with gr.Tab("9. Diagnostics"):
+                create_diagnostics_tab()
+        else:
+            with gr.Tab("9. Diagnostics"):
+                create_placeholder_tab("System Diagnostics", 9)
+
+        with gr.Tab("10. System Monitor"):
+            create_placeholder_tab("System Monitor", 10)
 
-        with gr.Tab("10. About"):
-            create_placeholder_tab("About", 10)
+        with gr.Tab("11. About"):
+            create_placeholder_tab("About", 11)
 
     # Footer
     gr.Markdown("""
@@ -211,6 +227,7 @@ def launch_interface():
     print(f"  Tab 6 (Settings): {'✅ Available' if TAB6_AVAILABLE else '❌ Not Available'}")
     print(f"  Tab 7 (Chunk Tools): {'✅ Available' if TAB7_AVAILABLE else '❌ Not Available'}")
     print(f"  Tab 8 (JSON Generate): {'✅ Available' if TAB8_AVAILABLE else '❌ Not Available'}")
+    print(f"  Tab 9 (Diagnostics): {'✅ Available' if TAB_DIAGNOSTICS_AVAILABLE else '❌ Not Available'}")
     print("  Other Tabs: 🚧 Placeholder (Coming Soon)")
     print("-" * 50)
 
gradio_tabs/gradio_imports.py ADDED
@@ -0,0 +1,85 @@
+ #!/usr/bin/env python3
+ """
+ Common import utilities for Gradio tabs - HuggingFace deployment compatibility
+ """
+
+ import os
+ import sys
+
+ def safe_import(module_name, package=None):
+     """Safely import modules with HuggingFace deployment compatibility"""
+     try:
+         if package:
+             return __import__(f"{package}.{module_name}", fromlist=[module_name])
+         else:
+             return __import__(module_name, fromlist=[''])
+     except ImportError:
+         # Try adding parent directory to path for HuggingFace deployment
+         current_dir = os.path.dirname(__file__)
+         parent_dir = os.path.join(current_dir, '..')
+         if parent_dir not in sys.path:
+             sys.path.append(parent_dir)
+         try:
+             if package:
+                 return __import__(f"{package}.{module_name}", fromlist=[module_name])
+             else:
+                 return __import__(module_name, fromlist=[''])
+         except ImportError:
+             raise
+
+ def safe_import_config():
+     """Safely import config module and return all config variables"""
+     try:
+         config_module = safe_import('config', 'config')
+         # Return dictionary of all config variables
+         return {name: getattr(config_module, name) for name in dir(config_module) if not name.startswith('_')}, True
+     except ImportError as e:
+         print(f"⚠️ Config not available: {e} - using defaults")
+         return {}, False
+
+ def get_default_config():
+     """Return default configuration values for when config is not available"""
+     return {
+         'AUDIOBOOK_ROOT': 'Audiobook',
+         'TEXT_INPUT_ROOT': 'Text_Input',
+         'VOICE_SAMPLES_DIR': 'Voice_Samples',
+         'MAX_WORKERS': 2,
+         'BATCH_SIZE': 100,
+         'MIN_CHUNK_WORDS': 5,
+         'MAX_CHUNK_WORDS': 25,
+         'ENABLE_NORMALIZATION': True,
+         'TARGET_LUFS': -16,
+         'ENABLE_AUDIO_TRIMMING': True,
+         'SPEECH_ENDPOINT_THRESHOLD': 0.005,
+         'TRIMMING_BUFFER_MS': 100,
+         'TTS_PARAM_MIN_EXAGGERATION': 0.0,
+         'TTS_PARAM_MAX_EXAGGERATION': 2.0,
+         'TTS_PARAM_MIN_CFG_WEIGHT': 0.0,
+         'TTS_PARAM_MAX_CFG_WEIGHT': 1.0,
+         'TTS_PARAM_MIN_TEMPERATURE': 0.0,
+         'TTS_PARAM_MAX_TEMPERATURE': 5.0,
+         'DEFAULT_EXAGGERATION': 0.5,
+         'DEFAULT_CFG_WEIGHT': 0.5,
+         'DEFAULT_TEMPERATURE': 0.8,
+         'VADER_EXAGGERATION_SENSITIVITY': 0.3,
+         'VADER_CFG_WEIGHT_SENSITIVITY': 0.3,
+         'VADER_TEMPERATURE_SENSITIVITY': 0.3,
+         'SILENCE_CHAPTER_START': 1000,
+         'SILENCE_CHAPTER_END': 1500,
+         'SILENCE_SECTION_BREAK': 800,
+         'SILENCE_PARAGRAPH_END': 500,
+         'SILENCE_COMMA': 200,
+         'SILENCE_PERIOD': 400,
+         'SILENCE_QUESTION_MARK': 500,
+         'SILENCE_EXCLAMATION': 500,
+         'CHUNK_END_SILENCE_MS': 0,
+         'ENABLE_SENTIMENT_SMOOTHING': True,
+         'SENTIMENT_SMOOTHING_WINDOW': 3,
+         'SENTIMENT_SMOOTHING_METHOD': 'gaussian',
+         'BASE_EXAGGERATION': 0.5,
+         'BASE_CFG_WEIGHT': 0.5,
+         'BASE_TEMPERATURE': 0.8,
+         'DEFAULT_MIN_P': 0.1,
+         'DEFAULT_TOP_P': 0.9,
+         'DEFAULT_REPETITION_PENALTY': 1.0
+     }
gradio_tabs/tab1_convert_book.py CHANGED
@@ -71,8 +71,7 @@ conversion_state = {
     'vram_usage': '-- GB',
     'current_chunk': '--',
     'eta': '--',
-    'elapsed': '--',
-    'needs_refresh': False
+    'elapsed': '--'
 }
 
 def parse_progress_stats(output_line):
@@ -533,6 +532,14 @@ def create_convert_book_tab():
                         value=2.0,
                         info="Reduce repetition"
                     )
+
+                    # NEW: TTS Inference Batch Size
+                    tts_batch_size = gr.Slider(
+                        label="TTS Inference Batch Size (VADER Off)",
+                        minimum=1, maximum=64, step=1,
+                        value=16,  # Default value
+                        info="Number of chunks to process simultaneously when VADER is disabled for speed."
+                    )
 
         # Action Buttons and Status
         with gr.Row():
@@ -779,7 +786,8 @@
                              regen_enabled_val, max_attempts_val, quality_thresh_val,
                              sentiment_smooth_val, smooth_window_val, smooth_method_val,
                              mfcc_val, output_val, spectral_thresh_val, output_thresh_val,
-                             exag_val, cfg_val, temp_val, min_p_val, top_p_val, rep_penalty_val):
+                             exag_val, cfg_val, temp_val, min_p_val, top_p_val, rep_penalty_val,
+                             tts_batch_size_val):
            """Start the actual book conversion - file upload version"""
 
            # Validation
@@ -882,7 +890,8 @@
                'vader_enabled': vader_val,
                'asr_enabled': asr_val,
                'asr_config': asr_config,
-               'add_to_batch': add_to_batch_val
+               'add_to_batch': add_to_batch_val,
+               'tts_batch_size': tts_batch_size_val
            }
 
            # Set conversion state
@@ -905,8 +914,6 @@
            if result['success']:
                conversion_state['status'] = '✅ Conversion completed successfully!'
                conversion_state['progress'] = 100
-               # Trigger automatic refresh of audiobook dropdowns
-               conversion_state['needs_refresh'] = True
            else:
                conversion_state['status'] = f"❌ Conversion failed: {result.get('error', 'Unknown error')}"
                conversion_state['progress'] = 0
@@ -974,7 +981,8 @@
                regeneration_enabled, max_attempts, quality_threshold,
                sentiment_smoothing, smoothing_window, smoothing_method,
                mfcc_validation, output_validation, spectral_threshold, output_threshold,
-               exaggeration, cfg_weight, temperature, min_p, top_p, repetition_penalty
+               exaggeration, cfg_weight, temperature, min_p, top_p, repetition_penalty,
+               tts_batch_size
            ],
            outputs=[status_display, progress_display, audio_player, audiobook_selector, m4b_file_selector]
        )
@@ -1060,21 +1068,6 @@
            if audiobook_choices['choices']:
                latest_audiobook = load_selected_audiobook(audiobook_choices['choices'][0])
 
-           return (
-               conversion_state['status'],
-               conversion_state['progress'],
-               latest_audiobook,
-               audiobook_choices,
-               m4b_choices
-           )
-       elif conversion_state.get('needs_refresh', False):
-           # Auto-refresh requested
-           conversion_state['needs_refresh'] = False
-           audiobook_choices, m4b_choices = update_audiobook_dropdowns_after_conversion()
-           latest_audiobook = None
-           if audiobook_choices['choices']:
-               latest_audiobook = load_selected_audiobook(audiobook_choices['choices'][0])
-
           return (
               conversion_state['status'],
               conversion_state['progress'],
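The batch mode this commit wires up (via `tts_batch_size`) splits chunks into fixed-size groups and pads tokenized text to a common length before a single inference call, as described in `BATCH_IMPLEMENTATION_PLAN.md`. A pure-Python sketch of both steps (helper names are hypothetical; on tensors, `torch.nn.utils.rnn.pad_sequence` performs the equivalent padding):

```python
def make_batches(chunks, batch_size=16):
    """Group chunks into fixed-size batches for the VADER-off fast path."""
    return [chunks[i:i + batch_size] for i in range(0, len(chunks), batch_size)]

def pad_token_batch(token_seqs, pad_id=0):
    """Right-pad tokenized chunks to one length so they stack into a batch."""
    max_len = max(len(seq) for seq in token_seqs)
    return [seq + [pad_id] * (max_len - len(seq)) for seq in token_seqs]
```

With VADER on, per-chunk TTS parameters differ, so chunks must still be generated one at a time; batching only applies when all chunks share one parameter set.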
gradio_tabs/tab1_convert_book.py.20250811-120000.bak ADDED
@@ -0,0 +1,1173 @@
1
+ #!/usr/bin/env python3
2
+ """
3
+ Gradio Tab 1: Convert Book
4
+ Exact replica of PyQt5 GUI Tab 1 functionality
5
+ """
6
+
7
+ import gradio as gr
8
+ import os
9
+ import sys
10
+ import threading
11
+ import subprocess
12
+ import tempfile
13
+ import json
14
+ import warnings
15
+ import re
16
+ import time
17
+ from pathlib import Path
18
+ from typing import List, Dict, Any, Optional, Tuple
19
+
20
+ # Suppress CUDA deprecation warnings
21
+ warnings.filterwarnings("ignore", category=FutureWarning, message=".*torch.backends.cuda.sdp_kernel.*")
22
+ warnings.filterwarnings("ignore", category=FutureWarning, message=".*sdp_kernel.*")
23
+
24
+ # Import ChatterboxTTS modules and ensure all config variables are available
25
+ # First set defaults, then try to import from config
26
+ DEFAULT_EXAGGERATION = 0.4
27
+ DEFAULT_CFG_WEIGHT = 0.5
28
+ DEFAULT_TEMPERATURE = 0.9
29
+ TTS_PARAM_MIN_EXAGGERATION = 0.0
30
+ TTS_PARAM_MAX_EXAGGERATION = 2.0
31
+ TTS_PARAM_MIN_CFG_WEIGHT = 0.0
32
+ TTS_PARAM_MAX_CFG_WEIGHT = 1.0
33
+ TTS_PARAM_MIN_TEMPERATURE = 0.0
34
+ TTS_PARAM_MAX_TEMPERATURE = 5.0
35
+ ENABLE_REGENERATION_LOOP = True
36
+ MAX_REGENERATION_ATTEMPTS = 3
37
+ QUALITY_THRESHOLD = 0.7
38
+ ENABLE_SENTIMENT_SMOOTHING = True
39
+ SENTIMENT_SMOOTHING_WINDOW = 3
40
+ SENTIMENT_SMOOTHING_METHOD = "rolling"
41
+ ENABLE_MFCC_VALIDATION = False
42
+ ENABLE_OUTPUT_VALIDATION = False
43
+ SPECTRAL_ANOMALY_THRESHOLD = 0.8
44
+ OUTPUT_VALIDATION_THRESHOLD = 0.85
45
+
46
+ # Try to import config and override defaults if available
47
+ try:
48
+ from config.config import *
49
+ CONFIG_AVAILABLE = True
50
+ print("βœ… Config loaded successfully")
51
+ except ImportError:
52
+ print("⚠️ Config not available - using defaults")
53
+ CONFIG_AVAILABLE = False
54
+
55
+ # Import the actual conversion functions from GUI
56
+ try:
57
+ # We need to import the actual conversion logic
58
+ import importlib.util
59
+ gui_spec = importlib.util.spec_from_file_location("chatterbox_gui", "chatterbox_gui.py")
60
+ gui_module = importlib.util.module_from_spec(gui_spec)
61
+ # We'll access the GUI's conversion methods
62
+ GUI_AVAILABLE = True
63
+ except Exception as e:
64
+ print(f"⚠️ GUI module not available: {e}")
65
+ GUI_AVAILABLE = False
66
+
67
+ # Global state for conversion with enhanced stats
68
+ conversion_state = {
69
+ 'running': False,
70
+ 'progress': 0,
71
+ 'status': 'Ready',
72
+ 'thread': None,
73
+ 'realtime_factor': '--',
74
+ 'vram_usage': '-- GB',
75
+ 'current_chunk': '--',
76
+ 'eta': '--',
77
+ 'elapsed': '--'
78
+ }
79
+
80
+ def parse_progress_stats(output_line):
81
+ """Parse progress statistics from TTS engine output"""
82
+ # Look for progress pattern: "πŸŒ€ Chunk 2/13 | ⏱ Elapsed: 0:01:31 | ETA: 0:09:54 | Remaining: 0:08:23 | Realtime: 0.11x | VRAM: 3.3GB"
83
+ progress_pattern = r'πŸŒ€ Chunk (\d+)/(\d+).*?Realtime: ([\d.]+)x.*?VRAM: ([\d.]+)GB'
84
+ match = re.search(progress_pattern, output_line)
85
+
86
+ if match:
87
+ current_chunk = int(match.group(1))
88
+ total_chunks = int(match.group(2))
89
+ realtime_factor = f"{match.group(3)}x"
90
+ vram_usage = f"{match.group(4)} GB"
91
+
92
+ # Update global state
93
+ conversion_state['current_chunk'] = f"{current_chunk}/{total_chunks}"
94
+ conversion_state['realtime_factor'] = realtime_factor
95
+ conversion_state['vram_usage'] = vram_usage
96
+ conversion_state['progress'] = int((current_chunk / total_chunks) * 100) if total_chunks > 0 else 0
97
+
98
+ print(f"πŸ“Š Stats Updated: Chunk {current_chunk}/{total_chunks}, {realtime_factor}, {vram_usage}")
99
+ return True
100
+ else:
101
+ # Try alternative patterns in case the format is different
102
+ alt_pattern = r'Chunk (\d+)/(\d+).*?Realtime: ([\d.]+)x.*?VRAM: ([\d.]+)GB'
103
+ alt_match = re.search(alt_pattern, output_line)
104
+ if alt_match:
105
+ current_chunk = int(alt_match.group(1))
106
+ total_chunks = int(alt_match.group(2))
107
+ realtime_factor = f"{alt_match.group(3)}x"
108
+ vram_usage = f"{alt_match.group(4)} GB"
109
+
110
+ conversion_state['current_chunk'] = f"{current_chunk}/{total_chunks}"
111
+ conversion_state['realtime_factor'] = realtime_factor
112
+ conversion_state['vram_usage'] = vram_usage
113
+ conversion_state['progress'] = int((current_chunk / total_chunks) * 100) if total_chunks > 0 else 0
114
+
115
+ print(f"πŸ“Š Stats Updated: Chunk {current_chunk}/{total_chunks}, {realtime_factor}, {vram_usage}")
116
+ return True
117
+
118
+ return False
119
+
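The alternative (emoji-free) pattern above can be exercised on a sample progress line; a small sketch with a hypothetical `parse_stats` wrapper that returns typed values instead of mutating global state:

```python
import re

# Fallback form of the progress pattern, without the emoji prefix
PROGRESS_RE = re.compile(r'Chunk (\d+)/(\d+).*?Realtime: ([\d.]+)x.*?VRAM: ([\d.]+)GB')

def parse_stats(line):
    """Return (current_chunk, total_chunks, realtime, vram_gb) or None."""
    m = PROGRESS_RE.search(line)
    if not m:
        return None
    return int(m.group(1)), int(m.group(2)), float(m.group(3)), float(m.group(4))
```

Non-greedy `.*?` segments let the pattern skip the Elapsed/ETA/Remaining fields that sit between the captured groups.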
120
+ def get_progress_stats():
121
+ """Get current progress statistics for UI update"""
122
+ return (
123
+ conversion_state['realtime_factor'],
124
+ conversion_state['vram_usage'],
125
+ conversion_state['current_chunk'],
126
+ conversion_state['progress']
127
+ )
128
+
129
+ def get_book_folders():
130
+ """Get available book folders from Text_Input directory"""
131
+ text_input_dir = Path("Text_Input")
132
+ if not text_input_dir.exists():
133
+ return []
134
+
135
+ folders = []
136
+ for item in text_input_dir.iterdir():
137
+ if item.is_dir():
138
+ folders.append(item.name) # Show only folder name, not full path
139
+
140
+ return sorted(folders)
141
+
142
+ def get_text_files_in_folder(folder_name):
143
+ """Get text files in selected book folder"""
144
+ if not folder_name:
145
+ return []
146
+
147
+ # Build full path from folder name
148
+ folder = Path("Text_Input") / folder_name
149
+ if not folder.exists():
150
+ return []
151
+
152
+ text_files = []
153
+ for file in folder.glob("*.txt"):
154
+ text_files.append(file.name)
155
+
156
+ return sorted(text_files)
157
+
158
+ def get_voice_samples():
159
+ """Get available voice samples from Voice_Samples directory"""
160
+ voice_dir = Path("Voice_Samples")
161
+ if not voice_dir.exists():
162
+ return []
163
+
164
+ voices = []
165
+ for file in voice_dir.glob("*.wav"):
166
+ voices.append(file.name) # Show only filename, not full path
167
+
168
+ return sorted(voices)
169
+
170
+ def find_generated_audiobook(book_folder_path, voice_sample_path):
171
+ """Find the generated audiobook files"""
172
+ try:
173
+ book_folder = Path(book_folder_path)
174
+ voice_file = Path(voice_sample_path)
175
+ voice_name = voice_file.stem
176
+
177
+ # Look in Output/ directory first (final audiobooks)
178
+ output_dir = Path("Output")
179
+ if output_dir.exists():
180
+ # Look for M4B files with voice name
181
+ for m4b_file in output_dir.glob(f"*[{voice_name}]*.m4b"):
182
+ if m4b_file.exists():
183
+ return str(m4b_file), "M4B audiobook"
184
+
185
+ # Look for WAV files with voice name
186
+ for wav_file in output_dir.glob(f"*[{voice_name}]*.wav"):
187
+ if wav_file.exists():
188
+ return str(wav_file), "WAV audiobook"
189
+
190
+ # Look in Audiobook/ directory (processing output)
191
+ audiobook_dir = Path("Audiobook") / book_folder.name
192
+ if audiobook_dir.exists():
193
+ # Look for M4B files
194
+ for m4b_file in audiobook_dir.glob(f"*[{voice_name}]*.m4b"):
195
+ if m4b_file.exists():
196
+ return str(m4b_file), "M4B audiobook"
197
+
198
+ # Look for WAV files
199
+ for wav_file in audiobook_dir.glob(f"*[{voice_name}]*.wav"):
200
+ if wav_file.exists():
201
+ return str(wav_file), "WAV audiobook"
202
+
203
+ # Look for combined files
204
+ for combined_file in audiobook_dir.glob("*_combined.*"):
205
+ if combined_file.suffix in ['.wav', '.m4b', '.mp3']:
206
+ return str(combined_file), f"{combined_file.suffix.upper()[1:]} combined audiobook"
207
+
208
+ return None, "No audiobook found"
209
+
210
+ except Exception as e:
211
+ print(f"Error finding audiobook: {e}")
212
+ return None, f"Error: {str(e)}"
213
+
214
+ def run_book_conversion(book_path, text_file_path, voice_path, tts_params, quality_params, config_params):
215
+ """Run the actual book conversion - Direct call to TTS engine with progress monitoring"""
216
+ try:
217
+ # Import the real TTS engine function directly (avoid interface.py)
218
+ from modules.tts_engine import process_book_folder
219
+
220
+ # Extract enable_asr from tts_params (matching GUI exactly)
221
+ enable_asr = tts_params.get('enable_asr', False)
222
+
223
+ print(f"πŸš€ Starting book conversion with GUI parameters")
224
+ print(f"πŸ“– Book: {book_path}")
225
+ print(f"πŸ“„ Text file: {text_file_path}")
226
+ print(f"🎀 Voice: {voice_path}")
227
+ print(f"πŸŽ›οΈ TTS Params: {tts_params}")
228
+ print(f"πŸ”¬ Quality Params: {quality_params}")
229
+ print(f"βš™οΈ Config Params: {config_params}")
230
+
231
+ # Set up progress callback function
232
+ def progress_callback(current_chunk, total_chunks, realtime_factor, vram_usage):
233
+ """Callback function to update progress from TTS engine"""
234
+ conversion_state['current_chunk'] = f"{current_chunk}/{total_chunks}"
235
+ conversion_state['realtime_factor'] = f"{realtime_factor}x"
236
+ conversion_state['vram_usage'] = f"{vram_usage} GB"
237
+ conversion_state['progress'] = int((current_chunk / total_chunks) * 100) if total_chunks > 0 else 0
238
+ print(f"πŸ“Š Progress: {current_chunk}/{total_chunks} ({conversion_state['progress']}%) - {realtime_factor}x - {vram_usage}GB")
239
+
240
+ # Add progress callback to config params
241
+ config_params['progress_callback'] = progress_callback
242
+
243
+ # Convert string paths to Path objects (required by TTS engine)
244
+ book_dir_path = Path(book_path)
245
+ voice_path_obj = Path(voice_path)
246
+
247
+ # Auto-detect device with fallback to CPU
248
+ import torch
249
+ if torch.cuda.is_available():
250
+ device = "cuda"
251
+ print("βœ… Using CUDA GPU for processing")
252
+ else:
253
+ device = "cpu"
254
+ print("πŸ’» Using CPU for processing (no GPU available)")
255
+
256
+ # Direct call to TTS engine (function only accepts: book_dir, voice_path, tts_params, device, skip_cleanup)
257
+ result = process_book_folder(
258
+ book_dir=book_dir_path,
259
+ voice_path=voice_path_obj,
260
+ tts_params=tts_params,
261
+ device=device,
262
+ skip_cleanup=False
263
+ )
264
+
265
+ print(f"βœ… Conversion completed successfully")
266
+ return {'success': True, 'result': result}
267
+
268
+ except Exception as e:
269
+ print(f"❌ Conversion failed: {e}")
270
+ import traceback
271
+ traceback.print_exc()
272
+ return {'success': False, 'error': str(e)}
273
+
274
+ def regenerate_m4b_file(selected_m4b, playback_speed):
275
+ """Regenerate M4B file with new playback speed"""
276
+ if not selected_m4b:
277
+ return "❌ Please select an M4B file first", None
278
+
279
+ try:
280
+ print(f"πŸ”„ Regenerating M4B: {selected_m4b} at {playback_speed}x speed")
281
+
282
+ # Import M4B regeneration tools
283
+ from tools.combine_only import apply_playback_speed_to_m4b
284
+
285
+ # Find the M4B file path
286
+ audiobook_root = Path("Audiobook")
287
+ m4b_path = None
288
+
289
+ for book_dir in audiobook_root.iterdir():
290
+ if book_dir.is_dir():
291
+ for m4b_file in book_dir.glob("*.m4b"):
292
+ if m4b_file.name == selected_m4b:
293
+ m4b_path = m4b_file
294
+ break
295
+ if m4b_path:
296
+ break
297
+
298
+ if not m4b_path:
299
+ return "❌ M4B file not found", None
300
+
301
+ # Create new filename with speed suffix
302
+ speed_suffix = f"_speed{playback_speed}x".replace(".", "p")
303
+ new_name = m4b_path.stem + speed_suffix + ".m4b"
304
+ output_path = m4b_path.parent / new_name
305
+
306
+ # Apply speed change
307
+ success = apply_playback_speed_to_m4b(str(m4b_path), str(output_path), playback_speed)
308
+
309
+ if success:
310
+ return f"βœ… Regenerated M4B at {playback_speed}x speed: {new_name}", str(output_path)
311
+ else:
312
+ return "❌ Failed to regenerate M4B", None
313
+
314
+ except Exception as e:
315
+ print(f"❌ M4B regeneration failed: {e}")
316
+ return f"❌ Error: {str(e)}", None
317
+
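The speed-suffix naming used above (replacing `.` with `p` so the speed survives as a filesystem-safe filename component) can be isolated into a small helper; `speed_variant_name` is a hypothetical name for illustration:

```python
from pathlib import Path

def speed_variant_name(m4b_path: str, playback_speed: float) -> str:
    """Build the output filename for a speed-adjusted M4B, e.g. book_speed1p5x.m4b."""
    p = Path(m4b_path)
    suffix = f"_speed{playback_speed}x".replace(".", "p")
    return p.stem + suffix + p.suffix
```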
318
+ def create_convert_book_tab():
319
+ """Create Tab 1: Convert Book with all GUI functionality"""
320
+
321
+ with gr.Column():
322
+ gr.Markdown("# πŸš€ Convert Book")
323
+ gr.Markdown("*Main TTS conversion functionality - matches GUI Tab 1*")
324
+
325
+ # Main Content Layout
326
+ with gr.Row():
327
+ # Left Column - File Uploads
328
+ with gr.Column(scale=2):
329
+ gr.Markdown("### πŸ“š Book Selection")
330
+
331
+ # Book text file upload only
332
+ text_file_upload = gr.File(
333
+ label="πŸ“š Upload Book Text File",
334
+ file_types=[".txt"],
335
+ file_count="single",
336
+ interactive=True
337
+ )
338
+
339
+ gr.Markdown("### 🎀 Voice Selection")
340
+
341
+ # Single voice upload with integrated playback
342
+ voice_file_upload = gr.File(
343
+ label="🎀 Upload Voice Sample",
344
+ file_types=[".wav", ".mp3", ".m4a"],
345
+ file_count="single",
346
+ interactive=True
347
+ )
348
+
349
+ # Voice sample player (becomes active after upload)
350
+ voice_audio = gr.Audio(
351
+ label="Voice Sample Preview",
352
+ interactive=False,
353
+ show_download_button=False,
354
+ visible=False
355
+ )
356
+
357
+ # Right Column - All Settings
358
+ with gr.Column(scale=1):
359
+ gr.Markdown("### βš™οΈ Quick Settings")
360
+
361
+ # VADER and ASR
362
+ vader_enabled = gr.Checkbox(
363
+ label="Use VADER sentiment analysis",
364
+ value=True,
365
+ info="Adjust TTS params per chunk based on emotion"
366
+ )
367
+
368
+ # ASR System with intelligent model selection
369
+ with gr.Row():
370
+ asr_enabled = gr.Checkbox(
371
+ label="🎀 Enable ASR validation",
372
+ value=False,
373
+ info="Smart quality control with automatic model selection"
374
+ )
375
+
376
+ # ASR Configuration (initially hidden)
377
+ with gr.Column(visible=False) as asr_config_group:
378
+ gr.Markdown("#### πŸ” ASR Configuration")
379
+
380
+ # System analysis display
381
+ system_analysis = gr.Textbox(
382
+ label="System Analysis",
383
+ value="Click 'Analyze System' to detect capabilities",
384
+ lines=3,
385
+ interactive=False
386
+ )
387
+
388
+ analyze_system_btn = gr.Button(
389
+ "πŸ” Analyze System",
390
+ size="sm",
391
+ variant="secondary"
392
+ )
393
+
394
+ # ASR Level Selection
395
+ asr_level = gr.Radio(
396
+ label="ASR Quality Level",
397
+ choices=[
398
+ ("🟒 SAFE - Fast processing, basic accuracy", "safe"),
399
+ ("🟑 MODERATE - Balanced speed/accuracy (recommended)", "moderate"),
400
+ ("πŸ”΄ INSANE - Best accuracy, may stress system", "insane")
401
+ ],
402
+ value="moderate",
403
+ info="Automatically selects best models for your system"
404
+ )
405
+
406
+ # Selected models display
407
+ selected_models = gr.Textbox(
408
+ label="Selected ASR Models",
409
+ value="Select level to see model configuration",
410
+ lines=2,
411
+ interactive=False
412
+ )
413
+
414
+ # Batch processing
415
+ add_to_batch = gr.Checkbox(
416
+ label="πŸ“¦ Add to batch queue",
417
+ value=False,
418
+ info="Queue for batch processing"
419
+ )
420
+
421
+ gr.Markdown("### πŸ”„ Regeneration Settings")
422
+
423
+ regeneration_enabled = gr.Checkbox(
424
+ label="Enable automatic chunk regeneration",
425
+ value=ENABLE_REGENERATION_LOOP,
426
+ info="Retry failed chunks automatically"
427
+ )
428
+
429
+ max_attempts = gr.Slider(
430
+ label="Max Attempts",
431
+ minimum=1, maximum=10, step=1,
432
+ value=MAX_REGENERATION_ATTEMPTS
433
+ )
434
+
435
+ quality_threshold = gr.Slider(
436
+ label="Quality Threshold",
437
+ minimum=0.1, maximum=1.0, step=0.05,
438
+ value=QUALITY_THRESHOLD
439
+ )
440
+
441
+ gr.Markdown("### πŸ“Š Sentiment Smoothing")
442
+
443
+ sentiment_smoothing = gr.Checkbox(
444
+ label="Enable sentiment smoothing",
445
+ value=ENABLE_SENTIMENT_SMOOTHING,
446
+ info="Smooth emotional transitions"
447
+ )
448
+
449
+ smoothing_window = gr.Slider(
450
+ label="Window Size",
451
+ minimum=1, maximum=10, step=1,
452
+ value=SENTIMENT_SMOOTHING_WINDOW
453
+ )
454
+
455
+ smoothing_method = gr.Dropdown(
456
+ label="Smoothing Method",
457
+ choices=["rolling", "exp_decay"],
458
+ value=SENTIMENT_SMOOTHING_METHOD
459
+ )
460
+
461
+ gr.Markdown("### πŸ” Advanced Detection")
462
+
463
+ mfcc_validation = gr.Checkbox(
464
+ label="MFCC spectral analysis",
465
+ value=ENABLE_MFCC_VALIDATION,
466
+ info="Advanced audio quality detection"
467
+ )
468
+
469
+ output_validation = gr.Checkbox(
470
+ label="Output validation",
471
+ value=ENABLE_OUTPUT_VALIDATION,
472
+ info="Quality control clearinghouse for enabled checks"
473
+ )
474
+
475
+ spectral_threshold = gr.Slider(
476
+ label="Spectral Threshold",
477
+ minimum=0.1, maximum=1.0, step=0.05,
478
+ value=SPECTRAL_ANOMALY_THRESHOLD
479
+ )
480
+
481
+ output_threshold = gr.Slider(
482
+ label="Output Threshold",
483
+ minimum=0.1, maximum=1.0, step=0.05,
484
+ value=OUTPUT_VALIDATION_THRESHOLD
485
+ )
486
+
487
+
488
+ # TTS Parameters
489
+ with gr.Row():
490
+ with gr.Column():
491
+ gr.Markdown("### πŸŽ›οΈ TTS Parameters")
492
+
493
+ exaggeration = gr.Slider(
494
+ label="Exaggeration",
495
+ minimum=TTS_PARAM_MIN_EXAGGERATION,
496
+ maximum=TTS_PARAM_MAX_EXAGGERATION,
497
+ step=0.1,
498
+ value=DEFAULT_EXAGGERATION,
499
+ info="Emotional intensity"
500
+ )
501
+
502
+ cfg_weight = gr.Slider(
503
+ label="CFG Weight",
504
+ minimum=TTS_PARAM_MIN_CFG_WEIGHT,
505
+ maximum=TTS_PARAM_MAX_CFG_WEIGHT,
506
+ step=0.1,
507
+ value=DEFAULT_CFG_WEIGHT,
508
+ info="Text faithfulness"
509
+ )
510
+
511
+ temperature = gr.Slider(
512
+ label="Temperature",
513
+ minimum=TTS_PARAM_MIN_TEMPERATURE,
514
+ maximum=TTS_PARAM_MAX_TEMPERATURE,
515
+ step=0.1,
516
+ value=DEFAULT_TEMPERATURE,
517
+ info="Creativity/randomness"
518
+ )
519
+
520
+ with gr.Column():
521
+ gr.Markdown("### ⚑ Advanced Sampling")
522
+
523
+ min_p = gr.Slider(
524
+ label="Min-P",
525
+ minimum=0.0, maximum=0.5, step=0.01,
526
+ value=0.05,
527
+ info="Minimum probability threshold"
528
+ )
529
+
530
+ top_p = gr.Slider(
531
+ label="Top-P",
532
+ minimum=0.5, maximum=1.0, step=0.1,
533
+ value=1.0,
534
+ info="Nucleus sampling"
535
+ )
536
+
537
+ repetition_penalty = gr.Slider(
538
+ label="Repetition Penalty",
539
+ minimum=1.0, maximum=3.0, step=0.1,
540
+ value=2.0,
541
+ info="Reduce repetition"
542
+ )
543
+
544
+ gr.Markdown("### βš™οΈ Performance Settings")
545
+
546
+ max_workers = gr.Number(
547
+ label="Max Workers",
548
+ minimum=1, maximum=8, step=1,
549
+ value=2,
550
+ info="⚠️ Only increase above 2 if CPU/GPU utilization < 70%"
551
+ )
552
+
553
+ # Action Buttons and Status
554
+ with gr.Row():
555
+ with gr.Column(scale=2):
556
+ convert_btn = gr.Button(
557
+ "🚀 Start Conversion",
558
+ variant="primary",
559
+ size="lg",
560
+ interactive=True
561
+ )
562
+
563
+ # Status Display
564
+ status_display = gr.Textbox(
565
+ label="Status",
566
+ value="⏸ Ready",
567
+ interactive=False,
568
+ lines=1
569
+ )
570
+
571
+ progress_display = gr.Number(
572
+ label="Progress %",
573
+ value=0,
574
+ interactive=False,
575
+ precision=0
576
+ )
577
+
578
+ with gr.Column(scale=1):
579
+ gr.Markdown("### 📊 Processing Stats")
580
+
581
+ realtime_factor = gr.Textbox(
582
+ label="Realtime Factor",
583
+ value="--",
584
+ interactive=False
585
+ )
586
+
587
+ vram_usage = gr.Textbox(
588
+ label="VRAM Usage",
589
+ value="-- GB",
590
+ interactive=False
591
+ )
592
+
593
+ current_chunk = gr.Textbox(
594
+ label="Current Chunk",
595
+ value="--",
596
+ interactive=False
597
+ )
598
+
599
+ # Regenerate M4B Section (moved above audiobook player)
600
+ with gr.Row():
601
+ with gr.Column():
602
+ gr.Markdown("### 🔄 Regenerate M4B")
603
+
604
+ with gr.Row():
605
+ with gr.Column(scale=2):
606
+ m4b_file_selector = gr.Dropdown(
607
+ label="Select M4B File to Regenerate",
608
+ choices=[],
609
+ value=None,
610
+ interactive=True,
611
+ info="Choose from generated audiobook files"
612
+ )
613
+
614
+ with gr.Column(scale=1):
615
+ playback_speed = gr.Slider(
616
+ label="Playback Speed",
617
+ minimum=0.5,
618
+ maximum=2.0,
619
+ step=0.1,
620
+ value=1.0,
621
+ info="Speed adjustment for regeneration"
622
+ )
623
+
624
+ regenerate_m4b_btn = gr.Button(
625
+ "🔄 Regenerate M4B",
626
+ variant="secondary",
627
+ size="lg"
628
+ )
629
+
630
+ # Generated Audiobook Player (simplified, play-only)
631
+ with gr.Row():
632
+ with gr.Column():
633
+ gr.Markdown("### 🎧 Generated Audiobook Player")
634
+
635
+ # Audiobook file selector dropdown
636
+ audiobook_selector = gr.Dropdown(
637
+ label="Select Audiobook",
638
+ choices=[],
639
+ value=None,
640
+ interactive=True,
641
+ info="Choose from session audiobooks"
642
+ )
643
+
644
+ # Main audio player - play only, no upload
645
+ audio_player = gr.Audio(
646
+ label="Audiobook Player",
647
+ value=None,
648
+ interactive=False,
649
+ show_download_button=True,
650
+ show_share_button=False,
651
+ waveform_options=gr.WaveformOptions(
652
+ show_controls=True,
653
+ show_recording_waveform=False,
654
+ skip_length=10
655
+ )
656
+ )
657
+
658
+ # Event Handlers
659
+ def handle_voice_upload(voice_file):
660
+ """Handle voice file upload and show player"""
661
+ if voice_file is None:
662
+ return gr.update(value=None, visible=False)
663
+
664
+ # Show the voice player with uploaded file
665
+ return gr.update(value=voice_file, visible=True)
666
+
667
+ def get_session_audiobooks():
668
+ """Get list of M4B files from the current session, sorted by modification time (newest first)"""
669
+ audiobooks = []
670
+
671
+ # Look in Audiobook directory for M4B files
672
+ audiobook_root = Path("Audiobook")
673
+ if audiobook_root.exists():
674
+ for book_dir in audiobook_root.iterdir():
675
+ if book_dir.is_dir():
676
+ # Look for M4B files in book directory
677
+ for m4b_file in book_dir.glob("*.m4b"):
678
+ # Use modification time for sorting (newest first)
679
+ creation_time = m4b_file.stat().st_mtime
680
+ audiobooks.append((str(m4b_file), m4b_file.name, creation_time))
681
+
682
+ # Also check Output directory
683
+ output_root = Path("Output")
684
+ if output_root.exists():
685
+ for m4b_file in output_root.glob("*.m4b"):
686
+ creation_time = m4b_file.stat().st_mtime
687
+ audiobooks.append((str(m4b_file), m4b_file.name, creation_time))
688
+
689
+ # Sort by modification time (newest first)
690
+ audiobooks.sort(key=lambda x: x[2], reverse=True)
691
+
692
+ # Return just path and name (drop creation time)
693
+ return [(ab[0], ab[1]) for ab in audiobooks]
694
+
695
+ def update_audiobook_dropdowns(latest_file=None):
696
+ """Update audiobook dropdowns - after conversion both show latest, after regeneration only playback updates"""
697
+ audiobooks = get_session_audiobooks()
698
+ choices = [ab[1] for ab in audiobooks] # Just filenames for display
699
+
700
+ # Determine what to set as selected
701
+ if latest_file:
702
+ # Use specific file if provided
703
+ selected_file = latest_file
704
+ elif choices:
705
+ # Default to newest file (first in sorted list)
706
+ selected_file = choices[0]
707
+ else:
708
+ selected_file = None
709
+
710
+ return (
711
+ gr.update(choices=choices, value=selected_file), # audiobook_selector (playback)
712
+ gr.update(choices=choices, value=selected_file) # m4b_file_selector (regeneration source)
713
+ )
714
+
715
+ def update_audiobook_dropdowns_after_conversion():
716
+ """Update both dropdowns to show the newest generated file after conversion"""
717
+ return update_audiobook_dropdowns()
718
+
719
+ def update_playback_only(new_file_name):
720
+ """Update only the playback dropdown after regeneration"""
721
+ audiobooks = get_session_audiobooks()
722
+ choices = [ab[1] for ab in audiobooks]
723
+
724
+ return (
725
+ gr.update(choices=choices, value=new_file_name), # audiobook_selector (playback) - new file
726
+ gr.update() # m4b_file_selector (regeneration) - no change
727
+ )
728
+
729
+ def load_selected_audiobook(selected_audiobook):
730
+ """Load selected audiobook into player"""
731
+ if not selected_audiobook:
732
+ return None
733
+
734
+ # Find the full path for the selected audiobook
735
+ audiobooks = get_session_audiobooks()
736
+ for full_path, filename in audiobooks:
737
+ if filename == selected_audiobook:
738
+ return full_path
739
+
740
+ return None
741
+
742
+ def handle_asr_toggle(asr_enabled_val):
743
+ """Show/hide ASR configuration when ASR is toggled"""
744
+ return gr.update(visible=asr_enabled_val)
745
+
746
+ def analyze_system():
747
+ """Analyze system capabilities and return summary"""
748
+ try:
749
+ from modules.system_detector import get_system_profile, print_system_summary, categorize_system
750
+
751
+ profile = get_system_profile()
752
+ categories = categorize_system(profile)
753
+
754
+ summary = f"🖥️ System Profile:\n"
755
+ summary += f"VRAM: {profile['gpu']['total_mb']:,}MB total, {profile['available_vram_after_tts']:,}MB available after TTS ({categories['vram']} class)\n"
756
+ summary += f"RAM: {profile['ram']['total_mb']:,}MB total, {profile['ram']['available_mb']:,}MB available ({categories['ram']} class)\n"
757
+ summary += f"CPU: {profile['cpu_cores']} cores ({categories['cpu']} class)"
758
+
759
+ if not profile['has_gpu']:
760
+ summary += f"\n⚠️ No CUDA GPU detected - ASR will run on CPU only"
761
+
762
+ return summary
763
+
764
+ except Exception as e:
765
+ return f"❌ Error analyzing system: {str(e)}"
766
+
767
+ def update_asr_models(asr_level_val):
768
+ """Update ASR model display based on selected level"""
769
+ try:
770
+ from modules.system_detector import get_system_profile, recommend_asr_models
771
+
772
+ profile = get_system_profile()
773
+ recommendations = recommend_asr_models(profile)
774
+
775
+ if asr_level_val not in recommendations:
776
+ return "❌ Invalid ASR level selected"
777
+
778
+ config = recommendations[asr_level_val]
779
+ primary = config['primary']
780
+ fallback = config['fallback']
781
+
782
+ result = f"Primary: {primary['model']} on {primary['device'].upper()}\n"
783
+ result += f"Fallback: {fallback['model']} on {fallback['device'].upper()}"
784
+
785
+ if asr_level_val == 'insane':
786
+ result += f"\n⚠️ WARNING: INSANE mode may cause memory pressure"
787
+
788
+ return result
789
+
790
+ except Exception as e:
791
+ return f"❌ Error getting models: {str(e)}"
792
+
793
+ def start_conversion(text_file_upload, voice_file_upload,
794
+ vader_val, asr_val, asr_level_val, add_to_batch_val,
795
+ regen_enabled_val, max_attempts_val, quality_thresh_val,
796
+ sentiment_smooth_val, smooth_window_val, smooth_method_val,
797
+ mfcc_val, output_val, spectral_thresh_val, output_thresh_val,
798
+ exag_val, cfg_val, temp_val, min_p_val, top_p_val, rep_penalty_val,
799
+ max_workers_val):
800
+ """Start the actual book conversion - file upload version"""
801
+
802
+ # Validation
803
+ if not text_file_upload:
804
+ return "❌ Please upload a text file", 0, None, gr.update(), gr.update()
805
+ if not voice_file_upload:
806
+ return "❌ Please upload a voice sample", 0, None, gr.update(), gr.update()
807
+
808
+ # Check if already running
809
+ if conversion_state['running']:
810
+ return "⚠️ Conversion already in progress", conversion_state['progress'], None, gr.update(), gr.update()
811
+
812
+ try:
813
+ # Create temporary book structure from uploads
814
+ import tempfile
815
+ import shutil
816
+ from datetime import datetime
817
+
818
+ # Generate unique book name from text file
819
+ text_filename = Path(text_file_upload).name
820
+ book_name = text_filename.replace('.txt', '').replace(' ', '_')
821
+ timestamp = datetime.now().strftime("%H%M%S")
822
+ unique_book_name = f"{book_name}_{timestamp}"
823
+
824
+ # Create directory structure
825
+ text_input_dir = Path("Text_Input")
826
+ text_input_dir.mkdir(exist_ok=True)
827
+
828
+ book_dir = text_input_dir / unique_book_name
829
+ book_dir.mkdir(exist_ok=True)
830
+
831
+ # Copy uploaded files to expected locations
832
+ text_dest = book_dir / f"{unique_book_name}.txt"
833
+ shutil.copy2(text_file_upload, text_dest)
834
+
835
+ voice_samples_dir = Path("Voice_Samples")
836
+ voice_samples_dir.mkdir(exist_ok=True)
837
+
838
+ voice_filename = Path(voice_file_upload).name
839
+ voice_dest = voice_samples_dir / voice_filename
840
+ shutil.copy2(voice_file_upload, voice_dest)
841
+
842
+ print(f"📁 Created book structure: {book_dir}")
843
+ print(f"📄 Text file: {text_dest}")
844
+ print(f"🎀 Voice file: {voice_dest}")
845
+
846
+ except Exception as e:
847
+ return f"❌ Error setting up files: {e}", 0, None, gr.update(), gr.update()
848
+
849
+ # Build ASR configuration first
850
+ asr_config = {'enabled': False}
851
+ if asr_val:
852
+ try:
853
+ from modules.system_detector import get_system_profile, recommend_asr_models
854
+ profile = get_system_profile()
855
+ recommendations = recommend_asr_models(profile)
856
+
857
+ if asr_level_val in recommendations:
858
+ selected_config = recommendations[asr_level_val]
859
+ primary = selected_config['primary']
860
+ fallback = selected_config['fallback']
861
+
862
+ asr_config = {
863
+ 'enabled': True,
864
+ 'level': asr_level_val,
865
+ 'primary_model': primary['model'],
866
+ 'primary_device': primary['device'],
867
+ 'fallback_model': fallback['model'],
868
+ 'fallback_device': fallback['device']
869
+ }
870
+ except Exception as e:
871
+ print(f"⚠️ Error configuring ASR: {e}")
872
+ asr_config = {'enabled': False}
873
+
874
+ # Prepare parameters (matching GUI structure exactly)
875
+ tts_params = {
876
+ 'exaggeration': exag_val,
877
+ 'cfg_weight': cfg_val,
878
+ 'temperature': temp_val,
879
+ 'min_p': min_p_val,
880
+ 'top_p': top_p_val,
881
+ 'repetition_penalty': rep_penalty_val,
882
+ 'enable_asr': asr_config.get('enabled', False), # Match GUI pattern
883
+ 'max_workers': int(max_workers_val) # User-defined worker count
884
+ }
885
+
886
+ quality_params = {
887
+ 'regeneration_enabled': regen_enabled_val,
888
+ 'max_attempts': max_attempts_val,
889
+ 'quality_threshold': quality_thresh_val,
890
+ 'sentiment_smoothing': sentiment_smooth_val,
891
+ 'smoothing_window': smooth_window_val,
892
+ 'smoothing_method': smooth_method_val,
893
+ 'mfcc_validation': mfcc_val,
894
+ 'output_validation': output_val,
895
+ 'spectral_threshold': spectral_thresh_val,
896
+ 'output_threshold': output_thresh_val
897
+ }
898
+
899
+ config_params = {
900
+ 'vader_enabled': vader_val,
901
+ 'asr_enabled': asr_val,
902
+ 'asr_config': asr_config,
903
+ 'add_to_batch': add_to_batch_val
904
+ }
905
+
906
+ # Set conversion state
907
+ conversion_state['running'] = True
908
+ conversion_state['progress'] = 0
909
+ conversion_state['status'] = 'Starting conversion...'
910
+ conversion_state['current_book'] = book_dir.name # Track current book
911
+
912
+ try:
913
+ # Run conversion using the modular backend in a separate thread
914
+ import threading
915
+
916
+ def run_conversion_thread():
917
+ try:
918
+ result = run_book_conversion(
919
+ str(book_dir), str(text_dest), str(voice_dest),
920
+ tts_params, quality_params, config_params
921
+ )
922
+
923
+ if result['success']:
924
+ conversion_state['status'] = '🎉 CONVERSION COMPLETE! M4B audiobook ready for playback.'
925
+ conversion_state['progress'] = 100
926
+ conversion_state['auto_refresh_needed'] = True # Flag for auto-refresh
927
+ else:
928
+ conversion_state['status'] = f"❌ Conversion failed: {result.get('error', 'Unknown error')}"
929
+ conversion_state['progress'] = 0
930
+
931
+ except Exception as e:
932
+ conversion_state['status'] = f"❌ Error: {str(e)}"
933
+ conversion_state['progress'] = 0
934
+ finally:
935
+ conversion_state['running'] = False
936
+
937
+ # Start conversion thread
938
+ thread = threading.Thread(target=run_conversion_thread)
939
+ thread.start()
940
+
941
+ # Return immediate response - user will need to refresh to see final results
942
+ return (
943
+ "🚀 Conversion started in background...",
944
+ 5, # Initial progress
945
+ None,
946
+ gr.update(),
947
+ gr.update()
948
+ )
949
+
950
+ except Exception as e:
951
+ conversion_state['status'] = f"❌ Error: {str(e)}"
952
+ return conversion_state['status'], 0, None, gr.update(), gr.update()
953
+ finally:
954
+ conversion_state['running'] = False
955
+
956
+
957
+ # Connect event handlers
958
+
959
+ # ASR event handlers
960
+ asr_enabled.change(
961
+ handle_asr_toggle,
962
+ inputs=[asr_enabled],
963
+ outputs=[asr_config_group]
964
+ )
965
+
966
+ analyze_system_btn.click(
967
+ analyze_system,
968
+ inputs=[],
969
+ outputs=[system_analysis]
970
+ )
971
+
972
+ asr_level.change(
973
+ update_asr_models,
974
+ inputs=[asr_level],
975
+ outputs=[selected_models]
976
+ )
977
+
978
+ # Voice upload handler
979
+ voice_file_upload.change(
980
+ handle_voice_upload,
981
+ inputs=[voice_file_upload],
982
+ outputs=[voice_audio]
983
+ )
984
+
985
+ # Main conversion handler
986
+ convert_btn.click(
987
+ start_conversion,
988
+ inputs=[
989
+ text_file_upload, voice_file_upload,
990
+ vader_enabled, asr_enabled, asr_level, add_to_batch,
991
+ regeneration_enabled, max_attempts, quality_threshold,
992
+ sentiment_smoothing, smoothing_window, smoothing_method,
993
+ mfcc_validation, output_validation, spectral_threshold, output_threshold,
994
+ exaggeration, cfg_weight, temperature, min_p, top_p, repetition_penalty,
995
+ max_workers
996
+ ],
997
+ outputs=[status_display, progress_display, audio_player, audiobook_selector, m4b_file_selector]
998
+ )
999
+
1000
+ # Audiobook selector handler
1001
+ audiobook_selector.change(
1002
+ load_selected_audiobook,
1003
+ inputs=[audiobook_selector],
1004
+ outputs=[audio_player]
1005
+ )
1006
+
1007
+ # M4B regeneration handler
1008
+ def handle_m4b_regeneration(selected_m4b, speed):
1009
+ """Handle M4B regeneration and update player"""
1010
+ status_msg, new_m4b_path = regenerate_m4b_file(selected_m4b, speed)
1011
+
1012
+ if new_m4b_path:
1013
+ # Load the new M4B in the player
1014
+ new_file_name = Path(new_m4b_path).name
1015
+ new_audio = load_selected_audiobook(new_file_name)
1016
+ # Update only playback dropdown, keep regeneration dropdown on source file
1017
+ audiobook_choices, m4b_choices = update_playback_only(new_file_name)
1018
+ return status_msg, new_audio, audiobook_choices, m4b_choices
1019
+ else:
1020
+ return status_msg, None, gr.update(), gr.update()
1021
+
1022
+ regenerate_m4b_btn.click(
1023
+ handle_m4b_regeneration,
1024
+ inputs=[m4b_file_selector, playback_speed],
1025
+ outputs=[status_display, audio_player, audiobook_selector, m4b_file_selector]
1026
+ )
1027
+
1028
+ # Progress monitoring with file-based approach
1029
+ def get_current_stats():
1030
+ """Get current progress statistics by monitoring output files"""
1031
+ try:
1032
+ if conversion_state['running']:
1033
+ # Look for generated audio chunks to estimate progress
1034
+ book_name = conversion_state.get('current_book', 'unknown')
1035
+ audiobook_root = Path("Audiobook") / book_name / "TTS" / "audio_chunks"
1036
+
1037
+ if audiobook_root.exists():
1038
+ chunk_files = list(audiobook_root.glob("chunk_*.wav"))
1039
+ current_chunks = len(chunk_files)
1040
+
1041
+ # Try to estimate total from JSON if available
1042
+ json_path = Path("Text_Input") / f"{book_name}_chunks.json"
1043
+ total_chunks = 0
1044
+ if json_path.exists():
1045
+ import json
1046
+ with open(json_path, 'r') as f:
1047
+ data = json.load(f)
1048
+ total_chunks = len(data)
1049
+
1050
+ if total_chunks > 0:
1051
+ progress = int((current_chunks / total_chunks) * 100)
1052
+ conversion_state['progress'] = progress
1053
+ conversion_state['current_chunk'] = f"{current_chunks}/{total_chunks}"
1054
+
1055
+ return (
1056
+ conversion_state.get('realtime_factor', '--'),
1057
+ conversion_state.get('vram_usage', '-- GB'),
1058
+ f"{current_chunks}/{total_chunks}",
1059
+ progress
1060
+ )
1061
+
1062
+ return (
1063
+ conversion_state.get('realtime_factor', '--'),
1064
+ conversion_state.get('vram_usage', '-- GB'),
1065
+ conversion_state.get('current_chunk', '--'),
1066
+ conversion_state.get('progress', 0)
1067
+ )
1068
+ except Exception as e:
1069
+ print(f"Error getting stats: {e}")
1070
+ return "--", "-- GB", "--", conversion_state.get('progress', 0)
1071
+
1072
+ def auto_check_completion():
1073
+ """Automatically check for completion and refresh interface"""
1074
+ # First get current stats
1075
+ stats = get_current_stats()
1076
+
1077
+ # Check if conversion just completed and needs auto-refresh
1078
+ if (not conversion_state['running'] and
1079
+ conversion_state['progress'] == 100 and
1080
+ conversion_state.get('auto_refresh_needed', False)):
1081
+
1082
+ # Clear the auto-refresh flag
1083
+ conversion_state['auto_refresh_needed'] = False
1084
+ print("🎉 Auto-detected completion! Refreshing interface...")
1085
+
1086
+ # Get completion results
1087
+ status, progress, audio, audiobook_choices, m4b_choices = get_status_and_results()
1088
+
1089
+ # Return combined stats + completion results
1090
+ return (
1091
+ stats[0], # realtime_factor
1092
+ stats[1], # vram_usage
1093
+ stats[2], # current_chunk
1094
+ 100, # progress (completed)
1095
+ status, # completion status
1096
+ audio, # audio player
1097
+ audiobook_choices, # audiobook dropdown
1098
+ m4b_choices # m4b dropdown
1099
+ )
1100
+ else:
1101
+ # Return stats + current status (no completion)
1102
+ return (
1103
+ stats[0], # realtime_factor
1104
+ stats[1], # vram_usage
1105
+ stats[2], # current_chunk
1106
+ stats[3], # progress
1107
+ conversion_state.get('status', '⏸ Ready'), # current status
1108
+ gr.update(), # no audio update
1109
+ gr.update(), # no audiobook update
1110
+ gr.update() # no m4b update
1111
+ )
1112
+
1113
+ def get_status_and_results():
1114
+ """Get conversion status and results after completion"""
1115
+ if not conversion_state['running'] and conversion_state['progress'] == 100:
1116
+ # Conversion completed, update dropdowns
1117
+ audiobook_choices, m4b_choices = update_audiobook_dropdowns_after_conversion()
1118
+ latest_audiobook = None
1119
+ if audiobook_choices['choices']:
1120
+ latest_audiobook = load_selected_audiobook(audiobook_choices['choices'][0])
1121
+
1122
+ return (
1123
+ conversion_state['status'],
1124
+ conversion_state['progress'],
1125
+ latest_audiobook,
1126
+ audiobook_choices,
1127
+ m4b_choices
1128
+ )
1129
+ else:
1130
+ return (
1131
+ conversion_state['status'],
1132
+ conversion_state['progress'],
1133
+ None,
1134
+ gr.update(),
1135
+ gr.update()
1136
+ )
1137
+
1138
+ # Create refresh buttons
1139
+ with gr.Row():
1140
+ refresh_stats_btn = gr.Button("🔄 Refresh Stats", size="sm", variant="secondary")
1141
+ check_completion_btn = gr.Button("📋 Check Completion", size="sm", variant="secondary")
1142
+
1143
+ # Auto-refresh timer (checks every 5 seconds during conversion)
1144
+ auto_timer = gr.Timer(5.0) # 5 second interval
1145
+
1146
+ refresh_stats_btn.click(
1147
+ auto_check_completion,
1148
+ outputs=[realtime_factor, vram_usage, current_chunk, progress_display, status_display, audio_player, audiobook_selector, m4b_file_selector]
1149
+ )
1150
+
1151
+ check_completion_btn.click(
1152
+ get_status_and_results,
1153
+ outputs=[status_display, progress_display, audio_player, audiobook_selector, m4b_file_selector]
1154
+ )
1155
+
1156
+ # Auto-timer for progress monitoring and completion detection
1157
+ auto_timer.tick(
1158
+ auto_check_completion,
1159
+ outputs=[realtime_factor, vram_usage, current_chunk, progress_display, status_display, audio_player, audiobook_selector, m4b_file_selector]
1160
+ )
1161
+
1162
+ return {
1163
+ 'convert_button': convert_btn,
1164
+ 'status_display': status_display,
1165
+ 'progress': progress_display
1166
+ }
1167
+
1168
+ if __name__ == "__main__":
1169
+ # Test the tab
1170
+ with gr.Blocks() as demo:
1171
+ create_convert_book_tab()
1172
+
1173
+ demo.launch()
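The file-based progress monitor in `get_current_stats` above derives its percentage from two file counts: finished `chunk_*.wav` files versus the chunk total recorded in the `*_chunks.json`. A minimal sketch of that arithmetic as a standalone helper (the name `estimate_progress` is illustrative, not part of this codebase):

```python
def estimate_progress(chunks_done: int, total_chunks: int) -> int:
    """Integer percentage of chunks rendered, clamped to 0-100.

    Mirrors the file-count logic in get_current_stats(): when the total
    is not yet known (0), report 0 rather than dividing by zero.
    """
    if total_chunks <= 0:
        return 0
    return min(100, int((chunks_done / total_chunks) * 100))


if __name__ == "__main__":
    print(estimate_progress(25, 100))  # 25
    print(estimate_progress(0, 0))     # 0, total not yet known
```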
gradio_tabs/tab2_configuration.py CHANGED
@@ -11,15 +11,23 @@ import json
11
  from pathlib import Path
12
  from typing import Dict, Any, Tuple, List
13
 
14
- # Import configuration
15
  try:
16
- from config.config import *
17
- CONFIG_AVAILABLE = True
18
- print("✅ Config module loaded successfully")
19
  except ImportError as e:
20
- print(f"⚠️ Config not available: {e}")
21
  CONFIG_AVAILABLE = False
22
- # Default values if config not available
23
  MAX_WORKERS = 2
24
  BATCH_SIZE = 100
25
  MIN_CHUNK_WORDS = 5
@@ -391,11 +399,19 @@ def create_configuration_tab():
391
  'CHUNK_END_SILENCE_MS': int(values[30]) if values[29] else 0
392
  }
393
 
394
- # Import the config module and update values
395
- from config import config
396
- for key, value in config_values.items():
397
- if hasattr(config, key):
398
- setattr(config, key, value)
399
 
400
  return "✅ Configuration saved successfully!\n🔄 Settings updated in memory. Restart application to persist changes."
401
 
@@ -450,44 +466,51 @@ def create_configuration_tab():
450
  if not CONFIG_AVAILABLE:
451
  return "❌ Configuration module not available"
452
 
453
- # Reload config module
454
  import importlib
455
- from config import config
456
- importlib.reload(config)
 
458
  # Return reloaded values
459
  return (
460
- config.MAX_WORKERS,
461
- config.BATCH_SIZE,
462
- config.MIN_CHUNK_WORDS,
463
- config.MAX_CHUNK_WORDS,
464
- config.ENABLE_NORMALIZATION,
465
- config.TARGET_LUFS,
466
- config.ENABLE_AUDIO_TRIMMING,
467
- config.SPEECH_ENDPOINT_THRESHOLD,
468
- config.TRIMMING_BUFFER_MS,
469
- config.TTS_PARAM_MIN_EXAGGERATION,
470
- config.TTS_PARAM_MAX_EXAGGERATION,
471
- config.TTS_PARAM_MIN_CFG_WEIGHT,
472
- config.TTS_PARAM_MAX_CFG_WEIGHT,
473
- config.TTS_PARAM_MIN_TEMPERATURE,
474
- config.TTS_PARAM_MAX_TEMPERATURE,
475
- config.DEFAULT_EXAGGERATION,
476
- config.DEFAULT_CFG_WEIGHT,
477
- config.DEFAULT_TEMPERATURE,
478
- config.VADER_EXAGGERATION_SENSITIVITY,
479
- config.VADER_CFG_WEIGHT_SENSITIVITY,
480
- config.VADER_TEMPERATURE_SENSITIVITY,
481
- config.SILENCE_CHAPTER_START,
482
- config.SILENCE_CHAPTER_END,
483
- config.SILENCE_SECTION_BREAK,
484
- config.SILENCE_PARAGRAPH_END,
485
- config.SILENCE_COMMA,
486
- config.SILENCE_PERIOD,
487
- config.SILENCE_QUESTION_MARK,
488
- config.SILENCE_EXCLAMATION,
489
- config.CHUNK_END_SILENCE_MS > 0,
490
- config.CHUNK_END_SILENCE_MS,
491
  "✅ Configuration reloaded from file"
492
  )
493
 
 
11
  from pathlib import Path
12
  from typing import Dict, Any, Tuple, List
13
 
14
+ # Import configuration with HuggingFace deployment compatibility
15
  try:
16
+ from .gradio_imports import safe_import_config, get_default_config
17
+ config_vars, CONFIG_AVAILABLE = safe_import_config()
18
+ if CONFIG_AVAILABLE:
19
+ print("✅ Config module loaded successfully")
20
+ # Update local variables with config values
21
+ locals().update(config_vars)
22
+ else:
23
+ print("⚠️ Config not available - using defaults")
24
+ # Get default values
25
+ default_config = get_default_config()
26
+ locals().update(default_config)
27
  except ImportError as e:
28
+ print(f"⚠️ Import system not available: {e} - using fallback defaults")
29
  CONFIG_AVAILABLE = False
30
+ # Fallback default values if gradio_imports not available
31
  MAX_WORKERS = 2
32
  BATCH_SIZE = 100
33
  MIN_CHUNK_WORDS = 5
 
399
  'CHUNK_END_SILENCE_MS': int(values[30]) if values[29] else 0
400
  }
401
 
402
+ # Import the config module and update values using safe import
403
+ try:
404
+ from .gradio_imports import safe_import
405
+ config_module = safe_import('config', 'config')
406
+ for key, value in config_values.items():
407
+ if hasattr(config_module, key):
408
+ setattr(config_module, key, value)
409
+ except ImportError:
410
+ # Fallback to direct import
411
+ from config import config
412
+ for key, value in config_values.items():
413
+ if hasattr(config, key):
414
+ setattr(config, key, value)
415
 
416
  return "✅ Configuration saved successfully!\n🔄 Settings updated in memory. Restart application to persist changes."
417
 
 
466
  if not CONFIG_AVAILABLE:
467
  return "❌ Configuration module not available"
468
 
469
+ # Reload config module using safe import
470
  import importlib
471
+ try:
472
+ from .gradio_imports import safe_import
473
+ config_module = safe_import('config', 'config')
474
+ importlib.reload(config_module)
475
+ except ImportError:
476
+ # Fallback to direct import
477
+ from config import config
478
+ config_module = config
479
+ importlib.reload(config)
480
 
481
  # Return reloaded values
482
  return (
483
+ config_module.MAX_WORKERS,
484
+ config_module.BATCH_SIZE,
485
+ config_module.MIN_CHUNK_WORDS,
486
+ config_module.MAX_CHUNK_WORDS,
487
+ config_module.ENABLE_NORMALIZATION,
488
+ config_module.TARGET_LUFS,
489
+ config_module.ENABLE_AUDIO_TRIMMING,
490
+ config_module.SPEECH_ENDPOINT_THRESHOLD,
491
+ config_module.TRIMMING_BUFFER_MS,
492
+ config_module.TTS_PARAM_MIN_EXAGGERATION,
493
+ config_module.TTS_PARAM_MAX_EXAGGERATION,
494
+ config_module.TTS_PARAM_MIN_CFG_WEIGHT,
495
+ config_module.TTS_PARAM_MAX_CFG_WEIGHT,
496
+ config_module.TTS_PARAM_MIN_TEMPERATURE,
497
+ config_module.TTS_PARAM_MAX_TEMPERATURE,
498
+ config_module.DEFAULT_EXAGGERATION,
499
+ config_module.DEFAULT_CFG_WEIGHT,
500
+ config_module.DEFAULT_TEMPERATURE,
501
+ config_module.VADER_EXAGGERATION_SENSITIVITY,
502
+ config_module.VADER_CFG_WEIGHT_SENSITIVITY,
503
+ config_module.VADER_TEMPERATURE_SENSITIVITY,
504
+ config_module.SILENCE_CHAPTER_START,
505
+ config_module.SILENCE_CHAPTER_END,
506
+ config_module.SILENCE_SECTION_BREAK,
507
+ config_module.SILENCE_PARAGRAPH_END,
508
+ config_module.SILENCE_COMMA,
509
+ config_module.SILENCE_PERIOD,
510
+ config_module.SILENCE_QUESTION_MARK,
511
+ config_module.SILENCE_EXCLAMATION,
512
+ config_module.CHUNK_END_SILENCE_MS > 0,
513
+ config_module.CHUNK_END_SILENCE_MS,
514
  "βœ… Configuration reloaded from file"
515
  )
516
 
gradio_tabs/tab_diagnostics.py ADDED
@@ -0,0 +1,558 @@
1
+ #!/usr/bin/env python3
2
+ """
3
+ Gradio Diagnostics Tab
4
+ Run parallel processing diagnostics through web interface
5
+ """
6
+
7
+ import gradio as gr
8
+ import time
9
+ import threading
10
+ import multiprocessing
11
+ import concurrent.futures
12
+ import os
13
+ import sys
14
+ import torch
15
+ from pathlib import Path
16
+ import io
17
+ from contextlib import redirect_stdout
18
+
19
+ # Try to import psutil, fallback if not available
20
+ try:
21
+ import psutil
22
+ PSUTIL_AVAILABLE = True
23
+ except ImportError:
24
+ PSUTIL_AVAILABLE = False
25
+
26
+ class DiagnosticRunner:
27
+ def __init__(self):
28
+ self.running = False
29
+
30
+ def test_basic_multiprocessing(self):
31
+ """Test 1: Basic multiprocessing capability"""
32
+ output = []
33
+ output.append("=== TEST 1: Basic Multiprocessing ===")
34
+
35
+ def simple_task(n):
36
+ return n * n
37
+
38
+ try:
39
+ # Sequential
40
+ start = time.time()
41
+ results_seq = [simple_task(i) for i in range(100)]
42
+ seq_time = time.time() - start
43
+ output.append(f"Sequential: {seq_time:.3f}s")
44
+
45
+ # Parallel. Note: Pool.map pickles its callable, so simple_task must be
+ # defined at module level or this raises "Can't pickle local object".
46
+ start = time.time()
47
+ with multiprocessing.Pool(processes=4) as pool:
48
+ results_par = pool.map(simple_task, range(100))
49
+ par_time = time.time() - start
50
+ output.append(f"Parallel (4 workers): {par_time:.3f}s")
51
+ output.append(f"Speedup: {seq_time/par_time:.2f}x")
52
+
53
+ except Exception as e:
54
+ output.append(f"ERROR: {e}")
55
+
56
+ output.append("")
57
+ return "\n".join(output)
58
+
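A caveat on Test 1 (and the `ProcessPoolExecutor` case in Test 2): `Pool.map` and `ProcessPoolExecutor.map` pickle their callables, and locally-defined (nested) functions cannot be pickled, so a task defined inside the test method raises instead of measuring a speedup. A minimal sketch of the usual fix, hoisting the task to module level (names are illustrative):

```python
import multiprocessing


def square(n: int) -> int:
    """Module-level task: picklable, so worker processes can resolve it."""
    return n * n


def parallel_squares(count: int, workers: int = 4) -> list:
    # Pool.map serializes `square` by qualified name; this only resolves
    # for functions defined at module scope, not inside another function.
    with multiprocessing.Pool(processes=workers) as pool:
        return pool.map(square, range(count))


if __name__ == "__main__":
    print(parallel_squares(5))  # [0, 1, 4, 9, 16]
```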
59
+ def test_thread_vs_process(self):
60
+ """Test 2: Threading vs Processing"""
61
+ output = []
62
+ output.append("=== TEST 2: Threading vs Processing ===")
63
+
64
+ def cpu_task(n):
65
+ total = 0
66
+ for i in range(n * 1000):
67
+ total += i * i
68
+ return total
69
+
70
+ try:
71
+ tasks = [1000] * 8
72
+
73
+ # Sequential
74
+ start = time.time()
75
+ seq_results = [cpu_task(t) for t in tasks]
76
+ seq_time = time.time() - start
77
+ output.append(f"Sequential: {seq_time:.3f}s")
78
+
79
+ # Threading
80
+ start = time.time()
81
+ with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
82
+ thread_results = list(executor.map(cpu_task, tasks))
83
+ thread_time = time.time() - start
84
+ output.append(f"ThreadPool: {thread_time:.3f}s, speedup: {seq_time/thread_time:.2f}x")
85
+
86
+ # Processing
87
+ start = time.time()
88
+ with concurrent.futures.ProcessPoolExecutor(max_workers=4) as executor:
89
+ process_results = list(executor.map(cpu_task, tasks))
90
+ process_time = time.time() - start
91
+ output.append(f"ProcessPool: {process_time:.3f}s, speedup: {seq_time/process_time:.2f}x")
92
+
93
+ except Exception as e:
94
+ output.append(f"ERROR: {e}")
95
+
96
+ output.append("")
97
+ return "\n".join(output)
98
+
99
+ def test_gpu_access(self):
100
+ """Test 3: GPU sharing capability"""
101
+ output = []
102
+ output.append("=== TEST 3: GPU Access ===")
103
+
104
+ if not torch.cuda.is_available():
105
+ output.append("No CUDA available - skipping GPU test")
106
+ output.append("")
107
+ return "\n".join(output)
108
+
109
+ def gpu_task(worker_id):
110
+ try:
111
+ device = torch.device("cuda")
112
+ x = torch.randn(1000, 1000, device=device)
113
+ y = torch.randn(1000, 1000, device=device)
114
+ start = time.time()
115
+ for _ in range(10):
116
+ z = torch.mm(x, y)
117
+ duration = time.time() - start
118
+ return f"Worker {worker_id}: {duration:.3f}s"
119
+ except Exception as e:
120
+ return f"Worker {worker_id}: ERROR - {e}"
121
+
122
+ try:
123
+ # Sequential GPU access
124
+ start = time.time()
125
+ seq_results = [gpu_task(i) for i in range(4)]
126
+ seq_time = time.time() - start
127
+ output.append("Sequential GPU:")
128
+ for result in seq_results:
129
+ output.append(f" {result}")
130
+ output.append(f"Total sequential time: {seq_time:.3f}s")
131
+
132
+ # Parallel GPU access
133
+ start = time.time()
134
+ with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
135
+ par_results = list(executor.map(gpu_task, range(4)))
136
+ par_time = time.time() - start
137
+ output.append("Parallel GPU:")
138
+ for result in par_results:
139
+ output.append(f" {result}")
140
+ output.append(f"Total parallel time: {par_time:.3f}s")
141
+
142
+ except Exception as e:
143
+ output.append(f"ERROR: {e}")
144
+
145
+ output.append("")
146
+ return "\n".join(output)
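One caveat with the GPU timings above: CUDA kernel launches are asynchronous, so reading the wall clock without a `torch.cuda.synchronize()` measures launch overhead rather than compute. A small torch-free sketch of the pattern, with the synchronization point left as a pluggable hook (the `sync` argument is a stand-in for `torch.cuda.synchronize`; the no-op default makes it runnable anywhere):

```python
import time

def timed(fn, sync=lambda: None):
    """Time fn(), calling sync() first to flush any asynchronous work."""
    start = time.perf_counter()
    result = fn()
    sync()  # e.g. torch.cuda.synchronize on a CUDA device
    elapsed = time.perf_counter() - start
    return result, elapsed

value, elapsed = timed(lambda: sum(range(1000)))
assert value == 499500
assert elapsed >= 0.0
```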
147
+
148
+ def test_model_loading(self):
149
+ """Test 4: Model loading overhead"""
150
+ output = []
151
+ output.append("=== TEST 4: Model Loading Simulation ===")
152
+
153
+ def load_model():
154
+ time.sleep(0.5) # 500ms loading time
155
+ return {"model": "loaded", "size": "large"}
156
+
157
+ def task_with_model_loading(worker_id):
158
+ start = time.time()
159
+ model = load_model()
160
+ processing_time = 0.1
161
+ time.sleep(processing_time)
162
+ total_time = time.time() - start
163
+ return f"Worker {worker_id}: {total_time:.3f}s"
164
+
165
+ try:
166
+ output.append("Each worker loads model:")
167
+ start = time.time()
168
+ with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
169
+ results = list(executor.map(task_with_model_loading, range(4)))
170
+ total_time = time.time() - start
171
+
172
+ for result in results:
173
+ output.append(f" {result}")
174
+ output.append(f"Total time with per-worker loading: {total_time:.3f}s")
175
+
176
+ shared_load_time = 0.5
177
+ processing_time = 0.1 * 4  # four tasks served serially by the shared model
178
+ simulated_shared_time = shared_load_time + processing_time
179
+ output.append(f"Simulated shared model time: {simulated_shared_time:.3f}s")
180
+ output.append(f"Overhead from per-worker loading: {total_time - simulated_shared_time:.3f}s")
181
+
182
+ except Exception as e:
183
+ output.append(f"ERROR: {e}")
184
+
185
+ output.append("")
186
+ return "\n".join(output)
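Note one wrinkle in the simulation above: `time.sleep` releases the GIL, so with four threads the per-worker "loads" overlap and the measured wall time can come in *below* the simulated shared-model time. Back-of-the-envelope arithmetic, using the same 0.5 s load and 0.1 s work figures as the test:

```python
load_s, work_s, workers = 0.5, 0.1, 4

# Four threads "loading" concurrently via sleep: wall time is roughly
# one load plus one work slice, since the sleeps overlap.
per_worker_wall = load_s + work_s            # ~0.6 s

# Shared model as simulated above: one load, then serialized work.
shared_wall = load_s + workers * work_s      # ~0.9 s

# The sleep-based simulation therefore understates real per-worker loading
# cost, which is CPU/IO-bound and does not overlap this cleanly.
assert per_worker_wall < shared_wall
```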
187
+
188
+ def test_environment_info(self):
189
+ """Test 5: Environment information"""
190
+ output = []
191
+ output.append("=== TEST 5: Environment Info ===")
192
+
193
+ try:
194
+ output.append(f"Python version: {sys.version}")
195
+ output.append(f"Platform: {sys.platform}")
196
+ output.append(f"CPU cores: {multiprocessing.cpu_count()}")
197
+
198
+ if PSUTIL_AVAILABLE:
199
+ output.append(f"CPU usage: {psutil.cpu_percent()}%")
200
+ output.append(f"Memory: {psutil.virtual_memory().percent}% used")
201
+ else:
202
+ output.append("psutil not available - limited system info")
203
+
204
+ if torch.cuda.is_available():
205
+ output.append("CUDA available: Yes")
206
+ output.append(f"CUDA devices: {torch.cuda.device_count()}")
207
+ output.append(f"Current device: {torch.cuda.current_device()}")
208
+ output.append(f"Device name: {torch.cuda.get_device_name()}")
209
+ if hasattr(torch.cuda, 'memory_summary'):
210
+ output.append("GPU Memory:")
211
+ output.append(torch.cuda.memory_summary(abbreviated=True))
212
+ else:
213
+ output.append("CUDA available: No")
214
+
215
+ mp_vars = [
216
+ 'OMP_NUM_THREADS', 'MKL_NUM_THREADS', 'OPENBLAS_NUM_THREADS',
217
+ 'VECLIB_MAXIMUM_THREADS', 'NUMEXPR_NUM_THREADS'
218
+ ]
219
+ output.append("Threading environment variables:")
220
+ for var in mp_vars:
221
+ value = os.environ.get(var, 'Not set')
222
+ output.append(f" {var}: {value}")
223
+
224
+ except Exception as e:
225
+ output.append(f"ERROR: {e}")
226
+
227
+ output.append("")
228
+ return "\n".join(output)
229
+
230
+ def test_worker_creation(self):
231
+ """Test 6: Worker creation monitoring"""
232
+ output = []
233
+ output.append("=== TEST 6: Worker Creation ===")
234
+
235
+ def monitored_task(worker_id):
236
+ pid = os.getpid()
237
+ tid = threading.get_ident()
238
+ return f"Worker {worker_id}: PID={pid}, TID={tid}"
239
+
240
+ try:
241
+ output.append("Main process:")
242
+ output.append(f" PID: {os.getpid()}")
243
+ output.append(f" TID: {threading.get_ident()}")
244
+
245
+ output.append("ThreadPoolExecutor workers:")
246
+ with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
247
+ results = list(executor.map(monitored_task, range(4)))
248
+ for result in results:
249
+ output.append(f" {result}")
250
+
251
+ output.append("ProcessPoolExecutor workers:")
252
+ # monitored_task is a local function; spawn-based platforms (Windows, macOS)
+ # cannot pickle it, so this branch is expected to report an error there
+ with concurrent.futures.ProcessPoolExecutor(max_workers=4) as executor:
253
+ results = list(executor.map(monitored_task, range(4)))
254
+ for result in results:
255
+ output.append(f" {result}")
256
+
257
+ except Exception as e:
258
+ output.append(f"ERROR: {e}")
259
+
260
+ output.append("")
261
+ return "\n".join(output)
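The PID/TID readout above also surfaces a subtle failure mode: `monitored_task` is defined inside a method, and `ProcessPoolExecutor` has to pickle the callable on spawn-based platforms (Windows, macOS), where local functions cannot be pickled. A minimal demonstration, standard library only:

```python
import pickle

def make_task():
    def nested(x):  # analogous to monitored_task: defined in a local scope
        return x
    return nested

raised = False
try:
    pickle.dumps(make_task())
except (AttributeError, pickle.PicklingError):
    raised = True

# Local functions have no importable qualified name, so pickling fails;
# process pools therefore want module-level worker functions.
assert raised
```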
262
+
263
+ def test_tts_model_performance(self):
264
+ """Test 7: Real TTS model performance"""
265
+ output = []
266
+ output.append("=== TEST 7: TTS Model Performance ===")
267
+
268
+ try:
269
+ # Import TTS components
270
+ sys.path.append(str(Path(__file__).parent.parent))
271
+ from modules.tts_engine import load_optimized_model, detect_deployment_environment
272
+
273
+ # Detect environment
274
+ env = detect_deployment_environment()
275
+ output.append(f"🌐 Environment: {env}")
276
+
277
+ # Test 1: Model loading time
278
+ output.append("\n--- MODEL LOADING TEST ---")
279
+ device = "cuda" if torch.cuda.is_available() else "cpu"
280
+ output.append(f"πŸš€ Loading model on {device}...")
281
+
282
+ start_time = time.time()
283
+ model = load_optimized_model(device)
284
+ load_time = time.time() - start_time
285
+ output.append(f"⏱️ Model load time: {load_time:.2f}s")
286
+
287
+ # Test 2: Single inference timing
288
+ output.append("\n--- SINGLE INFERENCE TEST ---")
289
+ test_text = "Hello world, this is a test."
290
+
291
+ # Warmup run
292
+ try:
293
+ with torch.no_grad():
294
+ _ = model.generate(test_text, exaggeration=0.5, cfg_weight=0.5, temperature=0.7)
295
+ if torch.cuda.is_available():
296
+ torch.cuda.synchronize()
297
+ output.append("βœ… Warmup completed")
298
+ except Exception as e:
299
+ output.append(f"⚠️ Warmup failed: {e}")
300
+
301
+ # Timed run
302
+ start_time = time.time()
303
+ try:
304
+ with torch.no_grad():
305
+ audio = model.generate(test_text, exaggeration=0.5, cfg_weight=0.5, temperature=0.7)
306
+ if torch.cuda.is_available():
307
+ torch.cuda.synchronize()
308
+ inference_time = time.time() - start_time
309
+
310
+ # Calculate realtime factor
311
+ if hasattr(audio, 'shape'):
312
+ sample_rate = getattr(model, 'sr', 24000)
313
+ audio_duration = audio.shape[-1] / sample_rate
314
+ realtime_factor = audio_duration / inference_time if inference_time > 0 else 0
315
+ output.append(f"⏱️ Inference time: {inference_time:.3f}s")
316
+ output.append(f"🎡 Audio duration: {audio_duration:.3f}s")
317
+ output.append(f"πŸš€ Realtime factor: {realtime_factor:.2f}x")
318
+
319
+ # Check if this matches your slow performance
320
+ if realtime_factor < 0.5:
321
+ output.append("⚠️ WARNING: Very slow realtime factor!")
322
+ output.append(" This matches your reported slow performance")
323
+ elif realtime_factor > 1.0:
324
+ output.append("βœ… Good realtime factor - issue may be elsewhere")
325
+ else:
326
+ output.append(f"⏱️ Inference time: {inference_time:.3f}s")
327
+ output.append("⚠️ Could not determine audio duration")
328
+
329
+ except Exception as e:
330
+ output.append(f"❌ Inference failed: {e}")
331
+
332
+ # Test 3: Multiple sequential runs (simulating current problem)
333
+ output.append("\n--- SEQUENTIAL PROCESSING TEST ---")
334
+ sequential_times = []
335
+ for i in range(3):
336
+ start_time = time.time()
337
+ try:
338
+ with torch.no_grad():
339
+ _ = model.generate(f"Test run number {i+1}.", exaggeration=0.5, cfg_weight=0.5, temperature=0.7)
340
+ if torch.cuda.is_available():
341
+ torch.cuda.synchronize()
342
+ run_time = time.time() - start_time
343
+ sequential_times.append(run_time)
344
+ output.append(f" Run {i+1}: {run_time:.3f}s")
345
+ except Exception as e:
346
+ output.append(f" Run {i+1} failed: {e}")
347
+
348
+ if sequential_times:
349
+ avg_time = sum(sequential_times) / len(sequential_times)
350
+ output.append(f"πŸ“Š Average sequential time: {avg_time:.3f}s")
351
+
352
+ # Check consistency
353
+ if max(sequential_times) - min(sequential_times) > 0.5:
354
+ output.append("⚠️ High variance in processing times - possible memory issues")
355
+
356
+ # Test 4: Threading test with actual model
357
+ output.append("\n--- THREADING WITH TTS MODEL TEST ---")
358
+ try:
359
+ def tts_worker(text_idx):
360
+ try:
361
+ start = time.time()
362
+ with torch.no_grad():
363
+ _ = model.generate(f"Threading test {text_idx}.",
364
+ exaggeration=0.5, cfg_weight=0.5, temperature=0.7)
365
+ if torch.cuda.is_available():
366
+ torch.cuda.synchronize()
367
+ return time.time() - start
368
+ except Exception as e:
369
+ return f"Error: {e}"
370
+
371
+ # Test with 2 workers (like current setup)
372
+ start_time = time.time()
373
+ with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
374
+ futures = [executor.submit(tts_worker, i) for i in range(4)]
375
+ thread_results = [f.result() for f in futures]
376
+
377
+ total_thread_time = time.time() - start_time
378
+ output.append(f"⏱️ Threading (2 workers, 4 tasks): {total_thread_time:.3f}s")
379
+
380
+ successful_times = [r for r in thread_results if isinstance(r, float)]
381
+ if successful_times:
382
+ output.append(f"πŸ“Š Successful tasks: {len(successful_times)}/4")
383
+ output.append(f"πŸ“Š Average task time: {sum(successful_times)/len(successful_times):.3f}s")
384
+
385
+ # Compare with sequential
386
+ if sequential_times:
387
+ expected_sequential = avg_time * 4
388
+ speedup = expected_sequential / total_thread_time
389
+ output.append(f"πŸ“Š Threading speedup: {speedup:.2f}x")
390
+
391
+ if speedup < 1.2:
392
+ output.append("⚠️ Threading provides minimal speedup")
393
+ output.append(" This explains your slow HuggingFace performance!")
394
+ else:
395
+ output.append("βœ… Threading working well")
396
+ else:
397
+ output.append("❌ All threading tasks failed")
398
+ for i, result in enumerate(thread_results):
399
+ output.append(f" Task {i+1}: {result}")
400
+
401
+ except Exception as e:
402
+ output.append(f"❌ Threading test failed: {e}")
403
+
404
+ # Test 5: Model reloading overhead
405
+ output.append("\n--- MODEL RELOADING TEST ---")
406
+ try:
407
+ # Simulate what might be happening in your slow processing
408
+ reload_times = []
409
+ for i in range(3):
410
+ # Delete and reload model
411
+ del model
412
+ if torch.cuda.is_available():
413
+ torch.cuda.empty_cache()
414
+
415
+ start_time = time.time()
416
+ model = load_optimized_model(device)
417
+ # Single inference after reload
418
+ with torch.no_grad():
419
+ _ = model.generate("Reload test.", exaggeration=0.5, cfg_weight=0.5, temperature=0.7)
420
+ if torch.cuda.is_available():
421
+ torch.cuda.synchronize()
422
+ reload_time = time.time() - start_time
423
+ reload_times.append(reload_time)
424
+ output.append(f" Reload + inference {i+1}: {reload_time:.3f}s")
425
+
426
+ avg_reload_time = sum(reload_times) / len(reload_times)
427
+ output.append(f"πŸ“Š Average reload + inference: {avg_reload_time:.3f}s")
428
+
429
+ if sequential_times and avg_reload_time > avg_time * 2:
430
+ output.append("⚠️ Model reloading adds significant overhead")
431
+ output.append(" Workers may be reloading models per chunk!")
432
+
433
+ except Exception as e:
434
+ output.append(f"❌ Model reloading test failed: {e}")
435
+
436
+ # Cleanup
437
+ try:
438
+ del model
439
+ if torch.cuda.is_available():
440
+ torch.cuda.empty_cache()
441
+ output.append("\nβœ… Model cleanup completed")
442
+ except Exception:
443
+ pass
444
+
445
+ except Exception as e:
446
+ output.append(f"❌ TTS performance test failed: {e}")
447
+ import traceback
448
+ output.append(f"Traceback: {traceback.format_exc()}")
449
+
450
+ output.append("")
451
+ return "\n".join(output)
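The realtime-factor bookkeeping used throughout this test reduces to one formula: audio seconds produced per wall-clock second of inference. A standalone sketch; the 24 kHz figure mirrors the `getattr(model, 'sr', 24000)` fallback above and is only a placeholder default, since the real model exposes its own `sr`:

```python
def realtime_factor(num_samples, sample_rate, inference_seconds):
    """Audio seconds generated per second of wall-clock inference time."""
    if inference_seconds <= 0:
        return 0.0
    return (num_samples / sample_rate) / inference_seconds

# 48000 samples at 24 kHz is 2.0 s of audio; generated in 1.0 s -> 2.0x realtime
assert realtime_factor(48_000, 24_000, 1.0) == 2.0
# Below 1.0x means generation is slower than playback
assert realtime_factor(12_000, 24_000, 1.0) == 0.5
```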
452
+
453
+ def run_all_diagnostics(self, progress=gr.Progress()):
454
+ """Run all diagnostic tests"""
455
+ if self.running:
456
+ return "Diagnostics already running..."
457
+
458
+ self.running = True
459
+
460
+ try:
461
+ results = []
462
+ results.append("πŸ” Parallel Processing Diagnostic Tool")
463
+ results.append("=" * 50)
464
+ results.append("")
465
+
466
+ # Run each test with progress updates
467
+ progress(0.1, desc="Environment Info...")
468
+ results.append(self.test_environment_info())
469
+
470
+ progress(0.2, desc="Basic Multiprocessing...")
471
+ results.append(self.test_basic_multiprocessing())
472
+
473
+ progress(0.4, desc="Thread vs Process...")
474
+ results.append(self.test_thread_vs_process())
475
+
476
+ progress(0.6, desc="GPU Access...")
477
+ results.append(self.test_gpu_access())
478
+
479
+ progress(0.8, desc="Model Loading...")
480
+ results.append(self.test_model_loading())
481
+
482
+ progress(0.85, desc="Worker Creation...")
483
+ results.append(self.test_worker_creation())
484
+
485
+ progress(0.95, desc="TTS Model Performance...")
486
+ results.append(self.test_tts_model_performance())
487
+
488
+ progress(1.0, desc="Complete!")
489
+
490
+ results.append("🏁 Diagnostic complete!")
491
+ results.append("")
492
+ results.append("ANALYSIS:")
493
+ results.append("- If basic multiprocessing is slow: Environment blocks parallelism")
494
+ results.append("- If threading faster than processing: Use ThreadPoolExecutor")
495
+ results.append("- If GPU parallel time >> sequential: GPU contention issue")
496
+ results.append("- If model loading overhead high: Need model sharing strategy")
497
+ results.append("- If same PID for all workers: Using threads, not processes")
498
+ results.append("- If TTS realtime factor < 0.5x: Severe performance bottleneck")
499
+ results.append("- If model reloading overhead high: Workers reloading models per chunk")
500
+
501
+ return "\n".join(results)
502
+
503
+ finally:
504
+ self.running = False
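The `running` flag in `run_all_diagnostics` is a simple re-entrancy guard: refuse to start if a run is in flight, and release the flag in `finally` so it cannot stay stuck after an exception. A minimal sketch of the same shape; note that a bare bool is not thread-safe, so a `threading.Lock` would be stricter if the handler could ever fire concurrently:

```python
class Runner:
    def __init__(self):
        self.running = False

    def run(self):
        if self.running:
            return "Diagnostics already running..."
        self.running = True
        try:
            return "done"
        finally:
            self.running = False  # released even if the body raises

r = Runner()
assert r.run() == "done"
assert r.running is False  # guard reset after a successful run
```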
505
+
506
+ # Create global diagnostic runner
507
+ diagnostic_runner = DiagnosticRunner()
508
+
509
+ def create_diagnostics_tab():
510
+ """Create the diagnostics tab interface"""
511
+
512
+ with gr.Column():
513
+ gr.Markdown("# πŸ” System Diagnostics")
514
+ gr.Markdown("*Test parallel processing capabilities and identify performance bottlenecks*")
515
+
516
+ with gr.Row():
517
+ run_diagnostics_btn = gr.Button("πŸš€ Run Full Diagnostics", variant="primary", size="lg")
518
+ tts_diagnostics_btn = gr.Button("🎀 TTS Performance Test", variant="secondary", size="lg")
519
+
520
+ with gr.Row():
521
+ diagnostic_output = gr.Textbox(
522
+ label="Diagnostic Results",
523
+ lines=30,
524
+ max_lines=50,
525
+ interactive=False,
526
+ show_copy_button=True
527
+ )
528
+
529
+ # Button click handlers
530
+ run_diagnostics_btn.click(
531
+ diagnostic_runner.run_all_diagnostics,
532
+ outputs=[diagnostic_output]
533
+ )
534
+
535
+ tts_diagnostics_btn.click(
536
+ diagnostic_runner.test_tts_model_performance,
537
+ outputs=[diagnostic_output]
538
+ )
539
+
540
+ # Instructions
541
+ with gr.Accordion("πŸ“‹ How to Interpret Results", open=False):
542
+ gr.Markdown("""
543
+ **Key Metrics to Look For:**
544
+
545
+ 1. **Basic Multiprocessing Speedup**: Should be > 2x with 4 workers
546
+ 2. **ThreadPool vs ProcessPool**: Which is faster indicates best approach
547
+ 3. **GPU Sequential vs Parallel**: Large difference indicates contention
548
+ 4. **Model Loading Overhead**: High overhead means workers reload models
549
+ 5. **Worker PIDs**: Same PID = threads, different PID = processes
550
+
551
+ **Common Issues:**
552
+ - **No speedup**: Environment blocks multiprocessing
553
+ - **GPU parallel slower**: GPU memory contention
554
+ - **High model loading overhead**: Need shared model architecture
555
+ - **Threading faster than processing**: Use ThreadPoolExecutor for TTS
556
+ """)
557
+
558
+ return {}
hold/chatterbox (copy).tar.gz ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:311c3484631f207ce85dbbd14b41b04453783279315c5e5a669a168710d8b934
3
+ size 7728541
modules/tts_engine.py CHANGED
@@ -1,45 +1,6 @@
1
  """
2
- ChatterboxTTS Engine Module - Core TTS Processing System
3
- =======================================================
4
-
5
- OVERVIEW:
6
- This is the heart of the ChatterboxTTS system, responsible for loading TTS models,
7
- processing audio chunks, and managing the complete text-to-speech pipeline.
8
- It handles voice embedding caching, memory optimization, and parallel processing
9
- for efficient audiobook generation.
10
-
11
- MAIN COMPONENTS:
12
- 1. MODEL MANAGEMENT: Loading, caching, and optimizing ChatterboxTTS models
13
- 2. VOICE PROCESSING: Voice sample analysis and embedding caching
14
- 3. CHUNK PROCESSING: Individual text chunk β†’ audio conversion
15
- 4. MEMORY OPTIMIZATION: VRAM management and garbage collection
16
- 5. PARALLEL PROCESSING: Multi-threaded chunk processing with producer-consumer pattern
17
- 6. PERFORMANCE MONITORING: Real-time progress tracking and ETA calculations
18
-
19
- CRITICAL PERFORMANCE FEATURES:
20
- - Voice embedding caching (5-10% speed improvement)
21
- - GPU persistence mode for faster model loading
22
- - In-memory processing pipeline (eliminates temp files)
23
- - Producer-consumer threading for parallel processing
24
- - Automatic memory management and VRAM monitoring
25
- - Model reinitialization every 500 chunks for stability
26
-
27
- WORKFLOW:
28
- Text Chunks β†’ Voice Embedding β†’ TTS Processing β†’ Audio Generation β†’
29
- Quality Validation β†’ Silence Insertion β†’ Final WAV Output
30
-
31
- TECHNICAL DETAILS:
32
- - Supports ChatterboxTTS models with custom voice cloning
33
- - Handles variable TTS parameters (temperature, CFG, exaggeration)
34
- - Implements VADER sentiment-driven parameter adjustment
35
- - Memory-safe processing with configurable VRAM thresholds
36
- - Automatic fallback for CUDA memory issues
37
-
38
- USAGE CONTEXTS:
39
- - Called by main processing scripts (GenTTS_Claude.py)
40
- - Used by JSON generation utilities
41
- - Integrated with chunk repair tools
42
- - Supports both GUI and CLI interfaces
43
  """
44
 
45
  import torch
@@ -48,16 +9,11 @@ import time
48
  import logging
49
  import shutil
50
  import sys
51
- import os
52
- import subprocess
53
- import psutil
54
  import numpy as np
55
  from datetime import timedelta
56
  from concurrent.futures import ThreadPoolExecutor, as_completed
57
  from pathlib import Path
58
  import torchaudio as ta
59
- import queue
60
- import threading
61
 
62
  from config.config import *
63
  from modules.text_processor import smart_punctuate, sentence_chunk_text, detect_content_boundaries
@@ -98,6 +54,16 @@ from modules.file_manager import (
98
  )
99
  from modules.progress_tracker import setup_logging, log_chunk_progress, log_run
100
 
 
 
 
 
 
 
 
 
 
 
101
  # ============================================================================
102
  # MEMORY AND MODEL MANAGEMENT
103
  # ============================================================================
@@ -126,186 +92,11 @@ def monitor_vram_usage(operation_name=""):
126
 
127
  if allocated > VRAM_SAFETY_THRESHOLD:
128
  logging.warning(f"⚠️ High VRAM usage during {operation_name}: {allocated:.1f}GB allocated, {reserved:.1f}GB reserved")
129
- optimize_cuda_memory_usage()
130
 
131
  return allocated, reserved
132
  return 0, 0
133
 
134
- # ============================================================================
135
- # PERFORMANCE OPTIMIZATION UTILITIES
136
- # ============================================================================
137
-
138
- def detect_deployment_environment():
139
- """Detect deployment environment for optimization adaptation"""
140
- if os.getenv("RUNPOD_POD_ID"):
141
- return "runpod"
142
- elif os.getenv("SPACE_ID"): # Hugging Face Spaces
143
- return "huggingface"
144
- elif os.path.exists("/.dockerenv"):
145
- return "container"
146
- elif torch.cuda.is_available():
147
- return "local_gpu"
148
- else:
149
- return "local_cpu"
150
-
151
- def get_available_memory():
152
- """Get available system memory in MB"""
153
- try:
154
- memory = psutil.virtual_memory()
155
- return memory.available // (1024 * 1024)
156
- except:
157
- return 8192 # Safe default of 8GB
158
-
159
- def has_nvidia_smi():
160
- """Check if nvidia-smi is available"""
161
- try:
162
- subprocess.run(['nvidia-smi', '--version'], capture_output=True, check=True)
163
- return True
164
- except:
165
- return False
166
-
167
- def enable_gpu_persistence_mode():
168
- """Enable GPU persistence mode with proper fallbacks"""
169
- if not ENABLE_GPU_PERSISTENCE_MODE:
170
- return False
171
-
172
- try:
173
- if torch.cuda.is_available() and has_nvidia_smi():
174
- for attempt in range(GPU_PERSISTENCE_RETRY_COUNT):
175
- result = subprocess.run(['nvidia-smi', '-pm', '1'],
176
- capture_output=True, text=True)
177
- if result.returncode == 0:
178
- logging.info("βœ… GPU persistence mode enabled")
179
- return True
180
- elif "Insufficient permissions" in result.stderr:
181
- logging.warning("⚠️ GPU persistence mode failed (insufficient privileges)")
182
- break
183
- time.sleep(0.5) # Brief delay between attempts
184
-
185
- logging.warning("πŸ“ Continuing with standard GPU power management")
186
- else:
187
- logging.info("ℹ️ GPU persistence mode not applicable (no NVIDIA GPU detected)")
188
- except Exception as e:
189
- logging.warning(f"⚠️ GPU persistence mode failed: {e}")
190
-
191
- return False
192
-
193
- def setup_cuda_memory_pool():
194
- """Configure CUDA memory pool for enhanced performance and reduced fragmentation"""
195
- if not ENABLE_CUDA_MEMORY_POOL or not torch.cuda.is_available():
196
- return False
197
-
198
- try:
199
- # Get current device and memory info
200
- device = torch.cuda.current_device()
201
- total_memory = torch.cuda.get_device_properties(device).total_memory
202
- total_memory_gb = total_memory / (1024**3)
203
-
204
- deployment_env = detect_deployment_environment()
205
-
206
- # Adaptive pool sizing based on environment and available memory
207
- if ENABLE_ADAPTIVE_MEMORY_POOL:
208
- if deployment_env == "runpod":
209
- pool_fraction = min(CUDA_MEMORY_POOL_FRACTION, 0.85) # More conservative on RunPod
210
- elif deployment_env == "huggingface":
211
- pool_fraction = min(CUDA_MEMORY_POOL_FRACTION, 0.75) # Very conservative on HF Spaces
212
- elif total_memory_gb < 8:
213
- pool_fraction = min(CUDA_MEMORY_POOL_FRACTION, 0.8) # Conservative for <8GB GPUs
214
- else:
215
- pool_fraction = CUDA_MEMORY_POOL_FRACTION # Use full config for high-memory GPUs
216
- else:
217
- pool_fraction = CUDA_MEMORY_POOL_FRACTION
218
-
219
- # Calculate pool size
220
- pool_size = int(total_memory * pool_fraction)
221
- pool_size_gb = pool_size / (1024**3)
222
-
223
- # Configure memory pool allocator settings
224
- # Set memory pool to reduce fragmentation and improve allocation speed
225
- if hasattr(torch.cuda, 'memory') and hasattr(torch.cuda.memory, 'set_per_process_memory_fraction'):
226
- torch.cuda.memory.set_per_process_memory_fraction(pool_fraction, device)
227
- logging.info(f"βœ… CUDA memory pool configured: {pool_size_gb:.1f}GB ({pool_fraction*100:.0f}% of {total_memory_gb:.1f}GB)")
228
-
229
- # Configure allocator settings for better memory management
230
- if hasattr(torch.cuda, 'empty_cache'):
231
- # Clear any existing allocations before setting up pool
232
- torch.cuda.empty_cache()
233
-
234
- # Enable memory pool optimizations if available in PyTorch version
235
- try:
236
- # Try to enable expandable segments for better memory utilization
237
- os.environ['PYTORCH_CUDA_ALLOC_CONF'] = 'expandable_segments:True'
238
- logging.info("βœ… CUDA expandable segments enabled")
239
- except:
240
- pass # Not available in all PyTorch versions
241
-
242
- # Warm up the memory pool with a small allocation
243
- try:
244
- warmup_tensor = torch.zeros(1024, 1024, device=device)
245
- del warmup_tensor
246
- torch.cuda.empty_cache()
247
- logging.info("βœ… CUDA memory pool warmed up")
248
- except Exception as e:
249
- logging.warning(f"⚠️ Memory pool warmup failed: {e}")
250
-
251
- logging.info(f"πŸš€ CUDA memory pool setup complete - environment: {deployment_env}")
252
- return True
253
-
254
- except Exception as e:
255
- logging.error(f"❌ CUDA memory pool setup failed: {e}")
256
- return False
257
-
258
- def optimize_cuda_memory_usage():
259
- """Advanced CUDA memory optimization for better performance"""
260
- if not torch.cuda.is_available():
261
- return
262
-
263
- try:
264
- # More aggressive cleanup for memory pool systems
265
- torch.cuda.empty_cache()
266
-
267
- # Synchronize to ensure all operations complete before cleanup
268
- torch.cuda.synchronize()
269
-
270
- # Additional memory pool optimization if available
271
- if hasattr(torch.cuda, 'reset_peak_memory_stats'):
272
- torch.cuda.reset_peak_memory_stats()
273
-
274
- except Exception as e:
275
- logging.warning(f"⚠️ CUDA memory optimization failed: {e}")
276
-
277
- # Global voice embedding cache
278
- _voice_embedding_cache = {}
279
- _cache_memory_usage = 0
280
-
281
- def get_voice_cache_key(voice_path, exaggeration):
282
- """Generate cache key for voice embeddings"""
283
- try:
284
- # Use file path and modification time for cache invalidation
285
- stat = os.stat(voice_path)
286
- return f"{voice_path}:{stat.st_mtime}:{exaggeration}"
287
- except:
288
- return f"{voice_path}:{exaggeration}"
289
-
290
- def clear_voice_embedding_cache():
291
- """Clear voice embedding cache to free memory"""
292
- global _voice_embedding_cache, _cache_memory_usage
293
- _voice_embedding_cache.clear()
294
- _cache_memory_usage = 0
295
- if torch.cuda.is_available():
296
- torch.cuda.empty_cache()
297
- logging.info("πŸ—‘οΈ Voice embedding cache cleared")
298
-
299
- def estimate_cache_memory_mb(conds_object):
300
- """Estimate memory usage of cached voice embeddings in MB"""
301
- try:
302
- if hasattr(conds_object, 't3') and hasattr(conds_object.t3, 'voice_embed'):
303
- # Rough estimate based on typical voice embedding sizes
304
- return 50 # Typical voice embedding ~50MB
305
- return 30 # Conservative estimate
306
- except:
307
- return 30
308
-
309
  def get_optimal_workers():
310
  """Dynamic worker allocation based on VRAM usage"""
311
  if not USE_DYNAMIC_WORKERS:
@@ -401,299 +192,28 @@ def get_best_available_device():
401
  return "cpu"
402
 
403
  def load_optimized_model(device):
404
- """Load TTS model with memory optimizations and device fallback"""
405
  from src.chatterbox.tts import ChatterboxTTS
406
-
407
- # Validate device availability
408
- original_device = device
409
- try:
410
- if device == "cuda":
411
- # Test CUDA availability with a small operation
412
- test_tensor = torch.tensor([1.0]).to("cuda")
413
- del test_tensor
414
- torch.cuda.empty_cache()
415
- logging.info(f"βœ… CUDA device validated successfully")
416
- elif device == "mps" and torch.backends.mps.is_available():
417
- # Test MPS availability
418
- test_tensor = torch.tensor([1.0]).to("mps")
419
- del test_tensor
420
- logging.info(f"βœ… MPS device validated successfully")
421
- except Exception as e:
422
- logging.warning(f"⚠️ Device {device} failed validation: {e}")
423
- logging.info("πŸ”„ Falling back to CPU")
424
- device = "cpu"
425
 
426
  try:
427
- # Load model with validated device (ChatterboxTTS doesn't support torch_dtype parameter)
 
 
 
 
428
  model = ChatterboxTTS.from_pretrained(device=device)
429
- logging.info(f"βœ… Model loaded successfully on {device.upper()}")
430
-
431
- if original_device != device:
432
- logging.info(f"πŸ“ Note: Requested {original_device.upper()} but using {device.upper()} due to availability")
433
-
434
- except Exception as e:
435
- logging.error(f"❌ Failed to load model on {device}: {e}")
436
- if device != "cpu":
437
- logging.info("πŸ”„ Final fallback to CPU...")
438
- device = "cpu"
439
- model = ChatterboxTTS.from_pretrained(device=device)
440
- logging.info("βœ… Model loaded on CPU as final fallback")
441
- else:
442
- raise RuntimeError(f"Failed to load model even on CPU: {e}")
443
 
444
  # Only apply eval() and benchmark if the model has these attributes
445
  if hasattr(model, 'eval'):
446
  model.eval()
447
 
448
- # Set CUDNN benchmark for performance (if available and using CUDA)
449
- if device == "cuda" and torch.backends.cudnn.is_available():
450
  torch.backends.cudnn.benchmark = True
451
- logging.info("βœ… CUDNN benchmark enabled for performance")
452
-
453
- # Initialize CUDA memory pool if enabled and using CUDA
454
- if device == "cuda" and ENABLE_CUDA_MEMORY_POOL:
455
- memory_pool_success = setup_cuda_memory_pool()
456
- if memory_pool_success:
457
- logging.info("πŸš€ CUDA memory pool optimization enabled")
458
- else:
459
- logging.warning("⚠️ CUDA memory pool setup failed, continuing without optimization")
460
 
461
  return model
462
 
463
- # ============================================================================
464
- # PRODUCER-CONSUMER PIPELINE (PHASE 4)
465
- # ============================================================================
466
-
467
- def chunk_producer_thread(all_chunks, chunk_queue, start_index=0, max_queue_size=10):
468
- """
469
- Producer thread that pre-loads chunks into a queue for worker threads to consume.
470
- This eliminates chunk loading overhead during TTS processing.
471
-
472
- Args:
473
- all_chunks: List of chunk data (dict format with text, boundary_type, etc)
474
- chunk_queue: Queue to place prepared chunk data
475
- start_index: Index to start producing from (for resume functionality)
476
- max_queue_size: Maximum queue size to prevent memory overflow
477
- """
478
- try:
479
- logging.info(f"🏭 Producer thread started - pre-loading chunks from index {start_index}")
480
-
481
- for i, chunk_data in enumerate(all_chunks[start_index:], start=start_index):
482
- # Check if we should stop (via sentinel or shutdown)
483
- if shutdown_requested:
484
- break
485
-
486
-            # Handle both dictionary and tuple formats for backward compatibility
-            if isinstance(chunk_data, dict):
-                chunk_text = chunk_data["text"]
-                boundary_type = chunk_data.get("boundary_type", "none")
-                chunk_tts_params = chunk_data.get("tts_params", None)
-            else:
-                # Handle old tuple format (text, is_para_end)
-                chunk_text = chunk_data[0] if len(chunk_data) > 0 else str(chunk_data)
-                is_old_para_end = chunk_data[1] if len(chunk_data) > 1 else False
-                boundary_type = "paragraph_end" if is_old_para_end else "none"
-                chunk_tts_params = None
-
-            # Create standardized chunk package for workers
-            chunk_package = {
-                'index': i,
-                'text': chunk_text,
-                'boundary_type': boundary_type,
-                'tts_params': chunk_tts_params
-            }
-
-            # Put chunk in queue (blocks if queue is full)
-            chunk_queue.put(chunk_package, timeout=30)
-
-            # Log progress every 50 chunks to avoid spam
-            if (i + 1) % 50 == 0:
-                logging.info(f"πŸ“¦ Producer queued {i + 1} chunks")
-
-        logging.info(f"βœ… Producer thread completed - {len(all_chunks) - start_index} chunks queued")
-
-    except Exception as e:
-        logging.error(f"❌ Producer thread failed: {e}")
-    finally:
-        # Signal completion by adding sentinel value
-        try:
-            chunk_queue.put(None, timeout=5)  # None = end of chunks signal
-        except queue.Full:
-            logging.warning("⚠️ Could not add completion signal - queue full")
-
-def process_chunks_with_pipeline(
-    all_chunks, batch_chunks, chunk_offset, text_chunks_dir, audio_chunks_dir,
-    voice_path, tts_params, start_time, total_chunks, punc_norm, book_name,
-    log_run_func, log_path, device, model, asr_model, asr_enabled, optimal_workers,
-    accumulated_audio_duration=0.0
-):
-    """
-    Enhanced chunk processing with producer-consumer pipeline for 5-10% performance improvement.
-
-    Args:
-        all_chunks: Complete list of all chunks (for context)
-        batch_chunks: Current batch of chunks to process
-        chunk_offset: Offset for global chunk indexing
-        ... (other parameters same as original ThreadPoolExecutor pattern)
-
-    Returns:
-        Tuple of (batch_results, total_audio_duration) where:
-        - batch_results: List of (index, wav_path) tuples for successful chunks
-        - total_audio_duration: Total audio duration for batch (for progress tracking)
-    """
-    try:
-        # Create thread-safe queue with size limit to prevent memory overflow
-        max_queue_size = min(optimal_workers * 3, 20)  # 3x workers or 20, whichever is smaller
-        chunk_queue = queue.Queue(maxsize=max_queue_size)
-
-        # Start producer thread to pre-load chunks
-        producer_thread = threading.Thread(
-            target=chunk_producer_thread,
-            args=(batch_chunks, chunk_queue, 0, max_queue_size),
-            daemon=True
-        )
-        producer_thread.start()
-
-        logging.info(f"πŸš€ Producer-consumer pipeline started with queue size {max_queue_size}")
-
-        # Consumer pattern: workers pull from queue instead of sequential loading
-        batch_results = []
-        futures = []
-
-        with ThreadPoolExecutor(max_workers=optimal_workers) as executor:
-            # Process chunks as they become available and handle results in real-time
-            chunks_submitted = 0
-            completed_count = 0
-            total_audio_duration = accumulated_audio_duration
-
-            # Import audio processing functions
-            from modules.audio_processor import get_chunk_audio_duration
-            from modules.progress_tracker import log_chunk_progress
-
-            while True:
-                try:
-                    # Get next chunk from producer (blocks until available)
-                    chunk_package = chunk_queue.get(timeout=10)
-
-                    # Check for completion signal
-                    if chunk_package is None:
-                        break
-
-                    # Check for shutdown request
-                    if shutdown_requested:
-                        logging.info("πŸ›‘ Shutdown requested - stopping chunk submission")
-                        break
-
-                    # Extract chunk data from package
-                    global_chunk_index = chunk_offset + chunk_package['index']
-                    chunk_text = chunk_package['text']
-                    boundary_type = chunk_package['boundary_type']
-                    chunk_tts_params = chunk_package.get('tts_params') or tts_params
-
-                    # Build context for chunk (all chunk texts)
-                    all_chunk_texts = []
-                    for cd in all_chunks:
-                        if isinstance(cd, dict):
-                            all_chunk_texts.append(cd["text"])
-                        else:
-                            all_chunk_texts.append(cd[0] if len(cd) > 0 else str(cd))
-
-                    # Submit chunk to worker thread
-                    future = executor.submit(
-                        process_one_chunk,
-                        global_chunk_index, chunk_text, text_chunks_dir, audio_chunks_dir,
-                        voice_path, chunk_tts_params, start_time, total_chunks,
-                        punc_norm, book_name, log_run_func, log_path, device,
-                        model, asr_model, all_chunk_texts, boundary_type,
-                        asr_enabled
-                    )
-                    futures.append(future)
-
-                    chunks_submitted += 1
-                    chunk_queue.task_done()
-
-                    # Check for completed futures while submitting new ones
-                    completed_futures = []
-                    for fut in futures:
-                        if fut.done():
-                            completed_futures.append(fut)
-
-                    # Process completed futures
-                    for fut in completed_futures:
-                        try:
-                            idx, wav_path = fut.result()
-                            if wav_path and wav_path.exists():
-                                batch_results.append((idx, wav_path))
-
-                                # Update totals for final batch calculation
-                                chunk_duration = get_chunk_audio_duration(wav_path)
-                                total_audio_duration += chunk_duration
-                                completed_count += 1
-
-                            futures.remove(fut)  # Remove completed future
-
-                        except Exception as e:
-                            logging.error(f"❌ Future failed during real-time processing: {e}")
-                            futures.remove(fut)
-
-                except queue.Empty:
-                    # Timeout waiting for chunks - check if producer is done
-                    if not producer_thread.is_alive():
-                        break
-                    else:
-                        # Producer still working - check for completed futures while waiting
-                        completed_futures = [fut for fut in futures if fut.done()]
-                        for fut in completed_futures:
-                            try:
-                                idx, wav_path = fut.result()
-                                if wav_path and wav_path.exists():
-                                    batch_results.append((idx, wav_path))
-
-                                    chunk_duration = get_chunk_audio_duration(wav_path)
-                                    total_audio_duration += chunk_duration
-                                    completed_count += 1
-
-                                futures.remove(fut)
-
-                            except Exception as e:
-                                logging.error(f"❌ Future failed during timeout processing: {e}")
-                                futures.remove(fut)
-                    continue
-
-                except Exception as e:
-                    logging.error(f"❌ Error in consumer loop: {e}")
-                    break
-
-        # Process any remaining futures
-        if futures:
-            for fut in as_completed(futures):
-                try:
-                    idx, wav_path = fut.result()
-                    if wav_path and wav_path.exists():
-                        batch_results.append((idx, wav_path))
-
-                        # Update batch totals
-                        chunk_duration = get_chunk_audio_duration(wav_path)
-                        total_audio_duration += chunk_duration
-                        completed_count += 1
-
-                except Exception as e:
-                    logging.error(f"❌ Final future failed: {e}")
-
-        # Wait for producer thread to complete cleanly
-        if producer_thread.is_alive():
-            producer_thread.join(timeout=5)
-
-        # Calculate batch-specific audio duration for return
-        batch_audio_duration = total_audio_duration - accumulated_audio_duration
-        logging.info(f"πŸŽ‰ Producer-consumer pipeline completed: {len(batch_results)} chunks processed")
-        return batch_results, batch_audio_duration
-
-    except Exception as e:
-        logging.error(f"❌ Producer-consumer pipeline failed: {e}")
-        logging.info("πŸ”„ Falling back to sequential processing...")
-        return [], 0.0  # Return empty results to trigger fallback
-
 # ============================================================================
 # CHUNK PROCESSING
 # ============================================================================
@@ -710,11 +230,86 @@ def patch_alignment_layer(tfmr, alignment_layer_idx=12):
 
     target_layer.forward = MethodType(patched_forward, target_layer)
 
 def process_one_chunk(
     i, chunk, text_chunks_dir, audio_chunks_dir,
     voice_path, tts_params, start_time, total_chunks,
     punc_norm, basename, log_run_func, log_path, device,
-    model, asr_model, all_chunks, boundary_type="none",
     enable_asr=None
 ):
     """Enhanced chunk processing with quality control, contextual silence, and deep cleanup"""
@@ -938,32 +533,13 @@ def process_one_chunk(
 
     # Enhanced regular cleanup (every chunk)
     del wav
-    optimize_cuda_memory_usage()
 
     # Additional per-chunk cleanup for long runs
     if (i + 1) % 50 == 0:
         torch.cuda.empty_cache()
         gc.collect()
 
-    # Show ETA progress updates during actual processing (every 2 chunks)
-    if i % 2 == 0:
-        try:
-            from modules.audio_processor import get_chunk_audio_duration
-            from modules.progress_tracker import log_chunk_progress
-
-            # Calculate running total audio duration by checking existing chunks
-            total_audio_duration = 0.0
-            for j in range(i + 1):  # Include current chunk
-                check_path = audio_chunks_dir / f"chunk_{j+1:05}.wav"
-                if check_path.exists():
-                    total_audio_duration += get_chunk_audio_duration(check_path)
-
-            # Show ETA update with accumulated audio
-            log_chunk_progress(i, total_chunks, start_time, total_audio_duration)
-        except Exception as e:
-            # Don't let ETA calculation failures break chunk processing
-            pass
-
     return i, final_path
 
 # ============================================================================
@@ -1261,13 +837,6 @@ def process_book_folder(book_dir, voice_path, tts_params, device, skip_cleanup=F
     log_path = output_root / "chunk_validation.log"
     total_audio_duration = 0.0
 
-    # Initialize performance optimizations
-    deployment_env = detect_deployment_environment()
-    print(f"🌍 Deployment environment: {deployment_env}")
-
-    # Enable GPU persistence mode for better performance
-    gpu_persistence_enabled = enable_gpu_persistence_mode()
-
     # Batch processing
     print(f"πŸ“Š Processing {total_chunks} chunks in batches of {BATCH_SIZE}")
 
@@ -1304,45 +873,51 @@ def process_book_folder(book_dir, voice_path, tts_params, device, skip_cleanup=F
             print(f"❌ ASR model loading failed completely - disabling ASR for this batch")
             asr_enabled = False
 
         # Dynamic worker allocation
         optimal_workers = get_optimal_workers()
         print(f"πŸ”§ Using {optimal_workers} workers for batch {batch_start+1}-{batch_end}")
 
-        # Try producer-consumer pipeline first (Phase 4 optimization)
-        batch_results = []
-        if ENABLE_PRODUCER_CONSUMER_PIPELINE:
-            try:
-                print(f"πŸš€ Producer-consumer pipeline for batch {batch_start+1}-{batch_end}")
-                pipeline_results = process_chunks_with_pipeline(
-                    all_chunks, batch_chunks, batch_start, text_chunks_dir, audio_chunks_dir,
-                    voice_path, tts_params, start_time, total_chunks, punc_norm, book_dir.name,
-                    log_run, log_path, device, model, asr_model, asr_enabled, optimal_workers,
-                    total_audio_duration  # Pass accumulated duration for proper ETA calculation
-                )
-
-                # Handle tuple return from pipeline
-                if isinstance(pipeline_results, tuple) and len(pipeline_results) == 2:
-                    batch_results, batch_audio_duration = pipeline_results
-                    total_audio_duration += batch_audio_duration
-                else:
-                    # Fallback for old return format
-                    batch_results = pipeline_results
-
-                if batch_results:
-                    print(f"βœ… Producer-consumer pipeline completed: {len(batch_results)} chunks")
-                    # Pipeline already handled progress logging internally
-
-            except Exception as e:
-                logging.error(f"❌ Producer-consumer pipeline failed: {e}")
-                if not ENABLE_PIPELINE_FALLBACK:
-                    raise
-                batch_results = []  # Clear failed results
-
-        # Fallback to original sequential processing if pipeline disabled or failed
-        if not batch_results:
-            print(f"πŸ”„ Sequential processing fallback for batch {batch_start+1}-{batch_end}")
-            futures = []
 
         with ThreadPoolExecutor(max_workers=optimal_workers) as executor:
             for i, chunk_data in enumerate(batch_chunks):
                 global_chunk_index = batch_start + i
@@ -1366,21 +941,14 @@ def process_book_folder(book_dir, voice_path, tts_params, device, skip_cleanup=F
                 boundary_type = "paragraph_end" if is_old_para_end else "none"
                 chunk_tts_params = tts_params  # Fallback for old format
 
-                # Handle both dictionary and tuple formats for backward compatibility
-                all_chunk_texts = []
-                for cd in all_chunks:
-                    if isinstance(cd, dict):
-                        all_chunk_texts.append(cd["text"])
-                    else:
-                        # Handle old tuple format (text, is_para_end)
-                        all_chunk_texts.append(cd[0] if len(cd) > 0 else str(cd))
 
                 futures.append(executor.submit(
                     process_one_chunk,
                     global_chunk_index, chunk, text_chunks_dir, audio_chunks_dir,
                     voice_path, chunk_tts_params, start_time, total_chunks,
                     punc_norm, book_dir.name, log_run, log_path, device,
-                    model, asr_model, all_chunk_texts, boundary_type,
                     asr_enabled
                 ))
 
@@ -1397,7 +965,7 @@ def process_book_folder(book_dir, voice_path, tts_params, device, skip_cleanup=F
                 total_audio_duration += chunk_duration
                 batch_results.append((idx, wav_path))
 
-                # Update progress every 2 chunks within batch
                 completed_count += 1
                 if completed_count % 2 == 0:
                     log_chunk_progress(batch_start + completed_count - 1, total_chunks, start_time, total_audio_duration)
@@ -1478,4 +1046,4 @@ def process_book_folder(book_dir, voice_path, tts_params, device, skip_cleanup=F
     log_run("\n".join(run_log_lines), output_root / "run.log")
     print(f"πŸ“ Run log written to: {output_root / 'run.log'}")
 
-    return final_m4b_path, combined_wav_path, run_log_lines
 """
+TTS Engine Module
+Handles ChatterboxTTS interface, model loading, and chunk processing coordination
 """
 
 import torch
 import logging
 import shutil
 import sys
 import numpy as np
 from datetime import timedelta
 from concurrent.futures import ThreadPoolExecutor, as_completed
 from pathlib import Path
 import torchaudio as ta
 
 from config.config import *
 from modules.text_processor import smart_punctuate, sentence_chunk_text, detect_content_boundaries
 )
 from modules.progress_tracker import setup_logging, log_chunk_progress, log_run
 
+# Global shutdown flag
+shutdown_requested = False
+
+# Console colors
+RED = '\033[91m'
+GREEN = '\033[92m'
+YELLOW = '\033[93m'
+CYAN = '\033[96m'
+RESET = '\033[0m'
+
 # ============================================================================
 # MEMORY AND MODEL MANAGEMENT
 # ============================================================================
 
         if allocated > VRAM_SAFETY_THRESHOLD:
             logging.warning(f"⚠️ High VRAM usage during {operation_name}: {allocated:.1f}GB allocated, {reserved:.1f}GB reserved")
+            optimize_memory_usage()
 
         return allocated, reserved
     return 0, 0
 
 def get_optimal_workers():
     """Dynamic worker allocation based on VRAM usage"""
     if not USE_DYNAMIC_WORKERS:
 
     return "cpu"
 
 def load_optimized_model(device):
+    """Load TTS model with memory optimizations"""
     from src.chatterbox.tts import ChatterboxTTS
 
     try:
+        # Try to load with FP16 if supported
+        model = ChatterboxTTS.from_pretrained(device=device, torch_dtype=torch.float16)
+        logging.info("βœ… Loaded model in FP16 mode (halved VRAM usage)")
+    except Exception:
+        # Fallback to default loading
         model = ChatterboxTTS.from_pretrained(device=device)
+        logging.info("⚠️ Using FP32 mode (FP16 not supported)")
 
     # Only apply eval() and benchmark if the model has these attributes
     if hasattr(model, 'eval'):
         model.eval()
 
+    # Set CUDNN benchmark for performance (if available)
+    if torch.backends.cudnn.is_available():
         torch.backends.cudnn.benchmark = True
 
     return model
 
 # ============================================================================
 # CHUNK PROCESSING
 # ============================================================================
 
     target_layer.forward = MethodType(patched_forward, target_layer)
 
+def process_batch(
+    batch, text_chunks_dir, audio_chunks_dir,
+    voice_path, tts_params, start_time, total_chunks,
+    punc_norm, basename, log_run_func, log_path, device,
+    model, asr_model, all_chunks,
+    enable_asr=None
+):
+    """
+    Process a batch of chunks using the batch-enabled TTS model.
+    """
+    from pydub import AudioSegment
+    import io
+    import soundfile as sf
+
+    # 1. Prepare batch for TTS
+    texts = [chunk_data['text'] for chunk_data in batch]
+
+    # All params are the same, so we take them from the first chunk
+    shared_tts_params = batch[0].get("tts_params", tts_params)
+    supported_params = {"exaggeration", "cfg_weight", "temperature", "min_p", "top_p", "repetition_penalty"}
+    tts_args = {k: v for k, v in shared_tts_params.items() if k in supported_params}
+
+    # 2. Generate audio in a batch
+    try:
+        with torch.no_grad():
+            wavs = model.generate_batch(texts, **tts_args)
+    except Exception as e:
+        logging.error(f"❌ Batch TTS generation failed: {e}")
+        # Fallback to individual processing for this batch
+        results = []
+        for chunk_data in batch:
+            i = chunk_data['index']
+            chunk = chunk_data['text']
+            boundary_type = chunk_data.get("boundary_type", "none")
+            chunk_tts_params = chunk_data.get("tts_params", tts_params)
+            result = process_one_chunk(i, chunk, text_chunks_dir, audio_chunks_dir, voice_path, chunk_tts_params, start_time, total_chunks, punc_norm, basename, log_run_func, log_path, device, model, asr_model, boundary_type, enable_asr)
+            results.append(result)
+        return results
+
+    # 3. Process and save each audio file from the batch
+    batch_results = []
+    for i, wav_tensor in enumerate(wavs):
+        chunk_data = batch[i]
+        chunk_index = chunk_data['index']
+        boundary_type = chunk_data.get("boundary_type", "none")
+        chunk_id_str = f"{chunk_index+1:05}"
+
+        if wav_tensor.dim() == 1:
+            wav_tensor = wav_tensor.unsqueeze(0)
+
+        wav_np = wav_tensor.squeeze().cpu().numpy()
+        with io.BytesIO() as wav_buffer:
+            sf.write(wav_buffer, wav_np, model.sr, format='wav')
+            wav_buffer.seek(0)
+            audio_segment = AudioSegment.from_wav(wav_buffer)
+
+        # Apply trimming and contextual silence
+        from modules.audio_processor import process_audio_with_trimming_and_silence, trim_audio_endpoint
+        if boundary_type and boundary_type != "none":
+            final_audio = process_audio_with_trimming_and_silence(audio_segment, boundary_type)
+        elif ENABLE_AUDIO_TRIMMING:
+            final_audio = trim_audio_endpoint(audio_segment)
+        else:
+            final_audio = audio_segment
+
+        # Final save
+        final_path = audio_chunks_dir / f"chunk_{chunk_id_str}.wav"
+        final_audio.export(final_path, format="wav")
+        logging.info(f"βœ… Saved final chunk from batch: {final_path.name}")
+
+        batch_results.append((chunk_index, final_path))
+
+    return batch_results
+
 def process_one_chunk(
     i, chunk, text_chunks_dir, audio_chunks_dir,
     voice_path, tts_params, start_time, total_chunks,
     punc_norm, basename, log_run_func, log_path, device,
+    model, asr_model, boundary_type="none",
     enable_asr=None
 ):
     """Enhanced chunk processing with quality control, contextual silence, and deep cleanup"""
 
     # Enhanced regular cleanup (every chunk)
     del wav
+    optimize_memory_usage()
 
     # Additional per-chunk cleanup for long runs
     if (i + 1) % 50 == 0:
         torch.cuda.empty_cache()
         gc.collect()
 
     return i, final_path
 
 # ============================================================================
     log_path = output_root / "chunk_validation.log"
     total_audio_duration = 0.0
 
     # Batch processing
     print(f"πŸ“Š Processing {total_chunks} chunks in batches of {BATCH_SIZE}")
 
         print(f"❌ ASR model loading failed completely - disabling ASR for this batch")
         asr_enabled = False
 
+        futures = []
+        batch_results = []
+
         # Dynamic worker allocation
         optimal_workers = get_optimal_workers()
         print(f"πŸ”§ Using {optimal_workers} workers for batch {batch_start+1}-{batch_end}")
 
+        use_vader = tts_params.get('use_vader', True)
+
+        if not use_vader:
+            # --- BATCH MODE ---
+            print(f"πŸš€ VADER disabled. Running in high-performance batch mode.")
+            tts_batch_size = config_params.get('tts_batch_size', 16)
+            chunk_batches = [batch_chunks[i:i + tts_batch_size] for i in range(0, len(batch_chunks), tts_batch_size)]
 
+            print(f"πŸ“Š Processing {len(batch_chunks)} chunks in {len(chunk_batches)} batches of size {tts_batch_size}.")
+
+            with ThreadPoolExecutor(max_workers=optimal_workers) as executor:
+                for batch in chunk_batches:
+                    if shutdown_requested:
+                        break
+                    futures.append(executor.submit(
+                        process_batch,
+                        batch, text_chunks_dir, audio_chunks_dir,
+                        voice_path, tts_params, start_time, total_chunks,
+                        punc_norm, book_dir.name, log_run, log_path, device,
+                        model, asr_model, all_chunks, asr_enabled
+                    ))
+
+                # Wait for batches to complete
+                for fut in as_completed(futures):
+                    try:
+                        # process_batch returns a list of (idx, wav_path) tuples
+                        results_list = fut.result()
+                        for idx, wav_path in results_list:
+                            if wav_path and wav_path.exists():
+                                chunk_duration = get_chunk_audio_duration(wav_path)
+                                total_audio_duration += chunk_duration
+                                batch_results.append((idx, wav_path))
+                                log_chunk_progress(len(batch_results), total_chunks, start_time, total_audio_duration)
+                    except Exception as e:
+                        logging.error(f"Future failed in batch: {e}")
+        else:
+            # --- SINGLE/NUANCED MODE ---
+            print(f"🎨 VADER enabled. Running in nuanced, single-chunk mode.")
             with ThreadPoolExecutor(max_workers=optimal_workers) as executor:
                 for i, chunk_data in enumerate(batch_chunks):
                     global_chunk_index = batch_start + i
 
                     boundary_type = "paragraph_end" if is_old_para_end else "none"
                     chunk_tts_params = tts_params  # Fallback for old format
 
                     futures.append(executor.submit(
                         process_one_chunk,
                         global_chunk_index, chunk, text_chunks_dir, audio_chunks_dir,
                         voice_path, chunk_tts_params, start_time, total_chunks,
                         punc_norm, book_dir.name, log_run, log_path, device,
+                        model, asr_model, boundary_type,
                         asr_enabled
                     ))
 
                     total_audio_duration += chunk_duration
                     batch_results.append((idx, wav_path))
 
+                    # Update progress every 2 chunks within batch
                     completed_count += 1
                     if completed_count % 2 == 0:
                         log_chunk_progress(batch_start + completed_count - 1, total_chunks, start_time, total_audio_duration)
 
     log_run("\n".join(run_log_lines), output_root / "run.log")
     print(f"πŸ“ Run log written to: {output_root / 'run.log'}")
 
+    return final_m4b_path, combined_wav_path, run_log_lines
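The `model.generate_batch(texts, **tts_args)` call introduced above depends on the new batch method described in BATCH_IMPLEMENTATION_PLAN.md: tokenized texts must be padded to a common length before a single batch tensor can be fed to the model. A minimal pure-Python sketch of that padding step follows; the helper name `pad_token_batch` and the `pad_id` default are illustrative (not part of this repo), and the real implementation would use `torch.nn.utils.rnn.pad_sequence` on token tensors instead of lists:

```python
def pad_token_batch(token_lists, pad_id=0):
    """Pad variable-length token-id lists into a rectangular batch.

    Mirrors torch.nn.utils.rnn.pad_sequence(batch_first=True,
    padding_value=pad_id), and also returns the true sequence lengths,
    which the model needs to mask out padding during inference.
    """
    max_len = max(len(t) for t in token_lists)
    lengths = [len(t) for t in token_lists]
    # Right-pad every sequence with pad_id up to max_len
    batch = [t + [pad_id] * (max_len - len(t)) for t in token_lists]
    return batch, lengths

batch, lengths = pad_token_batch([[5, 7, 9], [3, 4], [8]])
# batch   -> [[5, 7, 9], [3, 4, 0], [8, 0, 0]]
# lengths -> [3, 2, 1]
```

Keeping the lengths alongside the padded tensor is what lets `t3.inference` / `s3gen.inference` ignore the pad positions rather than synthesizing audio for them.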
modules/tts_engine.py.20250811-120000.bak ADDED
@@ -0,0 +1,710 @@
1
+ """
2
+ TTS Engine Module
3
+ Handles ChatterboxTTS interface, model loading, and chunk processing coordination
4
+ """
5
+
6
+ import torch
7
+ import gc
8
+ import time
9
+ import logging
10
+ import shutil
11
+ import sys
12
+ from datetime import timedelta
13
+ from concurrent.futures import ThreadPoolExecutor, as_completed
14
+ from pathlib import Path
15
+ import torchaudio as ta
16
+
17
+ from config.config import *
18
+ from modules.text_processor import smart_punctuate, sentence_chunk_text, detect_content_boundaries
19
+
20
+ def find_chunks_json_file(book_name):
21
+ """Find the corresponding chunks JSON file for a book"""
22
+ from config.config import AUDIOBOOK_ROOT
23
+
24
+ # Look in the TTS processing directory
25
+ tts_chunks_dir = AUDIOBOOK_ROOT / book_name / "TTS" / "text_chunks"
26
+ json_path = tts_chunks_dir / "chunks_info.json"
27
+
28
+ if json_path.exists():
29
+ return json_path
30
+
31
+ # Also check old Text_Input location for backwards compatibility
32
+ text_input_dir = Path("Text_Input")
33
+ possible_names = [
34
+ f"{book_name}_chunks.json",
35
+ f"{book_name.lower()}_chunks.json",
36
+ f"{book_name.replace(' ', '_')}_chunks.json"
37
+ ]
38
+
39
+ for name in possible_names:
40
+ old_json_path = text_input_dir / name
41
+ if old_json_path.exists():
42
+ return old_json_path
43
+
44
+ return None
45
+ from modules.audio_processor import (
46
+ smart_audio_validation, apply_smart_fade, add_chunk_end_silence,
47
+ add_contextual_silence, pause_for_chunk_review, get_chunk_audio_duration,
48
+ has_mid_energy_drop, apply_smart_fade_memory, smart_audio_validation_memory
49
+ )
50
+ from modules.file_manager import (
51
+ setup_book_directories, find_book_files, ensure_voice_sample_compatibility,
52
+ combine_audio_chunks, get_audio_files_in_directory, convert_to_m4b, add_metadata_to_m4b
53
+ )
54
+ from modules.progress_tracker import setup_logging, log_chunk_progress, log_run
55
+
56
+ # ============================================================================
57
+ # MEMORY AND MODEL MANAGEMENT
58
+ # ============================================================================
59
+
60
+ def monitor_gpu_activity(operation_name):
61
+ """Lightweight GPU monitoring for high-speed processing"""
62
+ # Disabled expensive pynvml queries to free up GPU cycles
63
+ if torch.cuda.is_available():
64
+ allocated = torch.cuda.memory_allocated() / 1024**3
65
+ # Skip GPU utilization queries during production runs
66
+ return allocated, 0
67
+ return 0, 0
68
+
69
+ def optimize_memory_usage():
70
+ """Aggressive memory management for 8GB VRAM"""
71
+ torch.cuda.empty_cache()
72
+ gc.collect()
73
+ if torch.cuda.is_available():
74
+ torch.cuda.ipc_collect()
75
+
76
+ def monitor_vram_usage(operation_name=""):
77
+ """Real-time VRAM monitoring"""
78
+ if torch.cuda.is_available():
79
+ allocated = torch.cuda.memory_allocated() / 1024**3
80
+ reserved = torch.cuda.memory_reserved() / 1024**3
81
+
82
+ if allocated > VRAM_SAFETY_THRESHOLD:
83
+ logging.warning(f"⚠️ High VRAM usage during {operation_name}: {allocated:.1f}GB allocated, {reserved:.1f}GB reserved")
84
+ optimize_memory_usage()
85
+
86
+ return allocated, reserved
87
+ return 0, 0
88
+
89
+ def get_optimal_workers(user_max_workers=None):
90
+ """Dynamic worker allocation based on device type and resources"""
91
+ # Check for user override first
92
+ if user_max_workers is not None:
93
+ print(f"πŸ‘€ Using user-defined workers: {user_max_workers}")
94
+ return int(user_max_workers)
95
+
96
+ if not USE_DYNAMIC_WORKERS:
97
+ return MAX_WORKERS
98
+
99
+ # CPU-based worker calculation
100
+ if not torch.cuda.is_available():
101
+ import psutil
102
+ cpu_cores = psutil.cpu_count(logical=False) # Physical cores
103
+ available_memory = psutil.virtual_memory().available / 1024**3 # GB
104
+
105
+ # Each TTS model instance needs ~2-3GB RAM
106
+ # Conservative estimation: allow 1 worker per 4GB available RAM
107
+ memory_limited_workers = max(1, int(available_memory / 4))
108
+
109
+ # CPU-based calculation: use 50% of physical cores for intensive TTS work
110
+ cpu_limited_workers = max(1, int(cpu_cores * 0.5))
111
+
112
+ optimal_workers = min(memory_limited_workers, cpu_limited_workers, MAX_WORKERS)
113
+ print(f"πŸ’» CPU mode: {cpu_cores} cores, {available_memory:.1f}GB RAM β†’ {optimal_workers} workers")
114
+ return optimal_workers
115
+
116
+ # GPU-based worker calculation (existing logic)
117
+ allocated_vram = torch.cuda.memory_allocated() / 1024**3
118
+
119
+ if allocated_vram < 5.0:
120
+ return min(TEST_MAX_WORKERS, MAX_WORKERS)
121
+ elif allocated_vram < VRAM_SAFETY_THRESHOLD:
122
+ return min(2, MAX_WORKERS)
123
+ else:
124
+ return 1
125
+
126
+ def load_optimized_model(device):
127
+ """Load TTS model with memory optimizations and device detection"""
128
+ from chatterbox.tts import ChatterboxTTS
129
+
130
+ # Detect available device if not specified or if CUDA not available
131
+ if device == "cuda" and not torch.cuda.is_available():
132
+ print("⚠️ CUDA not available, falling back to CPU")
133
+ device = "cpu"
134
+ elif device == "auto":
135
+ if torch.cuda.is_available():
136
+ device = "cuda"
137
+ print("βœ… CUDA detected, using GPU")
138
+ else:
139
+ device = "cpu"
140
+ print("πŸ’» No GPU detected, using CPU")
141
+
142
+ print(f"πŸ”§ Loading ChatterboxTTS model on device: {device}")
143
+
144
+ try:
145
+ # Load model (ChatterboxTTS.from_pretrained doesn't support torch_dtype parameter)
146
+ model = ChatterboxTTS.from_pretrained(device=device)
147
+ logging.info(f"βœ… Loaded ChatterboxTTS model on {device}")
148
+ except Exception as e:
149
+ print(f"❌ Failed to load model on {device}: {e}")
150
+ if device == "cuda":
151
+ print("πŸ”„ Retrying with CPU...")
152
+ try:
153
+ model = ChatterboxTTS.from_pretrained(device="cpu")
154
+ logging.info("βœ… Loaded model on CPU (GPU failed)")
155
+ device = "cpu"
156
+ except Exception as e2:
157
+ print(f"❌ Failed to load model on CPU: {e2}")
158
+ raise e2
159
+ else:
160
+ raise e
161
+
162
+ # Only apply eval() and benchmark if the model has these attributes
163
+ if hasattr(model, 'eval'):
164
+ model.eval()
165
+
166
+ # Set CUDNN benchmark for performance (if available)
167
+ if torch.backends.cudnn.is_available():
168
+ torch.backends.cudnn.benchmark = True
169
+
170
+ return model
171
+
172
+ # ============================================================================
173
+ # CHUNK PROCESSING
174
+ # ============================================================================
175
+
176
+ def patch_alignment_layer(tfmr, alignment_layer_idx=12):
177
+ """Patch alignment layer to avoid recursion"""
178
+ from types import MethodType
179
+ target_layer = tfmr.layers[alignment_layer_idx].self_attn
180
+ original_forward = target_layer.forward
181
+
182
+ def patched_forward(self, *args, **kwargs):
183
+ kwargs['output_attentions'] = True
184
+ return original_forward(*args, **kwargs)
185
+
186
+ target_layer.forward = MethodType(patched_forward, target_layer)
187
+
188
+ def process_one_chunk(
189
+ i, chunk, text_chunks_dir, audio_chunks_dir,
190
+ voice_path, tts_params, start_time, total_chunks,
191
+ punc_norm, basename, log_run_func, log_path, device,
192
+ model, asr_model, all_chunks, boundary_type="none"
193
+ ):
194
+ """Enhanced chunk processing with quality control, contextual silence, and deep cleanup"""
195
+ import difflib
196
+ from pydub import AudioSegment
197
+
198
+ chunk_id_str = f"{i+1:05}"
199
+ chunk_path = text_chunks_dir / f"chunk_{chunk_id_str}.txt"
200
+ with open(chunk_path, 'w', encoding='utf-8') as cf:
201
+ cf.write(chunk)
202
+
203
+ chunk_audio_path = audio_chunks_dir / f"chunk_{chunk_id_str}.wav"
204
+
205
+ # ============================================================================
206
+ # ENHANCED PERIODIC DEEP CLEANUP
207
+ # ============================================================================
208
+ cleanup_interval = CLEANUP_INTERVAL
209
+
210
+ # Skip cleanup on model reinitialization chunks to avoid conflicts
211
+ if (i + 1) % cleanup_interval == 0 and (i + 1) % BATCH_SIZE != 0:
212
+ print(f"\n🧹 {YELLOW}DEEP CLEANUP at chunk {i+1}/{total_chunks}...{RESET}")
213
+
214
+ # Enhanced VRAM monitoring before cleanup
215
+ allocated_before = torch.cuda.memory_allocated() / 1024**3 if torch.cuda.is_available() else 0
216
+ reserved_before = torch.cuda.memory_reserved() / 1024**3 if torch.cuda.is_available() else 0
217
+
218
+ print(f" Before: VRAM Allocated: {allocated_before:.1f}GB | Reserved: {reserved_before:.1f}GB")
219
+
220
+ # Bulk temp file cleanup
221
+ print(" πŸ—‘οΈ Cleaning bulk temporary files...")
222
+ temp_patterns = ["*_try*.wav", "*_pre.wav", "*_fade*.wav", "*_debug*.wav", "*_temp*.wav", "*_backup*.wav"]
223
+ total_temp_files = 0
224
+ for pattern in temp_patterns:
225
+ temp_files = list(audio_chunks_dir.glob(pattern))
226
+ for temp_file in temp_files:
227
+ temp_file.unlink(missing_ok=True)
228
+ total_temp_files += len(temp_files)
229
+
230
+ if total_temp_files > 0:
231
+ print(f" πŸ—‘οΈ Removed {total_temp_files} temporary audio files")
232
+
233
+ # Aggressive CUDA context reset
234
+ print(" πŸ”„ Performing aggressive CUDA context reset...")
235
+ if torch.cuda.is_available():
236
+     torch.cuda.synchronize()
+     torch.cuda.empty_cache()
237
+     torch.cuda.ipc_collect()
238
+
239
+ # Force CUDA context reset
240
+ if hasattr(torch.cuda, 'reset_peak_memory_stats'):
241
+ torch.cuda.reset_peak_memory_stats()
242
+ if hasattr(torch._C, '_cuda_clearCublasWorkspaces'):
243
+ torch._C._cuda_clearCublasWorkspaces()
244
+
245
+ # Force garbage collection multiple times
246
+ for _ in range(3):
247
+ gc.collect()
248
+
249
+ # Clear model cache if it has one
250
+ if hasattr(model, 'clear_cache'):
251
+ model.clear_cache()
252
+ elif hasattr(model, 'reset_states'):
253
+ model.reset_states()
254
+
255
+ # Brief pause to let GPU settle
256
+ time.sleep(1.0)
257
+
258
+ # Monitor after cleanup
259
+ allocated_after = torch.cuda.memory_allocated() / 1024**3 if torch.cuda.is_available() else 0
260
+ reserved_after = torch.cuda.memory_reserved() / 1024**3 if torch.cuda.is_available() else 0
261
+
262
+ print(f" After: VRAM Allocated: {allocated_after:.1f}GB | Reserved: {reserved_after:.1f}GB")
263
+ print(f" Freed: {allocated_before - allocated_after:.1f}GB allocated, {reserved_before - reserved_after:.1f}GB reserved")
264
+ print(f"🧹 {GREEN}Deep cleanup complete!{RESET}\n")
265
+
266
+ best_sim, best_asr_text = -1, ""
267
+ wav_path_active = None
268
+ attempt_paths = []
269
+ mid_drop_retries = 0
270
+ max_mid_drop_retries = 2
271
+
272
+ for attempt_num in range(1, 3):
273
+ logging.info(f"πŸ” Starting TTS for chunk {chunk_id_str}, attempt {attempt_num}")
274
+ try:
275
+ tts_args = {k: v for k, v in tts_params.items() if k not in ["max_workers", "enable_asr"]}
276
+
277
+ # monitor_gpu_activity(f"Before TTS chunk_{chunk_id_str}") # Disabled for speed
278
+ with torch.no_grad():
279
+ wav = model.generate(chunk, **tts_args).detach().cpu()
280
+ # monitor_gpu_activity(f"After TTS chunk_{chunk_id_str}") # Disabled for speed
281
+
282
+ if wav.dim() == 1:
283
+ wav = wav.unsqueeze(0)
284
+
285
+ # Retry if mid-energy drop is enabled and detected (check in memory)
286
+ if ENABLE_MID_DROP_CHECK and has_mid_energy_drop(wav, model.sr):
287
+ mid_drop_retries += 1
288
+ if mid_drop_retries >= max_mid_drop_retries:
289
+ logging.info(f"⚠️ Mid-drop retry limit reached for {chunk_id_str}. Accepting audio.")
290
+ else:
291
+ logging.info(f"⚠️ Mid-chunk noise detected in {chunk_id_str}. Retrying...")
292
+ continue
293
+
294
+ # Convert tensor to AudioSegment for in-memory processing
295
+ import io
296
+ import soundfile as sf
297
+ from pydub import AudioSegment
298
+
299
+ # Convert wav tensor to AudioSegment (in memory)
300
+ wav_np = wav.squeeze().numpy()
301
+ with io.BytesIO() as wav_buffer:
302
+ sf.write(wav_buffer, wav_np, model.sr, format='wav')
303
+ wav_buffer.seek(0)
304
+ audio_segment = AudioSegment.from_wav(wav_buffer)
305
+
306
+ # Smart fade removed - replaced by precise audio trimming
307
+ # Audio health validation disabled for speed
308
+
309
+ # Note: Audio trimming will handle end-of-speech cleanup more precisely
310
+
311
+ # ASR validation (memory-based processing) - check user setting first
312
+ enable_asr_user = tts_params.get('enable_asr', False)
313
+ if (enable_asr_user or ENABLE_ASR) and asr_model is not None:
314
+ from modules.audio_processor import asr_f1_score
315
+ import io
316
+ import soundfile as sf
317
+ # monitor_gpu_activity(f"Before ASR chunk_{chunk_id_str}") # Disabled for speed
318
+ try:
319
+ # Process ASR completely in memory - no disk writes
320
+ # Convert AudioSegment to numpy array for ASR
321
+ samples = np.array(audio_segment.get_array_of_samples())
322
+ if audio_segment.channels == 2:
323
+ samples = samples.reshape((-1, 2)).mean(axis=1)
324
+
325
+ # Normalize to float32 for ASR model
326
+ audio_np = samples.astype(np.float32) / audio_segment.max_possible_amplitude
327
+
328
+ # Use ASR model directly on numpy array (if supported)
329
+ # Note: This depends on the ASR model's input capabilities
330
+ result = asr_model.transcribe(audio_np)
331
+
332
+ if not isinstance(result, dict) or "text" not in result:
333
+ raise ValueError(f"Invalid ASR result type: {type(result)}")
334
+
335
+ asr_text = result.get("text", "").strip()
336
+ sim_ratio = asr_f1_score(punc_norm(chunk), asr_text)
337
+
338
+ except Exception as e:
339
+ print(f"❌ ASR failed for {chunk_id_str}: {e}")
340
+ log_run_func(f"ASR VALIDATION FAILED - Chunk {chunk_id_str}:\nExpected:\n{chunk}\nActual:\n<ASR Failure: {e}>\nSimilarity: -1.000\n" + "="*50, log_path)
341
+ sim_ratio = -1.0
342
+ continue
343
+
344
+ logging.info(f"ASR similarity for chunk {chunk_id_str}: {sim_ratio:.3f}")
345
+ if sim_ratio < 0.7:
346
+ continue
347
+
348
+ # Track best valid match
349
+ best_sim = sim_ratio
350
+ best_asr_text = asr_text
351
+ # monitor_gpu_activity(f"After ASR chunk_{chunk_id_str}") # Disabled for speed
352
+
353
+ # Success - we have processed audio in memory
354
+ final_audio = audio_segment
355
+ break
356
+
357
+ except Exception as e:
358
+ import traceback
359
+ logging.error(f"Exception during TTS attempt {attempt_num} for chunk {chunk_id_str}: {e}")
360
+ traceback.print_exc()
361
+ continue
362
+
363
+ if 'final_audio' not in locals():
364
+ logging.info(f"❌ Chunk {chunk_id_str} failed all attempts.")
365
+ return None, None
366
+
367
+ # Apply trimming and contextual silence in memory before final save
368
+ from modules.audio_processor import process_audio_with_trimming_and_silence
369
+
370
+ if boundary_type and boundary_type != "none":
371
+ final_audio = process_audio_with_trimming_and_silence(final_audio, boundary_type)
372
+ print(f"πŸ”‡ Added {boundary_type} silence to chunk {i+1:05}")
373
+ else:
374
+ # Apply trimming even without boundary type if enabled
375
+ if ENABLE_AUDIO_TRIMMING:
376
+ from modules.audio_processor import trim_audio_endpoint
377
+ final_audio = trim_audio_endpoint(final_audio)
378
+
379
+ # Note: ENABLE_CHUNK_END_SILENCE is now handled by punctuation-specific silence
380
+ # The new system provides more precise silence based on actual punctuation
381
+
382
+ # Final save - only disk write in entire process
383
+ final_path = audio_chunks_dir / f"chunk_{chunk_id_str}.wav"
384
+ final_audio.export(final_path, format="wav")
385
+ logging.info(f"βœ… Saved final chunk: {final_path.name}")
386
+
387
+ # No intermediate file cleanup needed - all processing done in memory
388
+
389
+ # Log details - only log ASR failures
390
+ asr_active = enable_asr_user or ENABLE_ASR
391
+ if asr_active and best_sim < 0.8:
392
+ log_run_func(f"ASR VALIDATION FAILED - Chunk {chunk_id_str}:\nExpected:\n{chunk}\nActual:\n{best_asr_text}\nSimilarity: {best_sim:.3f}\n" + "="*50, log_path)
393
+ elif not asr_active:
394
+ log_run_func(f"Chunk {chunk_id_str}: Original text: {chunk}", log_path)
395
+
396
+ # Silence already added in memory above - no disk processing needed
397
+
398
+ # Enhanced regular cleanup (every chunk)
399
+ del wav
400
+ optimize_memory_usage()
401
+
402
+ # Additional per-chunk cleanup for long runs
403
+ if (i + 1) % 50 == 0:
404
+ torch.cuda.empty_cache()
405
+ gc.collect()
406
+
407
+ return i, final_path
408
+
409
+ # ============================================================================
410
+ # MAIN BOOK PROCESSING FUNCTION
411
+ # ============================================================================
412
+
413
+ from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
414
+ from wrapper.chunk_loader import save_chunks
415
+
416
+ def generate_enriched_chunks(text_file, output_dir, user_tts_params=None):
417
+ """Reads a text file, performs VADER sentiment analysis, and returns enriched chunks."""
418
+ analyzer = SentimentIntensityAnalyzer()
419
+
420
+ raw_text = text_file.read_text(encoding='utf-8')
421
+ cleaned = smart_punctuate(raw_text)
422
+ chunks = sentence_chunk_text(cleaned)
423
+
424
+ # Use user-provided parameters as base, or fall back to config defaults
425
+ if user_tts_params:
426
+ base_exaggeration = user_tts_params.get('exaggeration', BASE_EXAGGERATION)
427
+ base_cfg_weight = user_tts_params.get('cfg_weight', BASE_CFG_WEIGHT)
428
+ base_temperature = user_tts_params.get('temperature', BASE_TEMPERATURE)
429
+ else:
430
+ base_exaggeration = BASE_EXAGGERATION
431
+ base_cfg_weight = BASE_CFG_WEIGHT
432
+ base_temperature = BASE_TEMPERATURE
433
+
434
+ enriched = []
435
+ chunk_texts = [chunk_text for chunk_text, _ in chunks]
436
+
437
+ for i, (chunk_text, is_para_end) in enumerate(chunks):
438
+ sentiment_scores = analyzer.polarity_scores(chunk_text)
439
+ compound_score = sentiment_scores['compound']
440
+
441
+ exaggeration = base_exaggeration + (compound_score * VADER_EXAGGERATION_SENSITIVITY)
442
+ cfg_weight = base_cfg_weight + (compound_score * VADER_CFG_WEIGHT_SENSITIVITY)
443
+ temperature = base_temperature + (compound_score * VADER_TEMPERATURE_SENSITIVITY)
444
+
445
+ # Clamp values to defined min/max
446
+ exaggeration = round(max(TTS_PARAM_MIN_EXAGGERATION, min(exaggeration, TTS_PARAM_MAX_EXAGGERATION)), 2)
447
+ cfg_weight = round(max(TTS_PARAM_MIN_CFG_WEIGHT, min(cfg_weight, TTS_PARAM_MAX_CFG_WEIGHT)), 2)
448
+ temperature = round(max(TTS_PARAM_MIN_TEMPERATURE, min(temperature, TTS_PARAM_MAX_TEMPERATURE)), 2)
449
+
450
+ boundary_type = detect_content_boundaries(chunk_text, i, chunk_texts, is_para_end)
451
+
452
+ enriched.append({
453
+ "index": i,
454
+ "text": chunk_text,
455
+ "word_count": len(chunk_text.split()),
456
+ "boundary_type": boundary_type if boundary_type else "none",
457
+ "sentiment_compound": compound_score,
458
+ "tts_params": {
459
+ "exaggeration": exaggeration,
460
+ "cfg_weight": cfg_weight,
461
+ "temperature": temperature
462
+ }
463
+ })
464
+
465
+ output_json_path = output_dir / "chunks_info.json"
466
+ save_chunks(output_json_path, enriched)
467
+ return enriched
468
+
469
+ def process_book_folder(book_dir, voice_path, tts_params, device, skip_cleanup=False):
470
+ """Enhanced book processing with batch processing to prevent hangs"""
471
+ print(f"πŸ” DEBUG: Entering process_book_folder with book_dir='{book_dir}', voice_path='{voice_path}'")
472
+
473
+ from chatterbox.tts import punc_norm
474
+ print(f"πŸ” DEBUG: Successfully imported punc_norm")
475
+
476
+ # Setup directories
477
+ print(f"πŸ” DEBUG: Calling setup_book_directories...")
478
+ output_root, tts_dir, text_chunks_dir, audio_chunks_dir = setup_book_directories(book_dir)
479
+ print(f"πŸ” DEBUG: Directory setup complete")
480
+
481
+ # Clean previous processing files (but skip for resume operations)
482
+ if skip_cleanup:
483
+ print(f"πŸ”„ RESUME MODE: Skipping cleanup to preserve existing chunks")
484
+ print(f"πŸ“ Preserving: {text_chunks_dir}, {audio_chunks_dir}")
485
+ else:
486
+ print(f"🧹 FRESH PROCESSING: Cleaning previous processing files...")
487
+ import glob
488
+
489
+ # Clear text chunks
490
+ for txt_file in text_chunks_dir.glob("*.txt"):
491
+ txt_file.unlink(missing_ok=True)
492
+ for json_file in text_chunks_dir.glob("*.json"):
493
+ json_file.unlink(missing_ok=True)
494
+
495
+ # Clear audio chunks
496
+ for wav_file in audio_chunks_dir.glob("*.wav"):
497
+ wav_file.unlink(missing_ok=True)
498
+
499
+ # Clear logs
500
+ for log_file in output_root.glob("*.log"):
501
+ log_file.unlink(missing_ok=True)
502
+
503
+ print(f"βœ… Cleanup complete")
504
+
505
+ # Find book files
506
+ print(f"πŸ” DEBUG: Calling find_book_files...")
507
+ book_files = find_book_files(book_dir)
508
+ text_files = [book_files['text']] if book_files['text'] else []
509
+ cover_file = book_files['cover']
510
+ nfo_file = book_files['nfo']
511
+ print(f"πŸ” DEBUG: Found text files: {text_files}")
512
+
513
+ if not text_files:
514
+ logging.info(f"[{book_dir.name}] ERROR: No .txt files found in the book folder.")
515
+ return None, None, []
516
+
517
+ setup_logging(output_root)
518
+
519
+ # Generate enriched chunks with VADER analysis using user parameters
520
+ all_chunks = generate_enriched_chunks(text_files[0], text_chunks_dir, tts_params)
521
+
522
+ # Create run_log_lines
523
+ print(f"πŸ” DEBUG: Creating run_log_lines...")
524
+ print(f"πŸ” DEBUG: voice_path type: {type(voice_path)}, value: {voice_path}")
525
+
526
+ # Extract voice name for logging
527
+ voice_name_for_log = voice_path.stem if hasattr(voice_path, 'stem') else Path(voice_path).stem
528
+
529
+ run_log_lines = [
530
+ f"\n===== Processing: {book_dir.name} =====",
531
+ f"Voice: {voice_name_for_log}",
532
+ f"Started: {time.strftime('%Y-%m-%d %H:%M:%S')}",
533
+ f"Text files processed: {len(text_files)}",
534
+ f"Total chunks generated: {len(all_chunks)}"
535
+ ]
536
+
537
+ start_time = time.time()
538
+ total_chunks = len(all_chunks)
539
+ log_path = output_root / "chunk_validation.log"
540
+ total_audio_duration = 0.0
541
+
542
+ # Batch processing
543
+ print(f"πŸ“Š Processing {total_chunks} chunks in batches of {BATCH_SIZE}")
544
+
545
+ all_results = []
546
+
547
+ for batch_start in range(0, total_chunks, BATCH_SIZE):
548
+ batch_end = min(batch_start + BATCH_SIZE, total_chunks)
549
+ batch_chunks = all_chunks[batch_start:batch_end]
550
+
551
+ print(f"\nπŸ”„ Processing batch: chunks {batch_start+1}-{batch_end}")
552
+
553
+ # Fresh model for each batch
554
+ model = load_optimized_model(device)
555
+ compatible_voice = ensure_voice_sample_compatibility(voice_path, output_dir=tts_dir)
556
+ model.prepare_conditionals(compatible_voice)
557
+
558
+ # Load ASR model once per batch if needed (check user settings first, then global config)
559
+ asr_model = None
560
+ enable_asr_user = tts_params.get('enable_asr', False)
561
+ if enable_asr_user or ENABLE_ASR:
562
+ import whisper
563
+ print(f"🎀 Loading Whisper ASR model for batch... (user setting: {enable_asr_user})")
564
+ # Use same device as TTS model, with fallback to CPU
565
+ asr_device = device if torch.cuda.is_available() and device == "cuda" else "cpu"
566
+ print(f"🎀 Loading ASR model on device: {asr_device}")
567
+ asr_model = whisper.load_model("base", device=asr_device)
568
+
569
+ futures = []
570
+ batch_results = []
571
+
572
+ # Dynamic worker allocation
573
+ user_max_workers = tts_params.get('max_workers', None)
574
+ optimal_workers = get_optimal_workers(user_max_workers)
575
+ print(f"πŸ”§ Using {optimal_workers} workers for batch {batch_start+1}-{batch_end}")
576
+
577
+ with ThreadPoolExecutor(max_workers=optimal_workers) as executor:
578
+ for i, chunk_data in enumerate(batch_chunks):
579
+ global_chunk_index = batch_start + i
580
+
581
+ # Check for shutdown request
582
+ if shutdown_requested:
583
+ print(f"\n⏹️ {YELLOW}Stopping submission of new chunks...{RESET}")
584
+ break
585
+
586
+ # Handle both dictionary and tuple formats for chunk data
587
+ if isinstance(chunk_data, dict):
588
+ chunk = chunk_data["text"]
589
+ boundary_type = chunk_data.get("boundary_type", "none")
590
+ # Use chunk-specific TTS params if available, otherwise fall back to global
591
+ chunk_tts_params = chunk_data.get("tts_params", tts_params)
592
+ else:
593
+ # Handle old tuple format (text, is_para_end) - convert to boundary_type
594
+ chunk = chunk_data[0] if len(chunk_data) > 0 else str(chunk_data)
595
+ # Convert old is_paragraph_end to boundary_type
596
+ is_old_para_end = chunk_data[1] if len(chunk_data) > 1 else False
597
+ boundary_type = "paragraph_end" if is_old_para_end else "none"
598
+ chunk_tts_params = tts_params # Fallback for old format
599
+
600
+ # Handle both dictionary and tuple formats for backward compatibility
601
+ all_chunk_texts = []
602
+ for cd in all_chunks:
603
+ if isinstance(cd, dict):
604
+ all_chunk_texts.append(cd["text"])
605
+ else:
606
+ # Handle old tuple format (text, is_para_end)
607
+ all_chunk_texts.append(cd[0] if len(cd) > 0 else str(cd))
608
+
609
+ futures.append(executor.submit(
610
+ process_one_chunk,
611
+ global_chunk_index, chunk, text_chunks_dir, audio_chunks_dir,
612
+ voice_path, chunk_tts_params, start_time, total_chunks,
613
+ punc_norm, book_dir.name, log_run, log_path, device,
614
+ model, asr_model, all_chunk_texts, boundary_type
615
+ ))
616
+
617
+ # Wait for batch to complete
618
+ print(f"πŸ”„ {CYAN}Waiting for batch {batch_start+1}-{batch_end} to complete...{RESET}")
619
+ completed_count = 0
620
+
621
+ for fut in as_completed(futures):
622
+ try:
623
+ idx, wav_path = fut.result()
624
+ if wav_path and wav_path.exists():
625
+ # Measure actual audio duration for this chunk
626
+ chunk_duration = get_chunk_audio_duration(wav_path)
627
+ total_audio_duration += chunk_duration
628
+ batch_results.append((idx, wav_path))
629
+
630
+ # Update progress every 10 chunks within batch
631
+ completed_count += 1
632
+ if completed_count % 10 == 0:
633
+ log_chunk_progress(batch_start + completed_count - 1, total_chunks, start_time, total_audio_duration)
634
+
635
+ except Exception as e:
636
+ logging.error(f"Future failed in batch: {e}")
637
+
638
+ # Clean up model after batch
639
+ print(f"🧹 Cleaning up after batch {batch_start+1}-{batch_end}")
640
+ del model
641
+ if asr_model:
642
+ del asr_model
643
+ torch.cuda.empty_cache()
644
+ gc.collect()
645
+ time.sleep(2)
646
+
647
+ all_results.extend(batch_results)
648
+ print(f"βœ… Batch {batch_start+1}-{batch_end} completed ({len(batch_results)} chunks)")
649
+
650
+ # Final processing
651
+ quarantine_dir = audio_chunks_dir / "quarantine"
652
+ pause_for_chunk_review(quarantine_dir)
653
+
654
+ # Collect final chunk paths
655
+ chunk_paths = get_audio_files_in_directory(audio_chunks_dir)
656
+
657
+ if not chunk_paths:
658
+ logging.info(f"{RED}❌ No valid audio chunks found. Skipping concatenation and conversion.{RESET}")
659
+ return None, None, []
660
+
661
+ # Calculate timing
662
+ elapsed_total = time.time() - start_time
663
+ elapsed_td = timedelta(seconds=int(elapsed_total))
664
+
665
+ total_audio_duration_final = sum(get_chunk_audio_duration(chunk_path) for chunk_path in chunk_paths)
666
+ audio_duration_td = timedelta(seconds=int(total_audio_duration_final))
667
+ realtime_factor = total_audio_duration_final / elapsed_total if elapsed_total > 0 else 0.0
668
+
669
+ print(f"\n⏱️ TTS Processing Complete:")
670
+ print(f" Elapsed Time: {CYAN}{str(elapsed_td)}{RESET}")
671
+ print(f" Audio Duration: {GREEN}{str(audio_duration_td)}{RESET}")
672
+ print(f" Realtime Factor: {YELLOW}{realtime_factor:.2f}x{RESET}")
673
+
674
+ # Combine audio
675
+ voice_name = voice_path.stem if hasattr(voice_path, 'stem') else Path(voice_path).stem
676
+ combined_wav_path = output_root / f"{book_dir.name} [{voice_name}].wav"
677
+ print("\nπŸ’Ύ Saving WAV file...")
678
+ combine_audio_chunks(chunk_paths, combined_wav_path)
679
+
680
+ # M4B conversion with normalization
681
+ temp_m4b_path = output_root / "output.m4b"
682
+ final_m4b_path = output_root / f"{book_dir.name} [{voice_name}].m4b"
683
+ convert_to_m4b(combined_wav_path, temp_m4b_path)
684
+ add_metadata_to_m4b(temp_m4b_path, final_m4b_path, cover_file, nfo_file)
685
+
686
+ logging.info(f"Audiobook created: {final_m4b_path}")
687
+
688
+ # Add final info to run log
689
+ run_log_lines.extend([
690
+ f"Combined WAV: {combined_wav_path}",
691
+ "--- Generation Settings ---",
692
+ f"Batch Processing: Enabled ({BATCH_SIZE} chunks per batch)",
693
+ f"ASR Enabled: {enable_asr_user or ENABLE_ASR} (user: {enable_asr_user}, global: {ENABLE_ASR})",
694
+ f"Hum Detection: {ENABLE_HUM_DETECTION}",
695
+ f"Dynamic Workers: {USE_DYNAMIC_WORKERS}",
696
+ f"Voice used: {voice_name}",
697
+ f"Exaggeration: {tts_params['exaggeration']}",
698
+ f"CFG weight: {tts_params['cfg_weight']}",
699
+ f"Temperature: {tts_params['temperature']}",
700
+ f"Processing Time: {str(elapsed_td)}",
701
+ f"Audio Duration: {str(audio_duration_td)}",
702
+ f"Realtime Factor: {realtime_factor:.2f}x",
703
+ f"Total Chunks: {len(chunk_paths)}"
704
+ ])
705
+
706
+ # Write the run log
707
+ log_run("\n".join(run_log_lines), output_root / "run.log")
708
+ print(f"πŸ“ Run log written to: {output_root / 'run.log'}")
709
+
710
+ return final_m4b_path, combined_wav_path, run_log_lines
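The batch loop in `process_book_folder` above (fixed-size batches, a fresh model per batch, cleanup between batches) can be sketched in isolation. This is a minimal stand-alone sketch, not the repo's code: `run_in_batches` and its `process` callback are hypothetical stand-ins for `load_optimized_model` / `process_one_chunk`.

```python
import gc

BATCH_SIZE = 16  # assumption: mirrors BATCH_SIZE in the repo's config

def run_in_batches(chunks, batch_size=BATCH_SIZE,
                   process=lambda i, c: (i, f"chunk_{i+1:05}.wav")):
    """Process chunks in fixed-size batches, simulating a fresh model per batch."""
    results = []
    for start in range(0, len(chunks), batch_size):
        batch = chunks[start:start + batch_size]
        model = object()  # stand-in for load_optimized_model(device)
        for i, chunk in enumerate(batch, start=start):
            results.append(process(i, chunk))
        del model     # release the model after each batch...
        gc.collect()  # ...then collect (plus torch.cuda.empty_cache() on GPU)
    return results
```

The point of the structure is that a hung or leaking model only affects one batch, since the next batch starts from a freshly loaded model.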
src/chatterbox/models/t3/t3.py CHANGED
@@ -224,6 +224,7 @@ class T3(nn.Module):
224
  do_sample=True,
225
  temperature=0.8,
226
  top_p=0.8,
 
227
  length_penalty=1.0,
228
  repetition_penalty=2.0,
229
  cfg_weight=0,
 
224
  do_sample=True,
225
  temperature=0.8,
226
  top_p=0.8,
227
+ min_p=0.05,
228
  length_penalty=1.0,
229
  repetition_penalty=2.0,
230
  cfg_weight=0,
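The new `min_p=0.05` default applies min-p sampling: tokens whose probability falls below 5% of the most likely token's probability are excluded before sampling. A minimal sketch of the filter over a plain probability list (independent of the model code; `min_p_filter` is an illustrative name, not part of the repo):

```python
def min_p_filter(probs, min_p=0.05):
    """Zero out tokens with probability < min_p * max(probs), then renormalize."""
    cutoff = min_p * max(probs)  # threshold scales with the top token's probability
    kept = [p if p >= cutoff else 0.0 for p in probs]
    total = sum(kept)
    return [p / total for p in kept]
```

Unlike a fixed top-p cutoff, the threshold adapts to the model's confidence: a peaked distribution prunes aggressively, a flat one keeps more candidates.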
src/chatterbox/tts.py CHANGED
@@ -1,5 +1,7 @@
1
  from dataclasses import dataclass
2
  from pathlib import Path
 
 
3
 
4
  import librosa
5
  import torch
@@ -189,6 +191,50 @@ class ChatterboxTTS:
189
  return cls.from_local(Path(local_path).parent, device)
190
 
191
  def prepare_conditionals(self, wav_fpath, exaggeration=0.5):
192
  ## Load reference wav
193
  s3gen_ref_wav, _sr = librosa.load(wav_fpath, sr=S3GEN_SR)
194
 
@@ -214,6 +260,30 @@ class ChatterboxTTS:
214
  ).to(device=self.device)
215
  self.conds = Conditionals(t3_cond, s3gen_ref_dict)
216
217
  def generate(
218
  self,
219
  text,
@@ -278,4 +348,68 @@ class ChatterboxTTS:
278
  )
279
  wav = wav.squeeze(0).detach().cpu().numpy()
280
  watermarked_wav = self.watermarker.apply_watermark(wav, sample_rate=self.sr)
281
- return torch.from_numpy(watermarked_wav).unsqueeze(0)
1
  from dataclasses import dataclass
2
  from pathlib import Path
3
+ import os
4
+ import logging
5
 
6
  import librosa
7
  import torch
 
191
  return cls.from_local(Path(local_path).parent, device)
192
 
193
  def prepare_conditionals(self, wav_fpath, exaggeration=0.5):
194
+ """Prepare voice conditionals with optional caching for performance optimization"""
195
+
196
+ # Try to import voice caching functions (with fallback for compatibility)
197
+ try:
198
+ from modules.tts_engine import (
199
+ get_voice_cache_key,
200
+ _voice_embedding_cache,
201
+ _cache_memory_usage,
202
+ estimate_cache_memory_mb,
203
+ get_available_memory,
204
+ clear_voice_embedding_cache
205
+ )
206
+ from config.config import (
207
+ ENABLE_VOICE_EMBEDDING_CACHE,
208
+ VOICE_CACHE_MEMORY_LIMIT_MB,
209
+ ENABLE_ADAPTIVE_VOICE_CACHE
210
+ )
211
+ caching_available = True
212
+ except ImportError:
213
+ caching_available = False
214
+ logging.warning("Voice embedding caching not available - using standard processing")
215
+
216
+ # Check cache if caching is enabled and available
217
+ if caching_available and ENABLE_VOICE_EMBEDDING_CACHE:
218
+ cache_key = get_voice_cache_key(wav_fpath, exaggeration)
219
+
220
+ # Check if we have cached embeddings
221
+ if cache_key in _voice_embedding_cache:
222
+ try:
223
+ self.conds = _voice_embedding_cache[cache_key]
224
+ logging.info("πŸš€ Using cached voice embeddings - significant speedup!")
225
+ return
226
+ except Exception as e:
227
+ logging.warning(f"⚠️ Cache retrieval failed: {e}, computing fresh embeddings")
228
+
229
+ # Check memory constraints before caching
230
+ available_memory = get_available_memory()
231
+ if ENABLE_ADAPTIVE_VOICE_CACHE and available_memory < 2048: # Less than 2GB available
232
+ logging.warning("🧠 Low memory detected - disabling voice embedding cache")
233
+ caching_available = False
234
+
235
+ # Original embedding computation (always runs for new voices or cache misses)
236
+ logging.info("🎀 Computing voice embeddings (this may take a moment)")
237
+
238
  ## Load reference wav
239
  s3gen_ref_wav, _sr = librosa.load(wav_fpath, sr=S3GEN_SR)
240
 
 
260
  ).to(device=self.device)
261
  self.conds = Conditionals(t3_cond, s3gen_ref_dict)
262
 
263
+ # Cache the computed embeddings if caching is enabled
264
+ if caching_available and ENABLE_VOICE_EMBEDDING_CACHE:
265
+ try:
266
+ # Check memory usage before caching
267
+ global _cache_memory_usage
268
+ estimated_size = estimate_cache_memory_mb(self.conds)
269
+
270
+ if _cache_memory_usage + estimated_size <= VOICE_CACHE_MEMORY_LIMIT_MB:
271
+ cache_key = get_voice_cache_key(wav_fpath, exaggeration)
272
+ _voice_embedding_cache[cache_key] = self.conds
273
+ _cache_memory_usage += estimated_size
274
+ logging.info(f"πŸ’Ύ Voice embeddings cached ({estimated_size}MB, total: {_cache_memory_usage}MB)")
275
+ else:
276
+ logging.warning("⚠️ Cache memory limit reached - clearing old cache")
277
+ clear_voice_embedding_cache()
278
+ # Try caching again after clearing
279
+ cache_key = get_voice_cache_key(wav_fpath, exaggeration)
280
+ _voice_embedding_cache[cache_key] = self.conds
281
+ _cache_memory_usage = estimated_size
282
+ logging.info(f"πŸ’Ύ Voice embeddings cached after cleanup ({estimated_size}MB)")
283
+
284
+ except Exception as e:
285
+ logging.warning(f"⚠️ Caching failed: {e}, continuing without cache")
286
+
287
  def generate(
288
  self,
289
  text,
 
348
  )
349
  wav = wav.squeeze(0).detach().cpu().numpy()
350
  watermarked_wav = self.watermarker.apply_watermark(wav, sample_rate=self.sr)
351
+ return torch.from_numpy(watermarked_wav).unsqueeze(0)
352
+
353
+ def generate_batch(
354
+ self,
355
+ texts: list[str],
356
+ audio_prompt_path=None,
357
+ exaggeration=0.5,
358
+ cfg_weight=0.5,
359
+ temperature=0.8,
360
+ min_p=0.05,
361
+ top_p=0.8,
362
+ repetition_penalty=2.0,
363
+ ):
364
+ if audio_prompt_path:
365
+ self.prepare_conditionals(audio_prompt_path, exaggeration=exaggeration)
366
+ else:
367
+ assert self.conds is not None, "Please `prepare_conditionals` first or specify `audio_prompt_path`"
368
+
369
+ if exaggeration != self.conds.t3.emotion_adv[0, 0, 0]:
370
+ _cond: T3Cond = self.conds.t3
371
+ self.conds.t3 = T3Cond(
372
+ speaker_emb=_cond.speaker_emb,
373
+ cond_prompt_speech_tokens=_cond.cond_prompt_speech_tokens,
374
+ emotion_adv=exaggeration * torch.ones(1, 1, 1),
375
+ ).to(device=self.device)
376
+
377
+ norm_texts = [punc_norm(text) for text in texts]
378
+ text_tokens = [self.tokenizer.text_to_tokens(text) for text in norm_texts]
379
+
380
+ max_len = max(t.shape[1] for t in text_tokens)
381
+ text_tokens_padded = torch.stack([F.pad(t, (0, max_len - t.shape[1]), value=self.t3.hp.stop_text_token) for t in text_tokens])
382
+ text_tokens_padded = text_tokens_padded.squeeze(1).to(self.device)
383
+
384
+ if cfg_weight > 0.0:
385
+ text_tokens_padded = torch.cat([text_tokens_padded, text_tokens_padded], dim=0)
386
+
387
+ sot = self.t3.hp.start_text_token
388
+ text_tokens_padded = F.pad(text_tokens_padded, (1, 0), value=sot)
389
+
390
+ with torch.inference_mode():
391
+ speech_tokens_batch = self.t3.inference(
392
+ t3_cond=self.conds.t3,
393
+ text_tokens=text_tokens_padded,
394
+ max_new_tokens=1000,
395
+ temperature=temperature,
396
+ cfg_weight=cfg_weight,
397
+ min_p=min_p,
398
+ top_p=top_p,
399
+ repetition_penalty=repetition_penalty,
400
+ )
401
+
402
+ wavs = []
403
+ for speech_tokens in speech_tokens_batch:
404
+ speech_tokens = drop_invalid_tokens(speech_tokens)
405
+ speech_tokens = speech_tokens[speech_tokens < 6561]
406
+ speech_tokens = speech_tokens.to(self.device)
407
+
408
+ wav, _ = self.s3gen.inference(
409
+ speech_tokens=speech_tokens,
410
+ ref_dict=self.conds.gen,
411
+ )
412
+ wav = wav.squeeze(0).detach().cpu().numpy()
413
+ watermarked_wav = self.watermarker.apply_watermark(wav, sample_rate=self.sr)
414
+ wavs.append(torch.from_numpy(watermarked_wav).unsqueeze(0))
415
+ return wavs
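The padding step in `generate_batch` above (right-pad each token sequence to the batch max with the stop token) can be shown without tensors. A sketch over plain lists, where `pad_value` plays the role of `self.t3.hp.stop_text_token`:

```python
def pad_token_batch(token_seqs, pad_value):
    """Right-pad variable-length token sequences to a common length,
    mirroring the F.pad(..., value=stop_text_token) step in generate_batch."""
    max_len = max(len(seq) for seq in token_seqs)
    return [seq + [pad_value] * (max_len - len(seq)) for seq in token_seqs]
```

Padding with the stop token (rather than an arbitrary pad id) means the model treats the padded tail of shorter sequences as already-finished text.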
src/chatterbox/tts.py.20250811-120000.bak ADDED
@@ -0,0 +1,281 @@
1
+ from dataclasses import dataclass
2
+ from pathlib import Path
3
+
4
+ import librosa
5
+ import torch
6
+ import perth
7
+ import torch.nn.functional as F
8
+ from huggingface_hub import hf_hub_download
9
+ from safetensors.torch import load_file
10
+
11
+ from .models.t3 import T3
12
+ from .models.s3tokenizer import S3_SR, drop_invalid_tokens
13
+ from .models.s3gen import S3GEN_SR, S3Gen
14
+ from .models.tokenizers import EnTokenizer
15
+ from .models.voice_encoder import VoiceEncoder
16
+ from .models.t3.modules.cond_enc import T3Cond
17
+
18
+
19
+ REPO_ID = "ResembleAI/chatterbox"
20
+
21
+
22
+ def punc_norm(text: str) -> str:
23
+ """
24
+ Quick cleanup func for punctuation from LLMs or
25
+ containing chars not seen often in the dataset
26
+ """
27
+ if len(text) == 0:
28
+ return "You need to add some text for me to talk."
29
+
30
+ # Capitalise first letter
31
+ if text[0].islower():
32
+ text = text[0].upper() + text[1:]
33
+
34
+ # Remove multiple space chars
35
+ text = " ".join(text.split())
36
+
37
+ # Replace uncommon/llm punc
38
+ punc_to_replace = [
39
+ ("...", ", "),
40
+ ("…", ", "),
41
+ (":", ","),
42
+ (" - ", ", "),
43
+ (";", ", "),
44
+ ("—", "-"),
45
+ ("–", "-"),
46
+ (" ,", ","),
47
+ ("“", '"'),
48
+ ("”", '"'),
49
+ ("‘", "'"),
50
+ ("’", "'"),
51
+ ]
52
+ for old_char_sequence, new_char in punc_to_replace:
53
+ text = text.replace(old_char_sequence, new_char)
54
+
55
+ # Add full stop if no ending punc
56
+ text = text.rstrip(" ")
57
+ sentence_enders = {".", "!", "?", "-", ","}
58
+
59
+ # Check for punctuation at end, including inside quotes
60
+ has_ending_punct = False
61
+ if any(text.endswith(p) for p in sentence_enders):
62
+ has_ending_punct = True
63
+ elif len(text) >= 2 and text[-1] in ['"', "'"] and text[-2] in sentence_enders:
64
+ # Check for punctuation before closing quote: ?" or .'
65
+ has_ending_punct = True
66
+
67
+ if not has_ending_punct:
68
+ text += "."
69
+
70
+ return text
71
+
72
+
73
+ @dataclass
74
+ class Conditionals:
75
+ """
76
+ Conditionals for T3 and S3Gen
77
+ - T3 conditionals:
78
+ - speaker_emb
79
+ - clap_emb
80
+ - cond_prompt_speech_tokens
81
+ - cond_prompt_speech_emb
82
+ - emotion_adv
83
+ - S3Gen conditionals:
84
+ - prompt_token
85
+ - prompt_token_len
86
+ - prompt_feat
87
+ - prompt_feat_len
88
+ - embedding
89
+ """
90
+ t3: T3Cond
91
+ gen: dict
92
+
93
+ def to(self, device):
94
+ self.t3 = self.t3.to(device=device)
95
+ for k, v in self.gen.items():
96
+ if torch.is_tensor(v):
97
+ self.gen[k] = v.to(device=device)
98
+ return self
99
+
100
+ def save(self, fpath: Path):
101
+ arg_dict = dict(
102
+ t3=self.t3.__dict__,
103
+ gen=self.gen
104
+ )
105
+ torch.save(arg_dict, fpath)
106
+
107
+ @classmethod
108
+ def load(cls, fpath, map_location="cpu"):
109
+ if isinstance(map_location, str):
110
+ map_location = torch.device(map_location)
111
+ kwargs = torch.load(fpath, map_location=map_location, weights_only=True)
112
+ return cls(T3Cond(**kwargs['t3']), kwargs['gen'])
113
+
114
+
115
+ class ChatterboxTTS:
116
+ ENC_COND_LEN = 6 * S3_SR
117
+ DEC_COND_LEN = 10 * S3GEN_SR
118
+
119
+ def __init__(
120
+ self,
121
+ t3: T3,
122
+ s3gen: S3Gen,
123
+ ve: VoiceEncoder,
124
+ tokenizer: EnTokenizer,
125
+ device: str,
126
+ conds: Conditionals = None,
127
+ ):
128
+ self.sr = S3GEN_SR # sample rate of synthesized audio
129
+ self.t3 = t3
130
+ self.s3gen = s3gen
131
+ self.ve = ve
132
+ self.tokenizer = tokenizer
133
+ self.device = device
134
+ self.conds = conds
135
+ self.watermarker = perth.PerthImplicitWatermarker()
136
+
137
+ @classmethod
138
+ def from_local(cls, ckpt_dir, device) -> 'ChatterboxTTS':
139
+ ckpt_dir = Path(ckpt_dir)
140
+
141
+ # Always load to CPU first for non-CUDA devices to handle CUDA-saved models
142
+ if device in ["cpu", "mps"]:
143
+ map_location = torch.device('cpu')
144
+ else:
145
+ map_location = None
146
+
147
+ ve = VoiceEncoder()
148
+ ve.load_state_dict(
149
+ load_file(ckpt_dir / "ve.safetensors")
150
+ )
151
+ ve.to(device).eval()
152
+
153
+ t3 = T3()
154
+ t3_state = load_file(ckpt_dir / "t3_cfg.safetensors")
155
+ if "model" in t3_state.keys():
156
+ t3_state = t3_state["model"][0]
157
+ t3.load_state_dict(t3_state)
158
+ t3.to(device).eval()
159
+
160
+ s3gen = S3Gen()
161
+ s3gen.load_state_dict(
162
+ load_file(ckpt_dir / "s3gen.safetensors"), strict=False
163
+ )
164
+ s3gen.to(device).eval()
165
+
166
+ tokenizer = EnTokenizer(
167
+ str(ckpt_dir / "tokenizer.json")
168
+ )
169
+
170
+ conds = None
171
+ if (builtin_voice := ckpt_dir / "conds.pt").exists():
172
+ conds = Conditionals.load(builtin_voice, map_location=map_location).to(device)
173
+
174
+ return cls(t3, s3gen, ve, tokenizer, device, conds=conds)
175
+
176
+ @classmethod
177
+ def from_pretrained(cls, device) -> 'ChatterboxTTS':
178
+ # Check if MPS is available on macOS
179
+ if device == "mps" and not torch.backends.mps.is_available():
180
+ if not torch.backends.mps.is_built():
181
+ print("MPS not available because the current PyTorch install was not built with MPS enabled.")
182
+ else:
183
+ print("MPS not available because the current MacOS version is not 12.3+ and/or you do not have an MPS-enabled device on this machine.")
184
+ device = "cpu"
185
+
186
+ for fpath in ["ve.safetensors", "t3_cfg.safetensors", "s3gen.safetensors", "tokenizer.json", "conds.pt"]:
187
+ local_path = hf_hub_download(repo_id=REPO_ID, filename=fpath)
188
+
189
+ return cls.from_local(Path(local_path).parent, device)
190
+
191
+ def prepare_conditionals(self, wav_fpath, exaggeration=0.5):
192
+ ## Load reference wav
193
+ s3gen_ref_wav, _sr = librosa.load(wav_fpath, sr=S3GEN_SR)
194
+
195
+ ref_16k_wav = librosa.resample(s3gen_ref_wav, orig_sr=S3GEN_SR, target_sr=S3_SR)
196
+
197
+ s3gen_ref_wav = s3gen_ref_wav[:self.DEC_COND_LEN]
198
+ s3gen_ref_dict = self.s3gen.embed_ref(s3gen_ref_wav, S3GEN_SR, device=self.device)
199
+
200
+ # Speech cond prompt tokens
201
+ if plen := self.t3.hp.speech_cond_prompt_len:
202
+ s3_tokzr = self.s3gen.tokenizer
203
+ t3_cond_prompt_tokens, _ = s3_tokzr.forward([ref_16k_wav[:self.ENC_COND_LEN]], max_len=plen)
204
+ t3_cond_prompt_tokens = torch.atleast_2d(t3_cond_prompt_tokens).to(self.device)
205
+
206
+ # Voice-encoder speaker embedding
207
+ ve_embed = torch.from_numpy(self.ve.embeds_from_wavs([ref_16k_wav], sample_rate=S3_SR))
208
+ ve_embed = ve_embed.mean(axis=0, keepdim=True).to(self.device)
209
+
210
+ t3_cond = T3Cond(
211
+ speaker_emb=ve_embed,
212
+ cond_prompt_speech_tokens=t3_cond_prompt_tokens,
213
+ emotion_adv=exaggeration * torch.ones(1, 1, 1),
214
+ ).to(device=self.device)
215
+ self.conds = Conditionals(t3_cond, s3gen_ref_dict)
216
+
217
+ def generate(
218
+ self,
219
+ text,
220
+ audio_prompt_path=None,
221
+ exaggeration=0.5,
222
+ cfg_weight=0.5,
223
+ temperature=0.8,
224
+ min_p=0.05,
225
+ top_p=0.8,
226
+ repetition_penalty=2.0,
227
+ ):
228
+ if audio_prompt_path:
229
+ self.prepare_conditionals(audio_prompt_path, exaggeration=exaggeration)
230
+ else:
231
+ assert self.conds is not None, "Please `prepare_conditionals` first or specify `audio_prompt_path`"
232
+
233
+ # Update exaggeration if needed
234
+ if exaggeration != self.conds.t3.emotion_adv[0, 0, 0]:
235
+ _cond: T3Cond = self.conds.t3
236
+ self.conds.t3 = T3Cond(
237
+ speaker_emb=_cond.speaker_emb,
238
+ cond_prompt_speech_tokens=_cond.cond_prompt_speech_tokens,
239
+ emotion_adv=exaggeration * torch.ones(1, 1, 1),
240
+ ).to(device=self.device)
241
+
242
+ # Norm and tokenize text
243
+ text = punc_norm(text)
244
+ text_tokens = self.tokenizer.text_to_tokens(text).to(self.device)
245
+
246
+ if cfg_weight > 0.0:
247
+ text_tokens = torch.cat([text_tokens, text_tokens], dim=0) # Need two seqs for CFG
248
+
249
+ sot = self.t3.hp.start_text_token
250
+ eot = self.t3.hp.stop_text_token
251
+ text_tokens = F.pad(text_tokens, (1, 0), value=sot)
252
+ text_tokens = F.pad(text_tokens, (0, 1), value=eot)
253
+
254
+ with torch.inference_mode():
255
+ speech_tokens = self.t3.inference(
256
+ t3_cond=self.conds.t3,
257
+ text_tokens=text_tokens,
258
+ max_new_tokens=1000, # TODO: use the value in config
259
+ temperature=temperature,
260
+ cfg_weight=cfg_weight,
261
+ min_p=min_p,
262
+ top_p=top_p,
263
+ repetition_penalty=repetition_penalty,
264
+ )
265
+ # Extract only the conditional batch.
266
+ speech_tokens = speech_tokens[0]
267
+
268
+ # TODO: output becomes 1D
269
+ speech_tokens = drop_invalid_tokens(speech_tokens)
270
+
271
+ speech_tokens = speech_tokens[speech_tokens < 6561]
272
+
273
+ speech_tokens = speech_tokens.to(self.device)
274
+
275
+ wav, _ = self.s3gen.inference(
276
+ speech_tokens=speech_tokens,
277
+ ref_dict=self.conds.gen,
278
+ )
279
+ wav = wav.squeeze(0).detach().cpu().numpy()
280
+ watermarked_wav = self.watermarker.apply_watermark(wav, sample_rate=self.sr)
281
+ return torch.from_numpy(watermarked_wav).unsqueeze(0)
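The plan for this commit (BATCH_IMPLEMENTATION_PLAN.md) calls for a new `generate_batch` method that pads tokenized texts into one batch tensor before inference. A minimal sketch of just that padding step, using the `torch.nn.utils.rnn.pad_sequence` call the plan names; `pad_token_batch` is a hypothetical helper for illustration, not part of this repo:

```python
import torch
from torch.nn.utils.rnn import pad_sequence

def pad_token_batch(token_seqs, pad_value=0):
    """Pad a list of 1-D token tensors into one (B, T_max) batch tensor.

    Returns the padded batch plus the original lengths, which the model
    would need in order to mask out the padding positions.
    """
    lengths = torch.tensor([len(t) for t in token_seqs])
    batch = pad_sequence(token_seqs, batch_first=True, padding_value=pad_value)
    return batch, lengths

# Two token sequences of different lengths become one 2x3 tensor.
seqs = [torch.tensor([1, 2, 3]), torch.tensor([4, 5])]
batch, lengths = pad_token_batch(seqs)
print(tuple(batch.shape))   # (2, 3)
print(lengths.tolist())     # [3, 2]
```

In the real `generate_batch`, `pad_value` would have to be a token id the model treats as padding, and the `lengths` tensor would be passed alongside the batch so `t3.inference` can ignore the padded tail of the shorter sequences.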
test_parallel_performance.py ADDED
@@ -0,0 +1,235 @@
+ #!/usr/bin/env python3
+ """
+ Parallel Processing Performance Diagnostic Tool
+ Test various theories about why HuggingFace deployment is slow
+ """
+
+ import time
+ import threading
+ import multiprocessing
+ import concurrent.futures
+ import os
+ import sys
+ import psutil
+ import torch
+
+
+ # Worker functions live at module level so they can be pickled and sent to
+ # multiprocessing.Pool / ProcessPoolExecutor workers (nested functions
+ # cannot be pickled and would raise an error in the process-based tests).
+ def simple_task(n):
+     return n * n
+
+
+ def cpu_task(n):
+     # CPU intensive task
+     total = 0
+     for i in range(n * 1000):
+         total += i * i
+     return total
+
+
+ def monitored_task(worker_id):
+     pid = os.getpid()
+     tid = threading.get_ident()
+     return f"Worker {worker_id}: PID={pid}, TID={tid}"
+
+
+ def test_basic_multiprocessing():
+     """Test 1: Basic multiprocessing capability"""
+     print("=== TEST 1: Basic Multiprocessing ===")
+
+     # Sequential
+     start = time.time()
+     results_seq = [simple_task(i) for i in range(100)]
+     seq_time = time.time() - start
+     print(f"Sequential: {seq_time:.3f}s")
+
+     # Parallel
+     start = time.time()
+     with multiprocessing.Pool(processes=4) as pool:
+         results_par = pool.map(simple_task, range(100))
+     par_time = time.time() - start
+     print(f"Parallel (4 workers): {par_time:.3f}s")
+     print(f"Speedup: {seq_time/par_time:.2f}x")
+     print()
+
+
+ def test_thread_vs_process():
+     """Test 2: Threading vs Processing"""
+     print("=== TEST 2: Threading vs Processing ===")
+
+     tasks = [1000] * 8
+
+     # Sequential
+     start = time.time()
+     seq_results = [cpu_task(t) for t in tasks]
+     seq_time = time.time() - start
+     print(f"Sequential: {seq_time:.3f}s")
+
+     # Threading
+     start = time.time()
+     with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
+         thread_results = list(executor.map(cpu_task, tasks))
+     thread_time = time.time() - start
+     print(f"ThreadPool: {thread_time:.3f}s, speedup: {seq_time/thread_time:.2f}x")
+
+     # Processing
+     start = time.time()
+     with concurrent.futures.ProcessPoolExecutor(max_workers=4) as executor:
+         process_results = list(executor.map(cpu_task, tasks))
+     process_time = time.time() - start
+     print(f"ProcessPool: {process_time:.3f}s, speedup: {seq_time/process_time:.2f}x")
+     print()
+
+
+ def test_gpu_access():
+     """Test 3: GPU sharing capability"""
+     print("=== TEST 3: GPU Access ===")
+
+     if not torch.cuda.is_available():
+         print("No CUDA available - skipping GPU test")
+         print()
+         return
+
+     def gpu_task(worker_id):
+         try:
+             device = torch.device("cuda")
+             # Create a small tensor operation
+             x = torch.randn(1000, 1000, device=device)
+             y = torch.randn(1000, 1000, device=device)
+             start = time.time()
+             for _ in range(10):
+                 z = torch.mm(x, y)
+             duration = time.time() - start
+             return f"Worker {worker_id}: {duration:.3f}s"
+         except Exception as e:
+             return f"Worker {worker_id}: ERROR - {e}"
+
+     # Sequential GPU access
+     start = time.time()
+     seq_results = [gpu_task(i) for i in range(4)]
+     seq_time = time.time() - start
+     print("Sequential GPU:")
+     for result in seq_results:
+         print(f"  {result}")
+     print(f"Total sequential time: {seq_time:.3f}s")
+
+     # Parallel GPU access (threads share the CUDA context)
+     start = time.time()
+     with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
+         par_results = list(executor.map(gpu_task, range(4)))
+     par_time = time.time() - start
+     print("Parallel GPU:")
+     for result in par_results:
+         print(f"  {result}")
+     print(f"Total parallel time: {par_time:.3f}s")
+     print()
+
+
+ def test_model_loading():
+     """Test 4: Model loading overhead"""
+     print("=== TEST 4: Model Loading Simulation ===")
+
+     # Simulate loading a heavy model
+     def load_model():
+         # Simulate model loading time
+         time.sleep(0.5)  # 500ms loading time
+         return {"model": "loaded", "size": "large"}
+
+     def task_with_model_loading(worker_id):
+         start = time.time()
+         model = load_model()  # Each worker loads model
+         processing_time = 0.1  # Simulate 100ms processing
+         time.sleep(processing_time)
+         total_time = time.time() - start
+         return f"Worker {worker_id}: {total_time:.3f}s"
+
+     # Test with model loading per worker
+     print("Each worker loads model:")
+     start = time.time()
+     with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
+         results = list(executor.map(task_with_model_loading, range(4)))
+     total_time = time.time() - start
+
+     for result in results:
+         print(f"  {result}")
+     print(f"Total time with per-worker loading: {total_time:.3f}s")
+
+     # Compare with shared model (simulation)
+     shared_load_time = 0.5  # Load once
+     processing_time = 0.1 * 4  # Process 4 items sequentially
+     simulated_shared_time = shared_load_time + processing_time
+     print(f"Simulated shared model time: {simulated_shared_time:.3f}s")
+     print(f"Overhead from per-worker loading: {total_time - simulated_shared_time:.3f}s")
+     print()
+
+
+ def test_environment_info():
+     """Test 5: Environment information"""
+     print("=== TEST 5: Environment Info ===")
+
+     print(f"Python version: {sys.version}")
+     print(f"Platform: {sys.platform}")
+     print(f"CPU cores: {multiprocessing.cpu_count()}")
+     print(f"CPU usage: {psutil.cpu_percent()}%")
+     print(f"Memory: {psutil.virtual_memory().percent}% used")
+
+     if torch.cuda.is_available():
+         print(f"CUDA available: Yes")
+         print(f"CUDA devices: {torch.cuda.device_count()}")
+         print(f"Current device: {torch.cuda.current_device()}")
+         print(f"Device name: {torch.cuda.get_device_name()}")
+         if hasattr(torch.cuda, 'memory_summary'):
+             print("GPU Memory:")
+             print(torch.cuda.memory_summary(abbreviated=True))
+     else:
+         print("CUDA available: No")
+
+     # Check for environment variables that might affect multiprocessing
+     mp_vars = [
+         'OMP_NUM_THREADS', 'MKL_NUM_THREADS', 'OPENBLAS_NUM_THREADS',
+         'VECLIB_MAXIMUM_THREADS', 'NUMEXPR_NUM_THREADS'
+     ]
+     print("Threading environment variables:")
+     for var in mp_vars:
+         value = os.environ.get(var, 'Not set')
+         print(f"  {var}: {value}")
+
+     print()
+
+
+ def test_worker_creation():
+     """Test 6: Worker creation monitoring"""
+     print("=== TEST 6: Worker Creation ===")
+
+     print("Main process:")
+     print(f"  PID: {os.getpid()}")
+     print(f"  TID: {threading.get_ident()}")
+
+     print("ThreadPoolExecutor workers:")
+     with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
+         results = list(executor.map(monitored_task, range(4)))
+     for result in results:
+         print(f"  {result}")
+
+     print("ProcessPoolExecutor workers:")
+     with concurrent.futures.ProcessPoolExecutor(max_workers=4) as executor:
+         results = list(executor.map(monitored_task, range(4)))
+     for result in results:
+         print(f"  {result}")
+
+     print()
+
+
+ def main():
+     print("🔍 Parallel Processing Diagnostic Tool")
+     print("=" * 50)
+     print()
+
+     test_environment_info()
+     test_basic_multiprocessing()
+     test_thread_vs_process()
+     test_gpu_access()
+     test_model_loading()
+     test_worker_creation()
+
+     print("🏁 Diagnostic complete!")
+     print()
+     print("ANALYSIS:")
+     print("- If basic multiprocessing is slow: Environment blocks parallelism")
+     print("- If threading faster than processing: Use ThreadPoolExecutor")
+     print("- If GPU parallel time >> sequential: GPU contention issue")
+     print("- If model loading overhead high: Need model sharing strategy")
+     print("- If same PID for all workers: Using threads, not processes")
+
+
+ if __name__ == "__main__":
+     main()
utils/generate_from_json (copy).py ADDED
@@ -0,0 +1,146 @@
+ #!/usr/bin/env python3
+ """
+ Direct Audio Generation from JSON Tool
+
+ This script allows for generating audiobook chunks directly from a pre-existing
+ `chunks_info.json` file. It is intended for debugging and testing purposes,
+ allowing a user to manually edit the TTS parameters in the JSON file and
+ hear the results without the VADER analysis step.
+ """
+
+ import torch
+ from pathlib import Path
+ import sys
+ from concurrent.futures import ThreadPoolExecutor, as_completed
+ import time
+ from datetime import timedelta
+
+ # Add project root to path to allow module imports
+ # (this file lives in utils/, so the project root is one level up)
+ project_root = Path(__file__).parent.parent
+ sys.path.append(str(project_root))
+
+ from config.config import *
+ from modules.tts_engine import load_optimized_model, process_one_chunk, prewarm_model_with_voice
+ from modules.file_manager import setup_book_directories, list_voice_samples, ensure_voice_sample_compatibility
+ from wrapper.chunk_loader import load_chunks
+ from chatterbox.tts import punc_norm
+ from modules.progress_tracker import log_chunk_progress, log_run
+
+
+ def main():
+     """Main function to drive the generation process."""
+     print(f"{BOLD}{CYAN}--- Direct Audio Generation from JSON Tool ---{RESET}")
+
+     # 1. Get Book Name
+     book_name = input("Enter the book name (e.g., 'london'): ").strip()
+     if not book_name:
+         print("❌ Book name cannot be empty.")
+         return
+
+     # 2. Locate and Load JSON
+     book_audio_dir = AUDIOBOOK_ROOT / book_name
+     json_path = book_audio_dir / "TTS" / "text_chunks" / "chunks_info.json"
+
+     if not json_path.exists():
+         print(f"❌ Error: JSON file not found at {json_path}")
+         print("Please ensure you have run the 'Prepare text file' option for this book first.")
+         return
+
+     print(f"📖 Loading chunks from: {json_path}")
+     all_chunks = load_chunks(str(json_path))
+     print(f"✅ Found {len(all_chunks)} chunks.")
+
+     # 3. Select Voice
+     voice_files = list_voice_samples()
+     if not voice_files:
+         print(f"❌ No voice samples found in {VOICE_SAMPLES_DIR}")
+         return
+
+     print("\nAvailable voices:")
+     for i, voice_file in enumerate(voice_files, 1):
+         print(f"  [{i}] {voice_file.stem}")
+
+     while True:
+         try:
+             choice = input("Select voice number: ").strip()
+             idx = int(choice) - 1
+             if 0 <= idx < len(voice_files):
+                 voice_path = voice_files[idx]
+                 break
+             print("Invalid selection.")
+         except (ValueError, IndexError):
+             print("Invalid selection.")
+
+     # Ensure voice compatibility
+     voice_path = ensure_voice_sample_compatibility(voice_path)
+
+     # 4. Setup Environment
+     if torch.cuda.is_available():
+         device = "cuda"
+     elif torch.backends.mps.is_available():
+         device = "mps"
+     else:
+         device = "cpu"
+
+     print(f"\n🚀 Using device: {device}")
+     print(f"🎤 Using voice: {Path(voice_path).name}")
+
+     # 5. Load Model
+     model = load_optimized_model(device)
+
+     # 6. Pre-warm model to eliminate first chunk quality variations
+     print(f"🔥 Pre-warming model with voice sample: {Path(voice_path).name}")
+     # Use default TTS params for pre-warming since we don't have user params here
+     # (voice_path was already made compatible above)
+     model = prewarm_model_with_voice(model, voice_path, None)
+
+     # 7. Process Chunks
+     output_root, tts_dir, text_chunks_dir, audio_chunks_dir = setup_book_directories(Path(TEXT_INPUT_ROOT) / book_name)
+
+     # Clean existing audio chunks
+     print("🧹 Clearing old audio chunks...")
+     for wav_file in audio_chunks_dir.glob("*.wav"):
+         wav_file.unlink()
+
+     start_time = time.time()
+     total_chunks = len(all_chunks)
+     log_path = output_root / "debug_generation.log"
+
+     print(f"\n🔄 Generating {total_chunks} chunks...")
+
+     with ThreadPoolExecutor(max_workers=1) as executor:  # Force sequential processing
+         futures = []
+         for i, chunk_data in enumerate(all_chunks):
+             # Extract exaggeration from JSON, force others to default
+             chunk_tts_params = {
+                 "exaggeration": chunk_data.get("tts_params", {}).get("exaggeration", DEFAULT_EXAGGERATION),
+                 "cfg_weight": DEFAULT_CFG_WEIGHT,
+                 "temperature": DEFAULT_TEMPERATURE
+             }
+
+             future = executor.submit(
+                 process_one_chunk,
+                 i, chunk_data['text'], text_chunks_dir, audio_chunks_dir,
+                 voice_path, chunk_tts_params, start_time, total_chunks,
+                 punc_norm, book_name, log_run, log_path, device,
+                 model, None, chunk_data['is_paragraph_end'], all_chunks, chunk_data['boundary_type']
+             )
+             futures.append(future)
+
+         for future in as_completed(futures):
+             try:
+                 result = future.result()
+                 if result:
+                     idx, _ = result
+                     log_chunk_progress(idx, total_chunks, start_time, 0)
+             except Exception as e:
+                 print(f"\n❌ An error occurred while processing a chunk: {e}")
+
+     elapsed_time = time.time() - start_time
+     print(f"\n{GREEN}✅ Generation Complete!{RESET}")
+     print(f"⏱️ Total time: {timedelta(seconds=int(elapsed_time))}")
+     print(f"🔊 Audio chunks are in: {audio_chunks_dir}")
+     print("You can now use Option 3 from the main menu to combine them.")
+
+
+ if __name__ == "__main__":
+     main()
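The batch mode this commit plans (a `process_batch` worker fed groups of roughly 16 chunks when VADER is off) first needs the chunk list split into fixed-size batches. A minimal sketch of that grouping step; `make_batches` is a hypothetical helper name, not a function in this repo:

```python
def make_batches(chunks, batch_size=16):
    """Split a list of chunks into fixed-size batches.

    The final batch may be shorter than batch_size when the chunk
    count is not an exact multiple.
    """
    return [chunks[i:i + batch_size] for i in range(0, len(chunks), batch_size)]

# 40 chunks with batch_size=16 yield batches of 16, 16, and 8.
batches = make_batches(list(range(40)), batch_size=16)
print([len(b) for b in batches])  # [16, 16, 8]
```

Each batch would then be tokenized, padded, and sent through the model with one shared set of TTS parameters, which is where the speed advantage over the one-chunk-at-a-time VADER mode comes from.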