Spaces: samarthnaikk/ttlm

Samarth Naik committed on Jan 1

Commit be42ab9 (1 parent: 9edbda4)

feat: Switch from Coqui TTS to Piper TTS for better performance

- Replace heavy Coqui TTS with lightweight Piper TTS
- Add support for multiple voice models and quality levels
- Implement speed control for speech synthesis
- Dramatically reduce Docker image size and build time
- Add voice discovery endpoint (/voices)
- Automatic model downloading on first use
- Update all documentation and test scripts
- Optimize for fast CPU-only inference on HF Spaces

Files changed (5)

Dockerfile +12 -12
README.md +59 -48
app.py +170 -84
requirements.txt +0 -6
test_api.py +72 -18

Dockerfile CHANGED

@@ -3,30 +3,30 @@ FROM python:3.10-slim
 # Set working directory
 WORKDIR /app
-# Install system dependencies required for audio processing
 RUN apt-get update && apt-get install -y \
-    build-essential \
-    libsndfile1-dev \
-    ffmpeg \
-    git \
     wget \
     && apt-get clean \
     && rm -rf /var/lib/apt/lists/*
-# Upgrade pip and install wheel
-RUN pip install --upgrade pip setuptools wheel
-# Copy requirements first for better caching
 COPY requirements.txt .
-# Install Python dependencies with more robust approach
-RUN pip install --no-cache-dir torch torchaudio --index-url https://download.pytorch.org/whl/cpu
-RUN pip install --no-cache-dir TTS==0.22.0
 RUN pip install --no-cache-dir -r requirements.txt
 # Copy application code
 COPY . .
 # Expose port for Hugging Face Spaces
 EXPOSE 7860

 # Set working directory
 WORKDIR /app
+# Install system dependencies required for Piper TTS
 RUN apt-get update && apt-get install -y \
     wget \
+    curl \
     && apt-get clean \
     && rm -rf /var/lib/apt/lists/*
+# Install Piper TTS binary
+RUN wget -O piper.tar.gz "https://github.com/rhasspy/piper/releases/download/2023.11.14-2/piper_linux_x86_64.tar.gz" \
+    && tar -xzf piper.tar.gz \
+    && mv piper/piper /usr/local/bin/ \
+    && chmod +x /usr/local/bin/piper \
+    && rm -rf piper.tar.gz piper
+# Copy requirements and install Python dependencies
 COPY requirements.txt .
 RUN pip install --no-cache-dir -r requirements.txt
 # Copy application code
 COPY . .
+# Create models directory
+RUN mkdir -p ./piper_models
 # Expose port for Hugging Face Spaces
 EXPOSE 7860
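
Note on the Dockerfile above: the image places the binary at `/usr/local/bin/piper`, and `app.py` in this commit probes it with `subprocess.run(["piper", "--help"])`. When the executable is missing, `subprocess.run` raises `FileNotFoundError` instead of returning a non-zero exit code, so the "not found in PATH" log branch would not be reached. A minimal sketch of a lookup that avoids this, assuming a hypothetical helper named `find_piper` (not part of the commit):

```python
import shutil
from typing import Optional


def find_piper(binary_name: str = "piper") -> Optional[str]:
    """Return the path of the Piper binary found on PATH, or None if absent.

    Hypothetical helper for illustration; shutil.which performs the PATH
    lookup without spawning a process, so a missing binary yields None
    rather than an exception.
    """
    return shutil.which(binary_name)


if __name__ == "__main__":
    path = find_piper()
    print(f"piper found at {path}" if path else "piper not found on PATH")
```

A startup hook could call this once and log a clear error when it returns `None`.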

README.md CHANGED

@@ -8,70 +8,73 @@ app_file: app.py
 pinned: false
 ---
-# Text-to-Speech API with Coqui TTS
-A production-ready Text-to-Speech API built with FastAPI and Coqui TTS, designed to run on Hugging Face Spaces.
 ## Features
-- **High-Quality TTS**: Uses Coqui's `xtts_v2` multilingual model
-- **Voice Cloning**: Optional speaker reference for voice cloning
-- **CPU Optimized**: Runs efficiently on CPU-only environments
-- **REST API**: Simple GET/POST endpoints
 - **Production Ready**: Proper error handling, logging, and health checks
 ## API Usage
 ### Simple GET Request
 ```bash
-curl "https://your-space-url/tts?text=Hello%20world&language=en"
 ```
 ### POST with JSON
 ```bash
 curl -X POST "https://your-space-url/tts" \
   -H "Content-Type: application/json" \
-  -d '{"text": "Hello world", "language": "en"}'
 ```
-### POST with Voice Cloning
 ```bash
 curl -X POST "https://your-space-url/tts" \
   -F "text=Hello world" \
-  -F "language=en" \
-  -F "speaker_wav=@path/to/speaker.wav"
 ```
 ## Endpoints
-- `GET /` - Health check
 - `GET /tts` - Simple text-to-speech conversion
-- `POST /tts` - Advanced TTS with optional voice cloning
 - `GET /health` - Detailed health status
-## Supported Languages
-The XTTS v2 model supports multiple languages including:
-- English (en)
-- Spanish (es)
-- French (fr)
-- German (de)
-- Italian (it)
-- Portuguese (pt)
-- Polish (pl)
-- Turkish (tr)
-- Russian (ru)
-- Dutch (nl)
-- Czech (cs)
-- Arabic (ar)
-- Chinese (zh-cn)
-- Japanese (ja)
-- Hungarian (hu)
-- Korean (ko)
 ## Response
-All endpoints return a WAV audio file that can be played directly in browsers or audio players.
 ## Local Development
@@ -79,31 +82,39 @@ All endpoints return a WAV audio file that can be played directly in browsers or
 # Install dependencies
 pip install -r requirements.txt
 # Run the application
 python app.py
 ```
 The API will be available at `http://localhost:7860`
-## Model Information
-This application uses the `tts_models/multilingual/multi-dataset/xtts_v2` model from Coqui TTS, which provides:
-- High-quality multilingual speech synthesis
-- Voice cloning capabilities
-- CPU-friendly inference
-- Support for 16+ languages
 ## Error Handling
 The API includes comprehensive error handling for:
 - Invalid text input
-- Unsupported file formats
-- Model loading failures
 - Audio generation errors
-## Performance Notes
-- Model loads once at startup (not per request)
-- Optimized for CPU inference
-- Temporary files are automatically cleaned up
-- Response streaming for large audio files

 pinned: false
 ---
+# Text-to-Speech API with Piper TTS
+A production-ready Text-to-Speech API built with FastAPI and Piper TTS, designed to run on Hugging Face Spaces.
 ## Features
+- **High-Quality TTS**: Uses Piper's neural TTS models
+- **Multiple Voices**: Support for various languages and voice styles
+- **Fast & Lightweight**: ONNX-based models for efficient CPU inference
 - **Production Ready**: Proper error handling, logging, and health checks
+- **Easy Deployment**: Optimized for containerized environments
 ## API Usage
 ### Simple GET Request
 ```bash
+curl "https://your-space-url/tts?text=Hello%20world&voice=en-us-amy-low"
 ```
 ### POST with JSON
 ```bash
 curl -X POST "https://your-space-url/tts" \
   -H "Content-Type: application/json" \
+  -d '{"text": "Hello world", "voice": "en-us-amy-medium", "speed": 1.0}'
 ```
+### POST with Form Data
 ```bash
 curl -X POST "https://your-space-url/tts" \
   -F "text=Hello world" \
+  -F "voice=en-us-ryan-low" \
+  -F "speed=1.2"
 ```
+## Available Voices
+Get the full list of available voices:
+```bash
+curl "https://your-space-url/voices"
+```
+### Supported Voices Include:
+- **English (US)**: `en-us-amy-low`, `en-us-amy-medium`, `en-us-ryan-low`, `en-us-ryan-medium`
+- **English (GB)**: `en-gb-alan-low`, `en-gb-alan-medium`
+- **German**: `de-de-thorsten-low`, `de-de-thorsten-medium`
+- **Spanish**: `es-es-marta-low`, `es-es-marta-medium`
+- **French**: `fr-fr-siwis-low`, `fr-fr-siwis-medium`
+*Note: `-low` voices are faster but lower quality; `-medium` voices sound better but are slower.*
 ## Endpoints
+- `GET /` - Health check and available voices
+- `GET /voices` - List all available voices
 - `GET /tts` - Simple text-to-speech conversion
+- `POST /tts` - Advanced TTS with voice and speed control
 - `GET /health` - Detailed health status
+## Parameters
+- **text** (required): Text to convert to speech
+- **voice** (optional): Voice to use (default: `en-us-amy-low`)
+- **speed** (optional): Speech speed multiplier (default: 1.0, range: 0.5-2.0)
 ## Response
+All TTS endpoints return a WAV audio file that can be played directly in browsers or audio players.
 ## Local Development
 # Install dependencies
 pip install -r requirements.txt
+# Install Piper TTS binary (Linux x86_64)
+wget -O piper.tar.gz "https://github.com/rhasspy/piper/releases/download/2023.11.14-2/piper_linux_x86_64.tar.gz"
+tar -xzf piper.tar.gz
+sudo mv piper/piper /usr/local/bin/
+sudo chmod +x /usr/local/bin/piper
 # Run the application
 python app.py
 ```
 The API will be available at `http://localhost:7860`
+## About Piper TTS
+This application uses [Piper TTS](https://github.com/rhasspy/piper) by Rhasspy, which provides:
+- High-quality neural text-to-speech
+- ONNX-based models for efficient CPU inference
+- Multiple languages and voice styles
+- Fast synthesis speeds
+- Small model sizes perfect for deployment
+## Performance Notes
+- Models are downloaded automatically on first use
+- Cached models for faster subsequent requests
+- Optimized for CPU inference
+- Temporary files are automatically cleaned up
+- Average synthesis time: ~1-3 seconds for typical sentences
 ## Error Handling
 The API includes comprehensive error handling for:
 - Invalid text input
+- Unsupported voice selection
+- Model download failures
 - Audio generation errors
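
Note on the README above: it documents a `speed` range of 0.5-2.0, while `generate_speech` in `app.py` (in this commit) computes `1.0 / speed` without checking the value, so `speed=0` would raise `ZeroDivisionError`. The sketch below shows one way the documented range and the `--length_scale` mapping could be combined. The helper names `speed_to_length_scale` and `build_piper_command` are hypothetical; the flags are the ones the commit already passes to `piper`.

```python
from pathlib import Path

# Range taken from the README's Parameters section; app.py does not enforce it.
MIN_SPEED, MAX_SPEED = 0.5, 2.0


def speed_to_length_scale(speed: float) -> float:
    """Convert a speed multiplier to Piper's length scale (its inverse)."""
    if not (MIN_SPEED <= speed <= MAX_SPEED):
        raise ValueError(f"speed must be between {MIN_SPEED} and {MAX_SPEED}, got {speed}")
    return 1.0 / speed


def build_piper_command(model_file: Path, output_path: str, speed: float = 1.0) -> list[str]:
    """Build the argument list that app.py passes to subprocess.run."""
    cmd = ["piper", "--model", str(model_file), "--output_file", output_path]
    if speed != 1.0:
        # A faster speed means shorter phoneme durations, hence the inverse.
        cmd += ["--length_scale", str(speed_to_length_scale(speed))]
    return cmd
```

In an endpoint, the `ValueError` could be translated into a 400 response so out-of-range values fail before any subprocess is started.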

app.py CHANGED

@@ -1,170 +1,235 @@
 """
-Text-to-Speech API using Coqui TTS
 Production-ready FastAPI application for Hugging Face Spaces
 """
 import os
 import tempfile
 import logging
 from pathlib import Path
 from typing import Optional
 from fastapi import FastAPI, HTTPException, UploadFile, File, Form
 from fastapi.responses import FileResponse
 from pydantic import BaseModel
 import uvicorn
-# Import TTS
-try:
-    from TTS.api import TTS
-except ImportError:
-    raise ImportError("TTS library not found. Please install coqui-tts: pip install coqui-tts")
 # Configure logging
 logging.basicConfig(level=logging.INFO)
 logger = logging.getLogger(__name__)
 # Initialize FastAPI app
 app = FastAPI(
-    title="Text-to-Speech API",
-    description="Production-ready TTS API using Coqui TTS",
     version="1.0.0"
 )
-# Global TTS model variable
-tts_model = None
 # Request models
 class TTSRequest(BaseModel):
     text: str
-    language: Optional[str] = "en"
 @app.on_event("startup")
 async def startup_event():
     """
-    Load the TTS model once at startup to avoid loading it on every request.
-    Using the highest-quality open-source multilingual model.
     """
-    global tts_model
     try:
-        logger.info("Loading TTS model...")
-        # Using the high-quality multilingual model that works on CPU
-        model_name = "tts_models/multilingual/multi-dataset/xtts_v2"
-        tts_model = TTS(model_name=model_name, progress_bar=False)
-        # Ensure we're using CPU (important for Hugging Face Spaces)
-        if hasattr(tts_model, 'to'):
-            tts_model.to("cpu")
-        logger.info("TTS model loaded successfully!")
     except Exception as e:
-        logger.error(f"Failed to load TTS model: {str(e)}")
         raise e
 @app.get("/")
 async def root():
     """Health check endpoint"""
     return {
         "status": "healthy",
-        "message": "Text-to-Speech API is running",
-        "model": "tts_models/multilingual/multi-dataset/xtts_v2"
     }
 @app.get("/tts")
-async def tts_get(text: str, language: str = "en"):
     """
     Simple GET endpoint for TTS
-    Usage: GET /tts?text=Hello%20world&language=en
     """
     if not text or len(text.strip()) == 0:
         raise HTTPException(status_code=400, detail="Text parameter is required")
-    return await generate_speech(text, language)
 @app.post("/tts")
 async def tts_post(
     request: TTSRequest = None,
     text: str = Form(None),
-    language: str = Form("en"),
-    speaker_wav: UploadFile = File(None)
 ):
     """
-    POST endpoint for TTS with optional voice cloning
-    Accepts JSON body or form data with optional speaker WAV file
     """
     # Handle different input formats
     if request:
         input_text = request.text
-        input_language = request.language
     elif text:
         input_text = text
-        input_language = language
     else:
         raise HTTPException(status_code=400, detail="Text is required")
     if not input_text or len(input_text.strip()) == 0:
         raise HTTPException(status_code=400, detail="Text cannot be empty")
-    # Handle speaker WAV file if provided
-    speaker_wav_path = None
-    if speaker_wav:
-        try:
-            # Save uploaded speaker file temporarily
-            speaker_suffix = Path(speaker_wav.filename).suffix if speaker_wav.filename else ".wav"
-            with tempfile.NamedTemporaryFile(delete=False, suffix=speaker_suffix) as tmp_speaker:
-                content = await speaker_wav.read()
-                tmp_speaker.write(content)
-                speaker_wav_path = tmp_speaker.name
-        except Exception as e:
-            logger.error(f"Error processing speaker WAV file: {str(e)}")
-            raise HTTPException(status_code=400, detail="Invalid speaker WAV file")
-    try:
-        return await generate_speech(input_text, input_language, speaker_wav_path)
-    finally:
-        # Clean up speaker file
-        if speaker_wav_path and os.path.exists(speaker_wav_path):
-            try:
-                os.unlink(speaker_wav_path)
-            except:
-                pass
-async def generate_speech(text: str, language: str = "en", speaker_wav_path: str = None):
     """
-    Generate speech from text using the loaded TTS model
     """
-    if not tts_model:
-        raise HTTPException(status_code=503, detail="TTS model not loaded")
     try:
         # Create temporary file for output
         with tempfile.NamedTemporaryFile(delete=False, suffix=".wav") as tmp_file:
             output_path = tmp_file.name
-        logger.info(f"Generating speech for text: '{text[:50]}...' in language: {language}")
-        # Generate speech
-        if speaker_wav_path and os.path.exists(speaker_wav_path):
-            # Voice cloning with speaker reference
-            logger.info("Using voice cloning with speaker reference")
-            tts_model.tts_to_file(
-                text=text,
-                file_path=output_path,
-                speaker_wav=speaker_wav_path,
-                language=language
-            )
-        else:
-            # Standard TTS without voice cloning
-            tts_model.tts_to_file(
-                text=text,
-                file_path=output_path,
-                language=language
-            )
         # Verify the file was created and has content
         if not os.path.exists(output_path) or os.path.getsize(output_path) == 0:
@@ -183,6 +248,16 @@ async def generate_speech(text: str, language: str = "en", speaker_wav_path: str
             }
         )
     except Exception as e:
         logger.error(f"Error generating speech: {str(e)}")
         # Clean up output file on error
@@ -197,10 +272,21 @@ async def generate_speech(text: str, language: str = "en", speaker_wav_path: str
 @app.get("/health")
 async def health_check():
     """Detailed health check endpoint"""
     return {
-        "status": "healthy",
-        "model_loaded": tts_model is not None,
-        "model_name": "tts_models/multilingual/multi-dataset/xtts_v2"
     }

 """
+Text-to-Speech API using Piper TTS
 Production-ready FastAPI application for Hugging Face Spaces
 """
 import os
 import tempfile
 import logging
+import subprocess
 from pathlib import Path
 from typing import Optional
+import shutil
 from fastapi import FastAPI, HTTPException, UploadFile, File, Form
 from fastapi.responses import FileResponse
 from pydantic import BaseModel
 import uvicorn
 # Configure logging
 logging.basicConfig(level=logging.INFO)
 logger = logging.getLogger(__name__)
 # Initialize FastAPI app
 app = FastAPI(
+    title="Text-to-Speech API with Piper",
+    description="Production-ready TTS API using Piper TTS",
     version="1.0.0"
 )
+# Available Piper voices
+AVAILABLE_VOICES = {
+    "en-us-amy-low": "English (US) - Amy (Low Quality, Fast)",
+    "en-us-amy-medium": "English (US) - Amy (Medium Quality)",
+    "en-us-ryan-low": "English (US) - Ryan (Low Quality, Fast)",
+    "en-us-ryan-medium": "English (US) - Ryan (Medium Quality)",
+    "en-gb-alan-low": "English (GB) - Alan (Low Quality, Fast)",
+    "en-gb-alan-medium": "English (GB) - Alan (Medium Quality)",
+    "de-de-thorsten-low": "German - Thorsten (Low Quality, Fast)",
+    "de-de-thorsten-medium": "German - Thorsten (Medium Quality)",
+    "es-es-marta-low": "Spanish - Marta (Low Quality, Fast)",
+    "es-es-marta-medium": "Spanish - Marta (Medium Quality)",
+    "fr-fr-siwis-low": "French - Siwis (Low Quality, Fast)",
+    "fr-fr-siwis-medium": "French - Siwis (Medium Quality)",
+}
+# Default voice
+DEFAULT_VOICE = "en-us-amy-low"
 # Request models
 class TTSRequest(BaseModel):
     text: str
+    voice: Optional[str] = DEFAULT_VOICE
+    speed: Optional[float] = 1.0
 @app.on_event("startup")
 async def startup_event():
     """
+    Initialize Piper TTS - download default model if needed
     """
     try:
+        logger.info("Initializing Piper TTS...")
+        # Check if piper is available
+        result = subprocess.run(["piper", "--help"], capture_output=True, text=True)
+        if result.returncode == 0:
+            logger.info("Piper TTS is available!")
+        else:
+            logger.error("Piper TTS not found in PATH")
+        # Create models directory
+        models_dir = Path("./piper_models")
+        models_dir.mkdir(exist_ok=True)
+        # Download default voice model if not exists
+        await download_voice_model(DEFAULT_VOICE)
+        logger.info("Piper TTS initialized successfully!")
     except Exception as e:
+        logger.error(f"Failed to initialize Piper TTS: {str(e)}")
         raise e
+async def download_voice_model(voice: str):
+    """Download Piper voice model if not already present"""
+    models_dir = Path("./piper_models")
+    model_file = models_dir / f"{voice}.onnx"
+    config_file = models_dir / f"{voice}.onnx.json"
+    if model_file.exists() and config_file.exists():
+        logger.info(f"Voice model {voice} already exists")
+        return
+    logger.info(f"Downloading voice model: {voice}")
+    # Piper model URLs (using official repository)
+    base_url = "https://github.com/rhasspy/piper/releases/download/2023.11.14-2"
+    try:
+        # Download model file
+        model_url = f"{base_url}/{voice}.onnx"
+        subprocess.run([
+            "wget", "-q", "-O", str(model_file), model_url
+        ], check=True)
+        # Download config file
+        config_url = f"{base_url}/{voice}.onnx.json"
+        subprocess.run([
+            "wget", "-q", "-O", str(config_file), config_url
+        ], check=True)
+        logger.info(f"Downloaded voice model: {voice}")
+    except subprocess.CalledProcessError as e:
+        logger.error(f"Failed to download voice model {voice}: {e}")
+        # Clean up partial downloads
+        model_file.unlink(missing_ok=True)
+        config_file.unlink(missing_ok=True)
+        raise HTTPException(status_code=500, detail=f"Failed to download voice model: {voice}")
 @app.get("/")
 async def root():
     """Health check endpoint"""
     return {
         "status": "healthy",
+        "message": "Text-to-Speech API with Piper is running",
+        "engine": "Piper TTS",
+        "available_voices": list(AVAILABLE_VOICES.keys()),
+        "default_voice": DEFAULT_VOICE
+    }
+@app.get("/voices")
+async def get_voices():
+    """Get available voices"""
+    return {
+        "voices": AVAILABLE_VOICES,
+        "default": DEFAULT_VOICE
     }
 @app.get("/tts")
+async def tts_get(
+    text: str,
+    voice: str = DEFAULT_VOICE,
+    speed: float = 1.0
+):
     """
     Simple GET endpoint for TTS
+    Usage: GET /tts?text=Hello%20world&voice=en-us-amy-low&speed=1.0
     """
     if not text or len(text.strip()) == 0:
         raise HTTPException(status_code=400, detail="Text parameter is required")
+    if voice not in AVAILABLE_VOICES:
+        raise HTTPException(status_code=400, detail=f"Voice '{voice}' not available. Use /voices to see available options.")
+    return await generate_speech(text, voice, speed)
 @app.post("/tts")
 async def tts_post(
     request: TTSRequest = None,
     text: str = Form(None),
+    voice: str = Form(DEFAULT_VOICE),
+    speed: float = Form(1.0)
 ):
     """
+    POST endpoint for TTS
+    Accepts JSON body or form data
     """
     # Handle different input formats
     if request:
         input_text = request.text
+        input_voice = request.voice or DEFAULT_VOICE
+        input_speed = request.speed or 1.0
     elif text:
         input_text = text
+        input_voice = voice
+        input_speed = speed
     else:
         raise HTTPException(status_code=400, detail="Text is required")
     if not input_text or len(input_text.strip()) == 0:
         raise HTTPException(status_code=400, detail="Text cannot be empty")
+    if input_voice not in AVAILABLE_VOICES:
+        raise HTTPException(status_code=400, detail=f"Voice '{input_voice}' not available. Use /voices to see available options.")
+    return await generate_speech(input_text, input_voice, input_speed)
+async def generate_speech(text: str, voice: str = DEFAULT_VOICE, speed: float = 1.0):
     """
+    Generate speech from text using Piper TTS
     """
     try:
+        # Ensure voice model is available
+        await download_voice_model(voice)
         # Create temporary file for output
         with tempfile.NamedTemporaryFile(delete=False, suffix=".wav") as tmp_file:
             output_path = tmp_file.name
+        logger.info(f"Generating speech for text: '{text[:50]}...' with voice: {voice}")
+        # Prepare piper command
+        models_dir = Path("./piper_models")
+        model_file = models_dir / f"{voice}.onnx"
+        # Build piper command
+        cmd = [
+            "piper",
+            "--model", str(model_file),
+            "--output_file", output_path,
+        ]
+        # Add length scale for speed control (inverse of speed)
+        if speed != 1.0:
+            length_scale = 1.0 / speed
+            cmd.extend(["--length_scale", str(length_scale)])
+        # Run piper with text input
+        process = subprocess.run(
+            cmd,
+            input=text,
+            text=True,
+            capture_output=True,
+            check=True
+        )
         # Verify the file was created and has content
         if not os.path.exists(output_path) or os.path.getsize(output_path) == 0:
             }
         )
+    except subprocess.CalledProcessError as e:
+        logger.error(f"Piper command failed: {e.stderr}")
+        # Clean up output file on error
+        if 'output_path' in locals() and os.path.exists(output_path):
+            try:
+                os.unlink(output_path)
+            except OSError:
+                pass
+        raise HTTPException(status_code=500, detail=f"TTS generation failed: {e.stderr}")
     except Exception as e:
         logger.error(f"Error generating speech: {str(e)}")
         # Clean up output file on error
 @app.get("/health")
 async def health_check():
     """Detailed health check endpoint"""
+    try:
+        # Check if piper is available
+        result = subprocess.run(["piper", "--version"], capture_output=True, text=True)
+        piper_available = result.returncode == 0
+        piper_version = result.stdout.strip() if piper_available else "Not available"
+    except OSError:
+        piper_available = False
+        piper_version = "Not available"
     return {
+        "status": "healthy" if piper_available else "degraded",
+        "piper_available": piper_available,
+        "piper_version": piper_version,
+        "engine": "Piper TTS",
+        "available_voices": len(AVAILABLE_VOICES)
     }
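
Note on `download_voice_model` above: it builds model URLs as `{base_url}/{voice}.onnx` under the Piper binary release, with keys such as `en-us-amy-low`. Piper voice files are commonly named in the style `en_US-amy-low.onnx`, so the keys may need translating before download. The sketch below is an assumption-heavy illustration: the base URL and directory layout are assumed, not taken from this commit, and not every key in `AVAILABLE_VOICES` is guaranteed to exist as a published voice. Verify both before relying on it.

```python
# ASSUMPTION: hosting location and layout of published Piper voices.
ASSUMED_VOICES_BASE = "https://huggingface.co/rhasspy/piper-voices/resolve/main"


def voice_key_to_model_url(voice: str, base_url: str = ASSUMED_VOICES_BASE) -> str:
    """Map an app.py key like 'en-us-amy-low' to an assumed .onnx URL.

    Hypothetical helper; the config file is assumed to sit next to the
    model at the same URL with '.json' appended.
    """
    parts = voice.split("-")
    if len(parts) != 4:
        raise ValueError(f"unexpected voice key: {voice!r}")
    lang, region, name, quality = parts
    locale = f"{lang.lower()}_{region.upper()}"  # e.g. en_US
    return f"{base_url}/{lang.lower()}/{locale}/{name}/{quality}/{locale}-{name}-{quality}.onnx"


if __name__ == "__main__":
    print(voice_key_to_model_url("en-us-amy-low"))
```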

requirements.txt CHANGED

@@ -5,11 +5,5 @@ uvicorn[standard]==0.24.0
 # File handling and HTTP
 python-multipart==0.0.6
-# Audio processing dependencies
-numpy>=1.21.0
-scipy>=1.7.0
-librosa>=0.9.0
-soundfile>=0.12.0
 # Essential utilities
 pydantic>=2.0.0

 # File handling and HTTP
 python-multipart==0.0.6
 # Essential utilities
 pydantic>=2.0.0

test_api.py CHANGED

@@ -1,6 +1,6 @@
 #!/usr/bin/env python3
 """
-Simple test script for the Text-to-Speech API
 Run this to test the API locally
 """
@@ -11,9 +11,15 @@ import os
 # Configuration
 API_BASE_URL = "http://localhost:7860"
 TEST_TEXTS = [
-    "Hello world, this is a test of the text to speech API.",
     "The quick brown fox jumps over the lazy dog.",
-    "Welcome to our production-ready TTS service!"
 ]
 def test_health_check():
@@ -24,12 +30,23 @@ def test_health_check():
         # Test root endpoint
         response = requests.get(f"{API_BASE_URL}/")
         print(f"GET / - Status: {response.status_code}")
-        print(f"Response: {response.json()}")
         # Test health endpoint
         response = requests.get(f"{API_BASE_URL}/health")
         print(f"GET /health - Status: {response.status_code}")
-        print(f"Response: {response.json()}")
     except requests.exceptions.ConnectionError:
         print("❌ Could not connect to the API. Make sure it's running on localhost:7860")
@@ -43,17 +60,19 @@ def test_get_endpoint():
     for i, text in enumerate(TEST_TEXTS):
         try:
             params = {
                 "text": text,
-                "language": "en"
             }
-            print(f"Testing text {i+1}: '{text[:30]}...'")
             response = requests.get(f"{API_BASE_URL}/tts", params=params)
             if response.status_code == 200:
                 # Save the audio file
-                filename = f"test_output_get_{i+1}.wav"
                 with open(filename, "wb") as f:
                     f.write(response.content)
                 print(f"✅ Audio saved as {filename} ({len(response.content)} bytes)")
@@ -69,17 +88,19 @@ def test_post_endpoint():
     for i, text in enumerate(TEST_TEXTS):
         try:
             data = {
                 "text": text,
-                "language": "en"
             }
-            print(f"Testing text {i+1}: '{text[:30]}...'")
             response = requests.post(f"{API_BASE_URL}/tts", json=data)
             if response.status_code == 200:
                 # Save the audio file
-                filename = f"test_output_post_{i+1}.wav"
                 with open(filename, "wb") as f:
                     f.write(response.content)
                 print(f"✅ Audio saved as {filename} ({len(response.content)} bytes)")
@@ -95,14 +116,15 @@ def test_form_endpoint():
     try:
         data = {
-            "text": "This is a test using form data submission.",
-            "language": "en"
         }
         response = requests.post(f"{API_BASE_URL}/tts", data=data)
         if response.status_code == 200:
-            filename = "test_output_form.wav"
             with open(filename, "wb") as f:
                 f.write(response.content)
             print(f"✅ Audio saved as {filename} ({len(response.content)} bytes)")
@@ -112,11 +134,36 @@ def test_form_endpoint():
     except Exception as e:
         print(f"❌ Exception: {str(e)}")
 def cleanup_test_files():
     """Clean up generated test files"""
     print("\n🧹 Cleaning up test files...")
-    test_files = [f for f in os.listdir(".") if f.startswith("test_output_") and f.endswith(".wav")]
     for file in test_files:
         try:
@@ -126,7 +173,7 @@ def cleanup_test_files():
             print(f"Could not remove {file}: {str(e)}")
 if __name__ == "__main__":
-    print("🚀 Starting TTS API Test Suite")
     print("=" * 50)
     # Test health check first
@@ -134,17 +181,24 @@ if __name__ == "__main__":
         print("\n❌ Health check failed. Exiting.")
         exit(1)
-    # Wait a moment for the model to be ready
-    print("\n⏳ Waiting for model to be ready...")
     time.sleep(2)
     # Run tests
     test_get_endpoint()
     test_post_endpoint()
     test_form_endpoint()
     print("\n" + "=" * 50)
     print("✅ Test suite completed!")
     print("\nTo clean up test files, run:")
     print("python test_api.py --cleanup")

 #!/usr/bin/env python3
 """
+Simple test script for the Piper TTS API
 Run this to test the API locally
 """
 # Configuration
 API_BASE_URL = "http://localhost:7860"
 TEST_TEXTS = [
+    "Hello world, this is a test of the Piper text to speech API.",
     "The quick brown fox jumps over the lazy dog.",
+    "Welcome to our production-ready TTS service using Piper!"
+]
+VOICES_TO_TEST = [
+    "en-us-amy-low",
+    "en-us-ryan-low",
+    "en-gb-alan-low"
 ]
 def test_health_check():
         # Test root endpoint
         response = requests.get(f"{API_BASE_URL}/")
         print(f"GET / - Status: {response.status_code}")
+        if response.status_code == 200:
+            data = response.json()
+            print(f"Available voices: {len(data.get('available_voices', []))}")
         # Test health endpoint
         response = requests.get(f"{API_BASE_URL}/health")
         print(f"GET /health - Status: {response.status_code}")
+        if response.status_code == 200:
+            data = response.json()
+            print(f"Piper available: {data.get('piper_available')}")
+        # Test voices endpoint
+        response = requests.get(f"{API_BASE_URL}/voices")
+        print(f"GET /voices - Status: {response.status_code}")
+        if response.status_code == 200:
+            data = response.json()
+            print(f"Total voices available: {len(data.get('voices', {}))}")
     except requests.exceptions.ConnectionError:
         print("❌ Could not connect to the API. Make sure it's running on localhost:7860")
     for i, text in enumerate(TEST_TEXTS):
         try:
+            voice = VOICES_TO_TEST[i % len(VOICES_TO_TEST)]
             params = {
                 "text": text,
+                "voice": voice,
+                "speed": 1.0
             }
+            print(f"Testing text {i+1}: '{text[:30]}...' with voice '{voice}'")
             response = requests.get(f"{API_BASE_URL}/tts", params=params)
             if response.status_code == 200:
                 # Save the audio file
+                filename = f"test_output_get_{i+1}_{voice}.wav"
                 with open(filename, "wb") as f:
                     f.write(response.content)
                 print(f"✅ Audio saved as {filename} ({len(response.content)} bytes)")
     for i, text in enumerate(TEST_TEXTS):
         try:
+            voice = VOICES_TO_TEST[i % len(VOICES_TO_TEST)]
             data = {
                 "text": text,
+                "voice": voice,
+                "speed": 1.2 if i % 2 else 0.9  # Test different speeds
             }
+            print(f"Testing text {i+1}: '{text[:30]}...' with voice '{voice}' at speed {data['speed']}")
             response = requests.post(f"{API_BASE_URL}/tts", json=data)
             if response.status_code == 200:
                 # Save the audio file
+                filename = f"test_output_post_{i+1}_{voice}_speed{data['speed']}.wav"
                 with open(filename, "wb") as f:
                     f.write(response.content)
                 print(f"✅ Audio saved as {filename} ({len(response.content)} bytes)")
     try:
         data = {
+            "text": "This is a test using form data submission with Piper TTS.",
+            "voice": "en-us-amy-medium",
+            "speed": "0.8"
         }
         response = requests.post(f"{API_BASE_URL}/tts", data=data)
         if response.status_code == 200:
+            filename = "test_output_form_piper.wav"
             with open(filename, "wb") as f:
                 f.write(response.content)
             print(f"✅ Audio saved as {filename} ({len(response.content)} bytes)")
     except Exception as e:
         print(f"❌ Exception: {str(e)}")
+def test_voice_variations():
+    """Test different voice qualities"""
+    print("\n🗣️ Testing voice quality variations...")
+    test_text = "This is a comparison of voice quality between low and medium quality models."
+    voices_to_compare = ["en-us-amy-low", "en-us-amy-medium"]
+    for voice in voices_to_compare:
+        try:
+            params = {"text": test_text, "voice": voice}
+            print(f"Testing voice: {voice}")
+            response = requests.get(f"{API_BASE_URL}/tts", params=params)
+            if response.status_code == 200:
+                filename = f"test_voice_comparison_{voice}.wav"
+                with open(filename, "wb") as f:
+                    f.write(response.content)
+                print(f"✅ Audio saved as {filename} ({len(response.content)} bytes)")
+            else:
+                print(f"❌ Error: {response.status_code} - {response.text}")
+        except Exception as e:
+            print(f"❌ Exception: {str(e)}")
 def cleanup_test_files():
     """Clean up generated test files"""
     print("\n🧹 Cleaning up test files...")
+    test_files = [f for f in os.listdir(".") if f.startswith("test_") and f.endswith(".wav")]
     for file in test_files:
         try:
             print(f"Could not remove {file}: {str(e)}")
 if __name__ == "__main__":
+    print("🚀 Starting Piper TTS API Test Suite")
     print("=" * 50)
     # Test health check first
         print("\n❌ Health check failed. Exiting.")
         exit(1)
+    # Wait a moment for Piper to be ready
+    print("\n⏳ Waiting for Piper TTS to be ready...")
     time.sleep(2)
     # Run tests
     test_get_endpoint()
     test_post_endpoint()
     test_form_endpoint()
+    test_voice_variations()
     print("\n" + "=" * 50)
     print("✅ Test suite completed!")
+    print("\nGenerated files demonstrate:")
+    print("- Different voices (amy, ryan, alan)")
+    print("- Quality variations (low vs medium)")
+    print("- Speed variations (0.8x to 1.2x)")
+    print("- Various input methods (GET, POST JSON, POST form)")
     print("\nTo clean up test files, run:")
     print("python test_api.py --cleanup")
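
Note on the tests above: they treat an HTTP 200 response with a non-zero byte count as success, which would also pass for a response body that is not playable audio. A minimal sketch of a stricter check using only the standard library, assuming the API returns PCM WAV data as the README states. The helper names `wav_duration_seconds` and `make_silence` are hypothetical and not part of the commit.

```python
import io
import wave


def wav_duration_seconds(data: bytes) -> float:
    """Parse WAV bytes and return the duration in seconds.

    Raises wave.Error (or EOFError for truncated input) when the bytes
    are not a readable PCM WAV file.
    """
    with wave.open(io.BytesIO(data), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
    return frames / rate


def make_silence(seconds: float = 0.5, rate: int = 16000) -> bytes:
    """Build a mono 16-bit silent WAV in memory, used here as test input."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()


if __name__ == "__main__":
    print(wav_duration_seconds(make_silence()))
```

In `test_api.py`, a call such as `wav_duration_seconds(response.content)` could replace the byte-count print, with a duration above zero counted as a pass.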