Space: samarthnaikk/ttlm


Samarth Naik committed on Jan 1

Commit

3135113

1 Parent(s): a66eb6f

feat: Add production-ready Text-to-Speech API with Coqui TTS

- Implement FastAPI-based TTS API with xtts_v2 model
- Add support for voice cloning with speaker WAV files
- Include comprehensive error handling and logging
- Add GET and POST endpoints for flexible usage
- Configure for CPU-only inference on Hugging Face Spaces
- Add test suite and documentation
- Update HF Space config from Gradio to FastAPI

Files changed (4)

README.md +104 -7
app.py +209 -0
requirements.txt +27 -0
test_api.py +154 -0

README.md CHANGED

@@ -1,12 +1,109 @@
 ---
-title: Ttlm
-emoji: 🐢
-colorFrom: green
-colorTo: red
-sdk: gradio
-sdk_version: 6.2.0
+title: Text-to-Speech API
+emoji: 🗣️
+colorFrom: blue
+colorTo: purple
+sdk: fastapi
 app_file: app.py
 pinned: false
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+# Text-to-Speech API with Coqui TTS
+A production-ready Text-to-Speech API built with FastAPI and Coqui TTS, designed to run on Hugging Face Spaces.
+## Features
+- **High-Quality TTS**: Uses Coqui's `xtts_v2` multilingual model
+- **Voice Cloning**: Optional speaker reference for voice cloning
+- **CPU-Only Inference**: Runs on CPU-only hardware; no GPU is required
+- **REST API**: Simple GET/POST endpoints
+- **Production Ready**: Proper error handling, logging, and health checks
+## API Usage
+### Simple GET Request
+```bash
+curl "https://your-space-url/tts?text=Hello%20world&language=en"
+```
+### POST with JSON
+```bash
+curl -X POST "https://your-space-url/tts" \
+  -H "Content-Type: application/json" \
+  -d '{"text": "Hello world", "language": "en"}'
+```
+### POST with Voice Cloning
+```bash
+curl -X POST "https://your-space-url/tts" \
+  -F "text=Hello world" \
+  -F "language=en" \
+  -F "speaker_wav=@path/to/speaker.wav"
+```
+## Endpoints
+- `GET /` - Health check
+- `GET /tts` - Simple text-to-speech conversion
+- `POST /tts` - Advanced TTS with optional voice cloning
+- `GET /health` - Detailed health status
+## Supported Languages
+The XTTS v2 model supports multiple languages including:
+- English (en)
+- Spanish (es)
+- French (fr)
+- German (de)
+- Italian (it)
+- Portuguese (pt)
+- Polish (pl)
+- Turkish (tr)
+- Russian (ru)
+- Dutch (nl)
+- Czech (cs)
+- Arabic (ar)
+- Chinese (zh-cn)
+- Japanese (ja)
+- Hungarian (hu)
+- Korean (ko)
+## Response
+The `/tts` endpoints return a WAV file (`audio/wav`) that can be played directly in browsers or audio players; `/` and `/health` return JSON status objects.
+## Local Development
+```bash
+# Install dependencies
+pip install -r requirements.txt
+# Run the application
+python app.py
+```
+The API will be available at `http://localhost:7860`
+## Model Information
+This application uses the `tts_models/multilingual/multi-dataset/xtts_v2` model from Coqui TTS, which provides:
+- High-quality multilingual speech synthesis
+- Voice cloning capabilities
+- Inference without a GPU (slower than GPU inference)
+- Support for 16+ languages
+## Error Handling
+The API includes comprehensive error handling for:
+- Invalid text input
+- Unsupported file formats
+- Model loading failures
+- Audio generation errors
+## Performance Notes
+- Model loads once at startup (not per request)
+- The model is explicitly moved to CPU, so no GPU is required
+- Temporary files are automatically cleaned up
+- Audio is returned with FastAPI's `FileResponse`, which streams the file from disk in chunks instead of loading it into memory
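The README's curl examples translate directly to Python. Below is a minimal stdlib-only client sketch, not part of this commit: `build_tts_url` and `wav_duration_seconds` are hypothetical helper names, and `your-space-url` is the same placeholder host used in the README examples. It percent-encodes the GET query string (so spaces become `%20`, as in the curl example) and sanity-checks that a `/tts` response body really is WAV audio.

```python
import io
import wave
from urllib.parse import quote, urlencode


def build_tts_url(base_url: str, text: str, language: str = "en") -> str:
    """Return a GET /tts URL with text and language percent-encoded."""
    # quote_via=quote encodes spaces as %20; urlencode's default (quote_plus) would use '+'
    query = urlencode({"text": text, "language": language}, quote_via=quote)
    return f"{base_url.rstrip('/')}/tts?{query}"


def wav_duration_seconds(data: bytes) -> float:
    """Return the duration of a WAV payload; raises wave.Error if it is not RIFF/WAVE data."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


print(build_tts_url("https://your-space-url", "Hello world"))
# https://your-space-url/tts?text=Hello%20world&language=en
```

Against a running Space, the bytes from `urllib.request.urlopen(build_tts_url(...), timeout=300).read()` could be passed to `wav_duration_seconds`; a body that is not WAV data raises `wave.Error` instead of silently producing an unplayable file.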

app.py ADDED

@@ -0,0 +1,209 @@

+"""
+Text-to-Speech API using Coqui TTS
+Production-ready FastAPI application for Hugging Face Spaces
+"""
+import os
+import tempfile
+import logging
+from pathlib import Path
+from typing import Optional
+import threading
+from fastapi import FastAPI, HTTPException, Request
+from fastapi.concurrency import run_in_threadpool
+from fastapi.responses import FileResponse
+from pydantic import BaseModel, ValidationError
+from starlette.background import BackgroundTask
+from starlette.datastructures import UploadFile as StarletteUploadFile
+import uvicorn
+# Import TTS
+try:
+    from TTS.api import TTS
+except ImportError as exc:
+    raise ImportError("TTS library not found. Please install coqui-tts: pip install coqui-tts") from exc
+# Configure logging
+logging.basicConfig(level=logging.INFO)
+logger = logging.getLogger(__name__)
+# Initialize FastAPI app
+app = FastAPI(
+    title="Text-to-Speech API",
+    description="Production-ready TTS API using Coqui TTS",
+    version="1.0.0"
+)
+# Global TTS model variable
+tts_model = None
+# Serialize model calls: sharing one TTS instance across threads is not assumed to be safe
+_tts_lock = threading.Lock()
+# Request models
+class TTSRequest(BaseModel):
+    text: str
+    language: Optional[str] = "en"
+def _remove_file(path: Optional[str]) -> None:
+    """Best-effort deletion of a temporary file."""
+    if path and os.path.exists(path):
+        try:
+            os.unlink(path)
+        except OSError as e:
+            logger.warning(f"Could not delete temporary file {path}: {e}")
+def _synthesize(**kwargs) -> None:
+    """Run the blocking tts_to_file call while holding the model lock."""
+    with _tts_lock:
+        tts_model.tts_to_file(**kwargs)
+@app.on_event("startup")
+async def startup_event():
+    """
+    Load the TTS model once at startup to avoid loading it on every request.
+    Uses the XTTS v2 multilingual model.
+    """
+    global tts_model
+    try:
+        logger.info("Loading TTS model...")
+        # Multilingual model that can run on CPU
+        model_name = "tts_models/multilingual/multi-dataset/xtts_v2"
+        tts_model = TTS(model_name=model_name, progress_bar=False)
+        # Ensure we're using CPU (important for Hugging Face Spaces)
+        if hasattr(tts_model, 'to'):
+            tts_model.to("cpu")
+        logger.info("TTS model loaded successfully!")
+    except Exception:
+        logger.exception("Failed to load TTS model")
+        raise
+@app.get("/")
+async def root():
+    """Health check endpoint"""
+    return {
+        "status": "healthy",
+        "message": "Text-to-Speech API is running",
+        "model": "tts_models/multilingual/multi-dataset/xtts_v2"
+    }
+@app.get("/tts")
+async def tts_get(text: str, language: str = "en"):
+    """
+    Simple GET endpoint for TTS
+    Usage: GET /tts?text=Hello%20world&language=en
+    """
+    if not text.strip():
+        raise HTTPException(status_code=400, detail="Text parameter is required")
+    return await generate_speech(text, language)
+@app.post("/tts")
+async def tts_post(request: Request):
+    """
+    POST endpoint for TTS with optional voice cloning.
+    Accepts a JSON body ({"text": ..., "language": ...}) or form data
+    (text, language, optional speaker_wav file upload).
+    FastAPI cannot combine a JSON body model with Form/File parameters in one
+    endpoint, so the body is parsed according to the Content-Type header.
+    """
+    content_type = request.headers.get("content-type", "")
+    speaker_wav = None
+    if content_type.startswith("application/json"):
+        try:
+            payload = TTSRequest(**(await request.json()))
+        except (ValueError, TypeError, ValidationError):
+            raise HTTPException(status_code=400, detail="Invalid JSON body")
+        input_text = payload.text
+        input_language = payload.language or "en"
+    else:
+        form = await request.form()
+        input_text = form.get("text")
+        input_language = form.get("language") or "en"
+        speaker_wav = form.get("speaker_wav")
+    if not isinstance(input_text, str) or not input_text.strip():
+        raise HTTPException(status_code=400, detail="Text is required and cannot be empty")
+    # Handle speaker WAV file if provided
+    speaker_wav_path = None
+    if isinstance(speaker_wav, StarletteUploadFile):
+        try:
+            # Save uploaded speaker file temporarily
+            speaker_suffix = Path(speaker_wav.filename or "").suffix or ".wav"
+            with tempfile.NamedTemporaryFile(delete=False, suffix=speaker_suffix) as tmp_speaker:
+                speaker_wav_path = tmp_speaker.name
+                tmp_speaker.write(await speaker_wav.read())
+        except Exception as e:
+            logger.error(f"Error processing speaker WAV file: {e}")
+            _remove_file(speaker_wav_path)
+            raise HTTPException(status_code=400, detail="Invalid speaker WAV file")
+    try:
+        return await generate_speech(input_text, input_language, speaker_wav_path)
+    finally:
+        # Clean up the uploaded speaker file
+        _remove_file(speaker_wav_path)
+async def generate_speech(text: str, language: str = "en", speaker_wav_path: Optional[str] = None):
+    """
+    Generate speech from text using the loaded TTS model.
+    The output file is deleted in the background once the response has been sent.
+    """
+    if tts_model is None:
+        raise HTTPException(status_code=503, detail="TTS model not loaded")
+    synth_kwargs = {"text": text, "language": language}
+    if speaker_wav_path and os.path.exists(speaker_wav_path):
+        # Voice cloning with speaker reference
+        logger.info("Using voice cloning with speaker reference")
+        synth_kwargs["speaker_wav"] = speaker_wav_path
+    else:
+        # XTTS v2 needs a reference voice; fall back to the first built-in
+        # speaker if the installed TTS version ships any
+        builtin_speakers = getattr(tts_model, "speakers", None) or []
+        if not builtin_speakers:
+            raise HTTPException(
+                status_code=400,
+                detail="This model needs a speaker_wav reference file (send it via POST /tts as form data)"
+            )
+        synth_kwargs["speaker"] = builtin_speakers[0]
+    # Create temporary file for output
+    with tempfile.NamedTemporaryFile(delete=False, suffix=".wav") as tmp_file:
+        output_path = tmp_file.name
+    try:
+        logger.info(f"Generating speech for text: '{text[:50]}...' in language: {language}")
+        # tts_to_file is synchronous and CPU-bound: run it in a worker thread so
+        # the event loop can keep serving other requests (e.g. /health)
+        await run_in_threadpool(_synthesize, file_path=output_path, **synth_kwargs)
+        # Verify the file was created and has content
+        if not os.path.exists(output_path) or os.path.getsize(output_path) == 0:
+            raise RuntimeError("Generated audio file is empty or was not created")
+    except Exception as e:
+        logger.exception("Error generating speech")
+        # Clean up output file on error
+        _remove_file(output_path)
+        raise HTTPException(status_code=500, detail=f"Failed to generate speech: {e}")
+    logger.info(f"Speech generated successfully, file size: {os.path.getsize(output_path)} bytes")
+    # Return the audio file; filename= already sets the Content-Disposition header
+    return FileResponse(
+        path=output_path,
+        media_type="audio/wav",
+        filename="generated_speech.wav",
+        headers={"Cache-Control": "no-cache"},
+        background=BackgroundTask(_remove_file, output_path)
+    )
+@app.get("/health")
+async def health_check():
+    """Detailed health check endpoint"""
+    return {
+        "status": "healthy",
+        "model_loaded": tts_model is not None,
+        "model_name": "tts_models/multilingual/multi-dataset/xtts_v2"
+    }
+if __name__ == "__main__":
+    # For local development
+    uvicorn.run(app, host="0.0.0.0", port=7860)
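`tts_to_file` is a synchronous, CPU-bound call. Calling such a function directly inside an `async def` endpoint blocks the event loop, so every other request (including `/health`) waits until it returns. The sketch below demonstrates the effect with a hypothetical stand-in, `blocking_synthesis`, and the standard-library `asyncio.to_thread` (Python 3.9+). In FastAPI the analogous tools are `fastapi.concurrency.run_in_threadpool`, or declaring the endpoint with plain `def`, which FastAPI runs in a thread pool.

```python
import asyncio
import time


def blocking_synthesis(text: str) -> str:
    """Hypothetical stand-in for a synchronous, slow call such as tts_to_file."""
    time.sleep(0.2)  # simulate 200 ms of blocking work
    return f"audio for {text!r}"


async def handle_request(text: str) -> str:
    # Offload the blocking call to a worker thread; the event loop stays free meanwhile
    return await asyncio.to_thread(blocking_synthesis, text)


async def main():
    start = time.perf_counter()
    # Two "requests" in flight at once: with to_thread they overlap (~0.2 s total);
    # calling blocking_synthesis directly here would serialize them (~0.4 s)
    results = await asyncio.gather(handle_request("a"), handle_request("b"))
    return results, time.perf_counter() - start


results, elapsed = asyncio.run(main())
print(results, f"{elapsed:.2f} s")
```

The point is the event loop's responsiveness, not raw throughput: even if model calls themselves are serialized, cheap requests such as health checks can still be answered while a generation is running.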

requirements.txt ADDED

@@ -0,0 +1,27 @@

+# Core web framework
+fastapi==0.104.1
+uvicorn[standard]==0.24.0
+# Text-to-Speech engine
+coqui-tts==0.21.1
+# File handling and HTTP
+python-multipart==0.0.6
+python-dateutil==2.8.2
+# Audio processing dependencies (required by Coqui TTS)
+numpy==1.24.3
+scipy==1.11.4
+librosa==0.10.1
+soundfile==0.12.1
+# Machine learning dependencies
+torch==2.0.1
+torchaudio==2.0.2
+# Hugging Face integration
+transformers==4.35.2
+# Utilities
+pydantic==2.5.0
+typing-extensions==4.8.0
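Every dependency above is pinned to an exact version, so a local environment that has drifted from these pins may behave differently from the Space. A small stdlib sketch follows; `parse_pins` and `report_mismatches` are hypothetical helpers. It compares `name==version` lines against what `importlib.metadata` reports as installed. It deliberately handles only exact `==` pins (dropping extras like `[standard]`), so it is not a general requirements-file parser.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_pins(requirements_text: str) -> dict:
    """Map package name -> pinned version for 'name==version' lines; skips comments and blanks."""
    pins = {}
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop inline and full-line comments
        if not line or "==" not in line:
            continue
        name, pinned = line.split("==", 1)
        # Drop extras such as uvicorn[standard] -> uvicorn
        pins[name.split("[", 1)[0].strip()] = pinned.strip()
    return pins


def report_mismatches(pins: dict) -> list:
    """Return (name, pinned, installed_or_None) for packages whose installed version differs."""
    mismatches = []
    for name, pinned in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # not installed at all
        if installed != pinned:
            mismatches.append((name, pinned, installed))
    return mismatches
```

Usage: `report_mismatches(parse_pins(open("requirements.txt").read()))` lists every package that is missing or installed at a different version than pinned.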

test_api.py ADDED

@@ -0,0 +1,154 @@

+#!/usr/bin/env python3
+"""
+Simple test script for the Text-to-Speech API
+Run this to test the API locally
+"""
+import requests
+import time
+import os
+# Configuration
+API_BASE_URL = "http://localhost:7860"
+TEST_TEXTS = [
+    "Hello world, this is a test of the text to speech API.",
+    "The quick brown fox jumps over the lazy dog.",
+    "Welcome to our production-ready TTS service!"
+]
+def test_health_check():
+    """Test the health check endpoints"""
+    print("🔍 Testing health check endpoints...")
+    try:
+        # Test root endpoint
+        response = requests.get(f"{API_BASE_URL}/")
+        print(f"GET / - Status: {response.status_code}")
+        print(f"Response: {response.json()}")
+        # Test health endpoint
+        response = requests.get(f"{API_BASE_URL}/health")
+        print(f"GET /health - Status: {response.status_code}")
+        print(f"Response: {response.json()}")
+    except requests.exceptions.ConnectionError:
+        print("❌ Could not connect to the API. Make sure it's running on localhost:7860")
+        return False
+    return True
+def test_get_endpoint():
+    """Test the GET TTS endpoint"""
+    print("\n🎤 Testing GET TTS endpoint...")
+    for i, text in enumerate(TEST_TEXTS):
+        try:
+            params = {
+                "text": text,
+                "language": "en"
+            }
+            print(f"Testing text {i+1}: '{text[:30]}...'")
+            response = requests.get(f"{API_BASE_URL}/tts", params=params)
+            if response.status_code == 200:
+                # Save the audio file
+                filename = f"test_output_get_{i+1}.wav"
+                with open(filename, "wb") as f:
+                    f.write(response.content)
+                print(f"✅ Audio saved as {filename} ({len(response.content)} bytes)")
+            else:
+                print(f"❌ Error: {response.status_code} - {response.text}")
+        except Exception as e:
+            print(f"❌ Exception: {str(e)}")
+def test_post_endpoint():
+    """Test the POST TTS endpoint"""
+    print("\n🎵 Testing POST TTS endpoint...")
+    for i, text in enumerate(TEST_TEXTS):
+        try:
+            data = {
+                "text": text,
+                "language": "en"
+            }
+            print(f"Testing text {i+1}: '{text[:30]}...'")
+            response = requests.post(f"{API_BASE_URL}/tts", json=data)
+            if response.status_code == 200:
+                # Save the audio file
+                filename = f"test_output_post_{i+1}.wav"
+                with open(filename, "wb") as f:
+                    f.write(response.content)
+                print(f"✅ Audio saved as {filename} ({len(response.content)} bytes)")
+            else:
+                print(f"❌ Error: {response.status_code} - {response.text}")
+        except Exception as e:
+            print(f"❌ Exception: {str(e)}")
+def test_form_endpoint():
+    """Test the POST TTS endpoint with form data"""
+    print("\n📋 Testing POST TTS endpoint with form data...")
+    try:
+        data = {
+            "text": "This is a test using form data submission.",
+            "language": "en"
+        }
+        response = requests.post(f"{API_BASE_URL}/tts", data=data)
+        if response.status_code == 200:
+            filename = "test_output_form.wav"
+            with open(filename, "wb") as f:
+                f.write(response.content)
+            print(f"✅ Audio saved as {filename} ({len(response.content)} bytes)")
+        else:
+            print(f"❌ Error: {response.status_code} - {response.text}")
+    except Exception as e:
+        print(f"❌ Exception: {str(e)}")
+def cleanup_test_files():
+    """Clean up generated test files"""
+    print("\n🧹 Cleaning up test files...")
+    test_files = [f for f in os.listdir(".") if f.startswith("test_output_") and f.endswith(".wav")]
+    for file in test_files:
+        try:
+            os.remove(file)
+            print(f"Removed {file}")
+        except Exception as e:
+            print(f"Could not remove {file}: {str(e)}")
+if __name__ == "__main__":
+    import sys
+    # --cleanup only removes earlier test_output_*.wav files; it does not rerun the tests
+    if "--cleanup" in sys.argv:
+        cleanup_test_files()
+        sys.exit(0)
+    print("🚀 Starting TTS API Test Suite")
+    print("=" * 50)
+    # Test health check first
+    if not test_health_check():
+        print("\n❌ Health check failed. Exiting.")
+        sys.exit(1)
+    # Wait a moment for the model to be ready
+    print("\n⏳ Waiting for model to be ready...")
+    time.sleep(2)
+    # Run tests
+    test_get_endpoint()
+    test_post_endpoint()
+    test_form_endpoint()
+    print("\n" + "=" * 50)
+    print("✅ Test suite completed!")
+    print("\nTo clean up test files, run:")
+    print("python test_api.py --cleanup")
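`test_api.py` exercises the GET, JSON POST, and plain form POST paths, but not voice cloning (a `speaker_wav` upload). With the `requests` library the script already uses, that is a `requests.post(url, data={...}, files={"speaker_wav": open("speaker.wav", "rb")})` call. For a dependency-free variant, the sketch below hand-builds the equivalent `multipart/form-data` body. `encode_multipart` is a hypothetical helper, and labeling every file part `audio/wav` is an assumption that suits speaker references.

```python
import uuid
from typing import Dict, Tuple


def encode_multipart(fields: Dict[str, str], files: Dict[str, Tuple[str, bytes]]) -> Tuple[bytes, str]:
    """Encode form fields and file parts as a multipart/form-data body.

    files maps a field name to (filename, content). Returns (body, content_type).
    """
    boundary = uuid.uuid4().hex  # random, so it is very unlikely to occur in the content
    parts = []
    for name, value in fields.items():
        parts.append(f"--{boundary}".encode())
        parts.append(f'Content-Disposition: form-data; name="{name}"'.encode())
        parts.append(b"")  # blank line separates part headers from the part body
        parts.append(str(value).encode("utf-8"))
    for name, (filename, content) in files.items():
        parts.append(f"--{boundary}".encode())
        parts.append(f'Content-Disposition: form-data; name="{name}"; filename="{filename}"'.encode())
        parts.append(b"Content-Type: audio/wav")  # assumption: all uploads are WAV files
        parts.append(b"")
        parts.append(content)
    parts.append(f"--{boundary}--".encode())  # closing delimiter
    parts.append(b"")  # makes the body end with CRLF
    return b"\r\n".join(parts), f"multipart/form-data; boundary={boundary}"
```

The result can be sent with `urllib.request.Request(f"{API_BASE_URL}/tts", data=body, headers={"Content-Type": content_type}, method="POST")`, using the field names `text`, `language`, and `speaker_wav` from the README's curl example.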