Samarth Naik committed
Commit be42ab9 · 1 parent: 9edbda4
feat: Switch from Coqui TTS to Piper TTS for better performance
- Replace heavy Coqui TTS with lightweight Piper TTS
- Add support for multiple voice models and quality levels
- Implement speed control for speech synthesis
- Dramatically reduce Docker image size and build time
- Add voice discovery endpoint (/voices)
- Automatic model downloading on first use
- Update all documentation and test scripts
- Optimize for fast CPU-only inference on HF Spaces
- Dockerfile +12 -12
- README.md +59 -48
- app.py +170 -84
- requirements.txt +0 -6
- test_api.py +72 -18
Dockerfile
CHANGED
|
@@ -3,30 +3,30 @@ FROM python:3.10-slim
|
|
| 3 |
# Set working directory
|
| 4 |
WORKDIR /app
|
| 5 |
|
| 6 |
-
# Install system dependencies required for
|
| 7 |
RUN apt-get update && apt-get install -y \
|
| 8 |
-
build-essential \
|
| 9 |
-
libsndfile1-dev \
|
| 10 |
-
ffmpeg \
|
| 11 |
-
git \
|
| 12 |
wget \
|
|
|
|
| 13 |
&& apt-get clean \
|
| 14 |
&& rm -rf /var/lib/apt/lists/*
|
| 15 |
|
| 16 |
-
#
|
| 17 |
-
RUN
|
| 18 |
|
| 19 |
-
# Copy requirements
|
| 20 |
COPY requirements.txt .
|
| 21 |
-
|
| 22 |
-
# Install Python dependencies with more robust approach
|
| 23 |
-
RUN pip install --no-cache-dir torch torchaudio --index-url https://download.pytorch.org/whl/cpu
|
| 24 |
-
RUN pip install --no-cache-dir TTS==0.22.0
|
| 25 |
RUN pip install --no-cache-dir -r requirements.txt
|
| 26 |
|
| 27 |
# Copy application code
|
| 28 |
COPY . .
|
| 29 |
|
|
|
|
|
|
|
|
|
|
| 30 |
# Expose port for Hugging Face Spaces
|
| 31 |
EXPOSE 7860
|
| 32 |
|
|
|
|
| 3 |
# Set working directory
|
| 4 |
WORKDIR /app
|
| 5 |
|
| 6 |
+
# Install system dependencies required for Piper TTS
|
| 7 |
RUN apt-get update && apt-get install -y \
|
|
|
|
|
|
|
|
|
|
|
|
|
| 8 |
wget \
|
| 9 |
+
curl \
|
| 10 |
&& apt-get clean \
|
| 11 |
&& rm -rf /var/lib/apt/lists/*
|
| 12 |
|
| 13 |
+
# Install Piper TTS binary
|
| 14 |
+
RUN wget -O piper.tar.gz "https://github.com/rhasspy/piper/releases/download/2023.11.14-2/piper_linux_x86_64.tar.gz" \
|
| 15 |
+
&& tar -xzf piper.tar.gz \
|
| 16 |
+
&& mv piper/piper /usr/local/bin/ \
|
| 17 |
+
&& chmod +x /usr/local/bin/piper \
|
| 18 |
+
&& rm -rf piper.tar.gz piper
|
| 19 |
|
| 20 |
+
# Copy requirements and install Python dependencies
|
| 21 |
COPY requirements.txt .
|
|
|
|
|
|
|
|
|
|
|
|
|
| 22 |
RUN pip install --no-cache-dir -r requirements.txt
|
| 23 |
|
| 24 |
# Copy application code
|
| 25 |
COPY . .
|
| 26 |
|
| 27 |
+
# Create models directory
|
| 28 |
+
RUN mkdir -p ./piper_models
|
| 29 |
+
|
| 30 |
# Expose port for Hugging Face Spaces
|
| 31 |
EXPOSE 7860
|
| 32 |
|
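The new Dockerfile installs the `piper` executable into `/usr/local/bin`. As a minimal sketch (not part of this commit), an application could confirm at startup that the binary is reachable by using the standard library's `shutil.which`. The helper name `find_piper` is hypothetical.

```python
import shutil
from typing import Optional


def find_piper(executable: str = "piper") -> Optional[str]:
    """Return the path of the executable if it is on PATH, else None."""
    # shutil.which performs a PATH lookup and returns None for a missing
    # name instead of raising, so no try/except is needed here.
    return shutil.which(executable)


if __name__ == "__main__":
    path = find_piper()
    print(f"piper found at {path}" if path else "piper is not on PATH")
```

A lookup like this avoids spawning a subprocess just to learn whether the binary exists. Launching a missing executable with `subprocess.run` raises `FileNotFoundError` rather than returning a non-zero exit code.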
README.md
CHANGED
|
@@ -8,70 +8,73 @@ app_file: app.py
|
|
| 8 |
pinned: false
|
| 9 |
---
|
| 10 |
|
| 11 |
-
# Text-to-Speech API with
|
| 12 |
|
| 13 |
-
A production-ready Text-to-Speech API built with FastAPI and
|
| 14 |
|
| 15 |
## Features
|
| 16 |
|
| 17 |
-
- **High-Quality TTS**: Uses
|
| 18 |
-
- **
|
| 19 |
-
- **
|
| 20 |
-
- **REST API**: Simple GET/POST endpoints
|
| 21 |
- **Production Ready**: Proper error handling, logging, and health checks
|
|
|
|
| 22 |
|
| 23 |
## API Usage
|
| 24 |
|
| 25 |
### Simple GET Request
|
| 26 |
```bash
|
| 27 |
-
curl "https://your-space-url/tts?text=Hello%20world&
|
| 28 |
```
|
| 29 |
|
| 30 |
### POST with JSON
|
| 31 |
```bash
|
| 32 |
curl -X POST "https://your-space-url/tts" \
|
| 33 |
-H "Content-Type: application/json" \
|
| 34 |
-
-d '{"text": "Hello world", "
|
| 35 |
```
|
| 36 |
|
| 37 |
-
### POST with
|
| 38 |
```bash
|
| 39 |
curl -X POST "https://your-space-url/tts" \
|
| 40 |
-F "text=Hello world" \
|
| 41 |
-
-F "
|
| 42 |
-
-F "
|
| 43 |
```
|
| 44 |
|
| 45 |
## Endpoints
|
| 46 |
|
| 47 |
-
- `GET /` - Health check
|
|
|
|
| 48 |
- `GET /tts` - Simple text-to-speech conversion
|
| 49 |
-
- `POST /tts` - Advanced TTS with
|
| 50 |
- `GET /health` - Detailed health status
|
| 51 |
|
| 52 |
-
##
|
| 53 |
-
|
| 54 |
-
|
| 55 |
-
-
|
| 56 |
-
-
|
| 57 |
-
- French (fr)
|
| 58 |
-
- German (de)
|
| 59 |
-
- Italian (it)
|
| 60 |
-
- Portuguese (pt)
|
| 61 |
-
- Polish (pl)
|
| 62 |
-
- Turkish (tr)
|
| 63 |
-
- Russian (ru)
|
| 64 |
-
- Dutch (nl)
|
| 65 |
-
- Czech (cs)
|
| 66 |
-
- Arabic (ar)
|
| 67 |
-
- Chinese (zh-cn)
|
| 68 |
-
- Japanese (ja)
|
| 69 |
-
- Hungarian (hu)
|
| 70 |
-
- Korean (ko)
|
| 71 |
|
| 72 |
## Response
|
| 73 |
|
| 74 |
-
All endpoints return a WAV audio file that can be played directly in browsers or audio players.
|
| 75 |
|
| 76 |
## Local Development
|
| 77 |
|
|
@@ -79,31 +82,39 @@ All endpoints return a WAV audio file that can be played directly in browsers or
|
|
| 79 |
# Install dependencies
|
| 80 |
pip install -r requirements.txt
|
| 81 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 82 |
# Run the application
|
| 83 |
python app.py
|
| 84 |
```
|
| 85 |
|
| 86 |
The API will be available at `http://localhost:7860`
|
| 87 |
|
| 88 |
-
##
|
| 89 |
|
| 90 |
-
|
| 91 |
-
|
| 92 |
-
-
|
| 93 |
-
-
|
| 94 |
-
-
|
|
|
|
|
|
|
| 95 |
|
| 96 |
## Error Handling
|
| 97 |
|
| 98 |
The API includes comprehensive error handling for:
|
| 99 |
- Invalid text input
|
| 100 |
-
- Unsupported
|
| 101 |
-
- Model
|
| 102 |
- Audio generation errors
|
| 103 |
-
|
| 104 |
-
## Performance Notes
|
| 105 |
-
|
| 106 |
-
- Model loads once at startup (not per request)
|
| 107 |
-
- Optimized for CPU inference
|
| 108 |
-
- Temporary files are automatically cleaned up
|
| 109 |
-
- Response streaming for large audio files
|
|
|
|
| 8 |
pinned: false
|
| 9 |
---
|
| 10 |
|
| 11 |
+
# Text-to-Speech API with Piper TTS
|
| 12 |
|
| 13 |
+
A production-ready Text-to-Speech API built with FastAPI and Piper TTS, designed to run on Hugging Face Spaces.
|
| 14 |
|
| 15 |
## Features
|
| 16 |
|
| 17 |
+
- **High-Quality TTS**: Uses Piper's neural TTS models
|
| 18 |
+
- **Multiple Voices**: Support for various languages and voice styles
|
| 19 |
+
- **Fast & Lightweight**: ONNX-based models for efficient CPU inference
|
|
|
|
| 20 |
- **Production Ready**: Proper error handling, logging, and health checks
|
| 21 |
+
- **Easy Deployment**: Optimized for containerized environments
|
| 22 |
|
| 23 |
## API Usage
|
| 24 |
|
| 25 |
### Simple GET Request
|
| 26 |
```bash
|
| 27 |
+
curl "https://your-space-url/tts?text=Hello%20world&voice=en-us-amy-low"
|
| 28 |
```
|
| 29 |
|
| 30 |
### POST with JSON
|
| 31 |
```bash
|
| 32 |
curl -X POST "https://your-space-url/tts" \
|
| 33 |
-H "Content-Type: application/json" \
|
| 34 |
+
-d '{"text": "Hello world", "voice": "en-us-amy-medium", "speed": 1.0}'
|
| 35 |
```
|
| 36 |
|
| 37 |
+
### POST with Form Data
|
| 38 |
```bash
|
| 39 |
curl -X POST "https://your-space-url/tts" \
|
| 40 |
-F "text=Hello world" \
|
| 41 |
+
-F "voice=en-us-ryan-low" \
|
| 42 |
+
-F "speed=1.2"
|
| 43 |
```
|
| 44 |
|
| 45 |
+
## Available Voices
|
| 46 |
+
|
| 47 |
+
Get the full list of available voices:
|
| 48 |
+
```bash
|
| 49 |
+
curl "https://your-space-url/voices"
|
| 50 |
+
```
|
| 51 |
+
|
| 52 |
+
### Supported Voices Include:
|
| 53 |
+
- **English (US)**: `en-us-amy-low`, `en-us-amy-medium`, `en-us-ryan-low`, `en-us-ryan-medium`
|
| 54 |
+
- **English (GB)**: `en-gb-alan-low`, `en-gb-alan-medium`
|
| 55 |
+
- **German**: `de-de-thorsten-low`, `de-de-thorsten-medium`
|
| 56 |
+
- **Spanish**: `es-es-marta-low`, `es-es-marta-medium`
|
| 57 |
+
- **French**: `fr-fr-siwis-low`, `fr-fr-siwis-medium`
|
| 58 |
+
|
| 59 |
+
*Note: `-low` voices are faster but lower quality; `-medium` voices have better quality but are slower.*
|
| 60 |
+
|
| 61 |
## Endpoints
|
| 62 |
|
| 63 |
+
- `GET /` - Health check and available voices
|
| 64 |
+
- `GET /voices` - List all available voices
|
| 65 |
- `GET /tts` - Simple text-to-speech conversion
|
| 66 |
+
- `POST /tts` - Advanced TTS with voice and speed control
|
| 67 |
- `GET /health` - Detailed health status
|
| 68 |
|
| 69 |
+
## Parameters
|
| 70 |
+
|
| 71 |
+
- **text** (required): Text to convert to speech
|
| 72 |
+
- **voice** (optional): Voice to use (default: `en-us-amy-low`)
|
| 73 |
+
- **speed** (optional): Speech speed multiplier (default: 1.0, range: 0.5-2.0)
|
| 74 |
|
| 75 |
## Response
|
| 76 |
|
| 77 |
+
All TTS endpoints return a WAV audio file that can be played directly in browsers or audio players.
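A client can sanity-check that a response body really is WAV audio before saving it. The sketch below is an illustration under assumptions, not part of this commit: the helper name `describe_wav` is hypothetical, and it relies only on Python's standard `wave` and `io` modules.

```python
import io
import wave


def describe_wav(payload: bytes) -> dict:
    """Parse WAV bytes and return basic stream properties.

    wave.open raises wave.Error (or EOFError for truncated input)
    when the payload is not a readable WAV file.
    """
    with wave.open(io.BytesIO(payload), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "duration_seconds": frames / rate if rate else 0.0,
        }
```

Passing `response.content` from a `/tts` call to a helper like this would surface an error page or empty body early, rather than writing an unplayable `.wav` file to disk.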
|
| 78 |
|
| 79 |
## Local Development
|
| 80 |
|
|
|
|
| 82 |
# Install dependencies
|
| 83 |
pip install -r requirements.txt
|
| 84 |
|
| 85 |
+
# Install Piper TTS binary (Linux x86_64)
|
| 86 |
+
wget -O piper.tar.gz "https://github.com/rhasspy/piper/releases/download/2023.11.14-2/piper_linux_x86_64.tar.gz"
|
| 87 |
+
tar -xzf piper.tar.gz
|
| 88 |
+
sudo mv piper/piper /usr/local/bin/
|
| 89 |
+
chmod +x /usr/local/bin/piper
|
| 90 |
+
|
| 91 |
# Run the application
|
| 92 |
python app.py
|
| 93 |
```
|
| 94 |
|
| 95 |
The API will be available at `http://localhost:7860`
|
| 96 |
|
| 97 |
+
## About Piper TTS
|
| 98 |
+
|
| 99 |
+
This application uses [Piper TTS](https://github.com/rhasspy/piper) by Rhasspy, which provides:
|
| 100 |
+
- High-quality neural text-to-speech
|
| 101 |
+
- ONNX-based models for efficient CPU inference
|
| 102 |
+
- Multiple languages and voice styles
|
| 103 |
+
- Fast synthesis speeds
|
| 104 |
+
- Small model sizes perfect for deployment
|
| 105 |
|
| 106 |
+
## Performance Notes
|
| 107 |
+
|
| 108 |
+
- Models are downloaded automatically on first use
|
| 109 |
+
- Cached models for faster subsequent requests
|
| 110 |
+
- Optimized for CPU inference
|
| 111 |
+
- Temporary files are automatically cleaned up
|
| 112 |
+
- Average synthesis time: ~1-3 seconds for typical sentences
|
| 113 |
|
| 114 |
## Error Handling
|
| 115 |
|
| 116 |
The API includes comprehensive error handling for:
|
| 117 |
- Invalid text input
|
| 118 |
+
- Unsupported voice selection
|
| 119 |
+
- Model download failures
|
| 120 |
- Audio generation errors
|
|
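The README's GET example can also be assembled programmatically. The following is a small sketch rather than code from this commit: `build_tts_url` is a hypothetical helper, and `https://your-space-url` is the same placeholder host the README uses. The parameter names `text`, `voice`, and `speed` and the default voice come from the README.

```python
from urllib.parse import quote, urlencode


def build_tts_url(
    base_url: str,
    text: str,
    voice: str = "en-us-amy-low",
    speed: float = 1.0,
) -> str:
    """Build a GET /tts URL with percent-encoded query parameters."""
    # quote_via=quote encodes spaces as %20, matching the README's curl example.
    query = urlencode(
        {"text": text, "voice": voice, "speed": speed},
        quote_via=quote,
    )
    return f"{base_url.rstrip('/')}/tts?{query}"


if __name__ == "__main__":
    print(build_tts_url("https://your-space-url", "Hello world"))
```

Encoding the query through `urlencode` keeps punctuation and non-ASCII text in `text` from breaking the URL, which hand-written strings tend to get wrong.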
app.py
CHANGED
|
@@ -1,170 +1,235 @@
|
|
| 1 |
"""
|
| 2 |
-
Text-to-Speech API using
|
| 3 |
Production-ready FastAPI application for Hugging Face Spaces
|
| 4 |
"""
|
| 5 |
|
| 6 |
import os
|
| 7 |
import tempfile
|
| 8 |
import logging
|
|
|
|
| 9 |
from pathlib import Path
|
| 10 |
from typing import Optional
|
|
|
|
| 11 |
|
| 12 |
from fastapi import FastAPI, HTTPException, UploadFile, File, Form
|
| 13 |
from fastapi.responses import FileResponse
|
| 14 |
from pydantic import BaseModel
|
| 15 |
import uvicorn
|
| 16 |
|
| 17 |
-
# Import TTS
|
| 18 |
-
try:
|
| 19 |
-
from TTS.api import TTS
|
| 20 |
-
except ImportError:
|
| 21 |
-
raise ImportError("TTS library not found. Please install coqui-tts: pip install coqui-tts")
|
| 22 |
-
|
| 23 |
# Configure logging
|
| 24 |
logging.basicConfig(level=logging.INFO)
|
| 25 |
logger = logging.getLogger(__name__)
|
| 26 |
|
| 27 |
# Initialize FastAPI app
|
| 28 |
app = FastAPI(
|
| 29 |
-
title="Text-to-Speech API",
|
| 30 |
-
description="Production-ready TTS API using
|
| 31 |
version="1.0.0"
|
| 32 |
)
|
| 33 |
|
| 34 |
-
#
|
| 35 |
-
|
| 36 |
|
| 37 |
# Request models
|
| 38 |
class TTSRequest(BaseModel):
|
| 39 |
text: str
|
| 40 |
-
|
|
|
|
| 41 |
|
| 42 |
|
| 43 |
@app.on_event("startup")
|
| 44 |
async def startup_event():
|
| 45 |
"""
|
| 46 |
-
|
| 47 |
-
Using the highest-quality open-source multilingual model.
|
| 48 |
"""
|
| 49 |
-
global tts_model
|
| 50 |
try:
|
| 51 |
-
logger.info("
|
| 52 |
-
|
| 53 |
-
|
| 54 |
-
|
| 55 |
|
| 56 |
-
|
| 57 |
-
if hasattr(tts_model, 'to'):
|
| 58 |
-
tts_model.to("cpu")
|
| 59 |
|
| 60 |
-
logger.info("TTS model loaded successfully!")
|
| 61 |
except Exception as e:
|
| 62 |
-
logger.error(f"Failed to
|
| 63 |
raise e
|
| 64 |
|
| 65 |
| 66 |
@app.get("/")
|
| 67 |
async def root():
|
| 68 |
"""Health check endpoint"""
|
| 69 |
return {
|
| 70 |
"status": "healthy",
|
| 71 |
-
"message": "Text-to-Speech API is running",
|
| 72 |
-
"
|
| 73 |
}
|
| 74 |
|
| 75 |
|
| 76 |
@app.get("/tts")
|
| 77 |
-
async def tts_get(
|
|
|
|
|
|
|
|
|
|
|
|
|
| 78 |
"""
|
| 79 |
Simple GET endpoint for TTS
|
| 80 |
-
Usage: GET /tts?text=Hello%20world&
|
| 81 |
"""
|
| 82 |
if not text or len(text.strip()) == 0:
|
| 83 |
raise HTTPException(status_code=400, detail="Text parameter is required")
|
| 84 |
|
| 85 |
-
|
|
|
|
|
|
|
|
|
|
| 86 |
|
| 87 |
|
| 88 |
@app.post("/tts")
|
| 89 |
async def tts_post(
|
| 90 |
request: TTSRequest = None,
|
| 91 |
text: str = Form(None),
|
| 92 |
-
|
| 93 |
-
|
| 94 |
):
|
| 95 |
"""
|
| 96 |
-
POST endpoint for TTS
|
| 97 |
-
Accepts JSON body or form data
|
| 98 |
"""
|
| 99 |
# Handle different input formats
|
| 100 |
if request:
|
| 101 |
input_text = request.text
|
| 102 |
-
|
|
|
|
| 103 |
elif text:
|
| 104 |
input_text = text
|
| 105 |
-
|
|
|
|
| 106 |
else:
|
| 107 |
raise HTTPException(status_code=400, detail="Text is required")
|
| 108 |
|
| 109 |
if not input_text or len(input_text.strip()) == 0:
|
| 110 |
raise HTTPException(status_code=400, detail="Text cannot be empty")
|
| 111 |
|
| 112 |
-
|
| 113 |
-
|
| 114 |
-
if speaker_wav:
|
| 115 |
-
try:
|
| 116 |
-
# Save uploaded speaker file temporarily
|
| 117 |
-
speaker_suffix = Path(speaker_wav.filename).suffix if speaker_wav.filename else ".wav"
|
| 118 |
-
with tempfile.NamedTemporaryFile(delete=False, suffix=speaker_suffix) as tmp_speaker:
|
| 119 |
-
content = await speaker_wav.read()
|
| 120 |
-
tmp_speaker.write(content)
|
| 121 |
-
speaker_wav_path = tmp_speaker.name
|
| 122 |
-
except Exception as e:
|
| 123 |
-
logger.error(f"Error processing speaker WAV file: {str(e)}")
|
| 124 |
-
raise HTTPException(status_code=400, detail="Invalid speaker WAV file")
|
| 125 |
|
| 126 |
-
|
| 127 |
-
return await generate_speech(input_text, input_language, speaker_wav_path)
|
| 128 |
-
finally:
|
| 129 |
-
# Clean up speaker file
|
| 130 |
-
if speaker_wav_path and os.path.exists(speaker_wav_path):
|
| 131 |
-
try:
|
| 132 |
-
os.unlink(speaker_wav_path)
|
| 133 |
-
except:
|
| 134 |
-
pass
|
| 135 |
|
| 136 |
|
| 137 |
-
async def generate_speech(text: str,
|
| 138 |
"""
|
| 139 |
-
Generate speech from text using
|
| 140 |
"""
|
| 141 |
-
if not tts_model:
|
| 142 |
-
raise HTTPException(status_code=503, detail="TTS model not loaded")
|
| 143 |
-
|
| 144 |
try:
|
|
|
|
|
|
|
|
|
|
| 145 |
# Create temporary file for output
|
| 146 |
with tempfile.NamedTemporaryFile(delete=False, suffix=".wav") as tmp_file:
|
| 147 |
output_path = tmp_file.name
|
| 148 |
|
| 149 |
-
logger.info(f"Generating speech for text: '{text[:50]}...'
|
| 150 |
-
|
| 151 |
-
#
|
| 152 |
-
|
| 153 |
-
|
| 154 |
-
|
| 155 |
-
|
| 156 |
-
|
| 157 |
-
|
| 158 |
-
|
| 159 |
-
|
| 160 |
-
|
| 161 |
-
|
| 162 |
-
|
| 163 |
-
|
| 164 |
-
|
| 165 |
-
|
| 166 |
-
|
| 167 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 168 |
|
| 169 |
# Verify the file was created and has content
|
| 170 |
if not os.path.exists(output_path) or os.path.getsize(output_path) == 0:
|
|
@@ -183,6 +248,16 @@ async def generate_speech(text: str, language: str = "en", speaker_wav_path: str
|
|
| 183 |
}
|
| 184 |
)
|
| 185 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 186 |
except Exception as e:
|
| 187 |
logger.error(f"Error generating speech: {str(e)}")
|
| 188 |
# Clean up output file on error
|
|
@@ -197,10 +272,21 @@ async def generate_speech(text: str, language: str = "en", speaker_wav_path: str
|
|
| 197 |
@app.get("/health")
|
| 198 |
async def health_check():
|
| 199 |
"""Detailed health check endpoint"""
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 200 |
return {
|
| 201 |
-
"status": "healthy",
|
| 202 |
-
"
|
| 203 |
-
"
|
|
|
|
|
|
|
| 204 |
}
|
| 205 |
|
| 206 |
|
|
|
|
| 1 |
"""
|
| 2 |
+
Text-to-Speech API using Piper TTS
|
| 3 |
Production-ready FastAPI application for Hugging Face Spaces
|
| 4 |
"""
|
| 5 |
|
| 6 |
import os
|
| 7 |
import tempfile
|
| 8 |
import logging
|
| 9 |
+
import subprocess
|
| 10 |
from pathlib import Path
|
| 11 |
from typing import Optional
|
| 12 |
+
import shutil
|
| 13 |
|
| 14 |
from fastapi import FastAPI, HTTPException, UploadFile, File, Form
|
| 15 |
from fastapi.responses import FileResponse
|
| 16 |
from pydantic import BaseModel
|
| 17 |
import uvicorn
|
| 18 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 19 |
# Configure logging
|
| 20 |
logging.basicConfig(level=logging.INFO)
|
| 21 |
logger = logging.getLogger(__name__)
|
| 22 |
|
| 23 |
# Initialize FastAPI app
|
| 24 |
app = FastAPI(
|
| 25 |
+
title="Text-to-Speech API with Piper",
|
| 26 |
+
description="Production-ready TTS API using Piper TTS",
|
| 27 |
version="1.0.0"
|
| 28 |
)
|
| 29 |
|
| 30 |
+
# Available Piper voices
|
| 31 |
+
AVAILABLE_VOICES = {
|
| 32 |
+
"en-us-amy-low": "English (US) - Amy (Low Quality, Fast)",
|
| 33 |
+
"en-us-amy-medium": "English (US) - Amy (Medium Quality)",
|
| 34 |
+
"en-us-ryan-low": "English (US) - Ryan (Low Quality, Fast)",
|
| 35 |
+
"en-us-ryan-medium": "English (US) - Ryan (Medium Quality)",
|
| 36 |
+
"en-gb-alan-low": "English (GB) - Alan (Low Quality, Fast)",
|
| 37 |
+
"en-gb-alan-medium": "English (GB) - Alan (Medium Quality)",
|
| 38 |
+
"de-de-thorsten-low": "German - Thorsten (Low Quality, Fast)",
|
| 39 |
+
"de-de-thorsten-medium": "German - Thorsten (Medium Quality)",
|
| 40 |
+
"es-es-marta-low": "Spanish - Marta (Low Quality, Fast)",
|
| 41 |
+
"es-es-marta-medium": "Spanish - Marta (Medium Quality)",
|
| 42 |
+
"fr-fr-siwis-low": "French - Siwis (Low Quality, Fast)",
|
| 43 |
+
"fr-fr-siwis-medium": "French - Siwis (Medium Quality)",
|
| 44 |
+
}
|
| 45 |
+
|
| 46 |
+
# Default voice
|
| 47 |
+
DEFAULT_VOICE = "en-us-amy-low"
|
| 48 |
|
| 49 |
# Request models
|
| 50 |
class TTSRequest(BaseModel):
|
| 51 |
text: str
|
| 52 |
+
voice: Optional[str] = DEFAULT_VOICE
|
| 53 |
+
speed: Optional[float] = 1.0
|
| 54 |
|
| 55 |
|
| 56 |
@app.on_event("startup")
|
| 57 |
async def startup_event():
|
| 58 |
"""
|
| 59 |
+
Initialize Piper TTS - download default model if needed
|
|
|
|
| 60 |
"""
|
|
|
|
| 61 |
try:
|
| 62 |
+
logger.info("Initializing Piper TTS...")
|
| 63 |
+
|
| 64 |
+
# Check if piper is available
|
| 65 |
+
result = subprocess.run(["piper", "--help"], capture_output=True, text=True)
|
| 66 |
+
if result.returncode == 0:
|
| 67 |
+
logger.info("Piper TTS is available!")
|
| 68 |
+
else:
|
| 69 |
+
logger.error("Piper TTS not found in PATH")
|
| 70 |
+
|
| 71 |
+
# Create models directory
|
| 72 |
+
models_dir = Path("./piper_models")
|
| 73 |
+
models_dir.mkdir(exist_ok=True)
|
| 74 |
+
|
| 75 |
+
# Download default voice model if not exists
|
| 76 |
+
await download_voice_model(DEFAULT_VOICE)
|
| 77 |
|
| 78 |
+
logger.info("Piper TTS initialized successfully!")
|
|
|
|
|
|
|
| 79 |
|
|
|
|
| 80 |
except Exception as e:
|
| 81 |
+
logger.error(f"Failed to initialize Piper TTS: {str(e)}")
|
| 82 |
raise e
|
| 83 |
|
| 84 |
|
| 85 |
+
async def download_voice_model(voice: str):
|
| 86 |
+
"""Download Piper voice model if not already present"""
|
| 87 |
+
models_dir = Path("./piper_models")
|
| 88 |
+
model_file = models_dir / f"{voice}.onnx"
|
| 89 |
+
config_file = models_dir / f"{voice}.onnx.json"
|
| 90 |
+
|
| 91 |
+
if model_file.exists() and config_file.exists():
|
| 92 |
+
logger.info(f"Voice model {voice} already exists")
|
| 93 |
+
return
|
| 94 |
+
|
| 95 |
+
logger.info(f"Downloading voice model: {voice}")
|
| 96 |
+
|
| 97 |
+
# Piper model URLs (using official repository)
|
| 98 |
+
base_url = "https://github.com/rhasspy/piper/releases/download/2023.11.14-2"
|
| 99 |
+
|
| 100 |
+
try:
|
| 101 |
+
# Download model file
|
| 102 |
+
model_url = f"{base_url}/{voice}.onnx"
|
| 103 |
+
subprocess.run([
|
| 104 |
+
"wget", "-q", "-O", str(model_file), model_url
|
| 105 |
+
], check=True)
|
| 106 |
+
|
| 107 |
+
# Download config file
|
| 108 |
+
config_url = f"{base_url}/{voice}.onnx.json"
|
| 109 |
+
subprocess.run([
|
| 110 |
+
"wget", "-q", "-O", str(config_file), config_url
|
| 111 |
+
], check=True)
|
| 112 |
+
|
| 113 |
+
logger.info(f"Downloaded voice model: {voice}")
|
| 114 |
+
|
| 115 |
+
except subprocess.CalledProcessError as e:
|
| 116 |
+
logger.error(f"Failed to download voice model {voice}: {e}")
|
| 117 |
+
# Clean up partial downloads
|
| 118 |
+
model_file.unlink(missing_ok=True)
|
| 119 |
+
config_file.unlink(missing_ok=True)
|
| 120 |
+
raise HTTPException(status_code=500, detail=f"Failed to download voice model: {voice}")
|
| 121 |
+
|
| 122 |
+
|
| 123 |
@app.get("/")
|
| 124 |
async def root():
|
| 125 |
"""Health check endpoint"""
|
| 126 |
return {
|
| 127 |
"status": "healthy",
|
| 128 |
+
"message": "Text-to-Speech API with Piper is running",
|
| 129 |
+
"engine": "Piper TTS",
|
| 130 |
+
"available_voices": list(AVAILABLE_VOICES.keys()),
|
| 131 |
+
"default_voice": DEFAULT_VOICE
|
| 132 |
+
}
|
| 133 |
+
|
| 134 |
+
|
| 135 |
+
@app.get("/voices")
|
| 136 |
+
async def get_voices():
|
| 137 |
+
"""Get available voices"""
|
| 138 |
+
return {
|
| 139 |
+
"voices": AVAILABLE_VOICES,
|
| 140 |
+
"default": DEFAULT_VOICE
|
| 141 |
}
|
| 142 |
|
| 143 |
|
| 144 |
@app.get("/tts")
|
| 145 |
+
async def tts_get(
|
| 146 |
+
text: str,
|
| 147 |
+
voice: str = DEFAULT_VOICE,
|
| 148 |
+
speed: float = 1.0
|
| 149 |
+
):
|
| 150 |
"""
|
| 151 |
Simple GET endpoint for TTS
|
| 152 |
+
Usage: GET /tts?text=Hello%20world&voice=en-us-amy-low&speed=1.0
|
| 153 |
"""
|
| 154 |
if not text or len(text.strip()) == 0:
|
| 155 |
raise HTTPException(status_code=400, detail="Text parameter is required")
|
| 156 |
|
| 157 |
+
if voice not in AVAILABLE_VOICES:
|
| 158 |
+
raise HTTPException(status_code=400, detail=f"Voice '{voice}' not available. Use /voices to see available options.")
|
| 159 |
+
|
| 160 |
+
return await generate_speech(text, voice, speed)
|
| 161 |
|
| 162 |
|
| 163 |
@app.post("/tts")
|
| 164 |
async def tts_post(
|
| 165 |
request: TTSRequest = None,
|
| 166 |
text: str = Form(None),
|
| 167 |
+
voice: str = Form(DEFAULT_VOICE),
|
| 168 |
+
speed: float = Form(1.0)
|
| 169 |
):
|
| 170 |
"""
|
| 171 |
+
POST endpoint for TTS
|
| 172 |
+
Accepts JSON body or form data
|
| 173 |
"""
|
| 174 |
# Handle different input formats
|
| 175 |
if request:
|
| 176 |
input_text = request.text
|
| 177 |
+
input_voice = request.voice or DEFAULT_VOICE
|
| 178 |
+
input_speed = request.speed or 1.0
|
| 179 |
elif text:
|
| 180 |
input_text = text
|
| 181 |
+
input_voice = voice
|
| 182 |
+
input_speed = speed
|
| 183 |
else:
|
| 184 |
raise HTTPException(status_code=400, detail="Text is required")
|
| 185 |
|
| 186 |
if not input_text or len(input_text.strip()) == 0:
|
| 187 |
raise HTTPException(status_code=400, detail="Text cannot be empty")
|
| 188 |
|
| 189 |
+
if input_voice not in AVAILABLE_VOICES:
|
| 190 |
+
raise HTTPException(status_code=400, detail=f"Voice '{input_voice}' not available. Use /voices to see available options.")
|
| 191 |
|
| 192 |
+
return await generate_speech(input_text, input_voice, input_speed)
|
| 193 |
|
| 194 |
|
| 195 |
+
async def generate_speech(text: str, voice: str = DEFAULT_VOICE, speed: float = 1.0):
|
| 196 |
"""
|
| 197 |
+
Generate speech from text using Piper TTS
|
| 198 |
"""
|
|
|
|
|
|
|
|
|
|
| 199 |
try:
|
| 200 |
+
# Ensure voice model is available
|
| 201 |
+
await download_voice_model(voice)
|
| 202 |
+
|
| 203 |
# Create temporary file for output
|
| 204 |
with tempfile.NamedTemporaryFile(delete=False, suffix=".wav") as tmp_file:
|
| 205 |
output_path = tmp_file.name
|
| 206 |
|
| 207 |
+
logger.info(f"Generating speech for text: '{text[:50]}...' with voice: {voice}")
|
| 208 |
+
|
| 209 |
+
# Prepare piper command
|
| 210 |
+
models_dir = Path("./piper_models")
|
| 211 |
+
model_file = models_dir / f"{voice}.onnx"
|
| 212 |
+
|
| 213 |
+
# Build piper command
|
| 214 |
+
cmd = [
|
| 215 |
+
"piper",
|
| 216 |
+
"--model", str(model_file),
|
| 217 |
+
"--output_file", output_path,
|
| 218 |
+
]
|
| 219 |
+
|
| 220 |
+
# Add length scale for speed control (inverse of speed)
|
| 221 |
+
if speed != 1.0:
|
| 222 |
+
length_scale = 1.0 / speed
|
| 223 |
+
cmd.extend(["--length_scale", str(length_scale)])
|
| 224 |
+
|
| 225 |
+
# Run piper with text input
|
| 226 |
+
process = subprocess.run(
|
| 227 |
+
cmd,
|
| 228 |
+
input=text,
|
| 229 |
+
text=True,
|
| 230 |
+
capture_output=True,
|
| 231 |
+
check=True
|
| 232 |
+
)
|
| 233 |
|
| 234 |
# Verify the file was created and has content
|
| 235 |
if not os.path.exists(output_path) or os.path.getsize(output_path) == 0:
|
|
|
|
| 248 |
}
|
| 249 |
)
|
| 250 |
|
| 251 |
+
except subprocess.CalledProcessError as e:
|
| 252 |
+
logger.error(f"Piper command failed: {e.stderr}")
|
| 253 |
+
# Clean up output file on error
|
| 254 |
+
if 'output_path' in locals() and os.path.exists(output_path):
|
| 255 |
+
try:
|
| 256 |
+
os.unlink(output_path)
|
| 257 |
+
except OSError:
|
| 258 |
+
pass
|
| 259 |
+
raise HTTPException(status_code=500, detail=f"TTS generation failed: {e.stderr}")
|
| 260 |
+
|
| 261 |
except Exception as e:
|
| 262 |
logger.error(f"Error generating speech: {str(e)}")
|
| 263 |
# Clean up output file on error
|
|
|
|
| 272 |
@app.get("/health")
|
| 273 |
async def health_check():
|
| 274 |
"""Detailed health check endpoint"""
|
| 275 |
+
try:
|
| 276 |
+
# Check if piper is available
|
| 277 |
+
result = subprocess.run(["piper", "--version"], capture_output=True, text=True)
|
| 278 |
+
piper_available = result.returncode == 0
|
| 279 |
+
piper_version = result.stdout.strip() if piper_available else "Not available"
|
| 280 |
+
except Exception:
|
| 281 |
+
piper_available = False
|
| 282 |
+
piper_version = "Not available"
|
| 283 |
+
|
| 284 |
return {
|
| 285 |
+
"status": "healthy" if piper_available else "degraded",
|
| 286 |
+
"piper_available": piper_available,
|
| 287 |
+
"piper_version": piper_version,
|
| 288 |
+
"engine": "Piper TTS",
|
| 289 |
+
"available_voices": len(AVAILABLE_VOICES)
|
| 290 |
}
|
| 291 |
|
| 292 |
|
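In the new `generate_speech`, the requested speed is converted to Piper's `--length_scale` as `1.0 / speed`, but the diff does not show the 0.5-2.0 range from the README being enforced. The sketch below shows one way that conversion could be isolated and validated. It is an assumption-laden illustration, not the author's implementation: `speed_to_length_scale`, `build_piper_command`, and the `MIN_SPEED`/`MAX_SPEED` constants are hypothetical names, while the command-line flags are the ones used in the diff above.

```python
from typing import List

# Bounds taken from the README's Parameters section (assumed to be intended limits).
MIN_SPEED = 0.5
MAX_SPEED = 2.0


def speed_to_length_scale(speed: float) -> float:
    """Convert a speed multiplier to a length scale (its inverse)."""
    # Rejecting out-of-range values also rules out division by zero and NaN.
    if not (MIN_SPEED <= speed <= MAX_SPEED):
        raise ValueError(
            f"speed must be between {MIN_SPEED} and {MAX_SPEED}, got {speed}"
        )
    return 1.0 / speed


def build_piper_command(
    model_file: str, output_path: str, speed: float = 1.0
) -> List[str]:
    """Build the argument list for a piper invocation without running it."""
    cmd = ["piper", "--model", model_file, "--output_file", output_path]
    # Only pass --length_scale when the speed differs from the default.
    if speed != 1.0:
        cmd.extend(["--length_scale", str(speed_to_length_scale(speed))])
    return cmd
```

Keeping command construction in a pure function makes it testable without the Piper binary installed, and an endpoint could translate the `ValueError` into an HTTP 400 response.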
requirements.txt
CHANGED
|
@@ -5,11 +5,5 @@ uvicorn[standard]==0.24.0
|
|
| 5 |
# File handling and HTTP
|
| 6 |
python-multipart==0.0.6
|
| 7 |
|
| 8 |
-
# Audio processing dependencies
|
| 9 |
-
numpy>=1.21.0
|
| 10 |
-
scipy>=1.7.0
|
| 11 |
-
librosa>=0.9.0
|
| 12 |
-
soundfile>=0.12.0
|
| 13 |
-
|
| 14 |
# Essential utilities
|
| 15 |
pydantic>=2.0.0
|
|
|
|
| 5 |
# File handling and HTTP
|
| 6 |
python-multipart==0.0.6
|
| 7 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 8 |
# Essential utilities
|
| 9 |
pydantic>=2.0.0
|
test_api.py
CHANGED
|
@@ -1,6 +1,6 @@
|
|
| 1 |
#!/usr/bin/env python3
|
| 2 |
"""
|
| 3 |
-
Simple test script for the
|
| 4 |
Run this to test the API locally
|
| 5 |
"""
|
| 6 |
|
|
@@ -11,9 +11,15 @@ import os
|
|
| 11 |
# Configuration
|
| 12 |
API_BASE_URL = "http://localhost:7860"
|
| 13 |
TEST_TEXTS = [
|
| 14 |
-
"Hello world, this is a test of the text to speech API.",
|
| 15 |
"The quick brown fox jumps over the lazy dog.",
|
| 16 |
-
"Welcome to our production-ready TTS service!"
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 17 |
]
|
| 18 |
|
| 19 |
def test_health_check():
|
|
@@ -24,12 +30,23 @@ def test_health_check():
|
|
| 24 |
# Test root endpoint
|
| 25 |
response = requests.get(f"{API_BASE_URL}/")
|
| 26 |
print(f"GET / - Status: {response.status_code}")
|
| 27 |
-
|
|
|
|
|
|
|
| 28 |
|
| 29 |
# Test health endpoint
|
| 30 |
response = requests.get(f"{API_BASE_URL}/health")
|
| 31 |
print(f"GET /health - Status: {response.status_code}")
|
| 32 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 33 |
|
| 34 |
except requests.exceptions.ConnectionError:
|
| 35 |
print("❌ Could not connect to the API. Make sure it's running on localhost:7860")
|
|
@@ -43,17 +60,19 @@ def test_get_endpoint():
|
|
| 43 |
|
| 44 |
for i, text in enumerate(TEST_TEXTS):
|
| 45 |
try:
|
|
|
|
| 46 |
params = {
|
| 47 |
"text": text,
|
| 48 |
-
"
|
|
|
|
| 49 |
}
|
| 50 |
|
| 51 |
-
print(f"Testing text {i+1}: '{text[:30]}...'")
|
| 52 |
response = requests.get(f"{API_BASE_URL}/tts", params=params)
|
| 53 |
|
| 54 |
if response.status_code == 200:
|
| 55 |
# Save the audio file
|
| 56 |
-
filename = f"test_output_get_{i+1}.wav"
|
| 57 |
with open(filename, "wb") as f:
|
| 58 |
f.write(response.content)
|
| 59 |
print(f"✅ Audio saved as {filename} ({len(response.content)} bytes)")
|
|
@@ -69,17 +88,19 @@ def test_post_endpoint():
|
|
| 69 |
|
| 70 |
for i, text in enumerate(TEST_TEXTS):
|
| 71 |
try:
|
|
|
|
| 72 |
data = {
|
| 73 |
"text": text,
|
| 74 |
-
"
|
|
|
|
| 75 |
}
|
| 76 |
|
| 77 |
-
print(f"Testing text {i+1}: '{text[:30]}...'")
|
| 78 |
response = requests.post(f"{API_BASE_URL}/tts", json=data)
|
| 79 |
|
| 80 |
if response.status_code == 200:
|
| 81 |
# Save the audio file
|
| 82 |
-
filename = f"test_output_post_{i+1}.wav"
|
| 83 |
with open(filename, "wb") as f:
|
| 84 |
f.write(response.content)
|
| 85 |
print(f"✅ Audio saved as {filename} ({len(response.content)} bytes)")
|
|
@@ -95,14 +116,15 @@ def test_form_endpoint():
|
|
| 95 |
|
| 96 |
try:
|
| 97 |
data = {
|
| 98 |
-
"text": "This is a test using form data submission.",
|
| 99 |
-
"
|
|
|
|
| 100 |
}
|
| 101 |
|
| 102 |
response = requests.post(f"{API_BASE_URL}/tts", data=data)
|
| 103 |
|
| 104 |
if response.status_code == 200:
|
| 105 |
-
filename = "
|
| 106 |
with open(filename, "wb") as f:
|
| 107 |
f.write(response.content)
|
| 108 |
print(f"✅ Audio saved as {filename} ({len(response.content)} bytes)")
|
|
@@ -112,11 +134,36 @@ def test_form_endpoint():
|
|
| 112 |
except Exception as e:
|
| 113 |
print(f"❌ Exception: {str(e)}")
|
| 114 |
|
| 115 |
def cleanup_test_files():
|
| 116 |
"""Clean up generated test files"""
|
| 117 |
print("\n🧹 Cleaning up test files...")
|
| 118 |
|
| 119 |
-
test_files = [f for f in os.listdir(".") if f.startswith("
|
| 120 |
|
| 121 |
for file in test_files:
|
| 122 |
try:
|
|
@@ -126,7 +173,7 @@ def cleanup_test_files():
|
|
| 126 |
print(f"Could not remove {file}: {str(e)}")
|
| 127 |
|
| 128 |
if __name__ == "__main__":
|
| 129 |
-
print("🚀 Starting TTS API Test Suite")
|
| 130 |
print("=" * 50)
|
| 131 |
|
| 132 |
# Test health check first
|
|
@@ -134,17 +181,24 @@ if __name__ == "__main__":
|
|
| 134 |
print("\n❌ Health check failed. Exiting.")
|
| 135 |
exit(1)
|
| 136 |
|
| 137 |
-
# Wait a moment for
|
| 138 |
-
print("\n⏳ Waiting for
|
| 139 |
time.sleep(2)
|
| 140 |
|
| 141 |
# Run tests
|
| 142 |
test_get_endpoint()
|
| 143 |
test_post_endpoint()
|
| 144 |
test_form_endpoint()
|
|
|
|
| 145 |
|
| 146 |
print("\n" + "=" * 50)
|
| 147 |
print("✅ Test suite completed!")
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 148 |
print("\nTo clean up test files, run:")
|
| 149 |
print("python test_api.py --cleanup")
|
| 150 |
|
|
|
|
| 1 |
#!/usr/bin/env python3
|
| 2 |
"""
|
| 3 |
+
Simple test script for the Piper TTS API
|
| 4 |
Run this to test the API locally
|
| 5 |
"""
|
| 6 |
|
|
|
|
 # Configuration
 API_BASE_URL = "http://localhost:7860"
 TEST_TEXTS = [
+    "Hello world, this is a test of the Piper text to speech API.",
     "The quick brown fox jumps over the lazy dog.",
+    "Welcome to our production-ready TTS service using Piper!"
+]
+
+VOICES_TO_TEST = [
+    "en-us-amy-low",
+    "en-us-ryan-low",
+    "en-gb-alan-low"
 ]

 def test_health_check():
|
|
|
         # Test root endpoint
         response = requests.get(f"{API_BASE_URL}/")
         print(f"GET / - Status: {response.status_code}")
+        if response.status_code == 200:
+            data = response.json()
+            print(f"Available voices: {len(data.get('available_voices', []))}")

         # Test health endpoint
         response = requests.get(f"{API_BASE_URL}/health")
         print(f"GET /health - Status: {response.status_code}")
+        if response.status_code == 200:
+            data = response.json()
+            print(f"Piper available: {data.get('piper_available')}")
+
+        # Test voices endpoint
+        response = requests.get(f"{API_BASE_URL}/voices")
+        print(f"GET /voices - Status: {response.status_code}")
+        if response.status_code == 200:
+            data = response.json()
+            print(f"Total voices available: {len(data.get('voices', {}))}")

     except requests.exceptions.ConnectionError:
         print("❌ Could not connect to the API. Make sure it's running on localhost:7860")
|
|
|
     for i, text in enumerate(TEST_TEXTS):
         try:
+            voice = VOICES_TO_TEST[i % len(VOICES_TO_TEST)]
             params = {
                 "text": text,
+                "voice": voice,
+                "speed": 1.0
             }

+            print(f"Testing text {i+1}: '{text[:30]}...' with voice '{voice}'")
             response = requests.get(f"{API_BASE_URL}/tts", params=params)

             if response.status_code == 200:
                 # Save the audio file
+                filename = f"test_output_get_{i+1}_{voice}.wav"
                 with open(filename, "wb") as f:
                     f.write(response.content)
                 print(f"✅ Audio saved as {filename} ({len(response.content)} bytes)")
|
|
|
     for i, text in enumerate(TEST_TEXTS):
         try:
+            voice = VOICES_TO_TEST[i % len(VOICES_TO_TEST)]
             data = {
                 "text": text,
+                "voice": voice,
+                "speed": 1.2 if i % 2 else 0.9  # Test different speeds
             }

+            print(f"Testing text {i+1}: '{text[:30]}...' with voice '{voice}' at speed {data['speed']}")
             response = requests.post(f"{API_BASE_URL}/tts", json=data)

             if response.status_code == 200:
                 # Save the audio file
+                filename = f"test_output_post_{i+1}_{voice}_speed{data['speed']}.wav"
                 with open(filename, "wb") as f:
                     f.write(response.content)
                 print(f"✅ Audio saved as {filename} ({len(response.content)} bytes)")
|
|
|
     try:
         data = {
+            "text": "This is a test using form data submission with Piper TTS.",
+            "voice": "en-us-amy-medium",
+            "speed": "0.8"
         }

         response = requests.post(f"{API_BASE_URL}/tts", data=data)

         if response.status_code == 200:
+            filename = "test_output_form_piper.wav"
             with open(filename, "wb") as f:
                 f.write(response.content)
             print(f"✅ Audio saved as {filename} ({len(response.content)} bytes)")
|
|
|
     except Exception as e:
         print(f"❌ Exception: {str(e)}")

+def test_voice_variations():
+    """Test different voice qualities"""
+    print("\n🗣️ Testing voice quality variations...")
+
+    test_text = "This is a comparison of voice quality between low and medium quality models."
+    voices_to_compare = ["en-us-amy-low", "en-us-amy-medium"]
+
+    for voice in voices_to_compare:
+        try:
+            params = {"text": test_text, "voice": voice}
+            print(f"Testing voice: {voice}")
+
+            response = requests.get(f"{API_BASE_URL}/tts", params=params)
+
+            if response.status_code == 200:
+                filename = f"test_voice_comparison_{voice}.wav"
+                with open(filename, "wb") as f:
+                    f.write(response.content)
+                print(f"✅ Audio saved as {filename} ({len(response.content)} bytes)")
+            else:
+                print(f"❌ Error: {response.status_code} - {response.text}")
+
+        except Exception as e:
+            print(f"❌ Exception: {str(e)}")
+
 def cleanup_test_files():
     """Clean up generated test files"""
     print("\n🧹 Cleaning up test files...")

+    test_files = [f for f in os.listdir(".") if f.startswith("test_") and f.endswith(".wav")]

     for file in test_files:
         try:
|
|
|
             print(f"Could not remove {file}: {str(e)}")

 if __name__ == "__main__":
+    print("🚀 Starting Piper TTS API Test Suite")
     print("=" * 50)

     # Test health check first
|
|
|
         print("\n❌ Health check failed. Exiting.")
         exit(1)

+    # Wait a moment for Piper to be ready
+    print("\n⏳ Waiting for Piper TTS to be ready...")
     time.sleep(2)

     # Run tests
     test_get_endpoint()
     test_post_endpoint()
     test_form_endpoint()
+    test_voice_variations()

     print("\n" + "=" * 50)
     print("✅ Test suite completed!")
+    print("\nGenerated files demonstrate:")
+    print("- Different voices (amy, ryan, alan)")
+    print("- Quality variations (low vs medium)")
+    print("- Speed variations (0.8x to 1.2x)")
+    print("- Various input methods (GET, POST JSON, POST form)")
+
     print("\nTo clean up test files, run:")
     print("python test_api.py --cleanup")
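The closing message refers to `python test_api.py --cleanup`, but the hunks shown in this diff do not include the code that reads that flag. The following is a minimal sketch of how it could be wired up, reusing the `test_*.wav` filter from `cleanup_test_files()`. The names `find_test_wavs`, `remove_test_wavs`, and `main`, and the `directory` parameter, are assumptions made for illustration; they are not taken from the commit.

```python
import os


def find_test_wavs(directory="."):
    """Return the sorted file names that match the test_*.wav cleanup filter."""
    return sorted(
        name
        for name in os.listdir(directory)
        if name.startswith("test_") and name.endswith(".wav")
    )


def remove_test_wavs(directory="."):
    """Delete the matching files and return the names that were removed."""
    removed = []
    for name in find_test_wavs(directory):
        try:
            os.remove(os.path.join(directory, name))
            removed.append(name)
        except OSError as exc:
            # Mirror the script's behaviour: report the failure and continue.
            print(f"Could not remove {name}: {exc}")
    return removed


def main(argv, directory="."):
    """Hypothetical entry point: run cleanup only when --cleanup is passed."""
    if "--cleanup" in argv:
        removed = remove_test_wavs(directory)
        print(f"Removed {len(removed)} file(s)")
        return len(removed)
    print("Pass --cleanup to delete generated audio files.")
    return 0
```

In a script like this one, `main(sys.argv[1:])` would typically be called from the `if __name__ == "__main__":` block before the test functions run, so that a cleanup invocation does not also trigger API requests.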