metadata
title: Text-to-Speech API
emoji: 🗣️
colorFrom: blue
colorTo: purple
sdk: docker
app_file: app.py
pinned: false
Text-to-Speech API with Coqui TTS
A minimal Text-to-Speech API built with FastAPI and the Coqui TTS VITS model, designed to run on Hugging Face Spaces.
Features
- High-Quality TTS: Uses Coqui's VITS model (tts_models/en/ljspeech/vits)
- Simple API: Clean GET and POST endpoints
- Automatic Model Loading: Downloads model automatically on first startup
- CPU Optimized: Runs efficiently on CPU-only environments
- Browser Compatible: Returns WAV files playable in browsers
Model Information
This app uses the LJSpeech VITS model from Coqui TTS:
- Model: tts_models/en/ljspeech/vits
- Language: English
- Voice: Female (LJSpeech dataset)
- Quality: High-quality neural synthesis
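As a rough illustration of how this model can be loaded and used outside the API, the sketch below calls the Coqui `TTS` Python package directly. It assumes the `TTS` package is installed; the helper name `synthesize_to_file` and the output path are hypothetical, and the actual `app.py` may be structured differently.

```python
MODEL_NAME = "tts_models/en/ljspeech/vits"  # model identifier from this README


def synthesize_to_file(text, file_path="speech.wav", model_name=MODEL_NAME):
    """Synthesize `text` to a WAV file (hypothetical helper, not taken from app.py)."""
    # Imported lazily so this module can be loaded without Coqui TTS installed.
    from TTS.api import TTS

    # The first call downloads the model if it is not already cached.
    tts = TTS(model_name=model_name)
    tts.tts_to_file(text=text, file_path=file_path)
    return file_path


if __name__ == "__main__":
    print(synthesize_to_file("Hello world, this is a test."))
```

In a long-running service the model would normally be loaded once at startup and reused, rather than on every call as in this sketch.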
API Usage
Simple GET Request
curl "https://your-space-url/tts?text=Hello%20world"
POST with JSON
curl -X POST "https://your-space-url/tts" \
-H "Content-Type: application/json" \
-d '{"text": "Hello world, this is a test."}'
POST with Form Data
curl -X POST "https://your-space-url/tts" \
-F "text=Hello world"
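The same GET and JSON POST requests can be made from Python using only the standard library. This is a minimal sketch: `https://your-space-url` is the placeholder used above, and the helper names `build_tts_url`, `build_json_request`, and `fetch_wav` are illustrative, not part of the API.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen

BASE_URL = "https://your-space-url"  # placeholder; replace with your Space URL


def build_tts_url(text, base_url=BASE_URL):
    """Build the GET URL, percent-encoding spaces as %20 like the curl example."""
    return f"{base_url}/tts?{urlencode({'text': text}, quote_via=quote)}"


def build_json_request(text, base_url=BASE_URL):
    """Build a POST request carrying the text as a JSON body."""
    body = json.dumps({"text": text}).encode("utf-8")
    return Request(
        f"{base_url}/tts",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_wav(url_or_request, timeout=30):
    """Send the request and return the raw response bytes (expected to be WAV)."""
    with urlopen(url_or_request, timeout=timeout) as response:
        return response.read()


if __name__ == "__main__":
    audio = fetch_wav(build_tts_url("Hello world"))
    with open("speech.wav", "wb") as f:
        f.write(audio)
```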
Endpoints
- GET / - Health check and model information
- GET /tts - Text-to-speech conversion (GET)
- POST /tts - Text-to-speech conversion (POST)
- GET /health - Detailed health status
Response
All TTS endpoints return a WAV audio file that can be:
- Played directly in web browsers
- Downloaded as speech.wav
- Used in audio players and applications
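One way to sanity-check a response before using it is to parse the bytes with Python's built-in `wave` module. The sketch below assumes the API returns PCM-encoded WAV data (the `wave` module rejects other encodings); `describe_wav` is a hypothetical helper name.

```python
import io
import wave


def describe_wav(data):
    """Return basic properties of PCM WAV bytes; raises wave.Error if invalid."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate": rate,
            "duration_s": frames / rate,
        }


if __name__ == "__main__":
    with open("speech.wav", "rb") as f:
        print(describe_wav(f.read()))
```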
Local Development
# Install dependencies
pip install -r requirements.txt
# Run the application
python app.py
The API will be available at http://localhost:7860
Example Usage in Browser
You can test the API directly in your browser:
https://your-space-url/tts?text=Welcome%20to%20the%20text%20to%20speech%20API
Deployment Notes
- Hugging Face Spaces: Optimized for CPU-only inference
- Model Loading: VITS model downloads automatically (~50MB)
- Memory Usage: Approximately 500MB RAM
- Response Time: 2-5 seconds for typical sentences
Error Handling
The API provides clear error messages for:
- Missing or empty text input
- Model loading failures
- Audio generation errors
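Callers can avoid the first class of error by checking the text before sending it. The sketch below is a client-side illustration only: `clean_text` and its error messages are hypothetical, and the checks and messages used by the server itself are not documented here.

```python
def clean_text(text):
    """Collapse whitespace and reject missing or empty input (hypothetical helper)."""
    if text is None:
        raise ValueError("text is required")
    # split() with no arguments drops leading, trailing, and repeated whitespace.
    cleaned = " ".join(text.split())
    if not cleaned:
        raise ValueError("text must not be empty")
    return cleaned


if __name__ == "__main__":
    print(clean_text("  Hello   world  "))
```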