Samarth Naik
metadata
title: Text-to-Speech API
emoji: 🗣️
colorFrom: blue
colorTo: purple
sdk: docker
app_file: app.py
pinned: false

Text-to-Speech API with Coqui TTS

A minimal Text-to-Speech API built with FastAPI and the Coqui TTS VITS model, designed for Hugging Face Spaces.

Features

  • High-Quality TTS: Uses Coqui's VITS model (tts_models/en/ljspeech/vits)
  • Simple API: Clean GET and POST endpoints
  • Automatic Model Loading: Downloads the model automatically on first startup
  • CPU Optimized: Runs efficiently on CPU-only environments
  • Browser Compatible: Returns WAV files playable in browsers

Model Information

This app uses the LJSpeech VITS model from Coqui TTS:

  • Model: tts_models/en/ljspeech/vits
  • Language: English
  • Voice: Single female speaker (LJSpeech dataset)
  • Quality: End-to-end neural synthesis (VITS)
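
For reference, loading this model through the Coqui TTS Python API might look like the sketch below. It is not taken from app.py: the helper names get_model and synthesize_to_file are hypothetical, and it assumes the TTS package is installed. The model is cached so that the download and load happen only once per process.

```python
from functools import lru_cache

MODEL_NAME = "tts_models/en/ljspeech/vits"


@lru_cache(maxsize=1)
def get_model():
    """Load the VITS model once and reuse it (hypothetical helper)."""
    # Imported lazily so this module can be imported without the TTS package.
    from TTS.api import TTS

    # The first call downloads the model files if they are not cached locally.
    return TTS(model_name=MODEL_NAME, progress_bar=False)


def synthesize_to_file(text: str, path: str = "speech.wav") -> str:
    """Synthesize text to a WAV file and return its path (hypothetical helper)."""
    get_model().tts_to_file(text=text, file_path=path)
    return path
```

Calling synthesize_to_file("Hello world") should write speech.wav to the current directory.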

API Usage

Simple GET Request

curl "https://your-space-url/tts?text=Hello%20world"

POST with JSON

curl -X POST "https://your-space-url/tts" \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello world, this is a test."}'

POST with Form Data

curl -X POST "https://your-space-url/tts" \
  -F "text=Hello world"
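
From Python

The same requests can be built with the Python standard library. This is a minimal sketch: the helper names build_tts_url and build_tts_post are illustrative and not part of this project, and the base URL is the same placeholder used in the curl examples.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE_URL = "https://your-space-url"  # placeholder, replace with your Space URL


def build_tts_url(text: str, base_url: str = BASE_URL) -> str:
    """Return the GET /tts URL with the text percent-encoded."""
    query = urlencode({"text": text}, quote_via=quote)
    return f"{base_url.rstrip('/')}/tts?{query}"


def build_tts_post(text: str, base_url: str = BASE_URL) -> Request:
    """Return a POST /tts request carrying the text as a JSON body."""
    body = json.dumps({"text": text}).encode("utf-8")
    return Request(
        f"{base_url.rstrip('/')}/tts",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing either result to urllib.request.urlopen and reading the response body should return the WAV bytes.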

Endpoints

  • GET / - Health check and model information
  • GET /tts - Text-to-speech conversion; text is passed in the text query parameter
  • POST /tts - Text-to-speech conversion; text is passed as JSON or form data
  • GET /health - Detailed health status

Response

All TTS endpoints return a WAV audio file that can be:

  • Played directly in web browsers
  • Downloaded as speech.wav
  • Used in audio players and applications
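
To check a response programmatically, the WAV header can be read with Python's built-in wave module. The sketch below assumes the response body is PCM-encoded WAV, which the wave module requires; describe_wav is an illustrative helper name, not part of this project.

```python
import io
import wave


def describe_wav(data: bytes) -> dict:
    """Read basic header fields from WAV bytes, e.g. the body of a /tts response."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "sample_width_bytes": wav.getsampwidth(),
            "duration_seconds": frames / rate,
        }
```

If the bytes are not a valid PCM WAV file, wave.open raises wave.Error, which is a quick way to detect an error response that was saved as audio by mistake.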

Local Development

# Install dependencies
pip install -r requirements.txt

# Run the application
python app.py

The API will be available at http://localhost:7860

Example Usage in Browser

You can test the API directly in your browser:

https://your-space-url/tts?text=Welcome%20to%20the%20text%20to%20speech%20API

Deployment Notes

  • Hugging Face Spaces: Optimized for CPU-only inference
  • Model Loading: VITS model downloads automatically (~50MB)
  • Memory Usage: Approximately 500MB RAM
  • Response Time: 2-5 seconds for typical sentences

Error Handling

The API provides clear error messages for:

  • Missing or empty text input
  • Model loading failures
  • Audio generation errors
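
As an illustration of the first case, input validation could be sketched as follows. This is an assumption about how such a check might look, not the code in app.py; the names TTSInputError and clean_text and the exact messages are hypothetical.

```python
from typing import Optional


class TTSInputError(ValueError):
    """Raised when a request carries no usable text (hypothetical name)."""


def clean_text(text: Optional[str]) -> str:
    """Return the trimmed text, or raise TTSInputError if it is missing or empty."""
    if text is None:
        raise TTSInputError("Missing 'text' parameter")
    cleaned = text.strip()
    if not cleaned:
        raise TTSInputError("'text' must not be empty")
    return cleaned
```

A request handler could catch TTSInputError and return its message in a client error response, keeping validation failures separate from model loading and audio generation failures.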