metadata
title: TTS API
emoji: 🗣️
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 7860
Text-to-Speech (TTS) FastAPI Service
This is a production-ready Text-to-Speech API built with FastAPI and edge-tts. It is optimized for asynchronous generation, handles long texts by chunking them, and is structured for deployment on GPU-enabled platforms like Vast.ai.
Project Structure
project/
├── app.py                # FastAPI application
├── requirements.txt      # Dependencies
├── Dockerfile            # Docker configuration
├── .dockerignore         # Docker ignore file
├── models/               # Directory for local models (if added later)
├── outputs/              # Temporary output directory for generated audio
├── temp/                 # Temporary chunk processing directory
├── utils/                # Utility modules
│   ├── config.py         # Configuration and settings
│   ├── text_utils.py     # Text cleaning and chunking logic
│   └── audio_utils.py    # TTS generation and audio concatenation logic
└── README.md             # Documentation
Features
- FastAPI / Uvicorn: High-performance asynchronous API.
- Robust Chunking: Automatically chunks long text inputs without breaking sentences.
- Edge-TTS Integration: Uses Microsoft Edge's neural TTS service.
- GPU Readiness: Includes torch and memory cleanup (torch.cuda.empty_cache()) for compatibility with Vast.ai templates, allowing easy drop-in of local GPU models later.
- Temp File Cleanup: Automatically cleans up temporary chunks and output files after they are served.
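As a rough illustration of the "Robust Chunking" feature, the sketch below splits text at sentence boundaries and packs sentences into chunks up to a size limit. It is not the code in utils/text_utils.py: the function name `chunk_text`, the `max_chars` parameter, its default of 500, and the regex-based sentence splitting are all assumptions made for this example.

```python
import re


def chunk_text(text: str, max_chars: int = 500) -> list[str]:
    """Pack whole sentences into chunks of roughly max_chars characters.

    Hypothetical helper: the name, the default limit, and the splitting
    rule are illustrative assumptions, not the project's actual logic.
    """
    # Split after '.', '!' or '?' when followed by whitespace.
    sentences = [
        s.strip()
        for s in re.split(r"(?<=[.!?])\s+", text.strip())
        if s.strip()
    ]

    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            # The sentence still fits in the chunk being built.
            current = candidate
        else:
            if current:
                chunks.append(current)
            # A sentence longer than max_chars is kept whole rather than
            # being cut mid-sentence, so a chunk may exceed the limit.
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk could then be synthesized separately and the resulting audio segments concatenated in order. Splitting on punctuation alone will mis-split abbreviations such as "e.g.", so a real implementation may need a more careful sentence tokenizer.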
How to Run Locally
Prerequisites
- Python 3.10+
- ffmpeg installed on your system.
Steps
- Install dependencies:
    pip install -r requirements.txt
- Run the application:
    uvicorn app:app --host 0.0.0.0 --port 8000 --reload
- Access the API documentation at http://localhost:8000/docs.
How to Build and Run the Docker Image
- Build the Docker image:
    docker build -t tts-fastapi .
- Run the Docker container:
    docker run -p 8000:8000 --gpus all tts-fastapi
  (Note: --gpus all is optional if you are strictly using edge-tts, but recommended if you are deploying to a GPU-enabled instance and plan to use local PyTorch models.)
How to Deploy on Vast.ai
- Spin up an instance on Vast.ai.
- Under "Template Configuration", select a base image such as nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu22.04, or check the box for "Use Custom Image" and enter python:3.10 or a custom Docker Hub image if you have pushed yours.
- If using an unconfigured instance, SSH in and run:
    git clone <your-repo-url>
    cd tts-repo
    docker build -t tts-fastapi .
    docker run -d -p 8000:8000 --gpus all tts-fastapi
- Ensure port 8000 is mapped and exposed in your Vast.ai instance settings so you can reach the API externally.
API Usage
POST /generate
Endpoint: /generate
Content-Type: application/json
Request Body:
{
  "text": "Hello! This is a test of the Text to Speech API. It can handle very long text by splitting it appropriately.",
  "lang": "en",
  "voice_id": "preset_1"
}
Parameters:
- text (string, required): The text to synthesize.
- lang (string, optional): Language code (e.g., "en", "ar"). Defaults to "en".
- voice_id (string, optional): A preset voice from the registry (preset_1 to preset_5). Defaults to "preset_1".
- custom_voice_name (string, optional): A specific Edge-TTS voice name (e.g., "en-US-AriaNeural").
Response:
Returns the generated audio file (audio/wav).
Example using curl
curl -X POST "http://localhost:8000/generate" \
-H "Content-Type: application/json" \
-d '{"text":"Welcome to the text to speech service.","voice_id":"preset_2"}' \
--output generated_speech.wav
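Example using Python
The same request can be sent from Python using only the standard library. This is a minimal sketch, assuming the service is reachable at http://localhost:8000 as in the local run instructions. The helper names `build_generate_request` and `synthesize_to_file` and the 120-second timeout are hypothetical choices for this example, not part of the project.

```python
import json
import urllib.request


def build_generate_request(
    text,
    base_url="http://localhost:8000",  # assumed local address
    lang="en",
    voice_id="preset_1",
    custom_voice_name=None,
):
    """Build a POST request for /generate (hypothetical helper)."""
    payload = {"text": text, "lang": lang, "voice_id": voice_id}
    if custom_voice_name is not None:
        # Only sent when the caller asks for a specific Edge-TTS voice.
        payload["custom_voice_name"] = custom_voice_name
    return urllib.request.Request(
        f"{base_url}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(text, out_path="generated_speech.wav", timeout=120, **kwargs):
    """Send the request and write the returned audio bytes to out_path."""
    request = build_generate_request(text, **kwargs)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return out_path
```

With the server running, `synthesize_to_file("Welcome to the text to speech service.", voice_id="preset_2")` would mirror the curl call above. Long inputs may take a while to synthesize, so the timeout may need adjusting.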