	---
	title: TTS API
	emoji: 🗣️
	colorFrom: blue
	colorTo: indigo
	sdk: docker
	app_port: 7860
	---
	# Text-to-Speech (TTS) FastAPI Service

	This is a production-ready Text-to-Speech API built with FastAPI and `edge-tts`. It generates audio asynchronously, splits long texts into chunks before synthesis, and is structured for deployment on GPU-enabled platforms such as Vast.ai.

	## Project Structure

	```
	project/
	├── app.py               # FastAPI application
	├── requirements.txt     # Dependencies
	├── Dockerfile           # Docker configuration
	├── .dockerignore        # Docker ignore file
	├── models/              # Directory for local models (if added later)
	├── outputs/             # Temporary output directory for generated audio
	├── temp/                # Temporary chunk processing directory
	├── utils/               # Utility modules
	│   ├── config.py        # Configuration and settings
	│   ├── text_utils.py    # Text cleaning and chunking logic
	│   └── audio_utils.py   # TTS generation and audio concatenation logic
	└── README.md            # Documentation
	```

	## Features
	- FastAPI / Uvicorn: High-performance asynchronous API.
	- Robust Chunking: Automatically chunks long text inputs without breaking sentences.
	- Edge-TTS Integration: Uses Microsoft Edge's neural TTS service.
	- GPU Readiness: Includes `torch` and memory cleanup (`torch.cuda.empty_cache()`) for compatibility with Vast.ai templates. `edge-tts` itself does not use the GPU; this only makes it easy to drop in local GPU models later.
	- Temp File Cleanup: Automatically cleans up temporary chunks and output files after they are served.
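
The sentence-preserving chunking described above can be illustrated with a minimal sketch. This is not the code in `utils/text_utils.py`; the function name `chunk_text`, the `max_chars` limit, and the rule that a sentence ends at `.`, `!`, or `?` followed by whitespace are all assumptions made for illustration.

```python
import re

# Assumption: a sentence ends with '.', '!' or '?' followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def chunk_text(text: str, max_chars: int = 300) -> list[str]:
    """Group whole sentences into chunks of at most max_chars characters."""
    sentences = [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            # The sentence still fits, so keep growing the current chunk.
            current = candidate
            continue
        if current:
            chunks.append(current)
        # A single sentence longer than max_chars is kept whole
        # rather than being split mid-sentence.
        current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be synthesized separately and the audio segments concatenated in order. Keeping an oversized sentence whole is one possible design choice; a real implementation might fall back to splitting on commas or whitespace instead.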

	## How to Run Locally

	### Prerequisites
	- Python 3.10+
	- `ffmpeg` installed on your system.

	### Steps
	1. Install dependencies:
	```bash
	pip install -r requirements.txt
	```
	2. Run the application:
	```bash
	uvicorn app:app --host 0.0.0.0 --port 8000 --reload
	```
	3. Access the API documentation at `http://localhost:8000/docs`.

	## How to Build and Run the Docker Image

	1. Build the Docker image:
	```bash
	docker build -t tts-fastapi .
	```
	2. Run the Docker container:
	```bash
	docker run -p 8000:8000 --gpus all tts-fastapi
	```
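	(Note: `--gpus all` is optional when you use only `edge-tts`, but recommended if you deploy to a GPU-enabled instance and plan to use local PyTorch models. If the Dockerfile starts the server on a different port, such as the `7860` declared as `app_port` in the Space metadata, map that container port instead, for example `-p 8000:7860`.)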

	## How to Deploy on Vast.ai

	1. Spin up an instance on Vast.ai.
	2. Under "Template Configuration", select a base image such as `nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu22.04`, or check "Use Custom Image" and enter either `python:3.10` or your own Docker Hub image if you have pushed one.
	3. If using an unconfigured instance, SSH in and run:
	```bash
	git clone <your-repo-url>
	cd <your-repo-directory>
	docker build -t tts-fastapi .
	docker run -d -p 8000:8000 --gpus all tts-fastapi
	```
	4. Ensure port `8000` is mapped and exposed in your Vast.ai instance settings so you can reach the API externally.
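
To confirm that the port is actually reachable from outside, a small TCP check can help. The sketch below uses only the Python standard library; the helper name `is_port_open` is hypothetical, and the host and port you pass are whatever your instance exposes, which may differ from `8000` if the platform remaps ports.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False
```

For example, `is_port_open("<instance-ip>", 8000)` returning `False` suggests the port mapping or a firewall rule still needs attention. A `True` result only shows that something is listening, not that the API itself is healthy.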

	## API Usage

	### `POST /generate`

	- Endpoint: `/generate`
	- Content-Type: `application/json`

	Request Body:
	```json
	{
	  "text": "Hello! This is a test of the Text to Speech API. It can handle very long text by splitting it appropriately.",
	  "lang": "en",
	  "voice_id": "preset_1"
	}
	```

	Parameters:
	- `text` (string, required): The text to synthesize.
	- `lang` (string, optional): Language code (e.g., `"en"`, `"ar"`). Defaults to `"en"`.
	- `voice_id` (string, optional): A preset voice from the registry (`preset_1` to `preset_5`). Defaults to `"preset_1"`.
	- `custom_voice_name` (string, optional): A specific Edge-TTS voice name (e.g., `"en-US-AriaNeural"`).
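
The parameters and defaults listed above can be mirrored in a small client-side helper. This is a sketch using a standard-library dataclass, not the service's actual request model. The class name `GenerateRequest` and its checks are hypothetical, and whether the server itself rejects empty text or unknown presets is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: preset IDs run from preset_1 to preset_5, as listed above.
VALID_PRESETS = {f"preset_{i}" for i in range(1, 6)}


@dataclass
class GenerateRequest:
    text: str
    lang: str = "en"
    voice_id: str = "preset_1"
    custom_voice_name: Optional[str] = None

    def __post_init__(self) -> None:
        # Client-side sanity checks; the server may apply different rules.
        if not self.text.strip():
            raise ValueError("text must not be empty")
        if self.voice_id not in VALID_PRESETS:
            raise ValueError(f"unknown voice_id: {self.voice_id!r}")

    def to_payload(self) -> dict:
        """Build the JSON body, leaving out custom_voice_name when unset."""
        payload = {"text": self.text, "lang": self.lang, "voice_id": self.voice_id}
        if self.custom_voice_name is not None:
            payload["custom_voice_name"] = self.custom_voice_name
        return payload
```

Calling `GenerateRequest(text="Hello").to_payload()` yields a body with the documented defaults filled in, ready to be serialized with `json.dumps`.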

	Response:
	Returns the generated audio file (`audio/wav`).

	### Example using `curl`

	```bash
	curl -X POST "http://localhost:8000/generate" \
	  -H "Content-Type: application/json" \
	  -d '{"text":"Welcome to the text to speech service.","voice_id":"preset_2"}' \
	  --output generated_speech.wav
	```
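
The same call can be made from Python. The sketch below uses only the standard library and assumes the service is reachable at the base URL you pass in. The helper names `build_generate_request` and `save_speech` and the 120-second timeout are illustrative choices, not part of the project.

```python
import json
import urllib.request


def build_generate_request(
    base_url: str, text: str, voice_id: str = "preset_1"
) -> urllib.request.Request:
    """Build a POST request for /generate with a JSON body."""
    body = json.dumps({"text": text, "voice_id": voice_id}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def save_speech(
    base_url: str, text: str, out_path: str, voice_id: str = "preset_1"
) -> None:
    """Send the request and write the returned audio bytes to out_path."""
    request = build_generate_request(base_url, text, voice_id)
    # Assumption: a generous timeout, since long texts may take a while.
    with urllib.request.urlopen(request, timeout=120) as response:
        audio = response.read()
    with open(out_path, "wb") as out_file:
        out_file.write(audio)
```

With the server running locally, `save_speech("http://localhost:8000", "Welcome to the text to speech service.", "generated_speech.wav", voice_id="preset_2")` should produce the same file as the `curl` command.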