# Chatterbox TTS FastAPI

This API provides a **FastAPI**-based web service for the Chatterbox TTS text-to-speech system, designed to be compatible with OpenAI's TTS API format.

## Features

- **OpenAI-compatible API**: Uses an endpoint structure similar to OpenAI's text-to-speech API
- **FastAPI performance**: High-performance async API with automatic documentation
- **Type safety**: Full Pydantic validation for requests and responses
- **Interactive documentation**: Automatic Swagger UI and ReDoc generation
- **Automatic text chunking**: Breaks long text into manageable chunks to handle character limits
- **Voice cloning**: Uses the pre-specified `voice-sample.mp3` file for voice conditioning
- **Async support**: Non-blocking request handling with better concurrency
- **Error handling**: Comprehensive error handling with appropriate HTTP status codes
- **Health monitoring**: Health check endpoint for monitoring service status
- **Environment-based configuration**: Fully configurable via environment variables
- **Docker support**: Ready for containerized deployment

## Setup

### Prerequisites

1. Ensure you have the Chatterbox TTS package installed:

   ```bash
   pip install chatterbox-tts
   ```

2. Install FastAPI and the other required dependencies:

   ```bash
   pip install fastapi "uvicorn[standard]" torchaudio requests python-dotenv
   ```

3.
   Ensure you have a `voice-sample.mp3` file in the project root directory for voice conditioning.

### Configuration

Copy the example environment file and customize it:

```bash
cp .env.example .env
nano .env  # Edit with your preferred settings
```

Key environment variables:

- `PORT=4123` - API server port
- `EXAGGERATION=0.5` - Default emotion intensity (0.25-2.0)
- `CFG_WEIGHT=0.5` - Default pace control (0.0-1.0)
- `TEMPERATURE=0.8` - Default sampling temperature (0.05-5.0)
- `VOICE_SAMPLE_PATH=./voice-sample.mp3` - Path to the voice sample file
- `DEVICE=auto` - Device selection (auto/cuda/mps/cpu)

See `.env.example` for all available options.

### Running the API

Start the API server:

```bash
# Method 1: Direct uvicorn (recommended for development)
uvicorn app.main:app --host 0.0.0.0 --port 4123

# Method 2: Using the main script
python main.py

# Method 3: With auto-reload for development
uvicorn app.main:app --host 0.0.0.0 --port 4123 --reload
```

The server will:

- Automatically detect the best available device (CUDA, MPS, or CPU)
- Load the Chatterbox TTS model asynchronously
- Start the FastAPI server on `http://localhost:4123` (or your configured port)
- Provide interactive documentation at `/docs` and `/redoc`

### API Documentation

Once running, you can access:

- **Interactive API Docs (Swagger UI)**: http://localhost:4123/docs
- **Alternative Documentation (ReDoc)**: http://localhost:4123/redoc
- **OpenAPI Schema**: http://localhost:4123/openapi.json

## API Endpoints

### 1. Text-to-Speech Generation

**POST** `/v1/audio/speech`

Generate speech from text using the Chatterbox TTS model.
**Request Body (Pydantic Model):**

```json
{
  "input": "Text to convert to speech",
  "voice": "alloy",          // Ignored - uses voice-sample.mp3
  "response_format": "wav",  // Ignored - always returns WAV
  "speed": 1.0,              // Ignored - use the model's built-in parameters
  "exaggeration": 0.7,       // Optional - override default (0.25-2.0)
  "cfg_weight": 0.4,         // Optional - override default (0.0-1.0)
  "temperature": 0.9         // Optional - override default (0.05-5.0)
}
```

**Validation:**

- `input`: Required, 1-3000 characters, automatically trimmed
- `exaggeration`: Optional, validated against the 0.25-2.0 range
- `cfg_weight`: Optional, validated against the 0.0-1.0 range
- `temperature`: Optional, validated against the 0.05-5.0 range

**Response:**

- Content-Type: `audio/wav`
- Binary audio data in WAV format via `StreamingResponse`

**Example:**

```bash
curl -X POST http://localhost:4123/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"input": "Hello, this is a test of the text to speech system."}' \
  --output speech.wav
```

**With custom parameters:**

```bash
curl -X POST http://localhost:4123/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"input": "Dramatic speech!", "exaggeration": 1.2, "cfg_weight": 0.3}' \
  --output dramatic.wav
```

### 2. Health Check

**GET** `/health`

Check whether the API is running and the model is loaded.

**Response (HealthResponse model):**

```json
{
  "status": "healthy",
  "model_loaded": true,
  "device": "cuda",
  "config": {
    "max_chunk_length": 280,
    "max_total_length": 3000,
    "voice_sample_path": "./voice-sample.mp3",
    "default_exaggeration": 0.5,
    "default_cfg_weight": 0.5,
    "default_temperature": 0.8
  }
}
```

### 3. List Models

**GET** `/v1/models`

List available models (OpenAI API compatibility).

**Response (ModelsResponse model):**

```json
{
  "object": "list",
  "data": [
    {
      "id": "chatterbox-tts-1",
      "object": "model",
      "created": 1677649963,
      "owned_by": "resemble-ai"
    }
  ]
}
```

### 4. Configuration Info

**GET** `/config`

Get the current configuration (useful for debugging).
**Response (ConfigResponse model):**

```json
{
  "server": {
    "host": "0.0.0.0",
    "port": 4123
  },
  "model": {
    "device": "cuda",
    "voice_sample_path": "./voice-sample.mp3",
    "model_cache_dir": "./models"
  },
  "defaults": {
    "exaggeration": 0.5,
    "cfg_weight": 0.5,
    "temperature": 0.8,
    "max_chunk_length": 280,
    "max_total_length": 3000
  }
}
```

### 5. API Documentation Endpoints

- **GET** `/docs` - Interactive Swagger UI documentation
- **GET** `/redoc` - Alternative ReDoc documentation
- **GET** `/openapi.json` - OpenAPI schema specification

## Text Processing

### Automatic Chunking

The API automatically handles long text inputs by:

1. **Character limit**: Splits text longer than the configured chunk size (default: 280 characters)
2. **Sentence preservation**: Attempts to split at sentence boundaries (`.`, `!`, `?`)
3. **Fallback splitting**: If sentences are too long, splits at commas, semicolons, or other natural breaks
4. **Audio concatenation**: Seamlessly combines the audio from multiple chunks

### Maximum Limits

- **Soft limit**: Configurable characters per chunk (default: 280)
- **Hard limit**: Configurable total characters (default: 3000)
- **Automatic processing**: No manual intervention required

## Error Handling

FastAPI provides enhanced error handling with automatic validation:

- **422 Unprocessable Entity**: Invalid input validation (Pydantic errors)
- **400 Bad Request**: Business logic errors (text too long, etc.)
- **500 Internal Server Error**: Model or processing errors

**Error Response Format:**

```json
{
  "error": {
    "message": "Missing required field: 'input'",
    "type": "invalid_request_error"
  }
}
```

**Validation Error Example:**

```json
{
  "detail": [
    {
      "type": "greater_equal",
      "loc": ["body", "exaggeration"],
      "msg": "Input should be greater than or equal to 0.25",
      "input": 0.1
    }
  ]
}
```

## Testing

Use the test script to verify the API functionality:

```bash
python tests/test_api.py
```

The test script will:

- Test the health check endpoint
- Test the models endpoint
- Test the API documentation endpoints
- Generate speech for various text lengths
- Test custom parameter validation
- Test error handling with validation
- Save generated audio files as `test_output_*.wav`

## Configuration

You can configure the API through environment variables or by editing your `.env` file (start from `.env.example`):

```bash
# Server Configuration
PORT=4123
HOST=0.0.0.0

# TTS Model Settings
EXAGGERATION=0.5       # Emotion intensity (0.25-2.0)
CFG_WEIGHT=0.5         # Pace control (0.0-1.0)
TEMPERATURE=0.8        # Sampling temperature (0.05-5.0)

# Text Processing
MAX_CHUNK_LENGTH=280   # Characters per chunk
MAX_TOTAL_LENGTH=3000  # Total character limit

# Voice and Model Settings
VOICE_SAMPLE_PATH=./voice-sample.mp3
DEVICE=auto            # auto/cuda/mps/cpu
MODEL_CACHE_DIR=./models
```

### Parameter Effects

**Exaggeration (0.25-2.0):**

- `0.3-0.4`: Very neutral, professional
- `0.5`: Neutral (default)
- `0.7-0.8`: More expressive
- `1.0+`: Very dramatic (may be unstable)

**CFG Weight (0.0-1.0):**

- `0.2-0.3`: Faster speech
- `0.5`: Balanced (default)
- `0.7-0.8`: Slower, more deliberate

**Temperature (0.05-5.0):**

- `0.4-0.6`: More consistent
- `0.8`: Balanced (default)
- `1.0+`: More creative/random

## Docker Deployment

For Docker deployment, see [DOCKER_README.md](DOCKER_README.md) for complete instructions.
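To see at a glance what Compose wires together, a minimal `docker-compose.yml` along these lines would map the port, load the `.env` file, and mount the voice sample (a sketch only; the service name is an assumption, and the repository's own compose file is authoritative):

```yaml
services:
  chatterbox-tts:            # assumed service name
    build: .
    ports:
      - "4123:4123"          # host:container, matching PORT
    env_file: .env           # EXAGGERATION, CFG_WEIGHT, etc.
    volumes:
      - ./voice-sample.mp3:/app/voice-sample.mp3:ro
```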
**Quick start with Docker Compose:**

```bash
cp .env.example .env  # Customize as needed
docker compose up -d
```

**Quick start with Docker:**

```bash
docker build -t chatterbox-tts .
docker run -d -p 4123:4123 \
  -v ./voice-sample.mp3:/app/voice-sample.mp3:ro \
  -e EXAGGERATION=0.7 \
  chatterbox-tts
```

## Performance Notes

**FastAPI benefits:**

- **Async performance**: Better handling of concurrent requests
- **Faster JSON serialization**: Roughly 25% faster than Flask
- **Type validation**: Prevents invalid requests at the API level
- **Auto documentation**: No manual API doc maintenance

**Runtime characteristics:**

- **Model loading**: The model is loaded once at startup (this can take 30-60 seconds)
- **First request**: May be slower due to initial model warm-up
- **Subsequent requests**: Should be faster thanks to model caching
- **Memory usage**: Varies by device (a GPU is recommended for best performance)
- **Concurrent requests**: FastAPI's async support allows better multi-request handling

## Integration Examples

### Python with requests

```python
import requests

# Basic request
response = requests.post(
    "http://localhost:4123/v1/audio/speech",
    json={"input": "Hello world!"}
)
with open("output.wav", "wb") as f:
    f.write(response.content)

# With custom parameters and validation
response = requests.post(
    "http://localhost:4123/v1/audio/speech",
    json={
        "input": "Exciting news!",
        "exaggeration": 0.8,
        "cfg_weight": 0.4,
        "temperature": 1.0
    }
)

# Handle validation errors
if response.status_code == 422:
    print("Validation error:", response.json())
```

### JavaScript/Node.js

```javascript
const response = await fetch('http://localhost:4123/v1/audio/speech', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    input: 'Hello world!',
    exaggeration: 0.7,
  }),
});

if (response.status === 422) {
  const error = await response.json();
  console.log('Validation error:', error);
} else {
  const audioBuffer = await response.arrayBuffer();
  // Save or play the audio buffer
}
```

### cURL

```bash
# Basic usage
curl -X POST http://localhost:4123/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"input": "Your text here"}' \
  --output output.wav

# With custom parameters
curl -X POST http://localhost:4123/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"input": "Dramatic text!", "exaggeration": 1.0, "cfg_weight": 0.3}' \
  --output dramatic.wav

# Fetch the interactive documentation page
curl http://localhost:4123/docs
```

## Development Features

### FastAPI Development Tools

- **Auto-reload**: Use the `--reload` flag during development
- **Interactive testing**: Use `/docs` for live API testing
- **Type hints**: Full IDE support with Pydantic models
- **Validation**: Automatic request/response validation
- **OpenAPI**: Machine-readable API specification

### Development Mode

```bash
# Start with auto-reload
uvicorn app.main:app --host 0.0.0.0 --port 4123 --reload

# Or with verbose logging
uvicorn app.main:app --host 0.0.0.0 --port 4123 --log-level debug
```

## Troubleshooting

### Common Issues

1. **Model not loading**: Ensure Chatterbox TTS is properly installed
2. **Voice sample missing**: Verify that `voice-sample.mp3` exists at the configured path
3. **CUDA out of memory**: Try using the CPU device (`DEVICE=cpu`)
4. **Slow performance**: A GPU is recommended; ensure CUDA/MPS is available
5. **Port conflicts**: Change the `PORT` environment variable to an available port
6. **Uvicorn not found**: Install it with `pip install "uvicorn[standard]"`

### FastAPI-Specific Issues

**Startup issues:**

```bash
# Check whether uvicorn is installed
uvicorn --version

# Run with verbose logging
uvicorn app.main:app --host 0.0.0.0 --port 4123 --log-level debug

# Alternative startup method
python main.py
```

**Validation errors:**

Visit `/docs` to see the interactive API documentation and test your requests.
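Since out-of-range parameters come back as 422 responses, it can be handy to mirror the documented ranges client-side and fail fast before the HTTP round trip. A sketch (the helper name and structure are illustrative, not part of the API):

```python
# Client-side pre-validation mirroring the documented Pydantic ranges,
# so out-of-range values are caught before an HTTP round trip.
# (Illustrative helper; it is not part of the API package.)
RANGES = {
    "exaggeration": (0.25, 2.0),
    "cfg_weight": (0.0, 1.0),
    "temperature": (0.05, 5.0),
}

def validate_tts_params(input_text, **overrides):
    """Return a list of problems; empty means the request should pass the API's validation."""
    problems = []
    if not 1 <= len(input_text.strip()) <= 3000:   # input is trimmed server-side
        problems.append("input must be 1-3000 characters after trimming")
    for name, value in overrides.items():
        if name not in RANGES:                     # voice/speed/response_format are accepted but ignored
            continue
        low, high = RANGES[name]
        if not low <= value <= high:
            problems.append(f"{name} must be between {low} and {high}")
    return problems
```

Call this just before `requests.post` and skip the request (or surface the messages) when the returned list is non-empty.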
### Checking Configuration

```bash
# Check whether the API is running
curl http://localhost:4123/health

# View the current configuration
curl http://localhost:4123/config

# Fetch the OpenAPI schema
curl http://localhost:4123/openapi.json

# Test with simple text
curl -X POST http://localhost:4123/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"input": "Test"}' \
  --output test.wav
```

## Migration from Flask

If you're migrating from the previous Flask version:

1. **Dependencies**: Install `fastapi` and `uvicorn` instead of `flask`
2. **Startup**: Use `uvicorn app.main:app` instead of `python api.py`
3. **Documentation**: Visit `/docs` for interactive API testing
4. **Validation**: Error responses now use HTTP 422 for validation errors
5. **Performance**: Expect 25-40% better performance for JSON responses

All existing API endpoints and request/response formats remain compatible.
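Because the request format is unchanged, a migrating client only needs to build the same JSON body it sent before. A small helper along these lines (hypothetical; not shipped with the API) includes the optional Chatterbox overrides only when they are explicitly set, so the server's environment-configured defaults still apply:

```python
# Hypothetical helper: builds the JSON body documented for POST /v1/audio/speech.
# Optional parameters are added only when given, so the server-side defaults
# (EXAGGERATION, CFG_WEIGHT, TEMPERATURE) remain in effect otherwise.
def build_speech_request(input_text, exaggeration=None, cfg_weight=None, temperature=None):
    payload = {"input": input_text}
    for key, value in [("exaggeration", exaggeration),
                       ("cfg_weight", cfg_weight),
                       ("temperature", temperature)]:
        if value is not None:
            payload[key] = value
    return payload
```

Usage with `requests` would then be `requests.post("http://localhost:4123/v1/audio/speech", json=build_speech_request("Hello!", exaggeration=0.8))`.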