Spaces:
Runtime error
Runtime error
| # Chatterbox TTS FastAPI | |
| This API provides a **FastAPI**-based web service for the Chatterbox TTS text-to-speech system, designed to be compatible with OpenAI's TTS API format. | |
| ## Features | |
| - **OpenAI-compatible API**: Uses similar endpoint structure to OpenAI's text-to-speech API | |
| - **FastAPI Performance**: High-performance async API with automatic documentation | |
| - **Type Safety**: Full Pydantic validation for requests and responses | |
| - **Interactive Documentation**: Automatic Swagger UI and ReDoc generation | |
| - **Automatic text chunking**: Automatically breaks long text into manageable chunks to handle character limits | |
| - **Voice cloning**: Uses the pre-specified `voice-sample.mp3` file for voice conditioning | |
| - **Async Support**: Non-blocking request handling with better concurrency | |
| - **Error handling**: Comprehensive error handling with appropriate HTTP status codes | |
| - **Health monitoring**: Health check endpoint for monitoring service status | |
| - **Environment-based configuration**: Fully configurable via environment variables | |
| - **Docker support**: Ready for containerized deployment | |
| ## Setup | |
| ### Prerequisites | |
| 1. Ensure you have the Chatterbox TTS package installed: | |
| ```bash | |
| pip install chatterbox-tts | |
| ``` | |
| 2. Install FastAPI and other required dependencies: | |
| ```bash | |
| pip install fastapi uvicorn[standard] torchaudio requests python-dotenv | |
| ``` | |
| 3. Ensure you have a `voice-sample.mp3` file in the project root directory for voice conditioning | |
| ### Configuration | |
| Copy the example environment file and customize it: | |
| ```bash | |
| cp .env.example .env | |
| nano .env # Edit with your preferred settings | |
| ``` | |
| Key environment variables: | |
| - `PORT=4123` - API server port | |
| - `EXAGGERATION=0.5` - Default emotion intensity (0.25-2.0) | |
| - `CFG_WEIGHT=0.5` - Default pace control (0.0-1.0) | |
| - `TEMPERATURE=0.8` - Default sampling temperature (0.05-5.0) | |
| - `VOICE_SAMPLE_PATH=./voice-sample.mp3` - Path to voice sample file | |
| - `DEVICE=auto` - Device selection (auto/cuda/mps/cpu) | |
| See `.env.example` for all available options. | |
| ### Running the API | |
| Start the API server: | |
| ```bash | |
| # Method 1: Direct uvicorn (recommended for development) | |
| uvicorn app.main:app --host 0.0.0.0 --port 4123 | |
| # Method 2: Using the main script | |
| python main.py | |
| # Method 3: With auto-reload for development | |
| uvicorn app.main:app --host 0.0.0.0 --port 4123 --reload | |
| ``` | |
| The server will: | |
| - Automatically detect the best available device (CUDA, MPS, or CPU) | |
| - Load the Chatterbox TTS model asynchronously | |
| - Start the FastAPI server on `http://localhost:4123` (or your configured port) | |
| - Provide interactive documentation at `/docs` and `/redoc` | |
| ### API Documentation | |
| Once running, you can access: | |
| - **Interactive API Docs (Swagger UI)**: http://localhost:4123/docs | |
| - **Alternative Documentation (ReDoc)**: http://localhost:4123/redoc | |
| - **OpenAPI Schema**: http://localhost:4123/openapi.json | |
| ## API Endpoints | |
| ### 1. Text-to-Speech Generation | |
| **POST** `/v1/audio/speech` | |
| Generate speech from text using the Chatterbox TTS model. | |
| **Request Body (Pydantic Model):** | |
| ```json | |
| { | |
| "input": "Text to convert to speech", | |
| "voice": "alloy", // Ignored - uses voice-sample.mp3 | |
| "response_format": "wav", // Ignored - always returns WAV | |
| "speed": 1.0, // Ignored - use model's built-in parameters | |
| "exaggeration": 0.7, // Optional - override default (0.25-2.0) | |
| "cfg_weight": 0.4, // Optional - override default (0.0-1.0) | |
| "temperature": 0.9 // Optional - override default (0.05-5.0) | |
| } | |
| ``` | |
| **Validation:** | |
| - `input`: Required, 1-3000 characters, automatically trimmed | |
| - `exaggeration`: Optional, 0.25-2.0 range validation | |
| - `cfg_weight`: Optional, 0.0-1.0 range validation | |
| - `temperature`: Optional, 0.05-5.0 range validation | |
| **Response:** | |
| - Content-Type: `audio/wav` | |
| - Binary audio data in WAV format via StreamingResponse | |
| **Example:** | |
| ```bash | |
| curl -X POST http://localhost:4123/v1/audio/speech \ | |
| -H "Content-Type: application/json" \ | |
| -d '{"input": "Hello, this is a test of the text to speech system."}' \ | |
| --output speech.wav | |
| ``` | |
| **With custom parameters:** | |
| ```bash | |
| curl -X POST http://localhost:4123/v1/audio/speech \ | |
| -H "Content-Type: application/json" \ | |
| -d '{"input": "Dramatic speech!", "exaggeration": 1.2, "cfg_weight": 0.3}' \ | |
| --output dramatic.wav | |
| ``` | |
| ### 2. Health Check | |
| **GET** `/health` | |
| Check if the API is running and the model is loaded. | |
| **Response (HealthResponse model):** | |
| ```json | |
| { | |
| "status": "healthy", | |
| "model_loaded": true, | |
| "device": "cuda", | |
| "config": { | |
| "max_chunk_length": 280, | |
| "max_total_length": 3000, | |
| "voice_sample_path": "./voice-sample.mp3", | |
| "default_exaggeration": 0.5, | |
| "default_cfg_weight": 0.5, | |
| "default_temperature": 0.8 | |
| } | |
| } | |
| ``` | |
| ### 3. List Models | |
| **GET** `/v1/models` | |
| List available models (OpenAI API compatibility). | |
| **Response (ModelsResponse model):** | |
| ```json | |
| { | |
| "object": "list", | |
| "data": [ | |
| { | |
| "id": "chatterbox-tts-1", | |
| "object": "model", | |
| "created": 1677649963, | |
| "owned_by": "resemble-ai" | |
| } | |
| ] | |
| } | |
| ``` | |
| ### 4. Configuration Info | |
| **GET** `/config` | |
| Get current configuration (useful for debugging). | |
| **Response (ConfigResponse model):** | |
| ```json | |
| { | |
| "server": { | |
| "host": "0.0.0.0", | |
| "port": 4123 | |
| }, | |
| "model": { | |
| "device": "cuda", | |
| "voice_sample_path": "./voice-sample.mp3", | |
| "model_cache_dir": "./models" | |
| }, | |
| "defaults": { | |
| "exaggeration": 0.5, | |
| "cfg_weight": 0.5, | |
| "temperature": 0.8, | |
| "max_chunk_length": 280, | |
| "max_total_length": 3000 | |
| } | |
| } | |
| ``` | |
| ### 5. API Documentation Endpoints | |
| **GET** `/docs` - Interactive Swagger UI documentation | |
| **GET** `/redoc` - Alternative ReDoc documentation | |
| **GET** `/openapi.json` - OpenAPI schema specification | |
| ## Text Processing | |
| ### Automatic Chunking | |
| The API automatically handles long text inputs by: | |
| 1. **Character limit**: Splits text longer than the configured chunk size (default: 280 characters) | |
| 2. **Sentence preservation**: Attempts to split at sentence boundaries (`.`, `!`, `?`) | |
| 3. **Fallback splitting**: If sentences are too long, splits at commas, semicolons, or other natural breaks | |
| 4. **Audio concatenation**: Seamlessly combines audio from multiple chunks | |
| ### Maximum Limits | |
| - **Soft limit**: Configurable characters per chunk (default: 280) | |
| - **Hard limit**: Configurable total characters (default: 3000) | |
| - **Automatic processing**: No manual intervention required | |
| ## Error Handling | |
| FastAPI provides enhanced error handling with automatic validation: | |
| - **422 Unprocessable Entity**: Invalid input validation (Pydantic errors) | |
| - **400 Bad Request**: Business logic errors (text too long, etc.) | |
| - **500 Internal Server Error**: Model or processing errors | |
| **Error Response Format:** | |
| ```json | |
| { | |
| "error": { | |
| "message": "Missing required field: 'input'", | |
| "type": "invalid_request_error" | |
| } | |
| } | |
| ``` | |
| **Validation Error Example:** | |
| ```json | |
| { | |
| "detail": [ | |
| { | |
| "type": "greater_equal", | |
| "loc": ["body", "exaggeration"], | |
| "msg": "Input should be greater than or equal to 0.25", | |
| "input": 0.1 | |
| } | |
| ] | |
| } | |
| ``` | |
| ## Testing | |
| Use the enhanced test script to verify the API functionality: | |
| ```bash | |
| python tests/test_api.py | |
| ``` | |
| The test script will: | |
| - Test health check endpoint | |
| - Test models endpoint | |
| - Test API documentation endpoints (new!) | |
| - Generate speech for various text lengths | |
| - Test custom parameter validation | |
| - Test error handling with validation | |
| - Save generated audio files as `test_output_*.wav` | |
| ## Configuration | |
| You can configure the API through environment variables or by modifying `.env.example`: | |
| ```bash | |
| # Server Configuration | |
| PORT=4123 | |
| HOST=0.0.0.0 | |
| # TTS Model Settings | |
| EXAGGERATION=0.5 # Emotion intensity (0.25-2.0) | |
| CFG_WEIGHT=0.5 # Pace control (0.0-1.0) | |
| TEMPERATURE=0.8 # Sampling temperature (0.05-5.0) | |
| # Text Processing | |
| MAX_CHUNK_LENGTH=280 # Characters per chunk | |
| MAX_TOTAL_LENGTH=3000 # Total character limit | |
| # Voice and Model Settings | |
| VOICE_SAMPLE_PATH=./voice-sample.mp3 | |
| DEVICE=auto # auto/cuda/mps/cpu | |
| MODEL_CACHE_DIR=./models | |
| ``` | |
| ### Parameter Effects | |
| **Exaggeration (0.25-2.0):** | |
| - `0.3-0.4`: Very neutral, professional | |
| - `0.5`: Neutral (default) | |
| - `0.7-0.8`: More expressive | |
| - `1.0+`: Very dramatic (may be unstable) | |
| **CFG Weight (0.0-1.0):** | |
| - `0.2-0.3`: Faster speech | |
| - `0.5`: Balanced (default) | |
| - `0.7-0.8`: Slower, more deliberate | |
| **Temperature (0.05-5.0):** | |
| - `0.4-0.6`: More consistent | |
| - `0.8`: Balanced (default) | |
| - `1.0+`: More creative/random | |
| ## Docker Deployment | |
| For Docker deployment, see [DOCKER_README.md](DOCKER_README.md) for complete instructions. | |
| **Quick start with Docker Compose:** | |
| ```bash | |
| cp .env.example .env # Customize as needed | |
| docker compose up -d | |
| ``` | |
| **Quick start with Docker:** | |
| ```bash | |
| docker build -t chatterbox-tts . | |
| docker run -d -p 4123:4123 \ | |
| -v ./voice-sample.mp3:/app/voice-sample.mp3:ro \ | |
| -e EXAGGERATION=0.7 \ | |
| chatterbox-tts | |
| ``` | |
| ## Performance Notes | |
| **FastAPI Benefits:** | |
| - **Async performance**: Better handling of concurrent requests | |
| - **Faster JSON serialization**: ~25% faster than Flask | |
| - **Type validation**: Prevents invalid requests at the API level | |
| - **Auto documentation**: No manual API doc maintenance | |
| **Hardware Recommendations:** | |
| - **Model loading**: The model is loaded once at startup (can take 30-60 seconds) | |
| - **First request**: May be slower due to initial model warm-up | |
| - **Subsequent requests**: Should be faster due to model caching | |
| - **Memory usage**: Varies by device (GPU recommended for best performance) | |
| - **Concurrent requests**: FastAPI async support allows better multi-request handling | |
| ## Integration Examples | |
| ### Python with requests | |
| ```python | |
| import requests | |
| # Basic request | |
| response = requests.post( | |
| "http://localhost:4123/v1/audio/speech", | |
| json={"input": "Hello world!"} | |
| ) | |
| with open("output.wav", "wb") as f: | |
| f.write(response.content) | |
| # With custom parameters and validation | |
| response = requests.post( | |
| "http://localhost:4123/v1/audio/speech", | |
| json={ | |
| "input": "Exciting news!", | |
| "exaggeration": 0.8, | |
| "cfg_weight": 0.4, | |
| "temperature": 1.0 | |
| } | |
| ) | |
| # Handle validation errors | |
| if response.status_code == 422: | |
| print("Validation error:", response.json()) | |
| ``` | |
| ### JavaScript/Node.js | |
| ```javascript | |
| const response = await fetch('http://localhost:4123/v1/audio/speech', { | |
| method: 'POST', | |
| headers: { 'Content-Type': 'application/json' }, | |
| body: JSON.stringify({ | |
| input: 'Hello world!', | |
| exaggeration: 0.7, | |
| }), | |
| }); | |
| if (response.status === 422) { | |
| const error = await response.json(); | |
| console.log('Validation error:', error); | |
| } else { | |
| const audioBuffer = await response.arrayBuffer(); | |
| // Save or play the audio buffer | |
| } | |
| ``` | |
| ### cURL | |
| ```bash | |
| # Basic usage | |
| curl -X POST http://localhost:4123/v1/audio/speech \ | |
| -H "Content-Type: application/json" \ | |
| -d '{"input": "Your text here"}' \ | |
| --output output.wav | |
| # With custom parameters | |
| curl -X POST http://localhost:4123/v1/audio/speech \ | |
| -H "Content-Type: application/json" \ | |
| -d '{"input": "Dramatic text!", "exaggeration": 1.0, "cfg_weight": 0.3}' \ | |
| --output dramatic.wav | |
| # Test the interactive documentation | |
| curl http://localhost:4123/docs | |
| ``` | |
| ## Development Features | |
| ### FastAPI Development Tools | |
| - **Auto-reload**: Use `--reload` flag for development | |
| - **Interactive testing**: Use `/docs` for live API testing | |
| - **Type hints**: Full IDE support with Pydantic models | |
| - **Validation**: Automatic request/response validation | |
| - **OpenAPI**: Machine-readable API specification | |
| ### Development Mode | |
| ```bash | |
| # Start with auto-reload | |
| uvicorn app.main:app --host 0.0.0.0 --port 4123 --reload | |
| # Or with verbose logging | |
| uvicorn app.main:app --host 0.0.0.0 --port 4123 --log-level debug | |
| ``` | |
| ## Troubleshooting | |
| ### Common Issues | |
| 1. **Model not loading**: Ensure Chatterbox TTS is properly installed | |
| 2. **Voice sample missing**: Verify `voice-sample.mp3` exists at the configured path | |
| 3. **CUDA out of memory**: Try using CPU device (`DEVICE=cpu`) | |
| 4. **Slow performance**: GPU recommended; ensure CUDA/MPS is available | |
| 5. **Port conflicts**: Change `PORT` environment variable to an available port | |
| 6. **Uvicorn not found**: Install with `pip install uvicorn[standard]` | |
| ### FastAPI Specific Issues | |
| **Startup Issues:** | |
| ```bash | |
| # Check if uvicorn is installed | |
| uvicorn --version | |
| # Run with verbose logging | |
| uvicorn app.main:app --host 0.0.0.0 --port 4123 --log-level debug | |
| # Alternative startup method | |
| python main.py | |
| ``` | |
| **Validation Errors:** | |
| Visit `/docs` to see the interactive API documentation and test your requests. | |
| ### Checking Configuration | |
| ```bash | |
| # Check if API is running | |
| curl http://localhost:4123/health | |
| # View current configuration | |
| curl http://localhost:4123/config | |
| # Check API documentation | |
| curl http://localhost:4123/openapi.json | |
| # Test with simple text | |
| curl -X POST http://localhost:4123/v1/audio/speech \ | |
| -H "Content-Type: application/json" \ | |
| -d '{"input": "Test"}' \ | |
| --output test.wav | |
| ``` | |
| ## Migration from Flask | |
| If you're migrating from the previous Flask version: | |
| 1. **Dependencies**: Update to `fastapi` and `uvicorn` instead of `flask` | |
| 2. **Startup**: Use `uvicorn app.main:app` instead of `python api.py` | |
| 3. **Documentation**: Visit `/docs` for interactive API testing | |
| 4. **Validation**: Error responses now use HTTP 422 for validation errors | |
| 5. **Performance**: Expect 25-40% better performance for JSON responses | |
| All existing API endpoints and request/response formats remain compatible. | |