---
title: Nuviq
emoji: 🏒
colorFrom: gray
colorTo: yellow
sdk: docker
pinned: false
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference


Chatterbox TTS API


FastAPI-powered REST API for Chatterbox TTS, providing OpenAI-compatible text-to-speech endpoints with voice cloning capabilities and additional features on top of the chatterbox-tts base package.

Features

🚀 OpenAI-Compatible API - Drop-in replacement for OpenAI's TTS API
⚡ FastAPI Performance - High-performance async API with automatic documentation
🎨 React Frontend - Includes an optional, ready-to-use web interface
🎭 Voice Cloning - Use your own voice samples for personalized speech
🎤 Voice Upload - Upload custom voice files per request or use the configured default
📝 Smart Text Processing - Automatic chunking for long texts
📊 Real-time Status - Monitor TTS progress, statistics, and request history
🐳 Docker Ready - Full containerization support
⚙️ Configurable - Extensive environment variable configuration
🎛️ Parameter Control - Real-time adjustment of speech characteristics
📚 Auto Documentation - Interactive API docs at /docs and /redoc
🔧 Type Safety - Full Pydantic validation for requests and responses
🧠 Memory Management - Advanced memory monitoring and automatic cleanup
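
Because the endpoints are OpenAI-compatible, the official openai Python client can usually be pointed straight at the local server. A minimal sketch, assuming the openai package is installed; the model and voice values are placeholders this API may ignore, and the API key is only required by the client library, not the server:

from openai import OpenAI

# Point the client at the local Chatterbox TTS API instead of api.openai.com
client = OpenAI(base_url="http://localhost:4123/v1", api_key="not-needed")

response = client.audio.speech.create(
    model="tts-1",   # placeholder; the local server may ignore it
    voice="alloy",   # placeholder; the configured voice sample is used instead
    input="Hello from the OpenAI-compatible endpoint!",
)
response.write_to_file("openai_compat.wav")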

Quick Start

Local Installation

Option A: Using uv (Recommended - Faster Installs & Better Dependency Resolution)

# Clone the repository
git clone https://github.com/travisvn/chatterbox-tts-api
cd chatterbox-tts-api

# Install uv if you haven't already
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install dependencies with uv (automatically creates venv)
uv sync

# Copy and customize environment variables
cp .env.example .env

# Start the API with FastAPI
uv run uvicorn app.main:app --host 0.0.0.0 --port 4123
# Or use the main script
uv run main.py

💡 Why uv? Users report better compatibility with chatterbox-tts, 25-40% faster installs, and superior dependency resolution. See migration guide →

Option B: Using pip (Traditional)

# Clone the repository
git clone https://github.com/travisvn/chatterbox-tts-api
cd chatterbox-tts-api

# Set up the environment using Python 3.11
python -m venv .venv
source .venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Copy and customize environment variables
cp .env.example .env

# Add your voice sample (or use the provided one)
# cp your-voice.mp3 voice-sample.mp3

# Start the API with FastAPI
uvicorn app.main:app --host 0.0.0.0 --port 4123
# Or use the main script
python main.py

Ran into issues? Check the troubleshooting section

Docker (Recommended)

# Clone and start with Docker Compose
git clone https://github.com/travisvn/chatterbox-tts-api
cd chatterbox-tts-api

# Use Docker-optimized environment variables
cp .env.example.docker .env  # Docker-specific paths, ready to use
# Or: cp .env.example .env    # Local development paths, needs customization

# Choose your deployment method:

# API Only (default)
docker compose -f docker/docker-compose.yml up -d             # Standard (pip-based)
docker compose -f docker/docker-compose.uv.yml up -d          # uv-optimized (faster builds)
docker compose -f docker/docker-compose.gpu.yml up -d         # Standard + GPU
docker compose -f docker/docker-compose.uv.gpu.yml up -d      # uv + GPU (recommended for GPU users)
docker compose -f docker/docker-compose.cpu.yml up -d         # CPU-only

# API + Frontend (add --profile frontend to any of the above)
docker compose -f docker/docker-compose.yml --profile frontend up -d             # Standard + Frontend
docker compose -f docker/docker-compose.gpu.yml --profile frontend up -d         # GPU + Frontend
docker compose -f docker/docker-compose.uv.gpu.yml --profile frontend up -d      # uv + GPU + Frontend

# Watch the logs as it initializes (the first use of TTS takes the longest)
docker logs chatterbox-tts-api -f

# Test the API
curl -X POST http://localhost:4123/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"input": "Hello from Chatterbox TTS!"}' \
  --output test.wav

🚀 Running with the Web UI (Full Stack)

This project includes an optional React-based web UI. Use Docker Compose profiles to easily opt in or out of the frontend:

With Docker Compose Profiles

# API only (default behavior)
docker compose -f docker/docker-compose.yml up -d

# API + Frontend + Web UI (with --profile frontend)
docker compose -f docker/docker-compose.yml --profile frontend up -d

# Or use the convenient helper script for fullstack:
python start.py fullstack

# Same pattern works with all deployment variants:
docker compose -f docker/docker-compose.gpu.yml --profile frontend up -d    # GPU + Frontend
docker compose -f docker/docker-compose.uv.yml --profile frontend up -d     # uv + Frontend
docker compose -f docker/docker-compose.cpu.yml --profile frontend up -d    # CPU + Frontend

Local Development

For local development, you can run the API and frontend separately:

# Start the API first (follow earlier instructions)
# Then run the frontend:
cd frontend && npm install && npm run dev

Open the local URL that Vite prints to access the web UI.

Build for Production

Build the frontend for production deployment:

cd frontend && npm install && npm run build

The built app is written to frontend/dist; you can then open frontend/dist/index.html directly from your local file system.

Port Configuration

  • API Only: Accessible at http://localhost:4123 (direct API access)
  • With Frontend: Web UI at http://localhost:4321, API requests routed via proxy

The frontend uses a reverse proxy to route requests, so when running with --profile frontend, the web interface will be available at http://localhost:4321 while the API runs behind the proxy.

Screenshots of Frontend (Web UI)

(Screenshots: the frontend web UI in dark and light mode, both idle and while processing, plus full-page views in each theme.)

API Usage

Basic Text-to-Speech (Default Voice)

This endpoint works for both the API-only and full-stack setups.

curl -X POST http://localhost:4123/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"input": "Your text here"}' \
  --output speech.wav

Using Custom Parameters (JSON)

curl -X POST http://localhost:4123/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"input": "Dramatic speech!", "exaggeration": 1.2, "cfg_weight": 0.3, "temperature": 0.9}' \
  --output dramatic.wav

Custom Voice Upload

Upload your own voice sample for personalized speech:

curl -X POST http://localhost:4123/v1/audio/speech/upload \
  -F "input=Hello with my custom voice!" \
  -F "exaggeration=0.8" \
  -F "voice_file=@my_voice.mp3" \
  --output custom_voice_speech.wav

With Custom Parameters and Voice Upload

curl -X POST http://localhost:4123/v1/audio/speech/upload \
  -F "input=Dramatic speech!" \
  -F "exaggeration=1.2" \
  -F "cfg_weight=0.3" \
  -F "temperature=0.9" \
  -F "voice_file=@dramatic_voice.wav" \
  --output dramatic.wav

🎵 Real-time Audio Streaming

The API supports real-time audio streaming for lower latency and a better user experience. Audio chunks are generated and sent as they're ready, so playback can start before the full audio has been generated.

Quick Start

# Basic streaming
curl -X POST http://localhost:4123/v1/audio/speech/stream \
  -H "Content-Type: application/json" \
  -d '{"input": "This streams in real-time!"}' \
  --output streaming.wav

# Real-time playback
curl -X POST http://localhost:4123/v1/audio/speech/stream \
  -H "Content-Type: application/json" \
  -d '{"input": "Play as it generates!"}' \
  | ffplay -f wav -i pipe:0 -autoexit -nodisp

🚀 Complete Streaming Documentation →

For comprehensive streaming features including:

  • Advanced chunking strategies (sentence, paragraph, word, fixed)
  • Quality presets (fast, balanced, high)
  • Configurable parameters and performance tuning
  • Real-time progress monitoring
  • Python, JavaScript, and cURL examples
  • Integration patterns for different use cases

Key Benefits:

  • ⚑ Lower latency - Start hearing audio in 1-2 seconds
  • 🎯 Better UX - No waiting for complete generation
  • πŸ’Ύ Memory efficient - Process chunks individually
  • πŸŽ›οΈ Configurable - Choose speed vs quality trade-offs
🐍 Python Examples

Default Voice (JSON)

import requests

response = requests.post(
    "http://localhost:4123/v1/audio/speech",
    json={
        "input": "Hello world!",
        "exaggeration": 0.8
    }
)

with open("output.wav", "wb") as f:
    f.write(response.content)

Upload Endpoint (Default Voice)

import requests

response = requests.post(
    "http://localhost:4123/v1/audio/speech/upload",
    data={
        "input": "Hello world!",
        "exaggeration": 0.8
    }
)

with open("output.wav", "wb") as f:
    f.write(response.content)

Custom Voice Upload

import requests

with open("my_voice.mp3", "rb") as voice_file:
    response = requests.post(
        "http://localhost:4123/v1/audio/speech/upload",
        data={
            "input": "Hello with my custom voice!",
            "exaggeration": 0.8,
            "temperature": 1.0
        },
        files={
            "voice_file": ("my_voice.mp3", voice_file, "audio/mpeg")
        }
    )

with open("custom_output.wav", "wb") as f:
    f.write(response.content)

Basic Streaming Example

import requests

# Stream audio generation in real-time
response = requests.post(
    "http://localhost:4123/v1/audio/speech/stream",
    json={
        "input": "This will stream as it's generated!",
        "exaggeration": 0.8
    },
    stream=True  # Enable streaming mode
)

with open("streaming_output.wav", "wb") as f:
    for chunk in response.iter_content(chunk_size=8192):
        if chunk:
            f.write(chunk)
            print(f"Received chunk: {len(chunk)} bytes")

📚 Complete Streaming Examples & Documentation →

Including real-time playback, progress monitoring, custom voice uploads, and advanced integration patterns.

Voice File Requirements

Supported Formats:

  • MP3 (.mp3)
  • WAV (.wav)
  • FLAC (.flac)
  • M4A (.m4a)
  • OGG (.ogg)

Requirements:

  • Maximum file size: 10MB
  • Recommended duration: 10-30 seconds of clear speech
  • Avoid background noise for best results
  • Higher quality audio produces better voice cloning

Configuration

The project provides two environment example files:

  • .env.example - For local development (uses ./models, ./voice-sample.mp3)
  • .env.example.docker - For Docker deployment (uses /cache, /app/voice-sample.mp3)

Choose the appropriate one for your setup:

# For local development
cp .env.example .env

# For Docker deployment
cp .env.example.docker .env

Key environment variables (see the example files for full list):

Variable            Default              Description
PORT                4123                 API server port
EXAGGERATION        0.5                  Emotion intensity (0.25-2.0)
CFG_WEIGHT          0.5                  Pace control (0.0-1.0)
TEMPERATURE         0.8                  Sampling randomness (0.05-5.0)
VOICE_SAMPLE_PATH   ./voice-sample.mp3   Voice sample for cloning
DEVICE              auto                 Device (auto/cuda/mps/cpu)

🎭 Voice Cloning

Replace the default voice sample:

# Replace the default voice sample
cp your-voice.mp3 voice-sample.mp3

# Or set a custom path
echo "VOICE_SAMPLE_PATH=/path/to/your/voice.mp3" >> .env

For best results:

  • Use 10-30 seconds of clear speech
  • Avoid background noise
  • Prefer WAV or high-quality MP3

🐳 Docker Deployment

Development

docker compose -f docker/docker-compose.yml up

Production

# Create production environment
cp .env.example.docker .env
nano .env  # Set production values

# Deploy
docker compose -f docker/docker-compose.yml up -d

With GPU Support

# Use GPU-enabled compose file
# Ensure NVIDIA Container Toolkit is installed
docker compose -f docker/docker-compose.gpu.yml up -d

📚 API Reference

API Endpoints

Endpoint                      Method   Description
/audio/speech                 POST     Generate speech from text (complete)
/audio/speech/upload          POST     Generate speech with voice upload
/audio/speech/stream          POST     Stream speech generation (docs)
/audio/speech/stream/upload   POST     Stream speech with voice upload (docs)
/health                       GET      Health check and status
/config                       GET      Current configuration
/v1/models                    GET      Available models (OpenAI compat)
/status                       GET      TTS processing status & progress
/status/progress              GET      Real-time progress (lightweight)
/status/statistics            GET      Processing statistics
/status/history               GET      Recent request history
/info                         GET      Complete API information
/docs                         GET      Interactive API documentation
/redoc                        GET      Alternative API documentation
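
As a quick smoke test of the read-only endpoints, a small requests loop works well; a sketch assuming the JSON endpoints listed above are reachable on the default port:

import requests

BASE = "http://localhost:4123"

# Exercise a few read-only endpoints from the table above
for path in ("/health", "/config", "/v1/models", "/info"):
    response = requests.get(BASE + path)
    print(path, response.status_code, response.json())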

Parameters Reference

Speech Generation Parameters

Exaggeration (0.25-2.0)

  • 0.3-0.4: Professional, neutral
  • 0.5: Default balanced
  • 0.7-0.8: More expressive
  • 1.0+: Very dramatic

CFG Weight (0.0-1.0)

  • 0.2-0.3: Faster speech
  • 0.5: Default pace
  • 0.7-0.8: Slower, deliberate

Temperature (0.05-5.0)

  • 0.4-0.6: More consistent
  • 0.8: Default balance
  • 1.0+: More creative/random
🧠 Memory Management

The API includes advanced memory management to prevent memory leaks and optimize performance:

Memory Management Features

  • Automatic Cleanup: Periodic garbage collection and tensor cleanup
  • CUDA Memory Management: Automatic GPU cache clearing
  • Memory Monitoring: Real-time memory usage tracking
  • Manual Controls: API endpoints for manual cleanup operations

Memory Configuration

Variable                    Default   Description
MEMORY_CLEANUP_INTERVAL     5         Clean up memory every N requests
CUDA_CACHE_CLEAR_INTERVAL   3         Clear CUDA cache every N requests
ENABLE_MEMORY_MONITORING    true      Enable detailed memory logging

Memory Monitoring Endpoints

# Get memory status
curl http://localhost:4123/memory

# Trigger manual cleanup
curl "http://localhost:4123/memory?cleanup=true&force_cuda_clear=true"

# Reset memory tracking (with confirmation)
curl -X POST "http://localhost:4123/memory/reset?confirm=true"

Real-time Status Tracking

Monitor TTS processing in real-time:

# Check current processing status
curl "http://localhost:4123/v1/status/progress"

# Get detailed status with memory and stats
curl "http://localhost:4123/v1/status?include_memory=true&include_stats=true"

# View processing statistics
curl "http://localhost:4123/v1/status/statistics"

# Check request history
curl "http://localhost:4123/v1/status/history?limit=5"

# Get comprehensive API information
curl "http://localhost:4123/info"

Status Response Example:

{
  "is_processing": true,
  "status": "generating_audio",
  "current_step": "Generating audio for chunk 2/4",
  "current_chunk": 2,
  "total_chunks": 4,
  "progress_percentage": 50.0,
  "duration_seconds": 2.5,
  "text_preview": "Your text being processed..."
}

See Status API Documentation for complete details.
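
If a long generation is running (for example, started from another terminal), progress can also be polled from Python. A minimal sketch, assuming the lightweight progress endpoint returns the same field names as the status example above:

import time
import requests

# Poll the lightweight progress endpoint while a generation is running
while True:
    status = requests.get("http://localhost:4123/v1/status/progress").json()
    if not status.get("is_processing"):
        print("Idle - no generation in progress")
        break
    print(f"{status.get('progress_percentage', 0):.0f}% - {status.get('current_step', '')}")
    time.sleep(0.5)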

Memory Testing

Run the memory management test suite:

# Test memory patterns and cleanup
python tests/test_memory.py  # or: uv run tests/test_memory.py

# Monitor memory during testing
watch -n 1 'curl -s http://localhost:4123/memory | jq .memory_info'

Memory Optimization Tips

For High-Volume Production:

MEMORY_CLEANUP_INTERVAL=3
CUDA_CACHE_CLEAR_INTERVAL=2
ENABLE_MEMORY_MONITORING=false  # Reduce logging overhead
MAX_CHUNK_LENGTH=200             # Smaller chunks for less memory usage

For Development/Debugging:

MEMORY_CLEANUP_INTERVAL=1
CUDA_CACHE_CLEAR_INTERVAL=1
ENABLE_MEMORY_MONITORING=true

Memory Leak Prevention:

  • Tensors are automatically moved to CPU before deletion
  • Gradient tracking is disabled during inference
  • Audio chunks are cleaned up after concatenation
  • CUDA cache is periodically cleared
  • Python garbage collection is triggered regularly
πŸ§ͺ Testing

Run the test script to verify the API functionality:

python tests/test_api.py

The test script will:

  • Test health check endpoint
  • Test models endpoint
  • Test API documentation endpoints (new!)
  • Generate speech for various text lengths
  • Test custom parameter validation
  • Test error handling with validation
  • Save generated audio files as test_output_*.wav
⚑ Performance

FastAPI Benefits:

  • Async support: Better concurrent request handling
  • Faster serialization: JSON responses ~25% faster than Flask
  • Type validation: Pydantic models prevent invalid requests
  • Auto documentation: No manual API doc maintenance

Hardware Recommendations:

  • CPU: Works but slower, reduce chunk size for better memory usage
  • GPU: Recommended for production, significantly faster
  • Memory: 4GB minimum, 8GB+ recommended
  • Concurrency: Async support allows better multi-request handling
🔧 Troubleshooting

Common Issues

CUDA/CPU Compatibility Error

RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False

This happens because the chatterbox-tts model weights were saved from a CUDA device, so loading them expects a CUDA-enabled PyTorch build even when you intend to run on CPU. Solutions:

# Option 1: Use default setup (now includes CUDA-enabled PyTorch)
docker compose -f docker/docker-compose.yml up -d

# Option 2: Use explicit CUDA setup (traditional)
docker compose -f docker/docker-compose.gpu.yml up -d

# Option 3: Use uv + GPU setup (recommended for GPU users)
docker compose -f docker/docker-compose.uv.gpu.yml up -d

# Option 4: Use CPU-only setup (may have compatibility issues)
docker compose -f docker/docker-compose.cpu.yml up -d

# Option 5: Clear model cache and retry with CUDA-enabled setup
docker volume rm chatterbox-tts-api_chatterbox-models
docker compose -f docker/docker-compose.yml up -d --build

# Option 6: Try uv for better dependency resolution
uv sync
uv run uvicorn app.main:app --host 0.0.0.0 --port 4123

For local development, install PyTorch with CUDA support:

# With pip
pip uninstall torch torchvision torchaudio
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu124
pip install chatterbox-tts

# With uv (handles this automatically)
uv sync

Windows users on pip who run into issues:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121 --force-reinstall
pip install --force-reinstall typing_extensions

Port conflicts

# Change port
echo "PORT=4124" >> .env

GPU not detected

# Force CPU mode
echo "DEVICE=cpu" >> .env

Out of memory

# Reduce chunk size
echo "MAX_CHUNK_LENGTH=200" >> .env

Model download fails

# Clear cache and retry
rm -rf models/
uvicorn app.main:app --host 0.0.0.0 --port 4123  # or: uv run main.py

FastAPI startup issues

# Check if uvicorn is installed
uvicorn --version

# Run with verbose logging
uvicorn app.main:app --host 0.0.0.0 --port 4123 --log-level debug

# Alternative startup method
python main.py

💻 Development

Project Structure

This project follows a clean, modular architecture for maintainability:

app/                     # FastAPI backend application
├── __init__.py          # Main package
├── config.py            # Configuration management
├── main.py              # FastAPI application
├── models/              # Pydantic models
│   ├── requests.py      # Request models
│   └── responses.py     # Response models
├── core/                # Core functionality
│   ├── memory.py        # Memory management
│   ├── text_processing.py # Text processing utilities
│   └── tts_model.py     # TTS model management
└── api/                 # API endpoints
    ├── router.py        # Main router
    └── endpoints/       # Individual endpoint modules
        ├── speech.py    # TTS endpoint
        ├── health.py    # Health check
        ├── models.py    # Model listing
        ├── memory.py    # Memory management
        └── config.py    # Configuration

frontend/                # React frontend application
├── src/
├── Dockerfile
├── nginx.conf           # Integrated proxy configuration
└── package.json

docker/                  # Docker files consolidated
├── Dockerfile
├── Dockerfile.uv        # uv-optimized image
├── Dockerfile.gpu       # GPU-enabled image
├── Dockerfile.cpu       # CPU-only image
├── Dockerfile.uv.gpu    # uv + GPU image
├── docker-compose.yml   # Standard deployment
├── docker-compose.uv.yml # uv deployment
├── docker-compose.gpu.yml # GPU deployment
├── docker-compose.uv.gpu.yml # uv + GPU deployment
└── docker-compose.cpu.yml # CPU-only deployment

tests/                   # Test suite
├── test_api.py          # API tests
└── test_memory.py       # Memory tests

main.py                  # Main entry point
start.py                 # Development helper script

Quick Start Scripts

# Development mode with auto-reload
python start.py dev

# Production mode
python start.py prod

# Full Stack mode with UI (using Docker)
python start.py fullstack

# Run tests
python start.py test

# View project structure
python start.py info

Local Development

# Install in development mode (pip)
pip install -e .

# Or with uv
uv sync --extra dev

# Start with auto-reload (FastAPI development)
uvicorn app.main:app --host 0.0.0.0 --port 4123 --reload

# Or use the main script
python main.py

# Or use the development helper
python start.py dev

Testing

# Run API tests
python tests/test_api.py  # or: uv run tests/test_api.py

# Run memory tests
python tests/test_memory.py

# Test specific endpoint
curl http://localhost:4123/health

# Check API documentation
curl http://localhost:4123/openapi.json

FastAPI Development Features

  • Auto-reload: Use --reload flag for development
  • Interactive docs: Visit /docs for live API testing
  • Type hints: Full IDE support with Pydantic models
  • Validation: Automatic request/response validation
  • Modular structure: Easy to extend and maintain

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Ensure FastAPI docs are updated
  6. Submit a pull request

Support