title: VoxLibris Chatterbox TTS Engine
emoji: 🗣️
colorFrom: purple
colorTo: indigo
sdk: docker
app_port: 7860
pinned: false

VoxLibris Chatterbox TTS Engine

A HuggingFace Space that serves Chatterbox TTS as a REST API, implementing the VoxLibris TTS Engine API Contract.

Endpoints

POST /GetEngineDetails

Returns engine capabilities, supported emotions, and voice cloning support.

POST /ConvertTextToSpeech

Converts text to speech with voice cloning. Requires a voice_to_clone_sample (base64-encoded WAV). Supports emotion-driven expressiveness via the exaggeration parameter, mapped automatically from VoxLibris emotions.
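As a rough illustration, a client might assemble the request like the sketch below. Only the voice_to_clone_sample field name comes from this README; the "text" and "emotion" field names, the JSON body shape, and the example base URL are assumptions, so check them against the VoxLibris TTS Engine API Contract. Authentication headers are left out for brevity.

```python
import base64
import json
import urllib.request


def build_tts_request(base_url, text, wav_bytes, emotion="neutral"):
    """Build (but do not send) a POST request for /ConvertTextToSpeech.

    ASSUMPTION: the body is JSON and uses "text" and "emotion" keys.
    Only "voice_to_clone_sample" is named in this README.
    """
    payload = {
        "text": text,        # hypothetical field name
        "emotion": emotion,  # hypothetical field name
        # Reference voice sample: WAV bytes, base64-encoded
        "voice_to_clone_sample": base64.b64encode(wav_bytes).decode("ascii"),
    }
    return urllib.request.Request(
        base_url.rstrip("/") + "/ConvertTextToSpeech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send it; the response format is defined by the API contract and is not shown here.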

GET /health

Returns model loading status.

Authentication

Set the API_KEY secret in your HuggingFace Space settings. Requests must then include an Authorization: Bearer <your-key> header. Leave API_KEY unset to disable authentication.
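On the client side, the header can be built conditionally so the same code works whether or not the Space has a key configured. This is a minimal sketch; the helper name and the CLIENT_API_KEY environment variable are hypothetical and not part of the engine.

```python
import os


def auth_headers(api_key=None):
    """Return request headers, adding a Bearer token only when a key is given."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return headers


# CLIENT_API_KEY is a hypothetical client-side variable holding the same
# value as the Space's API_KEY secret; it may be unset when auth is disabled.
headers = auth_headers(os.environ.get("CLIENT_API_KEY"))
```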

Voice Cloning

Chatterbox is a voice-cloning TTS engine, so every request requires a reference voice sample. Send a base64-encoded WAV file in the voice_to_clone_sample field. A clear speech sample of 6 to 15 seconds works best.
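A sample can be encoded and sanity-checked against the recommended length with the standard library alone. The sketch below assumes the sample is an uncompressed PCM WAV that Python's wave module can read; the helper names are illustrative, and make_silent_wav exists only to produce test input.

```python
import base64
import io
import wave


def encode_voice_sample(wav_bytes, min_seconds=6.0, max_seconds=15.0):
    """Return (base64_string, duration_seconds, in_recommended_range)."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        duration = wav.getnframes() / wav.getframerate()
    encoded = base64.b64encode(wav_bytes).decode("ascii")
    return encoded, duration, min_seconds <= duration <= max_seconds


def make_silent_wav(seconds, rate=16000):
    """Build a mono 16-bit silent WAV in memory (for demonstration only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()
```

In practice the bytes would come from a real recording, for example open("sample.wav", "rb").read(), and the returned base64 string goes into voice_to_clone_sample.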

Emotion Support

Chatterbox controls expressiveness through its exaggeration parameter (0.0-1.0). The engine automatically maps VoxLibris emotions to appropriate exaggeration levels:

| Emotion  | Exaggeration | Description                |
|----------|--------------|----------------------------|
| neutral  | 0.50         | Normal, conversational     |
| calm     | 0.40         | Subdued, relaxed           |
| happy    | 0.70         | Cheerful, upbeat           |
| sad      | 0.60         | Somber, downcast           |
| angry    | 0.85         | Intense, forceful          |
| fear     | 0.75         | Tense, urgent              |
| excited  | 0.90         | High energy, enthusiastic  |
| surprise | 0.80         | Startled, astonished       |

The intensity parameter (1-100) scales the exaggeration further.
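The lookup itself is a plain table, sketched below with the values listed above. How intensity scales the result is not specified in this README, so the scaling shown is purely an assumption for illustration (linear, with intensity 50 leaving the base value unchanged, clamped to 0.0-1.0). Falling back to neutral for unknown emotions is also an assumption.

```python
# Base exaggeration per emotion, taken from the table in this README.
EXAGGERATION = {
    "neutral": 0.50,
    "calm": 0.40,
    "happy": 0.70,
    "sad": 0.60,
    "angry": 0.85,
    "fear": 0.75,
    "excited": 0.90,
    "surprise": 0.80,
}


def exaggeration_for(emotion, intensity=None):
    """Map an emotion (and optional 1-100 intensity) to an exaggeration value."""
    # ASSUMPTION: unknown emotions fall back to the neutral level.
    base = EXAGGERATION.get(emotion.lower(), EXAGGERATION["neutral"])
    if intensity is None:
        return base
    if not 1 <= intensity <= 100:
        raise ValueError("intensity must be between 1 and 100")
    # ASSUMPTION: linear scaling where 50 is the midpoint; the engine's
    # actual formula may differ.
    scaled = base * (intensity / 50.0)
    return max(0.0, min(1.0, scaled))
```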

Limits

  • Maximum 300 characters per request (longer text is truncated at a word boundary)
  • Output: 24kHz mono 16-bit WAV
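A client can mirror both limits locally: shorten text before sending it, and confirm that returned audio matches the stated format. The sketch below is one plausible reading of "truncated at a word boundary" (cut at the last space before the limit, with a hard cut if there is none); the engine's exact rule may differ, and both helper names are illustrative.

```python
import io
import wave

MAX_CHARS = 300  # per-request limit stated in this README


def truncate_at_word_boundary(text, limit=MAX_CHARS):
    """Shorten text to at most `limit` characters without splitting a word."""
    if len(text) <= limit:
        return text
    cut = text[:limit]
    # The cut already falls between words if the next character is whitespace.
    if text[limit].isspace():
        return cut.rstrip()
    head, sep, _ = cut.rpartition(" ")
    # With no space at all, fall back to a hard cut at the limit.
    return head.rstrip() if sep else cut


def is_expected_output_format(wav_bytes):
    """Check for 24 kHz, mono, 16-bit (2 bytes per sample) WAV audio."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        params = (wav.getframerate(), wav.getnchannels(), wav.getsampwidth())
    return params == (24000, 1, 2)
```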

Deployment

  1. Create a new HuggingFace Space with the Docker SDK
  2. Upload the contents of this folder
  3. Set the API_KEY secret in Space settings (optional)
  4. The model downloads automatically on first startup (~500 MB)
  5. Select GPU hardware for the Space (a GPU is required; a T4 is the recommended minimum)
  6. Register the Space URL in VoxLibris Settings under TTS Engine Management