	---
	title: VoxLibris Chatterbox TTS Engine
	emoji: 🗣️
	colorFrom: purple
	colorTo: indigo
	sdk: docker
	app_port: 7860
	pinned: false
	---

	# VoxLibris Chatterbox TTS Engine

	A HuggingFace Space that serves [Chatterbox TTS](https://github.com/resemble-ai/chatterbox)
	as a REST API, implementing the
	[VoxLibris TTS Engine API Contract](https://github.com/your-repo/docs/tts-api-contract.md).

	## Endpoints

	### POST /GetEngineDetails

	Returns engine capabilities, supported emotions, and voice cloning support.

	### POST /ConvertTextToSpeech

	Converts text to speech with voice cloning. Requires a `voice_to_clone_sample`
	field (a base64-encoded WAV). Expressiveness is driven by emotion: VoxLibris
	emotions are mapped automatically to Chatterbox's `exaggeration` parameter.

	### GET /health

	Returns model loading status.

	## Authentication

	Set the `API_KEY` secret in your HuggingFace Space settings.
	Requests must then include an `Authorization: Bearer <your-key>` header.
	Leave `API_KEY` unset to disable authentication.
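
As a rough client-side sketch of the header rule above, the helper below builds (but does not send) a POST request and attaches the bearer token only when a key is supplied. The function name `build_request` and the use of a JSON body are assumptions for illustration; the API contract is the authority on the actual request format.

```python
import json
import urllib.request
from typing import Optional


def build_request(base_url: str, endpoint: str, payload: dict,
                  api_key: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a POST request for the engine.

    Assumption: the engine accepts a JSON body. Check the API contract.
    """
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Only sent when the Space has an API_KEY secret configured.
        headers["Authorization"] = f"Bearer {api_key}"
    body = json.dumps(payload).encode("utf-8")
    url = f"{base_url.rstrip('/')}/{endpoint.lstrip('/')}"
    return urllib.request.Request(url, data=body, headers=headers, method="POST")
```

Passing the returned object to `urllib.request.urlopen` would send it; with `api_key=None` no `Authorization` header is added, which matches a Space where `API_KEY` is unset.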

	## Voice Cloning

	Chatterbox is a voice-cloning TTS engine, so every request requires a reference
	voice sample. Send a base64-encoded WAV file in the `voice_to_clone_sample`
	field. A clear speech sample of 6 to 15 seconds works best.
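
One way to prepare the sample on the client is sketched below: it reads the WAV header to measure duration, checks it against the 6 to 15 second guideline, and base64-encodes the bytes. The helper names (`encode_voice_sample`, `make_silent_wav`) are hypothetical, and the silent WAV generator exists only to provide stand-in data.

```python
import base64
import io
import wave


def encode_voice_sample(wav_bytes: bytes, min_s: float = 6.0, max_s: float = 15.0):
    """Return (base64_string, duration_seconds, in_recommended_range)."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        duration = wav.getnframes() / wav.getframerate()
    encoded = base64.b64encode(wav_bytes).decode("ascii")
    return encoded, duration, min_s <= duration <= max_s


def make_silent_wav(seconds: float, rate: int = 24000) -> bytes:
    """Generate a silent mono 16-bit WAV (stand-in data for this example)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()


sample = make_silent_wav(8.0)
encoded, duration, ok = encode_voice_sample(sample)
payload = {"voice_to_clone_sample": encoded}  # field name taken from this README
```

The duration check is advisory: the README describes 6 to 15 seconds as what works best, not as a limit the engine enforces.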

	## Emotion Support

	Chatterbox controls expressiveness through its `exaggeration` parameter (0.0-1.0).
	The engine automatically maps VoxLibris emotions to appropriate exaggeration levels:

	| Emotion  | Exaggeration | Description               |
	|----------|--------------|---------------------------|
	| neutral  | 0.50         | Normal, conversational    |
	| calm     | 0.40         | Subdued, relaxed          |
	| happy    | 0.70         | Cheerful, upbeat          |
	| sad      | 0.60         | Somber, downcast          |
	| angry    | 0.85         | Intense, forceful         |
	| fear     | 0.75         | Tense, urgent             |
	| excited  | 0.90         | High energy, enthusiastic |
	| surprise | 0.80         | Startled, astonished      |

	The `intensity` parameter (1-100) scales the exaggeration further.
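
The mapping can be mirrored on the client as a simple lookup, sketched below. The baseline values come from the table in this section. The fallback to `neutral` for unknown emotions and the linear scaling formula are assumptions made purely for illustration: this README does not specify how `intensity` is applied, and the engine may scale differently.

```python
# Baseline exaggeration per emotion, copied from the table in this section.
EMOTION_EXAGGERATION = {
    "neutral": 0.50,
    "calm": 0.40,
    "happy": 0.70,
    "sad": 0.60,
    "angry": 0.85,
    "fear": 0.75,
    "excited": 0.90,
    "surprise": 0.80,
}


def base_exaggeration(emotion: str) -> float:
    """Look up the baseline; unknown emotions fall back to neutral (assumption)."""
    key = emotion.strip().lower()
    return EMOTION_EXAGGERATION.get(key, EMOTION_EXAGGERATION["neutral"])


def scaled_exaggeration(emotion: str, intensity: int = 50) -> float:
    """Hypothetical scaling: intensity 50 keeps the baseline, 100 doubles it.

    The result is clamped to the documented 0.0-1.0 range.
    """
    if not 1 <= intensity <= 100:
        raise ValueError("intensity must be between 1 and 100")
    value = base_exaggeration(emotion) * (intensity / 50.0)
    return max(0.0, min(1.0, value))
```

Under this illustrative formula, strongly expressive emotions saturate at 1.0 well before intensity reaches 100, which is one reason a real implementation might choose a gentler curve.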

	## Limits

	- Maximum 300 characters per request (longer text is truncated at a word boundary)
	- Output: 24 kHz mono 16-bit WAV
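
Clients that want to control where long text is cut can truncate it themselves before sending. The function below is a minimal sketch of word-boundary truncation at the 300-character limit. It is not the engine's own implementation, and the hard cut for text without spaces is an assumption.

```python
def truncate_at_word_boundary(text: str, limit: int = 300) -> str:
    """Shorten text to at most `limit` characters, cutting at the last space."""
    if len(text) <= limit:
        return text
    # Look one character past the limit so a cut that lands exactly
    # on a word boundary keeps the whole final word.
    window = text[: limit + 1]
    cut = window.rfind(" ")
    if cut <= 0:
        # No usable space: fall back to a hard cut (assumption).
        return text[:limit]
    return window[:cut].rstrip()
```

For longer passages, splitting the text into several requests at sentence boundaries avoids losing the truncated remainder.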

	## Deployment

	1. Create a new HuggingFace Space with the Docker SDK
	2. Upload the contents of this folder
	3. Set the `API_KEY` secret in Space settings (optional)
	4. The model downloads automatically on first startup (~500 MB)
	5. Requires a GPU (T4 minimum recommended)
	6. Register the Space URL in VoxLibris Settings under TTS Engine Management