Last commit: "Update README.md" by Nymbo (ad9ea82, verified), about 1 month ago

metadata

title: Pocket-TTS 100M
emoji: 🔊
colorFrom: green
colorTo: blue
sdk: gradio
sdk_version: 6.2.0
app_file: app.py
pinned: true
license: apache-2.0
short_description: High quality, efficient voice cloning. Just 100M parameters.

Pocket-TTS

A lightweight text-to-speech application built with kyutai/pocket-tts and Gradio.

Features

Fast CPU inference — ~6x faster than real-time on modern CPUs
Low latency — ~200ms to first audio chunk
Streaming output — Audio plays as it generates
Voice cloning — Use custom voice samples (MP3, WAV, FLAC, etc.)
Pre-computed embeddings — Voices work without voice cloning auth on HF Spaces
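
The "~6x faster than real-time" figure can be read as a ratio of audio duration to generation time. The sketch below shows that arithmetic only; the function name `realtime_speedup` and the sample numbers are illustrative assumptions, not measurements taken from this app.

```python
def realtime_speedup(audio_seconds: float, wall_seconds: float) -> float:
    """Return seconds of audio produced per second of wall-clock compute."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return audio_seconds / wall_seconds


# Illustrative numbers only (assumed, not benchmarked):
# 12 s of audio generated in 2 s of wall-clock time.
speedup = realtime_speedup(audio_seconds=12.0, wall_seconds=2.0)
print(f"{speedup:.1f}x real-time")
```

A value above 1.0 means audio is generated faster than it plays back, which is what makes streaming output practical on a CPU.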

Quick Start

pip install -r requirements.txt
python app.py

Open http://127.0.0.1:7860 in your browser.

Adding Custom Voices

Drop audio files (MP3, WAV, etc.) into the voices/ directory
Restart the app
Embeddings are created automatically on the first boot after the files are added (this requires Hugging Face authentication when running locally)
Once created, embeddings are saved to embeddings/ and work without auth
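
To see which voices would still need an embedding on the next boot, a check along these lines may help. It is a minimal sketch, not the logic in app.py: the helper name `voices_missing_embeddings` and the suffix set are assumptions, and the `name.mp3` to `name.safetensors` mapping is inferred from the directory layout shown in this README.

```python
from pathlib import Path

# Assumed set of audio suffixes; the README lists "MP3, WAV, FLAC, etc."
AUDIO_SUFFIXES = {".mp3", ".wav", ".flac"}


def voices_missing_embeddings(voices_dir: Path, embeddings_dir: Path) -> list[str]:
    """Return voice names with audio in voices_dir but no .safetensors file in embeddings_dir."""
    if not voices_dir.is_dir():
        return []
    missing = []
    for audio in sorted(voices_dir.iterdir()):
        if audio.suffix.lower() not in AUDIO_SUFFIXES:
            continue  # skip non-audio files
        if not (embeddings_dir / f"{audio.stem}.safetensors").exists():
            missing.append(audio.stem)
    return missing


# Hypothetical usage from the repository root.
print(voices_missing_embeddings(Path("voices"), Path("embeddings")))
```

An empty list suggests every voice already has a saved embedding, so no authentication should be needed for those voices.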

Structure

Pocket-TTS/
├── app.py
├── requirements.txt
├── voices/           # Your custom voice audio files
│   └── my_voice.mp3
└── embeddings/       # Auto-generated (commit these for HF Spaces)
    └── my_voice.safetensors

Hugging Face Spaces Deployment

Option 1: Pre-commit embeddings (no auth needed on Space)

Run the app locally first (with HF auth) to generate embeddings
Commit both voices/ and embeddings/ directories
The Space will use pre-computed embeddings

Option 2: Auto-create embeddings on Space (requires valid token)

Accept terms at https://huggingface.co/kyutai/pocket-tts
Add HF_TOKEN secret in Space settings (must be a valid token)
Embeddings are created automatically on first boot
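
The choice between the two options can be summarized as a small decision function. This is a sketch under stated assumptions: the function name `embedding_plan` and its return labels are hypothetical, and checking that `HF_TOKEN` is set does not confirm that the token is valid or that the model terms were accepted.

```python
from __future__ import annotations

import os
from collections.abc import Mapping


def embedding_plan(missing_voices: list[str], env: Mapping[str, str] | None = None) -> str:
    """Pick a strategy based on missing embeddings and the HF_TOKEN variable.

    The return labels are hypothetical names used only in this sketch.
    """
    env = os.environ if env is None else env
    if not missing_voices:
        # Option 1: every voice already has a committed embedding.
        return "use-precomputed"
    if env.get("HF_TOKEN"):
        # Option 2: a token is present, so embeddings can be attempted on boot.
        # Presence alone does not prove the token is valid.
        return "create-on-boot"
    return "blocked"


print(embedding_plan(missing_voices=["my_voice"], env={"HF_TOKEN": "hf_example"}))
```

Passing `env` explicitly keeps the function easy to test; leaving it out falls back to the process environment, which is where a Space secret would normally appear.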

Model Info

Model: kyutai/pocket-tts
Parameters: 100M
Language: English only
Sample rate: 24 kHz
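
The 24 kHz sample rate translates directly into sample counts and buffer sizes. The sketch below works through that arithmetic; the 16-bit mono PCM format is an assumption made for illustration and is not stated in this README.

```python
SAMPLE_RATE_HZ = 24_000  # from the Model Info list above


def num_samples(duration_s: float, sample_rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Number of audio samples needed to cover duration_s seconds."""
    return round(duration_s * sample_rate_hz)


def pcm16_mono_bytes(duration_s: float) -> int:
    """Size in bytes, assuming 16-bit (2-byte) mono PCM samples."""
    return num_samples(duration_s) * 2


# A 0.2 s chunk and a 10 s clip, as examples.
print(num_samples(0.2), num_samples(10), pcm16_mono_bytes(10))
```

Under these assumptions, one second of audio is 24,000 samples, so a 200 ms chunk corresponds to 4,800 samples.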

License

See the kyutai/pocket-tts model card for licensing information.