	---
	title: Pocket-TTS 100M
	emoji: 🔊
	colorFrom: green
	colorTo: blue
	sdk: gradio
	sdk_version: 6.2.0
	app_file: app.py
	pinned: true
	license: apache-2.0
	short_description: High quality, efficient voice cloning. Just 100M parameters.
	---

	# Pocket-TTS

	A lightweight text-to-speech application built with [kyutai/pocket-tts](https://huggingface.co/kyutai/pocket-tts) and Gradio.

	## Features

	- Fast CPU inference — ~6x faster than real-time on modern CPUs
	- Low latency — ~200ms to first audio chunk
	- Streaming output — Audio plays as it generates
	- Voice cloning — Use custom voice samples (MP3, WAV, FLAC, etc.)
	- Pre-computed embeddings — Voices work without voice cloning auth on HF Spaces
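To make the speed figure concrete, a real-time factor can be computed as seconds of audio produced per second of compute. The sketch below is illustrative only: `realtime_factor` is a hypothetical helper that is not part of the app, and the numbers are made up to match the ~6x figure above.

```python
SAMPLE_RATE = 24_000  # Hz, taken from the Model Info section


def realtime_factor(num_samples: int, elapsed_seconds: float,
                    sample_rate: int = SAMPLE_RATE) -> float:
    """Return seconds of audio produced per second of wall-clock compute."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    audio_seconds = num_samples / sample_rate
    return audio_seconds / elapsed_seconds


# Made-up example: 12 s of audio generated in 2 s of compute.
rtf = realtime_factor(num_samples=12 * SAMPLE_RATE, elapsed_seconds=2.0)
print(f"{rtf:.1f}x real-time")  # prints "6.0x real-time"
```

A value above 1.0 means audio is generated faster than it plays back, which is what makes streaming output practical.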

	## Quick Start

	```bash
	pip install -r requirements.txt
	python app.py
	```

	Open http://127.0.0.1:7860 in your browser.

	## Adding Custom Voices

	1. Drop audio files (MP3, WAV, etc.) into the `voices/` directory
	2. Restart the app
	3. On the next startup, embeddings are created automatically for any new voices (requires HF auth when running locally)
	4. Once created, embeddings are saved to `embeddings/` and work without auth
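The steps above imply a simple cache check: a voice only needs HF auth if it has no saved embedding yet. A minimal sketch of that check, assuming embeddings are named after the audio file's stem (as in `my_voice.mp3` and `my_voice.safetensors`). The helper name and the extension set are hypothetical and may not match what `app.py` actually does.

```python
from pathlib import Path

# Assumed set of extensions; the app may accept more or fewer formats.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".flac"}


def voices_missing_embeddings(voices_dir: Path, embeddings_dir: Path) -> list[str]:
    """Return voice names that have an audio file but no saved embedding."""
    if not voices_dir.is_dir():
        return []
    missing = []
    for audio in sorted(voices_dir.iterdir()):
        if not audio.is_file() or audio.suffix.lower() not in AUDIO_EXTENSIONS:
            continue  # skip subdirectories and non-audio files
        if not (embeddings_dir / f"{audio.stem}.safetensors").exists():
            missing.append(audio.stem)
    return missing


if __name__ == "__main__":
    todo = voices_missing_embeddings(Path("voices"), Path("embeddings"))
    print("Voices needing embeddings:", todo or "none")
```

Running a check like this before committing shows which voices would still trigger embedding creation on the next startup.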

	### Structure

	```
	Pocket-TTS/
	├── app.py
	├── requirements.txt
	├── voices/               # Your custom voice audio files
	│   └── my_voice.mp3
	└── embeddings/           # Auto-generated (commit these for HF Spaces)
	    └── my_voice.safetensors
	```

	## Hugging Face Spaces Deployment

	Option 1: Pre-commit embeddings (no auth needed on Space)

	1. Run the app locally first (with HF auth) to generate embeddings
	2. Commit both `voices/` and `embeddings/` directories
	3. The Space will use pre-computed embeddings

	Option 2: Auto-create embeddings on Space (requires valid token)

	1. Accept the model terms at https://huggingface.co/kyutai/pocket-tts
	2. Add an `HF_TOKEN` secret in the Space settings (the token must be valid and belong to an account that has accepted the terms)
	3. Embeddings are created automatically on first boot
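The two options can be read as a startup decision: use saved embeddings when they cover every voice, otherwise fall back to a token. The sketch below only illustrates that decision. `embedding_mode` and its return values are hypothetical, and a non-empty `HF_TOKEN` is not proof that the token is valid or has accepted the model terms.

```python
import os
from typing import Optional


def embedding_mode(missing_voices: list[str], token: Optional[str]) -> str:
    """Pick a startup strategy matching the two deployment options."""
    if not missing_voices:
        return "precomputed"  # Option 1: everything is already in embeddings/
    if token:
        return "auto-create"  # Option 2: a token is set, so creation can be attempted
    return "unavailable"      # some voices lack embeddings and no token is set


# HF_TOKEN is the secret name used in the steps above.
mode = embedding_mode(missing_voices=["my_voice"], token=os.environ.get("HF_TOKEN"))
print(mode)
```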

	## Model Info

	- Model: [kyutai/pocket-tts](https://huggingface.co/kyutai/pocket-tts)
	- Parameters: 100M
	- Language: English only
	- Sample rate: 24kHz
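When saving or post-processing output, the sample rate has to be carried through explicitly. As a small illustration, the sketch below writes a test tone at 24 kHz using only the standard library and reads the header back. Mono 16-bit PCM is an assumption made for the example, not something the model card or this README states.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 24_000  # Hz, as listed above


def sine_wav_bytes(seconds: float, freq_hz: float = 440.0) -> bytes:
    """Encode a mono 16-bit PCM test tone at 24 kHz as WAV bytes."""
    n = int(seconds * SAMPLE_RATE)
    frames = b"".join(
        struct.pack("<h", int(32767 * 0.3 * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)))
        for i in range(n)
    )
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)           # mono (assumed for this example)
        wav.setsampwidth(2)           # 2 bytes per sample = 16-bit PCM
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(frames)
    return buf.getvalue()


data = sine_wav_bytes(0.5)
with wave.open(io.BytesIO(data), "rb") as wav:
    print(wav.getframerate(), wav.getnframes())  # prints "24000 12000"
```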

	## License

	See the [kyutai/pocket-tts](https://huggingface.co/kyutai/pocket-tts) model card for licensing information.