| title: Pocket-TTS 100M | |
| emoji: π | |
| colorFrom: green | |
| colorTo: blue | |
| sdk: gradio | |
| sdk_version: 6.2.0 | |
| app_file: app.py | |
| pinned: true | |
| license: apache-2.0 | |
| short_description: High quality, efficient voice cloning. Just 100M parameters. | |
| # Pocket-TTS | |
| A lightweight text-to-speech application built with [kyutai/pocket-tts](https://huggingface.co/kyutai/pocket-tts) and Gradio. | |
| ## Features | |
| - **Fast CPU inference**: ~6x faster than real-time on modern CPUs | |
| - **Low latency**: ~200 ms to first audio chunk | |
| - **Streaming output**: Audio plays as it generates | |
| - **Voice cloning**: Use custom voice samples (MP3, WAV, FLAC, etc.) | |
| - **Pre-computed embeddings**: Voices with saved embeddings work on HF Spaces without the auth that voice cloning requires | |
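To put the speed figure in concrete terms, here is a small worked calculation. It assumes the 24 kHz sample rate listed under Model Info and treats "~6x faster than real-time" as a rough constant; the function names are illustrative and not part of `app.py`, and real throughput will vary with the CPU.

```python
SAMPLE_RATE_HZ = 24_000   # sample rate stated in this README's Model Info section
REALTIME_SPEEDUP = 6.0    # the README's approximate figure; hardware dependent


def audio_seconds(num_samples: int, sample_rate: int = SAMPLE_RATE_HZ) -> float:
    """Duration in seconds of a mono clip with `num_samples` samples."""
    return num_samples / sample_rate


def estimated_generation_seconds(audio_len_s: float,
                                 speedup: float = REALTIME_SPEEDUP) -> float:
    """Rough wall-clock time to synthesize `audio_len_s` seconds of speech."""
    return audio_len_s / speedup


if __name__ == "__main__":
    clip_s = audio_seconds(240_000)               # 240,000 samples at 24 kHz
    gen_s = estimated_generation_seconds(clip_s)
    print(f"{clip_s:.1f} s of audio in roughly {gen_s:.2f} s")
```

Under these assumptions, a 10-second clip would take a little under 2 seconds to generate, and streaming means playback can start well before generation finishes.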
| ## Quick Start | |
| ```bash | |
| pip install -r requirements.txt | |
| python app.py | |
| ``` | |
| Open http://127.0.0.1:7860 in your browser. | |
| ## Adding Custom Voices | |
| 1. Drop audio files (MP3, WAV, etc.) into the `voices/` directory | |
| 2. Restart the app | |
| 3. Embeddings are created automatically on first boot (requires HF auth locally) | |
| 4. Once created, embeddings are saved to `embeddings/` and work without auth | |
| ### Structure | |
| ``` | |
| Pocket-TTS/ | |
| ├── app.py | |
| ├── requirements.txt | |
| ├── voices/ # Your custom voice audio files | |
| │   └── my_voice.mp3 | |
| └── embeddings/ # Auto-generated (commit these for HF Spaces) | |
|     └── my_voice.safetensors | |
| ``` | |
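The pairing between `voices/` and `embeddings/` shown above can be checked before deploying. The following is a minimal sketch, not the logic in `app.py`: it assumes each embedding is named after its audio file's stem with a `.safetensors` extension (as in the tree), and the helper name and the set of audio extensions are hypothetical.

```python
from pathlib import Path

# Assumed subset of accepted audio formats; the README lists "MP3, WAV, FLAC, etc."
AUDIO_SUFFIXES = {".mp3", ".wav", ".flac"}


def voices_missing_embeddings(voices_dir, embeddings_dir):
    """Return audio files in `voices_dir` with no matching .safetensors file."""
    voices_dir, embeddings_dir = Path(voices_dir), Path(embeddings_dir)
    if not voices_dir.is_dir():
        return []
    missing = []
    for audio in sorted(voices_dir.iterdir()):
        if not audio.is_file() or audio.suffix.lower() not in AUDIO_SUFFIXES:
            continue  # skip subdirectories and non-audio files
        # Assumption: voices/my_voice.mp3 maps to embeddings/my_voice.safetensors
        if not (embeddings_dir / f"{audio.stem}.safetensors").exists():
            missing.append(audio)
    return missing


if __name__ == "__main__":
    for path in voices_missing_embeddings("voices", "embeddings"):
        print(f"No embedding yet for {path.name}")
```

An empty result suggests every voice already has a saved embedding, so the Space should not need auth for those voices.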
| ## Hugging Face Spaces Deployment | |
| **Option 1: Pre-commit embeddings (no auth needed on Space)** | |
| 1. Run the app locally first (with HF auth) to generate embeddings | |
| 2. Commit both `voices/` and `embeddings/` directories | |
| 3. The Space will use pre-computed embeddings | |
| **Option 2: Auto-create embeddings on Space (requires valid token)** | |
| 1. Accept terms at https://huggingface.co/kyutai/pocket-tts | |
| 2. Add `HF_TOKEN` secret in Space settings (must be a valid token) | |
| 3. Embeddings are created automatically on first boot | |
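The choice between the two options can be expressed as a small decision function. This is a hedged sketch rather than the app's actual startup code: the function names and return labels are hypothetical, and it only checks that the `HF_TOKEN` variable is set and non-empty. Whether the token is actually valid, and whether the model terms were accepted, can only be confirmed by the Hub.

```python
import os


def has_hf_token(env=None) -> bool:
    """True if HF_TOKEN is set and non-empty (presence only, not validity)."""
    env = os.environ if env is None else env
    return bool(env.get("HF_TOKEN", "").strip())


def embedding_strategy(has_precomputed: bool, env=None) -> str:
    """Pick a deployment path; the labels are illustrative, not from app.py."""
    if has_precomputed:
        return "precomputed"   # Option 1: embeddings/ was committed
    if has_hf_token(env):
        return "auto-create"   # Option 2: build embeddings on first boot
    return "unavailable"       # neither committed embeddings nor a token


if __name__ == "__main__":
    print(embedding_strategy(has_precomputed=os.path.isdir("embeddings")))
```

Passing `env` explicitly keeps the function easy to test without touching the real environment.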
| ## Model Info | |
| - **Model**: [kyutai/pocket-tts](https://huggingface.co/kyutai/pocket-tts) | |
| - **Parameters**: 100M | |
| - **Language**: English only | |
| - **Sample rate**: 24 kHz | |
| ## License | |
| See the [kyutai/pocket-tts](https://huggingface.co/kyutai/pocket-tts) model card for licensing information. | |