---
title: Pocket-TTS 100M
emoji: 🔊
colorFrom: green
colorTo: blue
sdk: gradio
sdk_version: 6.2.0
app_file: app.py
pinned: true
license: apache-2.0
short_description: High quality, efficient voice cloning. Just 100M parameters.
---

# Pocket-TTS

A lightweight text-to-speech application built with [kyutai/pocket-tts](https://huggingface.co/kyutai/pocket-tts) and Gradio.

## Features

- **Fast CPU inference**: ~6x faster than real time on modern CPUs
- **Low latency**: ~200 ms to the first audio chunk
- **Streaming output**: audio starts playing while it is still being generated
- **Voice cloning**: use custom voice samples (MP3, WAV, FLAC, etc.)
- **Pre-computed embeddings**: bundled voices work on HF Spaces without the authentication that voice cloning requires
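To check the speed and latency figures on your own machine, a small timing harness is enough. The sketch below is hypothetical and stdlib-only: it assumes the TTS generator yields chunks of samples at the model's 24 kHz sample rate, and `measure_stream` and `fake_tts` are illustrative names, not part of pocket-tts or this app.

```python
import time
from typing import Iterable, Optional, Sequence, Tuple

SAMPLE_RATE = 24_000  # Pocket-TTS output sample rate (24 kHz)


def measure_stream(
    chunks: Iterable[Sequence[float]], sample_rate: int = SAMPLE_RATE
) -> Tuple[Optional[float], float, float]:
    """Consume a stream of audio chunks and report latency and speed.

    Returns (time_to_first_chunk_s, audio_seconds, realtime_factor), where
    realtime_factor = audio duration / wall-clock time, so 6.0 means the audio
    was produced 6x faster than it takes to play back.
    """
    start = time.perf_counter()
    first_chunk_at = None
    total_samples = 0
    for chunk in chunks:
        if first_chunk_at is None:
            first_chunk_at = time.perf_counter() - start  # latency to first audio
        total_samples += len(chunk)
    elapsed = time.perf_counter() - start
    audio_seconds = total_samples / sample_rate
    factor = audio_seconds / elapsed if elapsed > 0 else float("inf")
    return first_chunk_at, audio_seconds, factor


def fake_tts(n_chunks: int = 5, chunk_seconds: float = 0.5, delay: float = 0.01):
    """Hypothetical stand-in for a streaming TTS generator (silent audio)."""
    for _ in range(n_chunks):
        time.sleep(delay)  # simulated compute time per chunk
        yield [0.0] * int(SAMPLE_RATE * chunk_seconds)


if __name__ == "__main__":
    ttfc, secs, factor = measure_stream(fake_tts())
    print(f"first chunk after {ttfc * 1000:.0f} ms, {secs:.1f} s of audio, {factor:.1f}x real time")
```

Replacing `fake_tts()` with the app's real generator would give comparable numbers for your hardware.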

## Quick Start

```bash
pip install -r requirements.txt
python app.py
```

Open http://127.0.0.1:7860 in your browser.

## Adding Custom Voices

1. Drop audio files (MP3, WAV, etc.) into the `voices/` directory
2. Restart the app
3. On boot, embeddings are created automatically for any new voice files (this step requires HF authentication locally)
4. Embeddings are saved to `embeddings/`; once saved, they work without authentication
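The caching in these steps boils down to "one `.safetensors` file per voice file, keyed by the file's stem". The sketch below models only that bookkeeping, under assumptions: the extension set and the function names `embedding_path` and `voices_needing_embeddings` are hypothetical, and the actual embedding computation (which needs pocket-tts and HF authentication) is deliberately left out.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical accepted extensions; the README lists MP3, WAV, FLAC, "etc."
AUDIO_EXTS = {".mp3", ".wav", ".flac", ".ogg", ".m4a"}


def embedding_path(voice_file: Path, embeddings_dir: Path) -> Path:
    """Map voices/my_voice.mp3 to embeddings/my_voice.safetensors."""
    return embeddings_dir / f"{voice_file.stem}.safetensors"


def voices_needing_embeddings(voices_dir: Path, embeddings_dir: Path) -> list[Path]:
    """Return audio files in voices_dir that have no cached embedding yet."""
    if not voices_dir.is_dir():
        return []
    missing = []
    for f in sorted(voices_dir.iterdir()):
        # Skip non-audio files; match extensions case-insensitively.
        if f.is_file() and f.suffix.lower() in AUDIO_EXTS:
            if not embedding_path(f, embeddings_dir).exists():
                missing.append(f)
    return missing
```

Only the files this returns would need the authenticated embedding step; everything else can be loaded straight from `embeddings/`.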

### Structure

```
Pocket-TTS/
├── app.py
├── requirements.txt
├── voices/           # Your custom voice audio files
│   └── my_voice.mp3
└── embeddings/       # Auto-generated (commit these for HF Spaces)
    └── my_voice.safetensors
```

## HuggingFace Spaces Deployment

**Option 1: Pre-commit embeddings (no auth needed on Space)**

1. Run the app locally first (with HF auth) to generate embeddings
2. Commit both `voices/` and `embeddings/` directories
3. The Space will use pre-computed embeddings

**Option 2: Auto-create embeddings on Space (requires valid token)**

1. Accept terms at https://huggingface.co/kyutai/pocket-tts
2. Add `HF_TOKEN` secret in Space settings (must be a valid token)
3. Embeddings are created automatically on first boot
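Taken together, the two options amount to a per-voice decision at startup: use a committed embedding if one exists, otherwise create one only when a token is available, otherwise skip the voice. The sketch below is a hypothetical illustration of that logic, not the app's actual code. It reads `HF_TOKEN` from the environment because Space secrets are exposed as environment variables; `VoiceSource` and `resolve_voice_source` are illustrative names.

```python
import os
from enum import Enum
from pathlib import Path
from typing import Mapping, Optional


class VoiceSource(Enum):
    PRECOMPUTED = "precomputed"  # Option 1: committed embeddings/*.safetensors
    CREATE = "create"            # Option 2: compute on boot using HF_TOKEN
    SKIP = "skip"                # neither an embedding nor a token is available


def resolve_voice_source(
    embedding_file: Path, env: Optional[Mapping[str, str]] = None
) -> VoiceSource:
    """Decide how a voice can be made available on startup."""
    env = os.environ if env is None else env
    if embedding_file.exists():
        return VoiceSource.PRECOMPUTED
    # Only checks that a token is set; whether it is valid (and whether the
    # model terms were accepted) is discovered when the model is downloaded.
    if env.get("HF_TOKEN", "").strip():
        return VoiceSource.CREATE
    return VoiceSource.SKIP
```

Checking for a committed embedding first means Option 1 never needs the token, which matches the "no auth needed on Space" behavior described above.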

## Model Info

- **Model**: [kyutai/pocket-tts](https://huggingface.co/kyutai/pocket-tts)
- **Parameters**: 100M
- **Language**: English only
- **Sample rate**: 24 kHz

## License

See the [kyutai/pocket-tts](https://huggingface.co/kyutai/pocket-tts) model card for licensing information.