pocket-tts-web

Running

App Files Files Community

pocket-tts-web / README.md

KevinAHM

Add COOP/COEP headers for multi-threading

a5b0453 about 2 months ago

preview code

raw

history blame contribute delete

2.82 kB

metadata

title: Pocket TTS ONNX Web Demo
emoji: 🌖
colorFrom: yellow
colorTo: pink
sdk: static
app_file: index.html
pinned: false
license: cc-by-4.0
short_description: Real-time voice cloning entirely in your browser! (CPU)
models:
  - KevinAHM/pocket-tts-onnx
custom_headers:
  cross-origin-embedder-policy: require-corp
  cross-origin-opener-policy: same-origin
  cross-origin-resource-policy: cross-origin

Pocket TTS Web Demo

Real-time neural text-to-speech with voice cloning, running entirely in your browser.

Features

Voice Cloning: Clone any voice from a short audio sample
Predefined Voices: 3 bundled voices (Cosette, Jean, Fantine)
Streaming Audio: Real-time audio generation with low latency
Pure Browser: No server required, runs entirely in WebAssembly

Model Files

The demo requires the following ONNX models in the onnx/ directory:

File	Size	Purpose
`mimi_encoder.onnx`	~70 MB	Voice audio → embeddings
`text_conditioner.onnx`	~16 MB	Text tokens → embeddings
`flow_lm_main_int8.onnx`	~73 MB	AR transformer (INT8)
`flow_lm_flow_int8.onnx`	~10 MB	Flow matching network (INT8)
`mimi_decoder_int8.onnx`	~22 MB	Latents → audio decoder (INT8)

Additional files:

tokenizer.model - SentencePiece tokenizer (~60 KB)
voices.bin - Predefined voice embeddings (~1.5 MB)

Browser Requirements

Modern browser with WebAssembly support
Chrome, Edge, Firefox, or Safari (latest versions)
~200 MB RAM for model loading

Voice Cloning

Click "Upload Voice" or select "Custom (Upload)" from the dropdown
Upload an audio file (WAV, MP3, etc.) with clear speech
Best results with 3-10 seconds of clean audio
The voice will be encoded and used for all subsequent generations

File Structure

pocket-tts-web/
├── index.html              # Main HTML page
├── onnx-streaming.js       # Main thread controller
├── inference-worker.js     # Web Worker for ONNX inference
├── PCMPlayerWorklet.js     # Audio playback worklet
├── EventEmitter.js         # Event utilities
├── sentencepiece.js        # SentencePiece tokenizer library
├── style.css               # Styles
├── tokenizer.model         # SentencePiece model
├── voices.bin              # Predefined voice embeddings
└── onnx/
    ├── mimi_encoder.onnx
    ├── text_conditioner.onnx
    ├── flow_lm_main_int8.onnx
    ├── flow_lm_flow_int8.onnx
    └── mimi_decoder_int8.onnx

License

Models & Voice Embeddings: CC BY 4.0 (inherited from kyutai/pocket-tts)
Code: Apache 2.0