How to use Pixel-Labs/threadcast-neural-models with KittenTTS:
```python
from kittentts import KittenTTS
import soundfile as sf

m = KittenTTS("Pixel-Labs/threadcast-neural-models")
audio = m.generate("This high quality TTS model works without a GPU")

# Save the audio
sf.write('output.wav', audio, 24000)
```
ThreadCast: Chrome Extension Neural Models Mirror
Hugging Face transformers.js-format mirror of the on-device neural TTS models used by the ThreadCast Chrome extension. The Android counterpart lives in two siblings: ../android/ (local dev staging; sherpa-onnx upstream artifacts) and ../mobile-android/ (production zips downloaded by the Android app at runtime). See the parent README for repository-wide context, branding, and license summary.
If you're an extension user, you don't need anything here: the extension downloads what it needs automatically the first time you select a Neural engine. This page is for transparency, contributors, and forks.
Layout
```
extension/
├── neural-28m/                 # Piper voices for the CPU (Lite) engine
│   └── en/en_US/<voice>/medium/
│       ├── en_US-<voice>-medium.onnx
│       └── en_US-<voice>-medium.onnx.json
├── neural-melo-en/             # MeloTTS for the GPU Lite engine (mobile surfaces same engine as "Plus")
│   ├── model.onnx              # fp32, production default
│   ├── lexicon.txt             # enriched CMUdict-style lexicon
│   ├── tokens.txt              # phoneme → ID map
│   └── LICENSE
└── neural-82m/                 # Kokoro model + voices for the GPU (Studio) engine
    ├── onnx/
    │   ├── model.onnx          # fp32, production default
    │   └── model_fp16.onnx     # fp16, experimental, blocked by upstream bugs
    ├── tokenizer.json
    ├── tokenizer_config.json
    ├── config.json
    └── voices/                 # 11 speaker embeddings
        └── af_bella.bin … bm_daniel.bin
```
Naming note:
neural-28m and neural-82m encode the parameter count in their folder names (CPU and GPU tiers, respectively). neural-melo-en breaks that convention: MeloTTS at ~52 M params would naturally be neural-52m, but the folder and file naming aligns with the local staging tree at AI Neural Models/android/neural-melo-en/ and the mobile production bundle threadcast-melo-en-v2.zip. Same engine, same file, two surfaces. The tier identifier in docs and engine tables remains neural-52m.
Engine tiers at a glance
| Tier | Subtree | Architecture | Params | Runtime | First-use download | Extension UI label |
|---|---|---|---|---|---|---|
| Lite (CPU) | neural-28m/ | Piper VITS | ~28 M | WASM single-thread | ~63 MB per voice + ~10 MB shared espeak | Neural · CPU |
| GPU Lite | neural-melo-en/ | MeloTTS VITS2 + BERT prosody assist | ~52 M | WebGPU (WASM fallback) | ~177 MB single bundle (5 EN accents) | Neural · GPU Lite |
| Studio (GPU) | neural-82m/ | Kokoro StyleTTS2 | ~82 M | WebGPU | ~325 MB single bundle (11 voices) | Neural · GPU |
GPU Lite sits between CPU and GPU on every axis: download size, VRAM, hardware floor, and output quality. It is designed for users whose hardware supports WebGPU but can't comfortably run the 82 M Studio model. It is the same engine as the mobile app's "Local AI Plus" tier; the extension just surfaces it with a tier name that aligns with the existing CPU/GPU framing users already know.
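The positioning above can be read as a simple decision rule. The sketch below is only an illustration: `suggest_tier` and both of its flags are hypothetical names, and in the extension the user picks the engine in the UI rather than having one chosen automatically.

```python
# Subtree names, UI labels, and approximate parameter counts (millions)
# are copied from the tiers table. The selection rule is an assumption.
TIERS = {
    "neural-28m": {"ui_label": "Neural · CPU", "params_m": 28},
    "neural-melo-en": {"ui_label": "Neural · GPU Lite", "params_m": 52},
    "neural-82m": {"ui_label": "Neural · GPU", "params_m": 82},
}


def suggest_tier(has_webgpu: bool, can_run_82m_comfortably: bool) -> str:
    """Return the subtree of the tier that best fits the hardware (hypothetical rule)."""
    if not has_webgpu:
        # Without WebGPU, the single-thread WASM Piper tier is the natural fit.
        return "neural-28m"
    if can_run_82m_comfortably:
        return "neural-82m"
    # WebGPU is present, but the 82 M Studio model is too heavy.
    return "neural-melo-en"


choice = suggest_tier(has_webgpu=True, can_run_82m_comfortably=False)
print(choice, TIERS[choice]["ui_label"])
```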
CPU tier: neural-28m, Piper (VITS · 28 M params · WASM)
Five English voices, ~63 MB per voice. One voice loaded at a time. Single-thread WASM inference inside an MV3 offscreen document. Real-time on a modern laptop.
| Voice ID | Speaker | Notes |
|---|---|---|
| en_US-amy-medium | Amy | Female · warm narrator |
| en_US-lessac-medium | Lessac | Female · neutral, news-anchor |
| en_US-ryan-medium | Ryan | Male · clear, newsreader |
| en_US-hfc_female-medium | HFC Female | Female · crisp, modern |
| en_US-hfc_male-medium | HFC Male | Male · crisp, modern |
Each voice ships as two files (*.onnx + *.onnx.json) under neural-28m/en/en_US/<voice>/medium/.
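Given that layout, the two file paths for a voice can be derived from its voice ID. This is a minimal sketch; `piper_voice_files` is a hypothetical helper, and it assumes every voice ID has the `<locale>-<voice>-<quality>` shape seen in the table.

```python
from typing import Tuple


def piper_voice_files(voice_id: str) -> Tuple[str, str]:
    """Map a voice ID such as "en_US-amy-medium" to its two repo-relative files."""
    locale, voice, quality = voice_id.split("-")  # "en_US", "amy", "medium"
    language = locale.split("_")[0]               # "en"
    base = f"neural-28m/{language}/{locale}/{voice}/{quality}/{voice_id}"
    return f"{base}.onnx", f"{base}.onnx.json"


model_path, config_path = piper_voice_files("en_US-hfc_female-medium")
print(model_path)
print(config_path)
```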
Upstream: diffusionstudio/piper-voices (curated subset mirrored here).
GPU Lite tier: neural-melo-en, MeloTTS English (VITS2 + BERT · ~52 M · WebGPU)
Single ~171 MB model serves all 5 English accents via speaker-ID lookup at synth time. BERT prosody assist is baked into the ONNX graph, so no separate BERT input or model. WebGPU-accelerated inference; on adapters without WebGPU support, ORT-Web falls back to single-thread WASM (slow but functional). MIT license.
Files
| File | Size | Purpose |
|---|---|---|
| model.onnx | ~171 MB | fp32 ONNX export, production default; same file the Android app ships via mobile-android/v1/threadcast-melo-en-v2.zip |
| lexicon.txt | ~6 MB | Enriched CMUdict-style lexicon (~250 k+ entries: base 129 k + CMUdict latest + g2p_en + Aquila-Resolve neural G2P + curated Reddit/tech/brand/modern-English terms + punctuation silence rules, including em-dash → short pause) |
| tokens.txt | ~1 KB | Phoneme → integer-ID map (~219 entries, case-sensitive) |
| LICENSE | small | MIT, retained from upstream |
No espeak-ng-data/ here: MeloTTS embeds phonemization end-to-end via the CMUdict lexicon. Out-of-vocabulary tokens fall back to letter-by-letter spelling using single-letter lexicon entries.
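The out-of-vocabulary fallback can be sketched as follows. This is an illustration under stated assumptions, not the extension's code: `phonemize` is a hypothetical name, the toy lexicon and its phoneme symbols are made up, and lowercasing words before lookup is an assumption about how lexicon.txt is keyed.

```python
from typing import Dict, List

# Toy stand-in for lexicon.txt. Real entries and phoneme symbols will differ.
TOY_LEXICON: Dict[str, List[str]] = {
    "hello": ["HH", "AH", "L", "OW"],
    "g": ["JH", "IY"],
    "p": ["P", "IY"],
    "u": ["Y", "UW"],
}


def phonemize(text: str, lexicon: Dict[str, List[str]]) -> List[str]:
    """Look up each word; spell unknown words letter by letter."""
    phonemes: List[str] = []
    for word in text.lower().split():
        if word in lexicon:
            phonemes.extend(lexicon[word])
            continue
        # Out-of-vocabulary: fall back to the single-letter entries.
        for letter in word:
            # Characters without a single-letter entry are skipped.
            phonemes.extend(lexicon.get(letter, []))
    return phonemes


print(phonemize("hello GPU", TOY_LEXICON))
```

In this toy run, "hello" resolves directly while "GPU" is spelled out as G, P, U.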
Voices (5 EN accents, speaker IDs 0..4)
| sid | Voice ID | Name | Accent |
|---|---|---|---|
| 0 | default | Sarah | Female · neutral, default |
| 1 | en-us | Alice | Female · American |
| 2 | en-india | Priya | Female · Indian English |
| 3 | en-uk | Charlotte | Female · British |
| 4 | en-au | Olivia | Female · Australian |
All speakers are female today; accent diversity is the differentiator. To synthesize a specific accent, pass the corresponding sid to the model's input tensor.
Model input contract
Standard sherpa-onnx Melo VITS2 ONNX signature:
```
x              int64  (1, T)   phoneme IDs (from lexicon lookup via tokens.txt)
x_lengths      int64  (1,)     T
tones          int64  (1, T)   tone IDs (mostly 7-10 for English), parallel to x
sid            int64  (1,)     speaker ID (0..4)
noise_scale    float  (1,)     0.667 default
noise_scale_w  float  (1,)     0.8 default
length_scale   float  (1,)     1.0 / speed
```
Output: y float32 (1, 1, N) at 44 100 Hz mono.
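To make the contract concrete, here is a minimal sketch that assembles the seven inputs as NumPy arrays. `build_melo_feed` and `VOICE_TO_SID` are hypothetical names, the phoneme and tone IDs in the example are placeholders, and `float` in the contract is assumed to mean float32. The voice-to-sid mapping is taken from the voices table.

```python
import numpy as np

# Voice ID to speaker ID, as listed in the voices table.
VOICE_TO_SID = {"default": 0, "en-us": 1, "en-india": 2, "en-uk": 3, "en-au": 4}


def build_melo_feed(phoneme_ids, tone_ids, voice_id="default", speed=1.0):
    """Build the input dict for the Melo VITS2 ONNX signature described above."""
    if len(phoneme_ids) != len(tone_ids):
        raise ValueError("tones must be parallel to x (same length)")
    if speed <= 0:
        raise ValueError("speed must be positive")
    return {
        "x": np.asarray([phoneme_ids], dtype=np.int64),              # (1, T)
        "x_lengths": np.asarray([len(phoneme_ids)], dtype=np.int64), # (1,)
        "tones": np.asarray([tone_ids], dtype=np.int64),             # (1, T)
        "sid": np.asarray([VOICE_TO_SID[voice_id]], dtype=np.int64),
        "noise_scale": np.asarray([0.667], dtype=np.float32),
        "noise_scale_w": np.asarray([0.8], dtype=np.float32),
        "length_scale": np.asarray([1.0 / speed], dtype=np.float32),
    }


# Placeholder IDs; real ones come from lexicon.txt and tokens.txt.
feed = build_melo_feed([5, 6, 7], [7, 8, 9], voice_id="en-uk", speed=1.25)
for name, tensor in feed.items():
    print(name, tensor.dtype, tensor.shape)
```

A dict of this shape is what an ONNX Runtime session would take as its input feed, for example `session.run(None, feed)` in the Python API. Since the output is 44 100 Hz mono, a result with N samples lasts N / 44100 seconds.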
Upstream: csukuangfj/sherpa-onnx-vits-melo-tts-en (sherpa-onnx's MeloTTS English export). Original model: myshell-ai/MeloTTS-English (PyTorch, MIT).
Why fp32 (not fp16)?
Same architecture, same weights, same file as mobile's Plus tier, except that mobile ships fp16 for the ARM NEON SIMD speed win on-device. The browser story is different:
- ORT-Web WebGPU's fp16 path depends on the optional shader-f16 extension, which a portion of WebGPU adapters don't expose. On those, fp16 runs at fp32 speed anyway.
- ORT-Web WASM has no native fp16 kernels: fp16 input gets up-cast at load time, so fp16 shrinks the download but brings no inference speed-up.
- Audio-quality A/B between fp16 and fp32 hasn't been run on a WebGPU listening setup yet. Vocoder-family models have documented fp16 sensitivity (subnormal weights can clamp on conversion, producing audible artifacts on sibilants), and a per-platform listening test was deferred.
Net: fp32 is the safer browser choice. If a WebGPU + headphones A/B later validates fp16, the engine config flips with no other changes (the fp16 file already exists at AI Neural Models/android/neural-melo-en/model.fp16.onnx for upload when the time comes).
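The reasoning above boils down to three conditions that would all need to hold before fp16 pays off in the browser. The sketch below encodes them; it is hypothetical throughout (`pick_precision` and its parameters are illustrative names), and today the extension simply ships fp32.

```python
def pick_precision(runtime: str, has_shader_f16: bool, fp16_listening_test_passed: bool) -> str:
    """Return "fp16" only when it would be both faster and validated (hypothetical rule)."""
    if runtime != "webgpu":
        # WASM has no native fp16 kernels, so fp16 brings no speed-up there.
        return "fp32"
    if not has_shader_f16:
        # Without the optional shader-f16 extension, fp16 runs at fp32 speed.
        return "fp32"
    if not fp16_listening_test_passed:
        # Audio quality has not been validated yet; stay on the safe default.
        return "fp32"
    return "fp16"


print(pick_precision("webgpu", has_shader_f16=True, fp16_listening_test_passed=False))
```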
GPU tier: neural-82m, Kokoro 82 M (ONNX · WebGPU)
A single Kokoro model unlocks 11 distinct voices at once via 11 small speaker-embedding files. WebGPU-accelerated inference, ~10× real-time on a modern GPU.
Model file
| File | Precision | Size | Status |
|---|---|---|---|
| neural-82m/onnx/model.onnx | fp32 | ~325 MB | ✅ Production default, stable on every WebGPU runtime |
| neural-82m/onnx/model_fp16.onnx | fp16 | ~165 MB | ⚠️ Reserved for future use; blocked today by upstream onnxruntime-web fp16 bugs (microsoft/onnxruntime#23403, #26732) |
The fp16 file is staged here so that once the upstream JS stack lands fp16+WebGPU fixes, ThreadCast can flip the default to fp16 with a single config change, halving the download and roughly doubling per-segment speed on capable GPUs.
Tokenizer + config
tokenizer.json, tokenizer_config.json, config.json: small files used by @huggingface/transformers (transformers.js) when loading the model.
Voices (neural-82m/voices/*.bin, ~520 KB each)
| Voice ID | Name | Accent | Gender |
|---|---|---|---|
| af_bella | Bella | American | Female |
| af_sarah | Sarah | American | Female |
| af_nova | Nova | American | Female |
| af_sky | Sky | American | Female |
| am_adam | Adam | American | Male |
| am_michael | Michael | American | Male |
| am_echo | Echo | American | Male |
| bf_emma | Emma | British | Female |
| bf_isabella | Isabella | British | Female |
| bm_george | George | British | Male |
| bm_daniel | Daniel | British | Male |
Voice IDs encode locale and gender: first letter = accent (a = American, b = British), second letter = gender (f = female, m = male).
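The naming scheme can be decoded mechanically. A minimal sketch, with `decode_voice_id` as a hypothetical helper that covers only the two accents and two genders listed here:

```python
ACCENTS = {"a": "American", "b": "British"}
GENDERS = {"f": "Female", "m": "Male"}


def decode_voice_id(voice_id: str) -> dict:
    """Split an ID like "bm_george" into display name, accent, and gender."""
    prefix, _, name = voice_id.partition("_")
    if len(prefix) != 2 or not name:
        raise ValueError(f"unexpected voice ID shape: {voice_id!r}")
    if prefix[0] not in ACCENTS or prefix[1] not in GENDERS:
        raise ValueError(f"unknown accent/gender prefix: {prefix!r}")
    return {
        "name": name.capitalize(),
        "accent": ACCENTS[prefix[0]],  # first letter: a = American, b = British
        "gender": GENDERS[prefix[1]],  # second letter: f = female, m = male
    }


print(decode_voice_id("bm_george"))
```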
Upstream: model from onnx-community/Kokoro-82M-v1.0-ONNX-timestamped; voice embeddings from onnx-community/Kokoro-82M-v1.0-ONNX.
How the extension uses these files
The ThreadCast extension fetches model files lazily, only when the user selects a Neural engine and presses Test/Play. Files are cached in the browser's Cache API and reused across sessions, so the user pays the download cost exactly once per profile.
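The fetch-once behavior can be sketched in a few lines. This is a Python analogy, not the extension's code: the extension runs in the browser and uses the Cache API, whereas here an in-memory dict stands in for the cache, and `FetchOnceCache`, `fake_download`, and the example URL are all hypothetical.

```python
from typing import Callable, Dict, List


class FetchOnceCache:
    """Download a file on first request, then serve it from the cache."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch
        self._store: Dict[str, bytes] = {}  # stand-in for the browser cache

    def get(self, url: str) -> bytes:
        if url not in self._store:
            # Cache miss: pay the download cost exactly once.
            self._store[url] = self._fetch(url)
        return self._store[url]


download_log: List[str] = []


def fake_download(url: str) -> bytes:
    """Pretend network fetch that records each call."""
    download_log.append(url)
    return b"model-bytes"


cache = FetchOnceCache(fake_download)
cache.get("neural-melo-en/model.onnx")  # first use: downloads
cache.get("neural-melo-en/model.onnx")  # later use: served from the cache
print(len(download_log))
```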
| Engine | Files fetched on first use |
|---|---|
| System voices | None; uses OS / browser TTS |
| Neural · CPU | The selected voice's .onnx + .onnx.json (~63 MB total) |
| Neural · GPU Lite | neural-melo-en/{model.onnx, lexicon.txt, tokens.txt} (~177 MB total, all 5 EN accents in one bundle) |
| Neural · GPU | onnx/model.onnx + tokenizer (.bin ( |
The WASM runtimes (ONNX Runtime, Piper phonemizer) are bundled inside the extension package itself, not served from this repo, to comply with Manifest V3 CSP and avoid CDN dependencies.
License
Per-project licenses retained from upstream; see the parent README for the consolidated summary.