ThreadCast - Neural Models Mirror

Threads, now a podcast.
threadcast.app · pixellabs.ventures


Self-hosted mirror of the on-device neural TTS models used by ThreadCast, the Chrome extension that turns any Reddit thread into a hands-free podcast.

This repository exists so the extension can ship a stable, version-pinned set of model weights without depending on the availability or rate-limits of upstream Hugging Face repos at runtime.

Note: if you're a ThreadCast user, you don't need anything here: the extension downloads what it needs automatically the first time you select a Neural engine. This page is for transparency, contributors, and forks.


Repository layout

threadcast-neural-models/
├── hf-cpu-mirror/                  # Piper voices for the CPU engine
│   └── en/en_US/<voice>/medium/
│       ├── en_US-<voice>-medium.onnx
│       └── en_US-<voice>-medium.onnx.json
└── hf-gpu-mirror/                  # Kokoro model + voices for the GPU engine
    ├── onnx/
    │   ├── model.onnx              # fp32 - production default
    │   └── model_fp16.onnx         # fp16 - experimental, blocked by upstream bugs
    ├── tokenizer.json
    ├── tokenizer_config.json
    ├── config.json
    └── voices/                     # 11 speaker embeddings
        └── af_bella.bin … bm_daniel.bin

CPU tier - Piper (VITS · 28M params · WASM)

Five English voices, ~63 MB per voice. One voice loaded at a time. Single-thread WASM inference inside an MV3 offscreen document. Real-time on a modern laptop.

Voice ID                  Speaker      Notes
en_US-amy-medium          Amy          Female · warm narrator
en_US-lessac-medium       Lessac       Female · neutral, news-anchor
en_US-ryan-medium         Ryan         Male · clear, newsreader
en_US-hfc_female-medium   HFC Female   Female · crisp, modern
en_US-hfc_male-medium     HFC Male     Male · crisp, modern

Each voice ships as two files (*.onnx + *.onnx.json) under hf-cpu-mirror/en/en_US/<voice>/medium/.
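Because the layout is fixed, both file URLs for a voice can be derived from its short directory name. A minimal sketch (the helper name and relative paths are illustrative, not ThreadCast's actual code):

```javascript
// Illustrative helper: map a Piper voice's short name (e.g. "amy") to the
// two files it needs under the mirror layout described above.
function piperVoiceFiles(voice) {
  const model = `hf-cpu-mirror/en/en_US/${voice}/medium/en_US-${voice}-medium.onnx`;
  return { model, config: `${model}.json` };
}
```

For example, `piperVoiceFiles("ryan")` yields the `.onnx` model path and its sibling `.onnx.json` config path.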

Upstream: diffusionstudio/piper-voices → curated subset mirrored here.


GPU tier - Kokoro 82M (ONNX · WebGPU)

A single Kokoro model unlocks 11 distinct voices at once via 11 small speaker-embedding files. WebGPU-accelerated inference, ~10× real-time on a modern GPU.

Model file

File                                 Precision   Size      Status
hf-gpu-mirror/onnx/model.onnx        fp32        ~325 MB   ✅ Production default - stable on every WebGPU runtime
hf-gpu-mirror/onnx/model_fp16.onnx   fp16        ~165 MB   ⚠️ Reserved for future use - blocked today by upstream onnxruntime-web fp16 bugs (microsoft/onnxruntime#23403, #26732)

The fp16 file is staged here so that once the upstream JS stack lands fp16+WebGPU fixes, ThreadCast can flip the default to fp16 with a single config change, halving the download and roughly doubling per-segment speed on capable GPUs.
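That "single config change" can be pictured as one flag that selects the model path. A hedged sketch with invented names (ThreadCast's real config will differ):

```javascript
// Hypothetical config sketch: the GPU model file is chosen by one constant,
// so moving to fp16 later is a one-line flip. Names are illustrative.
const GPU_MODEL_PRECISION = "fp32"; // flip to "fp16" once onnxruntime-web fixes land

function gpuModelPath(precision = GPU_MODEL_PRECISION) {
  return precision === "fp16"
    ? "hf-gpu-mirror/onnx/model_fp16.onnx"
    : "hf-gpu-mirror/onnx/model.onnx";
}
```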

Tokenizer + config

tokenizer.json, tokenizer_config.json, config.json - small files used by @huggingface/transformers (transformers.js) when loading the model.

Voices (hf-gpu-mirror/voices/*.bin, ~520 KB each)

Voice ID      Name       Accent     Gender
af_bella      Bella      American   Female
af_sarah      Sarah      American   Female
af_nova       Nova       American   Female
af_sky        Sky        American   Female
am_adam       Adam       American   Male
am_michael    Michael    American   Male
am_echo       Echo       American   Male
bf_emma       Emma       British    Female
bf_isabella   Isabella   British    Female
bm_george     George     British    Male
bm_daniel     Daniel     British    Male

Voice IDs encode accent and gender: first letter = accent (a = American, b = British), second letter = gender (f = female, m = male).
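The decoding rule above is mechanical enough to sketch. This assumes well-formed IDs like those in the table:

```javascript
// Sketch of the ID convention: first letter is accent, second is gender.
function decodeVoiceId(id) {
  const [prefix] = id.split("_"); // e.g. "af" from "af_bella"
  return {
    accent: prefix[0] === "b" ? "British" : "American",
    gender: prefix[1] === "f" ? "female" : "male",
  };
}
```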

Upstream: model from onnx-community/Kokoro-82M-v1.0-ONNX-timestamped; voice embeddings from onnx-community/Kokoro-82M-v1.0-ONNX.
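If you fetch a voice .bin yourself, it can be viewed as raw 32-bit floats. That is an assumption about the file format based on how these embeddings are commonly shipped, not something this repo guarantees; verify against the upstream repos before relying on it:

```javascript
// Sketch: reinterpret a downloaded voice .bin buffer as float32 values.
// Assumes plain platform-endian float32 data with no header (unverified).
function toSpeakerEmbedding(arrayBuffer) {
  if (arrayBuffer.byteLength % 4 !== 0) {
    throw new Error("buffer length is not a multiple of 4 bytes");
  }
  return new Float32Array(arrayBuffer);
}
```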


How the extension uses these files

The ThreadCast extension fetches model files lazily, only when the user selects a Neural engine and presses Test/Play. Files are cached in the browser's Cache API and reused across sessions, so the user pays the download cost exactly once per profile.

Engine          Files fetched on first use
System voices   None - uses OS / browser TTS
Neural · CPU    The selected voice's .onnx + .onnx.json (~63 MB total)
Neural · GPU    onnx/model.onnx + tokenizer (~326 MB) + 11 voice .bin files (~5.7 MB)
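The fetch-once behaviour described above can be sketched with the Cache API's match/put pair. This is a hedged illustration, not the extension's real code; `cache` stands for whatever `caches.open(...)` returns (or any object with the same two methods):

```javascript
// Download a model file on first use, then serve it from the cache after.
async function fetchModelOnce(cache, fetchImpl, url) {
  const hit = await cache.match(url);   // previous session's copy, if any
  if (hit) return hit;
  const res = await fetchImpl(url);     // first-use network download
  await cache.put(url, res.clone());    // put() consumes a body, so clone first
  return res;
}
```

Because everything is keyed by URL, pinning model versions in this repo is what makes the cached copies stable across extension updates.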

The WASM runtimes (ONNX Runtime, Piper phonemizer) are bundled inside the extension package itself rather than served from this repo, to comply with Manifest V3 CSP and avoid CDN dependencies.


License

This repository mirrors upstream models for distribution stability. Each upstream project retains its own license:

  • Kokoro-82M: Apache-2.0 (upstream model card)
  • Piper voices: MIT, with individual voice attributions in each .onnx.json
  • transformers.js, onnxruntime-web: Apache-2.0

The mirror layout, README, and any custom additions in this repository are licensed under MIT by Pixel Labs.

