ThreadCast - Neural Models Mirror

Threads, now a podcast.
threadcast.app · pixellabs.ventures


Self-hosted mirror of the on-device neural TTS models used by ThreadCast, the Chrome extension that turns any Reddit thread into a hands-free podcast.

This repository exists so the extension can ship a stable, version-pinned set of model weights without depending on the availability or rate-limits of upstream Hugging Face repos at runtime.

Note: if you're a ThreadCast user, you don't need anything here: the extension downloads what it needs automatically the first time you select a Neural engine. This page is for transparency, contributors, and forks.


Repository layout

threadcast-neural-models/
├── hf-cpu-mirror/                  # Piper voices for the CPU engine
│   └── en/en_US/<voice>/medium/
│       ├── en_US-<voice>-medium.onnx
│       └── en_US-<voice>-medium.onnx.json
└── hf-gpu-mirror/                  # Kokoro model + voices for the GPU engine
    ├── onnx/
    │   ├── model.onnx              # fp32 - production default
    │   └── model_fp16.onnx         # fp16 - experimental, blocked by upstream bugs
    ├── tokenizer.json
    ├── tokenizer_config.json
    ├── config.json
    └── voices/                     # 11 speaker embeddings
        └── af_bella.bin … bm_daniel.bin

CPU tier - Piper (VITS · 28M params · WASM)

Five English voices, ~63 MB per voice. One voice loaded at a time. Single-thread WASM inference inside an MV3 offscreen document. Real-time on a modern laptop.

Voice ID                  Speaker      Notes
en_US-amy-medium          Amy          Female · warm narrator
en_US-lessac-medium       Lessac       Female · neutral, news-anchor
en_US-ryan-medium         Ryan         Male · clear, newsreader
en_US-hfc_female-medium   HFC Female   Female · crisp, modern
en_US-hfc_male-medium     HFC Male     Male · crisp, modern

Each voice ships as two files (*.onnx + *.onnx.json) under hf-cpu-mirror/en/en_US/<voice>/medium/.
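Because the layout is fixed, both file URLs for a voice can be derived from its short directory name. A minimal sketch (the helper name and relative paths are illustrative, not ThreadCast's actual code):

```javascript
// Illustrative helper: map a Piper voice's short name (e.g. "amy") to the
// two files it needs under the mirror layout described above.
function piperVoiceFiles(voice) {
  const model = `hf-cpu-mirror/en/en_US/${voice}/medium/en_US-${voice}-medium.onnx`;
  return { model, config: `${model}.json` };
}
```

For example, `piperVoiceFiles("ryan")` yields the `.onnx` model path and its sibling `.onnx.json` config path.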

Upstream: diffusionstudio/piper-voices → curated subset mirrored here.


GPU tier - Kokoro 82M (ONNX · WebGPU)

A single Kokoro model unlocks 11 distinct voices at once via 11 small speaker-embedding files. WebGPU-accelerated inference, ~10× real-time on a modern GPU.

Model file

File                                 Precision   Size      Status
hf-gpu-mirror/onnx/model.onnx        fp32        ~325 MB   ✅ Production default - stable on every WebGPU runtime
hf-gpu-mirror/onnx/model_fp16.onnx   fp16        ~165 MB   ⚠️ Reserved for future use - blocked today by upstream onnxruntime-web fp16 bugs (microsoft/onnxruntime#23403, #26732)

The fp16 file is staged here so that once the upstream JS stack lands fp16+WebGPU fixes, ThreadCast can flip the default to fp16 with a single config change, halving the download and roughly doubling per-segment speed on capable GPUs.
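That "single config change" can be pictured as one flag that selects the model path. A hedged sketch with invented names (ThreadCast's real config will differ):

```javascript
// Hypothetical config sketch: the GPU model file is chosen by one constant,
// so moving to fp16 later is a one-line flip. Names are illustrative.
const GPU_MODEL_PRECISION = "fp32"; // flip to "fp16" once onnxruntime-web fixes land

function gpuModelPath(precision = GPU_MODEL_PRECISION) {
  return precision === "fp16"
    ? "hf-gpu-mirror/onnx/model_fp16.onnx"
    : "hf-gpu-mirror/onnx/model.onnx";
}
```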

Tokenizer + config

tokenizer.json, tokenizer_config.json, config.json - small files used by @huggingface/transformers (transformers.js) when loading the model.

Voices (hf-gpu-mirror/voices/*.bin, ~520 KB each)

Voice ID      Name       Accent     Gender
af_bella      Bella      American   Female
af_sarah      Sarah      American   Female
af_nova       Nova       American   Female
af_sky        Sky        American   Female
am_adam       Adam       American   Male
am_michael    Michael    American   Male
am_echo       Echo       American   Male
bf_emma       Emma       British    Female
bf_isabella   Isabella   British    Female
bm_george     George     British    Male
bm_daniel     Daniel     British    Male

Voice IDs encode accent and gender: first letter = accent (a = American, b = British), second letter = gender (f = female, m = male).
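The decoding rule above is mechanical enough to sketch. This assumes well-formed IDs like those in the table:

```javascript
// Sketch of the ID convention: first letter is accent, second is gender.
function decodeVoiceId(id) {
  const [prefix] = id.split("_"); // e.g. "af" from "af_bella"
  return {
    accent: prefix[0] === "b" ? "British" : "American",
    gender: prefix[1] === "f" ? "female" : "male",
  };
}
```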

Upstream: model from onnx-community/Kokoro-82M-v1.0-ONNX-timestamped; voice embeddings from onnx-community/Kokoro-82M-v1.0-ONNX.
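If you fetch a voice .bin yourself, it can be viewed as raw 32-bit floats. That is an assumption about the file format based on how these embeddings are commonly shipped, not something this repo guarantees; verify against the upstream repos before relying on it:

```javascript
// Sketch: reinterpret a downloaded voice .bin buffer as float32 values.
// Assumes plain platform-endian float32 data with no header (unverified).
function toSpeakerEmbedding(arrayBuffer) {
  if (arrayBuffer.byteLength % 4 !== 0) {
    throw new Error("buffer length is not a multiple of 4 bytes");
  }
  return new Float32Array(arrayBuffer);
}
```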


How the extension uses these files

The ThreadCast extension fetches model files lazily, only when the user selects a Neural engine and presses Test/Play. Files are cached in the browser's Cache API and reused across sessions, so the user pays the download cost exactly once per profile.

Engine          Files fetched on first use
System voices   None - uses OS / browser TTS
Neural · CPU    The selected voice's .onnx + .onnx.json (~63 MB total)
Neural · GPU    onnx/model.onnx + tokenizer (~326 MB) + 11 voice .bin files (~5.7 MB)
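The fetch-once behaviour described above can be sketched with the Cache API's match/put pair. This is a hedged illustration, not the extension's real code; `cache` stands for whatever `caches.open(...)` returns (or any object with the same two methods):

```javascript
// Download a model file on first use, then serve it from the cache after.
async function fetchModelOnce(cache, fetchImpl, url) {
  const hit = await cache.match(url);   // previous session's copy, if any
  if (hit) return hit;
  const res = await fetchImpl(url);     // first-use network download
  await cache.put(url, res.clone());    // put() consumes a body, so clone first
  return res;
}
```

Because everything is keyed by URL, pinning model versions in this repo is what makes the cached copies stable across extension updates.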

The WASM runtimes (ONNX Runtime, Piper phonemizer) are bundled inside the extension package itself rather than served from this repo, to comply with Manifest V3 CSP and avoid CDN dependencies.


License

This repository mirrors upstream models for distribution stability. Each upstream project retains its own license:

  • Kokoro-82M: Apache-2.0 (upstream model card)
  • Piper voices: MIT, with individual voice attributions in each .onnx.json
  • transformers.js, onnxruntime-web: Apache-2.0

The mirror layout, README, and any custom additions in this repository are licensed under MIT by Pixel Labs.

