How to use Pixel-Labs/threadcast-neural-models with KittenTTS:

```python
from kittentts import KittenTTS
import soundfile as sf

m = KittenTTS("Pixel-Labs/threadcast-neural-models")
audio = m.generate("This high quality TTS model works without a GPU")

# Save the audio
sf.write('output.wav', audio, 24000)
```
# ThreadCast: Android Production Zips

Distributed mirror for the Android app's neural TTS assets. The eight zips here are downloaded by the app at runtime: the first install of a neural engine pulls only what's needed (~74 MB for one Piper voice + shared data, ~26 MB for the Plus bundle, or ~145 MB for the full Kokoro bundle), and the user can manage each model individually from inside the app.

> Sibling: **`../extension/`** holds the Chrome extension's neural models in raw HF format. Both subtrees share the same engine families (Piper VITS, KittenTTS-nano, Kokoro StyleTTS2); only the on-disk packaging differs.

---
## Layout

```
mobile-android/
└── v1/
    ├── threadcast-piper-shared-v1.zip (~11 MB) - espeak phonemizer data, downloaded once
    ├── threadcast-piper-en_US-amy-medium-v1.zip (~63 MB) - Amy voice
    ├── threadcast-piper-en_US-lessac-medium-v1.zip (~63 MB) - Lessac voice
    ├── threadcast-piper-en_US-ryan-medium-v1.zip (~63 MB) - Ryan voice
    ├── threadcast-piper-en_US-hfc_female-medium-v1.zip (~63 MB) - HFC Female voice
    ├── threadcast-piper-en_US-hfc_male-medium-v1.zip (~63 MB) - HFC Male voice
    ├── threadcast-kitten-nano-en-v1.zip (~26 MB) - KittenTTS nano v0.1 fp16 (all 8 voices; "Local AI Plus")
    └── threadcast-kokoro-int8-en-v1.zip (~145 MB) - Kokoro int8 v0.19 (all 11 voices; "Local AI Studio")
```

**Versioning:** the `v1/` segment is part of the URL the runtime requests. Bumping to `v2/` lets future format changes ship without breaking older app builds: old apps keep pulling `v1/`, new apps pull `v2/`.
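
As a rough illustration of the versioning scheme, the sketch below builds a download URL from a pinned format version. The root URL is the primary mirror named in this README; the `asset_url` helper and its `format_version` parameter are hypothetical names chosen for the example, not identifiers from the app's source.

```python
# Hypothetical helper: names are illustrative, not taken from the app.
MIRROR_ROOT = (
    "https://huggingface.co/Pixel-Labs/threadcast-neural-models"
    "/resolve/main/mobile-android"
)


def asset_url(filename: str, format_version: int = 1) -> str:
    """Build the URL for one zip under the versioned directory (v1/, v2/, ...)."""
    return f"{MIRROR_ROOT}/v{format_version}/{filename}"


# A build pinned to format 1 keeps resolving under v1/ even after v2/ exists.
print(asset_url("threadcast-kitten-nano-en-v1.zip"))
```

Because the version is part of the path, publishing a `v2/` directory never changes what a `v1/` URL returns.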

---

## What's inside each zip

The native [`AssetInstaller`](https://github.com/Pixel-Labs/Reddit-Reader/blob/main/packages/mobile/modules/threadcast-neural/android/src/main/java/app/threadcast/neural/AssetInstaller.kt) extracts each zip directly under `filesDir/sherpa-piper/` (Piper) or `filesDir/sherpa-kokoro/` (Kokoro), so the zip's internal layout is the on-device layout. No re-rooting, no per-file rules.

### Piper: shared espeak data

```
threadcast-piper-shared-v1.zip
└── espeak-ng-data/
    ├── phontab
    ├── phonindex
    ├── phondata
    ├── intonations
    ├── lang/
    ├── voices/
    └── … (full espeak-ng tree)
```

Downloaded once on the user's first Piper voice install. Skipped on every subsequent voice install.

### Piper: one zip per voice

```
threadcast-piper-en_US-amy-medium-v1.zip
└── en_US-amy-medium/
    ├── en_US-amy-medium.onnx (~63 MB)
    └── tokens.txt
```

Five zips total, one per voice. Users only download the voices they want; selecting Amy doesn't pull Ryan.

### Kitten (Local AI Plus): single bundle, all voices

```
threadcast-kitten-nano-en-v1.zip
├── espeak-ng-data/ (separate copy from Piper's / Kokoro's, different on-disk root)
│   └── …
├── model.fp16.onnx (~24 MB)
├── voices.bin (~30 KB, 8 speaker embeddings)
└── tokens.txt
```

One download serves all 8 Plus voices, using the same style-vector-lookup pattern as Kokoro. The 8 speakers are baked into `voices.bin` in the order documented in `packages/mobile/modules/threadcast-neural/index.ts::KITTEN_VOICES`. **Never write the upstream engine codename in user-facing surfaces**: the Android UI labels this engine as "Local AI Plus" everywhere.

### Kokoro (Local AI Studio): single bundle, all voices

```
threadcast-kokoro-int8-en-v1.zip
├── espeak-ng-data/ (separate copy from Piper's, different on-disk root)
│   └── …
├── model.int8.onnx (~135 MB)
├── voices.bin (~5.7 MB, concatenated speaker embeddings, all 11)
└── tokens.txt
```

One download serves every Kokoro voice; switching speakers is a free style-vector lookup at synth time.
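
The bundle layouts above can be checked mechanically. The following is a minimal sketch, assuming the member names shown in the trees; it inspects the archive itself rather than the extracted files, which is a simplification of whatever the native installer does. `REQUIRED` and `missing_members` are hypothetical names.

```python
import io
import zipfile

# Required members per bundle, copied from the trees above.
# Entries ending in "/" are directories. The dict and function names are illustrative.
REQUIRED = {
    "kitten": ["espeak-ng-data/", "model.fp16.onnx", "voices.bin", "tokens.txt"],
    "kokoro": ["espeak-ng-data/", "model.int8.onnx", "voices.bin", "tokens.txt"],
}


def missing_members(zf: zipfile.ZipFile, required: list[str]) -> list[str]:
    """Return required entries that are absent or, for files, zero bytes long."""
    sizes = {info.filename: info.file_size for info in zf.infolist()}
    missing = []
    for name in required:
        if name.endswith("/"):
            # A directory counts as present if at least one entry lives under it.
            ok = any(m.startswith(name) and m != name for m in sizes)
        else:
            ok = sizes.get(name, 0) > 0
        if not ok:
            missing.append(name)
    return missing


# Build a deliberately incomplete archive in memory and inspect it.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("espeak-ng-data/phontab", b"\x00")
    zf.writestr("model.fp16.onnx", b"\x00")
with zipfile.ZipFile(buffer) as zf:
    print(missing_members(zf, REQUIRED["kitten"]))
```

An empty result means every required member is present with a non-zero size.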

---

## How the runtime fetches these

**Mirror-and-fallback**: the same pattern the extension uses for its `Pixel-Labs/threadcast-neural-models/{hf-cpu-mirror,hf-gpu-mirror}/` paths (see [`extension/src/offscreen/mirror-fetch.ts`](https://github.com/Pixel-Labs/Reddit-Reader/blob/main/packages/extension/src/offscreen/mirror-fetch.ts)). Each download is an ordered list of URLs; the installer tries each on failure and only surfaces an error if every URL is unreachable.

```
urls[0] = PRIMARY_BASE = https://huggingface.co/Pixel-Labs/threadcast-neural-models/resolve/main/mobile-android/v1
urls[1] = FALLBACK_BASE = (configurable: sibling HF repo, GitHub Release, private CDN, …)
```

Both bases need to serve the **same filenames** (the eight listed above). Both base URLs are configured at app build time; the fallback is optional:

| Env var (set at build time) | Default | Purpose |
|---|---|---|
| `EXPO_PUBLIC_NEURAL_ASSETS_BASE_URL` | `https://huggingface.co/Pixel-Labs/threadcast-neural-models/resolve/main/mobile-android/v1` | Primary mirror |
| `EXPO_PUBLIC_NEURAL_ASSETS_FALLBACK_URL` | unset | Optional fallback (omit for primary-only) |
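
To make the table concrete, here is a sketch of how the two variables could turn into the ordered URL list for one file. The variable names and the default come from the table; the `candidate_urls` function, reading the values from a mapping, and the trailing-slash handling are assumptions made for illustration (the app resolves these values at build time, not at run time in Python).

```python
import os
from collections.abc import Mapping

DEFAULT_PRIMARY = (
    "https://huggingface.co/Pixel-Labs/threadcast-neural-models"
    "/resolve/main/mobile-android/v1"
)


def candidate_urls(filename: str, env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the ordered download URLs for one zip: primary first, then fallback."""
    primary = env.get("EXPO_PUBLIC_NEURAL_ASSETS_BASE_URL") or DEFAULT_PRIMARY
    fallback = env.get("EXPO_PUBLIC_NEURAL_ASSETS_FALLBACK_URL")
    # An unset or empty fallback means primary-only.
    bases = [primary] + ([fallback] if fallback else [])
    # Tolerate a trailing slash on either base (an assumption of this sketch).
    return [f"{base.rstrip('/')}/{filename}" for base in bases]


example_env = {"EXPO_PUBLIC_NEURAL_ASSETS_FALLBACK_URL": "https://mirror.invalid/v1/"}
for url in candidate_urls("threadcast-kokoro-int8-en-v1.zip", example_env):
    print(url)
```

With the fallback unset, the list has a single entry and the installer runs primary-only.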

The native installer:

- Tries `urls[0]` first. On HTTP 4xx/5xx or transport error, logs a warning and tries `urls[1]`.
- Only the LAST URL's failure surfaces to the UI as a download error; transient mirror outages are silent.
- Cancellation aborts immediately at any URL boundary.
- Verifies post-extract that every required file exists with non-zero size; rejects malformed zips even if the HTTP layer succeeded.
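
The first three behaviours in the list above can be sketched in a few lines. This is an illustrative Python rendering, not the Kotlin `AssetInstaller`: `fetch_with_fallback`, `http_fetch`, and the `is_cancelled` callback are hypothetical names, and the post-extract verification step is left out.

```python
import logging
import urllib.request
from typing import Callable, Sequence


def http_fetch(url: str) -> bytes:
    """Download one URL; urlopen raises HTTPError (an OSError) on 4xx/5xx."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def fetch_with_fallback(
    urls: Sequence[str],
    fetch: Callable[[str], bytes] = http_fetch,
    is_cancelled: Callable[[], bool] = lambda: False,
) -> bytes:
    """Try each URL in order; only the last URL's failure reaches the caller."""
    if not urls:
        raise ValueError("no URLs configured")
    last_error: OSError | None = None
    for index, url in enumerate(urls):
        # Cancellation is honoured at every URL boundary.
        if is_cancelled():
            raise RuntimeError("download cancelled")
        try:
            return fetch(url)
        except OSError as exc:  # covers HTTP errors and transport errors
            last_error = exc
            if index < len(urls) - 1:
                logging.warning("mirror %s failed (%s); trying next", url, exc)
    assert last_error is not None
    raise last_error
```

Passing `fetch` as a parameter keeps the retry logic testable without network access; a caller would normally supply the two URLs built from the primary and fallback bases.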

---

## Publishing workflow

1. **Produce the zips** from the local sherpa-onnx staging tree:

   ```sh
   pnpm --filter mobile produce:neural-zips
   ```

   Writes to `packages/mobile/dist/neural-assets/`. The script reads the locally-staged sherpa-onnx upstream artifacts (Piper voice ONNX files + KittenTTS-nano v0.1 fp16 bundle + Kokoro int8 v0.19 bundle + shared espeak-ng-data) and emits the eight zips with the correct internal layouts.

2. **Upload to Hugging Face.** The user-facing primary mirror lives at:

   <https://huggingface.co/Pixel-Labs/threadcast-neural-models/tree/main/mobile-android/v1>

   Drop all eight zips into that tree. HF preserves filenames as-is. No build steps, no metadata munging; they're just static assets behind a CDN.

3. **(Optional) Mirror to a fallback host.** Same eight filenames at any HTTPS endpoint. Common picks:

   - Sibling HF repo (`Pixel-Labs/threadcast-neural-models-mirror/v1/...`)
   - GitHub Release with `gh release upload v1 dist/neural-assets/*.zip`
   - Private CDN (R2, S3, etc.)

   Then ship the next app build with `EXPO_PUBLIC_NEURAL_ASSETS_FALLBACK_URL` set to the mirror's base URL.
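
Since both mirrors must serve the same eight filenames, a quick pre-upload check of the output directory can catch a missing or stray zip. The sketch below is not part of the `produce:neural-zips` script; `check_dist` is a hypothetical helper, and the filename set is copied from the layout listing in this README.

```python
from pathlib import Path

# The eight filenames listed under v1/ in this README.
EXPECTED_ZIPS = {
    "threadcast-piper-shared-v1.zip",
    "threadcast-piper-en_US-amy-medium-v1.zip",
    "threadcast-piper-en_US-lessac-medium-v1.zip",
    "threadcast-piper-en_US-ryan-medium-v1.zip",
    "threadcast-piper-en_US-hfc_female-medium-v1.zip",
    "threadcast-piper-en_US-hfc_male-medium-v1.zip",
    "threadcast-kitten-nano-en-v1.zip",
    "threadcast-kokoro-int8-en-v1.zip",
}


def check_dist(dist_dir: str) -> tuple[set[str], set[str]]:
    """Return (missing, unexpected) zip filenames found in dist_dir."""
    present = {path.name for path in Path(dist_dir).glob("*.zip")}
    return EXPECTED_ZIPS - present, present - EXPECTED_ZIPS
```

Calling `check_dist("packages/mobile/dist/neural-assets")` from the repository root should return two empty sets when the directory is ready to upload.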

---

## Per-engine download cost (user-facing)

| User intent | Files pulled | Network |
|---|---|---|
| Install **first** Local AI Lite voice (e.g. Amy) | `threadcast-piper-shared-v1.zip` + `threadcast-piper-en_US-amy-medium-v1.zip` | ~74 MB |
| Install **another** Local AI Lite voice (e.g. Lessac) | `threadcast-piper-en_US-lessac-medium-v1.zip` | ~63 MB |
| Install Local AI Plus | `threadcast-kitten-nano-en-v1.zip` | ~26 MB |
| Install Local AI Studio | `threadcast-kokoro-int8-en-v1.zip` | ~145 MB |
| Install every engine, all 5 Lite voices + Plus + Studio | every zip in `v1/` | ~496 MB |

The whole-bundle worst case is comparable to Spotify's "download an album for offline" workflow. Most users will pick one tier and stop there. Plus is the sweet spot at 26 MB for an 8-voice multi-speaker model.
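
The figures in the table follow from the per-file sizes in the layout listing. The sketch below reproduces the arithmetic, including the rule that the shared Piper data is fetched only once. Sizes are the rounded values from this README; `piper_plan` and `total_mb` are hypothetical helper names.

```python
# Rounded sizes in MB, copied from the layout listing in this README.
SHARED = "threadcast-piper-shared-v1.zip"
SIZES_MB = {
    SHARED: 11,
    "threadcast-piper-en_US-amy-medium-v1.zip": 63,
    "threadcast-piper-en_US-lessac-medium-v1.zip": 63,
    "threadcast-piper-en_US-ryan-medium-v1.zip": 63,
    "threadcast-piper-en_US-hfc_female-medium-v1.zip": 63,
    "threadcast-piper-en_US-hfc_male-medium-v1.zip": 63,
    "threadcast-kitten-nano-en-v1.zip": 26,
    "threadcast-kokoro-int8-en-v1.zip": 145,
}


def piper_plan(voice: str, installed: set[str]) -> list[str]:
    """Zips still needed for one Piper voice; shared data is skipped if present."""
    wanted = [SHARED, f"threadcast-piper-en_US-{voice}-medium-v1.zip"]
    return [name for name in wanted if name not in installed]


def total_mb(names) -> int:
    """Sum the rounded sizes of the given zip filenames."""
    return sum(SIZES_MB[name] for name in names)


first = piper_plan("amy", installed=set())
second = piper_plan("lessac", installed=set(first))
print(total_mb(first), total_mb(second), total_mb(SIZES_MB))  # 74 63 497
```

Summing the rounded per-file figures gives 497 MB, within rounding of the table's ~496 MB.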

---

## License

Per-project licenses are retained from upstream; see the [parent README](../README.md#license) for the consolidated summary.