docs(mobile-android): Melo v2 sections + retention table for legacy zips
mobile-android/README.md CHANGED (+70 -25)
|
@@ -1,8 +1,8 @@
|
|
| 1 |
# ThreadCast - Android Production Zips
|
| 2 |
|
| 3 |
-
Distributed mirror for the Android app's neural TTS assets. The
|
| 4 |
|
| 5 |
-
> Sibling: **`../extension/`** holds the Chrome extension's neural models in raw HF format. Both subtrees share the
|
| 6 |
|
| 7 |
---
|
| 8 |
|
|
@@ -11,23 +11,46 @@ Distributed mirror for the Android app's neural TTS assets. The eight zips here
|
|
| 11 |
```
|
| 12 |
mobile-android/
|
| 13 |
└── v1/
|
| 14 |
-
    ├── threadcast-piper-shared-v1.zip (~
|
| 15 |
-
    ├── threadcast-piper-en_US-amy-medium-v1.zip (~
|
| 16 |
-
    ├── threadcast-piper-en_US-lessac-medium-v1.zip (~
|
| 17 |
-
    ├── threadcast-piper-en_US-ryan-medium-v1.zip (~
|
| 18 |
-
    ├── threadcast-piper-en_US-hfc_female-medium-v1.zip (~
|
| 19 |
-
    ├── threadcast-piper-en_US-hfc_male-medium-v1.zip (~
|
| 20 |
-
    ├── threadcast-
|
| 21 |
-
|
| 22 |
```
|
| 23 |
|
| 24 |
-
|
| 25 |
|
| 26 |
---
|
| 27 |
|
| 28 |
## What's inside each zip
|
| 29 |
|
| 30 |
-
The native [`AssetInstaller`](https://github.com/Pixel-Labs/Reddit-Reader/blob/main/packages/mobile/modules/threadcast-neural/android/src/main/java/app/threadcast/neural/AssetInstaller.kt) extracts each zip directly under `filesDir/sherpa-piper
|
| 31 |
|
| 32 |
### Piper - shared espeak data
|
| 33 |
|
|
@@ -56,7 +79,24 @@ threadcast-piper-en_US-amy-medium-v1.zip
|
|
| 56 |
|
| 57 |
Five zips total, one per voice. Users only download the voices they want; selecting Amy doesn't pull Ryan.
|
| 58 |
|
| 59 |
-
###
|
| 60 |
|
| 61 |
```
|
| 62 |
threadcast-kitten-nano-en-v1.zip
|
|
@@ -67,7 +107,9 @@ threadcast-kitten-nano-en-v1.zip
|
|
| 67 |
└── tokens.txt
|
| 68 |
```
|
| 69 |
|
| 70 |
-
|
| 71 |
|
| 72 |
### Kokoro (Local AI Studio) - single bundle, all voices
|
| 73 |
|
|
@@ -93,7 +135,7 @@ urls[0] = PRIMARY_BASE = https://huggingface.co/Pixel-Labs/threadcast-neural-m
|
|
| 93 |
urls[1] = FALLBACK_BASE = (configurable: sibling HF repo, GitHub Release, private CDN, …)
|
| 94 |
```
|
| 95 |
|
| 96 |
-
Both bases need to serve the **same filenames** (the
|
| 97 |
|
| 98 |
| Env var (set at build time) | Default | Purpose |
|
| 99 |
|---|---|---|
|
|
@@ -116,15 +158,17 @@ The native installer:
|
|
| 116 |
pnpm --filter mobile produce:neural-zips
|
| 117 |
```
|
| 118 |
|
| 119 |
-
Writes to `packages/mobile/dist/neural-assets/`. The script reads the locally-staged sherpa-onnx upstream artifacts (Piper voice ONNX files + KittenTTS-nano v0.1 fp16 bundle + Kokoro int8 v0.19 bundle + shared espeak-ng-data) and emits the
|
| 120 |
|
| 121 |
2. **Upload to Hugging Face.** The user-facing primary mirror lives at:
|
| 122 |
|
| 123 |
<https://huggingface.co/Pixel-Labs/threadcast-neural-models/tree/main/mobile-android/v1>
|
| 124 |
|
| 125 |
-
Drop
|
| 126 |
|
| 127 |
-
3. **(Optional) Mirror to a fallback host.** Same
|
| 128 |
- Sibling HF repo (`Pixel-Labs/threadcast-neural-models-mirror/v1/...`)
|
| 129 |
- GitHub Release with `gh release upload v1 dist/neural-assets/*.zip`
|
| 130 |
- Private CDN (R2, S3, etc.)
|
|
@@ -137,13 +181,14 @@ The native installer:
|
|
| 137 |
|
| 138 |
| User intent | Files pulled | Network |
|
| 139 |
|---|---|---|
|
| 140 |
-
| Install **first** Local AI Lite voice (e.g. Amy) | `threadcast-piper-shared-v1.zip` + `threadcast-piper-en_US-amy-medium-v1.zip` | ~
|
| 141 |
-
| Install **another** Local AI Lite voice (e.g. Lessac) | `threadcast-piper-en_US-lessac-medium-v1.zip` | ~
|
| 142 |
-
| Install Local AI Plus | `threadcast-
|
| 143 |
-
| Install Local AI
|
| 144 |
-
| Install
|
| 145 |
-
|
| 146 |
-
|
| 147 |
|
| 148 |
---
|
| 149 |
|
|
|
|
| 1 |
# ThreadCast - Android Production Zips
|
| 2 |
|
| 3 |
+
Distributed mirror for the Android app's neural TTS assets. The ten zips here are downloaded by the app at runtime: the first install of a neural engine pulls only what's needed (~67 MB for one Piper voice + shared data, ~81 MB for the MeloTTS Plus bundle, or ~103 MB for the full Kokoro bundle), and the user can manage each model individually from inside the app.
|
| 4 |
|
| 5 |
+
> Sibling: **`../extension/`** holds the Chrome extension's neural models in raw HF format. Both subtrees share the Piper VITS and Kokoro StyleTTS2 families; only the on-disk packaging differs. MeloTTS is Android-only (no transformers.js equivalent).
|
| 6 |
|
| 7 |
---
|
| 8 |
|
|
|
|
| 11 |
```
|
| 12 |
mobile-android/
|
| 13 |
└── v1/
|
| 14 |
+
    ├── threadcast-piper-shared-v1.zip (~9 MB) - espeak phonemizer data, downloaded once
|
| 15 |
+
    ├── threadcast-piper-en_US-amy-medium-v1.zip (~58 MB) - Amy voice
|
| 16 |
+
    ├── threadcast-piper-en_US-lessac-medium-v1.zip (~58 MB) - Lessac voice
|
| 17 |
+
    ├── threadcast-piper-en_US-ryan-medium-v1.zip (~58 MB) - Ryan voice
|
| 18 |
+
    ├── threadcast-piper-en_US-hfc_female-medium-v1.zip (~58 MB) - HFC Female voice
|
| 19 |
+
    ├── threadcast-piper-en_US-hfc_male-medium-v1.zip (~58 MB) - HFC Male voice
|
| 20 |
+
    ├── threadcast-melo-en-v1.zip (~159 MB) - MeloTTS English fp32 + base 129k-entry lexicon (rollback reference, not fetched by any shipping app; see "Retained legacy bundles" below)
|
| 21 |
+
    ├── threadcast-melo-en-v2.zip (~81 MB) - MeloTTS English fp16 + enriched ~250k-entry lexicon + punctuation silence rules ("Local AI Plus", current as of v1.2.0)
|
| 22 |
+
    ├── threadcast-kitten-nano-en-v1.zip (~29 MB) - KittenTTS nano v0.1 fp16 ("Local AI Plus", legacy; fetched only by cached v1.1.x installs)
|
| 23 |
+
    └── threadcast-kokoro-int8-en-v1.zip (~103 MB) - Kokoro int8 v0.19 (all 11 voices; "Local AI Studio")
|
| 24 |
```
|
| 25 |
|
| 26 |
+
### What gets actively fetched, vs what's just kept
|
| 27 |
+
|
| 28 |
+
The app's [asset manifest](https://github.com/Pixel-Labs/Reddit-Reader/blob/main/packages/mobile/lib/neural/asset-manifest.ts) only points at eight of these ten zips at any one time (the Piper shared zip, five Piper voices, MeloTTS v2, and Kokoro). The other two are kept on HF deliberately:
|
| 29 |
+
|
| 30 |
+
| Zip | Pulled by current shipping builds? | Why it stays |
|
| 31 |
+
|---|---|---|
|
| 32 |
+
| `threadcast-piper-shared-v1.zip` + 5 voice zips | ✅ yes, every Lite voice install | active |
|
| 33 |
+
| `threadcast-melo-en-v2.zip` | ✅ yes, every v1.2.0+ Plus install | active |
|
| 34 |
+
| `threadcast-kokoro-int8-en-v1.zip` | ✅ yes, every Studio install | active |
|
| 35 |
+
| `threadcast-melo-en-v1.zip` | ❌ no | **Rollback reference.** fp32 + base lexicon snapshot from the pre-quantization iteration. Kept so we can A/B against v2 if a quality regression surfaces, or re-promote v1 by flipping the manifest if a v2 issue blocks the release. |
|
| 36 |
+
| `threadcast-kitten-nano-en-v1.zip` | ⚠ partial: only fetched by cached v1.1.x installs that already have it on disk | **Legacy compat.** The runtime keeps the `'kitten'` engine path alive so v1.1.x users updating to v1.2.0 don't lose their already-downloaded Plus engine. Same-session fallback also kicks in here if MeloTTS load fails mid-thread. |
|
| 37 |
+
|
| 38 |
+
Neither legacy zip costs the user bandwidth: they're only fetched if the app explicitly asks for them. The cost is HF storage, which has no quota for our footprint (~671 MB total across all ten zips, summing the sizes listed above).
|
| 39 |
+
|
| 40 |
+
### Two independent version axes
|
| 41 |
+
|
| 42 |
+
| Axis | What it tracks | Example | When it bumps |
|
| 43 |
+
|---|---|---|---|
|
| 44 |
+
| **`v1/` path segment** | Distribution-layout version (folder structure, filenames pattern, runtime expectations) | `mobile-android/v1/` | Only on a breaking change to how the app fetches or unpacks zips. New app builds would target `v2/` while older ones keep pulling `v1/`. |
|
| 45 |
+
| **`-v<N>` filename suffix** | Per-bundle iteration (model weights, lexicon, quantization) | `threadcast-melo-en-v2.zip` | Whenever a single bundle's contents materially change. New bundle uploaded with a new suffix; the app's [asset manifest](https://github.com/Pixel-Labs/Reddit-Reader/blob/main/packages/mobile/lib/neural/asset-manifest.ts) is bumped to match. Older app builds keep pulling the older suffix until their next release. |
|
| 46 |
+
|
| 47 |
+
For MeloTTS: **bundle v1** (fp32 + base 129k-entry lexicon) is the pre-quantization iteration, present here as a rollback reference and not fetched by any shipping app build. **Bundle v2**, which ships in app v1.2.0, quantizes the model to fp16 (~50% size reduction, <1% MOS drop) and enriches the lexicon with CMUdict latest + g2p_en + Aquila-Resolve neural G2P + curated Reddit/tech/brand/modern-English terms + a punctuation rule so em-dashes render as a short silence instead of being spelled out.
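The two axes can be made concrete with a short sketch. It is illustrative only: `BASE`, `split_bundle_name`, and `bundle_url` are hypothetical names, and the host is a placeholder rather than the real mirror base.

```python
import re

# Hypothetical placeholder host; the real base URL is configured in the app.
BASE = "https://mirror.example/mobile-android"

# Matches e.g. "threadcast-melo-en-v2.zip": stem "threadcast-melo-en", iteration 2.
_BUNDLE_RE = re.compile(r"^(?P<stem>.+)-v(?P<rev>\d+)\.zip$")


def split_bundle_name(filename: str) -> tuple[str, int]:
    """Split a zip filename into its stem and per-bundle iteration number."""
    match = _BUNDLE_RE.match(filename)
    if match is None:
        raise ValueError(f"not a versioned bundle name: {filename!r}")
    return match.group("stem"), int(match.group("rev"))


def bundle_url(layout_version: int, stem: str, bundle_rev: int) -> str:
    """Combine the layout axis (path segment) with the bundle axis (suffix)."""
    return f"{BASE}/v{layout_version}/{stem}-v{bundle_rev}.zip"


stem, rev = split_bundle_name("threadcast-melo-en-v2.zip")
print(bundle_url(1, stem, rev))
```

Bumping only `bundle_rev` changes the filename inside the same `v1/` folder, while bumping `layout_version` moves every bundle to a new folder.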
|
| 48 |
|
| 49 |
---
|
| 50 |
|
| 51 |
## What's inside each zip
|
| 52 |
|
| 53 |
+
The native [`AssetInstaller`](https://github.com/Pixel-Labs/Reddit-Reader/blob/main/packages/mobile/modules/threadcast-neural/android/src/main/java/app/threadcast/neural/AssetInstaller.kt) extracts each zip directly under `filesDir/sherpa-{piper,melo,kitten,kokoro}/`, so the zip's internal layout = the on-device layout. No re-rooting, no per-file rules.
|
| 54 |
|
| 55 |
### Piper - shared espeak data
|
| 56 |
|
|
|
|
| 79 |
|
| 80 |
Five zips total, one per voice. Users only download the voices they want; selecting Amy doesn't pull Ryan.
|
| 81 |
|
| 82 |
+
### MeloTTS (Local AI Plus, current) - single bundle, all accents
|
| 83 |
+
|
| 84 |
+
```
|
| 85 |
+
threadcast-melo-en-v2.zip
|
| 86 |
+
├── model.fp16.onnx (~87 MB, VITS2 acoustic model with BERT prosody assist, ARM-NEON-accelerated fp16)
|
| 87 |
+
├── lexicon.txt (~6 MB, enriched CMUdict phoneme dictionary, ~250k+ entries + punctuation silence rules)
|
| 88 |
+
└── tokens.txt (~1 KB, phoneme → id map)
|
| 89 |
+
```
|
| 90 |
+
|
| 91 |
+
One download serves all five English accents (**Default, US, UK (British), Indian, Australian**) via speaker-id (`sid` 0–4) lookup at synth time. Voice switching is free; no per-accent download.
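As a rough illustration of the speaker-id lookup, the sketch below maps accent names to `sid` values. The numbering is an assumption (accents numbered in the order they are listed above), and `ACCENT_SIDS` and `sid_for` are hypothetical names; check the bundle's actual speaker table before relying on it.

```python
# ASSUMPTION: sids follow the order in which the README lists the accents.
ACCENT_SIDS = {
    "Default": 0,
    "US": 1,
    "UK": 2,
    "Indian": 3,
    "Australian": 4,
}


def sid_for(accent: str) -> int:
    """Return the speaker id for an accent name, rejecting unknown names."""
    try:
        return ACCENT_SIDS[accent]
    except KeyError:
        raise ValueError(f"unknown accent: {accent!r}") from None


print(sid_for("UK"))
```

Because all accents live in one model file, switching accents only changes the integer passed at synthesis time.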
|
| 92 |
+
|
| 93 |
+
**No `espeak-ng-data/`**: MeloTTS phonemizes entirely through its CMUdict lexicon. sherpa-onnx's MeloTTS path does not carry over the upstream `g2p_en` neural OOV fallback, so any word missing from the lexicon is spelled letter-by-letter at runtime, which is why the lexicon has to be exhaustive. The v2 lexicon enrichment exists to make that fallback rare.
|
| 94 |
+
|
| 95 |
+
**Why fp16, not int8**: MeloTTS is Conv1d-heavy and suffers a net perf regression on ARM under dynamic int8 quantization (sherpa-onnx #575, Coqui #2991) plus audible distortion (OpenVINO MeloTTS findings). fp16 keeps ARM NEON SIMD acceleration intact while cutting size roughly in half. The Kitten Nano `model.fp16.onnx` precedent on the same sherpa-onnx Android runtime validates the choice.
|
| 96 |
+
|
| 97 |
+
**Never write the upstream engine codename in user-facing surfaces**: the Android UI labels this engine as "Local AI Plus" everywhere.
|
| 98 |
+
|
| 99 |
+
### KittenTTS (Local AI Plus, legacy fallback) - single bundle, all voices
|
| 100 |
|
| 101 |
```
|
| 102 |
threadcast-kitten-nano-en-v1.zip
|
|
|
|
| 107 |
└── tokens.txt
|
| 108 |
```
|
| 109 |
|
| 110 |
+
Was the Local AI Plus engine in ThreadCast v1.1.x. **Replaced by MeloTTS in v1.2.0**: tester feedback consistently rated Kitten's quality at the same tier as Piper Lite even though it was marketed as a step up, so Plus was rebuilt on the larger MeloTTS model.
|
| 111 |
+
|
| 112 |
+
Kept in this mirror so existing v1.1.x installs that already cached the Kitten bundle don't re-download Plus on update; the engine code keeps the `'kitten'` runtime path alive for same-session fallback if MeloTTS load fails mid-thread.
|
| 113 |
|
| 114 |
### Kokoro (Local AI Studio) - single bundle, all voices
|
| 115 |
|
|
|
|
| 135 |
urls[1] = FALLBACK_BASE = (configurable: sibling HF repo, GitHub Release, private CDN, …)
|
| 136 |
```
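The primary-then-fallback order above can be sketched as follows. This is a minimal illustration, not the native installer's code: both base URLs are placeholders, `candidate_urls` and `fetch_first` are hypothetical names, and the downloader is injected so the ordering logic can be exercised without a network.

```python
from typing import Callable, Optional

# Placeholder hosts (assumptions); the real values come from build-time config.
PRIMARY_BASE = "https://primary.example/v1"
FALLBACK_BASE: Optional[str] = "https://fallback.example/v1"  # may be unset


def candidate_urls(filename: str) -> list[str]:
    """Return download URLs in priority order: primary first, then fallback."""
    bases = [PRIMARY_BASE] + ([FALLBACK_BASE] if FALLBACK_BASE else [])
    return [f"{base.rstrip('/')}/{filename}" for base in bases]


def fetch_first(filename: str, fetch: Callable[[str], bytes]) -> bytes:
    """Try each base in order and return the first successful download."""
    last_error: Optional[Exception] = None
    for url in candidate_urls(filename):
        try:
            return fetch(url)
        except OSError as err:  # urllib network errors are OSError subclasses
            last_error = err
    raise RuntimeError(f"all mirrors failed for {filename}") from last_error
```

The scheme only works if both hosts expose identical filenames, since the same `filename` is appended to each base.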
|
| 137 |
|
| 138 |
+
Both bases need to serve the **same filenames** (the eight actively-fetched zips listed above; legacy and rollback zips are HF-only and don't need fallback coverage). The fallback host is configured at app build time:
|
| 139 |
|
| 140 |
| Env var (set at build time) | Default | Purpose |
|
| 141 |
|---|---|---|
|
|
|
|
| 158 |
pnpm --filter mobile produce:neural-zips
|
| 159 |
```
|
| 160 |
|
| 161 |
+
Writes to `packages/mobile/dist/neural-assets/`. The script reads the locally-staged sherpa-onnx upstream artifacts (Piper voice ONNX files + MeloTTS English fp16 + enriched lexicon + KittenTTS-nano v0.1 fp16 bundle + Kokoro int8 v0.19 bundle + shared espeak-ng-data) and emits the actively-shipped zips with the correct internal layouts. The legacy `threadcast-melo-en-v1.zip` and rollback artifacts are not regenerated; they live only on HF as historical snapshots.
|
| 162 |
+
|
| 163 |
+
⚠ **If you've edited `lexicon.txt` in the Melo staging folder**, re-run this script before uploading. The producer doesn't watch source files, so the existing `threadcast-melo-en-v2.zip` on disk may be stale.
|
| 164 |
|
| 165 |
2. **Upload to Hugging Face.** The user-facing primary mirror lives at:
|
| 166 |
|
| 167 |
<https://huggingface.co/Pixel-Labs/threadcast-neural-models/tree/main/mobile-android/v1>
|
| 168 |
|
| 169 |
+
Drop the freshly-built zips into that tree; HF overwrites by filename and preserves filenames as-is. No build steps, no metadata munging: they're just static assets behind a CDN. Don't delete the legacy `threadcast-melo-en-v1.zip` or `threadcast-kitten-nano-en-v1.zip` (see the retention table above).
|
| 170 |
|
| 171 |
+
3. **(Optional) Mirror to a fallback host.** Same actively-fetched filenames at any HTTPS endpoint. Common picks:
|
| 172 |
- Sibling HF repo (`Pixel-Labs/threadcast-neural-models-mirror/v1/...`)
|
| 173 |
- GitHub Release with `gh release upload v1 dist/neural-assets/*.zip`
|
| 174 |
- Private CDN (R2, S3, etc.)
|
|
|
|
| 181 |
|
| 182 |
| User intent | Files pulled | Network |
|
| 183 |
|---|---|---|
|
| 184 |
+
| Install **first** Local AI Lite voice (e.g. Amy) | `threadcast-piper-shared-v1.zip` + `threadcast-piper-en_US-amy-medium-v1.zip` | ~67 MB |
|
| 185 |
+
| Install **another** Local AI Lite voice (e.g. Lessac) | `threadcast-piper-en_US-lessac-medium-v1.zip` | ~58 MB |
|
| 186 |
+
| Install Local AI Plus (v1.2.0+) | `threadcast-melo-en-v2.zip` | ~81 MB |
|
| 187 |
+
| Install Local AI Plus (legacy v1.1.x cached) | `threadcast-kitten-nano-en-v1.zip` | ~29 MB |
|
| 188 |
+
| Install Local AI Studio | `threadcast-kokoro-int8-en-v1.zip` | ~103 MB |
|
| 189 |
+
| Install every active engine, all 5 Lite voices + Melo + Studio | the eight active zips above | ~483 MB (sum of the sizes listed above) |
|
| 190 |
+
|
| 191 |
+
The whole-bundle worst case is comparable to Spotify's "download an album for offline" workflow. Most users will pick one tier and stop there; MeloTTS Plus is the sweet spot at ~81 MB for a five-accent multi-speaker English model.
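The per-intent totals in the table follow from simple addition over the approximate zip sizes. A small sketch, assuming the rounded sizes from the directory listing (`SIZES_MB` and `download_cost_mb` are hypothetical names, and only a subset of zips is shown):

```python
# Approximate sizes in MB, taken from the directory listing in this README.
SIZES_MB = {
    "threadcast-piper-shared-v1.zip": 9,
    "threadcast-piper-en_US-amy-medium-v1.zip": 58,
    "threadcast-piper-en_US-lessac-medium-v1.zip": 58,
    "threadcast-melo-en-v2.zip": 81,
    "threadcast-kokoro-int8-en-v1.zip": 103,
}

SHARED = "threadcast-piper-shared-v1.zip"
AMY = "threadcast-piper-en_US-amy-medium-v1.zip"
LESSAC = "threadcast-piper-en_US-lessac-medium-v1.zip"


def download_cost_mb(wanted, already_installed=()):
    """Sum the sizes of the zips that still need to be fetched."""
    missing = set(wanted) - set(already_installed)
    return sum(SIZES_MB[name] for name in missing)


first_voice = download_cost_mb([SHARED, AMY])
second_voice = download_cost_mb([SHARED, LESSAC], already_installed=[SHARED, AMY])
print(first_voice, second_voice)
```

The second call is cheaper because the shared espeak zip is already on disk, which is the reason it is packaged separately from the voices.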
|
| 192 |
|
| 193 |
---
|
| 194 |
|