Pixel-Labs/threadcast-neural-models

@@ -1,8 +1,8 @@
 # ThreadCast — Android Production Zips
-Distributed mirror for the Android app's neural TTS assets. The eight zips here are downloaded by the app at runtime — first install of a neural engine pulls only what's needed (~74 MB for one Piper voice + shared data, ~26 MB for the Plus bundle, or ~145 MB for the full Kokoro bundle), and the user can manage each model individually from inside the app.
-> Sibling: **`../extension/`** holds the Chrome extension's neural models in raw HF format. Both subtrees share the same engine families (Piper VITS, KittenTTS-nano, Kokoro StyleTTS2) — only the on-disk packaging differs.
 ---
@@ -11,23 +11,46 @@ Distributed mirror for the Android app's neural TTS assets. The eight zips here
 ```
 mobile-android/
 └── v1/
-    ├── threadcast-piper-shared-v1.zip                    (~11 MB) — espeak phonemizer data, downloaded once
-    ├── threadcast-piper-en_US-amy-medium-v1.zip          (~63 MB) — Amy voice
-    ├── threadcast-piper-en_US-lessac-medium-v1.zip       (~63 MB) — Lessac voice
-    ├── threadcast-piper-en_US-ryan-medium-v1.zip         (~63 MB) — Ryan voice
-    ├── threadcast-piper-en_US-hfc_female-medium-v1.zip   (~63 MB) — HFC Female voice
-    ├── threadcast-piper-en_US-hfc_male-medium-v1.zip     (~63 MB) — HFC Male voice
-    ├── threadcast-kitten-nano-en-v1.zip                  (~26 MB) — KittenTTS nano v0.1 fp16 (all 8 voices; "Local AI Plus")
-    └── threadcast-kokoro-int8-en-v1.zip                  (~145 MB) — Kokoro int8 v0.19 (all 11 voices; "Local AI Studio")
 ```
-**Versioning:** the `v1/` segment is part of the URL the runtime requests. Bumping to `v2/` lets future format changes ship without breaking older app builds — old apps keep pulling `v1/`, new apps pull `v2/`.
 ---
 ## What's inside each zip
-The native [`AssetInstaller`](https://github.com/Pixel-Labs/Reddit-Reader/blob/main/packages/mobile/modules/threadcast-neural/android/src/main/java/app/threadcast/neural/AssetInstaller.kt) extracts each zip directly under `filesDir/sherpa-piper/` (Piper) or `filesDir/sherpa-kokoro/` (Kokoro), so the zip's internal layout = the on-device layout. No re-rooting, no per-file rules.
 ### Piper — shared espeak data
@@ -56,7 +79,24 @@ threadcast-piper-en_US-amy-medium-v1.zip
 Five zips total — one per voice. Users only download the voices they want; selecting Amy doesn't pull Ryan.
-### Kitten (Local AI Plus) — single bundle, all voices
 ```
 threadcast-kitten-nano-en-v1.zip
@@ -67,7 +107,9 @@ threadcast-kitten-nano-en-v1.zip
 └── tokens.txt
 ```
-One download serves all 8 Plus voices — same style-vector-lookup pattern as Kokoro. The 8 speakers are baked into `voices.bin` in the order documented in `packages/mobile/modules/threadcast-neural/index.ts::KITTEN_VOICES`. **Never write the upstream engine codename in user-facing surfaces** — the Android UI labels this engine as "Local AI Plus" everywhere.
 ### Kokoro (Local AI Studio) — single bundle, all voices
@@ -93,7 +135,7 @@ urls[0] = PRIMARY_BASE   = https://huggingface.co/Pixel-Labs/threadcast-neural-m
 urls[1] = FALLBACK_BASE  = (configurable — sibling HF repo, GitHub Release, private CDN, …)
 ```
-Both bases need to serve the **same filenames** (the eight listed above). The fallback host is configured at app build time:
 | Env var (set at build time) | Default | Purpose |
 |---|---|---|
@@ -116,15 +158,17 @@ The native installer:
    pnpm --filter mobile produce:neural-zips
    ```
-   Writes to `packages/mobile/dist/neural-assets/`. The script reads the locally-staged sherpa-onnx upstream artifacts (Piper voice ONNX files + KittenTTS-nano v0.1 fp16 bundle + Kokoro int8 v0.19 bundle + shared espeak-ng-data) and emits the eight zips with the correct internal layouts.
 2. **Upload to Hugging Face** — the user-facing primary mirror lives at:
    <https://huggingface.co/Pixel-Labs/threadcast-neural-models/tree/main/mobile-android/v1>
-   Drop all eight zips into that tree. HF preserves filenames as-is. No build steps, no metadata munging — they're just static assets behind a CDN.
-3. **(Optional) Mirror to a fallback host.** Same eight filenames at any HTTPS endpoint. Common picks:
    - Sibling HF repo (`Pixel-Labs/threadcast-neural-models-mirror/v1/...`)
    - GitHub Release with `gh release upload v1 dist/neural-assets/*.zip`
    - Private CDN (R2, S3, etc.)
@@ -137,13 +181,14 @@ The native installer:
 | User intent | Files pulled | Network |
 |---|---|---|
-| Install **first** Local AI Lite voice (e.g. Amy) | `threadcast-piper-shared-v1.zip` + `threadcast-piper-en_US-amy-medium-v1.zip` | ~74 MB |
-| Install **another** Local AI Lite voice (e.g. Lessac) | `threadcast-piper-en_US-lessac-medium-v1.zip` | ~63 MB |
-| Install Local AI Plus | `threadcast-kitten-nano-en-v1.zip` | ~26 MB |
-| Install Local AI Studio | `threadcast-kokoro-int8-en-v1.zip` | ~145 MB |
-| Install every engine, all 5 Lite voices + Plus + Studio | every zip in `v1/` | ~496 MB |
-The whole-bundle worst case is comparable to Spotify's "download an album for offline" workflow. Most users will pick one tier and stop there — Plus is the sweet spot at 26 MB for an 8-voice multi-speaker model.
 ---

 # ThreadCast — Android Production Zips
+Distributed mirror for the Android app's neural TTS assets. The ten zips here are downloaded by the app at runtime — first install of a neural engine pulls only what's needed (~67 MB for one Piper voice + shared data, ~81 MB for the MeloTTS Plus bundle, or ~103 MB for the full Kokoro bundle), and the user can manage each model individually from inside the app.
+> Sibling: **`../extension/`** holds the Chrome extension's neural models in raw HF format. Both subtrees share the Piper VITS and Kokoro StyleTTS2 families — only the on-disk packaging differs. MeloTTS is Android-only (no transformers.js equivalent).
 ---
 ```
 mobile-android/
 └── v1/
+    ├── threadcast-piper-shared-v1.zip                    (~9 MB)   — espeak phonemizer data, downloaded once
+    ├── threadcast-piper-en_US-amy-medium-v1.zip          (~58 MB)  — Amy voice
+    ├── threadcast-piper-en_US-lessac-medium-v1.zip       (~58 MB)  — Lessac voice
+    ├── threadcast-piper-en_US-ryan-medium-v1.zip         (~58 MB)  — Ryan voice
+    ├── threadcast-piper-en_US-hfc_female-medium-v1.zip   (~58 MB)  — HFC Female voice
+    ├── threadcast-piper-en_US-hfc_male-medium-v1.zip     (~58 MB)  — HFC Male voice
+    ├── threadcast-melo-en-v1.zip                         (~159 MB) — MeloTTS English fp32 + base 129k-entry lexicon (rollback reference, not fetched by any shipping app; see "What gets actively fetched, vs what's just kept" below)
+    ├── threadcast-melo-en-v2.zip                         (~81 MB)  — MeloTTS English fp16 + enriched ~250k-entry lexicon + punctuation silence rules ("Local AI Plus", current as of v1.2.0)
+    ├── threadcast-kitten-nano-en-v1.zip                  (~29 MB)  — KittenTTS nano v0.1 fp16 ("Local AI Plus", legacy — fetched only by cached v1.1.x installs)
+    └── threadcast-kokoro-int8-en-v1.zip                  (~103 MB) — Kokoro int8 v0.19 (all 11 voices; "Local AI Studio")
 ```
+### What gets actively fetched, vs what's just kept
+The app's [asset manifest](https://github.com/Pixel-Labs/Reddit-Reader/blob/main/packages/mobile/lib/neural/asset-manifest.ts) only points at eight of these ten zips at any one time. The other two are kept on HF deliberately:
+| Zip | Pulled by current shipping builds? | Why it stays |
+|---|---|---|
+| `threadcast-piper-shared-v1.zip` + 5 voice zips | ✅ yes — every Lite voice install | active |
+| `threadcast-melo-en-v2.zip` | ✅ yes — every v1.2.0+ Plus install | active |
+| `threadcast-kokoro-int8-en-v1.zip` | ✅ yes — every Studio install | active |
+| `threadcast-melo-en-v1.zip` | ❌ no | **Rollback reference.** fp32 + base lexicon snapshot from the pre-quantization iteration. Kept so we can A/B against v2 if a quality regression surfaces, or re-promote v1 by flipping the manifest if a v2 issue blocks the release. |
+| `threadcast-kitten-nano-en-v1.zip` | ⚠ partial — only fetched by cached v1.1.x installs that already have it on disk | **Legacy compat.** The runtime keeps the `'kitten'` engine path alive so v1.1.x users updating to v1.2.0 don't lose their already-downloaded Plus engine. Same-session fallback also kicks in here if MeloTTS load fails mid-thread. |
+Neither legacy zip costs the user bandwidth: they're only fetched if the app explicitly asks for them. The cost is HF storage, which has no quota for our footprint (~671 MB total across all ten zips, going by the approximate sizes listed above).
+### Two independent version axes
+| Axis | What it tracks | Example | When it bumps |
+|---|---|---|---|
+| **`v1/` path segment** | Distribution-layout version (folder structure, filename pattern, runtime expectations) | `mobile-android/v1/` | Only on a breaking change to how the app fetches or unpacks zips. New app builds would target `v2/` while older ones keep pulling `v1/`. |
+| **`-v<N>` filename suffix** | Per-bundle iteration (model weights, lexicon, quantization) | `threadcast-melo-en-v2.zip` | Whenever a single bundle's contents materially change. New bundle uploaded with a new suffix; the app's [asset manifest](https://github.com/Pixel-Labs/Reddit-Reader/blob/main/packages/mobile/lib/neural/asset-manifest.ts) is bumped to match. Older app builds keep pulling the older suffix until their next release. |
+For MeloTTS: **bundle v1** (fp32 + base 129k-entry lexicon) is the pre-quantization iteration — present here as a rollback reference, not fetched by any shipping app build. **Bundle v2** — what ships in app v1.2.0 — quantizes the model to fp16 (~50% size reduction, <1% MOS drop) and enriches the lexicon with CMUdict latest + g2p_en + Aquila-Resolve neural G2P + curated Reddit/tech/brand/modern-English terms + a punctuation rule so em-dashes render as a short silence instead of being spelled out.
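
As a rough illustration of how the two axes combine, the sketch below builds a download URL from a layout version and a bundle version. The `bundle_url` helper and the `example.invalid` host are hypothetical stand-ins; the real base URL and manifest logic live in the app and are not reproduced here.

```python
def bundle_url(base: str, layout_version: int, bundle: str, bundle_version: int) -> str:
    """Combine the path-segment axis and the filename-suffix axis (hypothetical helper)."""
    # Axis 2: per-bundle iteration, encoded in the filename suffix.
    filename = f"threadcast-{bundle}-v{bundle_version}.zip"
    # Axis 1: distribution-layout version, encoded in the directory segment.
    return f"{base.rstrip('/')}/v{layout_version}/{filename}"


# Placeholder host; the real mirror URL is deliberately not reproduced here.
BASE = "https://example.invalid/mobile-android"

print(bundle_url(BASE, 1, "melo-en", 2))
# https://example.invalid/mobile-android/v1/threadcast-melo-en-v2.zip
```

Bumping only the bundle suffix (v1 to v2 for `melo-en`) changes the filename and leaves the `v1/` directory segment untouched, which is why older app builds are unaffected.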
 ---
 ## What's inside each zip
+The native [`AssetInstaller`](https://github.com/Pixel-Labs/Reddit-Reader/blob/main/packages/mobile/modules/threadcast-neural/android/src/main/java/app/threadcast/neural/AssetInstaller.kt) extracts each zip directly under `filesDir/sherpa-{piper,melo,kitten,kokoro}/`, so the zip's internal layout = the on-device layout. No re-rooting, no per-file rules.
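
A minimal sketch of the "zip layout equals on-device layout" rule, written in Python rather than the installer's Kotlin. The `install_zip` and `demo` functions are hypothetical and only assume the `sherpa-<engine>/` directory naming described above; this is not the `AssetInstaller` implementation.

```python
import io
import tempfile
import zipfile
from pathlib import Path


def install_zip(zip_bytes: bytes, files_dir: Path, engine: str) -> list[str]:
    """Extract a bundle verbatim under files_dir/sherpa-<engine>/ (hypothetical helper)."""
    target = (files_dir / f"sherpa-{engine}").resolve()
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        names = zf.namelist()
        for name in names:
            # Refuse entries that would escape the engine directory.
            if not (target / name).resolve().is_relative_to(target):
                raise ValueError(f"unsafe zip entry: {name}")
        # No re-rooting and no per-file rules: paths are used exactly as stored.
        zf.extractall(target)
    return sorted(names)


def demo() -> list[str]:
    """Build a tiny stand-in bundle in memory and install it into a temp dir."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("tokens.txt", "a 0\n")
        zf.writestr("lexicon.txt", "hello HH AH0 L OW1\n")
    with tempfile.TemporaryDirectory() as tmp:
        names = install_zip(buf.getvalue(), Path(tmp), "melo")
        assert (Path(tmp) / "sherpa-melo" / "tokens.txt").exists()
    return names
```

Because nothing is renamed on extraction, whatever tree the producer script writes into the zip is the tree the runtime sees on disk.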
 ### Piper — shared espeak data
 Five zips total — one per voice. Users only download the voices they want; selecting Amy doesn't pull Ryan.
+### MeloTTS (Local AI Plus, current) — single bundle, all accents
+```
+threadcast-melo-en-v2.zip
+├── model.fp16.onnx                  (~87 MB — VITS2 acoustic model with BERT prosody assist, ARM-NEON-accelerated fp16)
+├── lexicon.txt                      (~6 MB  — enriched CMUdict phoneme dictionary, ~250k+ entries + punctuation silence rules)
+└── tokens.txt                       (~1 KB  — phoneme → id map)
+```
+One download serves all five English accents — **Default, US, UK (British), Indian, Australian** — via speaker-id (`sid` 0–4) lookup at synth time. Voice switching is free; no per-accent download.
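
The accent selection can be pictured as a plain lookup from accent name to speaker id. The sketch assumes the ids follow the order in which the accents are listed above (Default = 0 through Australian = 4); that ordering, the key names, and the `sid_for` helper are all assumptions to verify against the bundle before relying on them.

```python
# Assumed order: mirrors the accent list in this README, not verified against the model.
ACCENT_SIDS = {"default": 0, "us": 1, "uk": 2, "indian": 3, "australian": 4}


def sid_for(accent: str) -> int:
    """Map an accent name to a speaker id, falling back to the default accent."""
    return ACCENT_SIDS.get(accent.strip().lower(), ACCENT_SIDS["default"])


print(sid_for("UK"))       # 2
print(sid_for("klingon"))  # 0 (unknown names fall back to the default accent)
```

Since every accent lives in the same model file, switching voices is only a change of integer at synth time.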
+**No `espeak-ng-data/`**: MeloTTS handles phonemization entirely through its CMUdict-based lexicon. Coverage therefore has to come from the lexicon itself (which is why it aims to be exhaustive): sherpa-onnx's MeloTTS path does not carry over the upstream `g2p_en` neural OOV fallback, so out-of-vocabulary tokens are spelled letter-by-letter at runtime. The v2 lexicon enrichment exists to make that fallback rare.
+**Why fp16, not int8**: MeloTTS is Conv1d-heavy and suffers a net perf regression on ARM under dynamic int8 quantization (sherpa-onnx #575, Coqui #2991) plus audible distortion (OpenVINO MeloTTS findings). fp16 keeps ARM NEON SIMD acceleration intact while cutting size roughly in half. The Kitten Nano `model.fp16.onnx` precedent on the same sherpa-onnx Android runtime validates the choice.
+**Never write the upstream engine codename in user-facing surfaces** — the Android UI labels this engine as "Local AI Plus" everywhere.
+### KittenTTS (Local AI Plus, legacy fallback) — single bundle, all voices
 ```
 threadcast-kitten-nano-en-v1.zip
 └── tokens.txt
 ```
+Was the Local AI Plus engine in ThreadCast v1.1.x. **Replaced by MeloTTS in v1.2.0** — tester feedback consistently rated Kitten's quality at the same tier as Piper Lite despite the marketing uplift, so Plus was rebuilt on the larger MeloTTS model.
+Kept in this mirror so existing v1.1.x installs that already cached the Kitten bundle don't re-download Plus on update; the engine code keeps the `'kitten'` runtime path alive for same-session fallback if MeloTTS load fails mid-thread.
 ### Kokoro (Local AI Studio) — single bundle, all voices
 urls[1] = FALLBACK_BASE  = (configurable — sibling HF repo, GitHub Release, private CDN, …)
 ```
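
The ordered `urls[0]` / `urls[1]` lookup above can be sketched as a loop that tries each base in turn. This is an illustrative Python outline, not the native installer's code: `fetch_with_fallback`, `fake_fetch`, and the `.invalid` hosts are hypothetical, and the `fetch` callable stands in for whatever HTTP client is actually used.

```python
from typing import Callable, Sequence


def fetch_with_fallback(
    filename: str,
    bases: Sequence[str],
    fetch: Callable[[str], bytes],
) -> bytes:
    """Try each base URL in order and return the first successful payload."""
    errors = []
    for base in bases:
        url = f"{base.rstrip('/')}/{filename}"
        try:
            return fetch(url)
        except OSError as exc:  # ConnectionError and urllib's URLError are OSErrors
            errors.append(f"{url}: {exc}")
    raise OSError("all mirrors failed: " + "; ".join(errors))


def fake_fetch(url: str) -> bytes:
    """Stand-in transport: the primary host is 'down', the fallback answers."""
    if url.startswith("https://primary.invalid"):
        raise ConnectionError("primary down")
    return b"zip-bytes"


payload = fetch_with_fallback(
    "threadcast-melo-en-v2.zip",
    ["https://primary.invalid/v1", "https://fallback.invalid/v1"],
    fake_fetch,
)
print(payload)  # b'zip-bytes'
```

The loop only works if both hosts expose identical filenames, which is the constraint the next paragraph spells out.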
+Both bases need to serve the **same filenames** (the eight actively-fetched zips listed above; legacy and rollback zips are HF-only and don't need fallback coverage). The fallback host is configured at app build time:
 | Env var (set at build time) | Default | Purpose |
 |---|---|---|
    pnpm --filter mobile produce:neural-zips
    ```
+   Writes to `packages/mobile/dist/neural-assets/`. The script reads the locally-staged sherpa-onnx upstream artifacts (Piper voice ONNX files + MeloTTS English fp16 + enriched lexicon + KittenTTS-nano v0.1 fp16 bundle + Kokoro int8 v0.19 bundle + shared espeak-ng-data) and emits the actively-shipped zips with the correct internal layouts. The rollback reference `threadcast-melo-en-v1.zip` and any other rollback artifacts are not regenerated; they live only on HF as historical snapshots.
+   ⚠ **If you've edited `lexicon.txt` in the Melo staging folder**, re-run this script before uploading — the producer doesn't watch source files, so the existing `threadcast-melo-en-v2.zip` on disk may be stale.
 2. **Upload to Hugging Face** — the user-facing primary mirror lives at:
    <https://huggingface.co/Pixel-Labs/threadcast-neural-models/tree/main/mobile-android/v1>
+   Drop the freshly-built zips into that tree (HF overwrites by filename). HF preserves filenames as-is. No build steps, no metadata munging — they're just static assets behind a CDN. Don't delete the legacy `threadcast-melo-en-v1.zip` or `threadcast-kitten-nano-en-v1.zip` — see the retention table above.
+3. **(Optional) Mirror to a fallback host.** Same actively-fetched filenames at any HTTPS endpoint. Common picks:
    - Sibling HF repo (`Pixel-Labs/threadcast-neural-models-mirror/v1/...`)
    - GitHub Release with `gh release upload v1 dist/neural-assets/*.zip`
    - Private CDN (R2, S3, etc.)
 | User intent | Files pulled | Network |
 |---|---|---|
+| Install **first** Local AI Lite voice (e.g. Amy) | `threadcast-piper-shared-v1.zip` + `threadcast-piper-en_US-amy-medium-v1.zip` | ~67 MB |
+| Install **another** Local AI Lite voice (e.g. Lessac) | `threadcast-piper-en_US-lessac-medium-v1.zip` | ~58 MB |
+| Install Local AI Plus (v1.2.0+) | `threadcast-melo-en-v2.zip` | ~81 MB |
+| Install Local AI Plus (legacy v1.1.x cached) | `threadcast-kitten-nano-en-v1.zip` | ~29 MB |
+| Install Local AI Studio | `threadcast-kokoro-int8-en-v1.zip` | ~103 MB |
+| Install every active engine, all 5 Lite voices + Melo + Studio | the eight active zips (Piper shared + 5 voices, Melo v2, Kokoro) | ~483 MB |
+The whole-bundle worst case is comparable to Spotify's "download an album for offline" workflow. Most users will pick one tier and stop there — MeloTTS Plus is the sweet spot at ~81 MB for a five-accent multi-speaker English model.
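
The per-intent rows in the table follow from one rule: pull whichever zips an engine needs that are not already on disk. The sketch below works that arithmetic through using the approximate sizes from the `v1/` listing. The intent labels (`lite:amy`, `plus`, `studio`) and the function names are hypothetical and do not come from the app's manifest.

```python
# Approximate sizes in MB, copied from the v1/ listing (active zips only).
PIPER_SHARED = "threadcast-piper-shared-v1.zip"
PIPER_VOICES = ("amy", "lessac", "ryan", "hfc_female", "hfc_male")
SIZES_MB = {
    PIPER_SHARED: 9,
    "threadcast-melo-en-v2.zip": 81,
    "threadcast-kokoro-int8-en-v1.zip": 103,
}
SIZES_MB.update({f"threadcast-piper-en_US-{v}-medium-v1.zip": 58 for v in PIPER_VOICES})


def files_for(intent: str) -> list[str]:
    """Zips an install intent depends on; intent strings are hypothetical labels."""
    if intent.startswith("lite:"):
        voice = intent.split(":", 1)[1]
        return [PIPER_SHARED, f"threadcast-piper-en_US-{voice}-medium-v1.zip"]
    if intent == "plus":
        return ["threadcast-melo-en-v2.zip"]
    if intent == "studio":
        return ["threadcast-kokoro-int8-en-v1.zip"]
    raise ValueError(f"unknown intent: {intent}")


def plan_download(intent: str, installed: set[str]) -> tuple[list[str], int]:
    """Return the zips still missing for an intent and their combined size in MB."""
    missing = [name for name in files_for(intent) if name not in installed]
    return missing, sum(SIZES_MB[name] for name in missing)


first, first_mb = plan_download("lite:amy", set())
second, second_mb = plan_download("lite:lessac", set(first))
print(first_mb, second_mb)  # 67 58
```

The second Lite voice is cheaper than the first only because the shared espeak data is already installed; the listed sizes are approximate, so the totals are too.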
 ---