Pixel-Labs committed on
Commit c08fb40 · verified · 1 Parent(s): 72c9aa9

docs(mobile-android): Melo v2 sections + retention table for legacy zips

Files changed (1)
  1. mobile-android/README.md +70 -25
mobile-android/README.md CHANGED
@@ -1,8 +1,8 @@
  # ThreadCast: Android Production Zips

- Distributed mirror for the Android app's neural TTS assets. The eight zips here are downloaded by the app at runtime; the first install of a neural engine pulls only what's needed (~74 MB for one Piper voice + shared data, ~26 MB for the Plus bundle, or ~145 MB for the full Kokoro bundle), and the user can manage each model individually from inside the app.

- > Sibling: **`../extension/`** holds the Chrome extension's neural models in raw HF format. Both subtrees share the same engine families (Piper VITS, KittenTTS-nano, Kokoro StyleTTS2); only the on-disk packaging differs.

  ---
 
@@ -11,23 +11,46 @@ Distributed mirror for the Android app's neural TTS assets. The eight zips here
  ```
  mobile-android/
  └── v1/
-     ├── threadcast-piper-shared-v1.zip (~11 MB) - espeak phonemizer data, downloaded once
-     ├── threadcast-piper-en_US-amy-medium-v1.zip (~63 MB) - Amy voice
-     ├── threadcast-piper-en_US-lessac-medium-v1.zip (~63 MB) - Lessac voice
-     ├── threadcast-piper-en_US-ryan-medium-v1.zip (~63 MB) - Ryan voice
-     ├── threadcast-piper-en_US-hfc_female-medium-v1.zip (~63 MB) - HFC Female voice
-     ├── threadcast-piper-en_US-hfc_male-medium-v1.zip (~63 MB) - HFC Male voice
-     ├── threadcast-kitten-nano-en-v1.zip (~26 MB) - KittenTTS nano v0.1 fp16 (all 8 voices; "Local AI Plus")
-     └── threadcast-kokoro-int8-en-v1.zip (~145 MB) - Kokoro int8 v0.19 (all 11 voices; "Local AI Studio")
  ```

- **Versioning:** the `v1/` segment is part of the URL the runtime requests. Bumping to `v2/` lets future format changes ship without breaking older app builds: old apps keep pulling `v1/`, new apps pull `v2/`.

  ---

  ## What's inside each zip

- The native [`AssetInstaller`](https://github.com/Pixel-Labs/Reddit-Reader/blob/main/packages/mobile/modules/threadcast-neural/android/src/main/java/app/threadcast/neural/AssetInstaller.kt) extracts each zip directly under `filesDir/sherpa-piper/` (Piper) or `filesDir/sherpa-kokoro/` (Kokoro), so the zip's internal layout = the on-device layout. No re-rooting, no per-file rules.

  ### Piper: shared espeak data
 
@@ -56,7 +79,24 @@ threadcast-piper-en_US-amy-medium-v1.zip
  Five zips total, one per voice. Users only download the voices they want; selecting Amy doesn't pull Ryan.

- ### Kitten (Local AI Plus): single bundle, all voices

  ```
  threadcast-kitten-nano-en-v1.zip
@@ -67,7 +107,9 @@ threadcast-kitten-nano-en-v1.zip
  └── tokens.txt
  ```

- One download serves all 8 Plus voices, using the same style-vector-lookup pattern as Kokoro. The 8 speakers are baked into `voices.bin` in the order documented in `packages/mobile/modules/threadcast-neural/index.ts::KITTEN_VOICES`. **Never write the upstream engine codename in user-facing surfaces**: the Android UI labels this engine as "Local AI Plus" everywhere.

  ### Kokoro (Local AI Studio): single bundle, all voices
 
@@ -93,7 +135,7 @@ urls[0] = PRIMARY_BASE = https://huggingface.co/Pixel-Labs/threadcast-neural-m
  urls[1] = FALLBACK_BASE = (configurable: sibling HF repo, GitHub Release, private CDN, …)
  ```

- Both bases need to serve the **same filenames** (the eight listed above). The fallback host is configured at app build time:

  | Env var (set at build time) | Default | Purpose |
  |---|---|---|
@@ -116,15 +158,17 @@ The native installer:
  pnpm --filter mobile produce:neural-zips
  ```

- Writes to `packages/mobile/dist/neural-assets/`. The script reads the locally-staged sherpa-onnx upstream artifacts (Piper voice ONNX files + KittenTTS-nano v0.1 fp16 bundle + Kokoro int8 v0.19 bundle + shared espeak-ng-data) and emits the eight zips with the correct internal layouts.

  2. **Upload to Hugging Face.** The user-facing primary mirror lives at:

  <https://huggingface.co/Pixel-Labs/threadcast-neural-models/tree/main/mobile-android/v1>

- Drop all eight zips into that tree. HF preserves filenames as-is. No build steps, no metadata munging: they're just static assets behind a CDN.

- 3. **(Optional) Mirror to a fallback host.** Same eight filenames at any HTTPS endpoint. Common picks:
  - Sibling HF repo (`Pixel-Labs/threadcast-neural-models-mirror/v1/...`)
  - GitHub Release with `gh release upload v1 dist/neural-assets/*.zip`
  - Private CDN (R2, S3, etc.)
@@ -137,13 +181,14 @@ The native installer:
  | User intent | Files pulled | Network |
  |---|---|---|
- | Install **first** Local AI Lite voice (e.g. Amy) | `threadcast-piper-shared-v1.zip` + `threadcast-piper-en_US-amy-medium-v1.zip` | ~74 MB |
- | Install **another** Local AI Lite voice (e.g. Lessac) | `threadcast-piper-en_US-lessac-medium-v1.zip` | ~63 MB |
- | Install Local AI Plus | `threadcast-kitten-nano-en-v1.zip` | ~26 MB |
- | Install Local AI Studio | `threadcast-kokoro-int8-en-v1.zip` | ~145 MB |
- | Install every engine, all 5 Lite voices + Plus + Studio | every zip in `v1/` | ~496 MB |
-
- The whole-bundle worst case is comparable to Spotify's "download an album for offline" workflow. Most users will pick one tier and stop there; Plus is the sweet spot at 26 MB for an 8-voice multi-speaker model.

  ---
 
 
  # ThreadCast: Android Production Zips

+ Distributed mirror for the Android app's neural TTS assets. The ten zips here are downloaded by the app at runtime; the first install of a neural engine pulls only what's needed (~67 MB for one Piper voice + shared data, ~81 MB for the MeloTTS Plus bundle, or ~103 MB for the full Kokoro bundle), and the user can manage each model individually from inside the app.

+ > Sibling: **`../extension/`** holds the Chrome extension's neural models in raw HF format. Both subtrees share the Piper VITS and Kokoro StyleTTS2 families; only the on-disk packaging differs. MeloTTS is Android-only (no transformers.js equivalent).

  ---
 
 
  ```
  mobile-android/
  └── v1/
+     ├── threadcast-piper-shared-v1.zip (~9 MB) - espeak phonemizer data, downloaded once
+     ├── threadcast-piper-en_US-amy-medium-v1.zip (~58 MB) - Amy voice
+     ├── threadcast-piper-en_US-lessac-medium-v1.zip (~58 MB) - Lessac voice
+     ├── threadcast-piper-en_US-ryan-medium-v1.zip (~58 MB) - Ryan voice
+     ├── threadcast-piper-en_US-hfc_female-medium-v1.zip (~58 MB) - HFC Female voice
+     ├── threadcast-piper-en_US-hfc_male-medium-v1.zip (~58 MB) - HFC Male voice
+     ├── threadcast-melo-en-v1.zip (~159 MB) - MeloTTS English fp32 + base 129k-entry lexicon (rollback reference, not fetched by any shipping app; see "What gets actively fetched, vs what's just kept" below)
+     ├── threadcast-melo-en-v2.zip (~81 MB) - MeloTTS English fp16 + enriched ~250k-entry lexicon + punctuation silence rules ("Local AI Plus", current as of v1.2.0)
+     ├── threadcast-kitten-nano-en-v1.zip (~29 MB) - KittenTTS nano v0.1 fp16 ("Local AI Plus", legacy; fetched only by cached v1.1.x installs)
+     └── threadcast-kokoro-int8-en-v1.zip (~103 MB) - Kokoro int8 v0.19 (all 11 voices; "Local AI Studio")
  ```

+ ### What gets actively fetched, vs what's just kept
+
+ The app's [asset manifest](https://github.com/Pixel-Labs/Reddit-Reader/blob/main/packages/mobile/lib/neural/asset-manifest.ts) only points at eight of these ten zips at any one time. The other two are kept on HF deliberately:
+
+ | Zip | Pulled by current shipping builds? | Why it stays |
+ |---|---|---|
+ | `threadcast-piper-shared-v1.zip` + 5 voice zips | ✅ yes, on every Lite voice install | active |
+ | `threadcast-melo-en-v2.zip` | ✅ yes, on every v1.2.0+ Plus install | active |
+ | `threadcast-kokoro-int8-en-v1.zip` | ✅ yes, on every Studio install | active |
+ | `threadcast-melo-en-v1.zip` | ❌ no | **Rollback reference.** fp32 + base lexicon snapshot from the pre-quantization iteration. Kept so we can A/B against v2 if a quality regression surfaces, or re-promote v1 by flipping the manifest if a v2 issue blocks the release. |
+ | `threadcast-kitten-nano-en-v1.zip` | ⚠ partial: only fetched by cached v1.1.x installs that already have it on disk | **Legacy compat.** The runtime keeps the `'kitten'` engine path alive so v1.1.x users updating to v1.2.0 don't lose their already-downloaded Plus engine. Same-session fallback also kicks in here if MeloTTS load fails mid-thread. |
+
+ Neither legacy zip costs the user bandwidth; they're only fetched if the app explicitly asks for them. The cost is HF storage, which has no quota for our footprint (~670 MB total across all ten zips, going by the sizes listed above).
+
+ ### Two independent version axes
+
+ | Axis | What it tracks | Example | When it bumps |
+ |---|---|---|---|
+ | **`v1/` path segment** | Distribution-layout version (folder structure, filename pattern, runtime expectations) | `mobile-android/v1/` | Only on a breaking change to how the app fetches or unpacks zips. New app builds would target `v2/` while older ones keep pulling `v1/`. |
+ | **`-v<N>` filename suffix** | Per-bundle iteration (model weights, lexicon, quantization) | `threadcast-melo-en-v2.zip` | Whenever a single bundle's contents materially change. New bundle uploaded with a new suffix; the app's [asset manifest](https://github.com/Pixel-Labs/Reddit-Reader/blob/main/packages/mobile/lib/neural/asset-manifest.ts) is bumped to match. Older app builds keep pulling the older suffix until their next release. |
+
+ For MeloTTS: **bundle v1** (fp32 + base 129k-entry lexicon) is the pre-quantization iteration, present here as a rollback reference and not fetched by any shipping app build. **Bundle v2**, which ships in app v1.2.0, quantizes the model to fp16 (~50% size reduction, <1% MOS drop) and enriches the lexicon with the latest CMUdict + g2p_en + Aquila-Resolve neural G2P + curated Reddit/tech/brand/modern-English terms + a punctuation rule so em-dashes render as a short silence instead of being spelled out.
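
The two axes compose into one relative path: the layout version picks the directory and the bundle iteration picks the filename suffix. A minimal TypeScript sketch of that composition follows. The `BundleRef` shape and the `bundlePath` helper are hypothetical names made up for illustration; they are not taken from the app's asset manifest.

```typescript
// Hypothetical types and helper for illustration only; not the app's real manifest code.
interface BundleRef {
  layoutVersion: number; // the `v1/` path segment (distribution layout)
  name: string;          // e.g. "melo-en" or "piper-shared"
  bundleVersion: number; // the `-v<N>` filename suffix (per-bundle iteration)
}

// Builds the path relative to the mirror root.
function bundlePath(ref: BundleRef): string {
  const filename = `threadcast-${ref.name}-v${ref.bundleVersion}.zip`;
  return `mobile-android/v${ref.layoutVersion}/${filename}`;
}

// Bumping only the bundle iteration leaves the layout directory unchanged.
const meloV1 = bundlePath({ layoutVersion: 1, name: "melo-en", bundleVersion: 1 });
const meloV2 = bundlePath({ layoutVersion: 1, name: "melo-en", bundleVersion: 2 });
console.log(meloV1); // mobile-android/v1/threadcast-melo-en-v1.zip
console.log(meloV2); // mobile-android/v1/threadcast-melo-en-v2.zip
```

Both MeloTTS bundles sit side by side under `v1/`, which is why a rollback only needs a manifest change rather than a new directory.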
 
  ---

  ## What's inside each zip

+ The native [`AssetInstaller`](https://github.com/Pixel-Labs/Reddit-Reader/blob/main/packages/mobile/modules/threadcast-neural/android/src/main/java/app/threadcast/neural/AssetInstaller.kt) extracts each zip directly under `filesDir/sherpa-{piper,melo,kitten,kokoro}/`, so the zip's internal layout = the on-device layout. No re-rooting, no per-file rules.

  ### Piper: shared espeak data
 
 
  Five zips total, one per voice. Users only download the voices they want; selecting Amy doesn't pull Ryan.

+ ### MeloTTS (Local AI Plus, current): single bundle, all accents
+
+ ```
+ threadcast-melo-en-v2.zip
+ ├── model.fp16.onnx (~87 MB, VITS2 acoustic model with BERT prosody assist, ARM-NEON-accelerated fp16)
+ ├── lexicon.txt (~6 MB, enriched CMUdict phoneme dictionary, ~250k+ entries + punctuation silence rules)
+ └── tokens.txt (~1 KB, phoneme → id map)
+ ```
+
+ One download serves all five English accents: **Default, US, UK (British), Indian, Australian**. Each is selected by speaker-id (`sid` 0–4) lookup at synth time. Voice switching is free; no per-accent download.
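
The speaker-id lookup can be pictured as a small table. The sketch below is an assumption-heavy illustration: it assumes the `sid` values follow the order the accents are listed in above (Default = 0 through Australian = 4), which this README does not state explicitly, and `MELO_ACCENTS` and `sidForAccent` are hypothetical names.

```typescript
// Assumption: sid values follow the order the accents are listed in this README.
// Verify against the engine's own voice table before relying on these numbers.
const MELO_ACCENTS = ["Default", "US", "UK", "Indian", "Australian"] as const;
type MeloAccent = (typeof MELO_ACCENTS)[number];

// Hypothetical helper: maps an accent label to the speaker id passed at synth time.
function sidForAccent(accent: MeloAccent): number {
  return MELO_ACCENTS.indexOf(accent);
}

console.log(sidForAccent("UK"));
```

Because every accent lives in the same model file, switching accents changes only this integer, not the files on disk.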
+
+ **No `espeak-ng-data/`**: MeloTTS embeds phonemization end-to-end through its CMUdict lexicon. Out-of-vocabulary words are handled by the lexicon itself (which is why it's exhaustive); sherpa-onnx's MeloTTS path does not carry over the upstream `g2p_en` neural OOV fallback, so unrecognized tokens are spelled letter-by-letter at runtime. The v2 lexicon enrichment exists to make that fallback rare.
+
+ **Why fp16, not int8**: MeloTTS is Conv1d-heavy and suffers a net perf regression on ARM under dynamic int8 quantization (sherpa-onnx #575, Coqui #2991) plus audible distortion (OpenVINO MeloTTS findings). fp16 keeps ARM NEON SIMD acceleration intact while cutting size roughly in half. The Kitten Nano `model.fp16.onnx` precedent on the same sherpa-onnx Android runtime validates the choice.
+
+ **Never write the upstream engine codename in user-facing surfaces**: the Android UI labels this engine as "Local AI Plus" everywhere.
+
+ ### KittenTTS (Local AI Plus, legacy fallback): single bundle, all voices

  ```
  threadcast-kitten-nano-en-v1.zip
 
  └── tokens.txt
  ```

+ KittenTTS was the Local AI Plus engine in ThreadCast v1.1.x. **Replaced by MeloTTS in v1.2.0**: tester feedback consistently rated Kitten's quality at the same tier as Piper Lite despite the marketing uplift, so Plus was rebuilt on the larger MeloTTS model.
+
+ Kept in this mirror so existing v1.1.x installs that already cached the Kitten bundle don't re-download Plus on update; the engine code keeps the `'kitten'` runtime path alive for same-session fallback if MeloTTS load fails mid-thread.

  ### Kokoro (Local AI Studio): single bundle, all voices
 
 
  urls[1] = FALLBACK_BASE = (configurable: sibling HF repo, GitHub Release, private CDN, …)
  ```

+ Both bases need to serve the **same filenames** (the eight actively-fetched zips listed above; legacy and rollback zips are HF-only and don't need fallback coverage). The fallback host is configured at app build time:

  | Env var (set at build time) | Default | Purpose |
  |---|---|---|
 
  pnpm --filter mobile produce:neural-zips
  ```

+ Writes to `packages/mobile/dist/neural-assets/`. The script reads the locally-staged sherpa-onnx upstream artifacts (Piper voice ONNX files + MeloTTS English fp16 + enriched lexicon + KittenTTS-nano v0.1 fp16 bundle + Kokoro int8 v0.19 bundle + shared espeak-ng-data) and emits the actively-shipped zips with the correct internal layouts. The legacy `threadcast-melo-en-v1.zip` and rollback artifacts are not regenerated; they live only on HF as historical snapshots.
+
+ ⚠ **If you've edited `lexicon.txt` in the Melo staging folder**, re-run this script before uploading: the producer doesn't watch source files, so the existing `threadcast-melo-en-v2.zip` on disk may be stale.

  2. **Upload to Hugging Face.** The user-facing primary mirror lives at:

  <https://huggingface.co/Pixel-Labs/threadcast-neural-models/tree/main/mobile-android/v1>

+ Drop the freshly-built zips into that tree (HF overwrites by filename). HF preserves filenames as-is. No build steps, no metadata munging: they're just static assets behind a CDN. Don't delete the legacy `threadcast-melo-en-v1.zip` or `threadcast-kitten-nano-en-v1.zip`; see the retention table above.

+ 3. **(Optional) Mirror to a fallback host.** Same actively-fetched filenames at any HTTPS endpoint. Common picks:
  - Sibling HF repo (`Pixel-Labs/threadcast-neural-models-mirror/v1/...`)
  - GitHub Release with `gh release upload v1 dist/neural-assets/*.zip`
  - Private CDN (R2, S3, etc.)
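
Since a fallback host only has to serve the same filenames, the download candidates for a zip can be derived by joining each base with the filename, primary first. The TypeScript sketch below illustrates that idea under stated assumptions: `candidateUrls` is a hypothetical helper, and both base URLs in the usage example are placeholders rather than the app's real configuration.

```typescript
// Hypothetical helper; the real installer's URL logic may differ.
function candidateUrls(
  filename: string,
  primaryBase: string,
  fallbackBase?: string,
): string[] {
  // Strip trailing slashes so "base/" and "base" produce the same URL.
  const join = (base: string): string => `${base.replace(/\/+$/, "")}/${filename}`;
  const urls = [join(primaryBase)];
  // An unset or empty fallback simply yields a single-entry list.
  if (fallbackBase !== undefined && fallbackBase.trim() !== "") {
    urls.push(join(fallbackBase));
  }
  return urls;
}

// Placeholder hosts for illustration only.
const exampleUrls = candidateUrls(
  "threadcast-melo-en-v2.zip",
  "https://primary.example/mobile-android/v1/",
  "https://mirror.example/v1",
);
console.log(exampleUrls);
```

Keeping the filename identical across hosts means the mirror needs no manifest of its own; only the base differs.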
 
  | User intent | Files pulled | Network |
  |---|---|---|
+ | Install **first** Local AI Lite voice (e.g. Amy) | `threadcast-piper-shared-v1.zip` + `threadcast-piper-en_US-amy-medium-v1.zip` | ~67 MB |
+ | Install **another** Local AI Lite voice (e.g. Lessac) | `threadcast-piper-en_US-lessac-medium-v1.zip` | ~58 MB |
+ | Install Local AI Plus (v1.2.0+) | `threadcast-melo-en-v2.zip` | ~81 MB |
+ | Install Local AI Plus (legacy v1.1.x cached) | `threadcast-kitten-nano-en-v1.zip` | ~29 MB |
+ | Install Local AI Studio | `threadcast-kokoro-int8-en-v1.zip` | ~103 MB |
+ | Install every active engine, all 5 Lite voices + Melo + Studio | the eight active zips above | ~483 MB |
+
+ The whole-bundle worst case is comparable to Spotify's "download an album for offline" workflow. Most users will pick one tier and stop there; MeloTTS Plus is the sweet spot at ~81 MB for a five-accent multi-speaker English model.
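
The network figures can be cross-checked by summing the rounded per-zip sizes from the tree listing. The TypeScript sketch below does that arithmetic; `ZIP_SIZES_MB`, `ACTIVE_ZIPS`, and `totalMb` are hypothetical names, and because the listed sizes are approximate, the sums are estimates too.

```typescript
// Rounded sizes in MB copied from the tree listing; treat every total as an estimate.
const ZIP_SIZES_MB: Record<string, number> = {
  "threadcast-piper-shared-v1.zip": 9,
  "threadcast-piper-en_US-amy-medium-v1.zip": 58,
  "threadcast-piper-en_US-lessac-medium-v1.zip": 58,
  "threadcast-piper-en_US-ryan-medium-v1.zip": 58,
  "threadcast-piper-en_US-hfc_female-medium-v1.zip": 58,
  "threadcast-piper-en_US-hfc_male-medium-v1.zip": 58,
  "threadcast-melo-en-v1.zip": 159,
  "threadcast-melo-en-v2.zip": 81,
  "threadcast-kitten-nano-en-v1.zip": 29,
  "threadcast-kokoro-int8-en-v1.zip": 103,
};

// Everything except the rollback (melo v1) and legacy (kitten) bundles.
const ACTIVE_ZIPS = Object.keys(ZIP_SIZES_MB).filter(
  (name) => name !== "threadcast-melo-en-v1.zip" && name !== "threadcast-kitten-nano-en-v1.zip",
);

function totalMb(files: string[]): number {
  return files.reduce((sum, file) => {
    const size = ZIP_SIZES_MB[file];
    if (size === undefined) throw new Error(`unknown zip: ${file}`);
    return sum + size;
  }, 0);
}

const firstLiteVoice = totalMb([
  "threadcast-piper-shared-v1.zip",
  "threadcast-piper-en_US-amy-medium-v1.zip",
]);
const allActive = totalMb(ACTIVE_ZIPS);
const allTen = totalMb(Object.keys(ZIP_SIZES_MB));

console.log(firstLiteVoice); // 67
console.log(allActive);      // 483
console.log(allTen);         // 671
```

Only the first two totals are ever paid by a user; the ten-zip figure is purely a storage footprint on the mirror.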
 
  ---