	# ThreadCast — Android Production Zips

	Distributed mirror for the Android app's neural TTS assets. The eight zips here are downloaded by the app at runtime — first install of a neural engine pulls only what's needed (~74 MB for one Piper voice + shared data, ~26 MB for the Plus bundle, or ~145 MB for the full Kokoro bundle), and the user can manage each model individually from inside the app.

	> Sibling: `../extension/` holds the Chrome extension's neural models in raw HF format. Both subtrees share the same engine families (Piper VITS, KittenTTS-nano, Kokoro StyleTTS2) — only the on-disk packaging differs.

	---

	## Layout

	```
	mobile-android/
	└── v1/
	    ├── threadcast-piper-shared-v1.zip (~11 MB) — espeak phonemizer data, downloaded once
	    ├── threadcast-piper-en_US-amy-medium-v1.zip (~63 MB) — Amy voice
	    ├── threadcast-piper-en_US-lessac-medium-v1.zip (~63 MB) — Lessac voice
	    ├── threadcast-piper-en_US-ryan-medium-v1.zip (~63 MB) — Ryan voice
	    ├── threadcast-piper-en_US-hfc_female-medium-v1.zip (~63 MB) — HFC Female voice
	    ├── threadcast-piper-en_US-hfc_male-medium-v1.zip (~63 MB) — HFC Male voice
	    ├── threadcast-kitten-nano-en-v1.zip (~26 MB) — KittenTTS nano v0.1 fp16 (all 8 voices; "Local AI Plus")
	    └── threadcast-kokoro-int8-en-v1.zip (~145 MB) — Kokoro int8 v0.19 (all 11 voices; "Local AI Studio")
	```

	Versioning: the `v1/` segment is part of the URL the runtime requests. Bumping to `v2/` lets future format changes ship without breaking older app builds — old apps keep pulling `v1/`, new apps pull `v2/`.

	---

	## What's inside each zip

	The native [`AssetInstaller`](https://github.com/Pixel-Labs/Reddit-Reader/blob/main/packages/mobile/modules/threadcast-neural/android/src/main/java/app/threadcast/neural/AssetInstaller.kt) extracts each zip directly under its engine's root: `filesDir/sherpa-piper/` for Piper, `filesDir/sherpa-kokoro/` for Kokoro, and a separate root of its own for the Kitten bundle. The zip's internal layout is therefore the on-device layout, with no re-rooting and no per-file rules.

	### Piper — shared espeak data

	```
	threadcast-piper-shared-v1.zip
	└── espeak-ng-data/
	    ├── phontab
	    ├── phonindex
	    ├── phondata
	    ├── intonations
	    ├── lang/
	    ├── voices/
	    └── … (full espeak-ng tree)
	```

	Downloaded once on the user's first Piper voice install. Skipped on every subsequent voice install.

	### Piper — one zip per voice

	```
	threadcast-piper-en_US-amy-medium-v1.zip
	└── en_US-amy-medium/
	    ├── en_US-amy-medium.onnx (~63 MB)
	    └── tokens.txt
	```

	Five zips total — one per voice. Users only download the voices they want; selecting Amy doesn't pull Ryan.
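The "shared data once, then one zip per voice" rule can be sketched as a small selection function. This is an illustrative sketch only: `zipsForPiperVoice` and the `sharedInstalled` flag are hypothetical names, not the app's actual API, and the caller is assumed to track whether the shared espeak data is already on disk.

```typescript
// Hypothetical helper; names are illustrative, not the app's real API.
const PIPER_SHARED_ZIP = "threadcast-piper-shared-v1.zip";

// Voice ids as they appear in the zip filenames listed under Layout.
type PiperVoice =
  | "en_US-amy-medium"
  | "en_US-lessac-medium"
  | "en_US-ryan-medium"
  | "en_US-hfc_female-medium"
  | "en_US-hfc_male-medium";

// Returns the zips to download for one voice install, in download order.
// The shared espeak zip is included only when it is not installed yet.
function zipsForPiperVoice(voice: PiperVoice, sharedInstalled: boolean): string[] {
  const voiceZip = `threadcast-piper-${voice}-v1.zip`;
  return sharedInstalled ? [voiceZip] : [PIPER_SHARED_ZIP, voiceZip];
}
```

With this shape, a first install of Amy yields two filenames (shared data plus the voice), and a later install of Lessac yields only the Lessac zip.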

	### Kitten (Local AI Plus) — single bundle, all voices

	```
	threadcast-kitten-nano-en-v1.zip
	├── espeak-ng-data/ (separate copy from Piper's / Kokoro's — different on-disk root)
	│   └── …
	├── model.fp16.onnx (~24 MB)
	├── voices.bin (~30 KB — 8 speaker embeddings)
	└── tokens.txt
	```

	One download serves all 8 Plus voices — same style-vector-lookup pattern as Kokoro. The 8 speakers are baked into `voices.bin` in the order documented in `packages/mobile/modules/threadcast-neural/index.ts::KITTEN_VOICES`. Never write the upstream engine codename in user-facing surfaces — the Android UI labels this engine as "Local AI Plus" everywhere.

	### Kokoro (Local AI Studio) — single bundle, all voices

	```
	threadcast-kokoro-int8-en-v1.zip
	├── espeak-ng-data/ (separate copy from Piper's — different on-disk root)
	│   └── …
	├── model.int8.onnx (~135 MB)
	├── voices.bin (~5.7 MB — concatenated speaker embeddings, all 11)
	└── tokens.txt
	```

	One download serves every Kokoro voice — switching speakers is a free style-vector lookup at synth time.
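To illustrate why switching speakers costs nothing, here is a minimal sketch of a style-vector lookup. It assumes `voices.bin` holds one equal-sized float32 record per speaker, stored back to back in speaker order. That layout is an assumption made for illustration; the actual file format is defined by the upstream engine and may differ. `speakerEmbedding` is a hypothetical name.

```typescript
// Assumption: `voices` is the decoded content of voices.bin, made of
// `speakerCount` equal-length float32 records in speaker order.
function speakerEmbedding(
  voices: Float32Array,
  speakerCount: number,
  speakerIndex: number,
): Float32Array {
  if (!Number.isInteger(speakerIndex) || speakerIndex < 0 || speakerIndex >= speakerCount) {
    throw new RangeError(`speaker index ${speakerIndex} out of range`);
  }
  if (voices.length % speakerCount !== 0) {
    throw new Error("voices buffer does not divide evenly across speakers");
  }
  const recordLength = voices.length / speakerCount;
  // subarray returns a view on the same memory, so nothing is copied
  // and the model itself is untouched when the speaker changes.
  return voices.subarray(speakerIndex * recordLength, (speakerIndex + 1) * recordLength);
}
```

Under that assumption, picking another speaker is just a different offset into a buffer that is already loaded.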

	---

	## How the runtime fetches these

	Mirror-and-fallback: the same pattern the extension uses for its `Pixel-Labs/threadcast-neural-models/{hf-cpu-mirror,hf-gpu-mirror}/` paths (see [`extension/src/offscreen/mirror-fetch.ts`](https://github.com/Pixel-Labs/Reddit-Reader/blob/main/packages/extension/src/offscreen/mirror-fetch.ts)). Each download is an ordered list of URLs; the installer tries them in order, moves to the next one on failure, and surfaces an error only if every URL fails.

	```
	urls[0] = PRIMARY_BASE = https://huggingface.co/Pixel-Labs/threadcast-neural-models/resolve/main/mobile-android/v1
	urls[1] = FALLBACK_BASE = (configurable — sibling HF repo, GitHub Release, private CDN, …)
	```

	Both bases need to serve the same filenames (the eight listed above). The fallback host is configured at app build time:

	| Env var (set at build time) | Default | Purpose |
	|---|---|---|
	| `EXPO_PUBLIC_NEURAL_ASSETS_BASE_URL` | `https://huggingface.co/Pixel-Labs/threadcast-neural-models/resolve/main/mobile-android/v1` | Primary mirror |
	| `EXPO_PUBLIC_NEURAL_ASSETS_FALLBACK_URL` | unset | Optional fallback (omit for primary-only) |
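As a rough sketch of how the two settings could turn into the ordered URL list for one file: the default base URL is taken from the table above, while `mirrorUrls` and the way the values are passed in are assumptions for illustration. How the app actually reads the env vars is not shown here.

```typescript
// Default primary base, copied from the table above.
const DEFAULT_PRIMARY_BASE =
  "https://huggingface.co/Pixel-Labs/threadcast-neural-models/resolve/main/mobile-android/v1";

// Hypothetical helper: builds the ordered URL list for one zip filename.
function mirrorUrls(
  filename: string,
  primaryBase: string = DEFAULT_PRIMARY_BASE,
  fallbackBase?: string,
): string[] {
  // Strip trailing slashes so both "…/v1" and "…/v1/" produce the same URL.
  const join = (base: string): string => `${base.replace(/\/+$/, "")}/${filename}`;
  const urls = [join(primaryBase)];
  // An unset or empty fallback means primary-only.
  if (fallbackBase !== undefined && fallbackBase.trim() !== "") {
    urls.push(join(fallbackBase.trim()));
  }
  return urls;
}
```

Because both bases must serve identical filenames, the only thing that varies between list entries is the base.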

	The native installer:
	- Tries `urls[0]` first. On HTTP 4xx/5xx or transport error, logs a warning and tries `urls[1]`.
	- Only the LAST URL's failure surfaces to the UI as a download error — transient mirror outages are silent.
	- Cancellation aborts immediately at any URL boundary.
	- Verifies post-extract that every required file exists with non-zero size; rejects malformed zips even if the HTTP layer succeeded.
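The behaviour in the list above can be sketched as follows. This is not the native installer's code (that lives in the Kotlin `AssetInstaller`); it is a simplified TypeScript illustration. `fetchOne`, `isCancelled`, `warn`, `CancelledError`, and `missingOrEmpty` are hypothetical names, and `fetchOne` is assumed to reject on HTTP 4xx/5xx or transport error.

```typescript
// Illustrative sketch only; all names here are hypothetical.
class CancelledError extends Error {}

async function downloadWithFallback<T>(
  urls: string[],
  fetchOne: (url: string) => Promise<T>, // rejects on HTTP or transport failure
  isCancelled: () => boolean = () => false,
  warn: (message: string) => void = () => {},
): Promise<T> {
  let lastError: unknown = new Error("no URLs configured");
  let attempt = 0;
  for (const url of urls) {
    attempt += 1;
    // Cancellation is checked before each attempt (the "URL boundary").
    if (isCancelled()) throw new CancelledError("download cancelled");
    try {
      return await fetchOne(url);
    } catch (err) {
      lastError = err;
      // Earlier failures are only logged; the next mirror gets a chance.
      if (attempt < urls.length) warn(`download failed for ${url}, trying next mirror`);
    }
  }
  // Only the last URL's failure reaches the caller, and so the UI.
  throw lastError;
}

// Post-extract check: `sizes` maps relative path to size in bytes for files
// found on disk. Returns the required paths that are missing or empty.
function missingOrEmpty(required: string[], sizes: Map<string, number>): string[] {
  return required.filter((path) => (sizes.get(path) ?? 0) <= 0);
}
```

An install would count as successful only when the download resolves and `missingOrEmpty` returns an empty list for that zip's required files.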

	---

	## Publishing workflow

	1. Produce the zips from the local sherpa-onnx staging tree:

	```sh
	pnpm --filter mobile produce:neural-zips
	```

	Writes to `packages/mobile/dist/neural-assets/`. The script reads the locally-staged sherpa-onnx upstream artifacts (Piper voice ONNX files + KittenTTS-nano v0.1 fp16 bundle + Kokoro int8 v0.19 bundle + shared espeak-ng-data) and emits the eight zips with the correct internal layouts.

	2. Upload to Hugging Face — the user-facing primary mirror lives at:

	<https://huggingface.co/Pixel-Labs/threadcast-neural-models/tree/main/mobile-android/v1>

	Drop all eight zips into that tree. HF preserves filenames as-is. No build steps, no metadata munging — they're just static assets behind a CDN.

	3. (Optional) Mirror to a fallback host. Same eight filenames at any HTTPS endpoint. Common picks:
	- Sibling HF repo (`Pixel-Labs/threadcast-neural-models-mirror/v1/...`)
	- GitHub Release with `gh release upload v1 dist/neural-assets/*.zip`
	- Private CDN (R2, S3, etc.)

	Then ship the next app build with `EXPO_PUBLIC_NEURAL_ASSETS_FALLBACK_URL` set to the mirror's base URL.
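Before uploading to either host, it can help to confirm that all eight filenames are present in the output folder. A minimal sketch, assuming the caller supplies a directory listing; `missingZips` is a hypothetical helper and not part of the `produce:neural-zips` script.

```typescript
// The eight filenames from the Layout section.
const EXPECTED_ZIPS: readonly string[] = [
  "threadcast-piper-shared-v1.zip",
  "threadcast-piper-en_US-amy-medium-v1.zip",
  "threadcast-piper-en_US-lessac-medium-v1.zip",
  "threadcast-piper-en_US-ryan-medium-v1.zip",
  "threadcast-piper-en_US-hfc_female-medium-v1.zip",
  "threadcast-piper-en_US-hfc_male-medium-v1.zip",
  "threadcast-kitten-nano-en-v1.zip",
  "threadcast-kokoro-int8-en-v1.zip",
];

// Hypothetical pre-upload check: `present` is whatever a directory listing
// of the output folder returned. Returns the expected zips that are absent.
function missingZips(present: readonly string[]): string[] {
  const have = new Set(present);
  return EXPECTED_ZIPS.filter((name) => !have.has(name));
}
```

An empty result means both the primary and any fallback host can be given the same complete set.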

	---

	## Per-engine download cost (user-facing)

	| User intent | Files pulled | Network |
	|---|---|---|
	| Install first Local AI Lite voice (e.g. Amy) | `threadcast-piper-shared-v1.zip` + `threadcast-piper-en_US-amy-medium-v1.zip` | ~74 MB |
	| Install another Local AI Lite voice (e.g. Lessac) | `threadcast-piper-en_US-lessac-medium-v1.zip` | ~63 MB |
	| Install Local AI Plus | `threadcast-kitten-nano-en-v1.zip` | ~26 MB |
	| Install Local AI Studio | `threadcast-kokoro-int8-en-v1.zip` | ~145 MB |
	| Install every engine, all 5 Lite voices + Plus + Studio | every zip in `v1/` | ~496 MB |

	The whole-bundle worst case is comparable to Spotify's "download an album for offline" workflow. Most users will pick one tier and stop there — Plus is the sweet spot at 26 MB for an 8-voice multi-speaker model.
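The figures in the table follow from the per-zip sizes in the Layout section. A small sketch of the arithmetic, using those rounded sizes; `SIZE_MB` and `downloadCostMb` are illustrative names. Because the inputs are rounded, the full-set sum comes out at about 497 MB, within rounding of the table's ~496 MB.

```typescript
// Approximate zip sizes in MB, copied from the Layout section.
const SIZE_MB: Record<string, number> = {
  "threadcast-piper-shared-v1.zip": 11,
  "threadcast-piper-en_US-amy-medium-v1.zip": 63,
  "threadcast-piper-en_US-lessac-medium-v1.zip": 63,
  "threadcast-piper-en_US-ryan-medium-v1.zip": 63,
  "threadcast-piper-en_US-hfc_female-medium-v1.zip": 63,
  "threadcast-piper-en_US-hfc_male-medium-v1.zip": 63,
  "threadcast-kitten-nano-en-v1.zip": 26,
  "threadcast-kokoro-int8-en-v1.zip": 145,
};

// Sums the approximate download size for a set of zips.
function downloadCostMb(files: string[]): number {
  return files.reduce((sum, name) => {
    const size = SIZE_MB[name];
    if (size === undefined) throw new Error(`unknown zip: ${name}`);
    return sum + size;
  }, 0);
}
```

For example, the first Lite voice install is the shared zip plus one voice zip, 11 + 63 = 74 MB, matching the first table row.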

	---

	## License

	Per-project licenses retained from upstream — see the [parent README](../README.md#license) for the consolidated summary.