barakplasma committed on
Commit d94e0c9 · verified · 1 Parent(s): 352bf93

Upload README.md with huggingface_hub

Files changed (1): README.md +136 -18
README.md CHANGED
@@ -1,26 +1,144 @@
  ---
  license: other
- library_name: mediapipe
- pipeline_tag: text-generation
  ---

- # TranslateGemma 4B IT - Quantized Android Task Bundles

- Generated: `2026-03-30T15:52:12.740507+00:00`

- - Native quant capability: `False`
- - Reason: `gemma3 4b builder missing; available=['build_model_1b', 'build_model_270m']`
- - Plan mode: `native_quant_unavailable`

- | Requested quant | Status | Built from | Task file | Size (bytes) |
- |---|---|---|---|---|
- | `int4` | ↪️ aliased | `none` | `translategemma-4b-it-none.task` | `15529421499` |
- | `int8` | ↪️ aliased | `none` | `translategemma-4b-it-none.task` | `15529421499` |
- | `fp8` | ⏭️ unsupported by converter | - | `-` | `0` |
- | `float16` | ❌ failed (rc=1) | `self` | `-` | `0` |
- | `dynamic_int8` | ↪️ aliased | `none` | `translategemma-4b-it-none.task` | `15529421499` |

- ## Notes
- - Aliased entries are not rebuilt; they point to an equivalent built variant.
- - `fp8` is often unsupported in current converter/runtime stacks.
- - Verify on-device compatibility before public release.
  ---
  license: other
+ library_name: litert
+ base_model: google/translategemma-4b-it
+ pipeline_tag: translation
+ tags:
+ - android
+ - on-device
+ - litert
+ - tflite
+ - translation
+ - gemma3
+ - google-ai-edge
  ---
 
+ # TranslateGemma 4B IT Android / Google AI Edge Bundles

+ On-device translation bundles for Android, built with [Google AI Edge](https://ai.google.dev/edge).
+ This repo converts [google/translategemma-4b-it](https://huggingface.co/google/translategemma-4b-it) (55 languages, 4B params)
+ into formats that run locally on Android without internet access or cloud APIs.

+ Google publishes only WebGPU-targeted TFLite files. This repo bridges that gap with CPU/XNNPACK-compatible bundles
+ in both `.litertlm` (LiteRT-LM, recommended) and `.task` (MediaPipe, legacy) formats.
 

+ ---

+ ## Files

+ | File | Format | Size | Notes |
+ |------|--------|------|-------|
+ | `artifacts/int4/translategemma-4b-it-native-int4.litertlm` | LiteRT-LM | ~2 GB | INT4 weight-only, KV cache, Jinja template embedded |
+ | `artifacts/dynamic_int8/translategemma-4b-it-native-dynamic_int8.litertlm` | LiteRT-LM | ~4 GB | Dynamic INT8 *(uploading)* |
+ | `artifacts/int4/translategemma-4b-it-native-int4.task` | MediaPipe | ~2 GB | INT4, KV cache |

+ **Start with `dynamic_int8`** — better translation quality than INT4. Use INT4 if RAM is tight.
+
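The sizes in the table follow from weight-only quantization arithmetic. A rough sanity check (the parameter count is approximate, and real bundles add tokenizer and metadata overhead):

```python
# Back-of-the-envelope bundle sizes for a ~4B-parameter model under
# weight-only quantization. Real files differ somewhat: embeddings,
# per-channel scales, the tokenizer, and bundle metadata add overhead.
params = 4_000_000_000          # approximate parameter count

bytes_per_weight = {
    "int4": 0.5,                # 4 bits per weight
    "dynamic_int8": 1.0,        # 8 bits per weight
}

for quant, nbytes in bytes_per_weight.items():
    size_gb = params * nbytes / 1e9
    print(f"{quant}: ~{size_gb:.0f} GB")
```

This is why the INT4 bundles land near 2 GB and the dynamic INT8 bundle near 4 GB.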
+ ---
+
+ ## Quick Start — Google AI Edge Gallery (Android)
+
+ 1. Download a `.litertlm` file above
+ 2. Open [Google AI Edge Gallery](https://play.google.com/store/apps/details?id=com.google.ai.edge.gallery)
+ 3. Import the model → select your `.litertlm` file
+ 4. Use **Prompt Lab** mode for best results (see below)
+
+ ### Prompt Lab mode (recommended)
+
+ Set this as your **System Prompt**, then type the text to translate in the input box:
+
+ ```
+ <start_of_turn>user
+ You are a professional English (en) to Spanish (es) translator. Your goal is to accurately convey the meaning and nuances of the original English text while adhering to Spanish grammar, vocabulary, and cultural sensitivities.
+ Produce only the Spanish translation, without any additional explanations or commentary. Please translate the following English text into Spanish:
+
+
+ {{input}}<end_of_turn>
+ <start_of_turn>model
+ ```
+
+ For other language pairs, replace `English (en)` / `Spanish (es)` with your source and target languages.
+
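If you drive the model programmatically instead of through Prompt Lab, the same template can be rendered for any language pair. A minimal sketch (the helper name and defaults are ours, not part of this repo's scripts):

```python
# Hypothetical helper: renders the Gemma-style translation prompt shown
# above for an arbitrary source/target language pair.
def translation_prompt(text: str,
                       src: str = "English", src_code: str = "en",
                       tgt: str = "Spanish", tgt_code: str = "es") -> str:
    return (
        "<start_of_turn>user\n"
        f"You are a professional {src} ({src_code}) to {tgt} ({tgt_code}) translator. "
        f"Your goal is to accurately convey the meaning and nuances of the original {src} text "
        f"while adhering to {tgt} grammar, vocabulary, and cultural sensitivities.\n"
        f"Produce only the {tgt} translation, without any additional explanations or commentary. "
        f"Please translate the following {src} text into {tgt}:\n\n\n"
        f"{text}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )

print(translation_prompt("Good morning!", "English", "en", "German", "de"))
```

The trailing `<start_of_turn>model` marker matters: it tells the model to begin its reply rather than continue the user turn.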
+ ### AI Chat mode
+
+ The `.litertlm` bundles have an embedded chat template, so you can just type your text and the model will attempt to translate it. Quality may vary, since without explicit instructions the model doesn't know the source and target languages.
+
+ ---
+
+ ## Device Requirements
+
+ | Spec | Minimum |
+ |------|---------|
+ | RAM | 6 GB free (INT4) / 8 GB free (INT8) |
+ | Storage | 2 GB (INT4) / 4 GB (INT8) |
+ | OS | Android 10+ |
+ | Runtime | Google AI Edge Gallery or LiteRT-LM SDK |
+
+ Tested on a Pixel 10 (12 GB RAM). Both INT4 and INT8 load without "No KV cache" errors.
+
+ ---
+
+ ## What's Different From Google's Official Files
+
+ Google's official TranslateGemma TFLite files target **WebGPU only** — they don't work with MediaPipe LLM inference on Android CPU.
+
+ This repo's files use **Strategy 1**: native conversion via `litert-torch` with a custom `build_translategemma_4b()` builder that:
+ - Produces proper **prefill + decode signatures** with a KV cache (required by MediaPipe / LiteRT-LM)
+ - Uses the correct architecture: 34 layers, 2560 dim, 8 heads, 4 KV heads, sliding-window attention with a global layer every 6th layer
+ - Handles the `language_model.` weight prefix in TranslateGemma's multimodal safetensors
+ - Quantizes weights natively during TFLite export (not post hoc)
+
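The `language_model.` prefix handling amounts to remapping checkpoint keys before a text-only builder loads them. An illustrative sketch (the function name and the drop-everything-else behavior are assumptions, not this repo's actual code):

```python
# TranslateGemma's multimodal checkpoint stores the text decoder's weights
# under a "language_model." key prefix. A text-only Gemma builder expects
# unprefixed names, so the converter must remap keys; weights without the
# prefix (e.g. a vision tower) are skipped for this text-only export.
def remap_text_weights(state_dict: dict) -> dict:
    prefix = "language_model."
    return {
        key[len(prefix):]: tensor
        for key, tensor in state_dict.items()
        if key.startswith(prefix)
    }

# Illustrative key names, not the real checkpoint layout:
ckpt = {
    "language_model.model.layers.0.self_attn.q_proj.weight": "...",
    "vision_tower.encoder.layers.0.attn.qkv.weight": "...",
}
print(sorted(remap_text_weights(ckpt)))  # only the remapped text key remains
```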
+ ---
+
+ ## Conversion Scripts
+
+ The `scripts/` folder contains the full conversion pipeline:
+
+ | Script | Purpose |
+ |--------|---------|
+ | `scripts/convert_translategemma_android.py` | Single-quant conversion: Strategy 1 (litert-torch native) → Strategy 2 (generic fallback) |
+ | `scripts/multi_quant_build_upload.py` | Batch conversion + upload for multiple quant levels |
+ | `scripts/bundle_litertlm.py` | Bundle a TFLite model + SentencePiece tokenizer into `.litertlm` with LlmMetadata |
+
+ ### Reproduce a build
+
+ Requirements: ~128 GB RAM, Python 3.12, `litert-torch==0.8.0`
+
+ ```bash
+ # Clone the LiteRT-LM builder (needed by bundle_litertlm.py)
+ git clone --depth=1 https://github.com/google-ai-edge/LiteRT-LM /tmp/litert-lm
+
+ pip install litert-torch==0.8.0 mediapipe transformers huggingface-hub flatc
+
+ # Download the model
+ huggingface-cli download google/translategemma-4b-it --local-dir ./translategemma-4b-it
+
+ # Convert to TFLite with KV cache (~10 min, needs ~128 GB RAM)
+ python scripts/convert_translategemma_android.py \
+     --model-dir ./translategemma-4b-it \
+     --tflite-dir ./tflite_output/dynamic_int8 \
+     --output-dir ./output \
+     --task-file ./output/translategemma-4b-it-native-dynamic_int8.task \
+     --quantize dynamic_int8 \
+     --prefill-seq-len 1024 --kv-cache-max-len 1024
+
+ # Bundle as .litertlm
+ python scripts/bundle_litertlm.py \
+     --tflite ./tflite_output/dynamic_int8/*.tflite \
+     --tokenizer ./translategemma-4b-it/tokenizer.model \
+     --output ./output/translategemma-4b-it-native-dynamic_int8.litertlm \
+     --quant dynamic_int8
+ ```
+
+ ---
+
+ ## Supported Languages
+
+ TranslateGemma supports 55 languages, including Arabic, Chinese, French, German, Hindi, Japanese, Korean, Portuguese, Russian, Spanish, and more. See [google/translategemma-4b-it](https://huggingface.co/google/translategemma-4b-it) for the full list.
+
+ ---
+
+ ## License
+
+ Model weights: [Google Gemma Terms of Use](https://ai.google.dev/gemma/terms)
+ Conversion scripts: Apache 2.0