license: other
library_name: litert
base_model: google/translategemma-4b-it
pipeline_tag: translation
tags:
- android
- on-device
- litert
- tflite
- translation
- gemma3
- google-ai-edge
---
# TranslateGemma 4B IT – Android / Google AI Edge Bundles
On-device translation model for Android using [Google AI Edge](https://ai.google.dev/edge).
Converts [google/translategemma-4b-it](https://huggingface.co/google/translategemma-4b-it) (55 languages, 4B params)
into formats that run locally on Android without internet or cloud APIs.
Google publishes only WebGPU-targeted TFLite files for this model. This repo bridges that gap with CPU/XNNPACK-compatible `.litertlm` bundles (LiteRT-LM format) that include an embedded chat template.
---
## Files
| File | Size | Notes |
|------|------|-------|
| `artifacts/int4-generic/translategemma-4b-it-int4-generic.litertlm` | ~2 GB | INT4 blockwise quant: faster, lower RAM |
| `artifacts/dynamic_int8-generic/translategemma-4b-it-dynamic_int8-generic.litertlm` | ~4 GB | Dynamic INT8: better quality |

**Start with INT4** if you're unsure: it loads faster and uses less RAM. Use dynamic_int8 for better translation quality.
---
## Quick Start: Google AI Edge Gallery (Android)
1. Download a `.litertlm` file above
2. Open [Google AI Edge Gallery](https://play.google.com/store/apps/details?id=com.google.ai.edge.gallery)
3. Import the model and select your `.litertlm` file
4. Use **AI Chat** mode
### Input format
The embedded template supports structured input for any language pair:
```
<src>LANG</src><dst>LANG</dst><text>YOUR TEXT HERE</text>
```
**Examples:**
```
<src>he</src><dst>en</dst><text>שלום עולם</text>
```
```
<src>en</src><dst>he</dst><text>good morning</text>
```
```
<src>en</src><dst>fr</dst><text>hello world</text>
```
```
<src>ja</src><dst>en</dst><text>ありがとうございます</text>
```
Use standard ISO 639-1 language codes: `en`, `he`, `fr`, `es`, `de`, `ar`, `zh`, `ja`, `ko`, `ru`, `pt`, etc.
Plain text (no tags) is also accepted; the model will attempt translation based on context.
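The tag format above is easy to get wrong by hand, so it can help to build prompts programmatically. This is a minimal sketch; `build_prompt` is our own helper name, not part of any SDK, and only the `<src>`/`<dst>`/`<text>` tag layout comes from the embedded template:

```python
def build_prompt(src: str, dst: str, text: str) -> str:
    """Format a TranslateGemma request using ISO 639-1 language codes."""
    return f"<src>{src}</src><dst>{dst}</dst><text>{text}</text>"

# Reproduces one of the examples above:
print(build_prompt("en", "fr", "hello world"))
# <src>en</src><dst>fr</dst><text>hello world</text>
```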
---
## Device Requirements
| Spec | Minimum |
|------|---------|
| RAM | 6 GB free (INT4) / 8 GB free (dynamic_int8) |
| Storage | 2 GB (INT4) / 4 GB (dynamic_int8) |
| OS | Android 10+ |
| Runtime | Google AI Edge Gallery or LiteRT-LM SDK |
---
## What's Different From Google's Official Files
Google's official TranslateGemma TFLite files target **WebGPU only**; they don't work with MediaPipe LLM inference on Android CPU.
This repo's files use native conversion via `litert-torch` with a custom `build_translategemma_4b()` builder that:
- Produces proper **prefill + decode signatures** with KV cache (required by LiteRT-LM)
- Uses the correct architecture: 34 layers, 2560 dim, 8 heads, 4 KV heads, sliding-window + global every 6th layer
- Sets `qkv_fused_interleaved=False` (critical: the wrong default produced garbage output in all early builds)
- Handles the `language_model.` weight prefix in TranslateGemma's multimodal safetensors
- Embeds a generic Jinja chat template for any language pair via `<src>`/`<dst>`/`<text>` tags
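To make the layer layout concrete, here is an illustrative sketch (not the actual `litert-torch` builder; all names are ours) of the attention pattern described above: 34 layers with a global-attention layer every 6th layer and sliding-window attention on the rest:

```python
# Architecture constants from the model card above.
NUM_LAYERS = 34
EMBED_DIM = 2560
NUM_HEADS = 8
NUM_KV_HEADS = 4  # grouped-query attention: 2 query heads per KV head

def attention_pattern(num_layers: int = NUM_LAYERS) -> list[str]:
    """Return per-layer attention type: 'global' on every 6th layer
    (layers 6, 12, 18, ... in 1-based counting), 'sliding' elsewhere."""
    return [
        "global" if (i + 1) % 6 == 0 else "sliding"
        for i in range(num_layers)
    ]

pattern = attention_pattern()
# With 34 layers this yields 5 global layers and 29 sliding-window layers.
assert pattern.count("global") == 5
assert pattern.count("sliding") == 29
```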
---
## Conversion Scripts
The `scripts/` folder contains the full conversion pipeline:
| Script | Purpose |
|--------|---------|
| `scripts/convert_translategemma_android.py` | Single-quant conversion via litert-torch native strategy |
| `scripts/bundle_litertlm.py` | Bundle a TFLite + SentencePiece tokenizer into `.litertlm` with embedded Jinja template |
| `scripts/multi_quant_build_upload.py` | Batch conversion + HuggingFace upload |
### Reproduce a build
Requirements: ~128 GB RAM, Python 3.12, `litert-torch==0.8.0`
```bash
# Clone LiteRT-LM builder (needed by bundle_litertlm.py)
git clone --depth=1 https://github.com/google-ai-edge/LiteRT-LM /tmp/litert-lm
pip install litert-torch==0.8.0 mediapipe transformers huggingface-hub
# Download model
huggingface-cli download google/translategemma-4b-it --local-dir ./translategemma-4b-it
# Convert to TFLite with KV cache (~30-60 min, needs ~128 GB RAM)
python scripts/convert_translategemma_android.py \
--model-dir ./translategemma-4b-it \
--tflite-dir ./tflite_output/dynamic_int8 \
--output-dir ./output \
--task-file ./output/translategemma-4b-it-dynamic_int8.task \
--quantize dynamic_int8 \
--prefill-seq-len 1024 --kv-cache-max-len 1024 --allow-no-token
# Bundle as .litertlm
python scripts/bundle_litertlm.py \
--tflite ./tflite_output/dynamic_int8/*.tflite \
--tokenizer ./translategemma-4b-it/tokenizer.model \
--output ./output/translategemma-4b-it-dynamic_int8-generic.litertlm \
--quant dynamic_int8
```
---
## Supported Languages
TranslateGemma supports 55 languages including Arabic, Chinese, French, German, Hebrew, Hindi, Japanese, Korean, Portuguese, Russian, Spanish, and more. See [google/translategemma-4b-it](https://huggingface.co/google/translategemma-4b-it) for the full list.
---
## License
Model weights: [Google Gemma Terms of Use](https://ai.google.dev/gemma/terms)
Conversion scripts: Apache 2.0