---
license: other
library_name: litert
base_model: google/translategemma-4b-it
pipeline_tag: translation
tags:
- android
- on-device
- litert
- tflite
- translation
- gemma3
- google-ai-edge
---

# TranslateGemma 4B IT – Android / Google AI Edge Bundles

On-device translation model for Android using [Google AI Edge](https://ai.google.dev/edge).
Converts [google/translategemma-4b-it](https://huggingface.co/google/translategemma-4b-it) (55 languages, 4B params)
into formats that run locally on Android without internet or cloud APIs.

Google publishes only WebGPU-targeted TFLite files. This repo bridges the gap with CPU/XNNPACK-compatible `.litertlm` bundles (LiteRT-LM format) that include an embedded chat template.

---

## Files

| File | Size | Notes |
|------|------|-------|
| `artifacts/int4-generic/translategemma-4b-it-int4-generic.litertlm` | ~2 GB | INT4 blockwise quant – faster, lower RAM |
| `artifacts/dynamic_int8-generic/translategemma-4b-it-dynamic_int8-generic.litertlm` | ~4 GB | Dynamic INT8 – better quality |

**Start with INT4** if you're unsure: it loads faster and uses less RAM. Use dynamic_int8 for better translation quality.

---

## Quick Start – Google AI Edge Gallery (Android)

1. Download a `.litertlm` file above
2. Open [Google AI Edge Gallery](https://play.google.com/store/apps/details?id=com.google.ai.edge.gallery)
3. Import the model and select your `.litertlm` file
4. Use **AI Chat** mode

### Input format

The embedded template supports structured input for any language pair:

```
<src>LANG</src><dst>LANG</dst><text>YOUR TEXT HERE</text>
```

**Examples:**

```
<src>he</src><dst>en</dst><text>שלום עולם</text>
```
```
<src>en</src><dst>he</dst><text>good morning</text>
```
```
<src>en</src><dst>fr</dst><text>hello world</text>
```
```
<src>ja</src><dst>en</dst><text>ありがとうございます</text>
```

Use standard ISO 639-1 language codes: `en`, `he`, `fr`, `es`, `de`, `ar`, `zh`, `ja`, `ko`, `ru`, `pt`, etc.

Plain text (no tags) is also accepted – the model will attempt translation based on context.

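The tagged format above is easy to build programmatically. Below is a minimal illustrative helper (the function name `make_prompt` is mine, not part of this repo's scripts) that assembles a request string in the expected shape:

```python
# Illustrative helper (not part of this repo's scripts): builds the tagged
# prompt string that the embedded chat template expects.
def make_prompt(src: str, dst: str, text: str) -> str:
    """Format a translation request using the <src>/<dst>/<text> tags."""
    return f"<src>{src}</src><dst>{dst}</dst><text>{text}</text>"

print(make_prompt("en", "fr", "hello world"))
# <src>en</src><dst>fr</dst><text>hello world</text>
```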
---

## Device Requirements

| Spec | Minimum |
|------|---------|
| RAM | 6 GB free (INT4) / 8 GB free (dynamic_int8) |
| Storage | 2 GB (INT4) / 4 GB (dynamic_int8) |
| OS | Android 10+ |
| Runtime | Google AI Edge Gallery or LiteRT-LM SDK |
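The RAM thresholds in the table translate directly into a variant choice. A hypothetical helper (name and logic are illustrative, encoding only the table above):

```python
# Hypothetical helper encoding the free-RAM thresholds from the table above:
# dynamic_int8 needs ~8 GB free, INT4 needs ~6 GB free.
def pick_variant(free_ram_gb):
    """Return the recommended .litertlm variant for the available free RAM."""
    if free_ram_gb >= 8:
        return "dynamic_int8"  # better translation quality
    if free_ram_gb >= 6:
        return "int4"          # faster, lower RAM
    return None                # below the minimum requirements

print(pick_variant(6.5))  # int4
```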

---

## What's Different From Google's Official Files

Google's official TranslateGemma TFLite files target **WebGPU only** – they don't work with MediaPipe LLM inference on Android CPU.

This repo's files use native conversion via `litert-torch` with a custom `build_translategemma_4b()` builder that:
- Produces proper **prefill + decode signatures** with KV cache (required by LiteRT-LM)
- Uses the correct architecture: 34 layers, 2560 dim, 8 heads, 4 KV heads, sliding-window attention with a global layer every 6th layer
- Sets `qkv_fused_interleaved=False` (critical: the wrong default caused garbage output in all early builds)
- Handles the `language_model.` weight prefix in TranslateGemma's multimodal safetensors
- Embeds a generic Jinja chat template for any language pair via `<src>`/`<dst>`/`<text>` tags
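The `language_model.` prefix handling mentioned above amounts to a key remap on the checkpoint. A sketch of what that looks like (assumed, not the repo's actual code; the sample keys are illustrative):

```python
# Sketch (assumed, not the repo's actual code): strip the "language_model."
# prefix that TranslateGemma's multimodal safetensors put on LM weights, and
# drop non-LM entries so the keys match a text-only model builder.
def remap_lm_weights(state_dict):
    prefix = "language_model."
    return {
        key[len(prefix):]: value
        for key, value in state_dict.items()
        if key.startswith(prefix)
    }

# Illustrative keys, not the real checkpoint contents:
weights = {
    "language_model.model.layers.0.self_attn.q_proj.weight": "...",
    "vision_tower.patch_embed.weight": "...",
}
print(list(remap_lm_weights(weights)))
# ['model.layers.0.self_attn.q_proj.weight']
```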

---

## Conversion Scripts

The `scripts/` folder contains the full conversion pipeline:

| Script | Purpose |
|--------|---------|
| `scripts/convert_translategemma_android.py` | Single-quant conversion via litert-torch native strategy |
| `scripts/bundle_litertlm.py` | Bundle a TFLite + SentencePiece tokenizer into `.litertlm` with embedded Jinja template |
| `scripts/multi_quant_build_upload.py` | Batch conversion + HuggingFace upload |

### Reproduce a build

Requirements: ~128 GB RAM, Python 3.12, `litert-torch==0.8.0`

```bash
# Clone LiteRT-LM builder (needed by bundle_litertlm.py)
git clone --depth=1 https://github.com/google-ai-edge/LiteRT-LM /tmp/litert-lm

pip install litert-torch==0.8.0 mediapipe transformers huggingface-hub

# Download model
huggingface-cli download google/translategemma-4b-it --local-dir ./translategemma-4b-it

# Convert to TFLite with KV cache (~30-60 min, needs ~128 GB RAM)
python scripts/convert_translategemma_android.py \
  --model-dir ./translategemma-4b-it \
  --tflite-dir ./tflite_output/dynamic_int8 \
  --output-dir ./output \
  --task-file ./output/translategemma-4b-it-dynamic_int8.task \
  --quantize dynamic_int8 \
  --prefill-seq-len 1024 --kv-cache-max-len 1024 --allow-no-token

# Bundle as .litertlm
python scripts/bundle_litertlm.py \
  --tflite ./tflite_output/dynamic_int8/*.tflite \
  --tokenizer ./translategemma-4b-it/tokenizer.model \
  --output ./output/translategemma-4b-it-dynamic_int8-generic.litertlm \
  --quant dynamic_int8
```

---

## Supported Languages

TranslateGemma supports 55 languages including Arabic, Chinese, French, German, Hebrew, Hindi, Japanese, Korean, Portuguese, Russian, Spanish, and more. See [google/translategemma-4b-it](https://huggingface.co/google/translategemma-4b-it) for the full list.

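If your app accepts arbitrary language pairs, it can help to validate codes before building a prompt. A sketch using an illustrative subset of the supported codes (the full 55-language list lives on the upstream model card; `SUPPORTED` and `check_pair` are mine):

```python
# Illustrative subset of TranslateGemma's supported ISO 639-1 codes;
# see the upstream model card for the authoritative 55-language list.
SUPPORTED = {"ar", "de", "en", "es", "fr", "he", "hi", "ja", "ko", "pt", "ru", "zh"}

def check_pair(src, dst):
    """Return True if both codes are in the (illustrative) supported set."""
    return src in SUPPORTED and dst in SUPPORTED and src != dst

print(check_pair("en", "he"))  # True
print(check_pair("en", "xx"))  # False
```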

---

## License

Model weights: [Google Gemma Terms of Use](https://ai.google.dev/gemma/terms)
Conversion scripts: Apache 2.0