---
license: other
library_name: litert
base_model: google/translategemma-4b-it
pipeline_tag: translation
tags:
- android
- on-device
- litert
- tflite
- translation
- gemma3
- google-ai-edge
---
# TranslateGemma 4B IT - Android / Google AI Edge Bundles
On-device translation model for Android using [Google AI Edge](https://ai.google.dev/edge).
Converts [google/translategemma-4b-it](https://huggingface.co/google/translategemma-4b-it) (55 languages, 4B params)
into formats that run locally on Android without internet or cloud APIs.
Google publishes only WebGPU-targeted TFLite files for TranslateGemma. This repo bridges that gap with CPU/XNNPACK-compatible `.litertlm` bundles (LiteRT-LM format) that include an embedded chat template.
---
## Files
| File | Size | Notes |
|------|------|-------|
| `artifacts/int4-generic/translategemma-4b-it-int4-generic.litertlm` | ~2 GB | INT4 blockwise quant: faster, lower RAM |
| `artifacts/dynamic_int8-generic/translategemma-4b-it-dynamic_int8-generic.litertlm` | ~4 GB | Dynamic INT8: better quality |
**Start with INT4** if you're unsure: it loads faster and uses less RAM. Use dynamic_int8 for better translation quality.
---
## Quick Start - Google AI Edge Gallery (Android)
1. Download a `.litertlm` file above
2. Open [Google AI Edge Gallery](https://play.google.com/store/apps/details?id=com.google.ai.edge.gallery)
3. Import the model → select your `.litertlm` file
4. Use **AI Chat** mode
### Input format
The embedded template supports structured input for any language pair:
```
<src>LANG</src><dst>LANG</dst><text>YOUR TEXT HERE</text>
```
**Examples:**
```
<src>he</src><dst>en</dst><text>שלום עולם</text>
```
```
<src>en</src><dst>he</dst><text>good morning</text>
```
```
<src>en</src><dst>fr</dst><text>hello world</text>
```
```
<src>ja</src><dst>en</dst><text>ありがとうございます</text>
```
Use standard ISO 639-1 language codes: `en`, `he`, `fr`, `es`, `de`, `ar`, `zh`, `ja`, `ko`, `ru`, `pt`, etc.
Plain text (no tags) is also accepted; the model will attempt translation based on context.
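When driving the model programmatically, it can help to build the tagged input with a small helper. A minimal sketch (the function name `make_prompt` is illustrative, not part of this repo):

```python
def make_prompt(src: str, dst: str, text: str) -> str:
    """Wrap text in the <src>/<dst>/<text> tags the embedded template expects."""
    return f"<src>{src}</src><dst>{dst}</dst><text>{text}</text>"

# Build an English -> French translation request
print(make_prompt("en", "fr", "hello world"))
# <src>en</src><dst>fr</dst><text>hello world</text>
```

The returned string is what you paste (or send) as the user turn; the embedded chat template handles the rest of the Gemma conversation formatting.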
---
## Device Requirements
| Spec | Minimum |
|------|---------|
| RAM | 6 GB free (INT4) / 8 GB free (dynamic_int8) |
| Storage | 2 GB (INT4) / 4 GB (dynamic_int8) |
| OS | Android 10+ |
| Runtime | Google AI Edge Gallery or LiteRT-LM SDK |
---
## What's Different From Google's Official Files
Google's official TranslateGemma TFLite files target **WebGPU only** and don't work with MediaPipe LLM inference on Android CPU.
This repo's files use native conversion via `litert-torch` with a custom `build_translategemma_4b()` builder that:
- Produces proper **prefill + decode signatures** with KV cache (required by LiteRT-LM)
- Uses the correct architecture: 34 layers, 2560 dim, 8 heads, 4 KV heads, sliding-window + global every 6th layer
- Fixes `qkv_fused_interleaved=False` (critical: the wrong default caused garbage output in all early builds)
- Handles the `language_model.` weight prefix in TranslateGemma's multimodal safetensors
- Embeds a generic Jinja chat template for any language pair via `<src>`/`<dst>`/`<text>` tags
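For orientation, an embedded Gemma-style Jinja chat template is roughly of this shape. This is an illustrative sketch only, not the exact template shipped in these bundles; the `<start_of_turn>`/`<end_of_turn>` control tokens come from the standard Gemma chat format, and the `<src>`/`<dst>`/`<text>` tags are passed through inside the user turn:

```jinja
{{ bos_token }}{%- for message in messages -%}
<start_of_turn>{{ 'user' if message['role'] == 'user' else 'model' }}
{{ message['content'] }}<end_of_turn>
{% endfor -%}
{%- if add_generation_prompt -%}
<start_of_turn>model
{% endif -%}
```

Because the template is generic, any language pair works without per-pair templates: the pair is carried in the tags of the user message itself.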
---
## Conversion Scripts
The `scripts/` folder contains the full conversion pipeline:
| Script | Purpose |
|--------|---------|
| `scripts/convert_translategemma_android.py` | Single-quant conversion via litert-torch native strategy |
| `scripts/bundle_litertlm.py` | Bundle a TFLite + SentencePiece tokenizer into `.litertlm` with embedded Jinja template |
| `scripts/multi_quant_build_upload.py` | Batch conversion + HuggingFace upload |
### Reproduce a build
Requirements: ~128 GB RAM, Python 3.12, `litert-torch==0.8.0`
```bash
# Clone LiteRT-LM builder (needed by bundle_litertlm.py)
git clone --depth=1 https://github.com/google-ai-edge/LiteRT-LM /tmp/litert-lm
pip install litert-torch==0.8.0 mediapipe transformers huggingface-hub
# Download model
huggingface-cli download google/translategemma-4b-it --local-dir ./translategemma-4b-it
# Convert to TFLite with KV cache (~30-60 min, needs ~128 GB RAM)
python scripts/convert_translategemma_android.py \
--model-dir ./translategemma-4b-it \
--tflite-dir ./tflite_output/dynamic_int8 \
--output-dir ./output \
--task-file ./output/translategemma-4b-it-dynamic_int8.task \
--quantize dynamic_int8 \
--prefill-seq-len 1024 --kv-cache-max-len 1024 --allow-no-token
# Bundle as .litertlm
python scripts/bundle_litertlm.py \
--tflite ./tflite_output/dynamic_int8/*.tflite \
--tokenizer ./translategemma-4b-it/tokenizer.model \
--output ./output/translategemma-4b-it-dynamic_int8-generic.litertlm \
--quant dynamic_int8
```
---
## Supported Languages
TranslateGemma supports 55 languages including Arabic, Chinese, French, German, Hebrew, Hindi, Japanese, Korean, Portuguese, Russian, Spanish, and more. See [google/translategemma-4b-it](https://huggingface.co/google/translategemma-4b-it) for the full list.
---
## License
Model weights: [Google Gemma Terms of Use](https://ai.google.dev/gemma/terms)
Conversion scripts: Apache 2.0