---
license: other
library_name: litert
base_model: google/translategemma-4b-it
pipeline_tag: translation
tags:
  - android
  - on-device
  - litert
  - tflite
  - translation
  - gemma3
  - google-ai-edge
---

# TranslateGemma 4B IT: Android / Google AI Edge Bundles

On-device translation model for Android using [Google AI Edge](https://ai.google.dev/edge).
It converts [google/translategemma-4b-it](https://huggingface.co/google/translategemma-4b-it) (55 languages, 4B params)
into formats that run locally on Android without internet or cloud APIs.

Google publishes only WebGPU-targeted TFLite files for TranslateGemma. This repo bridges that gap with CPU/XNNPACK-compatible `.litertlm` bundles (LiteRT-LM format) that include an embedded chat template.

---

## Files

| File | Size | Notes |
|------|------|-------|
| `artifacts/int4-generic/translategemma-4b-it-int4-generic.litertlm` | ~2 GB | INT4 blockwise quant: faster, lower RAM |
| `artifacts/dynamic_int8-generic/translategemma-4b-it-dynamic_int8-generic.litertlm` | ~4 GB | Dynamic INT8: better quality |

**Start with INT4** if you're unsure: it loads faster and uses less RAM. Use dynamic_int8 for better translation quality.

---

## Quick Start: Google AI Edge Gallery (Android)

1. Download a `.litertlm` file above
2. Open [Google AI Edge Gallery](https://play.google.com/store/apps/details?id=com.google.ai.edge.gallery)
3. Import the model → select your `.litertlm` file
4. Use **AI Chat** mode

### Input format

The embedded template supports structured input for any language pair:

```
<src>LANG</src><dst>LANG</dst><text>YOUR TEXT HERE</text>
```

**Examples:**

```
<src>he</src><dst>en</dst><text>שלום עולם</text>
```
```
<src>en</src><dst>he</dst><text>good morning</text>
```
```
<src>en</src><dst>fr</dst><text>hello world</text>
```
```
<src>ja</src><dst>en</dst><text>ありがとうございます</text>
```

Use standard ISO 639-1 language codes: `en`, `he`, `fr`, `es`, `de`, `ar`, `zh`, `ja`, `ko`, `ru`, `pt`, etc.

Plain text (no tags) is also accepted; the model will attempt translation based on context.
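When integrating the model into an app, the tagged prompt can be assembled with a tiny helper. A minimal sketch, assuming ISO 639-1 codes as shown above (the `build_prompt` name is hypothetical and not part of this repo's scripts):

```python
def build_prompt(src: str, dst: str, text: str) -> str:
    """Format a translation request for the embedded chat template.

    src and dst are ISO 639-1 codes such as "en", "he", or "fr".
    """
    return f"<src>{src}</src><dst>{dst}</dst><text>{text}</text>"

# Example: English -> French
prompt = build_prompt("en", "fr", "hello world")
# "<src>en</src><dst>fr</dst><text>hello world</text>"
```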

---

## Device Requirements

| Spec | Minimum |
|------|---------|
| RAM | 6 GB free (INT4) / 8 GB free (dynamic_int8) |
| Storage | 2 GB (INT4) / 4 GB (dynamic_int8) |
| OS | Android 10+ |
| Runtime | Google AI Edge Gallery or LiteRT-LM SDK |

---

## What's Different From Google's Official Files

Google's official TranslateGemma TFLite files target **WebGPU only**; they don't work with MediaPipe LLM inference on Android CPU.

This repo's files use native conversion via `litert-torch` with a custom `build_translategemma_4b()` builder that:
- Produces proper **prefill + decode signatures** with KV cache (required by LiteRT-LM)
- Uses the correct architecture: 34 layers, 2560 dim, 8 heads, 4 KV heads, sliding-window + global every 6th layer
- Sets `qkv_fused_interleaved=False` (critical: the wrong default caused garbage output in all early builds)
- Handles the `language_model.` weight prefix in TranslateGemma's multimodal safetensors
- Embeds a generic Jinja chat template for any language pair via `<src>`/`<dst>`/`<text>` tags
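The layer layout described above can be sketched in plain Python for reference. This is an illustrative config only; the field names are hypothetical and do not reflect the actual `litert-torch` builder API:

```python
# Illustrative hyperparameters for the TranslateGemma 4B architecture;
# field names are for documentation, not litert-torch API.
TRANSLATEGEMMA_4B_CONFIG = {
    "num_layers": 34,
    "embedding_dim": 2560,
    "num_attention_heads": 8,
    "num_kv_heads": 4,               # grouped-query attention
    "qkv_fused_interleaved": False,  # the wrong default produced garbage output
}

def attention_pattern(num_layers: int = 34) -> list:
    """Sliding-window attention on most layers, global attention every 6th."""
    return [
        "global" if (i + 1) % 6 == 0 else "sliding_window"
        for i in range(num_layers)
    ]
```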

---

## Conversion Scripts

The `scripts/` folder contains the full conversion pipeline:

| Script | Purpose |
|--------|---------|
| `scripts/convert_translategemma_android.py` | Single-quant conversion via litert-torch native strategy |
| `scripts/bundle_litertlm.py` | Bundle a TFLite + SentencePiece tokenizer into `.litertlm` with embedded Jinja template |
| `scripts/multi_quant_build_upload.py` | Batch conversion + HuggingFace upload |

### Reproduce a build

Requirements: ~128 GB RAM, Python 3.12, `litert-torch==0.8.0`

```bash
# Clone LiteRT-LM builder (needed by bundle_litertlm.py)
git clone --depth=1 https://github.com/google-ai-edge/LiteRT-LM /tmp/litert-lm

pip install litert-torch==0.8.0 mediapipe transformers huggingface-hub

# Download model
huggingface-cli download google/translategemma-4b-it --local-dir ./translategemma-4b-it

# Convert to TFLite with KV cache (~30-60 min, needs ~128 GB RAM)
python scripts/convert_translategemma_android.py \
  --model-dir ./translategemma-4b-it \
  --tflite-dir ./tflite_output/dynamic_int8 \
  --output-dir ./output \
  --task-file ./output/translategemma-4b-it-dynamic_int8.task \
  --quantize dynamic_int8 \
  --prefill-seq-len 1024 --kv-cache-max-len 1024 --allow-no-token

# Bundle as .litertlm
python scripts/bundle_litertlm.py \
  --tflite ./tflite_output/dynamic_int8/*.tflite \
  --tokenizer ./translategemma-4b-it/tokenizer.model \
  --output ./output/translategemma-4b-it-dynamic_int8-generic.litertlm \
  --quant dynamic_int8
```

---

## Supported Languages

TranslateGemma supports 55 languages including Arabic, Chinese, French, German, Hebrew, Hindi, Japanese, Korean, Portuguese, Russian, Spanish, and more. See [google/translategemma-4b-it](https://huggingface.co/google/translategemma-4b-it) for the full list.

---

## License

Model weights: [Google Gemma Terms of Use](https://ai.google.dev/gemma/terms)  
Conversion scripts: Apache 2.0