---
license: other
library_name: litert
base_model: google/translategemma-4b-it
pipeline_tag: translation
tags:
- android
- on-device
- litert
- tflite
- translation
- gemma3
- google-ai-edge
---
# TranslateGemma 4B IT - Android / Google AI Edge Bundles
On-device translation model for Android using [Google AI Edge](https://ai.google.dev/edge).
Converts [google/translategemma-4b-it](https://huggingface.co/google/translategemma-4b-it) (55 languages, 4B params)
into formats that run locally on Android without internet or cloud APIs.
Google publishes only WebGPU-targeted TFLite files for TranslateGemma. This repo bridges that gap with CPU/XNNPACK-compatible `.litertlm` bundles (LiteRT-LM format) that include an embedded chat template.
---
## Files
| File | Size | Notes |
|------|------|-------|
| `artifacts/int4-generic/translategemma-4b-it-int4-generic.litertlm` | ~2 GB | INT4 blockwise quant: faster, lower RAM |
| `artifacts/dynamic_int8-generic/translategemma-4b-it-dynamic_int8-generic.litertlm` | ~4 GB | Dynamic INT8: better quality |
**Start with INT4** if you're unsure: it loads faster and uses less RAM. Use dynamic_int8 for better translation quality.
---
## Quick Start - Google AI Edge Gallery (Android)
1. Download a `.litertlm` file above
2. Open [Google AI Edge Gallery](https://play.google.com/store/apps/details?id=com.google.ai.edge.gallery)
3. Import the model → select your `.litertlm` file
4. Use **AI Chat** mode
### Input format
The embedded template supports structured input for any language pair:
```
<src>LANG</src><dst>LANG</dst><text>YOUR TEXT HERE</text>
```
**Examples:**
```
<src>he</src><dst>en</dst><text>שלום עולם</text>
```
```
<src>en</src><dst>he</dst><text>good morning</text>
```
```
<src>en</src><dst>fr</dst><text>hello world</text>
```
```
<src>ja</src><dst>en</dst><text>ありがとうございます</text>
```
Use standard ISO 639-1 language codes: `en`, `he`, `fr`, `es`, `de`, `ar`, `zh`, `ja`, `ko`, `ru`, `pt`, etc.
Plain text (no tags) is also accepted; the model will attempt translation based on context.
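When driving the model programmatically, it can help to build the tagged input with a small helper. A minimal sketch (the function name `make_prompt` is illustrative, not part of this repo):

```python
def make_prompt(src: str, dst: str, text: str) -> str:
    """Wrap text in the <src>/<dst>/<text> tags the embedded template expects."""
    return f"<src>{src}</src><dst>{dst}</dst><text>{text}</text>"

# Build an English -> French translation request
print(make_prompt("en", "fr", "hello world"))
# <src>en</src><dst>fr</dst><text>hello world</text>
```

The returned string is what you paste (or send) as the user turn; the embedded chat template handles the rest of the Gemma conversation formatting.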
---
## Device Requirements
| Spec | Minimum |
|------|---------|
| RAM | 6 GB free (INT4) / 8 GB free (dynamic_int8) |
| Storage | 2 GB (INT4) / 4 GB (dynamic_int8) |
| OS | Android 10+ |
| Runtime | Google AI Edge Gallery or LiteRT-LM SDK |
---
## What's Different From Google's Official Files
Google's official TranslateGemma TFLite files target **WebGPU only** and don't work with MediaPipe LLM inference on Android CPU.
This repo's files use native conversion via `litert-torch` with a custom `build_translategemma_4b()` builder that:
- Produces proper **prefill + decode signatures** with KV cache (required by LiteRT-LM)
- Uses the correct architecture: 34 layers, 2560 dim, 8 heads, 4 KV heads, sliding-window + global every 6th layer
- Fixes `qkv_fused_interleaved=False` (critical: the wrong default caused garbage output in all early builds)
- Handles the `language_model.` weight prefix in TranslateGemma's multimodal safetensors
- Embeds a generic Jinja chat template for any language pair via `<src>`/`<dst>`/`<text>` tags
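For orientation, an embedded Gemma-style Jinja chat template is roughly of this shape. This is an illustrative sketch only, not the exact template shipped in these bundles; the `<start_of_turn>`/`<end_of_turn>` control tokens come from the standard Gemma chat format, and the `<src>`/`<dst>`/`<text>` tags are passed through inside the user turn:

```jinja
{{ bos_token }}{%- for message in messages -%}
<start_of_turn>{{ 'user' if message['role'] == 'user' else 'model' }}
{{ message['content'] }}<end_of_turn>
{% endfor -%}
{%- if add_generation_prompt -%}
<start_of_turn>model
{% endif -%}
```

Because the template is generic, any language pair works without per-pair templates: the pair is carried in the tags of the user message itself.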
---
## Conversion Scripts
The `scripts/` folder contains the full conversion pipeline:
| Script | Purpose |
|--------|---------|
| `scripts/convert_translategemma_android.py` | Single-quant conversion via litert-torch native strategy |
| `scripts/bundle_litertlm.py` | Bundle a TFLite + SentencePiece tokenizer into `.litertlm` with embedded Jinja template |
| `scripts/multi_quant_build_upload.py` | Batch conversion + HuggingFace upload |
### Reproduce a build
Requirements: ~128 GB RAM, Python 3.12, `litert-torch==0.8.0`
```bash
# Clone LiteRT-LM builder (needed by bundle_litertlm.py)
git clone --depth=1 https://github.com/google-ai-edge/LiteRT-LM /tmp/litert-lm
pip install litert-torch==0.8.0 mediapipe transformers huggingface-hub
# Download model
huggingface-cli download google/translategemma-4b-it --local-dir ./translategemma-4b-it
# Convert to TFLite with KV cache (~30-60 min, needs ~128 GB RAM)
python scripts/convert_translategemma_android.py \
--model-dir ./translategemma-4b-it \
--tflite-dir ./tflite_output/dynamic_int8 \
--output-dir ./output \
--task-file ./output/translategemma-4b-it-dynamic_int8.task \
--quantize dynamic_int8 \
--prefill-seq-len 1024 --kv-cache-max-len 1024 --allow-no-token
# Bundle as .litertlm
python scripts/bundle_litertlm.py \
--tflite ./tflite_output/dynamic_int8/*.tflite \
--tokenizer ./translategemma-4b-it/tokenizer.model \
--output ./output/translategemma-4b-it-dynamic_int8-generic.litertlm \
--quant dynamic_int8
```
---
## Supported Languages
TranslateGemma supports 55 languages including Arabic, Chinese, French, German, Hebrew, Hindi, Japanese, Korean, Portuguese, Russian, Spanish, and more. See [google/translategemma-4b-it](https://huggingface.co/google/translategemma-4b-it) for the full list.
---
## License
Model weights: [Google Gemma Terms of Use](https://ai.google.dev/gemma/terms)
Conversion scripts: Apache 2.0