Upload README.md with huggingface_hub

Converts [google/translategemma-4b-it](https://huggingface.co/google/translategemma-4b-it) (55 languages, 4B params)
into formats that run locally on Android without internet or cloud APIs.

Google only publishes WebGPU-only TFLite files. This repo bridges that gap with CPU/XNNPACK-compatible `.litertlm` bundles (LiteRT-LM format) with an embedded chat template.

---

## Files

| File | Size | Notes |
|------|------|-------|
| `artifacts/int4-generic/translategemma-4b-it-int4-generic.litertlm` | ~2 GB | INT4 blockwise quant: faster, lower RAM |
| `artifacts/dynamic_int8-generic/translategemma-4b-it-dynamic_int8-generic.litertlm` | ~4 GB | Dynamic INT8: better quality |

**Start with INT4** if you're unsure: it loads faster and uses less RAM. Use dynamic_int8 for better translation quality.

---

## Quick Start

1. Download a `.litertlm` file above
2. Open [Google AI Edge Gallery](https://play.google.com/store/apps/details?id=com.google.ai.edge.gallery)
3. Import the model and select your `.litertlm` file
4. Use **AI Chat** mode

### Input format

The embedded template supports structured input for any language pair:

```
<src>LANG</src><dst>LANG</dst><text>YOUR TEXT HERE</text>
```

**Examples:**

```
<src>he</src><dst>en</dst><text>שלום עולם</text>
```

```
<src>en</src><dst>he</dst><text>good morning</text>
```

```
<src>en</src><dst>fr</dst><text>hello world</text>
```

```
<src>ja</src><dst>en</dst><text>ありがとうございます</text>
```

Use standard ISO 639-1 language codes: `en`, `he`, `fr`, `es`, `de`, `ar`, `zh`, `ja`, `ko`, `ru`, `pt`, etc.

Plain text (no tags) is also accepted; the model will attempt translation based on context.
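If you are scripting against the model, the tag format is easy to build programmatically. A minimal sketch (the helper name is ours, not part of this repo's scripts):

```python
def format_prompt(src: str, dst: str, text: str) -> str:
    """Build a TranslateGemma structured prompt from ISO 639-1 language codes."""
    return f"<src>{src}</src><dst>{dst}</dst><text>{text}</text>"

print(format_prompt("en", "fr", "hello world"))
# <src>en</src><dst>fr</dst><text>hello world</text>
```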

---

## Requirements

| Spec | Minimum |
|------|---------|
| RAM | 6 GB free (INT4) / 8 GB free (dynamic_int8) |
| Storage | 2 GB (INT4) / 4 GB (dynamic_int8) |
| OS | Android 10+ |
| Runtime | Google AI Edge Gallery or LiteRT-LM SDK |
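The RAM rows above reduce to a simple selection rule. As an illustration only (`pick_variant` is a hypothetical helper, not shipped in this repo):

```python
def pick_variant(free_ram_gb: float) -> str:
    """Pick a bundle per the requirements table:
    dynamic_int8 wants ~8 GB free RAM, INT4 ~6 GB."""
    if free_ram_gb >= 8:
        return "translategemma-4b-it-dynamic_int8-generic.litertlm"
    if free_ram_gb >= 6:
        return "translategemma-4b-it-int4-generic.litertlm"
    raise ValueError("not enough free RAM for either variant")

print(pick_variant(6.5))
# translategemma-4b-it-int4-generic.litertlm
```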

---

## What's Different From Google's Official Files

Google's official TranslateGemma TFLite files target **WebGPU only**; they don't work with MediaPipe LLM inference on Android CPU.

This repo's files use native conversion via `litert-torch` with a custom `build_translategemma_4b()` builder that:

- Produces proper **prefill + decode signatures** with KV cache (required by LiteRT-LM)
- Uses the correct architecture: 34 layers, 2560 dim, 8 heads, 4 KV heads, sliding-window attention with a global-attention layer every 6th layer
- Fixes `qkv_fused_interleaved=False` (critical: the wrong default caused garbage output in all early builds)
- Handles the `language_model.` weight prefix in TranslateGemma's multimodal safetensors
- Embeds a generic Jinja chat template for any language pair via `<src>`/`<dst>`/`<text>` tags
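The weight-prefix handling amounts to a key remap over the checkpoint's state dict. A sketch of the idea (illustrative only; the repo's converter does this internally, and the exact key names below are assumptions):

```python
def strip_language_model_prefix(state_dict: dict) -> dict:
    """Keep only the text-decoder weights from a multimodal checkpoint,
    dropping the 'language_model.' prefix so a text-only graph builder
    can match them; vision/projector weights are discarded."""
    prefix = "language_model."
    return {
        key[len(prefix):]: value
        for key, value in state_dict.items()
        if key.startswith(prefix)
    }

demo = {
    "language_model.model.layers.0.self_attn.q_proj.weight": "q0",
    "vision_tower.encoder.layers.0.attn.weight": "v0",
}
print(strip_language_model_prefix(demo))
# {'model.layers.0.self_attn.q_proj.weight': 'q0'}
```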

---

The `scripts/` folder contains the full conversion pipeline:

| Script | Purpose |
|--------|---------|
| `scripts/convert_translategemma_android.py` | Single-quant conversion via the litert-torch native strategy |
| `scripts/bundle_litertlm.py` | Bundles a TFLite file and a SentencePiece tokenizer into `.litertlm` with an embedded Jinja template |
| `scripts/multi_quant_build_upload.py` | Batch conversion + Hugging Face upload |

### Reproduce a build

Requirements: ~128 GB RAM, Python 3.12, `litert-torch==0.8.0`

```bash
# Clone LiteRT-LM builder (needed by bundle_litertlm.py)
git clone --depth=1 https://github.com/google-ai-edge/LiteRT-LM /tmp/litert-lm

pip install litert-torch==0.8.0 mediapipe transformers huggingface-hub

# Download model
huggingface-cli download google/translategemma-4b-it --local-dir ./translategemma-4b-it

# Convert to TFLite with KV cache (~30-60 min, needs ~128 GB RAM)
python scripts/convert_translategemma_android.py \
  --model-dir ./translategemma-4b-it \
  --tflite-dir ./tflite_output/dynamic_int8 \
  --output-dir ./output \
  --task-file ./output/translategemma-4b-it-dynamic_int8.task \
  --quantize dynamic_int8 \
  --prefill-seq-len 1024 --kv-cache-max-len 1024 --allow-no-token

# Bundle as .litertlm
python scripts/bundle_litertlm.py \
  --tflite ./tflite_output/dynamic_int8/*.tflite \
  --tokenizer ./translategemma-4b-it/tokenizer.model \
  --output ./output/translategemma-4b-it-dynamic_int8-generic.litertlm \
  --quant dynamic_int8
```

---

## Supported Languages

TranslateGemma supports 55 languages including Arabic, Chinese, French, German, Hebrew, Hindi, Japanese, Korean, Portuguese, Russian, Spanish, and more. See [google/translategemma-4b-it](https://huggingface.co/google/translategemma-4b-it) for the full list.

---