barakplasma committed on
Commit d94e0c9 · verified · 1 Parent(s): 352bf93

Upload README.md with huggingface_hub

Files changed (1): README.md +136 -18
README.md CHANGED
@@ -1,26 +1,144 @@
  ---
  license: other
- library_name: mediapipe
- pipeline_tag: text-generation
  ---

- # TranslateGemma 4B IT - Quantized Android Task Bundles

- Generated: `2026-03-30T15:52:12.740507+00:00`

- - Native quant capability: `False`
- - Reason: `gemma3 4b builder missing; available=['build_model_1b', 'build_model_270m']`
- - Plan mode: `native_quant_unavailable`

- | Requested quant | Status | Built from | Task file | Size (bytes) |
- |---|---|---|---|---|
- | `int4` | ↪️ aliased | `none` | `translategemma-4b-it-none.task` | `15529421499` |
- | `int8` | ↪️ aliased | `none` | `translategemma-4b-it-none.task` | `15529421499` |
- | `fp8` | ⏭️ unsupported by converter | - | `-` | `0` |
- | `float16` | ❌ failed (rc=1) | `self` | `-` | `0` |
- | `dynamic_int8` | ↪️ aliased | `none` | `translategemma-4b-it-none.task` | `15529421499` |

- ## Notes
- - Aliased entries are not rebuilt; they point to an equivalent built variant.
- - `fp8` is often unsupported in current converter/runtime stacks.
- - Verify on-device compatibility before public release.
  ---
  license: other
+ library_name: litert
+ base_model: google/translategemma-4b-it
+ pipeline_tag: translation
+ tags:
+ - android
+ - on-device
+ - litert
+ - tflite
+ - translation
+ - gemma3
+ - google-ai-edge
  ---
 
+ # TranslateGemma 4B IT Android / Google AI Edge Bundles

+ On-device translation bundles for Android, built with [Google AI Edge](https://ai.google.dev/edge).
+ This repo converts [google/translategemma-4b-it](https://huggingface.co/google/translategemma-4b-it) (55 languages, 4B params)
+ into formats that run locally on Android without internet access or cloud APIs.

+ Google publishes only WebGPU-targeted TFLite files. This repo bridges that gap with CPU/XNNPACK-compatible bundles
+ in both `.litertlm` (LiteRT-LM, recommended) and `.task` (MediaPipe, legacy) formats.
 

+ ---

+ ## Files

+ | File | Format | Size | Notes |
+ |------|--------|------|-------|
+ | `artifacts/int4/translategemma-4b-it-native-int4.litertlm` | LiteRT-LM | ~2 GB | INT4 weight-only, KV cache, Jinja template embedded |
+ | `artifacts/dynamic_int8/translategemma-4b-it-native-dynamic_int8.litertlm` | LiteRT-LM | ~4 GB | Dynamic INT8 *(uploading)* |
+ | `artifacts/int4/translategemma-4b-it-native-int4.task` | MediaPipe | ~2 GB | INT4, KV cache |

+ **Start with `dynamic_int8`** — better translation quality than INT4. Use INT4 if RAM is tight.
+
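The sizes in the table follow from weight-only quantization arithmetic. A rough sanity check (the parameter count is approximate, and real bundles add tokenizer and metadata overhead):

```python
# Back-of-the-envelope bundle sizes for a ~4B-parameter model under
# weight-only quantization. Real files differ somewhat: embeddings,
# per-channel scales, the tokenizer, and bundle metadata add overhead.
params = 4_000_000_000          # approximate parameter count

bytes_per_weight = {
    "int4": 0.5,                # 4 bits per weight
    "dynamic_int8": 1.0,        # 8 bits per weight
}

for quant, nbytes in bytes_per_weight.items():
    size_gb = params * nbytes / 1e9
    print(f"{quant}: ~{size_gb:.0f} GB")
```

This is why the INT4 bundles land near 2 GB and the dynamic INT8 bundle near 4 GB.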
+ ---
+
+ ## Quick Start — Google AI Edge Gallery (Android)
+
+ 1. Download a `.litertlm` file above
+ 2. Open [Google AI Edge Gallery](https://play.google.com/store/apps/details?id=com.google.ai.edge.gallery)
+ 3. Import the model → select your `.litertlm` file
+ 4. Use **Prompt Lab** mode for best results (see below)
+
+ ### Prompt Lab mode (recommended)
+
+ Set this as your **System Prompt**, then type the text to translate in the input box:
+
+ ```
+ <start_of_turn>user
+ You are a professional English (en) to Spanish (es) translator. Your goal is to accurately convey the meaning and nuances of the original English text while adhering to Spanish grammar, vocabulary, and cultural sensitivities.
+ Produce only the Spanish translation, without any additional explanations or commentary. Please translate the following English text into Spanish:
+
+
+ {{input}}<end_of_turn>
+ <start_of_turn>model
+ ```
+
+ For other language pairs, replace `English (en)` / `Spanish (es)` with your source and target languages.
+
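If you drive the model programmatically instead of through Prompt Lab, the same template can be rendered for any language pair. A minimal sketch (the helper name and defaults are ours, not part of this repo's scripts):

```python
# Hypothetical helper: renders the Gemma-style translation prompt shown
# above for an arbitrary source/target language pair.
def translation_prompt(text: str,
                       src: str = "English", src_code: str = "en",
                       tgt: str = "Spanish", tgt_code: str = "es") -> str:
    return (
        "<start_of_turn>user\n"
        f"You are a professional {src} ({src_code}) to {tgt} ({tgt_code}) translator. "
        f"Your goal is to accurately convey the meaning and nuances of the original {src} text "
        f"while adhering to {tgt} grammar, vocabulary, and cultural sensitivities.\n"
        f"Produce only the {tgt} translation, without any additional explanations or commentary. "
        f"Please translate the following {src} text into {tgt}:\n\n\n"
        f"{text}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )

print(translation_prompt("Good morning!", "English", "en", "German", "de"))
```

The trailing `<start_of_turn>model` marker matters: it tells the model to begin its reply rather than continue the user turn.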
+ ### AI Chat mode
+
+ The `.litertlm` bundles have an embedded chat template, so you can just type your text and the model will attempt to translate it. Quality may vary, since without explicit instructions the model doesn't know the source and target languages.
+
+ ---
+
+ ## Device Requirements
+
+ | Spec | Minimum |
+ |------|---------|
+ | RAM | 6 GB free (INT4) / 8 GB free (INT8) |
+ | Storage | 2 GB (INT4) / 4 GB (INT8) |
+ | OS | Android 10+ |
+ | Runtime | Google AI Edge Gallery or LiteRT-LM SDK |
+
+ Tested on a Pixel 10 (12 GB RAM). Both INT4 and INT8 load without "No KV cache" errors.
+
+ ---
+
+ ## What's Different From Google's Official Files
+
+ Google's official TranslateGemma TFLite files target **WebGPU only** — they don't work with MediaPipe LLM inference on Android CPU.
+
+ This repo's files use **Strategy 1**: native conversion via `litert-torch` with a custom `build_translategemma_4b()` builder that:
+ - Produces proper **prefill + decode signatures** with a KV cache (required by MediaPipe / LiteRT-LM)
+ - Uses the correct architecture: 34 layers, 2560 dim, 8 heads, 4 KV heads, sliding-window attention with a global layer every 6th layer
+ - Handles the `language_model.` weight prefix in TranslateGemma's multimodal safetensors
+ - Quantizes weights natively during TFLite export (not post hoc)
+
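The `language_model.` prefix handling amounts to remapping checkpoint keys before a text-only builder loads them. An illustrative sketch (the function name and the drop-everything-else behavior are assumptions, not this repo's actual code):

```python
# TranslateGemma's multimodal checkpoint stores the text decoder's weights
# under a "language_model." key prefix. A text-only Gemma builder expects
# unprefixed names, so the converter must remap keys; weights without the
# prefix (e.g. a vision tower) are skipped for this text-only export.
def remap_text_weights(state_dict: dict) -> dict:
    prefix = "language_model."
    return {
        key[len(prefix):]: tensor
        for key, tensor in state_dict.items()
        if key.startswith(prefix)
    }

# Illustrative key names, not the real checkpoint layout:
ckpt = {
    "language_model.model.layers.0.self_attn.q_proj.weight": "...",
    "vision_tower.encoder.layers.0.attn.qkv.weight": "...",
}
print(sorted(remap_text_weights(ckpt)))  # only the remapped text key remains
```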
+ ---
+
+ ## Conversion Scripts
+
+ The `scripts/` folder contains the full conversion pipeline:
+
+ | Script | Purpose |
+ |--------|---------|
+ | `scripts/convert_translategemma_android.py` | Single-quant conversion: Strategy 1 (litert-torch native) → Strategy 2 (generic fallback) |
+ | `scripts/multi_quant_build_upload.py` | Batch conversion + upload for multiple quant levels |
+ | `scripts/bundle_litertlm.py` | Bundle a TFLite model + SentencePiece tokenizer into `.litertlm` with LlmMetadata |
+
+ ### Reproduce a build
+
+ Requirements: ~128 GB RAM, Python 3.12, `litert-torch==0.8.0`
+
+ ```bash
+ # Clone the LiteRT-LM builder (needed by bundle_litertlm.py)
+ git clone --depth=1 https://github.com/google-ai-edge/LiteRT-LM /tmp/litert-lm
+
+ pip install litert-torch==0.8.0 mediapipe transformers huggingface-hub flatc
+
+ # Download the model
+ huggingface-cli download google/translategemma-4b-it --local-dir ./translategemma-4b-it
+
+ # Convert to TFLite with KV cache (~10 min, needs ~128 GB RAM)
+ python scripts/convert_translategemma_android.py \
+     --model-dir ./translategemma-4b-it \
+     --tflite-dir ./tflite_output/dynamic_int8 \
+     --output-dir ./output \
+     --task-file ./output/translategemma-4b-it-native-dynamic_int8.task \
+     --quantize dynamic_int8 \
+     --prefill-seq-len 1024 --kv-cache-max-len 1024
+
+ # Bundle as .litertlm
+ python scripts/bundle_litertlm.py \
+     --tflite ./tflite_output/dynamic_int8/*.tflite \
+     --tokenizer ./translategemma-4b-it/tokenizer.model \
+     --output ./output/translategemma-4b-it-native-dynamic_int8.litertlm \
+     --quant dynamic_int8
+ ```
+
+ ---
+
+ ## Supported Languages
+
+ TranslateGemma supports 55 languages, including Arabic, Chinese, French, German, Hindi, Japanese, Korean, Portuguese, Russian, Spanish, and more. See [google/translategemma-4b-it](https://huggingface.co/google/translategemma-4b-it) for the full list.
+
+ ---
+
+ ## License
+
+ Model weights: [Google Gemma Terms of Use](https://ai.google.dev/gemma/terms)
+ Conversion scripts: Apache 2.0