barakplasma committed on
Commit cd2eec1 · verified · 1 Parent(s): 9d6b376

Upload README.md with huggingface_hub

Files changed (1): README.md +40 -36

README.md CHANGED
@@ -19,20 +19,18 @@ On-device translation model for Android using [Google AI Edge](https://ai.google
  Converts [google/translategemma-4b-it](https://huggingface.co/google/translategemma-4b-it) (55 languages, 4B params)
  into formats that run locally on Android without internet or cloud APIs.
 
- Google only publishes WebGPU-only TFLite files. This repo bridges that gap with CPU/XNNPACK-compatible bundles
- in both `.litertlm` (LiteRT-LM, recommended) and `.task` (MediaPipe, legacy) formats.
 
  ---
 
  ## Files
 
- | File | Format | Size | Notes |
- |------|--------|------|-------|
- | `artifacts/int4/translategemma-4b-it-native-int4.litertlm` | LiteRT-LM | ~2 GB | INT4 weight-only, KV-cache, Jinja template embedded |
- | `artifacts/dynamic_int8/translategemma-4b-it-native-dynamic_int8.litertlm` | LiteRT-LM | ~4 GB | Dynamic INT8 *(uploading)* |
- | `artifacts/int4/translategemma-4b-it-native-int4.task` | MediaPipe | ~2 GB | INT4, KV-cache |
 
- **Start with `dynamic_int8`** — better translation quality than INT4. Use INT4 if RAM is tight.
 
  ---
@@ -41,27 +39,34 @@ in both `.litertlm` (LiteRT-LM, recommended) and `.task` (MediaPipe, legacy) for
  1. Download a `.litertlm` file above
  2. Open [Google AI Edge Gallery](https://play.google.com/store/apps/details?id=com.google.ai.edge.gallery)
  3. Import the model → select your `.litertlm` file
- 4. Use **Prompt Lab** mode for best results (see below)
 
- ### Prompt Lab mode (recommended)
 
- Set this as your **System Prompt**, then type text to translate in the input box:
 
  ```
- <start_of_turn>user
- You are a professional English (en) to Spanish (es) translator. Your goal is to accurately convey the meaning and nuances of the original English text while adhering to Spanish grammar, vocabulary, and cultural sensitivities.
- Produce only the Spanish translation, without any additional explanations or commentary. Please translate the following English text into Spanish:
 
- {{input}}<end_of_turn>
- <start_of_turn>model
  ```
 
- For other language pairs, replace `English (en)` / `Spanish (es)` with your source and target language.
-
- ### AI Chat mode
 
- The `.litertlm` bundles have an embedded chat template. Just type your text — the model will attempt to translate it. Quality may vary since the app doesn't know source/target languages without explicit instructions.
 
  ---
@@ -69,24 +74,23 @@ The `.litertlm` bundles have an embedded chat template. Just type your text —
 
  | Spec | Minimum |
  |------|---------|
- | RAM | 6 GB free (INT4) / 8 GB free (INT8) |
- | Storage | 2 GB (INT4) / 4 GB (INT8) |
  | OS | Android 10+ |
  | Runtime | Google AI Edge Gallery or LiteRT-LM SDK |
 
- Tested on Pixel 10 (12 GB RAM). Both INT4 and INT8 load without "No KV cache" errors.
-
  ---
 
  ## What's Different From Google's Official Files
 
  Google's official TranslateGemma TFLite files target **WebGPU only** — they don't work with MediaPipe LLM inference on Android CPU.
 
- This repo's files use **Strategy 1** native conversion via `litert-torch` with a custom `build_translategemma_4b()` builder that:
- - Produces proper **prefill + decode signatures** with KV cache (required by MediaPipe / LiteRT-LM)
  - Uses the correct architecture: 34 layers, 2560 dim, 8 heads, 4 KV heads, sliding-window + global every 6th layer
  - Handles the `language_model.` weight prefix in TranslateGemma's multimodal safetensors
- - Quantizes weights natively during TFLite export (not post-hoc)
 
  ---
@@ -96,9 +100,9 @@ The `scripts/` folder contains the full conversion pipeline:
 
  | Script | Purpose |
  |--------|---------|
- | `scripts/convert_translategemma_android.py` | Single-quant conversion: Strategy 1 (litert-torch native) → Strategy 2 (generic fallback) |
- | `scripts/multi_quant_build_upload.py` | Batch conversion + upload for multiple quant levels |
- | `scripts/bundle_litertlm.py` | Bundle a TFLite + SentencePiece tokenizer into `.litertlm` with LlmMetadata |
 
  ### Reproduce a build
@@ -108,25 +112,25 @@ Requirements: ~128 GB RAM, Python 3.12, `litert-torch==0.8.0`
  # Clone LiteRT-LM builder (needed by bundle_litertlm.py)
  git clone --depth=1 https://github.com/google-ai-edge/LiteRT-LM /tmp/litert-lm
 
- pip install litert-torch==0.8.0 mediapipe transformers huggingface-hub flatc
 
  # Download model
  huggingface-cli download google/translategemma-4b-it --local-dir ./translategemma-4b-it
 
- # Convert to TFLite with KV cache (~10 min, needs ~128 GB RAM)
  python scripts/convert_translategemma_android.py \
    --model-dir ./translategemma-4b-it \
    --tflite-dir ./tflite_output/dynamic_int8 \
    --output-dir ./output \
- --task-file ./output/translategemma-4b-it-native-dynamic_int8.task \
    --quantize dynamic_int8 \
- --prefill-seq-len 1024 --kv-cache-max-len 1024
 
  # Bundle as .litertlm
  python scripts/bundle_litertlm.py \
    --tflite ./tflite_output/dynamic_int8/*.tflite \
    --tokenizer ./translategemma-4b-it/tokenizer.model \
- --output ./output/translategemma-4b-it-native-dynamic_int8.litertlm \
    --quant dynamic_int8
  ```
@@ -134,7 +138,7 @@ python scripts/bundle_litertlm.py \
 
  ## Supported Languages
 
- TranslateGemma supports 55 languages including Arabic, Chinese, French, German, Hindi, Japanese, Korean, Portuguese, Russian, Spanish, and more. See [google/translategemma-4b-it](https://huggingface.co/google/translategemma-4b-it) for the full list.
 
  ---
 
  Converts [google/translategemma-4b-it](https://huggingface.co/google/translategemma-4b-it) (55 languages, 4B params)
  into formats that run locally on Android without internet or cloud APIs.
 
+ Google only publishes WebGPU-only TFLite files. This repo bridges that gap with CPU/XNNPACK-compatible `.litertlm` bundles (LiteRT-LM format) that embed a chat template.
 
  ---
 
  ## Files
 
+ | File | Size | Notes |
+ |------|------|-------|
+ | `artifacts/int4-generic/translategemma-4b-it-int4-generic.litertlm` | ~2 GB | INT4 blockwise quant — faster, lower RAM |
+ | `artifacts/dynamic_int8-generic/translategemma-4b-it-dynamic_int8-generic.litertlm` | ~4 GB | Dynamic INT8 — better quality |
 
+ **Start with INT4** if you're unsure — it loads faster and uses less RAM. Use dynamic_int8 for better translation quality.
 
  ---
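The INT4 vs. dynamic_int8 recommendation above reduces to a simple rule of thumb. A minimal sketch, assuming the free-RAM thresholds from the System Requirements section (the `pick_quant` helper is hypothetical, not part of this repo):

```python
def pick_quant(free_ram_gb: float) -> str:
    """Suggest a quant level from available RAM: dynamic_int8 wants
    ~8 GB free (better quality); INT4 runs in ~6 GB (faster, smaller)."""
    return "dynamic_int8" if free_ram_gb >= 8 else "int4"

print(pick_quant(12))   # → dynamic_int8
print(pick_quant(6.5))  # → int4
```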
 
 
  1. Download a `.litertlm` file above
  2. Open [Google AI Edge Gallery](https://play.google.com/store/apps/details?id=com.google.ai.edge.gallery)
  3. Import the model → select your `.litertlm` file
+ 4. Use **AI Chat** mode
 
+ ### Input format
 
+ The embedded template supports structured input for any language pair:
 
  ```
+ <src>LANG</src><dst>LANG</dst><text>YOUR TEXT HERE</text>
+ ```
 
+ **Examples:**
 
+ ```
+ <src>he</src><dst>en</dst><text>שלום עולם</text>
+ ```
+ ```
+ <src>en</src><dst>he</dst><text>good morning</text>
+ ```
+ ```
+ <src>en</src><dst>fr</dst><text>hello world</text>
+ ```
+ ```
+ <src>ja</src><dst>en</dst><text>ありがとうございます</text>
  ```
 
+ Use standard ISO 639-1 language codes: `en`, `he`, `fr`, `es`, `de`, `ar`, `zh`, `ja`, `ko`, `ru`, `pt`, etc.
 
+ Plain text (no tags) is also accepted — the model will attempt translation based on context.
 
  ---
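App code that drives the model directly (for example through the LiteRT-LM SDK) can assemble the structured input above programmatically. An illustrative sketch — `format_prompt` is a hypothetical helper, not part of this repo:

```python
def format_prompt(src: str, dst: str, text: str) -> str:
    """Wrap text in the <src>/<dst>/<text> tags the embedded template expects.
    src/dst are ISO 639-1 codes, e.g. "en", "he", "fr"."""
    return f"<src>{src}</src><dst>{dst}</dst><text>{text}</text>"

print(format_prompt("en", "fr", "hello world"))
# → <src>en</src><dst>fr</dst><text>hello world</text>
```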
 
 
 
  | Spec | Minimum |
  |------|---------|
+ | RAM | 6 GB free (INT4) / 8 GB free (dynamic_int8) |
+ | Storage | 2 GB (INT4) / 4 GB (dynamic_int8) |
  | OS | Android 10+ |
  | Runtime | Google AI Edge Gallery or LiteRT-LM SDK |
 
  ---
 
  ## What's Different From Google's Official Files
 
  Google's official TranslateGemma TFLite files target **WebGPU only** — they don't work with MediaPipe LLM inference on Android CPU.
 
+ This repo's files use native conversion via `litert-torch` with a custom `build_translategemma_4b()` builder that:
+ - Produces proper **prefill + decode signatures** with KV cache (required by LiteRT-LM)
  - Uses the correct architecture: 34 layers, 2560 dim, 8 heads, 4 KV heads, sliding-window + global every 6th layer
+ - Sets `qkv_fused_interleaved=False` (critical — the wrong default caused garbage output in all early builds)
  - Handles the `language_model.` weight prefix in TranslateGemma's multimodal safetensors
+ - Embeds a generic Jinja chat template for any language pair via `<src>`/`<dst>`/`<text>` tags
 
  ---
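The "sliding-window + global every 6th layer" layout in the bullets above can be made concrete. The sketch below shows one reading of that pattern (1-based layer counting is an assumption, not taken from the converter code):

```python
NUM_LAYERS = 34  # TranslateGemma 4B decoder layers

# Sliding-window attention by default, global attention on every 6th layer.
layer_types = [
    "global" if layer % 6 == 0 else "sliding_window"
    for layer in range(1, NUM_LAYERS + 1)
]

print(layer_types.count("global"))          # → 5 (layers 6, 12, 18, 24, 30)
print(layer_types.count("sliding_window"))  # → 29
```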
 
 
 
  | Script | Purpose |
  |--------|---------|
+ | `scripts/convert_translategemma_android.py` | Single-quant conversion via the litert-torch native strategy |
+ | `scripts/bundle_litertlm.py` | Bundle a TFLite + SentencePiece tokenizer into `.litertlm` with an embedded Jinja template |
+ | `scripts/multi_quant_build_upload.py` | Batch conversion + Hugging Face upload |
 
  ### Reproduce a build
 
  # Clone LiteRT-LM builder (needed by bundle_litertlm.py)
  git clone --depth=1 https://github.com/google-ai-edge/LiteRT-LM /tmp/litert-lm
 
+ pip install litert-torch==0.8.0 mediapipe transformers huggingface-hub
 
  # Download model
  huggingface-cli download google/translategemma-4b-it --local-dir ./translategemma-4b-it
 
+ # Convert to TFLite with KV cache (~30-60 min, needs ~128 GB RAM)
  python scripts/convert_translategemma_android.py \
    --model-dir ./translategemma-4b-it \
    --tflite-dir ./tflite_output/dynamic_int8 \
    --output-dir ./output \
+ --task-file ./output/translategemma-4b-it-dynamic_int8.task \
    --quantize dynamic_int8 \
+ --prefill-seq-len 1024 --kv-cache-max-len 1024 --allow-no-token
 
  # Bundle as .litertlm
  python scripts/bundle_litertlm.py \
    --tflite ./tflite_output/dynamic_int8/*.tflite \
    --tokenizer ./translategemma-4b-it/tokenizer.model \
+ --output ./output/translategemma-4b-it-dynamic_int8-generic.litertlm \
    --quant dynamic_int8
  ```
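To build both quant levels, the convert and bundle commands above can be generated per level — roughly what `scripts/multi_quant_build_upload.py` automates. A minimal sketch under that assumption: it only assembles the command lines (the concrete TFLite filename stands in for the shell glob above) and does not execute them:

```python
def build_commands(quant: str) -> list[list[str]]:
    """Assemble convert + bundle command lines for one quant level."""
    convert = [
        "python", "scripts/convert_translategemma_android.py",
        "--model-dir", "./translategemma-4b-it",
        "--tflite-dir", f"./tflite_output/{quant}",
        "--output-dir", "./output",
        "--quantize", quant,
        "--prefill-seq-len", "1024", "--kv-cache-max-len", "1024",
    ]
    bundle = [
        "python", "scripts/bundle_litertlm.py",
        "--tflite", f"./tflite_output/{quant}/model.tflite",
        "--tokenizer", "./translategemma-4b-it/tokenizer.model",
        "--output", f"./output/translategemma-4b-it-{quant}-generic.litertlm",
        "--quant", quant,
    ]
    return [convert, bundle]

# Each conversion needs ~128 GB RAM, so run the levels sequentially,
# e.g. with subprocess.run(cmd, check=True).
for quant in ("int4", "dynamic_int8"):
    for cmd in build_commands(quant):
        print(" ".join(cmd))
```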
 
 
 
  ## Supported Languages
 
+ TranslateGemma supports 55 languages including Arabic, Chinese, French, German, Hebrew, Hindi, Japanese, Korean, Portuguese, Russian, Spanish, and more. See [google/translategemma-4b-it](https://huggingface.co/google/translategemma-4b-it) for the full list.
 
  ---