# TranslateGemma-4B MNN Q4_0
This is a Q4_0 quantized MNN conversion of Google's TranslateGemma-4B model for on-device translation inference.
## Model Details
- Base Model: google/translategemma-4b
- Architecture: Gemma 3 Text (gemma3_text), 34 layers, hidden_size=2560
- Quantization: Q4_0 (4-bit)
- Framework: MNN (Mobile Neural Network) by Alibaba
- Weight Size: ~3.28 GB
- Total Size: ~3.3 GB (including graph + tokenizer)
## Files

| File | Size | Description |
|---|---|---|
| `config.json` | 214 B | MNN runtime configuration |
| `llm_config.json` | ~1 KB | Model architecture parameters and prompt templates |
| `llm.mnn` | 3.4 MB | MNN computation graph (FlatBuffers format) |
| `llm.mnn.weight` | 3.28 GB | Q4_0 quantized weights |
| `tokenizer.txt` | 6.1 MB | SentencePiece tokenizer (vocab size: 262144) |
## Usage

### With MNN llm_demo (CLI)
```bash
# Build MNN from source (requires CMake)
git clone https://github.com/alibaba/MNN.git
cd MNN
mkdir build && cd build
cmake .. -DMNN_LOW_MEMORY=ON -DMNN_BUILD_LLM=ON -DMNN_SUPPORT_TRANSFORMER_FUSE=ON
make -j$(nproc)

# Run a translation (pipe input as below, or run without a pipe for interactive mode)
echo "The weather is beautiful today." | ./llm_demo /path/to/config.json
```
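`llm_demo` reads `config.json` to locate the graph, weights, and runtime settings. The actual 214-byte file shipped with this repo is not reproduced here; the fragment below is only an illustrative sketch using field names from MNN's LLM runtime conventions, so verify the keys and values against the shipped file:

```json
{
    "llm_model": "llm.mnn",
    "llm_weight": "llm.mnn.weight",
    "backend_type": "cpu",
    "thread_num": 4,
    "precision": "low",
    "memory": "low"
}
```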
### With MNN C++ API (for app integration)
```cpp
#include "llm.hpp"

// Create and load the model
std::unique_ptr<Llm> llm(Llm::createLLM("/path/to/config.json"));
llm->load();

// Translate (the prompt template is built into llm_config.json)
auto response = llm->response("Hello, how are you?");
// Output: Chinese translation
```
## Customizing Translation Direction
The default `llm_config.json` is configured for English-to-Chinese translation. To change the translation direction, modify the `system_prompt_template` and `user_prompt_template` fields in `llm_config.json`.
For example, to translate from Japanese to English:
```json
{
  "system_prompt_template": "You are a professional Japanese (ja) to English (en) translator. Your goal is to accurately convey the meaning and nuances of the original Japanese text while adhering to English grammar, vocabulary, and cultural sensitivities.\n\n",
  "user_prompt_template": "Produce only the English translation, without any additional explanations or commentary. Please translate the following Japanese text into English:\n\n\n%s<end_of_turn>\n"
}
```
TranslateGemma supports translation between 100+ languages. Use ISO 639-1 language codes (e.g., `en`, `zh`, `ja`, `ko`, `fr`, `de`, `es`).
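Switching the language pair amounts to rewriting the two template strings. A minimal sketch in Python (the helper name `set_translation_direction` and the stand-in dict are illustrative; in a real app you would `json.load()` the shipped `llm_config.json` first and write the patched result back):

```python
import json

def set_translation_direction(cfg, src_name, src_code, tgt_name, tgt_code):
    """Rewrite the two prompt-template fields of an llm_config.json dict
    for a new source/target language pair, mirroring the JSON example above."""
    cfg["system_prompt_template"] = (
        f"You are a professional {src_name} ({src_code}) to {tgt_name} ({tgt_code}) "
        f"translator. Your goal is to accurately convey the meaning and nuances of "
        f"the original {src_name} text while adhering to {tgt_name} grammar, "
        f"vocabulary, and cultural sensitivities.\n\n"
    )
    cfg["user_prompt_template"] = (
        f"Produce only the {tgt_name} translation, without any additional "
        f"explanations or commentary. Please translate the following {src_name} "
        f"text into {tgt_name}:\n\n\n%s<end_of_turn>\n"
    )
    return cfg

# Minimal stand-in for the loaded llm_config.json, so the sketch is self-contained
cfg = {"system_prompt_template": "", "user_prompt_template": ""}
set_translation_direction(cfg, "Korean", "ko", "English", "en")

# Serialize back out; ensure_ascii=False keeps any non-ASCII text readable
patched = json.dumps(cfg, ensure_ascii=False, indent=2)
print(patched)
```

Note that `%s` must survive verbatim in `user_prompt_template`, since MNN substitutes the input text there at inference time.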
## Translation Examples
| English Input | Chinese Output |
|---|---|
| The weather is beautiful today, let's go for a walk in the park. | 今天天气很好,我们一起去公园散步吧。 |
| Artificial intelligence is transforming how we work and live. | 人工智能正在改变我们工作和生活。 |
| The quick brown fox jumps over the lazy dog near the riverbank at sunset. | 那只敏捷的狐狸跳过河岸边,在日落时。 |
## Conversion Process
This model was converted using a hybrid approach because MNN's standard `llmexport.py` tool produces a computation graph with too few Cast operations for TranslateGemma (1 Cast op vs. the required 137), causing segfaults at runtime.
### Conversion Pipeline

```
TranslateGemma-4B (HuggingFace, Gemma3ForConditionalGeneration)
 |
 +-- Extract language_model (Gemma3TextModel) as standalone Gemma3ForCausalLM
 |
 +-- Convert to GGUF via llama.cpp (Q4_0 quantization)
 |     $ python convert_hf_to_gguf.py /path/to/translategemma-4b-text-only --outtype q4_0
 |
 +-- Use ModelScope's gemma-3-4b-it MNN model as graph donor
 |     (MNN/gemma-3-4b-it-q4_0-mnn provides a working llm.mnn with 137 Cast ops)
 |
 +-- Inject TranslateGemma weights into donor graph via gguf2mnn.py
 |     $ python gguf2mnn.py /path/to/translategemma.gguf /path/to/donor/llm.mnn ...
 |
 +-- Re-serialize with MNNConvert
       $ MNNConvert --modelFile llm_new.mnn --framework MNN --MNNModel llm.mnn
```
### Key Technical Notes

- **Graph structure reuse:** The MNN computation graph (`llm.mnn`) comes from ModelScope's `gemma-3-4b-it-q4_0-mnn`. Since TranslateGemma-4B shares the same Gemma 3 architecture (34 layers, hidden_size=2560), the graph is fully compatible; only the weight data differs.
- **Weight injection via `gguf2mnn.py`:** TranslateGemma weights (converted to GGUF Q4_0) are injected into the donor graph's weight file using MNN's `gguf2mnn.py` tool.
- **Jinja `chat_template` incompatibility:** TranslateGemma's original `chat_template` (a 6783-character Jinja2 template with language code mappings) causes MNN's rapidjson parser to fail silently when placed in `llm_config.json`. The solution is to use MNN's simpler `system_prompt_template`/`user_prompt_template` fields instead.
- **Tokenizer:** Uses ModelScope's Gemma 3 tokenizer (vocab size 262144). TranslateGemma's original tokenizer has 262208 tokens (64 extra), but no translation-quality issues have been observed with the smaller tokenizer.
## Known Limitations

- **Simplified Chinese output for zh-TW:** TranslateGemma maps all Chinese variants (`zh`, `zh-TW`, `zh-Hans`, `zh-Hant`) to "Chinese" internally, so output tends to be Simplified Chinese regardless of the language code specified. For Traditional Chinese, consider post-processing with OpenCC.
- **Fixed translation direction:** The prompt template in `llm_config.json` is hardcoded. To support dynamic language switching at runtime, the application must modify the prompt template programmatically via the MNN API.
- **Single-turn only:** Each translation request should be a fresh single-turn conversation. Multi-turn context may cause degraded output quality.
## License
This model inherits the license from the original TranslateGemma-4B model. Please refer to Google's model card for licensing details.
## Acknowledgments
- Google for the TranslateGemma-4B model
- Alibaba MNN for the Mobile Neural Network framework
- ModelScope for the reference Gemma 3 MNN model
- llama.cpp for GGUF conversion tools