# TranslateGemma-4B MNN Q4_0
This is a Q4_0 quantized MNN conversion of Google's TranslateGemma-4B model for on-device translation inference.
## Model Details
- Base Model: google/translategemma-4b
- Architecture: Gemma 3 Text (gemma3_text), 34 layers, hidden_size=2560
- Quantization: Q4_0 (4-bit)
- Framework: MNN (Mobile Neural Network) by Alibaba
- Weight Size: ~3.28 GB
- Total Size: ~3.3 GB (including graph + tokenizer)
## Files

| File | Size | Description |
|---|---|---|
| `config.json` | 214 B | MNN runtime configuration |
| `llm_config.json` | ~1 KB | Model architecture parameters and prompt templates |
| `llm.mnn` | 3.4 MB | MNN computation graph (FlatBuffers format) |
| `llm.mnn.weight` | 3.28 GB | Q4_0 quantized weights |
| `tokenizer.txt` | 6.1 MB | SentencePiece tokenizer (vocab size: 262144) |
## Usage

### With MNN llm_demo (CLI)
```bash
# Build MNN from source (requires CMake)
git clone https://github.com/alibaba/MNN.git
cd MNN
mkdir build && cd build
cmake .. -DMNN_LOW_MEMORY=ON -DMNN_BUILD_LLM=ON -DMNN_SUPPORT_TRANSFORMER_FUSE=ON
make -j$(nproc)

# Run a translation (pipe input as below, or run without a pipe for interactive mode)
echo "The weather is beautiful today." | ./llm_demo /path/to/config.json
```
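`llm_demo` reads `config.json` to locate the graph, weights, and runtime settings. The actual 214-byte file shipped with this repo is not reproduced here; the fragment below is only an illustrative sketch using field names from MNN's LLM runtime conventions, so verify the keys and values against the shipped file:

```json
{
    "llm_model": "llm.mnn",
    "llm_weight": "llm.mnn.weight",
    "backend_type": "cpu",
    "thread_num": 4,
    "precision": "low",
    "memory": "low"
}
```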
### With MNN C++ API (for app integration)
```cpp
#include "llm.hpp"

// Create and load the model
std::unique_ptr<Llm> llm(Llm::createLLM("/path/to/config.json"));
llm->load();

// Translate (the prompt template is built into llm_config.json)
auto response = llm->response("Hello, how are you?");
// Output: Chinese translation
```
## Customizing Translation Direction
The default `llm_config.json` is configured for English-to-Chinese translation. To change the translation direction, modify the `system_prompt_template` and `user_prompt_template` fields in `llm_config.json`.
For example, to translate from Japanese to English:
```json
{
  "system_prompt_template": "You are a professional Japanese (ja) to English (en) translator. Your goal is to accurately convey the meaning and nuances of the original Japanese text while adhering to English grammar, vocabulary, and cultural sensitivities.\n\n",
  "user_prompt_template": "Produce only the English translation, without any additional explanations or commentary. Please translate the following Japanese text into English:\n\n\n%s<end_of_turn>\n"
}
```
TranslateGemma supports translation between 100+ languages. Use ISO 639-1 language codes (e.g., `en`, `zh`, `ja`, `ko`, `fr`, `de`, `es`).
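Switching the language pair amounts to rewriting the two template strings. A minimal sketch in Python (the helper name `set_translation_direction` and the stand-in dict are illustrative; in a real app you would `json.load()` the shipped `llm_config.json` first and write the patched result back):

```python
import json

def set_translation_direction(cfg, src_name, src_code, tgt_name, tgt_code):
    """Rewrite the two prompt-template fields of an llm_config.json dict
    for a new source/target language pair, mirroring the JSON example above."""
    cfg["system_prompt_template"] = (
        f"You are a professional {src_name} ({src_code}) to {tgt_name} ({tgt_code}) "
        f"translator. Your goal is to accurately convey the meaning and nuances of "
        f"the original {src_name} text while adhering to {tgt_name} grammar, "
        f"vocabulary, and cultural sensitivities.\n\n"
    )
    cfg["user_prompt_template"] = (
        f"Produce only the {tgt_name} translation, without any additional "
        f"explanations or commentary. Please translate the following {src_name} "
        f"text into {tgt_name}:\n\n\n%s<end_of_turn>\n"
    )
    return cfg

# Minimal stand-in for the loaded llm_config.json, so the sketch is self-contained
cfg = {"system_prompt_template": "", "user_prompt_template": ""}
set_translation_direction(cfg, "Korean", "ko", "English", "en")

# Serialize back out; ensure_ascii=False keeps any non-ASCII text readable
patched = json.dumps(cfg, ensure_ascii=False, indent=2)
print(patched)
```

Note that `%s` must survive verbatim in `user_prompt_template`, since MNN substitutes the input text there at inference time.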
## Translation Examples
| English Input | Chinese Output |
|---|---|
| The weather is beautiful today, let's go for a walk in the park. | 今天天气很好,我们一起去公园散步吧。 |
| Artificial intelligence is transforming how we work and live. | 人工智能正在改变我们工作和生活。 |
| The quick brown fox jumps over the lazy dog near the riverbank at sunset. | 那只敏捷的狐狸跳过河岸边,在日落时。 |
## Conversion Process
This model was converted using a hybrid approach because MNN's standard `llmexport.py` tool produces a computation graph with too few Cast operations for TranslateGemma (1 Cast op vs. the required 137), causing segfaults at runtime.
### Conversion Pipeline

```
TranslateGemma-4B (HuggingFace, Gemma3ForConditionalGeneration)
 |
 +-- Extract language_model (Gemma3TextModel) as standalone Gemma3ForCausalLM
 |
 +-- Convert to GGUF via llama.cpp (Q4_0 quantization)
 |     $ python convert_hf_to_gguf.py /path/to/translategemma-4b-text-only --outtype q4_0
 |
 +-- Use ModelScope's gemma-3-4b-it MNN model as graph donor
 |     (MNN/gemma-3-4b-it-q4_0-mnn provides a working llm.mnn with 137 Cast ops)
 |
 +-- Inject TranslateGemma weights into donor graph via gguf2mnn.py
 |     $ python gguf2mnn.py /path/to/translategemma.gguf /path/to/donor/llm.mnn ...
 |
 +-- Re-serialize with MNNConvert
       $ MNNConvert --modelFile llm_new.mnn --framework MNN --MNNModel llm.mnn
```
### Key Technical Notes

- **Graph structure reuse:** The MNN computation graph (`llm.mnn`) comes from ModelScope's `gemma-3-4b-it-q4_0-mnn`. Since TranslateGemma-4B shares the same Gemma 3 architecture (34 layers, hidden_size=2560), the graph is fully compatible; only the weight data differs.
- **Weight injection via `gguf2mnn.py`:** TranslateGemma weights (converted to GGUF Q4_0) are injected into the donor graph's weight file using MNN's `gguf2mnn.py` tool.
- **Jinja `chat_template` incompatibility:** TranslateGemma's original `chat_template` (a 6783-character Jinja2 template with language code mappings) causes MNN's rapidjson parser to fail silently when placed in `llm_config.json`. The solution is to use MNN's simpler `system_prompt_template`/`user_prompt_template` fields instead.
- **Tokenizer:** Uses ModelScope's Gemma 3 tokenizer (vocab size 262144). TranslateGemma's original tokenizer has 262208 tokens (64 extra), but no translation-quality issues have been observed with the smaller tokenizer.
## Known Limitations

- **Simplified Chinese output for zh-TW:** TranslateGemma maps all Chinese variants (`zh`, `zh-TW`, `zh-Hans`, `zh-Hant`) to "Chinese" internally, so output tends to be Simplified Chinese regardless of the language code specified. For Traditional Chinese, consider post-processing with OpenCC.
- **Fixed translation direction:** The prompt template in `llm_config.json` is hardcoded. To support dynamic language switching at runtime, the application must modify the prompt template programmatically via the MNN API.
- **Single-turn only:** Each translation request should be a fresh single-turn conversation. Multi-turn context may cause degraded output quality.
## License
This model inherits the license from the original TranslateGemma-4B model. Please refer to Google's model card for licensing details.
## Acknowledgments
- Google for the TranslateGemma-4B model
- Alibaba MNN for the Mobile Neural Network framework
- ModelScope for the reference Gemma 3 MNN model
- llama.cpp for GGUF conversion tools