Quantized TranslateGemma-4B-IT Model
This repository provides quantized GGUF versions of the TranslateGemma-4B-IT model, an instruction-tuned multilingual translation model designed for high-quality language translation and cross-lingual understanding. Built on the Gemma architecture, the model balances strong translation accuracy with efficient inference at a moderate parameter scale.
Model Overview
- Model Name: TranslateGemma-4B-IT
- Parameter Count: 4 Billion
- Architecture: Decoder-only transformer
- Base Model: Gemma-4B
- Training Type: Instruction-tuned (IT)
- Modalities: Text only
- Developer: Google
- Supported Languages: Multiple (high-resource and selected low-resource languages)
Instruction Tuning Details
TranslateGemma-4B-IT has been fine-tuned using instruction-style prompts that emphasize translation tasks. This enables the model to:
- Follow explicit translation instructions (e.g., tone, formality, domain)
- Handle multi-sentence and paragraph-level translations
- Perform language-to-language translation without requiring task-specific formatting
The instruction-tuning process improves controllability and consistency compared to base multilingual language models.
Key Features
- High-quality multilingual translation across diverse language pairs
- Instruction-following behavior for translation-related prompts
- Strong contextual awareness for idioms and longer passages
- Suitable for both interactive and batch translation workflows
- Optimized for research and production prototyping
Quantization Details
Q4_K_M Version
- Approximately 73% reduction in model size compared to full-precision weights
- Significantly reduced memory usage (~2.32 GB)
- Well-suited for CPU inference, edge deployments, and low-VRAM GPUs
- Minor degradation may appear in nuanced translation quality or long-form generation
Q5_K_M Version
- Approximately 69% reduction in size relative to full precision
- Improved output fidelity with a moderate memory footprint (~2.64 GB)
- Stronger preservation of translation accuracy and linguistic coherence
- Recommended when output quality is preferred over maximum compression
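The size figures above can be sanity-checked with a back-of-the-envelope estimate: file size is roughly parameter count times average bits per weight. The effective bits-per-weight values below are typical averages for K-quant mixes and are assumptions, not values published for this repository, so the results will not match the listed sizes exactly.

```python
# Rough size estimate for quantized GGUF files.
# Assumption: ~4.8 bits/weight for Q4_K_M and ~5.5 for Q5_K_M, which are
# typical K-quant averages, not figures published for this model.

PARAMS = 4e9  # 4B parameters (from the model overview)

def approx_size_gb(params: float, bits_per_weight: float) -> float:
    """Approximate file size in GB: parameters * average bits per weight."""
    return params * bits_per_weight / 8 / 1e9

fp16 = approx_size_gb(PARAMS, 16)      # full-precision baseline
q4_k_m = approx_size_gb(PARAMS, 4.8)   # assumed Q4_K_M average
q5_k_m = approx_size_gb(PARAMS, 5.5)   # assumed Q5_K_M average

print(f"fp16   = {fp16:.1f} GB")
print(f"Q4_K_M = {q4_k_m:.1f} GB ({1 - q4_k_m / fp16:.0%} smaller)")
print(f"Q5_K_M = {q5_k_m:.1f} GB ({1 - q5_k_m / fp16:.0%} smaller)")
```

The estimate ignores embedding tables and metadata, which is one reason the actual GGUF files differ slightly from this arithmetic.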
Usage
This model is intended for developers and researchers working on multilingual applications, translation systems, or cross-lingual NLP tasks.
Example (text-only inference):
llama.cpp (text-only):

```shell
./llama-cli -hf SandLogicTechnologies/TranslateGemma-4B-IT-GGUF \
  -p "Translate the following text from English to French: The future of AI depends on responsible innovation."
```
The model will generate a fluent translation while respecting the instruction context.
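For the batch translation workflows mentioned earlier, one option is to generate a `llama-cli` invocation per input text. The helper below is a hypothetical sketch: it only builds the command lists (it does not run the model), reusing the repository id and flags from the example above; verify the flags against your llama.cpp build.

```python
# Hypothetical batch helper: builds one llama-cli command per input text.
# It does not execute anything; pass the lists to subprocess.run to invoke.
import shlex

REPO = "SandLogicTechnologies/TranslateGemma-4B-IT-GGUF"  # from this card

def build_command(text: str, src: str, dst: str) -> list[str]:
    """Return a llama-cli argv for a single translation request."""
    prompt = f"Translate the following text from {src} to {dst}: {text}"
    return ["./llama-cli", "-hf", REPO, "-p", prompt]

commands = [build_command(t, "English", "German")
            for t in ["Good morning.", "See you tomorrow."]]
for cmd in commands:
    print(shlex.join(cmd))
```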
Dataset Overview
TranslateGemma-4B-IT is trained on a mixture of:
- Curated multilingual parallel corpora
- Instruction-based translation datasets
- High-quality web and licensed text data covering multiple domains
The data mixture emphasizes translation accuracy, linguistic diversity, and robustness across domains such as news, technical writing, and conversational text.
Recommended Use Cases
This model is optimized for multilingual and translation-centric workflows, including:
- Machine translation systems: build translation pipelines for applications, services, or internal tools.
- Cross-lingual assistants: enable chatbots or agents to respond in multiple languages.
- Localization workflows: support document, UI, or content localization across regions.
- Research and evaluation: study instruction-following behavior and multilingual generalization.
Acknowledgments
These quantized models are based on the original work by the Google development team.
Special thanks to:
- The Google team for developing and releasing the TranslateGemma-4B-IT model.
- Georgi Gerganov and the entire llama.cpp open-source community for enabling efficient model quantization and inference via the GGUF format.
Contact
For any inquiries or support, please contact us at support@sandlogic.com or visit our website.
Base model: google/translategemma-4b-it