# GemmaTranslate-v3-12B
GemmaTranslate-v3-12B is a high-performance, multimodal hybrid model designed for advanced translation and general-purpose assistance. It is a SLERP merge of two state-of-the-art models: Gemma-3-12B-IT and TranslateGemma-12B-IT.
By combining the general reasoning and multimodal capabilities of Gemma-3 with the specialized translation expertise of TranslateGemma, this model offers superior performance in multilingual contexts while maintaining the ability to process visual information.
## Model Details
- Architecture: Gemma-3 (Multimodal, 128K context length)
- Merge Method: SLERP (Spherical Linear Interpolation)
- Base Models: google/gemma-3-12b-it, google/translategemma-12b-it
- Language(s): Multilingual (Supports 30+ languages)
- License: Gemma Terms of Use
## Key Features
- Multimodal Capabilities: Inherits the vision tower and multimodal projector from Gemma-3-12B-IT, enabling image understanding and visual reasoning.
- Enhanced Translation: Optimized for translation tasks by merging with TranslateGemma-12B-IT.
- Large Context Window: Supports up to 128K tokens.
- Hybrid Reasoning: Combines instruction-following capabilities with high-fidelity translation.
## Merge Configuration
The following YAML configuration was used to produce this model:
```yaml
base_model: google/gemma-3-12b-it
dtype: bfloat16
merge_method: slerp
models:
  - model: google/gemma-3-12b-it
  - model: google/translategemma-12b-it
parameters:
  t:
    - filter: vision_tower
      value: 0
    - filter: multi_modal_projector
      value: 0
    - value: 0.5
```
**Note:** The `vision_tower` and `multi_modal_projector` weights are taken 100% from Gemma-3-12B-IT (t=0) to ensure visual stability, while the language-model weights are a 50/50 SLERP merge (t=0.5).
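To make the filter logic above concrete, here is a minimal sketch of how a SLERP merge applies a per-tensor interpolation factor. The `slerp` and `t_for` helpers are illustrative, not mergekit's actual implementation; real merges operate on each checkpoint tensor in turn.

```python
import numpy as np

def slerp(t, a, b, eps=1e-8):
    """Spherical linear interpolation between two weight tensors,
    treating each tensor as a single vector on the unit sphere."""
    a_flat, b_flat = a.ravel(), b.ravel()
    a_dir = a_flat / (np.linalg.norm(a_flat) + eps)
    b_dir = b_flat / (np.linalg.norm(b_flat) + eps)
    dot = np.clip(np.dot(a_dir, b_dir), -1.0, 1.0)
    theta = np.arccos(dot)
    if theta < eps:
        # Nearly parallel tensors: fall back to plain linear interpolation.
        return (1 - t) * a + t * b
    s = np.sin(theta)
    return (np.sin((1 - t) * theta) / s) * a + (np.sin(t * theta) / s) * b

def t_for(param_name):
    """Per-tensor factor mirroring the YAML filters above."""
    if "vision_tower" in param_name or "multi_modal_projector" in param_name:
        return 0.0  # keep Gemma-3-12B-IT's visual weights untouched
    return 0.5      # 50/50 blend for language-model weights
```

With `t=0` the merge returns the first model's tensor exactly, which is why the vision components remain pure Gemma-3-12B-IT.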
## Usage
You can use this model with the Hugging Face transformers library.
```python
from transformers import Gemma3ForConditionalGeneration, AutoProcessor
import torch

model_id = "SpongeBOB9684/GemmaTranslate-v3-12B"

model = Gemma3ForConditionalGeneration.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
processor = AutoProcessor.from_pretrained(model_id)

# Example: multimodal translation (supply your own image)
# prompt = "Translate the text in this image to French and explain its context."
# inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)

# Example: text-only translation
prompt = "Translate the following English text to Japanese: 'The future of AI is multimodal.'"
inputs = processor(text=prompt, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=256)
print(processor.decode(output[0], skip_special_tokens=True))
```
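For multimodal prompts, Gemma-3 processors also accept the chat-template message format. The sketch below only builds the message structure (the `build_translation_message` helper and `photo.jpg` path are illustrative); the actual processor and generate calls are commented out since they require the model weights.

```python
def build_translation_message(text, image=None):
    """Build a single user turn; `image` may be a PIL image, URL, or file path."""
    content = []
    if image is not None:
        content.append({"type": "image", "image": image})
    content.append({"type": "text", "text": text})
    return [{"role": "user", "content": content}]

messages = build_translation_message(
    "Translate the text in this image to French.",
    image="photo.jpg",
)
# inputs = processor.apply_chat_template(
#     messages, add_generation_prompt=True, tokenize=True,
#     return_dict=True, return_tensors="pt",
# ).to(model.device)
# output = model.generate(**inputs, max_new_tokens=256)
```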
## License
This model is subject to the Gemma Terms of Use. By using this model, you agree to the terms and conditions specified by Google.