GemmaTranslate-v3-12B

GemmaTranslate-v3-12B is a high-performance, multimodal hybrid model designed for advanced translation and general-purpose assistance. It is a SLERP merge of two state-of-the-art models: Gemma-3-12B-IT and TranslateGemma-12B-IT.

By combining the general reasoning and multimodal capabilities of Gemma-3 with the specialized translation expertise of TranslateGemma, this model aims to deliver strong performance in multilingual contexts while retaining the ability to process visual information.

Model Details

Key Features

  • Multimodal Capabilities: Inherits the vision tower and multimodal projector from Gemma-3-12B-IT, enabling image understanding and visual reasoning.
  • Enhanced Translation: Optimized for translation tasks by merging with TranslateGemma-12B-IT.
  • Large Context Window: Supports up to 128K tokens.
  • Hybrid Reasoning: Combines instruction-following capabilities with high-fidelity translation.

Merge Configuration

The following YAML configuration was used to produce this model:

base_model: google/gemma-3-12b-it
dtype: bfloat16
merge_method: slerp
parameters:
  t:
    - filter: vision_tower
      value: 0
    - filter: multi_modal_projector
      value: 0
    - value: 0.5
models:
  - model: google/gemma-3-12b-it
  - model: google/translategemma-12b-it

Note: The vision_tower and multi_modal_projector weights are taken 100% from Gemma-3-12B-IT (t=0) to ensure visual stability, while the language model weights are a 50/50 SLERP merge (t=0.5).
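For intuition, SLERP interpolates along the arc between two weight vectors rather than along the straight line between them, which better preserves weight magnitudes. The following is a minimal illustrative sketch of SLERP on flattened tensors using NumPy; it is not mergekit's actual implementation, but it shows how the `t` values in the config above behave (t=0 returns the first model's weights unchanged, t=0.5 blends both equally).

```python
import numpy as np

def slerp(t, a, b, eps=1e-8):
    """Spherical linear interpolation between two flattened weight tensors."""
    a_n = a / (np.linalg.norm(a) + eps)
    b_n = b / (np.linalg.norm(b) + eps)
    dot = np.clip(np.dot(a_n, b_n), -1.0, 1.0)
    theta = np.arccos(dot)
    if theta < eps:
        # Nearly parallel vectors: fall back to linear interpolation.
        return (1 - t) * a + t * b
    sin_theta = np.sin(theta)
    return (np.sin((1 - t) * theta) / sin_theta) * a + \
           (np.sin(t * theta) / sin_theta) * b

w_a = np.array([1.0, 0.0])  # stand-in for a Gemma-3 weight tensor
w_b = np.array([0.0, 1.0])  # stand-in for a TranslateGemma weight tensor

print(slerp(0.0, w_a, w_b))  # -> [1. 0.]  (first model's weights, as for vision_tower)
print(slerp(0.5, w_a, w_b))  # equal blend, as for the language model weights
```

In the actual merge, mergekit applies this interpolation per parameter tensor, with the `filter` entries routing the vision tower and projector tensors to t=0.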

Usage

You can use this model with the Hugging Face transformers library.

from transformers import Gemma3ForConditionalGeneration, AutoProcessor
import torch

model_id = "SpongeBOB9684/GemmaTranslate-v3-12B"

model = Gemma3ForConditionalGeneration.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
processor = AutoProcessor.from_pretrained(model_id)

# Example 1: Multimodal translation (uncomment and supply your own image)
# prompt = "Translate the text in this image to French and explain its context."
# inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)

# Example 2: Text-only translation
prompt = "Translate the following English text to Japanese: 'The future of AI is multimodal.'"
inputs = processor(text=prompt, return_tensors="pt").to(model.device)

with torch.inference_mode():
    output = model.generate(**inputs, max_new_tokens=256)
print(processor.decode(output[0], skip_special_tokens=True))

License

This model is subject to the Gemma Terms of Use. By using this model, you agree to the terms and conditions specified by Google.

