GemmaTranslate-v3-12B

GemmaTranslate-v3-12B is a high-performance, multimodal hybrid model designed for advanced translation and general-purpose assistance. It is a SLERP merge of two state-of-the-art models: Gemma-3-12B-IT and TranslateGemma-12B-IT.

By combining the general reasoning and multimodal capabilities of Gemma-3 with the specialized translation expertise of TranslateGemma, this model offers superior performance in multilingual contexts while maintaining the ability to process visual information.

Model Details

Key Features

  • Multimodal Capabilities: Inherits the vision tower and multimodal projector from Gemma-3-12B-IT, enabling image understanding and visual reasoning.
  • Enhanced Translation: Optimized for translation tasks by merging with TranslateGemma-12B-IT.
  • Large Context Window: Supports up to 128K tokens.
  • Hybrid Reasoning: Combines instruction-following capabilities with high-fidelity translation.

Merge Configuration

The following YAML configuration was used to produce this model:

```yaml
base_model: google/gemma-3-12b-it
dtype: bfloat16
merge_method: slerp
parameters:
  t:
    - filter: vision_tower
      value: 0
    - filter: multi_modal_projector
      value: 0
    - value: 0.5
models:
  - model: google/gemma-3-12b-it
  - model: google/translategemma-12b-it
```

Note: The vision_tower and multi_modal_projector weights are taken 100% from Gemma-3-12B-IT (t=0) to ensure visual stability, while the language model weights are a 50/50 SLERP merge (t=0.5).
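For intuition, SLERP (spherical linear interpolation) blends two weight tensors along the arc between them rather than along a straight line, which better preserves the magnitude of the parent weights than plain averaging. The following is a minimal pure-Python sketch on flat vectors, illustrative only; mergekit applies this tensor-by-tensor and handles edge cases more carefully:

```python
import math

def slerp(a, b, t):
    """Spherical linear interpolation between vectors a and b at fraction t."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    cos_theta = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    theta = math.acos(cos_theta)
    if theta < 1e-6:  # near-parallel vectors: fall back to linear interpolation
        return [(1 - t) * x + t * y for x, y in zip(a, b)]
    s = math.sin(theta)
    w_a = math.sin((1 - t) * theta) / s
    w_b = math.sin(t * theta) / s
    return [w_a * x + w_b * y for x, y in zip(a, b)]

# t=0 returns the first model's weights unchanged (as done here for the
# vision tower and projector); t=0.5 is the midpoint used for the
# language-model weights.
print(slerp([1.0, 0.0], [0.0, 1.0], 0.0))   # [1.0, 0.0]
print(slerp([1.0, 0.0], [0.0, 1.0], 0.5))
```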

Usage

You can use this model with the Hugging Face transformers library.

```python
from transformers import Gemma3ForConditionalGeneration, AutoProcessor
import torch

model_id = "SpongeBOB9684/GemmaTranslate-v3-12B"

model = Gemma3ForConditionalGeneration.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
processor = AutoProcessor.from_pretrained(model_id)

# Example: Multimodal Translation
# (Load your image first, then pass it alongside the prompt)
# prompt = "Translate the text in this image to French and explain its context."
# inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)

# Example: Text-only Translation
prompt = "Translate the following English text to Japanese: 'The future of AI is multimodal.'"
inputs = processor(text=prompt, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=256)
print(processor.decode(output[0], skip_special_tokens=True))
```
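Since this is an instruction-tuned model, requests are typically phrased in the chat-template message format that Gemma-3 processors accept. The snippet below sketches the message structure for a multimodal request; the field names follow the Gemma-3 model card, so confirm them against your installed transformers version, and the image path is a placeholder:

```python
# Message structure for a multimodal request in Gemma-3's chat format.
# "image" may be a file path, URL, or PIL.Image object.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "path/to/sign.jpg"},
            {"type": "text", "text": "Translate the text in this image to French."},
        ],
    }
]

# With the model and processor loaded as shown above:
# inputs = processor.apply_chat_template(
#     messages, add_generation_prompt=True, tokenize=True,
#     return_dict=True, return_tensors="pt",
# ).to(model.device)
# output = model.generate(**inputs, max_new_tokens=256)
# # Slice off the prompt tokens so only the model's reply is decoded:
# print(processor.decode(output[0][inputs["input_ids"].shape[-1]:],
#                        skip_special_tokens=True))
```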

License

This model is subject to the Gemma Terms of Use. By using this model, you agree to the terms and conditions specified by Google.
