GemmaTranslate-v3-12B

GemmaTranslate-v3-12B is a high-performance, multimodal hybrid model designed for advanced translation and general-purpose assistance. It is a SLERP merge of two state-of-the-art models: Gemma-3-12B-IT and TranslateGemma-12B-IT.

By combining the general reasoning and multimodal capabilities of Gemma-3 with the specialized translation expertise of TranslateGemma, this model offers superior performance in multilingual contexts while maintaining the ability to process visual information.

Model Details

Key Features

  • Multimodal Capabilities: Inherits the vision tower and multimodal projector from Gemma-3-12B-IT, enabling image understanding and visual reasoning.
  • Enhanced Translation: Optimized for translation tasks by merging with TranslateGemma-12B-IT.
  • Large Context Window: Supports up to 128K tokens.
  • Hybrid Reasoning: Combines instruction-following capabilities with high-fidelity translation.

Merge Configuration

The following YAML configuration was used to produce this model:

```yaml
base_model: google/gemma-3-12b-it
dtype: bfloat16
merge_method: slerp
parameters:
  t:
    - filter: vision_tower
      value: 0
    - filter: multi_modal_projector
      value: 0
    - value: 0.5
models:
  - model: google/gemma-3-12b-it
  - model: google/translategemma-12b-it
```

Note: The vision_tower and multi_modal_projector weights are taken 100% from Gemma-3-12B-IT (t=0) to ensure visual stability, while the language model weights are a 50/50 SLERP merge (t=0.5).
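For intuition, SLERP (spherical linear interpolation) blends two weight tensors along the arc between them rather than along a straight line, which better preserves the magnitude of the parent weights than plain averaging. The following is a minimal pure-Python sketch on flat vectors, illustrative only; mergekit applies this tensor-by-tensor and handles edge cases more carefully:

```python
import math

def slerp(a, b, t):
    """Spherical linear interpolation between vectors a and b at fraction t."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    cos_theta = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    theta = math.acos(cos_theta)
    if theta < 1e-6:  # near-parallel vectors: fall back to linear interpolation
        return [(1 - t) * x + t * y for x, y in zip(a, b)]
    s = math.sin(theta)
    w_a = math.sin((1 - t) * theta) / s
    w_b = math.sin(t * theta) / s
    return [w_a * x + w_b * y for x, y in zip(a, b)]

# t=0 returns the first model's weights unchanged (as done here for the
# vision tower and projector); t=0.5 is the midpoint used for the
# language-model weights.
print(slerp([1.0, 0.0], [0.0, 1.0], 0.0))   # [1.0, 0.0]
print(slerp([1.0, 0.0], [0.0, 1.0], 0.5))
```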

Usage

You can use this model with the Hugging Face transformers library.

```python
from transformers import Gemma3ForConditionalGeneration, AutoProcessor
import torch

model_id = "SpongeBOB9684/GemmaTranslate-v3-12B"

model = Gemma3ForConditionalGeneration.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
processor = AutoProcessor.from_pretrained(model_id)

# Example: Multimodal Translation
# (Load your image first, then pass it alongside the prompt)
# prompt = "Translate the text in this image to French and explain its context."
# inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)

# Example: Text-only Translation
prompt = "Translate the following English text to Japanese: 'The future of AI is multimodal.'"
inputs = processor(text=prompt, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=256)
print(processor.decode(output[0], skip_special_tokens=True))
```
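Since this is an instruction-tuned model, requests are typically phrased in the chat-template message format that Gemma-3 processors accept. The snippet below sketches the message structure for a multimodal request; the field names follow the Gemma-3 model card, so confirm them against your installed transformers version, and the image path is a placeholder:

```python
# Message structure for a multimodal request in Gemma-3's chat format.
# "image" may be a file path, URL, or PIL.Image object.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "path/to/sign.jpg"},
            {"type": "text", "text": "Translate the text in this image to French."},
        ],
    }
]

# With the model and processor loaded as shown above:
# inputs = processor.apply_chat_template(
#     messages, add_generation_prompt=True, tokenize=True,
#     return_dict=True, return_tensors="pt",
# ).to(model.device)
# output = model.generate(**inputs, max_new_tokens=256)
# # Slice off the prompt tokens so only the model's reply is decoded:
# print(processor.decode(output[0][inputs["input_ids"].shape[-1]:],
#                        skip_special_tokens=True))
```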

License

This model is subject to the Gemma Terms of Use. By using this model, you agree to the terms and conditions specified by Google.
