Model Card for CALISTA-INDUSTRY/gemma_3_1B_reasoning_multimodal_en_ft_v2

Model Details

  • Developed by: Mohammad Yani & Rizky Sulaeman, Politeknik Negeri Indramayu
  • Model type: Fine-tuned multimodal large language model
  • Language(s): English
  • License: Apache 2.0
  • Finetuned from: Gemma 3 1B

Model Description

gemma_3_1B_reasoning_multimodal_en_ft_v2 is a fine-tuned version of the Gemma 3 model, enhanced for multimodal reasoning tasks. It integrates visual and textual inputs to perform complex reasoning, making it suitable for applications that require understanding and interpreting combined modalities.

Intended Uses & Limitations

Intended Uses

  • Visual Question Answering (VQA)
  • Image Captioning
  • Multimodal Dialogue Systems
  • Instruction Following with Visual Inputs

Limitations

  • Performance may degrade on non-English inputs.
  • May not generalize well to domains significantly different from the training data.
  • Not suitable for real-time applications without further optimization.

How to Use

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="CALISTA-INDUSTRY/gemma_3_1B_reasoning_multimodal_en_ft_v2")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)
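The pipeline above expects chat-style messages in which each user turn carries a list of typed content parts: an "image" part pointing at the input image, followed by a "text" part with the question. A minimal sketch of a helper that assembles such a message list; the helper name `build_vqa_messages` is an illustrative assumption, not part of this model card:

```python
# Hypothetical helper (for illustration only): builds the chat-style
# message list that the "image-text-to-text" pipeline accepts.
def build_vqa_messages(image_url: str, question: str) -> list[dict]:
    # One user turn whose content is a list of typed parts:
    # first the image to reason over, then the question about it.
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "url": image_url},
                {"type": "text", "text": question},
            ],
        }
    ]

messages = build_vqa_messages(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG",
    "What animal is on the candy?",
)
```

The resulting list can be passed directly as `pipe(text=messages)`, exactly as in the snippet above.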
Model Stats

  • Downloads last month: 24
  • Model size: 1.0B params (Safetensors)
  • Tensor types: BF16, F16
