Model Card for CALISTA-INDUSTRY/gemma_3_1B_reasoning_multimodal_en_ft_v2

Model Details

  • Developed by: Mohammad Yani & Rizky Sulaeman, Politeknik Negeri Indramayu
  • Model type: Fine-tuned multimodal large language model
  • Language(s): English
  • License: Apache 2.0
  • Finetuned from: Gemma 3 1B

Model Description

gemma_3_1B_reasoning_multimodal_en_ft_v2 is a fine-tuned version of the Gemma 3 model, enhanced for multimodal reasoning tasks. It integrates visual and textual inputs to perform complex reasoning, making it suitable for applications that require understanding and interpreting combined modalities.

Intended Uses & Limitations

Intended Uses

  • Visual Question Answering (VQA)
  • Image Captioning
  • Multimodal Dialogue Systems
  • Instruction Following with Visual Inputs

Limitations

  • Performance may degrade on non-English inputs.
  • May not generalize well to domains significantly different from the training data.
  • Not suitable for real-time applications without further optimization.

How to Use

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="CALISTA-INDUSTRY/gemma_3_1B_reasoning_multimodal_en_ft_v2")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)
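The pipeline above expects chat-style messages in which each user turn carries a list of typed content parts: an "image" part pointing at the input image, followed by a "text" part with the question. A minimal sketch of a helper that assembles such a message list; the helper name `build_vqa_messages` is an illustrative assumption, not part of this model card:

```python
# Hypothetical helper (for illustration only): builds the chat-style
# message list that the "image-text-to-text" pipeline accepts.
def build_vqa_messages(image_url: str, question: str) -> list[dict]:
    # One user turn whose content is a list of typed parts:
    # first the image to reason over, then the question about it.
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "url": image_url},
                {"type": "text", "text": question},
            ],
        }
    ]

messages = build_vqa_messages(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG",
    "What animal is on the candy?",
)
```

The resulting list can be passed directly as `pipe(text=messages)`, exactly as in the snippet above.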
Model Stats

  • Downloads last month: 24
  • Model size: 1.0B params (Safetensors)
  • Tensor types: BF16, F16
