Model Card for CALISTA-INDUSTRY/gemma_3_1B_reasoning_multimodal_en_ft_v2
Model Details
- Developed by: Mohammad Yani & Rizky Sulaeman, Politeknik Negeri Indramayu
- Model type: Fine-tuned multimodal large language model
- Language(s): English
- License: Apache 2.0
- Finetuned from: Gemma 3 (1B)
Model Description
gemma_3_1B_reasoning_multimodal_en_ft_v2 is a fine-tuned version of Gemma 3 for multimodal reasoning. It takes visual and textual inputs together and reasons over both, which makes it suitable for applications that must interpret combined modalities.
Intended Uses & Limitations
Intended Uses
- Visual Question Answering (VQA)
- Image Captioning
- Multimodal Dialogue Systems
- Instruction Following with Visual Inputs
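The intended uses above differ mainly in how the chat messages are assembled. The following is a minimal sketch, assuming the same message schema as in the How to Use section (a `content` list of `image` and `text` parts). The helper names `user_turn`, `vqa_messages`, `caption_messages`, and `add_turn`, as well as the caption prompt wording, are illustrative and not part of the model's API.

```python
from typing import Any, Dict, List, Optional

Message = Dict[str, Any]


def user_turn(text: str, image_url: Optional[str] = None) -> Message:
    """Build one user message; the image part is optional."""
    content: List[Dict[str, str]] = []
    if image_url is not None:
        content.append({"type": "image", "url": image_url})
    content.append({"type": "text", "text": text})
    return {"role": "user", "content": content}


def vqa_messages(image_url: str, question: str) -> List[Message]:
    """Visual question answering: one image plus one question."""
    return [user_turn(question, image_url)]


def caption_messages(image_url: str) -> List[Message]:
    """Image captioning; the prompt wording is an assumption, not a required template."""
    return [user_turn("Describe this image in one sentence.", image_url)]


def add_turn(history: List[Message], reply: str, follow_up: str) -> List[Message]:
    """Multimodal dialogue: append the model's reply and a text-only follow-up.

    Returns a new list and leaves `history` unchanged.
    """
    assistant = {"role": "assistant", "content": [{"type": "text", "text": reply}]}
    return history + [assistant, user_turn(follow_up)]
```

Any of the returned lists can be passed as the `text` argument of the pipeline call shown in the How to Use section.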
Limitations
- Performance may degrade on non-English inputs.
- May not generalize well to domains significantly different from the training data.
- Not suitable for real-time applications without further optimization.
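Whether the model is fast enough for a given application is easiest to judge by measuring it on the target hardware. The following is a minimal sketch, assuming any zero-argument callable that runs one inference; `measure_latency` and its parameters are hypothetical names, and the commented usage assumes a `pipe` and `messages` created as in the How to Use section.

```python
import statistics
import time
from typing import Callable, Dict


def measure_latency(run_once: Callable[[], object], warmup: int = 1, runs: int = 5) -> Dict[str, float]:
    """Time repeated calls and return median and worst-case latency in seconds."""
    for _ in range(warmup):
        run_once()  # warm-up calls are excluded from the timings
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        run_once()
        timings.append(time.perf_counter() - start)
    return {"median_s": statistics.median(timings), "max_s": max(timings)}


# Example (not run here; requires the model to be loaded):
# stats = measure_latency(lambda: pipe(text=messages, max_new_tokens=64))
# print(stats)
```

Capping `max_new_tokens` keeps the runs comparable, since generation time grows with the length of the reply.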
How to Use
# Use a pipeline as a high-level helper
from transformers import pipeline

# The model ID below matches the name of this model card.
pipe = pipeline("image-text-to-text", model="CALISTA-INDUSTRY/gemma_3_1B_reasoning_multimodal_en_ft_v2")

# One user turn containing an image and a question about it
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"},
        ],
    },
]

# Run inference and show the raw pipeline result
output = pipe(text=messages)
print(output)
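The pipeline call returns a nested structure rather than a plain string. The following is a minimal sketch of pulling out the final reply, assuming the result is a list of dicts whose `generated_text` is either a string or the chat history with the assistant's reply last; `last_reply` is a hypothetical helper, and the exact output shape may vary across `transformers` versions.

```python
from typing import Any, Dict, List


def last_reply(output: List[Dict[str, Any]]) -> str:
    """Return the model's final reply from a pipeline result (assumed shape)."""
    generated = output[0]["generated_text"]
    if isinstance(generated, str):
        # Plain-string prompts are assumed to yield a plain string.
        return generated
    # Chat-style input: assumed to yield the conversation with the reply last.
    content = generated[-1]["content"]
    if isinstance(content, str):
        return content
    # Content may also be a list of typed parts; join the text parts.
    return "".join(part.get("text", "") for part in content if part.get("type") == "text")


# Example (not run here; requires the model to be loaded):
# print(last_reply(pipe(text=messages)))
```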
Model tree for CALISTA-INDUSTRY/gemma_3_1B_reasoning_en_ft_v1
- Base model: google/gemma-3-1b-pt
- Finetuned: google/gemma-3-1b-it
- Quantized: unsloth/gemma-3-1b-it-unsloth-bnb-4bit