# Multimodal-Social-Response

This model is a fine-tuned version of google/paligemma2-3b-pt-224 on hamzasibous/reddit-dataset.
- Developed by: Hamza Sibous
- Model type: VLM
- Base Model: PaliGemma 2 (3B parameters)
- Fine-tuning Method: LoRA (11M trainable parameters)
## Intended Use
This model is intended for research into social sentiment and multimodal conversational AI.
- Input: Image + Text Caption (Task Prefix)
- Output: Predicted Comment
## Training Data
The model was trained on a custom-curated dataset of 12,900 Reddit entries collected via PRAW. Each entry consists of:
- Image: The original post image.
- Caption: The title of the Reddit post.
- Label (Comment): The top-voted human comment.
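Conceptually, each entry maps to a (prompt, target) pair for fine-tuning: the post title becomes the text prompt alongside the image, and the top comment becomes the label. A minimal sketch of that mapping (the field names and the `caption text` task prefix are assumptions, not the exact dataset schema):

```python
# Sketch: turning one scraped Reddit entry into a (prompt, target) pair.
# Field names ("title", "top_comment") and the task prefix are assumed;
# the image itself is passed to the processor separately.

def build_example(entry: dict) -> tuple[str, str]:
    """Map a Reddit entry to (text prompt, target comment) for fine-tuning."""
    prompt = "caption text\n" + entry["title"] + "\n"
    target = entry["top_comment"]  # top-voted human comment is the label
    return prompt, target

entry = {
    "image_path": "posts/abc123.jpg",
    "title": "Mini bulletin board",
    "top_comment": "I love this idea.",
}
prompt, target = build_example(entry)
```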
## Training Details
- Hardware: Kaggle GPU (T4 x2)
- Epochs: 1
- Learning Rate: 2e-5
- Optimizer: paged_adamw_8bit
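The hyperparameters above correspond roughly to the following PEFT/Transformers configuration. Only the epoch count, learning rate, and optimizer come from this card; the LoRA rank and target modules are assumptions (the card states only the ~11M adapter size), so treat this as a sketch, not the exact training script:

```python
from peft import LoraConfig
from transformers import TrainingArguments

# LoRA adapter settings -- rank and target modules are assumptions
lora_config = LoraConfig(
    r=8,                                                       # assumption
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],   # assumption
    task_type="CAUSAL_LM",
)

# Hyperparameters listed in this card
training_args = TrainingArguments(
    output_dir="paligemma-msr",
    num_train_epochs=1,           # from the card
    learning_rate=2e-5,           # from the card
    optim="paged_adamw_8bit",     # from the card
    bf16=True,
)
```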
## Limitations & Bias
- Sentiment Bias: The model often gravitates toward positive, high-frequency Reddit phrases (e.g., "I love your smile") due to the distribution of the training data.
- Hallucinations: As a Reddit-trained model, it may prioritize "vibe" over factual accuracy.
- Safety: It reflects the language of Reddit. While cleaned, it may inherit the casual or sarcastic tone of the platform.
## Results

| Image | Caption / Prompt | Base Model | Fine-Tuned Model |
|---|---|---|---|
| *(image)* | CAT 950 - 100 year anniversary special edition color.... | ### $ ## , ###.## front loader | I love this. |
| *(image)* | The ultimate 'indoor-outdoor' experience My grandfather was a WWII cartographer on the Gothic Line. ... | balcony | I'm so jealous of this place. I've always wanted to build a cabin like this. |
| *(image)* | Made an probiotic seed treatment that makes plants grow 2x faster - early results look promising Bee... | we are working on a new product that will be available in the spring that will be a plant growth enhancer that will be used in soil and hydroponics . | I'm not sure if it's a good idea to buy a lot of these. I'd be worried that it's not working. |
| *(image)* | Elon musk doing a nazi salute at the whitehouse. Unreal... | text : | Elon Musk doing a nazi salute at the whitehouse. Unreal |
| *(image)* | Kowloon walled city this former military base turned into china s tightest city, it was demolished i... | photo credit : courtesy of the artist | Kowloon walled city this former military base turned into china s tightest city, it was demolished in 1994 |
| *(image)* | Mini bulletin board Working on a mini office and made this bulletin board from cork material and cof... | mini office supplies | I love this idea. I've been trying to make a miniature office for a while now. I've been trying to make a miniature office for a while now. I've been trying to make a miniature office for a while now. |
| *(image)* | Santa Cruz Island... | `<` | `<img src="https://www.flickr.com/photos/1000000000000000000/in/album/156551164111` |
| *(image)* | Brown Butter Triple Chocolate Chip Cookies Baked these tonight for a birthday party. Doubled the bat... | cookies | I love this recipe. I made it for a friend's birthday party and it was a huge hit. I love the chocolate chunks and the brown butter. |
| *(image)* | HFF... | `<` | HFF |
| *(image)* | Popocatepetl Volcano view, Puebla, Mexico... | `<img src="https://www.flickr.com/photos/1000000000000000000/in/album/115655154511` | |
## Usage

```shell
pip install -q -U datasets bitsandbytes peft git+https://github.com/huggingface/transformers.git
```

```python
import torch
import requests
from PIL import Image
from transformers import PaliGemmaForConditionalGeneration, PaliGemmaProcessor, BitsAndBytesConfig
from peft import PeftModel

# 1. Model and adapter IDs
base_model_id = "google/paligemma2-3b-pt-224"
adapter_id = "hamzasibous/paligemma_vqav2"

# 4-bit NF4 quantization; keep the vision tower, projector, and LM head unquantized
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    llm_int8_skip_modules=["vision_tower", "multi_modal_projector", "lm_head"],
)

# 2. Load the quantized base model (device_map="auto" places it on the GPU)
model = PaliGemmaForConditionalGeneration.from_pretrained(
    base_model_id,
    torch_dtype=torch.bfloat16,
    quantization_config=bnb_config,
    low_cpu_mem_usage=True,
    device_map="auto",
)
device = next(model.parameters()).device
processor = PaliGemmaProcessor.from_pretrained(base_model_id)

# 3. Load the LoRA adapter and switch to evaluation mode
model = PeftModel.from_pretrained(model, adapter_id)
model.eval()

# 4. Prepare the input: image plus caption with the task prefix
image_url = "https://preview.redd.it/update-watermelon-sold-v0-snbn9oxn5hng1.jpg?width=1080&crop=smart&auto=webp&s=dd52d355fb847a27a0b4a02e06eee6bc7ea7f55e"
image = Image.open(requests.get(image_url, stream=True).raw).convert("RGB")

caption = ("🍉Update: Watermelon Sold A few days ago I asked for melon vendors.\n"
           "The community responded and now the melons are sold.\n"
           "Thank you to everyone who responded. And a special thank you to the buyer.")
prompt = "<img>caption text\n" + caption + "\n"

inputs = processor(text=prompt, images=image, return_tensors="pt")
inputs = {k: v.to(device) if isinstance(v, torch.Tensor) else v for k, v in inputs.items()}

# 5. Generate, then decode only the newly generated tokens
input_len = inputs["input_ids"].shape[-1]
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=50)
response = processor.decode(output[0][input_len:], skip_special_tokens=True).strip()
print("Caption:", caption, "\nModel Output:", response)
```
## Model tree for hamzasibous/gemma-msr

- Base model: google/paligemma2-3b-pt-224