Multimodal-Social-Response

This model is a fine-tuned version of google/paligemma2-3b-pt-224 on hamzasibous/reddit-dataset.

  • Developed by: Hamza Sibous
  • Model type: Vision-Language Model (VLM)
  • Base Model: PaliGemma 2 (3B parameters)
  • Fine-tuning Method: LoRA (~11M trainable parameters)

Intended Use

This model is intended for research into social sentiment and multimodal conversational AI.

  • Input: Image + Text Caption (Task Prefix)
  • Output: Predicted Comment
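As a minimal sketch of this I/O contract (the task prefix below is the one used in the Usage section; the sample caption is illustrative):

```python
# Sketch of the input/output format. "<img>caption text\n" is the task
# prefix from this card's Usage code; the caption is the Reddit post title.
caption = "Santa Cruz Island"
prompt = "<img>caption text\n" + caption + "\n"  # text side of the input
# The model's output is then a short Reddit-style comment for the image.
```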

Training Data

The model was trained on a custom-curated dataset of 12,900 Reddit entries collected via PRAW. Each entry consists of:

  1. Image: The original post image.
  2. Caption: The title of the Reddit post.
  3. Label (Comment): The top-voted human comment.
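The curation step can be sketched as below. The entry schema and the selection helper are hypothetical illustrations (the actual collection script is not published); the `(body, score)` pairs stand in for the `comment.body` and `comment.score` attributes PRAW exposes:

```python
# Hypothetical sketch: turn one scraped Reddit post into a training entry
# (image URL, caption, top-voted comment). Field names are assumptions.

def build_entry(image_url, title, comments):
    """comments: list of (body, score) tuples, as collected via PRAW."""
    if not comments:
        return None
    top_body, _ = max(comments, key=lambda c: c[1])  # keep the top-voted comment
    return {"image": image_url, "caption": title, "label": top_body}

entry = build_entry(
    "https://i.redd.it/example.jpg",
    "Brown Butter Triple Chocolate Chip Cookies",
    [("I love this recipe.", 412), ("Looks dry.", 3)],
)
```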

Training Details

  • Hardware: Kaggle GPU (T4 x2)
  • Epochs: 1
  • Learning Rate: 2e-5
  • Optimizer: paged_adamw_8bit
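A PEFT setup consistent with these details might look like the following sketch. The rank, alpha, and target modules are assumptions chosen to land near the reported ~11M trainable parameters, not the published training configuration:

```python
from peft import LoraConfig

# Illustrative LoRA config (hyperparameters are guesses, not the
# card's actual values; only the ~11M-parameter scale is from the card).
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```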

Limitations & Bias

  • Sentiment Bias: The model often gravitates toward positive, high-frequency Reddit phrases (e.g., "I love your smile") due to the distribution of the training data.
  • Hallucinations: As a Reddit-trained model, it may prioritize "vibe" over factual accuracy.
  • Safety: It reflects the language of Reddit. While cleaned, it may inherit the casual or sarcastic tone of the platform.

Results

| Image | Caption / Prompt | Base Model | Fine-Tuned Model |
| --- | --- | --- | --- |
| 1ihskoi | CAT 950 - 100 year anniversary special edition color.... | ### $ ## , ###.## front loader | I love this. |
| 1qwa8ui | The ultimate 'indoor-outdoor' experience My grandfather was a WWII cartographer on the Gothic Line. ... | balcony | I'm so jealous of this place. I've always wanted to build a cabin like this. |
| 1qn3uhg | Made an probiotic seed treatment that makes plants grow 2x faster - early results look promising Bee... | we are working on a new product that will be available in the spring that will be a plant growth enhancer that will be used in soil and hydroponics . | I'm not sure if it's a good idea to buy a lot of these. I'd be worried that it's not working. |
| 1i606wx | Elon musk doing a nazi salute at the whitehouse. Unreal... | text : | Elon Musk doing a nazi salute at the whitehouse. Unreal |
| lx2ur4 | Kowloon walled city this former military base turned into china s tightest city, it was demolished i... | photo credit : courtesy of the artist | Kowloon walled city this former military base turned into china s tightest city, it was demolished in 1994 |
| 1r3a0jv | Mini bulletin board Working on a mini office and made this bulletin board from cork material and cof... | mini office supplies | I love this idea. I've been trying to make a miniature office for a while now. I've been trying to make a miniature office for a while now. I've been trying to make a miniature office for a while now. |
| 1k905vo | Santa Cruz Island... | < | <img src="https://www.flickr.com/photos/1000000000000000000/in/album/156551164111 |
| 1rnr32s | Brown Butter Triple Chocolate Chip Cookies Baked these tonight for a birthday party. Doubled the bat... | cookies | I love this recipe. I made it for a friend's birthday party and it was a huge hit. I love the chocolate chunks and the brown butter. |
| 1rglmm2 | HFF... | < | HFF |
| 1rgbe51 | Popocatepetl Volcano view, Puebla, Mexico... | | <img src="https://www.flickr.com/photos/1000000000000000000/in/album/115655154511 |

Usage

!pip install -q -U datasets bitsandbytes peft git+https://github.com/huggingface/transformers.git
import torch
from transformers import PaliGemmaForConditionalGeneration, PaliGemmaProcessor, BitsAndBytesConfig
from peft import PeftModel
from PIL import Image
import requests

# 1. Setup IDs (base model matches the card: paligemma2-3b-pt-224)
base_model_id = "google/paligemma2-3b-pt-224"
adapter_id = "hamzasibous/paligemma_vqav2"


# 2. 4-bit quantization config (vision tower, projector and lm_head stay unquantized)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    llm_int8_skip_modules=["vision_tower", "multi_modal_projector", "lm_head"] 
)

# 3. Load the base model; device_map="auto" handles placement,
# and a 4-bit quantized model must not be moved again with .to()
model = PaliGemmaForConditionalGeneration.from_pretrained(
    base_model_id,
    torch_dtype=torch.bfloat16,
    quantization_config=bnb_config,
    low_cpu_mem_usage=True,
    device_map="auto",
)
device = next(model.parameters()).device
processor = PaliGemmaProcessor.from_pretrained(base_model_id)

# 4. Load the LoRA adapter on top of the quantized base model
model = PeftModel.from_pretrained(model, adapter_id)
model.eval()  # set to evaluation mode

image_url = "https://preview.redd.it/update-watermelon-sold-v0-snbn9oxn5hng1.jpg?width=1080&crop=smart&auto=webp&s=dd52d355fb847a27a0b4a02e06eee6bc7ea7f55e"
image = Image.open(requests.get(image_url, stream=True).raw).convert("RGB")


caption = "🍉Update: Watermelon Sold A few days ago I asked for melon vendors.\nThe community responded and now the melons are sold.\nThanks you to everyone who responded. And a special thank you to the buyer."
prompt = "<img>caption text\n" + caption + "\n"  # task prefix format used during fine-tuning

# Generate
inputs = processor(text=prompt, images=image, return_tensors="pt")
inputs = {k: v.to(device) if isinstance(v, torch.Tensor) else v for k, v in inputs.items()}

# Disable gradient tracking for inference
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=50)

generated = processor.decode(output[0], skip_special_tokens=True)[len(prompt):].strip()
print("Caption:", caption)
print("Model output:", generated)