Multimodal-Social-Response

This model is a fine-tuned version of google/paligemma2-3b-pt-224 on hamzasibous/reddit-dataset.

  • Developed by: Hamza Sibous
  • Model type: Vision-Language Model (VLM)
  • Base Model: PaliGemma 2 (3B parameters)
  • Fine-tuning Method: LoRA (~11M trainable parameters)

Intended Use

This model is intended for research into social sentiment and multimodal conversational AI.

  • Input: Image + Text Caption (Task Prefix)
  • Output: Predicted Comment
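As a minimal sketch of this I/O contract (the task prefix below is the one used in the Usage section; the sample caption is illustrative):

```python
# Sketch of the input/output format. "<img>caption text\n" is the task
# prefix from this card's Usage code; the caption is the Reddit post title.
caption = "Santa Cruz Island"
prompt = "<img>caption text\n" + caption + "\n"  # text side of the input
# The model's output is then a short Reddit-style comment for the image.
```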

Training Data

The model was trained on a custom-curated dataset of 12,900 Reddit entries collected via PRAW. Each entry consists of:

  1. Image: The original post image.
  2. Caption: The title of the Reddit post.
  3. Label (Comment): The top-voted human comment.
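The curation step can be sketched as below. The entry schema and the selection helper are hypothetical illustrations (the actual collection script is not published); the `(body, score)` pairs stand in for the `comment.body` and `comment.score` attributes PRAW exposes:

```python
# Hypothetical sketch: turn one scraped Reddit post into a training entry
# (image URL, caption, top-voted comment). Field names are assumptions.

def build_entry(image_url, title, comments):
    """comments: list of (body, score) tuples, as collected via PRAW."""
    if not comments:
        return None
    top_body, _ = max(comments, key=lambda c: c[1])  # keep the top-voted comment
    return {"image": image_url, "caption": title, "label": top_body}

entry = build_entry(
    "https://i.redd.it/example.jpg",
    "Brown Butter Triple Chocolate Chip Cookies",
    [("I love this recipe.", 412), ("Looks dry.", 3)],
)
```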

Training Details

  • Hardware: Kaggle GPU (T4 x2)
  • Epochs: 1
  • Learning Rate: 2e-5
  • Optimizer: paged_adamw_8bit
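A PEFT setup consistent with these details might look like the following sketch. The rank, alpha, and target modules are assumptions chosen to land near the reported ~11M trainable parameters, not the published training configuration:

```python
from peft import LoraConfig

# Illustrative LoRA config (hyperparameters are guesses, not the
# card's actual values; only the ~11M-parameter scale is from the card).
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```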

Limitations & Bias

  • Sentiment Bias: The model often gravitates toward positive, high-frequency Reddit phrases (e.g., "I love your smile") due to the distribution of the training data.
  • Hallucinations: As a Reddit-trained model, it may prioritize "vibe" over factual accuracy.
  • Safety: It reflects the language of Reddit. While cleaned, it may inherit the casual or sarcastic tone of the platform.

Results

| Image | Caption / Prompt | Base Model | Fine-Tuned Model |
| --- | --- | --- | --- |
| 1ihskoi | CAT 950 - 100 year anniversary special edition color.... | ### $ ## , ###.## front loader | I love this. |
| 1qwa8ui | The ultimate 'indoor-outdoor' experience My grandfather was a WWII cartographer on the Gothic Line. ... | balcony | I'm so jealous of this place. I've always wanted to build a cabin like this. |
| 1qn3uhg | Made an probiotic seed treatment that makes plants grow 2x faster - early results look promising Bee... | we are working on a new product that will be available in the spring that will be a plant growth enhancer that will be used in soil and hydroponics . | I'm not sure if it's a good idea to buy a lot of these. I'd be worried that it's not working. |
| 1i606wx | Elon musk doing a nazi salute at the whitehouse. Unreal... | text : | Elon Musk doing a nazi salute at the whitehouse. Unreal |
| lx2ur4 | Kowloon walled city this former military base turned into china s tightest city, it was demolished i... | photo credit : courtesy of the artist | Kowloon walled city this former military base turned into china s tightest city, it was demolished in 1994 |
| 1r3a0jv | Mini bulletin board Working on a mini office and made this bulletin board from cork material and cof... | mini office supplies | I love this idea. I've been trying to make a miniature office for a while now. I've been trying to make a miniature office for a while now. I've been trying to make a miniature office for a while now. |
| 1k905vo | Santa Cruz Island... | < | <img src="https://www.flickr.com/photos/1000000000000000000/in/album/156551164111 |
| 1rnr32s | Brown Butter Triple Chocolate Chip Cookies Baked these tonight for a birthday party. Doubled the bat... | cookies | I love this recipe. I made it for a friend's birthday party and it was a huge hit. I love the chocolate chunks and the brown butter. |
| 1rglmm2 | HFF... | < | HFF |
| 1rgbe51 | Popocatepetl Volcano view, Puebla, Mexico... | | <img src="https://www.flickr.com/photos/1000000000000000000/in/album/115655154511 |

Usage

!pip install -q -U datasets bitsandbytes peft git+https://github.com/huggingface/transformers.git
import torch
from transformers import PaliGemmaForConditionalGeneration, PaliGemmaProcessor, BitsAndBytesConfig
from peft import PeftModel
from PIL import Image
import requests

# 1. Setup IDs (base model matches the card: paligemma2-3b-pt-224)
base_model_id = "google/paligemma2-3b-pt-224"
adapter_id = "hamzasibous/paligemma_vqav2"


# 2. 4-bit quantization config (vision tower, projector and lm_head stay unquantized)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    llm_int8_skip_modules=["vision_tower", "multi_modal_projector", "lm_head"] 
)

# 3. Load the base model; device_map="auto" handles placement,
# and a 4-bit quantized model must not be moved again with .to()
model = PaliGemmaForConditionalGeneration.from_pretrained(
    base_model_id,
    torch_dtype=torch.bfloat16,
    quantization_config=bnb_config,
    low_cpu_mem_usage=True,
    device_map="auto",
)
device = next(model.parameters()).device
processor = PaliGemmaProcessor.from_pretrained(base_model_id)

# 4. Load the LoRA adapter on top of the quantized base model
model = PeftModel.from_pretrained(model, adapter_id)
model.eval()  # set to evaluation mode

image_url = "https://preview.redd.it/update-watermelon-sold-v0-snbn9oxn5hng1.jpg?width=1080&crop=smart&auto=webp&s=dd52d355fb847a27a0b4a02e06eee6bc7ea7f55e"
image = Image.open(requests.get(image_url, stream=True).raw).convert("RGB")


caption = "🍉Update: Watermelon Sold A few days ago I asked for melon vendors.\nThe community responded and now the melons are sold.\nThanks you to everyone who responded. And a special thank you to the buyer."
prompt = "<img>caption text\n" + caption + "\n"  # task prefix format used during fine-tuning

# Generate
inputs = processor(text=prompt, images=image, return_tensors="pt")
inputs = {k: v.to(device) if isinstance(v, torch.Tensor) else v for k, v in inputs.items()}

# Disable gradient tracking for inference
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=50)

generated = processor.decode(output[0], skip_special_tokens=True)[len(prompt):].strip()
print("Caption:", caption)
print("Model output:", generated)