NLP Orange AttentionSeekers (PES1UG23AM016, PES1UG23AM053, PES1UG23AM059, PES1UG23AM222)

SmolVLM2-2.2B Fine-tuned on ChartQA (LoRA)

This is a LoRA adapter for SmolVLM2-2.2B-Instruct, fine-tuned on the ChartQA dataset to answer questions about charts and graphs.

GitHub repository: NLP_Orange_Problem_AttentionSeekers

Model Details

Component	Details
Base model	`HuggingFaceTB/SmolVLM2-2.2B-Instruct`
Fine-tuning method	LoRA (r=16, alpha=32)
Dataset	ChartQA (1000 training samples)
Training hardware	Kaggle 2x T4 (2 x 16 GB, 32 GB VRAM total)
Final training loss	1.855
Epochs	2

How to Use

Installation

pip install transformers peft accelerate torch pillow num2words

Load Adapters and Run Inference

import torch
from PIL import Image
from peft import PeftModel
from transformers import AutoProcessor, AutoModelForImageTextToText

MODEL_PATH   = "HuggingFaceTB/SmolVLM2-2.2B-Instruct"
ADAPTER_PATH = "pes1ug23am016/smolvlm2-chartqa-lora"

# 1. Load base model
model = AutoModelForImageTextToText.from_pretrained(
    MODEL_PATH,
    torch_dtype=torch.float16,
    device_map="auto",
    attn_implementation="eager"
)
processor = AutoProcessor.from_pretrained(ADAPTER_PATH)

# 2. Load and merge LoRA adapters
model = PeftModel.from_pretrained(model, ADAPTER_PATH)
model = model.merge_and_unload()
model.eval()

# 3. Run inference
def predict(image, question):
    messages = [
        {"role": "user", "content": [
            {"type": "image"},
            {"type": "text", "text": question}
        ]}
    ]
    prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = processor(text=prompt, images=[[image]], return_tensors="pt").to(model.device)
    with torch.no_grad():
        out = model.generate(**inputs, max_new_tokens=64, do_sample=False)
    new_tokens = out[0][inputs["input_ids"].shape[1]:]
    return processor.tokenizer.decode(new_tokens, skip_special_tokens=True).strip()

# Example
image = Image.open("your_chart.png").convert("RGB")  # drop alpha/palette modes before preprocessing
answer = predict(image, "What is the highest value in the chart?")
print(answer)
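
For readers unfamiliar with what step 2 (`merge_and_unload()`) does, the sketch below illustrates the standard LoRA merge on a toy layer: the low-rank update `B @ A`, scaled by `lora_alpha / r`, is added into the base weight so inference no longer needs separate adapter matrices. The helper names (`matmul`, `merge_lora`) and the tiny matrices are hypothetical and written in plain Python for illustration; this is not the peft implementation.

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_A, lora_B, r, lora_alpha):
    """Return W + (lora_alpha / r) * (B @ A), the merged weight matrix."""
    scaling = lora_alpha / r
    delta = matmul(lora_B, lora_A)  # shape (out_features, in_features)
    return [
        [w + scaling * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy 2x2 layer with a rank-1 adapter (illustrative values only).
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]    # shape (r, in_features)
B = [[0.5], [0.0]]  # shape (out_features, r)

merged = merge_lora(W, A, B, r=1, lora_alpha=2)
print(merged)  # [[2.0, 2.0], [0.0, 1.0]]
```

After merging, a single matrix multiply with `merged` gives the same result as applying the base weight and the scaled adapter separately.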

Training Details

LoRA Config

r = 16
lora_alpha = 32
lora_dropout = 0.05
target_modules: q_proj, k_proj, v_proj, o_proj
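
The values above map directly onto the `LoraConfig` class in peft. The following is a minimal sketch, assuming no options beyond the four listed were set; the card does not state `task_type`, `bias`, or other fields, so they are left at the library defaults. `LORA_KWARGS` and `build_lora_config` are names introduced here for illustration.

```python
# Values copied from the LoRA Config list above.
LORA_KWARGS = {
    "r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
}


def build_lora_config():
    """Build a peft LoraConfig from LORA_KWARGS (requires the `peft` package)."""
    # Imported lazily so the dictionary can be inspected without peft installed.
    from peft import LoraConfig

    # Assumption: every other LoraConfig field is left at its default.
    return LoraConfig(**LORA_KWARGS)
```

Calling `build_lora_config()` returns a config object that could be passed to `peft.get_peft_model` together with the base model.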

Training Arguments

Batch size: 1 (effective 16 with gradient accumulation)
Learning rate: 2e-4
Epochs: 2
fp16: True
Optimizer: AdamW
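
As a rough illustration of how these settings fit together, the sketch below expresses them with `transformers.TrainingArguments` parameter names and works out the effective batch size and an approximate number of optimizer steps for the 1000-sample training set. Two values are assumptions not stated in the card: `gradient_accumulation_steps=16` (one way to reach an effective batch of 16 in a single process) and `optim="adamw_torch"` (the card only says AdamW). The helper function names are hypothetical.

```python
import math

# Keys follow transformers.TrainingArguments parameter names.
TRAINING_KWARGS = {
    "per_device_train_batch_size": 1,
    "gradient_accumulation_steps": 16,  # assumption: 1 x 16 = effective batch of 16
    "learning_rate": 2e-4,
    "num_train_epochs": 2,
    "fp16": True,
    "optim": "adamw_torch",  # assumption: the card only says "AdamW"
}


def effective_batch_size(per_device, accumulation_steps, num_processes=1):
    """Samples contributing to each optimizer step."""
    return per_device * accumulation_steps * num_processes


def approx_optimizer_steps(num_samples, effective_batch, epochs):
    """Approximate optimizer steps, counting a final partial batch as one step."""
    return math.ceil(num_samples / effective_batch) * epochs


batch = effective_batch_size(
    TRAINING_KWARGS["per_device_train_batch_size"],
    TRAINING_KWARGS["gradient_accumulation_steps"],
)
steps = approx_optimizer_steps(1000, batch, TRAINING_KWARGS["num_train_epochs"])
print(batch, steps)  # 16 126
```

The exact step count reported by a trainer may differ slightly depending on how it handles the last partial accumulation window.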

Limitations

Fine-tuned on only 1000 samples, so performance on complex or unseen chart types may be limited.
Best suited to the chart types present in ChartQA (bar, line, and pie charts).
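
To gauge how limited performance is on a given set of charts, model answers can be compared against reference answers. The sketch below uses a relaxed match of the kind commonly applied to ChartQA: numeric answers count as correct within a 5% relative tolerance, and other answers require a case-insensitive exact match. The function names and parsing rules are assumptions made for illustration; this card reports no evaluation results, and this is not presented as the authors' evaluation code.

```python
def _to_float(text):
    """Parse answers such as '12.5', '1,200' or '45%'; return None if not numeric."""
    cleaned = text.strip().replace(",", "").rstrip("%").strip()
    try:
        return float(cleaned)
    except ValueError:
        return None


def relaxed_match(prediction, target, tolerance=0.05):
    """True if the prediction matches the target under the relaxed rule."""
    pred_num, target_num = _to_float(prediction), _to_float(target)
    if pred_num is not None and target_num is not None:
        if target_num == 0:
            return pred_num == 0
        return abs(pred_num - target_num) / abs(target_num) <= tolerance
    # Non-numeric answers: case-insensitive exact match.
    return prediction.strip().lower() == target.strip().lower()


def relaxed_accuracy(predictions, targets):
    """Fraction of predictions that match their targets."""
    pairs = list(zip(predictions, targets))
    if not pairs:
        return 0.0
    return sum(relaxed_match(p, t) for p, t in pairs) / len(pairs)


print(relaxed_accuracy(["102", "red"], ["100", "blue"]))  # 0.5
```

In practice `predictions` would be the strings returned by `predict` for each (image, question) pair, and `targets` the corresponding reference answers.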
