# chart-vision-qwen

Qwen2-VL-2B-Instruct fine-tuned on ChartQA using LoRA adapters for chart question answering.
**Team:** Langrangers (PES University)

- Aaron Thomas Mathew (PES1UG23AM005)
- Aman Kumar Mishra (PES1UG23AM040)
- Preetham VJ (PES1UG23AM913)
**GitHub Repository:** https://github.com/Aman-K-Mishra/orange-chartqa-slm
## Model Description
This model performs chart question answering: given a chart image (bar chart, line chart, pie chart, etc.) and a natural language question, it predicts the answer based on visual reasoning over the chart.
The model is a LoRA adapter fine-tuned on top of the base model **Qwen/Qwen2-VL-2B-Instruct**.

Training used the ChartQA dataset, which contains chart images paired with question-answer pairs.
| Property | Value |
|---|---|
| Base model | Qwen2-VL-2B-Instruct |
| Fine-tuning method | LoRA (PEFT) |
| Dataset | HuggingFaceM4/ChartQA |
| Training samples | 28,299 |
| Trainable parameters | 4.36M (≈0.20% of 2.21B) |
| Hardware | Tesla T4 (15.6 GB VRAM) |
| Epochs | 1 |
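As a sanity check, the trainable-parameter fraction in the table follows directly from the two counts (plain arithmetic, independent of any library):

```python
# Trainable fraction implied by the table: 4.36M LoRA params out of 2.21B total
trainable = 4.36e6
total = 2.21e9
pct = 100 * trainable / total
print(f"{pct:.2f}% of parameters are trainable")  # ≈ 0.20%
```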
## Training Details

### LoRA Configuration
| Parameter | Value | Reason |
|---|---|---|
| Rank (r) | 16 | Rank 8 insufficient for chart reasoning; rank 32 caused OOM |
| Alpha | 32 | Standard alpha = 2 × rank heuristic |
| Dropout | 0.05 | Light regularisation |
| Target modules | q_proj, k_proj, v_proj, o_proj | Core attention projections |
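These values map directly onto a PEFT `LoraConfig`. The sketch below is reconstructed from the table, not copied from the training script (which lives in the GitHub repo), so treat it as illustrative:

```python
from peft import LoraConfig

# Sketch reconstructed from the table above, not the exact training script
lora_config = LoraConfig(
    r=16,              # rank 8 was insufficient for chart reasoning; 32 caused OOM
    lora_alpha=32,     # standard alpha = 2 x rank heuristic
    lora_dropout=0.05, # light regularisation
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections
    task_type="CAUSAL_LM",
)
```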
### Training Hyperparameters
| Parameter | Value | Reason |
|---|---|---|
| Batch size | 1 | Avoids OOM on T4 |
| Gradient accumulation | 16 | Effective batch size = 16 |
| Learning rate | 2e-4 | Typical for LoRA |
| Max sequence length | 768 | Balance between context and memory |
| Quantization | 8-bit (BitsAndBytes) | Reduces VRAM usage |
| Image resolution | 256-512 patches | Matches Qwen2-VL patch size (28×28) |
| LR scheduler | Cosine annealing | Smooth LR decay |
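Two of these values can be made concrete in plain Python: the effective batch size is the per-device batch times the accumulation steps, and cosine annealing decays the learning rate smoothly from 2e-4 toward zero. The formula below is the standard cosine schedule without warmup, shown as an illustration rather than pulled from the training script:

```python
import math

batch_size = 1
grad_accum = 16
effective_batch = batch_size * grad_accum  # 16, as stated in the table

def cosine_lr(step, total_steps, base_lr=2e-4, min_lr=0.0):
    """Cosine-annealed learning rate (no warmup), for illustration."""
    progress = step / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))

print(cosine_lr(0, 1000))     # 2e-4 at the start
print(cosine_lr(500, 1000))   # 1e-4 halfway through
print(cosine_lr(1000, 1000))  # ~0 at the end
```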
### Adapter Location

The recommended adapter checkpoint is located at:

```
lora_adapters/best
```

This directory contains:

- `adapter_config.json`
- `adapter_model.safetensors`
## Installation

```bash
pip install transformers peft bitsandbytes accelerate datasets pillow
```
## Load Model and Run Inference

```python
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration, BitsAndBytesConfig
from peft import PeftModel
from PIL import Image
import torch

BASE_MODEL_ID = "Qwen/Qwen2-VL-2B-Instruct"
ADAPTER_REPO = "preethamvj/chart-vision-qwen"
ADAPTER_PATH = "lora_adapters/best"

# Load the base model in 8-bit to fit on a single T4
bnb_config = BitsAndBytesConfig(load_in_8bit=True)

model = Qwen2VLForConditionalGeneration.from_pretrained(
    BASE_MODEL_ID,
    quantization_config=bnb_config,
    device_map="auto",
    torch_dtype=torch.float16,
)

# Attach the LoRA adapter from the Hub repo and fold it into the base weights
model = PeftModel.from_pretrained(
    model,
    ADAPTER_REPO,
    subfolder=ADAPTER_PATH,
)
model = model.merge_and_unload()

# min/max pixels bound the image token budget (256-512 patches of 28x28)
processor = AutoProcessor.from_pretrained(
    BASE_MODEL_ID,
    min_pixels=256 * 28 * 28,
    max_pixels=512 * 28 * 28,
)

image = Image.open("your_chart.png").convert("RGB")
question = "What is the highest value in the chart?"

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": image},
        {"type": "text", "text": question},
    ],
}]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(
    text=[text],
    images=[image],
    return_tensors="pt",
).to("cuda")

with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=64)

# Decode the full sequence and keep only the assistant's reply
answer = processor.decode(output[0], skip_special_tokens=True)
print(answer.split("assistant")[-1].strip())
```
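The `min_pixels`/`max_pixels` arguments bound how many 28×28 vision patches each chart image is encoded into (256 to 512 patches). Conceptually, images whose area falls outside that budget get rescaled before patching; the sketch below illustrates the idea in plain Python and is not Qwen2-VL's exact resize logic (which also snaps dimensions to patch multiples):

```python
import math

MIN_PIXELS = 256 * 28 * 28  # 200,704 px -> at least 256 patches
MAX_PIXELS = 512 * 28 * 28  # 401,408 px -> at most 512 patches

def clamp_to_budget(width, height, min_pixels=MIN_PIXELS, max_pixels=MAX_PIXELS):
    """Scale (width, height) so the image area lands inside the pixel budget.

    Simplified illustration only, not the processor's actual algorithm.
    """
    area = width * height
    if area > max_pixels:
        scale = math.sqrt(max_pixels / area)
    elif area < min_pixels:
        scale = math.sqrt(min_pixels / area)
    else:
        scale = 1.0
    return round(width * scale), round(height * scale)

print(clamp_to_budget(2000, 1000))  # large chart scaled down into budget
print(clamp_to_budget(500, 500))    # already in budget, left unchanged
```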
## Intended Use
This model is intended for chart question answering tasks, including:
- Reading chart values
- Comparing bars or segments
- Identifying trends
- Extracting numeric information
It is not designed for general visual question answering outside the chart domain.
## Limitations
- Trained for only 1 epoch due to compute limitations
- Training loss shows high variance across steps
- Performance may degrade on chart types not well represented in ChartQA
- Complex infographics may still challenge the model
## Citation

If you use this model in research or projects, please cite:

```bibtex
@misc{chartvisionqwen2026,
  title={chart-vision-qwen: LoRA Fine-tuned Qwen2-VL for Chart Question Answering},
  author={Langrangers Team},
  year={2026},
  howpublished={HuggingFace Model Hub}
}
```