chart-vision-qwen

Qwen2-VL-2B-Instruct fine-tuned on ChartQA using LoRA adapters for chart question answering.

Team: Langrangers (PES University)

  • Aaron Thomas Mathew --- PES1UG23AM005
  • Aman Kumar Mishra --- PES1UG23AM040
  • Preetham VJ --- PES1UG23AM913

GitHub Repository:
https://github.com/Aman-K-Mishra/orange-chartqa-slm


Model Description

This model performs chart question answering: given a chart image (bar chart, line chart, pie chart, etc.) and a natural language question, it predicts the answer based on visual reasoning over the chart.

The model is a LoRA fine‑tuned adapter built on top of:

**Base model:**
Qwen/Qwen2-VL-2B-Instruct

Training used the ChartQA dataset, in which each chart image is annotated with question--answer pairs.
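For supervised fine-tuning, each record has to be cast into the Qwen2-VL chat schema. A minimal sketch of one plausible formatting function is below; the exact field names in the dataset (e.g. `query`, `label` in HuggingFaceM4/ChartQA) and the training pipeline's details are assumptions, not this card's actual code.

```python
# Hedged sketch: turning one chart-QA record (image + question + answer)
# into a two-turn Qwen2-VL-style conversation for supervised fine-tuning.

def format_sample(image, question, answer):
    """User turn carries the chart image and question; assistant turn carries the answer."""
    return [
        {"role": "user", "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": question},
        ]},
        {"role": "assistant", "content": [
            {"type": "text", "text": answer},
        ]},
    ]

messages = format_sample("chart.png", "What is the highest value?", "42")
```

During training, a chat template would render these messages into the model's prompt format, with the loss typically masked to the assistant turn.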

| Property | Value |
|---|---|
| Base model | Qwen2-VL-2B-Instruct |
| Fine-tuning method | LoRA (PEFT) |
| Dataset | HuggingFaceM4/ChartQA |
| Training samples | 28,299 |
| Trainable parameters | 4.36M (≈0.20% of 2.21B) |
| Hardware | Tesla T4 (15.6 GB VRAM) |
| Epochs | 1 |
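The trainable-parameter fraction quoted above is easy to sanity-check with plain arithmetic:

```python
# 4.36M trainable LoRA parameters out of a 2.21B-parameter base model.
trainable = 4.36e6
total = 2.21e9
fraction_pct = 100 * trainable / total
print(f"{fraction_pct:.2f}%")  # prints "0.20%"
```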


Training Details

LoRA Configuration


| Parameter | Value | Reason |
|---|---|---|
| Rank (r) | 16 | Rank 8 insufficient for chart reasoning; rank 32 caused OOM |
| Alpha | 32 | Standard alpha = 2 × rank heuristic |
| Dropout | 0.05 | Light regularisation |
| Target modules | q_proj, k_proj, v_proj, o_proj | Core attention projections |


Training Hyperparameters


| Parameter | Value | Reason |
|---|---|---|
| Batch size | 1 | Avoids OOM on T4 |
| Gradient accumulation | 16 | Effective batch size = 16 |
| Learning rate | 2e-4 | Typical for LoRA |
| Max sequence length | 768 | Balance between context and memory |
| Quantization | 8-bit (BitsAndBytes) | Reduces VRAM usage |
| Image resolution | 256--512 patches | Matches Qwen2-VL patch size (28×28) |
| LR scheduler | Cosine annealing | Smooth LR decay |
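Two relationships in the table are worth making explicit. The effective batch size is the per-device batch size times the gradient-accumulation steps, and cosine annealing decays the learning rate smoothly from its peak to roughly zero. The sketch below is plain arithmetic illustrating both, not the actual trainer code:

```python
import math

# Effective batch size = per-device batch * gradient-accumulation steps.
per_device_batch, grad_accum = 1, 16
effective_batch = per_device_batch * grad_accum  # 16

# Cosine annealing (no warmup, for illustration): LR falls from the
# peak (2e-4) at step 0 to ~0 at the final step.
def cosine_lr(step, total_steps, peak_lr=2e-4):
    return peak_lr * 0.5 * (1 + math.cos(math.pi * step / total_steps))

print(cosine_lr(0, 1000))     # peak LR at the start
print(cosine_lr(500, 1000))   # half the peak at the midpoint
```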


Adapter Location

The recommended adapter checkpoint is located at:

lora_adapters/best

This directory contains:

adapter_config.json
adapter_model.safetensors

Installation

```bash
pip install transformers peft bitsandbytes accelerate datasets pillow
```

Load Model and Run Inference

```python
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration, BitsAndBytesConfig
from peft import PeftModel
from PIL import Image
import torch

BASE_MODEL_ID = "Qwen/Qwen2-VL-2B-Instruct"
ADAPTER_REPO = "preethamvj/chart-vision-qwen"
ADAPTER_PATH = "lora_adapters/best"

# Load the base model in 8-bit to fit on consumer GPUs (e.g. a T4).
bnb_config = BitsAndBytesConfig(load_in_8bit=True)

model = Qwen2VLForConditionalGeneration.from_pretrained(
    BASE_MODEL_ID,
    quantization_config=bnb_config,
    device_map="auto",
    torch_dtype=torch.float16
)

# Attach the LoRA adapter from this repository.
model = PeftModel.from_pretrained(
    model,
    ADAPTER_REPO,
    subfolder=ADAPTER_PATH
)

# Fold the adapter weights into the base model for faster inference.
model = model.merge_and_unload()

# Match the training-time image resolution: 256-512 patches of 28x28 pixels.
processor = AutoProcessor.from_pretrained(
    BASE_MODEL_ID,
    min_pixels=256 * 28 * 28,
    max_pixels=512 * 28 * 28
)

image = Image.open("your_chart.png").convert("RGB")
question = "What is the highest value in the chart?"

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": image},
        {"type": "text", "text": question}
    ]
}]

# Render the chat prompt, then tokenize text and image together.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = processor(
    text=[text],
    images=[image],
    return_tensors="pt"
).to("cuda")

with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=64)

# Decode the full sequence and keep only the assistant's reply.
answer = processor.decode(output[0], skip_special_tokens=True)
print(answer.split("assistant")[-1].strip())
```
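Splitting the decoded string on the literal `"assistant"` is fragile (it breaks if that word appears in the answer). A more robust alternative, an assumption on our part rather than this card's original code, is to decode only the tokens generated after the prompt, e.g. `processor.batch_decode(output[:, inputs.input_ids.shape[1]:], skip_special_tokens=True)[0]`. The slicing itself is plain sequence logic:

```python
# Keep only tokens generated after the prompt, shown here with plain lists
# standing in for token-id tensors.
prompt_len = 5
output_ids = [101, 102, 103, 104, 105, 7, 8, 9]  # prompt tokens + generated tokens
generated_ids = output_ids[prompt_len:]
print(generated_ids)  # [7, 8, 9]
```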

Intended Use

This model is intended for chart question answering tasks, including:

  • Reading chart values
  • Comparing bars or segments
  • Identifying trends
  • Extracting numeric information

It is not designed for general visual question answering outside the chart domain.


Limitations

  • Trained for only 1 epoch due to compute limitations
  • Training loss shows high variance across steps
  • Performance may degrade on chart types not well represented in ChartQA
  • Complex infographics may still challenge the model

Citation

If you use this model in research or projects, please cite:

```bibtex
@misc{chartvisionqwen2026,
  title={chart-vision-qwen: LoRA Fine-tuned Qwen2-VL for Chart Question Answering},
  author={Langrangers Team},
  year={2026},
  howpublished={HuggingFace Model Hub}
}
```