# Qwen2-VL-2B Fine-tuned on ChartQA
A fine-tuned version of Qwen2-VL-2B-Instruct on the ChartQA dataset for visual question answering over charts.
## Model Details
- Base model: Qwen/Qwen2-VL-2B-Instruct
- Fine-tuning: LoRA (r=16, alpha=32, target: q/k/v/o projections)
- Dataset: HuggingFaceM4/ChartQA (2000 train, 200 val samples)
- Training: 1 epoch, lr=2e-4, T4 GPU on Kaggle
- Results: Train Loss: 0.5040 | Val Loss: 0.6956
## How to Use
```python
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration
from PIL import Image
import torch

model_id = "Devildarker6789/qwen2vl-chartqa"

# Bound the image resolution the processor feeds to the vision tower
processor = AutoProcessor.from_pretrained(
    model_id, min_pixels=256 * 28 * 28, max_pixels=512 * 28 * 28
)
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)
model.eval()

image = Image.open("your_chart.png").convert("RGB")
question = "What is the highest value in this chart?"

messages = [{"role": "user", "content": [
    {"type": "image", "image": image},
    {"type": "text", "text": question},
]}]

# Build the chat prompt, then tokenize text and image together
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)

with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=64, do_sample=False)

# Decode only the newly generated tokens (skip the prompt)
print(processor.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
## Training Details
- Quantization: 8-bit during training
- LoRA rank: 16, alpha: 32, dropout: 0.05
- Optimizer: AdamW, lr=2e-4, cosine scheduler
- Batch size: 1 × 16 grad accumulation = 16 effective
- Hardware: T4 GPU (Kaggle)
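The training configuration above can be sketched with `transformers` argument names. This is a hedged reconstruction from the listed hyperparameters (8-bit base model, AdamW at lr=2e-4 with a cosine schedule, batch size 1 × 16 accumulation); `output_dir` is a placeholder:

```python
from transformers import BitsAndBytesConfig, TrainingArguments

# 8-bit quantization of the base model during training, as noted above
bnb_config = BitsAndBytesConfig(load_in_8bit=True)

training_args = TrainingArguments(
    output_dir="qwen2vl-chartqa-lora",   # placeholder path
    num_train_epochs=1,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,      # 1 x 16 = 16 effective batch size
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    optim="adamw_torch",
    fp16=True,                           # T4 supports fp16, not bf16
)
```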