Agent-ChartQA Qwen2.5-VL-3B RL Checkpoint

This repository stores a model-only checkpoint from an Agent-ChartQA reproduction run.

Contents

model_world_size_1_rank_0.pt: model-only PyTorch state dict saved from a single-GPU FSDP/NO_SHARD run.
extra_state_world_size_1_rank_0.pt: lightweight training state.
huggingface/: tokenizer, processor, config, and generation config copied from the training runtime.

This is not a standard save_pretrained() Hugging Face model directory. It is intended as a reproducibility checkpoint for the accompanying Agent-ChartQA training code.
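Because the weights ship as a raw .pt state dict rather than a save_pretrained() directory, a reasonable first step is to open the file and inspect its entries before wiring it into the training code. The following is a minimal sketch, assuming the file deserializes to a flat mapping from parameter names to plain tensors. That assumption is not confirmed here, since FSDP checkpoints can wrap values in other tensor types. `load_checkpoint` and `summarize_state_dict` are hypothetical helper names, not part of the Agent-ChartQA code.

```python
from collections import Counter


def load_checkpoint(path):
    """Load the raw checkpoint onto the CPU.

    torch is imported lazily so the summary helper below works without it.
    weights_only=False unpickles arbitrary objects: use it only on trusted files.
    """
    import torch

    return torch.load(path, map_location="cpu", weights_only=False)


def summarize_state_dict(state_dict):
    """Return (total element count, number of entries per dtype string).

    Assumes every value exposes .numel() and .dtype, as plain tensors do.
    """
    total = 0
    dtypes = Counter()
    for tensor in state_dict.values():
        total += tensor.numel()
        dtypes[str(tensor.dtype)] += 1
    return total, dict(dtypes)
```

Calling `summarize_state_dict(load_checkpoint("model_world_size_1_rank_0.pt"))` would then report the element count and dtype mix, which can be compared against the base model before attempting to load the weights.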

Training Setup

Base model: Qwen2.5-VL-3B-Instruct.
Dataset: ChartQA-style multimodal chart question answering data.
Frameworks: veRL, vLLM, PyTorch/FSDP.
Algorithm: GRPO-style reinforcement learning with tool-use reward.
Hardware: single RTX 4090 48GB.
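For readers unfamiliar with GRPO, its core idea is to score each sampled response relative to the other responses drawn for the same prompt, rather than against a learned value function. The sketch below illustrates that group-relative normalization only. It is not taken from the training code, the function name is hypothetical, and it uses the population standard deviation, whereas implementations differ on that detail.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of responses to the same prompt.

    Each advantage is (reward - group mean) / (group std + eps).
    eps (an illustrative value) avoids division by zero when all rewards match.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]
```

When every response in a group receives the same reward, all advantages are zero and that group contributes no learning signal.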

Reproduction Metrics

The final 20-step reproduction run reached:

reward/overall: mean 0.9204, last-10-step mean 0.9371.
reward/accuracy: mean 0.8553.
reward/tool: 1.0 throughout the run.
reward/format: 1.0 throughout the run.
response_length/clip_ratio: 0.0 throughout the run.

These are internal training/reward metrics from the reproduction run, not official ChartQA test-set accuracy.
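The two reward/overall figures above are a mean over all steps and a mean over the final ten steps. A minimal sketch of that summary, assuming the per-step values have been exported from the training logs into a list (the export step and the function name are hypothetical):

```python
from statistics import mean


def summarize_reward(per_step, last_k=10):
    """Return (mean over all steps, mean over the final last_k steps)."""
    if not per_step:
        raise ValueError("per_step must not be empty")
    return mean(per_step), mean(per_step[-last_k:])
```

A last-k mean higher than the overall mean, as reported for this run, indicates that reward was still improving in the second half of training.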

Notes

The checkpoint was saved in model-only mode to avoid uploading the full optimizer state, which would be substantially larger than the model weights.
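A rough estimate shows why the optimizer state dominates. The sketch below assumes an Adam-style optimizer that keeps two moment buffers per parameter, fp32 storage for both weights and moments, and a nominal parameter count of 3 billion taken from the "3B" in the model name. None of these is confirmed by this card, and the function name is hypothetical.

```python
GIB = 1024 ** 3


def estimate_sizes_gib(num_params, weight_bytes=4, moment_bytes=4, num_moments=2):
    """Return (weights size, optimizer state size) in GiB.

    Defaults assume fp32 weights and two fp32 moment buffers per parameter.
    """
    weights = num_params * weight_bytes / GIB
    optimizer = num_params * moment_bytes * num_moments / GIB
    return weights, optimizer


# Nominal 3B parameters (an assumption, not a measured count).
weights_gib, optimizer_gib = estimate_sizes_gib(3_000_000_000)
```

Under these assumptions the optimizer state is twice the size of the weights. If the weights were stored in bf16 (`weight_bytes=2`), the ratio would rise to four.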
