Agent-ChartQA Qwen2.5-VL-3B RL Checkpoint
This repository stores a model-only checkpoint from an Agent-ChartQA reproduction run.
Contents
- model_world_size_1_rank_0.pt: model-only PyTorch state dict saved from a single-GPU FSDP/NO_SHARD run.
- extra_state_world_size_1_rank_0.pt: lightweight training state.
- huggingface/: tokenizer, processor, config, and generation config copied from the training runtime.
This is not a standard save_pretrained() Hugging Face model directory. It is intended as a reproducibility checkpoint for the accompanying Agent-ChartQA training code.
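The weights are a raw state dict rather than a save_pretrained() directory. One possible way to use them is to build the architecture from the bundled huggingface/ config and then copy the weights in. The sketch below is a minimal illustration under several assumptions. The saved keys may carry FSDP or activation-checkpointing wrapper markers. Values may be DTensor or ShardedTensor objects, which on a single rank hold the full tensor. A recent transformers release provides `Qwen2_5_VLForConditionalGeneration`. All helper names are hypothetical. Loading sharded tensors may also require extra setup, such as an initialized process group, that is not shown here.

```python
import os

# Wrapper markers that FSDP and activation checkpointing can insert into
# parameter names (assumption: the saved keys may or may not contain them).
WRAPPER_MARKERS = ("_fsdp_wrapped_module.", "_checkpoint_wrapped_module.")


def strip_wrapper_prefixes(state_dict):
    """Return a copy of state_dict with wrapper markers and a leading 'module.' removed."""
    cleaned = {}
    for key, value in state_dict.items():
        for marker in WRAPPER_MARKERS:
            key = key.replace(marker, "")
        if key.startswith("module."):
            key = key[len("module."):]
        cleaned[key] = value
    return cleaned


def to_plain_tensor(value):
    """Unwrap distributed tensor types; on a single rank the local shard is the full tensor."""
    if hasattr(value, "to_local"):  # DTensor
        return value.to_local()
    if hasattr(value, "local_tensor"):  # ShardedTensor
        return value.local_tensor()
    return value


def load_rl_checkpoint(ckpt_dir):
    """Hypothetical loader: build the model from huggingface/ and copy the RL weights in."""
    # Heavy imports are deferred so the helpers above work without torch installed.
    import torch
    from transformers import AutoConfig, Qwen2_5_VLForConditionalGeneration

    config = AutoConfig.from_pretrained(os.path.join(ckpt_dir, "huggingface"))
    model = Qwen2_5_VLForConditionalGeneration(config)
    raw = torch.load(
        os.path.join(ckpt_dir, "model_world_size_1_rank_0.pt"),
        map_location="cpu",
        weights_only=False,  # the file may contain non-tensor objects
    )
    state_dict = {k: to_plain_tensor(v) for k, v in strip_wrapper_prefixes(raw).items()}
    # strict=False reports key-name mismatches instead of raising.
    result = model.load_state_dict(state_dict, strict=False)
    return model, result.missing_keys, result.unexpected_keys
```

Inspect the returned missing and unexpected keys before trusting the model. Non-empty lists usually mean the key naming differs between the training runtime and the installed transformers version.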
Training Setup
- Base model: Qwen2.5-VL-3B-Instruct.
- Dataset: ChartQA-style multimodal chart question answering data.
- Frameworks: veRL, vLLM, PyTorch/FSDP.
- Algorithm: GRPO-style reinforcement learning with tool-use reward.
- Hardware: single RTX 4090 48GB.
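GRPO-style training replaces a learned value function with a group baseline. Several responses are sampled for the same prompt, and each response's reward is compared against the others in its group. The sketch below shows the common group-normalized outcome advantage, assuming one scalar reward per response (for example, the overall reward). It is an illustration of the technique, not the exact estimator used in this run. veRL's implementation may differ in details such as epsilon or the handling of single-response groups.

```python
import statistics


def group_relative_advantages(group_rewards, eps=1e-6):
    """Normalize each reward against the mean and sample std of its own group."""
    if len(group_rewards) < 2:
        # A single response has no peers to compare against, so give it zero advantage.
        return [0.0 for _ in group_rewards]
    mean = statistics.fmean(group_rewards)
    std = statistics.stdev(group_rewards)
    return [(r - mean) / (std + eps) for r in group_rewards]
```

For a group with rewards [1.0, 0.0, 1.0, 0.0], correct responses get an advantage of about +0.866 and incorrect ones about -0.866. Any reward component that is constant within a group cancels when the group mean is subtracted, so it contributes no learning signal for that group.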
Reproduction Metrics
The final 20-step reproduction run reached:
- reward/overall: mean 0.9204, last-10-step mean 0.9371.
- reward/accuracy: mean 0.8553.
- reward/tool: 1.0 throughout the run.
- reward/format: 1.0 throughout the run.
- response_length/clip_ratio: 0.0 throughout the run.
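Both summary statistics for reward/overall, the run mean and the last-10-step mean, can be recomputed from per-step logs. The following is a minimal sketch, assuming the per-step values of one metric are already available as a Python list. The function name is hypothetical, and the test inputs are synthetic rather than values from this run.

```python
import statistics


def summarize_metric(per_step_values, last_k=10):
    """Return (mean over all steps, mean over the last `last_k` steps)."""
    if not per_step_values:
        raise ValueError("no values logged for this metric")
    tail = per_step_values[-last_k:]  # uses all steps if fewer than last_k were logged
    return statistics.fmean(per_step_values), statistics.fmean(tail)
```

A last-k mean is less sensitive to early-training values than the full-run mean. That makes it a rough indicator of where the reward settled by the end of a short run.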
These are internal training/reward metrics from the reproduction run, not official ChartQA test-set accuracy.
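ChartQA results are commonly reported with relaxed accuracy rather than with a training reward. Under this metric, a numeric answer counts as correct if it falls within a 5% relative tolerance of the target, while other answers require an exact match. The following is a minimal sketch of that metric. The string normalization (trimming, lowercasing, stripping a trailing "%") is an assumption, and this is not the official evaluation script.

```python
def _to_float(text):
    """Parse a numeric answer, tolerating a trailing percent sign; return None otherwise."""
    try:
        return float(text.strip().rstrip("%"))
    except ValueError:
        return None


def relaxed_match(prediction, target, tolerance=0.05):
    """Numeric answers match within a relative tolerance; others need an exact match."""
    pred_num, tgt_num = _to_float(prediction), _to_float(target)
    if pred_num is not None and tgt_num is not None:
        if tgt_num == 0:
            return pred_num == 0
        return abs(pred_num - tgt_num) / abs(tgt_num) <= tolerance
    return prediction.strip().lower() == target.strip().lower()


def relaxed_accuracy(predictions, targets):
    """Fraction of predictions that relaxed-match their targets."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    hits = sum(relaxed_match(p, t) for p, t in zip(predictions, targets))
    return hits / len(targets)
```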
Notes
The checkpoint was saved in model-only mode to avoid uploading the optimizer state. A full checkpoint that includes optimizer state would be substantially larger than the model weights alone.
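A back-of-envelope estimate shows why including the optimizer state matters. This sketch assumes an AdamW-style optimizer that keeps two fp32 moment tensors per parameter (8 bytes per parameter). The weight dtype is a parameter because the saved dtype is not stated: 2 bytes for bf16, 4 for fp32. The 3e9 figure used in the test is an illustrative round number, not the model's exact parameter count. The estimate also ignores FSDP metadata and any fp32 master-weight copies.

```python
def estimate_checkpoint_gb(num_params, weight_bytes=2, optimizer_bytes_per_param=8):
    """Return (model-only GB, model + optimizer GB), using 1 GB = 1e9 bytes."""
    model_only = num_params * weight_bytes
    with_optimizer = model_only + num_params * optimizer_bytes_per_param
    return model_only / 1e9, with_optimizer / 1e9
```

Under these assumptions, 3e9 parameters give about 6 GB model-only and 30 GB with optimizer state in bf16, or 12 GB and 36 GB if the weights are stored in fp32.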
Model tree for AuroraVvvv/qwen2.5_instruct_3b_rl
Base model
Qwen/Qwen2.5-VL-3B-Instruct