Agent-ChartQA Qwen2.5-VL-3B RL Checkpoint

This repository stores a model-only checkpoint from an Agent-ChartQA reproduction run.

Contents

  • model_world_size_1_rank_0.pt: model-only PyTorch state dict saved from a single-GPU FSDP/NO_SHARD run.
  • extra_state_world_size_1_rank_0.pt: lightweight training state.
  • huggingface/: tokenizer, processor, config, and generation config copied from the training runtime.

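This is not a standard Hugging Face save_pretrained() model directory: the huggingface/ folder holds only the tokenizer, processor, and config files, and the weights live in the separate .pt state dict, so from_pretrained() alone will not restore the model. It is intended as a reproducibility checkpoint for the accompanying Agent-ChartQA training code.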
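
For readers who want to inspect the weights outside the training code, the following is a minimal sketch of one way to load them. It is not the loading path used by the Agent-ChartQA code. It assumes a local copy of this repository, a transformers version that provides Qwen2_5_VLForConditionalGeneration, and that the saved keys match that class's parameter names. The function names are illustrative only.

```python
from pathlib import Path


def checkpoint_paths(root):
    """Return the expected file locations inside a local copy of this repository."""
    root = Path(root)
    return {
        "model": root / "model_world_size_1_rank_0.pt",
        "extra_state": root / "extra_state_world_size_1_rank_0.pt",
        "hf_dir": root / "huggingface",
    }


def load_model_only_checkpoint(root):
    """Rebuild the architecture from huggingface/ and load the saved weights."""
    # Heavy dependencies are imported lazily so the path helper works without them.
    import torch
    from transformers import AutoConfig, Qwen2_5_VLForConditionalGeneration

    paths = checkpoint_paths(root)
    config = AutoConfig.from_pretrained(paths["hf_dir"])
    model = Qwen2_5_VLForConditionalGeneration(config)  # randomly initialised

    # weights_only=False unpickles arbitrary objects: only use it on files you trust.
    state = torch.load(paths["model"], map_location="cpu", weights_only=False)

    # Assumption: FSDP may have stored DTensor values; with a single rank the
    # local shard is the whole tensor, so to_local() recovers a plain tensor.
    state = {k: (v.to_local() if hasattr(v, "to_local") else v) for k, v in state.items()}

    # strict=False reports key-name mismatches instead of raising on them.
    result = model.load_state_dict(state, strict=False)
    return model, result.missing_keys, result.unexpected_keys
```

If either returned key list is non-empty, the key names probably need remapping before the weights can be trusted; that mapping is not documented here.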

Training Setup

  • Base model: Qwen2.5-VL-3B-Instruct.
  • Dataset: ChartQA-style multimodal chart question answering data.
  • Frameworks: veRL, vLLM, PyTorch/FSDP.
  • Algorithm: GRPO-style reinforcement learning with tool-use reward.
  • Hardware: single RTX 4090 48GB.
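
As background on the "GRPO-style" label, the sketch below shows the group-relative advantage step that GRPO is generally known for: several responses are sampled for one prompt and each reward is normalised against the group's mean and standard deviation. It illustrates the general idea and is not the exact computation in this run. The epsilon value, the use of the population standard deviation, and the example rewards are assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalise the rewards of responses sampled for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; some implementations differ
    # eps keeps the division finite when every response gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled answers to one chart question.
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
```

Responses that score above the group mean receive a positive advantage and those below receive a negative one. When all responses in a group score the same, every advantage is zero and that group contributes no learning signal.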

Reproduction Metrics

The final 20-step reproduction run reached:

  • reward/overall: mean 0.9204, last-10-step mean 0.9371.
  • reward/accuracy: mean 0.8553.
  • reward/tool: 1.0 throughout the run.
  • reward/format: 1.0 throughout the run.
  • response_length/clip_ratio: 0.0 throughout the run.

These are internal training/reward metrics from the reproduction run, not official ChartQA test-set accuracy.
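
The "mean" and "last-10-step mean" figures above are plain averages over per-step logged values. A minimal sketch of that aggregation follows. The series is a made-up placeholder, not the logged data from this run, and the function name is illustrative.

```python
def summarize(values, tail=10):
    """Return (mean over all steps, mean over the last `tail` steps)."""
    if not values:
        raise ValueError("values must not be empty")
    last = values[-tail:]
    return sum(values) / len(values), sum(last) / len(last)


# Placeholder series of 20 per-step values, NOT the numbers logged in this run.
per_step = [0.80 + 0.01 * i for i in range(20)]
overall_mean, last10_mean = summarize(per_step)
```

A last-10-step mean above the overall mean, as reported for reward/overall, indicates that the metric was higher in the second half of the run than in the first.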

Notes

The checkpoint was saved in model-only mode to avoid uploading full optimizer state. The full optimizer checkpoint would be substantially larger than the model weights.
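
A rough back-of-the-envelope estimate illustrates the size gap. This card does not state the optimizer or the exact parameter count, so the sketch assumes an Adam-style optimizer with two fp32 moment tensors per parameter, 2-byte weights, and the nominal 3 billion parameters implied by the model name.

```python
def checkpoint_sizes_gb(num_params, weight_bytes=2, moment_bytes=4, moments=2):
    """Estimate (weights, optimizer state) sizes in decimal gigabytes."""
    weights = num_params * weight_bytes / 1e9
    # Adam-style optimizers keep `moments` extra tensors per parameter (assumed fp32).
    optimizer = num_params * moment_bytes * moments / 1e9
    return weights, optimizer


# Nominal parameter count taken from the "3B" in the model name (an assumption).
weights_gb, optimizer_gb = checkpoint_sizes_gb(3_000_000_000)
```

Under these assumptions the optimizer state is about four times the size of the weights. The real ratio depends on the dtypes and optimizer actually used.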
