Perceive-to-Reason
Collection
5 items • Updated
This repository contains the P2R-4B, introduced in Perceive-to-Reason: Decoupling Perception and Reasoning for Fine-Grained Visual Reasoning.
P2R-4B is a fine-grained visual reasoning model built upon Qwen3-VL-4B-Instruct. It performs inference under the P2R framework, a two-stage visual reasoning framework that decouples perception from reasoning. Training is powered by PRA-GRPO, a role-aware alternating RL strategy.
| Model | V-Star | HR-Bench-4K | HR-Bench-8K | MME-RealWorld-Lite |
|---|---|---|---|---|
| Qwen3-VL-Instruct-4B | 81.7 | 73.8 | 67.0 | 47.7 |
| P2R-4B | 93.2 | 81.9 | 80.5 | 54.8 |
| Δ | +11.5 | +8.1 | +13.5 | +7.1 |
from transformers import AutoProcessor, Qwen3VLForConditionalGeneration
model = Qwen3VLForConditionalGeneration.from_pretrained("hongxingli/P2R-4B")
processor = AutoProcessor.from_pretrained("hongxingli/P2R-4B")
For the full two-stage P2R inference pipeline, please refer to our code repository.
@misc{li2026perceivetoreasondecouplingperceptionreasoning,
title={Perceive-to-Reason: Decoupling Perception and Reasoning for Fine-Grained Visual Reasoning},
author={Hongxing Li and Xiufeng Huang and Dingming Li and Wenjing Jiang and Zixuan Wang and Haolei Xu and Hanrong Zhang and Haiwen Hong and Longtao Huang and Hui Xue and Weiming Lu and Jun Xiao and Yueting Zhuang and Yongliang Shen},
year={2026},
eprint={2607.01191},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2607.01191},
}
Base model
Qwen/Qwen3-VL-4B-Instruct