arXiv Code Data

P2R-8B

This repository contains the P2R-8B, introduced in Perceive-to-Reason: Decoupling Perception and Reasoning for Fine-Grained Visual Reasoning.

Model Description

P2R-8B is a fine-grained visual reasoning model built upon Qwen3-VL-8B-Instruct. It performs inference under the P2R framework, a two-stage visual reasoning framework that decouples perception from reasoning. Training is powered by PRA-GRPO, a role-aware alternating RL strategy.

Model Performance

Model V-Star HR-Bench-4K HR-Bench-8K MME-RealWorld-Lite
Qwen3-VL-Instruct-8B 83.8 74.8 70.1 50.4
P2R-8B 93.7 81.5 82.6 57.4
Δ +9.9 +6.7 +12.5 +7.0

Usage

from transformers import AutoProcessor, Qwen3VLForConditionalGeneration

model = Qwen3VLForConditionalGeneration.from_pretrained("hongxingli/P2R-8B")
processor = AutoProcessor.from_pretrained("hongxingli/P2R-8B")

For the full two-stage P2R inference pipeline, please refer to our code repository.

Citation

@misc{li2026perceivetoreasondecouplingperceptionreasoning,
      title={Perceive-to-Reason: Decoupling Perception and Reasoning for Fine-Grained Visual Reasoning}, 
      author={Hongxing Li and Xiufeng Huang and Dingming Li and Wenjing Jiang and Zixuan Wang and Haolei Xu and Hanrong Zhang and Haiwen Hong and Longtao Huang and Hui Xue and Weiming Lu and Jun Xiao and Yueting Zhuang and Yongliang Shen},
      year={2026},
      eprint={2607.01191},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2607.01191}, 
}
Downloads last month
6
Safetensors
Model size
9B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for hongxingli/P2R-8B

Finetuned
(341)
this model

Collection including hongxingli/P2R-8B

Paper for hongxingli/P2R-8B