---
library_name: transformers
pipeline_tag: image-text-to-text
tags:
  - vision-language
  - multimodal
  - process-reward-modeling
  - visual-reasoning
  - best-of-n
---

# VRPRM-MiMo-7B

VRPRM-MiMo-7B is a visual process reward model from **VRPRM: Process Reward Modeling via Visual Reasoning**.

VRPRM is designed to evaluate intermediate reasoning steps for multimodal problems. The model is intended for visual process reward modeling, reasoning-step scoring, and Best-of-N selection for vision-language model outputs.

## Model Details

- Model family: VRPRM
- Release variant: MiMo-7B
- Serialized architecture: `Qwen2_5_VLForConditionalGeneration`
- Model type: `qwen2_5_vl`
- Weights format: sharded `safetensors`
- Recommended library: `transformers`

## Training Summary

The [VRPRM](https://arxiv.org/abs/2508.03556) paper trains the model with a two-stage recipe:

1. Supervised fine-tuning cold start on high-quality CoT-PRM data. Open-sourced on [VRPRM3.6K](https://huggingface.co/datasets/two-tiger/VRPRM3.6K).
2. Reinforcement learning scaling on lower-cost non-CoT PRM data.

## Intended Use

This model is intended for research on:

- Visual process reward modeling
- Multimodal reasoning evaluation
- Step-level scoring of visual question answering rationales
- Best-of-N selection for vision-language model responses

This model is not intended to be used as a standalone assistant.

## Usage

Load the model with Hugging Face Transformers from the repository root:

```python
from transformers import AutoModelForVision2Seq, AutoProcessor

model_id = "YOUR_USERNAME/VRPRM-MiMo-7B"

processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForVision2Seq.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)
```

For the complete inference and evaluation pipeline, use the VRPRM project code.


## Citation

```bibtex
@misc{chen2026vrprmprocessrewardmodeling,
      title={VRPRM: Process Reward Modeling via Visual Reasoning}, 
      author={Xinquan Chen and Chongying Yue and Bangwei Liu and Xuhong Wang and Yingchun Wang and Chaochao Lu},
      year={2026},
      eprint={2508.03556},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2508.03556}, 
}
```