---
library_name: transformers
pipeline_tag: image-text-to-text
tags:
- vision-language
- multimodal
- process-reward-modeling
- visual-reasoning
- best-of-n
---

# VRPRM-MiMo-7B

VRPRM-MiMo-7B is a visual process reward model from **VRPRM: Process Reward Modeling via Visual Reasoning**. VRPRM is designed to evaluate intermediate reasoning steps for multimodal problems.

The model is intended for visual process reward modeling, reasoning-step scoring, and Best-of-N selection for vision-language model outputs.

## Model Details

- Model family: VRPRM
- Release variant: MiMo-7B
- Serialized architecture: `Qwen2_5_VLForConditionalGeneration`
- Model type: `qwen2_5_vl`
- Weights format: sharded `safetensors`
- Recommended library: `transformers`

## Training Summary

The [VRPRM](https://arxiv.org/abs/2508.03556) paper trains the model with a two-stage recipe:

1. Supervised fine-tuning cold start on high-quality CoT-PRM data. This data is open-sourced as [VRPRM3.6K](https://huggingface.co/datasets/two-tiger/VRPRM3.6K).
2. Reinforcement learning scaling on lower-cost non-CoT PRM data.

## Intended Use

This model is intended for research on:

- Visual process reward modeling
- Multimodal reasoning evaluation
- Step-level scoring of visual question answering rationales
- Best-of-N selection for vision-language model responses

This model is not intended to be used as a standalone assistant.

## Usage

Load the model with Hugging Face Transformers from the repository root:

```python
from transformers import AutoModelForVision2Seq, AutoProcessor

model_id = "YOUR_USERNAME/VRPRM-MiMo-7B"

processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForVision2Seq.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)
```

For the complete inference and evaluation pipeline, use the VRPRM project code.

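
As a rough illustration of the Best-of-N use case, the sketch below picks one response out of several candidates once each candidate's reasoning steps have been scored. It assumes the per-step scores have already been parsed from the model output into floats in `[0, 1]`. This card does not specify the output format or the aggregation rule used in the paper, so the function names `aggregate_step_scores` and `best_of_n` and the `min`/`prod`/`mean` options are hypothetical choices, not the VRPRM project's implementation.

```python
from math import prod
from typing import Sequence


def aggregate_step_scores(step_scores: Sequence[float], how: str = "min") -> float:
    """Collapse per-step scores into a single score for one response.

    The aggregation rule is an assumption for illustration only.
    """
    if not step_scores:
        raise ValueError("step_scores must not be empty")
    if how == "min":
        # A response is only as good as its weakest step.
        return min(step_scores)
    if how == "prod":
        # Treat step scores as independent correctness probabilities.
        return prod(step_scores)
    if how == "mean":
        return sum(step_scores) / len(step_scores)
    raise ValueError(f"unknown aggregation: {how!r}")


def best_of_n(candidates: Sequence[Sequence[float]], how: str = "min") -> int:
    """Return the index of the candidate with the highest aggregated score."""
    if not candidates:
        raise ValueError("candidates must not be empty")
    scores = [aggregate_step_scores(steps, how) for steps in candidates]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical step scores for three candidate responses to the same question.
candidates = [
    [0.9, 0.2, 0.8],
    [0.7, 0.6, 0.7],
    [0.95, 0.9, 0.1],
]
best_index = best_of_n(candidates, how="min")
print(best_index)
```

Taking the minimum penalizes a response for a single bad step, while the mean is more forgiving. Which rule works best is an empirical question to check against the VRPRM project code and paper.
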
## Citation

```bibtex
@misc{chen2026vrprmprocessrewardmodeling,
  title={VRPRM: Process Reward Modeling via Visual Reasoning},
  author={Xinquan Chen and Chongying Yue and Bangwei Liu and Xuhong Wang and Yingchun Wang and Chaochao Lu},
  year={2026},
  eprint={2508.03556},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2508.03556},
}
```