---
base_model:
- Qwen/Qwen2.5-VL-7B-Instruct
license: apache-2.0
metrics:
- mae
- accuracy
pipeline_tag: video-text-to-text
---

# PRIMO R1: Process Reasoning Induced Monitoring
|
|
This repository contains the model weights for PRIMO R1, introduced in the paper [From Passive Observer to Active Critic: Reinforcement Learning Elicits Process Reasoning for Robotic Manipulation](https://huggingface.co/papers/2603.15600).
|
|
## Model Description
|
|
PRIMO R1 is a 7B-parameter framework designed to transform video Multimodal Large Language Models (MLLMs) from passive "Observers" into active "Critics" for long-horizon robotic manipulation. While traditional models focus on recognizing ongoing events, PRIMO R1 evaluates the current state of a task relative to its final goal.
|
|
The model is fine-tuned from [Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) using outcome-based Reinforcement Learning to elicit explicit Chain-of-Thought (CoT) generation for progress estimation. Its architecture incorporates a structured temporal input that anchors video sequences between the initial and current state images.
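
The structured temporal input can be sketched as a Qwen2.5-VL-style chat message that places the video frames between the initial and current state images. This is a minimal illustration: the prompt wording, file names, and exact message layout below are assumptions, not the authors' released prompt format.

```python
# Sketch of the structured temporal input: the video sequence is anchored
# between the initial and current state images. Prompt text and layout are
# illustrative assumptions, not the released format.

def build_progress_query(initial_image, frame_paths, current_image):
    """Assemble a Qwen2.5-VL-style chat message anchoring the video
    frames between the initial and current state images."""
    content = [
        {"type": "text", "text": "Initial state:"},
        {"type": "image", "image": initial_image},
        {"type": "text", "text": "Execution so far:"},
        {"type": "video", "video": frame_paths},
        {"type": "text", "text": "Current state:"},
        {"type": "image", "image": current_image},
        {"type": "text",
         "text": "Reason step by step about task progress toward the "
                 "final goal, then output a progress estimate in [0, 1]."},
    ]
    return [{"role": "user", "content": content}]


messages = build_progress_query(
    "initial.png", ["f0.png", "f1.png", "f2.png"], "current.png"
)
```

The resulting `messages` list follows the standard Qwen2.5-VL chat schema, so it could then be tokenized with `AutoProcessor.apply_chat_template` from `transformers` and fed to the model for generation, as in the usual Qwen2.5-VL inference pipeline.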
|
|
## Key Features
|
|
- **RL-Induced Reasoning**: Uses outcome-based RL to incentivize the generation of thought processes that evaluate state progress.
- **State-of-the-Art Performance**: Reduces mean absolute error by 50% relative to specialized reasoning baselines, outperforming much larger general MLLMs.
- **Strong Generalization**: Exhibits zero-shot performance on failure detection tasks, achieving 67.0% accuracy on the RoboFail benchmark and surpassing closed-source models such as OpenAI o1.
- **Structured Temporal Input**: Explicitly anchors the video sequence between initial and current state images to provide clear goal-oriented context.
|
|
## Citation
|
|
If you find our work helpful for your research, please consider citing:
|
|
```bibtex
@misc{liu2026passiveobserveractivecritic,
      title={From Passive Observer to Active Critic: Reinforcement Learning Elicits Process Reasoning for Robotic Manipulation},
      author={Yibin Liu and Yaxing Lyu and Daqi Gao and Zhixuan Liang and Weiliang Tang and Shilong Mu and Xiaokang Yang and Yao Mu},
      year={2026},
      eprint={2603.15600},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2603.15600},
}
```