---
license: cc-by-4.0
datasets:
- teetone/RoboReward
language:
- en
base_model:
- Qwen/Qwen3-VL-4B-Instruct
---

# RoboReward 4B

**Paper:** [https://arxiv.org/abs/2601.00675](https://arxiv.org/abs/2601.00675)

RoboReward provides a **general-purpose vision-language reward model for robotics**, trained on the [RoboReward dataset](https://huggingface.co/datasets/teetone/RoboReward) with **Qwen3-VL** to predict **discrete end-of-episode progress rewards** from real-robot rollout videos.

## Usage

### Purpose

Given a **task instruction** and a **rollout video**, the model predicts an end-of-episode progress score:

- **1:** No success
- **2:** Minimal progress
- **3:** Partial completion
- **4:** Near completion
- **5:** Perfect completion

### Inference

Follow the [original Qwen3-VL instructions for video input](https://huggingface.co/Qwen/Qwen3-VL-4B-Instruct) and use a text prompt like this:

```text
Given the task, assign a discrete progress score reward (1,2,3,4,5) for the robot in the video in the format: ANSWER:

Rubric for end-of-episode progress (judge only the final state, without time limits):
1 - No Success: Final state shows no goal-relevant change for the command.
2 - Minimal Progress: Final state shows a small but insufficient change toward the goal.
3 - Partial Completion: Final state shows good progress toward the goal but violates more than one requirement or a major requirement.
4 - Near Completion: Final state is correct in region and intent but misses a single minor requirement.
5 - Perfect Completion: Final state satisfies all requirements.

Task:
```

## Citation

```bibtex
@misc{lee2026roborewardgeneralpurposevisionlanguagereward,
      title={RoboReward: General-Purpose Vision-Language Reward Models for Robotics},
      author={Tony Lee and Andrew Wagenmaker and Karl Pertsch and Percy Liang and Sergey Levine and Chelsea Finn},
      year={2026},
      eprint={2601.00675},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2601.00675},
}
```
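## Helper sketch

A minimal sketch of the glue code around the model call (the call itself follows the Qwen3-VL video-input instructions linked above): building the chat messages that pair a rollout video with the scoring prompt, and parsing the discrete 1–5 score out of the model's `ANSWER:` reply. The helper names and the abbreviated prompt constant are illustrative, not part of the released code.

```python
import re

# Scoring prompt from this card, abbreviated here; use the full rubric
# text shown in the Inference section in practice.
PROMPT_TEMPLATE = (
    "Given the task, assign a discrete progress score reward (1,2,3,4,5) "
    "for the robot in the video in the format: ANSWER:\n\n"
    "Task: {task}"
)


def build_messages(task: str, video_path: str) -> list:
    """Build a Qwen3-VL-style chat message with a rollout video and the prompt."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "video", "video": video_path},
                {"type": "text", "text": PROMPT_TEMPLATE.format(task=task)},
            ],
        }
    ]


def parse_reward(reply: str):
    """Extract the discrete score from a reply like 'ANSWER: 4'; None if absent."""
    match = re.search(r"ANSWER:\s*([1-5])", reply)
    return int(match.group(1)) if match else None
```

For example, `parse_reward("ANSWER: 4")` returns `4`, and a reply with no well-formed score returns `None`, which makes malformed generations easy to filter before using the reward downstream.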