|
|
--- |
|
|
license: cc-by-4.0 |
|
|
datasets: |
|
|
- teetone/RoboReward |
|
|
language: |
|
|
- en |
|
|
base_model: |
|
|
- Qwen/Qwen3-VL-8B-Instruct |
|
|
--- |
|
|
|
|
|
|
|
|
# RoboReward 8B |
|
|
|
|
|
**Paper:** [https://arxiv.org/abs/2601.00675](https://arxiv.org/abs/2601.00675) |
|
|
|
|
|
RoboReward provides **general-purpose vision-language reward model for robotics**, trained on the [RoboReward dataset](https://huggingface.co/datasets/teetone/RoboReward) with **Qwen-3 VL** to predict **discrete end-of-episode progress rewards** from real-robot rollout videos. |
|
|
|
|
|
|
|
|
## Usage |
|
|
|
|
|
### Purpose |
|
|
|
|
|
Given a **task instruction** and a **rollout video**, the model predicts an end-of-episode progress score: |
|
|
- **1:** No success |
|
|
- **2:** Minimal progress |
|
|
- **3:** Partial completion |
|
|
- **4:** Near completion |
|
|
- **5:** Perfect completion |
|
|
|
|
|
### Inference |
|
|
|
|
|
Follow the [original Qwen 3-VL instructions with video input](https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct) and use a text prompt like this: |
|
|
|
|
|
```text |
|
|
Given the task, assign a discrete progress score reward (1,2,3,4,5) for the robot in the video in the format: ANSWER: <score> |
|
|
Rubric for end-of-episode progress (judge only the final state without time limits): |
|
|
1 - No Success: Final state shows no goal-relevant change for the command. |
|
|
2 - Minimal Progress: Final state shows a small but insufficient change toward the goal. |
|
|
3 - Partial Completion: The final state shows good progress toward the goal but violates more than one requirement or a major requirement. |
|
|
4 - Near Completion: Final state is correct in region and intent but misses a single minor requirement. |
|
|
5 - Perfect Completion: Final state satisfies all requirements. |
|
|
|
|
|
Task: <INSERT TASK HERE> |
|
|
``` |
|
|
|
|
|
## Citation |
|
|
|
|
|
```bibtex |
|
|
@misc{lee2026roborewardgeneralpurposevisionlanguagereward, |
|
|
title={RoboReward: General-Purpose Vision-Language Reward Models for Robotics}, |
|
|
author={Tony Lee and Andrew Wagenmaker and Karl Pertsch and Percy Liang and Sergey Levine and Chelsea Finn}, |
|
|
year={2026}, |
|
|
eprint={2601.00675}, |
|
|
archivePrefix={arXiv}, |
|
|
primaryClass={cs.RO}, |
|
|
url={https://arxiv.org/abs/2601.00675}, |
|
|
} |
|
|
``` |