teetone
/

RoboReward-8B

Model card Files Files and versions

teetone commited on 15 days ago

Commit

3a185b4

·

verified ·

1 Parent(s): 18bd333

Update README.md

Files changed (1) hide show

README.md +50 -1

README.md CHANGED Viewed

@@ -6,4 +6,53 @@ language:
 - en
 base_model:
 - Qwen/Qwen3-VL-8B-Instruct
----

 - en
 base_model:
 - Qwen/Qwen3-VL-8B-Instruct
+---
+# RoboReward 8B
+**Paper:** [https://arxiv.org/abs/2601.00675](https://arxiv.org/abs/2601.00675)
+RoboReward provides **general-purpose vision-language reward model for robotics**, trained on the [RoboReward dataset](https://huggingface.co/datasets/teetone/RoboReward) with **Qwen-3 VL** to predict **discrete end-of-episode progress rewards** from real-robot rollout videos.
+## Usage
+### Purpose
+Given a **task instruction** and a **rollout video**, the model predicts an end-of-episode progress score:
+- **1:** No success
+- **2:** Minimal progress
+- **3:** Partial completion
+- **4:** Near completion
+- **5:** Perfect completion
+### Inference
+Follow the [original Qwen 3-VL instructions with video input](https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct) and use a text prompt like this:
+```text
+Given the task, assign a discrete progress score reward (1,2,3,4,5) for the robot in the video in the format: ANSWER: <score>
+Rubric for end-of-episode progress (judge only the final state without time limits):
+1 - No Success: Final state shows no goal-relevant change for the command.
+2 - Minimal Progress: Final state shows a small but insufficient change toward the goal.
+3 - Partial Completion: The final state shows good progress toward the goal but violates more than one requirement or a major requirement.
+4 - Near Completion: Final state is correct in region and intent but misses a single minor requirement.
+5 - Perfect Completion: Final state satisfies all requirements.
+Task: <INSERT TASK HERE>
+```
+## Citation
+```bibtex
+@misc{lee2026roborewardgeneralpurposevisionlanguagereward,
+      title={RoboReward: General-Purpose Vision-Language Reward Models for Robotics},
+      author={Tony Lee and Andrew Wagenmaker and Karl Pertsch and Percy Liang and Sergey Levine and Chelsea Finn},
+      year={2026},
+      eprint={2601.00675},
+      archivePrefix={arXiv},
+      primaryClass={cs.RO},
+      url={https://arxiv.org/abs/2601.00675},
+}
+```