---
language:
- en
license: apache-2.0
base_model: Qwen/Qwen3-VL-8B-Instruct
tags:
- reward-model
- robotics
- reinforcement-learning
- vision-language-model
- qwen3-vl
library_name: transformers
pipeline_tag: image-text-to-text
---

# Large Reward Models (LRMs)

**Large Reward Models: Generalizable Online Robot Reward Generation with Vision-Language Models**

[Project Page](https://yanru-wu.github.io/Large-Reward-Models/) | [Paper](https://arxiv.org/abs/2603.16065)

**Authors:** Yanru Wu, Weiduo Yuan, Ang Qi, Vitor Guizilini, Jiageng Mao†, Yue Wang†

**Affiliations:** USC Physical Superintelligence Lab, Toyota Research Institute

## Overview

This repository contains three specialized Large Reward Models (LRMs) fine-tuned from [Qwen3-VL-8B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct) for generating reward signals in robot reinforcement learning. Each model serves a distinct role in the reward pipeline:

| Model | Path | Description |
|-------|------|-------------|
| **Temporal Contrastive** | `contrastive/` | Compares two observations to determine which is closer to task completion |
| **Absolute Progress** | `progress/` | Estimates the completion progress (0.0–1.0) from a single observation |
| **Task Completion** | `completion/` | Binary classifier for whether a task has been completed (yes/no) |

## Usage

### Requirements

```bash
pip install transformers torch pillow
```

### Temporal Contrastive Model

Given an initial observation and two later observations, the model predicts which of the two is closer to task completion.

```python
from transformers import Qwen3VLForConditionalGeneration, AutoProcessor
import torch
from PIL import Image

model_path = "USC-PSI-Lab/LRM-models"
subfolder = "contrastive"

model = Qwen3VLForConditionalGeneration.from_pretrained(
    model_path, subfolder=subfolder,
    torch_dtype=torch.bfloat16, device_map="auto",
)
processor = AutoProcessor.from_pretrained(
    model_path, subfolder=subfolder,
)

# Load images
initial_img = Image.open("initial.jpg").convert("RGB")
image_a = Image.open("image_a.jpg").convert("RGB")
image_b = Image.open("image_b.jpg").convert("RGB")

messages = [{"role": "user", "content": [
    {"type": "text", "text": "Task: Compare the completion progress.\n\nThe task is: Pick up the cup.\n\nYou are given:\n- Initial observation: "},
    {"type": "image", "image": initial_img},
    {"type": "text", "text": "\n- Later observation (Image A): "},
    {"type": "image", "image": image_a},
    {"type": "text", "text": "\n- Later observation (Image B): "},
    {"type": "image", "image": image_b},
    {"type": "text", "text": '\n\nQuestion: Which of Image A or Image B is closer to completing the task?\nSelect one value from the following list:\n["ImageA", "ImageB"]\n\nPlease provide a step-by-step visual analysis first, and then output your answer in the following JSON format:\n{ "more_complete_image": "selected_value" }'},
]}]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[initial_img, image_a, image_b], padding=True, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=2048, do_sample=False)

response = processor.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
# Example output: { "more_complete_image": "ImageA" }
```
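Each LRM is prompted to reason step by step before emitting a JSON answer, so callers need to pull the JSON object out of the free-form response. Below is a minimal parsing sketch; the `extract_json_answer` helper is hypothetical, not part of this repository, and assumes the answer is the last flat JSON object in the text:

```python
import json
import re

def extract_json_answer(response: str) -> dict:
    """Extract the last flat JSON object from a model response that
    mixes free-form reasoning with a trailing JSON answer."""
    matches = re.findall(r"\{[^{}]*\}", response)
    if not matches:
        raise ValueError("no JSON object found in response")
    return json.loads(matches[-1])

# Reading the contrastive model's answer out of its full response:
answer = extract_json_answer(
    'The cup is closer to the gripper in Image A. { "more_complete_image": "ImageA" }'
)
print(answer["more_complete_image"])  # ImageA
```

The same helper applies to the progress and completion models, since all three answer with a single flat JSON object.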

### Absolute Progress Model

Estimates task completion progress as a value between 0.0 and 1.0 from a single observation.

```python
subfolder = "progress"

model = Qwen3VLForConditionalGeneration.from_pretrained(
    model_path, subfolder=subfolder,
    torch_dtype=torch.bfloat16, device_map="auto",
)
processor = AutoProcessor.from_pretrained(
    model_path, subfolder=subfolder,
)

observation = Image.open("observation.jpg").convert("RGB")

messages = [{"role": "user", "content": [
    {"type": "text", "text": "Task: Estimate the completion progress.\n\nThe task is: Pick up the cup.\n\nYou are given:\n- Current observation: "},
    {"type": "image", "image": observation},
    {"type": "text", "text": '\n\nEstimate the task completion progress from 0.0 (not started) to 1.0 (fully completed).\nOutput your answer in the following JSON format:\n{ "completion_progress": value }'},
]}]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[observation], padding=True, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=2048, do_sample=False)

response = processor.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
# Example output: { "completion_progress": 0.7 }
```
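One common way to turn per-frame progress estimates like these into a dense RL reward is to take differences of successive estimates (potential-based shaping), so episode returns telescope to the net progress made. This is an illustrative sketch under that assumption, not necessarily the paper's exact reward formulation; `progress_to_rewards` is a hypothetical helper:

```python
def progress_to_rewards(progress_values):
    """Convert a sequence of per-step progress estimates in [0, 1] into
    per-step rewards as successive differences. The rewards over an
    episode sum to (final progress - initial progress)."""
    return [b - a for a, b in zip(progress_values, progress_values[1:])]

# Progress estimates queried once per step over a short episode:
rewards = progress_to_rewards([0.0, 0.25, 0.5, 1.0])
print(rewards)  # [0.25, 0.25, 0.5]
```

Because the rewards telescope, a policy cannot inflate its return by oscillating back and forth; only net progress is rewarded.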
124
+ ### Task Completion Model
125
+
126
+ Binary prediction of whether a task has been completed.
127
+
128
+ ```python
129
+ subfolder = "completion"
130
+
131
+ model = Qwen3VLForConditionalGeneration.from_pretrained(
132
+ model_path, subfolder=subfolder,
133
+ torch_dtype=torch.bfloat16, device_map="auto",
134
+ )
135
+ processor = AutoProcessor.from_pretrained(
136
+ model_path, subfolder=subfolder,
137
+ )
138
+
139
+ observation = Image.open("observation.jpg").convert("RGB")
140
+
141
+ messages = [{"role": "user", "content": [
142
+ {"type": "text", "text": "Task: Determine task completion.\n\nThe task is: Pick up the cup.\n\nYou are given:\n- Current observation: "},
143
+ {"type": "image", "image": observation},
144
+ {"type": "text", "text": '\n\nHas the task been completed?\nOutput your answer in the following JSON format:\n{ "task_completed": "yes" or "no" }'},
145
+ ]}]
146
+
147
+ text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
148
+ inputs = processor(text=[text], images=[observation], padding=True, return_tensors="pt").to(model.device)
149
+
150
+ with torch.no_grad():
151
+ outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)
152
+
153
+ response = processor.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
154
+ print(response)
155
+ # Output: { "task_completed": "no" }
156
+ ```
157
+
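The completion model's yes/no answer maps naturally onto a sparse 0/1 reward, e.g. granted once at episode end. A minimal sketch (the `completion_reward` helper is hypothetical, not part of this repository):

```python
import json
import re

def completion_reward(response: str) -> float:
    """Map the completion model's JSON answer to a sparse 0/1 reward.
    Any unparseable or negative answer is treated as not completed."""
    match = re.search(r"\{[^{}]*\}", response)
    answer = json.loads(match.group(0)) if match else {}
    return 1.0 if answer.get("task_completed") == "yes" else 0.0

print(completion_reward('The cup is still on the table. { "task_completed": "no" }'))  # 0.0
```

Treating malformed responses as "not completed" is a deliberately conservative default, so a parsing hiccup never hands out an unearned success reward.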
158
+ ## License
159
+
160
+ This project is licensed under the Apache 2.0 License.