---
language:
- en
license: apache-2.0
base_model: Qwen/Qwen3-VL-8B-Instruct
tags:
- reward-model
- robotics
- reinforcement-learning
- vision-language-model
- qwen3-vl
- robot-learning
library_name: transformers
---

# Large Reward Models (LRMs)

**Large Reward Models: Generalizable Online Robot Reward Generation with Vision-Language Models**

[Project Page](https://yanru-wu.github.io/Large-Reward-Models/) | [Paper](https://arxiv.org/abs/2603.16065)

**Authors:** Yanru Wu, Weiduo Yuan, Ang Qi, Vitor Guizilini, Jiageng Mao†, Yue Wang†

**Affiliations:** USC Physical Superintelligence Lab, Toyota Research Institute

## Overview

This repository contains three specialized Large Reward Models (LRMs) fine-tuned from [Qwen3-VL-8B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct) for generating reward signals in robot reinforcement learning. Each model serves a distinct role in the reward pipeline:

| Model | Path | Description |
|-------|------|-------------|
| **Temporal Contrastive** | `contrastive/` | Compares two observations to determine which is closer to task completion |
| **Absolute Progress** | `progress/` | Estimates the completion progress (0.0–1.0) from a single observation |
| **Task Completion** | `completion/` | Binary classifier for whether a task has been completed (yes/no) |

## Usage

### Requirements

```bash
pip install transformers torch pillow
```

### Temporal Contrastive Model

Given an initial observation and two later observations, the model predicts which of the two is closer to task completion.

```python
from transformers import Qwen3VLForConditionalGeneration, AutoProcessor
import torch
from PIL import Image

model_path = "USC-PSI-Lab/LRM-models"
subfolder = "contrastive"

model = Qwen3VLForConditionalGeneration.from_pretrained(
    model_path, subfolder=subfolder,
    torch_dtype=torch.bfloat16, device_map="auto",
)
processor = AutoProcessor.from_pretrained(
    model_path, subfolder=subfolder,
)

# Load images
initial_img = Image.open("initial.jpg").convert("RGB")
image_a = Image.open("image_a.jpg").convert("RGB")
image_b = Image.open("image_b.jpg").convert("RGB")

messages = [{"role": "user", "content": [
    {"type": "text", "text": "Task: Compare the completion progress.\n\nThe task is: Pick up the cup.\n\nYou are given:\n- Initial observation: "},
    {"type": "image", "image": initial_img},
    {"type": "text", "text": "\n- Later observation (Image A): "},
    {"type": "image", "image": image_a},
    {"type": "text", "text": "\n- Later observation (Image B): "},
    {"type": "image", "image": image_b},
    {"type": "text", "text": '\n\nQuestion: Which of Image A or Image B is closer to completing the task?\nSelect one value from the following list:\n["ImageA", "ImageB"]\n\nPlease provide a step-by-step visual analysis first, and then output your answer in the following JSON format:\n{ "more_complete_image": "selected_value" }'},
]}]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[initial_img, image_a, image_b], padding=True, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=2048, do_sample=False)

response = processor.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
# Example response (the JSON answer follows the model's step-by-step analysis):
# { "more_complete_image": "ImageA" }
```
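Because the prompt asks for a step-by-step analysis before the answer, the raw response typically contains free-form text followed by the JSON object. A small helper like the one below (an illustrative sketch, not part of this repository) can pull the final JSON out of the response text:

```python
import json
import re

def extract_json(response: str) -> dict:
    """Return the last flat JSON object found in a model response.

    The LRMs are prompted to emit their answer as a JSON object,
    possibly preceded by free-form analysis, so we grab the final
    {...} span and parse it.
    """
    matches = re.findall(r"\{[^{}]*\}", response)
    if not matches:
        raise ValueError("no JSON object found in response")
    return json.loads(matches[-1])

# Example with a hypothetical contrastive response:
raw = (
    "Image A shows the gripper touching the cup, so it is further along.\n"
    '{ "more_complete_image": "ImageA" }'
)
print(extract_json(raw)["more_complete_image"])  # ImageA
```

The same helper works for the progress and completion models below, since all three answer in a flat JSON object.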

### Absolute Progress Model

Estimates completion progress as a value between 0.0 and 1.0.

```python
subfolder = "progress"

model = Qwen3VLForConditionalGeneration.from_pretrained(
    model_path, subfolder=subfolder,
    torch_dtype=torch.bfloat16, device_map="auto",
)
processor = AutoProcessor.from_pretrained(
    model_path, subfolder=subfolder,
)

observation = Image.open("observation.jpg").convert("RGB")

messages = [{"role": "user", "content": [
    {"type": "text", "text": "Task: Estimate the completion progress.\n\nThe task is: Pick up the cup.\n\nYou are given:\n- Current observation: "},
    {"type": "image", "image": observation},
    {"type": "text", "text": '\n\nEstimate the task completion progress from 0.0 (not started) to 1.0 (fully completed).\nOutput your answer in the following JSON format:\n{ "completion_progress": value }'},
]}]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[observation], padding=True, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=2048, do_sample=False)

response = processor.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
# Example output: { "completion_progress": 0.7 }
```
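One common way to turn these estimates into a dense RL reward (an illustrative sketch; the exact reward shaping used in the paper may differ) is to reward the change in estimated progress between consecutive observations:

```python
def progress_delta_reward(prev_progress: float, curr_progress: float,
                          scale: float = 1.0) -> float:
    """Dense reward from the change in estimated task progress.

    Positive when the policy moves toward completion, negative when it
    regresses; `scale` is a tunable shaping coefficient.
    """
    return scale * (curr_progress - prev_progress)

# Progress rose from 0.4 to 0.7 -> positive reward
print(progress_delta_reward(0.4, 0.7))
```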

### Task Completion Model

Binary prediction of whether a task has been completed.

```python
subfolder = "completion"

model = Qwen3VLForConditionalGeneration.from_pretrained(
    model_path, subfolder=subfolder,
    torch_dtype=torch.bfloat16, device_map="auto",
)
processor = AutoProcessor.from_pretrained(
    model_path, subfolder=subfolder,
)

observation = Image.open("observation.jpg").convert("RGB")

messages = [{"role": "user", "content": [
    {"type": "text", "text": "Task: Determine task completion.\n\nThe task is: Pick up the cup.\n\nYou are given:\n- Current observation: "},
    {"type": "image", "image": observation},
    {"type": "text", "text": '\n\nHas the task been completed?\nOutput your answer in the following JSON format:\n{ "task_completed": "yes" or "no" }'},
]}]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[observation], padding=True, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)

response = processor.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
# Example output: { "task_completed": "no" }
```
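The completion model's yes/no answer maps naturally to a sparse terminal reward. A minimal sketch (the function name and bonus value are illustrative, not from the paper):

```python
def completion_bonus(answer: str, bonus: float = 1.0) -> float:
    """Sparse reward: pay `bonus` only when the model judges the task done."""
    return bonus if answer.strip().lower() == "yes" else 0.0

print(completion_bonus("yes"))  # 1.0
print(completion_bonus("no"))   # 0.0
```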

## License

This project is licensed under the Apache 2.0 License.