---
language:
- en
license: apache-2.0
base_model: Qwen/Qwen3-VL-8B-Instruct
tags:
- reward-model
- robotics
- reinforcement-learning
- vision-language-model
- qwen3-vl
- robot-learning
library_name: transformers
---
# Large Reward Models (LRMs)
**Large Reward Models: Generalizable Online Robot Reward Generation with Vision-Language Models**

[Project Page](https://yanru-wu.github.io/Large-Reward-Models/) | [Paper](https://arxiv.org/abs/2603.16065)

**Authors:** Yanru Wu, Weiduo Yuan, Ang Qi, Vitor Guizilini, Jiageng Mao†, Yue Wang†

**Affiliations:** USC Physical Superintelligence Lab, Toyota Research Institute
## Overview
This repository contains three specialized Large Reward Models (LRMs) fine-tuned from [Qwen3-VL-8B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct) for generating reward signals in robot reinforcement learning. Each model serves a distinct role in the reward pipeline:
| Model | Path | Description |
|-------|------|-------------|
| **Temporal Contrastive** | `contrastive/` | Compares two observations to determine which is closer to task completion |
| **Absolute Progress** | `progress/` | Estimates the completion progress (0.0–1.0) from a single observation |
| **Task Completion** | `completion/` | Binary classifier for whether a task has been completed (yes/no) |
## Usage
### Requirements
```bash
pip install transformers torch pillow
```
### Temporal Contrastive Model
Given an initial observation and two later observations, the model predicts which of the two is closer to task completion.
```python
from transformers import Qwen3VLForConditionalGeneration, AutoProcessor
import torch
from PIL import Image
model_path = "USC-PSI-Lab/LRM-models"
subfolder = "contrastive"
model = Qwen3VLForConditionalGeneration.from_pretrained(
    model_path, subfolder=subfolder,
    torch_dtype=torch.bfloat16, device_map="auto",
)
processor = AutoProcessor.from_pretrained(
    model_path, subfolder=subfolder,
)
# Load images
initial_img = Image.open("initial.jpg").convert("RGB")
image_a = Image.open("image_a.jpg").convert("RGB")
image_b = Image.open("image_b.jpg").convert("RGB")
messages = [{"role": "user", "content": [
    {"type": "text", "text": "Task: Compare the completion progress.\n\nThe task is: Pick up the cup.\n\nYou are given:\n- Initial observation: "},
    {"type": "image", "image": initial_img},
    {"type": "text", "text": "\n- Later observation (Image A): "},
    {"type": "image", "image": image_a},
    {"type": "text", "text": "\n- Later observation (Image B): "},
    {"type": "image", "image": image_b},
    {"type": "text", "text": '\n\nQuestion: Which of Image A or Image B is closer to completing the task?\nSelect one value from the following list:\n["ImageA", "ImageB"]\n\nPlease provide a step-by-step visual analysis first, and then output your answer in the following JSON format:\n{ "more_complete_image": "selected_value" }'},
]}]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[initial_img, image_a, image_b], padding=True, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=2048, do_sample=False)
response = processor.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
# Example output (after the model's step-by-step analysis): { "more_complete_image": "ImageA" }
```
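Because the contrastive model is prompted to reason step by step before answering, the JSON object typically appears at the end of a longer response. A minimal parsing sketch (the regex and last-match fallback are our assumptions, not part of the released code):

```python
import json
import re


def parse_choice(response: str):
    """Extract the "more_complete_image" value from a free-form response.

    Scans for the last valid JSON-looking object in the text, since the
    model emits its visual analysis first and the JSON answer at the end.
    Returns None if no parsable answer is found.
    """
    for candidate in reversed(re.findall(r"\{[^{}]*\}", response)):
        try:
            obj = json.loads(candidate)
        except json.JSONDecodeError:
            continue
        if "more_complete_image" in obj:
            return obj["more_complete_image"]
    return None


example = 'Image A shows the cup already grasped... { "more_complete_image": "ImageA" }'
print(parse_choice(example))  # ImageA
```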
### Absolute Progress Model
Estimates completion progress as a value between 0.0 and 1.0.
```python
subfolder = "progress"
model = Qwen3VLForConditionalGeneration.from_pretrained(
    model_path, subfolder=subfolder,
    torch_dtype=torch.bfloat16, device_map="auto",
)
processor = AutoProcessor.from_pretrained(
    model_path, subfolder=subfolder,
)
observation = Image.open("observation.jpg").convert("RGB")
messages = [{"role": "user", "content": [
    {"type": "text", "text": "Task: Estimate the completion progress.\n\nThe task is: Pick up the cup.\n\nYou are given:\n- Current observation: "},
    {"type": "image", "image": observation},
    {"type": "text", "text": '\n\nEstimate the task completion progress from 0.0 (not started) to 1.0 (fully completed).\nOutput your answer in the following JSON format:\n{ "completion_progress": value }'},
]}]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[observation], padding=True, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=2048, do_sample=False)
response = processor.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
# Example output: { "completion_progress": 0.7 }
```
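For online RL, a per-step reward can be derived from consecutive progress estimates, e.g. by rewarding the change in predicted progress between observations. The sketch below is one hypothetical shaping scheme, not the paper's exact formulation:

```python
def progress_reward(prev_progress: float, curr_progress: float) -> float:
    """Dense reward as the change in estimated completion progress.

    Positive when the policy moves toward completion, negative when it
    regresses; clamping guards against out-of-range model outputs.
    """
    def clamp(p: float) -> float:
        return min(max(p, 0.0), 1.0)

    return clamp(curr_progress) - clamp(prev_progress)


print(progress_reward(0.4, 0.7))  # ≈ 0.3
```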
### Task Completion Model
Binary prediction of whether a task has been completed.
```python
subfolder = "completion"
model = Qwen3VLForConditionalGeneration.from_pretrained(
    model_path, subfolder=subfolder,
    torch_dtype=torch.bfloat16, device_map="auto",
)
processor = AutoProcessor.from_pretrained(
    model_path, subfolder=subfolder,
)
observation = Image.open("observation.jpg").convert("RGB")
messages = [{"role": "user", "content": [
    {"type": "text", "text": "Task: Determine task completion.\n\nThe task is: Pick up the cup.\n\nYou are given:\n- Current observation: "},
    {"type": "image", "image": observation},
    {"type": "text", "text": '\n\nHas the task been completed?\nOutput your answer in the following JSON format:\n{ "task_completed": "yes" or "no" }'},
]}]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[observation], padding=True, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)
response = processor.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
# Example output: { "task_completed": "no" }
```
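The three models' outputs can be combined into a single scalar reward, e.g. dense progress shaping plus a sparse bonus when the completion model answers "yes". The combination rule and weight below are illustrative assumptions, not values from the paper:

```python
def combined_reward(prev_progress: float,
                    curr_progress: float,
                    task_completed: str,
                    completion_bonus: float = 1.0) -> float:
    """Scalar reward from the progress and completion model outputs.

    The progress delta provides dense shaping; the completion model's
    "yes"/"no" answer adds a sparse terminal bonus on success.
    """
    reward = curr_progress - prev_progress
    if task_completed.strip().lower() == "yes":
        reward += completion_bonus
    return reward


print(combined_reward(0.7, 0.9, "no"))   # ≈ 0.2
print(combined_reward(0.9, 1.0, "yes"))  # ≈ 1.1
```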
## License
This project is licensed under the Apache 2.0 License.