---
language:
- en
license: apache-2.0
base_model: Qwen/Qwen3-VL-8B-Instruct
tags:
- reward-model
- robotics
- reinforcement-learning
- vision-language-model
- qwen3-vl
- robot-learning
library_name: transformers
---
# Large Reward Models (LRMs)
**Large Reward Models: Generalizable Online Robot Reward Generation with Vision-Language Models**
[Project Page](https://yanru-wu.github.io/Large-Reward-Models/) | [Paper](https://arxiv.org/abs/2603.16065)
**Authors:** Yanru Wu, Weiduo Yuan, Ang Qi, Vitor Guizilini, Jiageng Mao†, Yue Wang†
**Affiliations:** USC Physical Superintelligence Lab, Toyota Research Institute
## Overview
This repository contains three specialized Large Reward Models (LRMs) fine-tuned from [Qwen3-VL-8B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct) for generating reward signals in robot reinforcement learning. Each model serves a distinct role in the reward pipeline:
| Model | Path | Description |
|-------|------|-------------|
| **Temporal Contrastive** | `contrastive/` | Compares two observations to determine which is closer to task completion |
| **Absolute Progress** | `progress/` | Estimates the completion progress (0.0–1.0) from a single observation |
| **Task Completion** | `completion/` | Binary classifier for whether a task has been completed (yes/no) |
## Usage
### Requirements
```bash
pip install transformers torch pillow
```
### Temporal Contrastive Model
Given an initial observation and two later observations, this model predicts which of the two is closer to task completion.
```python
from transformers import Qwen3VLForConditionalGeneration, AutoProcessor
import torch
from PIL import Image
model_path = "USC-PSI-Lab/LRM-models"
subfolder = "contrastive"
model = Qwen3VLForConditionalGeneration.from_pretrained(
    model_path, subfolder=subfolder,
    torch_dtype=torch.bfloat16, device_map="auto",
)
processor = AutoProcessor.from_pretrained(
    model_path, subfolder=subfolder,
)
# Load images
initial_img = Image.open("initial.jpg").convert("RGB")
image_a = Image.open("image_a.jpg").convert("RGB")
image_b = Image.open("image_b.jpg").convert("RGB")
messages = [{"role": "user", "content": [
    {"type": "text", "text": "Task: Compare the completion progress.\n\nThe task is: Pick up the cup.\n\nYou are given:\n- Initial observation: "},
    {"type": "image", "image": initial_img},
    {"type": "text", "text": "\n- Later observation (Image A): "},
    {"type": "image", "image": image_a},
    {"type": "text", "text": "\n- Later observation (Image B): "},
    {"type": "image", "image": image_b},
    {"type": "text", "text": '\n\nQuestion: Which of Image A or Image B is closer to completing the task?\nSelect one value from the following list:\n["ImageA", "ImageB"]\n\nPlease provide a step-by-step visual analysis first, and then output your answer in the following JSON format:\n{ "more_complete_image": "selected_value" }'},
]}]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[initial_img, image_a, image_b], padding=True, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=2048, do_sample=False)
response = processor.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
# Output: { "more_complete_image": "ImageA" }
```
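Because the model emits a free-form visual analysis before the JSON answer, downstream code needs to pull the JSON object out of the response text. A minimal helper sketch (not part of the released code; the regex assumes a flat, non-nested JSON object as prompted above):

```python
import json
import re

def extract_json(response: str) -> dict:
    """Extract the last flat JSON object from a model response.

    The models emit a step-by-step analysis followed by a JSON
    answer, so we take the last {...} span in the text.
    """
    matches = re.findall(r"\{[^{}]*\}", response)
    if not matches:
        raise ValueError("no JSON object found in response")
    return json.loads(matches[-1])

# Example: turn the contrastive answer into a preference label.
answer = extract_json('Step-by-step analysis...\n{ "more_complete_image": "ImageA" }')
preferred = answer["more_complete_image"]  # "ImageA" or "ImageB"
```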
### Absolute Progress Model
Estimates completion progress as a value between 0.0 and 1.0.
```python
subfolder = "progress"
model = Qwen3VLForConditionalGeneration.from_pretrained(
    model_path, subfolder=subfolder,
    torch_dtype=torch.bfloat16, device_map="auto",
)
processor = AutoProcessor.from_pretrained(
    model_path, subfolder=subfolder,
)
observation = Image.open("observation.jpg").convert("RGB")
messages = [{"role": "user", "content": [
    {"type": "text", "text": "Task: Estimate the completion progress.\n\nThe task is: Pick up the cup.\n\nYou are given:\n- Current observation: "},
    {"type": "image", "image": observation},
    {"type": "text", "text": '\n\nEstimate the task completion progress from 0.0 (not started) to 1.0 (fully completed).\nOutput your answer in the following JSON format:\n{ "completion_progress": value }'},
]}]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[observation], padding=True, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=2048, do_sample=False)
response = processor.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
# Output: { "completion_progress": 0.7 }
```
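The scalar progress estimate can serve as a dense shaping signal, for example by differencing the estimates of consecutive observations so that forward progress yields a positive reward. The sketch below is illustrative only — the exact reward formulation here is an assumption, not necessarily the paper's definition:

```python
import re

def parse_progress(response: str) -> float:
    """Parse the completion_progress value, clamped to [0.0, 1.0]."""
    match = re.search(r'"completion_progress"\s*:\s*([0-9.]+)', response)
    if match is None:
        raise ValueError("no completion_progress found in response")
    return min(1.0, max(0.0, float(match.group(1))))

def progress_delta_reward(prev: float, curr: float) -> float:
    """Dense shaping reward: positive when the task moves forward."""
    return curr - prev

# Example with two consecutive model responses.
r = progress_delta_reward(
    parse_progress('{ "completion_progress": 0.4 }'),
    parse_progress('{ "completion_progress": 0.7 }'),
)
```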
### Task Completion Model
Binary prediction of whether a task has been completed.
```python
subfolder = "completion"
model = Qwen3VLForConditionalGeneration.from_pretrained(
    model_path, subfolder=subfolder,
    torch_dtype=torch.bfloat16, device_map="auto",
)
processor = AutoProcessor.from_pretrained(
    model_path, subfolder=subfolder,
)
observation = Image.open("observation.jpg").convert("RGB")
messages = [{"role": "user", "content": [
    {"type": "text", "text": "Task: Determine task completion.\n\nThe task is: Pick up the cup.\n\nYou are given:\n- Current observation: "},
    {"type": "image", "image": observation},
    {"type": "text", "text": '\n\nHas the task been completed?\nOutput your answer in the following JSON format:\n{ "task_completed": "yes" or "no" }'},
]}]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[observation], padding=True, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)
response = processor.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
# Output: { "task_completed": "no" }
```
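For use as a sparse terminal reward, the yes/no answer can be mapped to a binary signal. A hedged sketch, assuming the JSON field name and values are exactly as prompted above:

```python
import re

def completion_reward(response: str) -> float:
    """Map the task_completed answer to a sparse binary reward."""
    match = re.search(r'"task_completed"\s*:\s*"(yes|no)"', response)
    if match is None:
        raise ValueError("no task_completed answer found in response")
    return 1.0 if match.group(1) == "yes" else 0.0

reward = completion_reward('{ "task_completed": "no" }')
```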
## License
This project is licensed under the Apache 2.0 License.