---
language:
- en
license: apache-2.0
base_model: Qwen/Qwen3-VL-8B-Instruct
tags:
- reward-model
- robotics
- reinforcement-learning
- vision-language-model
- qwen3-vl
- robot-learning
library_name: transformers
---

# Large Reward Models (LRMs)

**Large Reward Models: Generalizable Online Robot Reward Generation with Vision-Language Models**

[Project Page](https://yanru-wu.github.io/Large-Reward-Models/) | [Paper](https://arxiv.org/abs/2603.16065)

**Authors:** Yanru Wu, Weiduo Yuan, Ang Qi, Vitor Guizilini, Jiageng Mao†, Yue Wang†

**Affiliations:** USC Physical Superintelligence Lab, Toyota Research Institute

## Overview

This repository contains three specialized Large Reward Models (LRMs) fine-tuned from [Qwen3-VL-8B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct) for generating reward signals in robot reinforcement learning. Each model serves a distinct role in the reward pipeline:
| | Model | Path | Description | |
| |-------|------|-------------| |
| | **Temporal Contrastive** | `contrastive/` | Compares two observations to determine which is closer to task completion | |
| | **Absolute Progress** | `progress/` | Estimates the completion progress (0.0–1.0) from a single observation | |
| | **Task Completion** | `completion/` | Binary classifier for whether a task has been completed (yes/no) | |

## Usage

### Requirements

```bash
pip install transformers torch pillow accelerate
```

Note: `accelerate` is required for `device_map="auto"` in the examples below.

### Temporal Contrastive Model

Given an initial observation and two later observations, the model predicts which of the two is closer to task completion.

```python
from transformers import Qwen3VLForConditionalGeneration, AutoProcessor
import torch
from PIL import Image

model_path = "USC-PSI-Lab/LRM-models"
subfolder = "contrastive"

model = Qwen3VLForConditionalGeneration.from_pretrained(
    model_path, subfolder=subfolder,
    torch_dtype=torch.bfloat16, device_map="auto",
)
processor = AutoProcessor.from_pretrained(
    model_path, subfolder=subfolder,
)

# Load images
initial_img = Image.open("initial.jpg").convert("RGB")
image_a = Image.open("image_a.jpg").convert("RGB")
image_b = Image.open("image_b.jpg").convert("RGB")

messages = [{"role": "user", "content": [
    {"type": "text", "text": "Task: Compare the completion progress.\n\nThe task is: Pick up the cup.\n\nYou are given:\n- Initial observation: "},
    {"type": "image", "image": initial_img},
    {"type": "text", "text": "\n- Later observation (Image A): "},
    {"type": "image", "image": image_a},
    {"type": "text", "text": "\n- Later observation (Image B): "},
    {"type": "image", "image": image_b},
    {"type": "text", "text": '\n\nQuestion: Which of Image A or Image B is closer to completing the task?\nSelect one value from the following list:\n["ImageA", "ImageB"]\n\nPlease provide a step-by-step visual analysis first, and then output your answer in the following JSON format:\n{ "more_complete_image": "selected_value" }'},
]}]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[initial_img, image_a, image_b], padding=True, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=2048, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt
response = processor.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
# Output: { "more_complete_image": "ImageA" }
```
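
The prompt asks the model for a step-by-step visual analysis before the final JSON, so in practice it helps to pull the last JSON object out of the response. A minimal sketch (the `parse_reward_json` helper is illustrative, not part of this repository):

```python
import json
import re

def parse_reward_json(response: str) -> dict:
    """Extract the last flat JSON object from a model response that may
    contain free-form analysis text before the final answer."""
    matches = re.findall(r"\{[^{}]*\}", response)
    if not matches:
        raise ValueError("No JSON object found in response")
    return json.loads(matches[-1])

result = parse_reward_json('Step 1: the cup is lifted...\n{ "more_complete_image": "ImageA" }')
print(result["more_complete_image"])  # ImageA
```

The same helper works for the progress and completion models below, since all three answer in flat JSON.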

### Absolute Progress Model

Estimates completion progress as a value between 0.0 and 1.0. (Reuses the imports and `model_path` from the previous example.)

```python
subfolder = "progress"

model = Qwen3VLForConditionalGeneration.from_pretrained(
    model_path, subfolder=subfolder,
    torch_dtype=torch.bfloat16, device_map="auto",
)
processor = AutoProcessor.from_pretrained(
    model_path, subfolder=subfolder,
)

observation = Image.open("observation.jpg").convert("RGB")

messages = [{"role": "user", "content": [
    {"type": "text", "text": "Task: Estimate the completion progress.\n\nThe task is: Pick up the cup.\n\nYou are given:\n- Current observation: "},
    {"type": "image", "image": observation},
    {"type": "text", "text": '\n\nEstimate the task completion progress from 0.0 (not started) to 1.0 (fully completed).\nOutput your answer in the following JSON format:\n{ "completion_progress": value }'},
]}]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[observation], padding=True, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=2048, do_sample=False)

response = processor.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
# Output: { "completion_progress": 0.7 }
```
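
For RL training, one common way to use per-frame progress estimates is as a dense shaping reward equal to the change in progress between consecutive steps; the per-step rewards then telescope to the net progress over the episode. A minimal sketch of that idea (the helper name and shaping scheme are illustrative assumptions, not the released pipeline):

```python
def progress_to_rewards(progress: list[float]) -> list[float]:
    """Dense reward r_t = p_t - p_{t-1}: positive when the policy moves
    toward task completion, negative when it regresses."""
    return [p - q for p, q in zip(progress[1:], progress)]

rewards = progress_to_rewards([0.0, 0.5, 0.25, 1.0])
print(rewards)  # [0.5, -0.25, 0.75]
```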

### Task Completion Model

Binary prediction of whether a task has been completed. (Reuses the imports and `model_path` from the first example.)

```python
subfolder = "completion"

model = Qwen3VLForConditionalGeneration.from_pretrained(
    model_path, subfolder=subfolder,
    torch_dtype=torch.bfloat16, device_map="auto",
)
processor = AutoProcessor.from_pretrained(
    model_path, subfolder=subfolder,
)

observation = Image.open("observation.jpg").convert("RGB")

messages = [{"role": "user", "content": [
    {"type": "text", "text": "Task: Determine task completion.\n\nThe task is: Pick up the cup.\n\nYou are given:\n- Current observation: "},
    {"type": "image", "image": observation},
    {"type": "text", "text": '\n\nHas the task been completed?\nOutput your answer in the following JSON format:\n{ "task_completed": "yes" or "no" }'},
]}]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[observation], padding=True, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)

response = processor.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
# Output: { "task_completed": "no" }
```
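
The yes/no answer maps naturally onto a boolean success flag, e.g. for episode termination or a sparse completion bonus. A minimal sketch (the `is_task_completed` helper is illustrative, not part of this repository):

```python
import re

def is_task_completed(response: str) -> bool:
    """Parse the completion model's JSON answer into a success flag,
    tolerating surrounding analysis text and whitespace variations."""
    return re.search(r'"task_completed"\s*:\s*"yes"', response, re.IGNORECASE) is not None

print(is_task_completed('{ "task_completed": "no" }'))  # False
```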

## License

This project is licensed under the Apache 2.0 License.