---
language:
- en
license: apache-2.0
base_model: Qwen/Qwen3-VL-8B-Instruct
tags:
- reward-model
- robotics
- reinforcement-learning
- vision-language-model
- qwen3-vl
- robot-learning
library_name: transformers
---

# Large Reward Models (LRMs)

**Large Reward Models: Generalizable Online Robot Reward Generation with Vision-Language Models**

[Project Page](https://yanru-wu.github.io/Large-Reward-Models/) | [Paper](https://arxiv.org/abs/2603.16065)

**Authors:** Yanru Wu, Weiduo Yuan, Ang Qi, Vitor Guizilini, Jiageng Mao†, Yue Wang†

**Affiliations:** USC Physical Superintelligence Lab, Toyota Research Institute

## Overview

This repository contains three specialized Large Reward Models (LRMs) fine-tuned from [Qwen3-VL-8B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct) for generating reward signals in robot reinforcement learning. Each model serves a distinct role in the reward pipeline:

| Model | Path | Description |
|-------|------|-------------|
| **Temporal Contrastive** | `contrastive/` | Compares two observations to determine which is closer to task completion |
| **Absolute Progress** | `progress/` | Estimates the completion progress (0.0–1.0) from a single observation |
| **Task Completion** | `completion/` | Binary classifier for whether a task has been completed (yes/no) |

## Usage

### Requirements

```bash
pip install transformers torch pillow
```

### Temporal Contrastive Model

Given an initial observation and two later observations, the model predicts which of the two is closer to task completion.
```python
from transformers import Qwen3VLForConditionalGeneration, AutoProcessor
import torch
from PIL import Image

model_path = "USC-PSI-Lab/LRM-models"
subfolder = "contrastive"

model = Qwen3VLForConditionalGeneration.from_pretrained(
    model_path,
    subfolder=subfolder,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(
    model_path,
    subfolder=subfolder,
)

# Load images
initial_img = Image.open("initial.jpg").convert("RGB")
image_a = Image.open("image_a.jpg").convert("RGB")
image_b = Image.open("image_b.jpg").convert("RGB")

messages = [{"role": "user", "content": [
    {"type": "text", "text": "Task: Compare the completion progress.\n\nThe task is: Pick up the cup.\n\nYou are given:\n- Initial observation: "},
    {"type": "image", "image": initial_img},
    {"type": "text", "text": "\n- Later observation (Image A): "},
    {"type": "image", "image": image_a},
    {"type": "text", "text": "\n- Later observation (Image B): "},
    {"type": "image", "image": image_b},
    {"type": "text", "text": '\n\nQuestion: Which of Image A or Image B is closer to completing the task?\nSelect one value from the following list:\n["ImageA", "ImageB"]\n\nPlease provide a step-by-step visual analysis first, and then output your answer in the following JSON format:\n{ "more_complete_image": "selected_value" }'},
]}]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[initial_img, image_a, image_b], padding=True, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=2048, do_sample=False)

response = processor.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)  # Output: { "more_complete_image": "ImageA" }
```

### Absolute Progress Model

Estimates completion progress as a value between 0.0 and 1.0.
```python subfolder = "progress" model = Qwen3VLForConditionalGeneration.from_pretrained( model_path, subfolder=subfolder, torch_dtype=torch.bfloat16, device_map="auto", ) processor = AutoProcessor.from_pretrained( model_path, subfolder=subfolder, ) observation = Image.open("observation.jpg").convert("RGB") messages = [{"role": "user", "content": [ {"type": "text", "text": "Task: Estimate the completion progress.\n\nThe task is: Pick up the cup.\n\nYou are given:\n- Current observation: "}, {"type": "image", "image": observation}, {"type": "text", "text": '\n\nEstimate the task completion progress from 0.0 (not started) to 1.0 (fully completed).\nOutput your answer in the following JSON format:\n{ "completion_progress": value }'}, ]}] text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True) inputs = processor(text=[text], images=[observation], padding=True, return_tensors="pt").to(model.device) with torch.no_grad(): outputs = model.generate(**inputs, max_new_tokens=2048, do_sample=False) response = processor.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True) print(response) # Output: { "completion_progress": 0.7 } ``` ### Task Completion Model Binary prediction of whether a task has been completed. 
```python subfolder = "completion" model = Qwen3VLForConditionalGeneration.from_pretrained( model_path, subfolder=subfolder, torch_dtype=torch.bfloat16, device_map="auto", ) processor = AutoProcessor.from_pretrained( model_path, subfolder=subfolder, ) observation = Image.open("observation.jpg").convert("RGB") messages = [{"role": "user", "content": [ {"type": "text", "text": "Task: Determine task completion.\n\nThe task is: Pick up the cup.\n\nYou are given:\n- Current observation: "}, {"type": "image", "image": observation}, {"type": "text", "text": '\n\nHas the task been completed?\nOutput your answer in the following JSON format:\n{ "task_completed": "yes" or "no" }'}, ]}] text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True) inputs = processor(text=[text], images=[observation], padding=True, return_tensors="pt").to(model.device) with torch.no_grad(): outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False) response = processor.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True) print(response) # Output: { "task_completed": "no" } ``` ## License This project is licensed under the Apache 2.0 License.