Philip-MIT committed on
Commit f407c3f · verified · 1 Parent(s): 179a97a

Update README.md

Files changed (1)
  1. README.md +14 -14
README.md CHANGED
@@ -27,7 +27,7 @@ This model accompanies the paper **“SOLE-R1: Video-Language Reasoning as the S
 
  ## Model Description
 
- SOLE-R1 predicts robot task progress from visual observations. Given an image or video-frame montage containing the first, previous, and current timestep views, plus a task description and prior progress value, the model outputs a reasoning trace and a scalar progress estimate.
+ SOLE-R1 predicts robot task progress from visual observations. Given a video and a task description, the model outputs a reasoning trace and a scalar progress estimate.
 
  Expected output format:
 
@@ -35,17 +35,6 @@ Expected output format:
 
  The progress estimate is intended to serve as a dense reward signal for robotic reinforcement learning, especially when manually engineered rewards are unavailable.
 
- ## Intended Use
-
- SOLE-R1-8B is intended for:
-
- - Robotics reward prediction
- - Online robot RL reward generation
- - Evaluating task progress from robot videos
- - Interpretable video-language reasoning for manipulation tasks
- - Research on learned reward models and robotic foundation models
-
- It is not intended as a general-purpose safety-critical robotics controller. The model should be validated in the target environment before use in closed-loop robotic systems.
 
  ## Quick Start
 
@@ -68,9 +57,20 @@ The recommended interface for inference is [RoboReason](https://github.com/Phili
      view_type_per_video=["external and wrist"],
      verbose=False,
  )
+ print(rewards)
+ print(reasoning_traces)
+
+ # Plotting with show_reasoning_traces=True
+ output_sole = {"model": "SOLE-R1", "rewards": rewards[0], "reasoning_traces": reasoning_traces[0]}
+ rr.video_plot(
+     outputs=[output_sole],
+     plot_save_path='model_outputs/sole-r1/robosuite/lift/unsuccessful/robosuite_lift_episode_12_unsuccessful_max_reward_38.mp4',
+     video_path=video_paths[0],
+     show_reasoning_traces=True,
+     task_description=task_description,
+     verbose=False
+ )
 
- print(rewards)
- print(reasoning_traces)
 
 
 