aliangdw committed on
Commit ba55e10 · verified · 1 Parent(s): e179783

Update README.md

Files changed (1)
  1. README.md +53 -12
README.md CHANGED
@@ -2,26 +2,67 @@
  license: apache-2.0
  base_model: Qwen/Qwen3-VL-4B-Instruct
  tags:
- - reward_model
- - rfm
- - preference_comparisons
+ - reward model
+ - robot learning
+ - foundation models
  library_name: transformers
  ---
 
- # aliangdw/libero_ablation_prog_pref_with_fail_lora_ft_4frames
-
- ## Model Details
-
- - **Base Model**: Qwen/Qwen3-VL-4B-Instruct
- - **Model Type**: qwen3_vl
-
- ## Training Run
-
- - **Wandb Run**: [libero_ablation_prog_pref_with_fail_lora_ft_4frames_2000steps](https://wandb.ai/clvr/rfm/runs/f4yifl7b)
- - **Wandb ID**: `f4yifl7b`
- - **Project**: rfm
- - **Notes**: libero prog pref with fail lora ft
+ # Robometer 4B LIBERO
+
+ **Paper:** [arXiv (Coming Soon)](https://arxiv.org/)
+
+ **Robometer** is a general-purpose vision-language reward model for robotics. It is trained on [RBM-1M](https://huggingface.co/datasets/) with **Qwen3-VL-4B** to predict **per-frame progress**, **per-frame success**, and **trajectory preferences** from rollout videos. The model combines (1) frame-level progress supervision on expert data and (2) trajectory-comparison preference supervision, so it can learn from both successful and failed rollouts and generalize across diverse robot embodiments and tasks.
+
+ Given a **task instruction** and a **rollout video** (or frame sequence), the model predicts:
+
+ - **Per-frame progress** — continuous progress values over time (e.g. 0–1 or binned).
+ - **Per-frame success** — success probability (or binary) at each timestep.
+ - **Preference / ranking** — which of two trajectories is better for the task.
+
+ This model is trained on the LIBERO-10, LIBERO-Spatial, LIBERO-Object, and LIBERO-Goal task suites.
+
+ ### Usage
+
+ For full setup, example scripts, and configs, see the **GitHub repo**: [github.com/aliang8/robometer](https://github.com/aliang8/robometer).
+
+ **Option 1 — Run the model locally** (loads this checkpoint from Hugging Face):
+
+ ```bash
+ uv run python scripts/example_inference_local.py \
+     --model-path aliangdw/Robometer-4B-LIBERO \
+     --video /path/to/video.mp4 \
+     --task "your task description"
+ ```
+
+ **Option 2 — Use the evaluation server** (start the server, then run the client):
+
+ ```bash
+ # Start server
+ uv run python robometer/evals/eval_server.py \
+     --config-path=robometer/configs \
+     --config-name=eval_config_server \
+     model_path=aliangdw/Robometer-4B-LIBERO \
+     server_url=0.0.0.0 \
+     server_port=8000
+
+ # Client (no robometer dependency)
+ uv run python scripts/example_inference.py \
+     --eval-server-url http://localhost:8000 \
+     --video /path/to/video.mp4 \
+     --task "your task description"
+ ```
 
  ## Citation
 
  If you use this model, please cite:
+
+ ```bibtex
+ @misc{robometer2025,
+   title={Robometer: Scaling General-Purpose Robotic Reward Models via Trajectory Comparisons},
+   author={Anthony Liang* and Yigit Korkmaz* and Jiahui Zhang and Minyoung Hwang and Abrar Anwar and Sidhant Kaushik and Aditya Shah and Alex S. Huang and Luke Zettlemoyer and Dieter Fox and Yu Xiang and Anqi Li and Andreea Bobu and Abhishek Gupta and Stephen Tu† and Erdem B{\i}y{\i}k† and Jesse Zhang†},
+   year={2025},
+   url={https://github.com/aliang8/reward_fm},
+   note={arXiv coming soon}
+ }
+ ```
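
The updated README describes per-frame progress predictions being used to compare two trajectories. As a purely illustrative sketch of that idea (the model's actual scoring head lives in the robometer repo; the `preference` function and the tail-averaging heuristic below are assumptions, not the repo's API), one could aggregate each trajectory's final progress values and prefer the higher-scoring rollout:

```python
# Hypothetical sketch: turning per-frame progress values into a
# trajectory preference. The aggregation (average progress over the
# last few frames) is an illustrative assumption, not Robometer's
# actual scoring method.

def preference(progress_a, progress_b, tail=3):
    """Return 'A' if trajectory A looks better for the task, else 'B'.

    progress_a / progress_b: per-frame progress values in [0, 1].
    tail: number of final frames to average (assumed heuristic).
    """
    score_a = sum(progress_a[-tail:]) / min(tail, len(progress_a))
    score_b = sum(progress_b[-tail:]) / min(tail, len(progress_b))
    return "A" if score_a >= score_b else "B"

# Trajectory A steadily completes the task; B stalls around 0.4.
a = [0.0, 0.2, 0.5, 0.8, 1.0]
b = [0.0, 0.3, 0.4, 0.4, 0.4]
print(preference(a, b))  # → A
```

In practice the per-frame progress and success outputs would come from the model itself (via the inference scripts shown in the diff); this sketch only shows why failed rollouts remain useful under preference supervision: a stalled trajectory still provides a comparison signal even though it never reaches success.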