---
library_name: lerobot
license: apache-2.0
language:
- en
base_model:
- SberRoboticsCenter/GreenVLA-5b-base
pipeline_tag: robotics
tags:
- robotics
- vla
- vision-language-action
- manipulation
- flow-matching
- action-prediction
- green-vla
- bridge
- widowx
- reinforcement-learning
datasets:
- IPEC-COMMUNITY/bridge_orig_lerobot
model-index:
- name: GreenVLA-5b-R2-bridge
  results:
  - task:
      type: robotics
      name: SimplerEnv WidowX (Bridge)
    dataset:
      type: IPEC-COMMUNITY/bridge_orig_lerobot
      name: Bridge
    metrics:
    - type: success_rate
      name: Partial Average
      value: 91.7
    - type: success_rate
      name: Entire Average
      value: 79.1
---

<div align="center">

# GreenVLA-5b-R2-bridge

### RL-Aligned VLA for Bridge (WidowX)

**Sber Robotics Center &middot; Manipulation Team**

[![arXiv](https://img.shields.io/badge/arXiv-2602.00919-b31b1b?style=for-the-badge&logo=arxiv&logoColor=white)](https://arxiv.org/abs/2602.00919)
[![Project Page](https://img.shields.io/badge/Project-Page-blue?style=for-the-badge&logo=github&logoColor=white)](https://greenvla.github.io/)
[![Code](https://img.shields.io/badge/Code-GitHub-181717?style=for-the-badge&logo=github&logoColor=white)](https://github.com/greenvla/GreenVLA)

</div>

---

## Overview

**GreenVLA-5b-R2-bridge** is the R2 (RL-aligned) checkpoint of the [Green-VLA](https://arxiv.org/abs/2602.00919) family, fine-tuned for the WidowX robot arm on the [Bridge](https://huggingface.co/datasets/IPEC-COMMUNITY/bridge_orig_lerobot) dataset combined with additional trajectories collected in SimplerEnv Bridge environments. Trajectory collection and RL fine-tuning follow the Trajectory Optimization approach described in the [technical report](https://arxiv.org/abs/2602.00919).

Starting from [GreenVLA-5b-base](https://huggingface.co/SberRoboticsCenter/GreenVLA-5b-base), this model went through both the R1 (supervised fine-tuning) and R2 (RL policy alignment) stages, yielding significant performance gains over behavior cloning alone.

## Evaluation

Evaluated on the **SimplerEnv WidowX (Bridge)** benchmark.

> **Note:** Bridge benchmark results can vary by up to ±6% between runs. We recommend averaging over multiple evaluation runs for reliable comparisons.

### Partial Success Rate

| Task | Success Rate |
|------|:---:|
| Put Spoon on Towel | 87.5% |
| Stack Blocks (Cubes) | 95.8% |
| Put Eggplant in Basket | 91.7% |
| Put Carrot on Plate | 91.6% |
| **Average** | **91.7%** |

### Entire Success Rate

| Task | Success Rate |
|------|:---:|
| Put Spoon on Towel | 90.1% |
| Stack Blocks (Cubes) | 52.6% |
| Put Eggplant in Basket | 84.8% |
| Put Carrot on Plate | 89.0% |
| **Average** | **79.1%** |

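The reported averages are plain per-task means of the four rows above (the exact partial mean is 91.65, rounded to 91.7 in the metadata; the entire mean is 79.125). A quick sanity check with the table values hard-coded:

```python
# Per-task success rates copied from the tables above (percent).
partial = [87.5, 95.8, 91.7, 91.6]
entire = [90.1, 52.6, 84.8, 89.0]

partial_avg = sum(partial) / len(partial)  # exactly 91.65, reported as 91.7
entire_avg = sum(entire) / len(entire)     # exactly 79.125, reported as 79.1

print(round(partial_avg, 2), round(entire_avg, 3))
```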
## Training

| | Details |
|---|---|
| **Base checkpoint** | [GreenVLA-5b-base](https://huggingface.co/SberRoboticsCenter/GreenVLA-5b-base) |
| **Stage** | R2 — RL policy alignment |
| **Method** | Trajectory optimization (SFT + RL on collected trajectories) |
| **Dataset** | [IPEC-COMMUNITY/bridge_orig_lerobot](https://huggingface.co/datasets/IPEC-COMMUNITY/bridge_orig_lerobot) + SimplerEnv rollouts |
| **Robot** | WidowX (Bridge) |
| **Parameters** | ~5B |

## Quick Start

### Installation

```bash
git clone https://github.com/greenvla/GreenVLA.git
cd GreenVLA
uv sync  # or: pip install -e .
```

### Inference

```python
import numpy as np
import torch
from lerobot.common.policies.factory import load_pretrained_policy
from lerobot.common.utils.torch_observation import (
    move_dict_to_batch_for_inference,
    torch_preprocess_dict_inference,
)

# 1. Load policy and transforms.
policy, input_transforms, output_transforms = load_pretrained_policy(
    "SberRoboticsCenter/GreenVLA-5b-R2-bridge",
    data_config_name="bridge",
)
policy.to("cuda").eval()

# 2. Build an observation (replace with real sensor data).
raw_obs = {
    "observation/state": np.random.rand(8).astype(np.float32),  # x y z roll pitch yaw _pad_ gripper
    "observation/image": np.random.randint(0, 256, size=(224, 224, 3), dtype=np.uint8),
    "prompt": "pick up the green block and place it on the plate",
}

# 3. Transform, preprocess, and batch.
obs = input_transforms(raw_obs)
obs = torch_preprocess_dict_inference(obs)
batch = move_dict_to_batch_for_inference(obs, device="cuda")

# 4. Predict actions and post-process.
with torch.inference_mode():
    raw_actions = policy.select_action(batch).cpu().numpy()

actions = output_transforms(
    {"actions": raw_actions, "state": batch["state"].cpu().numpy()}
)["actions"]
# actions shape: (action_horizon, 7) — [x, y, z, roll, pitch, yaw, gripper]
```

See [`examples/example_inference_bridge.py`](https://github.com/greenvla/GreenVLA/blob/main/examples/example_inference_bridge.py) for the full runnable script with argument parsing.

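In closed-loop control, a chunked policy like this is typically run in a receding-horizon loop: predict an action chunk, execute only its first few steps, then re-plan from a fresh observation. A minimal sketch of that pattern, where `predict_chunk`, `get_observation`, and `apply_action` are hypothetical stand-ins for the inference steps above and your robot/sensor interfaces (the chunk length here is an assumption; check your policy config):

```python
import numpy as np

ACTION_HORIZON = 8   # chunk length (assumed; not taken from the model card)
EXECUTE_STEPS = 4    # actions executed per chunk before re-planning

def predict_chunk(obs):
    # Stand-in for steps 1-4 above; returns an (ACTION_HORIZON, 7) chunk.
    return np.zeros((ACTION_HORIZON, 7), dtype=np.float32)

def get_observation():
    # Stand-in for reading the camera and proprioceptive state.
    return {"observation/state": np.zeros(8, dtype=np.float32)}

executed = []

def apply_action(action):
    # Stand-in for sending one 7-DoF action to the robot.
    executed.append(action)

# Receding-horizon loop: re-plan after executing a prefix of each chunk.
for _ in range(3):  # 3 re-planning cycles
    chunk = predict_chunk(get_observation())
    for action in chunk[:EXECUTE_STEPS]:
        apply_action(action)

print(len(executed))  # 12 actions executed (3 cycles x 4 steps)
```

Executing only a prefix of each chunk trades inference cost against reactivity; shorter `EXECUTE_STEPS` re-plans more often and reacts faster to disturbances.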
## Model Family

| Model | Stage | Params | Description | Link |
|-------|:-----:|:------:|-------------|:----:|
| **GreenVLA-2b-base** | Base | 2B | Base pretrained (lightweight) | [Hub](https://huggingface.co/SberRoboticsCenter/GreenVLA-2b-base) |
| **GreenVLA-5b-base** | Base | 5B | Base pretrained (recommended) | [Hub](https://huggingface.co/SberRoboticsCenter/GreenVLA-5b-base) |
| **GreenVLA-5b-R1-bridge** | R1 | 5B | Fine-tuned on Bridge (WidowX) | [Hub](https://huggingface.co/SberRoboticsCenter/GreenVLA-5b-R1-bridge) |
| **GreenVLA-5b-R2-bridge** | R2 | 5B | RL-aligned on Bridge (WidowX) | You are here |
| **GreenVLA-5b-R1-fractal** | R1 | 5B | Fine-tuned on Fractal (Google Robot) | [Hub](https://huggingface.co/SberRoboticsCenter/GreenVLA-5b-R1-fractal) |

## Citation

```bibtex
@misc{greenvla,
  title         = {Green-VLA: Staged Vision-Language-Action Model for Generalist Robots},
  author        = {I. Apanasevich and M. Artemyev and R. Babakyan and P. Fedotova and
                   D. Grankin and E. Kupryashin and A. Misailidi and D. Nerus and
                   A. Nutalapati and G. Sidorov and I. Efremov and M. Gerasyov and
                   D. Pikurov and Y. Senchenko and S. Davidenko and D. Kulikov and
                   M. Sultankin and K. Askarbek and O. Shamanin and D. Statovoy and
                   E. Zalyaev and I. Zorin and A. Letkin and E. Rusakov and
                   A. Silchenko and V. Vorobyov and S. Sobolnikov and A. Postnikov},
  year          = {2026},
  eprint        = {2602.00919},
  archivePrefix = {arXiv},
  primaryClass  = {cs.RO},
  url           = {https://arxiv.org/abs/2602.00919},
}
```

<div align="center">

&copy; 2026 Sber Robotics Center &middot; Manipulation Team

</div>