---
library_name: lerobot
license: apache-2.0
language:
- en
base_model:
- SberRoboticsCenter/GreenVLA-5b-base
pipeline_tag: robotics
tags:
- robotics
- vla
- vision-language-action
- manipulation
- flow-matching
- action-prediction
- green-vla
- bridge
- widowx
datasets:
- IPEC-COMMUNITY/bridge_orig_lerobot
model-index:
- name: GreenVLA-5b-R1-bridge
  results:
  - task:
      type: robotics
      name: SimplerEnv WidowX (Bridge)
    dataset:
      type: IPEC-COMMUNITY/bridge_orig_lerobot
      name: Bridge
    metrics:
    - type: success_rate
      name: Partial Average
      value: 86.5
    - type: success_rate
      name: Entire Average
      value: 71.9
---

<div align="center">

# GreenVLA-5b-R1-bridge

### Embodiment-Adapted VLA for Bridge (WidowX)

**Sber Robotics Center &middot; Manipulation Team**

[![arXiv](https://img.shields.io/badge/arXiv-2602.00919-b31b1b?style=for-the-badge&logo=arxiv&logoColor=white)](https://arxiv.org/abs/2602.00919)
[![Project Page](https://img.shields.io/badge/Project-Page-blue?style=for-the-badge&logo=github&logoColor=white)](https://greenvla.github.io/)
[![Code](https://img.shields.io/badge/Code-GitHub-181717?style=for-the-badge&logo=github&logoColor=white)](https://github.com/greenvla/GreenVLA)

</div>

---

## Overview

**GreenVLA-5b-R1-bridge** is the R1 (embodiment-adapted) checkpoint of the [Green-VLA](https://arxiv.org/abs/2602.00919) family, fine-tuned on the [Bridge](https://huggingface.co/datasets/IPEC-COMMUNITY/bridge_orig_lerobot) dataset for the WidowX robot arm.

Starting from the [GreenVLA-5b-base](https://huggingface.co/SberRoboticsCenter/GreenVLA-5b-base) pretrained checkpoint, this model was adapted to the Bridge embodiment via supervised fine-tuning (the R1 stage), achieving strong manipulation performance on the SimplerEnv benchmark.

## Evaluation

Evaluated on the **SimplerEnv WidowX (Bridge)** benchmark with the default episode length:

### Partial Success Rate

| Task | Success Rate |
|------|:---:|
| Put Spoon on Towel | 87.5% |
| Put Carrot on Plate | 83.3% |
| Stack Blocks | 79.2% |
| Put Eggplant in Basket | 95.8% |
| **Average** | **86.5%** |

### Entire Success Rate

| Task | Success Rate |
|------|:---:|
| Put Spoon on Towel | 79.2% |
| Put Carrot on Plate | 70.8% |
| Stack Blocks | 41.7% |
| Put Eggplant in Basket | 95.8% |
| **Average** | **71.9%** |

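The two averages above are plain unweighted means over the four tasks (the table reports them rounded to one decimal place):

```python
# Per-task success rates copied from the tables above (percent).
partial = [87.5, 83.3, 79.2, 95.8]  # Partial Success Rate
entire = [79.2, 70.8, 41.7, 95.8]   # Entire Success Rate

# Unweighted mean over tasks; matches the reported 86.5 / 71.9 after rounding.
partial_avg = sum(partial) / len(partial)
entire_avg = sum(entire) / len(entire)
print(partial_avg, entire_avg)
```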
## Training

| | Details |
|---|---|
| **Base checkpoint** | [GreenVLA-5b-base](https://huggingface.co/SberRoboticsCenter/GreenVLA-5b-base) |
| **Stage** | R1 — Embodiment-specific adaptation |
| **Method** | Supervised fine-tuning |
| **Dataset** | [IPEC-COMMUNITY/bridge_orig_lerobot](https://huggingface.co/datasets/IPEC-COMMUNITY/bridge_orig_lerobot) |
| **Robot** | WidowX (Bridge) |
| **Parameters** | ~5B |

## Quick Start

### Installation

```bash
git clone https://github.com/greenvla/GreenVLA.git
cd GreenVLA
uv sync  # or: pip install -e .
```

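With ~5B parameters, the model effectively requires a CUDA-capable PyTorch build for inference. The helper below is a generic environment sanity check (an illustration, not part of the GreenVLA codebase) you can run before loading the checkpoint:

```python
import importlib.util

def cuda_ready() -> bool:
    """Return True when torch is importable and reports a usable CUDA device.
    Generic environment check; not a GreenVLA API."""
    if importlib.util.find_spec("torch") is None:
        return False  # torch is not installed at all
    import torch
    return bool(torch.cuda.is_available())

print("CUDA ready:", cuda_ready())
```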
### Inference

```python
import numpy as np
import torch
from lerobot.common.policies.factory import load_pretrained_policy
from lerobot.common.utils.torch_observation import (
    move_dict_to_batch_for_inference,
    torch_preprocess_dict_inference,
)

# 1. Load policy and transforms.
policy, input_transforms, output_transforms = load_pretrained_policy(
    "SberRoboticsCenter/GreenVLA-5b-R1-bridge",
    data_config_name="bridge",
)
policy.to("cuda").eval()

# 2. Build an observation (replace with real sensor data).
raw_obs = {
    "observation/state": np.random.rand(8).astype(np.float32),  # x y z roll pitch yaw _pad_ gripper
    "observation/image": np.random.randint(0, 256, size=(224, 224, 3), dtype=np.uint8),
    "prompt": "pick up the green block and place it on the plate",
}

# 3. Transform, preprocess, and batch.
obs = input_transforms(raw_obs)
obs = torch_preprocess_dict_inference(obs)
batch = move_dict_to_batch_for_inference(obs, device="cuda")

# 4. Predict actions and post-process.
with torch.inference_mode():
    raw_actions = policy.select_action(batch).cpu().numpy()

actions = output_transforms(
    {"actions": raw_actions, "state": batch["state"].cpu().numpy()}
)["actions"]
# actions shape: (action_horizon, 7) — [x, y, z, roll, pitch, yaw, gripper]
```

See [`examples/example_inference_bridge.py`](https://github.com/greenvla/GreenVLA/blob/main/examples/example_inference_bridge.py) for the full runnable script with argument parsing.

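Note that `select_action` returns a chunk of future actions rather than a single step. How many of those actions to execute before re-querying the policy is a deployment choice; a common pattern is receding-horizon execution, sketched below in plain NumPy (the list of chunks stands in for repeated policy calls, and `execute_steps` is an illustrative parameter, not a GreenVLA setting):

```python
import numpy as np

def receding_horizon(chunks, execute_steps=4):
    """Execute the first `execute_steps` actions of each predicted chunk,
    then replan. `chunks` stands in for successive policy queries."""
    executed = []
    for chunk in chunks:            # each chunk: (action_horizon, 7)
        executed.extend(chunk[:execute_steps])
    return np.stack(executed)       # (n_chunks * execute_steps, 7)

# Two dummy 8-step chunks of 7-D actions [x, y, z, roll, pitch, yaw, gripper].
chunks = [np.zeros((8, 7)), np.ones((8, 7))]
trajectory = receding_horizon(chunks, execute_steps=4)
print(trajectory.shape)  # (8, 7): four steps taken from each of the two chunks
```

Executing fewer steps per chunk reacts faster to disturbances at the cost of more policy queries.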
## Model Family

| Model | Stage | Params | Description | Link |
|-------|:-----:|:------:|-------------|:----:|
| **GreenVLA-2b-base** | Base | 2B | Base pretrained (lightweight) | [Hub](https://huggingface.co/SberRoboticsCenter/GreenVLA-2b-base) |
| **GreenVLA-5b-base** | Base | 5B | Base pretrained (recommended) | [Hub](https://huggingface.co/SberRoboticsCenter/GreenVLA-5b-base) |
| **GreenVLA-5b-R1-bridge** | R1 | 5B | Fine-tuned on Bridge (WidowX) | You are here |
| **GreenVLA-5b-R2-bridge** | R2 | 5B | RL-aligned on Bridge (WidowX) | [Hub](https://huggingface.co/SberRoboticsCenter/GreenVLA-5b-R2-bridge) |
| **GreenVLA-5b-R1-fractal** | R1 | 5B | Fine-tuned on Fractal (Google Robot) | [Hub](https://huggingface.co/SberRoboticsCenter/GreenVLA-5b-R1-fractal) |

## Citation

```bibtex
@misc{greenvla,
  title = {Green-VLA: Staged Vision-Language-Action Model for Generalist Robots},
  author = {I. Apanasevich and M. Artemyev and R. Babakyan and P. Fedotova and
            D. Grankin and E. Kupryashin and A. Misailidi and D. Nerus and
            A. Nutalapati and G. Sidorov and I. Efremov and M. Gerasyov and
            D. Pikurov and Y. Senchenko and S. Davidenko and D. Kulikov and
            M. Sultankin and K. Askarbek and O. Shamanin and D. Statovoy and
            E. Zalyaev and I. Zorin and A. Letkin and E. Rusakov and
            A. Silchenko and V. Vorobyov and S. Sobolnikov and A. Postnikov},
  year = {2026},
  eprint = {2602.00919},
  archivePrefix = {arXiv},
  primaryClass = {cs.RO},
  url = {https://arxiv.org/abs/2602.00919},
}
```

<div align="center">

&copy; 2026 Sber Robotics Center &middot; Manipulation Team

</div>