lakomchik committed · verified
Commit fd2aef1 · 1 Parent(s): 3c4fd70

Update README.md

Files changed (1): README.md (+45 −40)

README.md CHANGED
@@ -4,7 +4,7 @@ license: apache-2.0
 language:
 - en
 base_model:
-- SberRoboticsCenter/GreenVLA-5b-base-stride-1
+- SberRoboticsCenter/GreenVLA-5b-base-stride-4
 pipeline_tag: robotics
 tags:
 - robotics
@@ -14,33 +14,36 @@ tags:
 - flow-matching
 - action-prediction
 - green-vla
-- bridge
-- widowx
+- fractal
+- google-robot
 datasets:
-- IPEC-COMMUNITY/bridge_orig_lerobot
+- IPEC-COMMUNITY/fractal20220817_data_lerobot
 model-index:
-- name: GreenVLA-5b-stride-1-R1-bridge
+- name: GreenVLA-5b-stride-4-R1-fractal
   results:
   - task:
       type: robotics
-      name: SimplerEnv WidowX (Bridge)
+      name: SimplerEnv Google Robot (Fractal)
     dataset:
-      type: IPEC-COMMUNITY/bridge_orig_lerobot
-      name: Bridge
+      type: IPEC-COMMUNITY/fractal20220817_data_lerobot
+      name: Fractal
     metrics:
     - type: success_rate
-      name: Partial Average
-      value: 86.5
+      name: Matching Average
+      value: 77.0
    - type: success_rate
-      name: Entire Average
-      value: 71.9
+      name: Variant Average
+      value: 66.7
+    - type: success_rate
+      name: Overall Average
+      value: 71.8
 ---
 
 <div align="center">
 
-# GreenVLA-5b-stride-1-R1-bridge
+# GreenVLA-5b-stride-4-R1-fractal
 
-### Embodiment-Adapted VLA for Bridge (WidowX)
+### Embodiment-Adapted VLA for Fractal (Google Robot)
 
 **Sber Robotics Center &middot; Manipulation Team**
 
@@ -54,45 +57,45 @@ model-index:
 
 ## Overview
 
-**GreenVLA-5b-stride-1-R1-bridge** is the R1 (embodiment-adapted) checkpoint of the [Green-VLA](https://arxiv.org/abs/2602.00919) family, fine-tuned on the [Bridge](https://huggingface.co/datasets/IPEC-COMMUNITY/bridge_orig_lerobot) dataset for the WidowX robot arm.
+**GreenVLA-5b-stride-4-R1-fractal** is the R1 (embodiment-adapted) checkpoint of the [Green-VLA](https://arxiv.org/abs/2602.00919) family, fine-tuned on the [Fractal](https://huggingface.co/datasets/IPEC-COMMUNITY/fractal20220817_data_lerobot) dataset for the Google Robot.
 
-Starting from the [GreenVLA-5b-base-stride-1](https://huggingface.co/SberRoboticsCenter/GreenVLA-5b-base-stride-1) pretrained checkpoint, this model was adapted via supervised fine-tuning (R1 stage) to the Bridge embodiment, achieving strong manipulation performance on the SimplerEnv benchmark.
+Starting from the [GreenVLA-5b-base-stride-4](https://huggingface.co/SberRoboticsCenter/GreenVLA-5b-base-stride-4) pretrained checkpoint, this model was adapted via supervised fine-tuning (R1 stage) to the Fractal embodiment, achieving strong manipulation performance on the SimplerEnv benchmark.
 
 ## Evaluation
 
-Evaluated on **SimplerEnv WidowX (Bridge)** benchmark with default episode length.
-
-> **Note:** Bridge benchmark results can vary up to ±6% between runs. We recommend averaging over multiple evaluation runs for reliable comparisons.
+Evaluated on **SimplerEnv Google Robot (Fractal)** benchmark with default episode length:
 
-### Partial Success Rate
+### Visual Matching
 
 | Task | Success Rate |
 |------|:---:|
-| Put Spoon on Towel | 87.5% |
-| Put Carrot on Plate | 83.3% |
-| Stack Blocks | 79.2% |
-| Put Eggplant in Basket | 95.8% |
-| **Average** | **86.5%** |
+| Coke Can | 85.7% |
+| Move Near | 75.8% |
+| Drawer | 64.8% |
+| Apple in Drawer | 81.5% |
+| **Average** | **77.0%** |
 
-### Entire Success Rate
+### Variant Aggregation
 
 | Task | Success Rate |
 |------|:---:|
-| Put Spoon on Towel | 79.2% |
-| Put Carrot on Plate | 70.8% |
-| Stack Blocks | 41.7% |
-| Put Eggplant in Basket | 95.8% |
-| **Average** | **71.9%** |
+| Coke Can | 92.6% |
+| Move Near | 71.9% |
+| Drawer | 35.7% |
+| Apple in Drawer | 66.7% |
+| **Average** | **66.7%** |
+
+### Overall Average: **71.8%**
 
 ## Training
 
 | | Details |
 |---|---|
-| **Base checkpoint** | [GreenVLA-5b-base-stride-1](https://huggingface.co/SberRoboticsCenter/GreenVLA-5b-base-stride-1) |
+| **Base checkpoint** | [GreenVLA-5b-base-stride-4](https://huggingface.co/SberRoboticsCenter/GreenVLA-5b-base-stride-4) |
 | **Stage** | R1 — Embodiment-specific adaptation |
 | **Method** | Supervised fine-tuning |
-| **Dataset** | [IPEC-COMMUNITY/bridge_orig_lerobot](https://huggingface.co/datasets/IPEC-COMMUNITY/bridge_orig_lerobot) |
-| **Robot** | WidowX (Bridge) |
+| **Dataset** | [IPEC-COMMUNITY/fractal20220817_data_lerobot](https://huggingface.co/datasets/IPEC-COMMUNITY/fractal20220817_data_lerobot) |
+| **Robot** | Google Robot (Fractal) |
 | **Parameters** | ~5B |
 
 ## Quick Start
@@ -118,16 +121,16 @@ from lerobot.common.utils.torch_observation import (
 
 # 1. Load policy and transforms.
 policy, input_transforms, output_transforms = load_pretrained_policy(
-    "SberRoboticsCenter/GreenVLA-5b-stride-1-R1-bridge",
-    data_config_name="bridge",
+    "SberRoboticsCenter/GreenVLA-5b-stride-4-R1-fractal",
+    data_config_name="fractal",
 )
 policy.to("cuda").eval()
 
 # 2. Build an observation (replace with real sensor data).
 raw_obs = {
-    "observation/state": np.random.rand(8).astype(np.float32),  # x y z roll pitch yaw _pad_ gripper
-    "observation/image": np.random.randint(0, 256, size=(224, 224, 3), dtype=np.uint8),
-    "prompt": "pick up the green block and place it on the plate",
+    "observation/state": np.random.rand(8),  # x, y, z, rx, ry, rz, rw, gripper
+    "observation/image": np.random.randint(256, size=(448, 448, 3), dtype=np.uint8),
+    "prompt": "move the coke can to the left of the table",
 }
 
 # 3. Transform, preprocess, and batch.
@@ -145,7 +148,9 @@ actions = output_transforms(
 # actions shape: (action_horizon, 7) — [x, y, z, roll, pitch, yaw, gripper]
 ```
 
-See [`examples/example_inference_bridge.py`](https://github.com/greenvla/GreenVLA/blob/main/examples/example_inference_bridge.py) for the full runnable script with argument parsing.
+See [`examples/example_inference_fractal.py`](https://github.com/greenvla/GreenVLA/blob/main/examples/example_inference_fractal.py) for the full runnable script with argument parsing.
+
+> **Note:** The Fractal embodiment uses an 8-dim proprioceptive state `[x, y, z, rx, ry, rz, rw, gripper]` and `data_config_name="fractal"` — this differs from Bridge, which uses `data_config_name="bridge"` and a different state layout.
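
The state-layout note above can be checked mechanically before calling the transforms. Below is a minimal, NumPy-only sketch: the helper name `make_fractal_observation` is hypothetical (not part of the GreenVLA API), and the 448×448 image size and dict keys are taken from the Quick Start snippet above.

```python
import numpy as np

def make_fractal_observation(state, image, prompt):
    """Hypothetical helper: assemble a raw observation dict in the layout
    the Quick Start snippet uses for the Fractal embodiment."""
    state = np.asarray(state, dtype=np.float32)
    if state.shape != (8,):
        raise ValueError("Fractal state must be [x, y, z, rx, ry, rz, rw, gripper]")
    image = np.asarray(image)
    if image.dtype != np.uint8 or image.ndim != 3 or image.shape[2] != 3:
        raise ValueError("image must be an HxWx3 uint8 array")
    return {
        "observation/state": state,
        "observation/image": image,
        "prompt": str(prompt),
    }

obs = make_fractal_observation(
    np.zeros(8),                              # placeholder proprioception
    np.zeros((448, 448, 3), dtype=np.uint8),  # placeholder RGB frame
    "move the coke can to the left of the table",
)
print(sorted(obs))  # ['observation/image', 'observation/state', 'prompt']
```

Validating shapes and dtypes up front tends to surface embodiment mix-ups (e.g. feeding a Bridge-layout state to a Fractal checkpoint) much earlier than a shape error inside the model.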
 
 ## Citation
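
The Quick Start comment notes that the policy emits an action chunk of shape `(action_horizon, 7)`. A common pattern is to execute the chunk one step at a time and re-query the policy when it is exhausted. The sketch below is NumPy-only: `fake_policy` stands in for the real model returned by `load_pretrained_policy`, and the horizon value of 4 is an assumption, not the model's actual configuration.

```python
import numpy as np

ACTION_DIM = 7      # [x, y, z, roll, pitch, yaw, gripper], per the snippet above
ACTION_HORIZON = 4  # assumed chunk length; the real value comes from the policy config

def fake_policy(obs):
    """Stand-in for the real policy: returns one chunk of future actions."""
    rng = np.random.default_rng(0)
    return rng.standard_normal((ACTION_HORIZON, ACTION_DIM)).astype(np.float32)

def run_chunked_rollout(n_steps):
    """Execute n_steps actions, re-planning whenever the current chunk runs out."""
    executed = []
    chunk, cursor = None, 0
    for _ in range(n_steps):
        if chunk is None or cursor == len(chunk):
            chunk, cursor = fake_policy({}), 0  # query the policy for a new chunk
        executed.append(chunk[cursor])
        cursor += 1
    return np.stack(executed)

actions = run_chunked_rollout(10)
print(actions.shape)  # (10, 7)
```

In practice one would rebuild `raw_obs` from fresh sensor data before each re-plan; many deployments also re-plan before the chunk is exhausted to reduce open-loop drift.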