Update README.md

README.md +45 -40

@@ -4,7 +4,7 @@ license: apache-2.0
 language:
   - en
 base_model:
-  - SberRoboticsCenter/GreenVLA-5b-base-stride-1
 pipeline_tag: robotics
 tags:
   - robotics
@@ -14,33 +14,36 @@ tags:
   - flow-matching
   - action-prediction
   - green-vla
-  - bridge
-  - widowx
 datasets:
-  - IPEC-COMMUNITY/bridge_orig_lerobot
 model-index:
-  - name: GreenVLA-5b-stride-1-R1-bridge
     results:
       - task:
           type: robotics
-          name: SimplerEnv WidowX (Bridge)
         dataset:
-          type: IPEC-COMMUNITY/bridge_orig_lerobot
-          name: Bridge
         metrics:
           - type: success_rate
-            name: Partial Average
-            value: 86.5
           - type: success_rate
-            name: Entire Average
-            value: 71.9
 ---
 <div align="center">
-# GreenVLA-5b-stride-1-R1-bridge
-### Embodiment-Adapted VLA for Bridge (WidowX)
 **Sber Robotics Center &middot; Manipulation Team**
@@ -54,45 +57,45 @@ model-index:
 ## Overview
-**GreenVLA-5b-stride-1-R1-bridge** is the R1 (embodiment-adapted) checkpoint of the [Green-VLA](https://arxiv.org/abs/2602.00919) family, fine-tuned on the [Bridge](https://huggingface.co/datasets/IPEC-COMMUNITY/bridge_orig_lerobot) dataset for the WidowX robot arm.
-Starting from the [GreenVLA-5b-base-stride-1](https://huggingface.co/SberRoboticsCenter/GreenVLA-5b-base-stride-1) pretrained checkpoint, this model was adapted via supervised fine-tuning (R1 stage) to the Bridge embodiment, achieving strong manipulation performance on the SimplerEnv benchmark.
 ## Evaluation
-Evaluated on **SimplerEnv WidowX (Bridge)** benchmark with default episode length.
-> **Note:** Bridge benchmark results can vary up to ±6% between runs. We recommend averaging over multiple evaluation runs for reliable comparisons.
-### Partial Success Rate
 | Task | Success Rate |
 |------|:---:|
-| Put Spoon on Towel | 87.5% |
-| Put Carrot on Plate | 83.3% |
-| Stack Blocks | 79.2% |
-| Put Eggplant in Basket | 95.8% |
-| **Average** | **86.5%** |
-### Entire Success Rate
 | Task | Success Rate |
 |------|:---:|
-| Put Spoon on Towel | 79.2% |
-| Put Carrot on Plate | 70.8% |
-| Stack Blocks | 41.7% |
-| Put Eggplant in Basket | 95.8% |
-| **Average** | **71.9%** |
 ## Training
 | | Details |
 |---|---|
-| **Base checkpoint** | [GreenVLA-5b-base-stride-1](https://huggingface.co/SberRoboticsCenter/GreenVLA-5b-base-stride-1) |
 | **Stage** | R1 — Embodiment-specific adaptation |
 | **Method** | Supervised fine-tuning |
-| **Dataset** | [IPEC-COMMUNITY/bridge_orig_lerobot](https://huggingface.co/datasets/IPEC-COMMUNITY/bridge_orig_lerobot) |
-| **Robot** | WidowX (Bridge) |
 | **Parameters** | ~5B |
 ## Quick Start
@@ -118,16 +121,16 @@ from lerobot.common.utils.torch_observation import (
 # 1. Load policy and transforms.
 policy, input_transforms, output_transforms = load_pretrained_policy(
-    "SberRoboticsCenter/GreenVLA-5b-stride-1-R1-bridge",
-    data_config_name="bridge",
 )
 policy.to("cuda").eval()
 # 2. Build an observation (replace with real sensor data).
 raw_obs = {
-    "observation/state": np.random.rand(8).astype(np.float32),  # x y z roll pitch yaw _pad_ gripper
-    "observation/image": np.random.randint(0, 256, size=(224, 224, 3), dtype=np.uint8),
-    "prompt": "pick up the green block and place it on the plate",
 }
 # 3. Transform, preprocess, and batch.
@@ -145,7 +148,9 @@ actions = output_transforms(
 # actions shape: (action_horizon, 7) — [x, y, z, roll, pitch, yaw, gripper]
 ```
-See [`examples/example_inference_bridge.py`](https://github.com/greenvla/GreenVLA/blob/main/examples/example_inference_bridge.py) for the full runnable script with argument parsing.
 ## Citation

 language:
   - en
 base_model:
+  - SberRoboticsCenter/GreenVLA-5b-base-stride-4
 pipeline_tag: robotics
 tags:
   - robotics
   - flow-matching
   - action-prediction
   - green-vla
+  - fractal
+  - google-robot
 datasets:
+  - IPEC-COMMUNITY/fractal20220817_data_lerobot
 model-index:
+  - name: GreenVLA-5b-stride-4-R1-fractal
     results:
       - task:
           type: robotics
+          name: SimplerEnv Google Robot (Fractal)
         dataset:
+          type: IPEC-COMMUNITY/fractal20220817_data_lerobot
+          name: Fractal
         metrics:
           - type: success_rate
+            name: Matching Average
+            value: 77.0
           - type: success_rate
+            name: Variant Average
+            value: 66.7
+          - type: success_rate
+            name: Overall Average
+            value: 71.8
 ---
 <div align="center">
+# GreenVLA-5b-stride-4-R1-fractal
+### Embodiment-Adapted VLA for Fractal (Google Robot)
 **Sber Robotics Center &middot; Manipulation Team**
 ## Overview
+**GreenVLA-5b-stride-4-R1-fractal** is the R1 (embodiment-adapted) checkpoint of the [Green-VLA](https://arxiv.org/abs/2602.00919) family, fine-tuned on the [Fractal](https://huggingface.co/datasets/IPEC-COMMUNITY/fractal20220817_data_lerobot) dataset for the Google Robot.
+Starting from the [GreenVLA-5b-base-stride-4](https://huggingface.co/SberRoboticsCenter/GreenVLA-5b-base-stride-4) pretrained checkpoint, this model was adapted to the Fractal embodiment via supervised fine-tuning (R1 stage). It reaches a 71.8% overall average success rate on the SimplerEnv Google Robot benchmark (77.0% Visual Matching, 66.7% Variant Aggregation).
 ## Evaluation
+Evaluated on the **SimplerEnv Google Robot (Fractal)** benchmark using the default episode length.
+### Visual Matching
 | Task | Success Rate |
 |------|:---:|
+| Coke Can | 85.7% |
+| Move Near | 75.8% |
+| Drawer | 64.8% |
+| Apple in Drawer | 81.5% |
+| **Average** | **77.0%** |
+### Variant Aggregation
 | Task | Success Rate |
 |------|:---:|
+| Coke Can | 92.6% |
+| Move Near | 71.9% |
+| Drawer | 35.7% |
+| Apple in Drawer | 66.7% |
+| **Average** | **66.7%** |
+### Overall Average: **71.8%**
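The headline numbers can be reproduced from the two per-task tables. The sketch below assumes each suite average is the unweighted mean of its four tasks and that the overall figure is the mean of the two suite averages; the card does not state this, but it is consistent with the reported 77.0%, 66.7%, and 71.8%.

```python
# Per-task success rates (%) copied from the Visual Matching and
# Variant Aggregation tables. Unweighted averaging is an assumption.
visual_matching = {
    "Coke Can": 85.7,
    "Move Near": 75.8,
    "Drawer": 64.8,
    "Apple in Drawer": 81.5,
}
variant_aggregation = {
    "Coke Can": 92.6,
    "Move Near": 71.9,
    "Drawer": 35.7,
    "Apple in Drawer": 66.7,
}


def mean(values):
    """Arithmetic mean of a non-empty iterable of floats."""
    values = list(values)
    return sum(values) / len(values)


vm_avg = mean(visual_matching.values())      # ~76.95
va_avg = mean(variant_aggregation.values())  # ~66.725
overall = mean([vm_avg, va_avg])             # ~71.8375

print(f"Visual Matching:     {vm_avg:.1f}%")
print(f"Variant Aggregation: {va_avg:.1f}%")
print(f"Overall:             {overall:.1f}%")
```

Because both suites have four tasks, the mean of the two suite averages equals the mean over all eight task scores.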
 ## Training
 | | Details |
 |---|---|
+| **Base checkpoint** | [GreenVLA-5b-base-stride-4](https://huggingface.co/SberRoboticsCenter/GreenVLA-5b-base-stride-4) |
 | **Stage** | R1 — Embodiment-specific adaptation |
 | **Method** | Supervised fine-tuning |
+| **Dataset** | [IPEC-COMMUNITY/fractal20220817_data_lerobot](https://huggingface.co/datasets/IPEC-COMMUNITY/fractal20220817_data_lerobot) |
+| **Robot** | Google Robot (Fractal) |
 | **Parameters** | ~5B |
 ## Quick Start
 # 1. Load policy and transforms.
 policy, input_transforms, output_transforms = load_pretrained_policy(
+    "SberRoboticsCenter/GreenVLA-5b-stride-4-R1-fractal",
+    data_config_name="fractal",
 )
 policy.to("cuda").eval()
 # 2. Build an observation (replace with real sensor data).
 raw_obs = {
+    "observation/state": np.random.rand(8),  # x, y, z, rx, ry, rz, rw, gripper
+    "observation/image": np.random.randint(256, size=(448, 448, 3), dtype=np.uint8),
+    "prompt": "move the coke can to the left of the table",
 }
 # 3. Transform, preprocess, and batch.
 # actions shape: (action_horizon, 7) — [x, y, z, roll, pitch, yaw, gripper]
 ```
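The Quick Start comment gives the action chunk as `(action_horizon, 7)` with components `[x, y, z, roll, pitch, yaw, gripper]`. A minimal sketch for unpacking such a chunk into per-step named fields follows. `split_action_chunk` is a hypothetical helper; it assumes `actions` is array-like (for example a NumPy array or a CPU tensor), and this card does not say whether the values are absolute poses or deltas, or how the gripper channel is scaled.

```python
from __future__ import annotations

import numpy as np

# Component order taken from the Quick Start comment; hypothetical helper below.
ACTION_FIELDS = ("x", "y", "z", "roll", "pitch", "yaw", "gripper")


def split_action_chunk(actions) -> list[dict[str, float]]:
    """Convert an (action_horizon, 7) array into one dict per timestep."""
    actions = np.asarray(actions)
    if actions.ndim != 2 or actions.shape[1] != len(ACTION_FIELDS):
        raise ValueError(
            f"expected shape (H, {len(ACTION_FIELDS)}), got {actions.shape}"
        )
    # zip pairs each field name with the matching column of the step.
    return [dict(zip(ACTION_FIELDS, map(float, step))) for step in actions]


# Usage with a dummy chunk standing in for the policy output:
dummy = np.zeros((4, 7), dtype=np.float32)
dummy[:, 6] = 1.0  # fill the gripper channel for illustration
steps = split_action_chunk(dummy)
print(len(steps), steps[0]["gripper"])
```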
+See [`examples/example_inference_fractal.py`](https://github.com/greenvla/GreenVLA/blob/main/examples/example_inference_fractal.py) for the full runnable script with argument parsing.
+> **Note:** The Fractal embodiment uses an 8-dim proprioceptive state `[x, y, z, rx, ry, rz, rw, gripper]` and `data_config_name="fractal"`. This differs from Bridge, which uses `data_config_name="bridge"` and the state layout `[x, y, z, roll, pitch, yaw, _pad_, gripper]`.
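When replacing the random arrays in the Quick Start with real sensor data, the state has to be packed in the Fractal layout. The sketch below shows one way to do that. `make_fractal_state` and `make_fractal_obs` are hypothetical helpers, not part of the GreenVLA API. Reading `[rx, ry, rz, rw]` as a scalar-last quaternion is an assumption based on the field names, and the image check is deliberately loose because the card only shows a 448×448 example.

```python
import numpy as np

# Hypothetical helpers (not part of the GreenVLA API). Layout per the card:
# [x, y, z, rx, ry, rz, rw, gripper]; quaternion assumed scalar-last.


def make_fractal_state(position, quat_xyzw, gripper) -> np.ndarray:
    """Pack a 3-D position, a 4-D quaternion and a gripper scalar into 8 float32 values."""
    pos = np.asarray(position, dtype=np.float32).reshape(3)
    quat = np.asarray(quat_xyzw, dtype=np.float32).reshape(4)
    grip = np.asarray([gripper], dtype=np.float32)
    return np.concatenate([pos, quat, grip])


def make_fractal_obs(position, quat_xyzw, gripper, image, prompt: str) -> dict:
    """Assemble a raw observation dict with the same keys as the Quick Start example."""
    image = np.asarray(image)
    # Basic checks only: the example feeds 448x448x3 uint8, but whether other
    # resolutions work depends on input_transforms, which is not shown here.
    if image.ndim != 3 or image.shape[2] != 3 or image.dtype != np.uint8:
        raise ValueError(f"expected an HxWx3 uint8 image, got {image.shape} {image.dtype}")
    return {
        "observation/state": make_fractal_state(position, quat_xyzw, gripper),
        "observation/image": image,
        "prompt": prompt,
    }


# Usage with placeholder values standing in for real sensor readings:
obs = make_fractal_obs(
    position=[0.5, 0.0, 0.2],
    quat_xyzw=[0.0, 0.0, 0.0, 1.0],
    gripper=1.0,
    image=np.zeros((448, 448, 3), dtype=np.uint8),
    prompt="move the coke can to the left of the table",
)
print(obs["observation/state"].shape)
```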
 ## Citation