---
library_name: lerobot
license: apache-2.0
language:
- en
base_model:
- SberRoboticsCenter/GreenVLA-5b-base-stride-1
pipeline_tag: robotics
tags:
- robotics
- vla
- vision-language-action
- manipulation
- flow-matching
- action-prediction
- green-vla
- bridge
- widowx
datasets:
- IPEC-COMMUNITY/bridge_orig_lerobot
model-index:
- name: GreenVLA-5b-stride-1-R1-bridge
  results:
  - task:
      type: robotics
      name: SimplerEnv WidowX (Bridge)
    dataset:
      type: IPEC-COMMUNITY/bridge_orig_lerobot
      name: Bridge
    metrics:
    - type: success_rate
      name: Partial Average
      value: 89.6
    - type: success_rate
      name: Entire Average
      value: 72.9
---

<div align="center">

# GreenVLA-5b-stride-1-R1-bridge

### Embodiment-Adapted VLA for Bridge (WidowX)

**Sber Robotics Center · Manipulation Team**

[Paper](https://arxiv.org/abs/2602.00919)
[Project Page](https://greenvla.github.io/)
[Code](https://github.com/greenvla/GreenVLA)

</div>

---

## Overview

**GreenVLA-5b-stride-1-R1-bridge** is the R1 (embodiment-adapted) checkpoint of the [Green-VLA](https://arxiv.org/abs/2602.00919) family, fine-tuned on the [Bridge](https://huggingface.co/datasets/IPEC-COMMUNITY/bridge_orig_lerobot) dataset for the WidowX robot arm.

Starting from the pretrained [GreenVLA-5b-base-stride-1](https://huggingface.co/SberRoboticsCenter/GreenVLA-5b-base-stride-1) checkpoint, the model was adapted to the Bridge embodiment via supervised fine-tuning (the R1 stage). On the SimplerEnv WidowX (Bridge) benchmark it reaches an average success rate of 89.6% (partial) and 72.9% (entire).

## Evaluation

Evaluated on the **SimplerEnv WidowX (Bridge)** benchmark with the default episode length.

> **Note:** Bridge benchmark results can vary by up to ±6% between runs. We recommend averaging over multiple evaluation runs for reliable comparisons.

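
The recommendation to average over several evaluation runs can be sketched as follows. This is a minimal illustration, not part of the Green-VLA codebase: `summarize_runs` is a hypothetical helper, and the per-run values in the usage lines are placeholders, not measured results.

```python
from statistics import mean, stdev


def summarize_runs(success_rates):
    """Return (mean, sample std) of per-run success rates given in percent.

    Hypothetical helper for illustration; not part of the Green-VLA repository.
    """
    if not success_rates:
        raise ValueError("need at least one evaluation run")
    avg = mean(success_rates)
    # The sample standard deviation is undefined for a single run.
    spread = stdev(success_rates) if len(success_rates) > 1 else 0.0
    return avg, spread


# Placeholder numbers only; substitute the success rates from your own runs.
placeholder_runs = [70.0, 75.0, 80.0]
avg, spread = summarize_runs(placeholder_runs)
print(f"mean = {avg:.1f}%, std = {spread:.1f} over {len(placeholder_runs)} runs")
```

Reporting the spread next to the mean makes it easier to judge whether a difference between two checkpoints exceeds the run-to-run variation.
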
### Partial Success Rate

| Task | Success Rate |
|------|:---:|
| Put Spoon on Towel | 91.7% |
| Put Carrot on Plate | 75.0% |
| Stack Blocks | 91.7% |
| Put Eggplant in Basket | 100.0% |
| **Average** | **89.6%** |

### Entire Success Rate

| Task | Success Rate |
|------|:---:|
| Put Spoon on Towel | 79.2% |
| Put Carrot on Plate | 62.5% |
| Stack Blocks | 58.3% |
| Put Eggplant in Basket | 91.7% |
| **Average** | **72.9%** |

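
The **Average** rows are consistent with an unweighted mean of the four per-task rates. The short check below reproduces them from the table values; `task_average` is a hypothetical helper name, and the unweighted mean is an assumption inferred from the numbers rather than a documented procedure.

```python
from statistics import mean

# Per-task success rates (percent) copied from the two tables above.
partial = {
    "Put Spoon on Towel": 91.7,
    "Put Carrot on Plate": 75.0,
    "Stack Blocks": 91.7,
    "Put Eggplant in Basket": 100.0,
}
entire = {
    "Put Spoon on Towel": 79.2,
    "Put Carrot on Plate": 62.5,
    "Stack Blocks": 58.3,
    "Put Eggplant in Basket": 91.7,
}


def task_average(rates):
    """Unweighted mean over tasks, rounded to one decimal place."""
    return round(mean(rates.values()), 1)


print(task_average(partial))  # 89.6
print(task_average(entire))   # 72.9
```
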
## Training

| | Details |
|---|---|
| **Base checkpoint** | [GreenVLA-5b-base-stride-1](https://huggingface.co/SberRoboticsCenter/GreenVLA-5b-base-stride-1) |
| **Stage** | R1: embodiment-specific adaptation |
| **Method** | Supervised fine-tuning |
| **Dataset** | [IPEC-COMMUNITY/bridge_orig_lerobot](https://huggingface.co/datasets/IPEC-COMMUNITY/bridge_orig_lerobot) |
| **Robot** | WidowX (Bridge) |
| **Parameters** | ~5B |

## Quick Start

### Installation

```bash
git clone https://github.com/greenvla/GreenVLA.git
cd GreenVLA
uv sync  # or: pip install -e .
```

### Inference

```python
import numpy as np
import torch
from lerobot.common.policies.factory import load_pretrained_policy
from lerobot.common.utils.torch_observation import (
    move_dict_to_batch_for_inference,
    torch_preprocess_dict_inference,
)

# 1. Load policy and transforms.
policy, input_transforms, output_transforms = load_pretrained_policy(
    "SberRoboticsCenter/GreenVLA-5b-stride-1-R1-bridge",
    data_config_name="bridge",
)
policy.to("cuda").eval()

# 2. Build an observation (replace with real sensor data).
raw_obs = {
    "observation/state": np.random.rand(8).astype(np.float32),  # x y z roll pitch yaw _pad_ gripper
    "observation/image": np.random.randint(0, 256, size=(224, 224, 3), dtype=np.uint8),
    "prompt": "pick up the green block and place it on the plate",
}

# 3. Transform, preprocess, and batch.
obs = input_transforms(raw_obs)
obs = torch_preprocess_dict_inference(obs)
batch = move_dict_to_batch_for_inference(obs, device="cuda")

# 4. Predict actions and post-process.
with torch.inference_mode():
    raw_actions = policy.select_action(batch).cpu().numpy()

actions = output_transforms(
    {"actions": raw_actions, "state": batch["state"].cpu().numpy()}
)["actions"]
# actions shape: (action_horizon, 7), each row [x, y, z, roll, pitch, yaw, gripper]
```

See [`examples/example_inference_bridge.py`](https://github.com/greenvla/GreenVLA/blob/main/examples/example_inference_bridge.py) for the full runnable script with argument parsing.

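
When wiring the snippet to real sensor data, it can help to make the vector layouts explicit. The sketch below packs a 6-D pose and a gripper reading into the 8-D state layout noted in the comment above (`x y z roll pitch yaw _pad_ gripper`) and splits an `(action_horizon, 7)` chunk into per-step pose and gripper parts. `Step`, `pack_state`, and `unpack_actions` are hypothetical names, not part of the Green-VLA or LeRobot API; the padding value of `0.0` is an assumption, and all numbers are placeholders.

```python
from typing import NamedTuple, Sequence


class Step(NamedTuple):
    """One 7-D action: [x, y, z, roll, pitch, yaw, gripper]."""

    pose: tuple  # (x, y, z, roll, pitch, yaw)
    gripper: float


def pack_state(pose: Sequence[float], gripper: float, pad: float = 0.0) -> list:
    """Build the 8-D state [x, y, z, roll, pitch, yaw, pad, gripper].

    The default padding value of 0.0 is an assumption; check the repository's
    Bridge data config for the value expected by the model.
    """
    if len(pose) != 6:
        raise ValueError("pose must have 6 values: x, y, z, roll, pitch, yaw")
    return [*map(float, pose), float(pad), float(gripper)]


def unpack_actions(actions) -> list:
    """Split an (action_horizon, 7) chunk into per-step pose and gripper parts."""
    steps = []
    for row in actions:
        values = [float(v) for v in row]
        if len(values) != 7:
            raise ValueError("each action must have 7 values")
        steps.append(Step(pose=tuple(values[:6]), gripper=values[6]))
    return steps


# Placeholder values standing in for real robot state and model output.
state = pack_state([0.3, 0.0, 0.15, 0.0, 0.0, 0.0], gripper=1.0)
chunk = [[0.01, 0.0, -0.02, 0.0, 0.0, 0.0, 1.0]] * 2  # stand-in for `actions`
for step in unpack_actions(chunk):
    print(step.pose, step.gripper)
```

To use the packed state in the inference snippet, convert it with `np.asarray(state, dtype=np.float32)` before assigning it to `raw_obs["observation/state"]`. `unpack_actions` also accepts the NumPy array returned by `output_transforms`, since it only iterates over rows.
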
## Citation

```bibtex
@misc{apanasevich2026greenvlastagedvisionlanguageactionmodel,
  title         = {Green-VLA: Staged Vision-Language-Action Model for Generalist Robots},
  author        = {I. Apanasevich and M. Artemyev and R. Babakyan and P. Fedotova and
                   D. Grankin and E. Kupryashin and A. Misailidi and D. Nerus and
                   A. Nutalapati and G. Sidorov and I. Efremov and M. Gerasyov and
                   D. Pikurov and Y. Senchenko and S. Davidenko and D. Kulikov and
                   M. Sultankin and K. Askarbek and O. Shamanin and D. Statovoy and
                   E. Zalyaev and I. Zorin and A. Letkin and E. Rusakov and
                   A. Silchenko and V. Vorobyov and S. Sobolnikov and A. Postnikov},
  year          = {2026},
  eprint        = {2602.00919},
  archivePrefix = {arXiv},
  primaryClass  = {cs.RO},
  url           = {https://arxiv.org/abs/2602.00919},
}
```

<div align="center">

© 2026 Sber Robotics Center · Manipulation Team

</div>