X-VLA fine-tuned on RoboTwin stack_bowls_two
An X-VLA policy (879M parameters) fine-tuned on the dual-arm bowl-stacking task in the RoboTwin 2.0 simulator.
Training data
- Source: RoboTwin `stack_bowls_two` task, `demo_clean` config.
- Episodes: 300.
- Frames: ~94k @ effective 16.67 Hz.
- Images: native 240x320 (no offline resize; aspect-preserving letterbox via the model's `resize_with_pad=[224,224]`).
- State / Action: 16-D dual-arm EEF, auto-padded to 20-D by X-VLA `action_mode=auto`.
- Language instruction: fixed `"stack the bowls"` for all episodes.
Training config
- Batch size 16, 20000 steps, bf16, cosine schedule (warmup 1000 steps, decay 20000 steps).
- Base: `lerobot/xvla-base` (full fine-tune; VLM, transformer, and soft prompts all unfrozen).
- `chunk_size=32`, `n_action_steps=32`, `num_denoising_steps=10`.
- rename_map: `dual_cam_global -> image`, `cam_wrist_65 -> image2`, `cam_wrist_75 -> image3`.
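The "cosine, warmup 1000 / decay 20000" setting in the list above can be read as a learning-rate multiplier that ramps up and then follows a half cosine. The sketch below shows one such reading. It assumes linear warmup from 0, decay to exactly 0 at step 20000, and a hypothetical helper name `lr_scale`; the scheduler actually used for this run may differ in its warmup shape or final floor.

```python
import math

def lr_scale(step, warmup_steps=1000, decay_steps=20000):
    """Multiplier applied to the peak learning rate at a given step.

    Assumption: linear warmup from 0 to 1, then cosine decay from 1 to 0.
    """
    if step < warmup_steps:
        # Linear ramp during warmup.
        return step / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = min(1.0, (step - warmup_steps) / max(1, decay_steps - warmup_steps))
    return 0.5 * (1.0 + math.cos(math.pi * progress))

for s in (0, 500, 1000, 10500, 20000):
    print(s, round(lr_scale(s), 4))
```

Under these assumptions the multiplier peaks at step 1000 and reaches half of its peak at step 10500, the midpoint of the decay phase.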
Evaluation (RoboTwin sim, max_steps=400, 10 episodes)
Success rate: 4/10 (40%) with task_text="stack the bowls" and --skip_resize.
| Episode | Result | Steps |
|---|---|---|
| 0 | FAIL | 400 (timeout) |
| 1 | FAIL | 400 |
| 2 | SUCCESS | 320 |
| 3 | FAIL | 400 |
| 4 | FAIL | 400 |
| 5 | FAIL | 400 |
| 6 | SUCCESS | 398 |
| 7 | SUCCESS | 259 |
| 8 | FAIL | 400 |
| 9 | SUCCESS | 340 |
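As a quick check on the table above, the headline success rate and the average length of the successful episodes can be recomputed from the per-episode rows. The variable names are illustrative only; the data is copied from the table.

```python
# (result, steps) per episode, copied from the evaluation table.
results = [
    ("FAIL", 400), ("FAIL", 400), ("SUCCESS", 320), ("FAIL", 400), ("FAIL", 400),
    ("FAIL", 400), ("SUCCESS", 398), ("SUCCESS", 259), ("FAIL", 400), ("SUCCESS", 340),
]

success_steps = [steps for outcome, steps in results if outcome == "SUCCESS"]
rate = len(success_steps) / len(results)
mean_steps = sum(success_steps) / len(success_steps)

print(f"success rate: {len(success_steps)}/{len(results)} = {rate:.0%}")  # 4/10 = 40%
print(f"mean steps over successes: {mean_steps}")  # 329.25
```

With only 10 episodes, the 40% figure carries wide uncertainty, and episode 6 succeeded just two steps short of the 400-step limit.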
Usage
from lerobot.policies.xvla.modeling_xvla import XVLAPolicy
policy = XVLAPolicy.from_pretrained("arrow-hf/xvla-robotwin-stack-bowls-two-40pct")
At inference, feed native-resolution images (e.g., 240x320 from the RoboTwin D435 camera); the model's internal `resize_with_pad` letterboxes them to the target shape, so no external resize is needed.
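To illustrate what an aspect-preserving letterbox does to a 240x320 frame, here is a minimal NumPy sketch. It is not the model's `resize_with_pad` implementation: the function name `letterbox`, the centered placement, the zero fill, and the nearest-neighbour resampling are all assumptions made for illustration, and the real preprocessing may pad on different sides or interpolate differently.

```python
import numpy as np

def letterbox(img, out_h=224, out_w=224):
    """Resize img (H, W, C) to fit inside (out_h, out_w) without distortion,
    then pad the remainder with zeros. Returns the canvas and the content box."""
    h, w = img.shape[:2]
    scale = min(out_h / h, out_w / w)           # largest scale that still fits
    new_h, new_w = round(h * scale), round(w * scale)

    # Nearest-neighbour resampling: map each output pixel to a source pixel.
    rows = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    resized = img[rows][:, cols]

    # Assumption: content is centered and the border is zero-filled.
    canvas = np.zeros((out_h, out_w) + img.shape[2:], dtype=img.dtype)
    top, left = (out_h - new_h) // 2, (out_w - new_w) // 2
    canvas[top:top + new_h, left:left + new_w] = resized
    return canvas, (top, left, new_h, new_w)

frame = np.full((240, 320, 3), 255, dtype=np.uint8)  # stand-in for a camera frame
out, box = letterbox(frame)
print(out.shape, box)  # (224, 224, 3) (28, 0, 168, 224)
```

For a 240x320 input the limiting scale is 224/320 = 0.7, so the content occupies 168x224 pixels and, under the centering assumption, 28 rows of padding sit above and below it.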