X-VLA RoboTwin stack_bowls_two (300 ep, single instruction)

X-VLA policy fine-tuned on 300 demonstration episodes of the stack_bowls_two task from RoboTwin 2.0 (demo_clean config), starting from the lerobot/xvla-base checkpoint (Florence2 backbone, 879M params).

Training

Config          | Value
----------------|------------------------------------------------------------------
Base checkpoint | lerobot/xvla-base
Training data   | 300 RoboTwin demonstrations, single instruction "stack the bowls"
Batch size      | 32
Steps           | 40000
dtype           | bfloat16
Optimizer       | AdamW, lr=1e-4
Chunk size      | 32
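
For a rough sense of scale, the settings above can be collected into a plain dictionary and used to count how many samples the run drew. This is only a bookkeeping sketch: the key names are illustrative rather than actual lerobot config fields, and it assumes one batch per step with no gradient accumulation, which the card does not state.

```python
# Hyperparameters copied from the training table; key names are illustrative,
# not actual lerobot configuration fields.
train_config = {
    "base_checkpoint": "lerobot/xvla-base",
    "num_episodes": 300,
    "batch_size": 32,
    "steps": 40_000,
    "dtype": "bfloat16",
    "optimizer": "AdamW",
    "lr": 1e-4,
    "chunk_size": 32,
}

# Assumption: each step consumes exactly one batch (no gradient accumulation).
samples_seen = train_config["steps"] * train_config["batch_size"]
samples_per_episode = samples_seen / train_config["num_episodes"]

print(f"samples drawn during training: {samples_seen:,}")
print(f"average samples drawn per demonstration episode: {samples_per_episode:.0f}")
```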

Evaluation

Evaluated in the RoboTwin 2.0 simulator (demo_clean config) over 10 episodes, with max_steps=400 and action_chunk_exec=32.

Success rate: 4/10 (40%)
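
With only 10 evaluation episodes, the measured rate carries wide uncertainty. The sketch below computes a Wilson score interval for 4 successes out of 10, which comes out to roughly 17% to 69% at about 95% confidence. It is a generic statistical helper written for illustration and is not part of the RoboTwin or lerobot evaluation code.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1 + z**2 / trials
    center = (p_hat + z**2 / (2 * trials)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2)) / denom
    return center - half_width, center + half_width


low, high = wilson_interval(4, 10)
print(f"4/10 success, approximate 95% interval: {low:.1%} to {high:.1%}")
```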

This task is challenging for X-VLA: even with 300 episodes and 40,000 training steps, the success rate stays at 40%. The SmolVLA model built on a RoboTwin-pretrained base (50 episodes) reaches 70% on the same task, outperforming this X-VLA model.

Usage

from lerobot.policies.xvla import XVLAPolicy
policy = XVLAPolicy.from_pretrained("arrow-hf/xvla-robotwin-stack-bowls-two-300ep")

Important: use action_chunk_exec=32 so that the full predicted chunk is executed. The default action_chunk_exec=16 causes 0% success on these tasks because of TOPP replanning interference.
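
To make the effect of action_chunk_exec concrete, here is a toy rollout loop. It assumes the parameter means "number of actions executed from each predicted chunk before the policy is queried again", which matches the "full chunk" wording but is not confirmed by the card. DummyPolicy and predict_chunk are hypothetical stand-ins, not the lerobot or RoboTwin API, and no simulator is involved.

```python
CHUNK_SIZE = 32  # actions predicted per policy query (from the training table)


class DummyPolicy:
    """Hypothetical stand-in for a chunked policy; counts how often it is queried."""

    def __init__(self, chunk_size: int = CHUNK_SIZE):
        self.chunk_size = chunk_size
        self.num_queries = 0

    def predict_chunk(self, observation: int) -> list[int]:
        self.num_queries += 1
        # Placeholder "actions": integers derived from the current step index.
        return [observation + i for i in range(self.chunk_size)]


def rollout(policy: DummyPolicy, max_steps: int, action_chunk_exec: int) -> int:
    """Run a toy episode and return the number of policy queries made."""
    if not 1 <= action_chunk_exec <= policy.chunk_size:
        raise ValueError("action_chunk_exec must be between 1 and chunk_size")
    step = 0
    while step < max_steps:
        chunk = policy.predict_chunk(observation=step)
        # Execute only the first action_chunk_exec actions, then re-plan.
        for _action in chunk[:action_chunk_exec]:
            if step >= max_steps:
                break
            step += 1  # a real loop would step the simulator here
    return policy.num_queries


queries_full = rollout(DummyPolicy(), max_steps=400, action_chunk_exec=32)
queries_half = rollout(DummyPolicy(), max_steps=400, action_chunk_exec=16)
print(f"queries with exec=32: {queries_full}, with exec=16: {queries_half}")
```

Under these assumptions, halving action_chunk_exec roughly doubles the number of re-planning points in a 400-step episode, and each of those is a place where a newly predicted chunk can disagree with the motion already under way.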
