X-VLA RoboTwin `stack_bowls_two` (300 ep, single instruction)

X-VLA policy fine-tuned on 300 demonstration episodes of the `stack_bowls_two` task from RoboTwin 2.0 (`demo_clean` config), starting from the `lerobot/xvla-base` checkpoint (Florence2 backbone, 879M params).

Training

| Config | Value |
|---|---|
| Base checkpoint | `lerobot/xvla-base` |
| Training data | 300 RoboTwin demonstrations, single instruction `"stack the bowls"` |
| Batch size | 32 |
| Steps | 40000 |
| dtype | bfloat16 |
| Optimizer | AdamW, lr=1e-4 |
| Chunk size | 32 |
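
For reference, the settings above can be collected in a small config object. This is only an illustrative sketch: `FinetuneConfig` and its field names are hypothetical and are not part of lerobot, and the derived sample count assumes the batch size is the effective global batch with no gradient accumulation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FinetuneConfig:
    """Hypothetical container mirroring the training table (not a lerobot class)."""

    base_checkpoint: str = "lerobot/xvla-base"
    num_episodes: int = 300
    instruction: str = "stack the bowls"
    batch_size: int = 32
    steps: int = 40_000
    dtype: str = "bfloat16"
    optimizer: str = "AdamW"
    learning_rate: float = 1e-4
    chunk_size: int = 32

    @property
    def samples_seen(self) -> int:
        # Assumes batch_size is the effective batch (no gradient accumulation).
        return self.batch_size * self.steps


cfg = FinetuneConfig()
print(cfg.samples_seen)
```

Under that assumption, training draws 32 × 40000 = 1,280,000 samples in total.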

Evaluation

Evaluated in the RoboTwin 2.0 simulator (`demo_clean`) over 10 episodes with `max_steps=400` and `action_chunk_exec=32`.

Success rate: 4/10 (40%)

This task is challenging for X-VLA: even with 300 episodes and 40000 training steps, the success rate stays at 40%. The SmolVLA model with a RoboTwin-pretrained base (50 episodes) outperforms this X-VLA model on the same task (70%).
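
Because the evaluation uses only 10 episodes, the measured rate carries wide uncertainty. As a rough sketch (the Wilson score interval is a choice made here for illustration, not something the evaluation reported), the 95% interval for 4 successes out of 10 can be computed like this:

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half_width = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return center - half_width, center + half_width


low, high = wilson_interval(4, 10)
print(f"4/10 success: 95% interval {low:.1%} to {high:.1%}")
```

The interval spans roughly 17% to 69%, so more evaluation episodes would be needed to pin down the success rate precisely.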

Usage

```python
from lerobot.policies.xvla import XVLAPolicy

policy = XVLAPolicy.from_pretrained("arrow-hf/xvla-robotwin-stack-bowls-two-300ep")
```

Important: Use `action_chunk_exec=32` so that the full predicted chunk is executed. The default `action_chunk_exec=16` causes 0% success on these tasks (TOPP replanning interference).
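
To make the setting concrete, here is a minimal, library-free sketch of chunked execution. It assumes `action_chunk_exec` means the number of actions executed from each predicted chunk before the policy is queried again. `run_chunked`, `dummy_predict_chunk`, and `dummy_step` are hypothetical stand-ins, not lerobot or RoboTwin functions.

```python
from typing import Callable, List, Tuple

CHUNK_SIZE = 32  # chunk length from the training table


def dummy_predict_chunk() -> List[List[float]]:
    """Stand-in for a policy call: returns CHUNK_SIZE placeholder actions."""
    return [[0.0] for _ in range(CHUNK_SIZE)]


def dummy_step(action: List[float]) -> bool:
    """Stand-in for an environment step: returns True when the episode ends."""
    return False


def run_chunked(
    predict_chunk: Callable[[], List[List[float]]],
    step: Callable[[List[float]], bool],
    max_steps: int = 400,
    action_chunk_exec: int = 32,
) -> Tuple[int, int]:
    """Execute the first `action_chunk_exec` actions of each chunk, then re-plan.

    Returns (environment steps taken, number of policy queries).
    """
    steps_taken = 0
    replans = 0
    while steps_taken < max_steps:
        chunk = predict_chunk()
        replans += 1
        for action in chunk[:action_chunk_exec]:
            done = step(action)
            steps_taken += 1
            if done or steps_taken >= max_steps:
                return steps_taken, replans
    return steps_taken, replans


full = run_chunked(dummy_predict_chunk, dummy_step, action_chunk_exec=32)
half = run_chunked(dummy_predict_chunk, dummy_step, action_chunk_exec=16)
print(full, half)
```

In this sketch, executing only 16 of 32 actions roughly doubles the number of re-planning points within a 400-step episode, which is where replanning interference would have more opportunities to occur.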
