SmolVLA RoboTwin stack_bowls_two (50 ep, single instruction)

SmolVLA policy fine-tuned on 50 demonstration episodes of the stack_bowls_two task from RoboTwin 2.0 (demo_clean config), starting from the lerobot/smolvla_robotwin base checkpoint.

Task

  • Robot: Agilex dual-arm, end-effector control (16D state, 16D action)
  • Cameras: 3 RGB streams — dual_cam_global, cam_wrist_65, cam_wrist_75 (240×320, D435)
  • Control rate: 10 Hz per the LeRobot dataset metadata (the underlying RoboTwin simulator runs at ≈ 30 Hz; training and evaluation use the same source)
  • Fixed instruction: "stack the bowls" (Strategy A: one fixed instruction for every episode, not multiple instructions)

Training

Config           Value
Base checkpoint  lerobot/smolvla_robotwin
Training data    50 RoboTwin demonstrations
Batch size       32
Steps            6000 (~10-25 epochs)
Optimizer        AdamW, lr=1e-4
Scheduler        Cosine, warmup=300, decay=6000
Chunk size       50
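
The training numbers can be cross-checked with a little arithmetic. The sketch below does two things. First, it multiplies steps by batch size and shows what dataset size the "~10-25 epochs" estimate implies, assuming one training sample corresponds to one frame and one epoch is one pass over all frames (the card does not state the frame count). Second, it evaluates a generic linear-warmup plus cosine-decay schedule at a few steps. The function names, the decay floor of 0.0, and the choice to measure the cosine phase from the end of warmup are assumptions for illustration; the schedule implemented by the training code may differ in these details.

```python
import math

# Values taken from the training configuration.
STEPS = 6000
BATCH_SIZE = 32
EPISODES = 50
PEAK_LR = 1e-4
WARMUP_STEPS = 300
DECAY_STEPS = 6000

# ASSUMPTION: the card does not state a final learning rate.
FINAL_LR = 0.0

# Total samples drawn during fine-tuning.
SAMPLES_SEEN = STEPS * BATCH_SIZE


def implied_frames(epochs):
    """Dataset size (in frames) implied by a given epoch count.

    ASSUMPTION: one sample is one frame and an epoch is one full pass.
    """
    return SAMPLES_SEEN / epochs


def lr_at(step, peak_lr=PEAK_LR, warmup=WARMUP_STEPS,
          decay=DECAY_STEPS, final_lr=FINAL_LR):
    """Generic schedule: linear warmup to peak_lr, then cosine decay."""
    if step < warmup:
        return peak_lr * (step + 1) / warmup
    # Progress through the cosine phase, clamped to [0, 1].
    progress = min(1.0, (step - warmup) / max(1, decay - warmup))
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return final_lr + (peak_lr - final_lr) * cosine


if __name__ == "__main__":
    print(f"samples seen: {SAMPLES_SEEN}")
    for epochs in (10, 25):
        frames = implied_frames(epochs)
        print(f"{epochs} epochs -> {frames:.0f} frames, "
              f"{frames / EPISODES:.1f} frames per episode")
    for step in (0, 300, 3150, 6000):
        print(f"step {step}: lr = {lr_at(step):.2e}")
```

Under these assumptions the run draws 192,000 samples, so the quoted epoch range corresponds to a dataset of roughly 7,680 to 19,200 frames, or about 154 to 384 frames per episode.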

Evaluation

Evaluated in the RoboTwin 2.0 simulator (demo_clean config) over 10 episodes, with max_steps=400 and action_chunk_exec=50.
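
To make the interaction between max_steps and action_chunk_exec concrete, here is a minimal sketch of chunked open-loop execution. It reads action_chunk_exec=50 as "execute 50 actions from each predicted chunk before querying the policy again", which is an interpretation rather than something the card spells out. `predict_chunk`, `run_episode`, and `make_dummy_env` are hypothetical stand-ins, not RoboTwin or LeRobot APIs.

```python
# Values taken from the evaluation setup and the Task section.
MAX_STEPS = 400
ACTION_CHUNK_EXEC = 50
CHUNK_SIZE = 50
ACTION_DIM = 16


def predict_chunk(observation, chunk_size=CHUNK_SIZE):
    """Hypothetical policy stub: returns chunk_size all-zero actions."""
    return [[0.0] * ACTION_DIM for _ in range(chunk_size)]


def run_episode(env_step, first_obs, max_steps=MAX_STEPS,
                n_exec=ACTION_CHUNK_EXEC):
    """Predict a chunk, execute n_exec actions from it, then re-predict.

    Returns (steps_taken, policy_calls, success).
    """
    obs, steps, policy_calls = first_obs, 0, 0
    while steps < max_steps:
        chunk = predict_chunk(obs)
        policy_calls += 1
        for action in chunk[:n_exec]:
            obs, done = env_step(action)
            steps += 1
            # Stop on success or when the step budget is exhausted.
            if done or steps >= max_steps:
                return steps, policy_calls, done
    return steps, policy_calls, False


def make_dummy_env(success_at=None):
    """Hypothetical environment that reports success at a fixed step."""
    counter = {"t": 0}

    def env_step(action):
        counter["t"] += 1
        done = success_at is not None and counter["t"] >= success_at
        return {"t": counter["t"]}, done

    return env_step


if __name__ == "__main__":
    print(run_episode(make_dummy_env(), {}))      # budget exhausted
    print(run_episode(make_dummy_env(120), {}))   # success at step 120
```

With these settings a full-length episode needs at most 400 / 50 = 8 policy calls. If each evaluation step corresponds to one 10 Hz control step, 400 steps is about 40 seconds of robot time.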

Success rate: 7/10 (70%)
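
Because the success rate comes from 10 episodes, it helps to see how wide the uncertainty around 70% is. The sketch below computes a Wilson score interval for a binomial proportion; the choice of the Wilson interval and the 95% level (z = 1.96) are assumptions made for illustration and are not part of the reported evaluation.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion."""
    p = successes / trials
    denom = 1.0 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(
        p * (1 - p) / trials + z * z / (4 * trials * trials)
    ) / denom
    return center - half, center + half


if __name__ == "__main__":
    low, high = wilson_interval(7, 10)
    print(f"7/10 successes: {low:.3f} to {high:.3f}")
```

For 7 successes in 10 trials the interval runs from roughly 0.40 to 0.89, so the 70% figure is best read as a coarse estimate.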

Usage

from lerobot.policies.smolvla.modeling_smolvla import SmolVLAPolicy

# Download the fine-tuned weights from the Hugging Face Hub and build the policy.
policy = SmolVLAPolicy.from_pretrained("arrow-hf/smolvla-robotwin-stack-bowls-two-50ep")

See the LeRobot documentation for inference setup.

Citation

Built on SmolVLA and the SmolVLA-RoboTwin pretrained base checkpoint, and fine-tuned on data collected from RoboTwin 2.0.
