SmolVLA: SO-101 Space Decluttering

A SmolVLA policy fine-tuned on the SO-101 Space Decluttering Dataset v1 for language-conditioned pick-and-place decluttering tasks on a 6-DoF SO-101 robotic arm. It was trained with LeRobot.

Training Details

  • Policy: SmolVLA (Vision-Language-Action)
  • Steps: 20,000
  • Robot: SO-101 6-DoF leader-follower
  • Cameras: Dual-view (fixed top view + wrist-mounted egocentric view)
  • Framework: LeRobot
  • Language conditioning: Task descriptions passed as natural language instructions

Dataset

Trained on ShubhamK32/so101_declutter_v1, a multi-view teleoperation dataset with spatial distractors injected to prevent visual shortcut learning.

Usage

from lerobot.policies.smolvla.modeling_smolvla import SmolVLAPolicy

policy = SmolVLAPolicy.from_pretrained("ShubhamK32/smolvla_so101_declutter")
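
Once the policy is loaded, it is typically queried once per control step. The following is a minimal sketch of such a loop, not the author's deployment code. It assumes the loaded policy exposes `reset()` and `select_action(observation)` as LeRobot policies generally do; `run_episode`, `read_observation`, `send_action`, and `max_steps` are hypothetical names introduced only for illustration.

```python
from typing import Any, Callable, Dict, List


def run_episode(
    policy: Any,
    read_observation: Callable[[], Dict[str, Any]],
    send_action: Callable[[Any], None],
    max_steps: int = 200,
) -> List[Any]:
    """Query the policy once per control step and forward each action.

    `read_observation` and `send_action` are hypothetical callbacks that
    wrap whatever camera and motor interface the robot setup provides.
    """
    # Assumption: reset() clears any internal state (such as queued
    # actions) left over from a previous episode.
    policy.reset()
    actions: List[Any] = []
    for _ in range(max_steps):
        observation = read_observation()
        # Assumption: select_action() returns the next action for this step.
        action = policy.select_action(observation)
        send_action(action)
        actions.append(action)
    return actions
```

The loop is duck-typed, so it can be exercised with a stub policy before connecting real hardware. The observation dictionary is expected to carry both camera views, the robot state, and the task instruction.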

Camera Views

  • observation.images.topview: Fixed overhead. Better for unoccluded pick-and-place tasks.
  • observation.images.wristview: Egocentric wrist-mounted. Better for overlapping and cluttered scenes.
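
As a rough illustration of how the two views listed above might be bundled with a language instruction, the sketch below assembles a single observation dictionary. The two image keys come from this card. The `observation.state` and `task` keys follow common LeRobot conventions and are assumptions here, and `build_observation` is a hypothetical helper, not part of LeRobot.

```python
from typing import Any, Dict

# Camera keys as listed in this card.
TOP_KEY = "observation.images.topview"
WRIST_KEY = "observation.images.wristview"
# Assumed LeRobot conventions; not stated in this card.
STATE_KEY = "observation.state"
TASK_KEY = "task"


def build_observation(
    top_frame: Any,
    wrist_frame: Any,
    joint_state: Any,
    instruction: str,
) -> Dict[str, Any]:
    """Bundle one timestep of inputs under the keys the policy is expected to read."""
    if not instruction.strip():
        raise ValueError("instruction must be a non-empty task description")
    return {
        TOP_KEY: top_frame,
        WRIST_KEY: wrist_frame,
        STATE_KEY: joint_state,
        TASK_KEY: instruction,
    }
```

In practice the frames and joint state would be tensors in whatever shape and value range the policy's preprocessing expects; the helper leaves them untouched and only enforces the key layout and a non-empty instruction.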
