SmolVLA: SO-101 Space Decluttering
SmolVLA policy fine-tuned on the SO-101 Space Decluttering Dataset v1 for language-conditioned pick-and-place decluttering tasks on a 6-DoF SO-101 robotic arm. Trained using LeRobot.
Training Details
- Policy: SmolVLA (Vision-Language-Action)
- Steps: 20,000
- Robot: SO-101 6-DoF leader-follower
- Cameras: Dual-view (fixed top-view + wrist-mounted egocentric)
- Framework: LeRobot
- Language conditioning: Task descriptions passed as natural language instructions
Dataset
Trained on ShubhamK32/so101_declutter_v1, a multi-view teleoperation dataset with spatial distractors injected to prevent visual shortcut learning.
Usage
from lerobot.policies.smolvla.modeling_smolvla import SmolVLAPolicy
policy = SmolVLAPolicy.from_pretrained("ShubhamK32/smolvla_so101_declutter")
Camera Views
- observation.images.topview: Fixed overhead. Better for unoccluded pick-place tasks.
- observation.images.wristview: Egocentric wrist-mounted. Better for overlapping and cluttered scenes.
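As a rough illustration of how the two camera keys might sit alongside the robot state and the language instruction, here is a minimal sketch of an observation dict. The camera key names come from this card. The `observation.state` and `task` keys, the 6-value state size, and the `build_observation` helper are assumptions for illustration and are not part of the LeRobot API.

```python
# Camera keys are taken from this model card.
CAMERA_KEYS = (
    "observation.images.topview",
    "observation.images.wristview",
)
STATE_KEY = "observation.state"  # assumption: common LeRobot naming
TASK_KEY = "task"                # assumption: key for the language instruction
STATE_DIM = 6                    # assumption: one value per joint of the 6-DoF arm


def build_observation(topview, wristview, joint_state, instruction):
    """Assemble one observation dict (hypothetical helper, not a LeRobot API)."""
    if len(joint_state) != STATE_DIM:
        raise ValueError(
            f"expected {STATE_DIM} joint values, got {len(joint_state)}"
        )
    if not instruction.strip():
        raise ValueError("instruction must be a non-empty string")
    return {
        CAMERA_KEYS[0]: topview,
        CAMERA_KEYS[1]: wristview,
        STATE_KEY: list(joint_state),
        TASK_KEY: instruction.strip(),
    }


# Placeholder frames: replace with real camera images.
obs = build_observation(
    topview=[[0, 0, 0]],
    wristview=[[0, 0, 0]],
    joint_state=[0.0] * STATE_DIM,
    instruction="Put the red block in the bin",
)
print(sorted(obs))
```

In practice the images and state would need converting to tensors in whatever layout the installed LeRobot version expects before being passed to the policy. That preprocessing is version-dependent and is not shown here.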
Related
- Dataset: ShubhamK32/so101_declutter_v1
- ACT checkpoint: ShubhamK32/act_so101_declutter