smolVLA · IsaacLab SO101 Pick & Place (single-task, 50 epochs)

A SmolVLA policy obtained by fine-tuning lerobot/smolvla_base for 50 epochs on CoRL2026-CSI/IsaacLab-SO101_pick_place_baseCaP_100epi_10fps, a single-task SO101 pick & place dataset collected in IsaacLab simulation.

This checkpoint is a full model (model.safetensors), not a LoRA adapter, and can be loaded and used as-is.

Model details

  • Base model: lerobot/smolvla_base (SmolVLM2-500M-Video-Instruct VLM + action expert)
  • Robot: SO101 (6-DOF, including gripper), simulated in IsaacLab
  • Cameras: top, left_wrist (480×640), renamed to the policy keys camera1 (left_wrist) / camera2 (top)
  • Inputs: observation.state[6] + 2 cameras + language instruction (task)
  • Output: action[6] (joint position)
  • Action chunking: chunk_size=50, n_action_steps=50
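With chunk_size=50 and n_action_steps=50, the policy predicts a chunk of 50 future actions and executes all of them before querying the model again; if the controller runs at the dataset's 10 fps, that is about 5 s of execution per inference. The sketch below illustrates this queue behavior in plain Python. ChunkedActionQueue and dummy_policy are hypothetical names, not LeRobot APIs; LeRobot's SmolVLAPolicy.select_action keeps a similar internal action queue, but its exact code may differ.

```python
from collections import deque
from typing import Callable, List, Sequence

Action = List[float]  # one 6-D joint-position target


class ChunkedActionQueue:
    """Hypothetical helper: re-plans only when the queue of pending actions is empty."""

    def __init__(self, predict_chunk: Callable[[dict], Sequence[Action]],
                 chunk_size: int = 50, n_action_steps: int = 50) -> None:
        assert 1 <= n_action_steps <= chunk_size
        self.predict_chunk = predict_chunk
        self.chunk_size = chunk_size
        self.n_action_steps = n_action_steps
        self.queue: deque = deque()
        self.num_inferences = 0

    def select_action(self, observation: dict) -> Action:
        if not self.queue:
            chunk = self.predict_chunk(observation)
            assert len(chunk) == self.chunk_size
            # Execute only the first n_action_steps actions of the predicted chunk.
            self.queue.extend(chunk[: self.n_action_steps])
            self.num_inferences += 1
        return self.queue.popleft()


def dummy_policy(observation: dict) -> List[Action]:
    # Stand-in for the real model: always returns 50 zero-valued 6-D actions.
    return [[0.0] * 6 for _ in range(50)]


if __name__ == "__main__":
    runner = ChunkedActionQueue(dummy_policy)
    for _ in range(100):  # 100 control steps, i.e. 10 s at 10 fps
        runner.select_action({})
    print(runner.num_inferences)  # 2
```

Setting n_action_steps below chunk_size would re-plan more often (more closed-loop) at the cost of more model calls.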

Training scheme

VLM frozen, action expert only: the standard SmolVLA training recipe (SmolVLA paper, arXiv:2506.01844).

| Component | Status |
|---|---|
| VLM backbone (SmolVLM2) | ❄️ Fully frozen (freeze_vision_encoder=true, train_expert_only=true) |
| Action expert | 🔥 Trained (train_expert_only=true) |
| PEFT / LoRA | Not used |
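The table names two config flags. As we understand LeRobot's SmolVLA implementation, freeze_vision_encoder freezes only the vision encoder, while train_expert_only freezes the whole VLM backbone, so the full freeze here comes mainly from train_expert_only. The simplified sketch below models that interaction with a hypothetical helper (no torch needed); it is not LeRobot code and ignores details such as the small projection layers around the action expert.

```python
from typing import Dict


def trainable_components(freeze_vision_encoder: bool, train_expert_only: bool) -> Dict[str, bool]:
    """Hypothetical helper: True means the component receives gradient updates.

    Simplified model of how the two SmolVLA flags are assumed to combine.
    """
    trainable = {
        "vision_encoder": True,
        "language_model": True,  # rest of the SmolVLM2 backbone
        "action_expert": True,   # trained in every configuration shown here
    }
    if freeze_vision_encoder:
        # Freezes only the vision tower.
        trainable["vision_encoder"] = False
    if train_expert_only:
        # Freezes the entire VLM, vision encoder included.
        trainable["vision_encoder"] = False
        trainable["language_model"] = False
    return trainable


if __name__ == "__main__":
    # Configuration used for this checkpoint.
    print(trainable_components(freeze_vision_encoder=True, train_expert_only=True))
```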

Training hyperparameters

| Item | Value |
|---|---|
| Dataset | IsaacLab-SO101_pick_place_baseCaP_100epi_10fps: 100 episodes / 34,264 frames / 10 fps |
| Epochs / Steps | 50 epochs / 6,700 steps |
| Global batch size | 256 (per-GPU batch 128 × 2 GPUs) |
| Optimizer | AdamW: lr 1e-4, weight_decay 1e-10, grad_clip_norm 10.0 |
| LR scheduler | cosine_decay_with_warmup: warmup 1,000 steps / decay 30,000 steps / peak_lr 1e-4 / decay_lr 2.5e-6 |
| chunk_size / n_action_steps | 50 / 50 |
| Seed | 1000 |
| Dataloader workers | 16 |
| Mixed precision | No (bf16 used at inference) |
| Image augmentation | ColorJitter (brightness/contrast/saturation/hue) + SharpnessJitter; no geometric transforms (rotation/translation/flip), to preserve left/right semantics for the VLA |
| Hardware | 2 × NVIDIA H100 80GB |
| Final loss | 0.013 |
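The step count in the table is consistent with the dataset size and global batch size: 34,264 frames / 256 samples per step rounds up to 134 steps per epoch, and 134 × 50 = 6,700. The sketch below reproduces that arithmetic and evaluates a generic linear-warmup + cosine-decay curve with the table's values. The schedule function is an assumed textbook form, not LeRobot's exact implementation. The configured decay horizon (30,000 steps) is longer than the run (6,700 steps); whether the schedule is rescaled to fit a shorter run depends on the scheduler code of the LeRobot version in use (0.5.2 here), so check it before interpreting the final learning rate.

```python
import math

NUM_FRAMES = 34_264
GLOBAL_BATCH = 128 * 2  # per-GPU batch 128 x 2 GPUs = 256
EPOCHS = 50

# One optimizer step consumes one global batch; the last partial batch is counted.
steps_per_epoch = math.ceil(NUM_FRAMES / GLOBAL_BATCH)
total_steps = steps_per_epoch * EPOCHS


def cosine_decay_with_warmup(step: int, warmup: int = 1_000, decay: int = 30_000,
                             peak_lr: float = 1e-4, decay_lr: float = 2.5e-6) -> float:
    """Generic linear warmup followed by cosine decay (assumed form, no rescaling)."""
    if step < warmup:
        # Linear ramp from peak_lr / (warmup + 1) up towards peak_lr.
        return peak_lr * (step + 1) / (warmup + 1)
    progress = min(step, decay) / decay  # clamp so the LR stays at decay_lr after `decay`
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return decay_lr + (peak_lr - decay_lr) * cosine


if __name__ == "__main__":
    print(steps_per_epoch, total_steps)  # 134 6700
    for s in (0, 1_000, total_steps, 30_000):
        print(s, f"{cosine_decay_with_warmup(s):.3e}")
```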

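The augmentation row keeps only photometric transforms. The toy sketch below (plain Python with hypothetical helpers, not LeRobot's transform classes) shows why: a brightness change leaves an object's position in the image intact, while a horizontal flip moves it to the other side, which would contradict left/right wording in the instruction and the joint-space action labels, which are not mirrored.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel intensities in [0, 1]


def object_column(img: Image) -> int:
    """Column index of the brightest pixel (a stand-in for 'where the object is')."""
    _, col = max((v, c) for row in img for c, v in enumerate(row))
    return col


def adjust_brightness(img: Image, factor: float) -> Image:
    # Photometric change: scales intensities, pixel positions are untouched.
    return [[min(1.0, v * factor) for v in row] for row in img]


def hflip(img: Image) -> Image:
    # Geometric change: mirrors each row, so left and right swap.
    return [list(reversed(row)) for row in img]


if __name__ == "__main__":
    img = [[0.1, 0.9, 0.1, 0.1],
           [0.1, 0.1, 0.1, 0.1]]  # object in the left half (column 1 of 4)
    print(object_column(adjust_brightness(img, 0.8)))  # 1 (still on the left)
    print(object_column(hflip(img)))                   # 2 (now on the right)
```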
Camera rename

Mapping from LeRobot dataset camera keys to SmolVLA policy keys:

| Dataset key | Policy key |
|---|---|
| observation.images.left_wrist | observation.images.camera1 |
| observation.images.top | observation.images.camera2 |
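A minimal sketch of applying this mapping to an observation dictionary before it reaches the policy. CAMERA_RENAME and rename_cameras are hypothetical names; LeRobot also offers its own key-renaming options, whose exact names depend on the version, so consult the 0.5.2 documentation rather than relying on this helper.

```python
from typing import Any, Dict

# Dataset camera key -> SmolVLA policy camera key.
CAMERA_RENAME: Dict[str, str] = {
    "observation.images.left_wrist": "observation.images.camera1",
    "observation.images.top": "observation.images.camera2",
}


def rename_cameras(batch: Dict[str, Any], mapping: Dict[str, str] = CAMERA_RENAME) -> Dict[str, Any]:
    """Return a new dict whose camera keys use the policy's names; other keys pass through."""
    return {mapping.get(key, key): value for key, value in batch.items()}


if __name__ == "__main__":
    obs = {"observation.images.top": "top_img", "observation.state": [0.0] * 6}
    print(sorted(rename_cameras(obs)))  # ['observation.images.camera2', 'observation.state']
```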

Input / Output specification

  • Input: only observation.state[6] (joint position) + 2 cameras + language instruction (task)
  • Output: only action[6] (joint position)
  • The dataset's ee_pos, gripper_binary, state.radian_urdf0 and action.radian_urdf0 fields are excluded from training
  • The SmolVLA policy has three fixed camera slots (camera1/2/3), so a camera3 slot exists in the config; since the dataset provides only two cameras, only two of those slots actually receive image data.
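The rules above amount to an allowlist of feature keys plus one unused camera slot. In the sketch below, the allowlist uses the key names given in this card, but the full key names of the excluded fields (e.g. observation.ee_pos) are not stated here, so the sample keys are hypothetical; select_features is likewise a hypothetical helper, not a LeRobot function.

```python
from typing import Any, Dict, List, Set

# Keys the policy consumes (dataset naming); everything else is dropped.
INPUT_KEYS: Set[str] = {
    "observation.state",              # 6 joint positions
    "observation.images.left_wrist",  # renamed to camera1
    "observation.images.top",         # renamed to camera2
    "task",                           # language instruction
}
OUTPUT_KEYS: Set[str] = {"action"}    # 6 joint-position targets


def select_features(sample: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only allowlisted keys (exact match, so 'observation.state.radian_urdf0' is dropped)."""
    allowed = INPUT_KEYS | OUTPUT_KEYS
    return {k: v for k, v in sample.items() if k in allowed}


# Policy camera slots vs. cameras that actually carry data after renaming.
POLICY_CAMERA_SLOTS: List[str] = [
    "observation.images.camera1",
    "observation.images.camera2",
    "observation.images.camera3",
]
DATASET_CAMERAS: Set[str] = {"observation.images.camera1", "observation.images.camera2"}
empty_slots = [s for s in POLICY_CAMERA_SLOTS if s not in DATASET_CAMERAS]

if __name__ == "__main__":
    print(empty_slots)  # ['observation.images.camera3']
```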

Usage

from lerobot.policies.smolvla.modeling_smolvla import SmolVLAPolicy

policy = SmolVLAPolicy.from_pretrained("CoRL2026-CSI/smolVLA-IsaacLab-picknplace-50epoch")

Citation / Acknowledgement

Built on top of LeRobot and the SmolVLA base checkpoint. Project: CoRL 2026 CSI submission.

Framework versions

  • LeRobot 0.5.2
Safetensors · Model size: 0.5B params · Tensor types: F32, BF16

Model tree for Cache-SCA/smolVLA-IsaacLab-picknplace-50epoch: finetuned from lerobot/smolvla_base
