---
license: apache-2.0
library_name: lerobot
base_model: lerobot/smolvla_base
pipeline_tag: robotics
tags:
- lerobot
- smolvla
- robotics
- so101
- imitation-learning
- isaaclab
- sim
- multi-task
- code-as-policies
- CoRL2026
datasets:
- CoRL2026-CSI/Isaaclab-so101_11task_baseCaP_3300epi_10fps
---

# smolVLA · IsaacLab SO101 Multi-Task (11 tasks, 8 epochs)

A SmolVLA policy obtained by fine-tuning [lerobot/smolvla_base](https://huggingface.co/lerobot/smolvla_base) for 8 epochs on the IsaacLab-simulated SO101 **11-task multi-task** dataset [CoRL2026-CSI/Isaaclab-so101_11task_baseCaP_3300epi_10fps](https://huggingface.co/datasets/CoRL2026-CSI/Isaaclab-so101_11task_baseCaP_3300epi_10fps).

This checkpoint is a **full model** (`model.safetensors`), not a LoRA adapter, and can be loaded and used as-is.

## Model details

- **Base model**: `lerobot/smolvla_base` (SmolVLM2-500M-Video-Instruct VLM + action expert)
- **Robot**: SO101 (6-DOF, including the gripper), simulated in IsaacLab
- **Cameras**: `top`, `left_wrist` (480×640), renamed to the policy keys `camera1` (left_wrist) and `camera2` (top)
- **Inputs**: `observation.state`[6] + 2 cameras + language instruction (task)
- **Output**: `action`[6] (joint position)
- **Action chunking**: `chunk_size=50`, `n_action_steps=50`

## Training approach

**VLM frozen + action expert only**: the standard training recipe from the official SmolVLA work ([SmolVLA paper, arXiv:2506.01844](https://arxiv.org/abs/2506.01844)).

| Component | Status |
|---|---|
| VLM backbone (SmolVLM2) | ❄️ **Fully frozen** (`freeze_vision_encoder=true`) |
| Action expert | 🔥 **Trained** (`train_expert_only=true`) |
| PEFT / LoRA | Not used |

## Training hyperparameters

| Item | Value |
|---|---|
| Dataset | [Isaaclab-so101_11task_baseCaP_3300epi_10fps](https://huggingface.co/datasets/CoRL2026-CSI/Isaaclab-so101_11task_baseCaP_3300epi_10fps): 3,300 episodes / 1,175,352 frames / 11 tasks / 10 fps |
| Epochs / Steps | 8 epochs / 36,800 steps |
| Global batch size | 256 (micro batch 128 × 2 GPUs) |
| Optimizer | AdamW: lr `1e-4`, weight_decay `1e-10`, grad_clip_norm `10.0` |
| LR scheduler | cosine_decay_with_warmup: warmup 1,000 / decay 30,000 / peak_lr `1e-4` / decay_lr `2.5e-6` |
| chunk_size / n_action_steps | 50 / 50 |
| Seed | 1000 |
| Dataloader workers | 16 |
| Mixed precision | no (bf16 inference) |
| Image augmentation | ColorJitter (brightness/contrast/saturation/hue) + SharpnessJitter; **no geometric transforms (rotation/translation/flip)**, to preserve left/right semantics for the VLA |
| Hardware | 2 × NVIDIA H100 80GB |
| Final loss | 0.020 |

## Camera rename

Mapping between the LeRobot dataset camera keys and the SmolVLA policy keys:

| Dataset key | Policy key |
|---|---|
| `observation.images.left_wrist` | `observation.images.camera1` |
| `observation.images.top` | `observation.images.camera2` |

> The same rename must be applied at inference and evaluation time (train-inference consistency).

## Input / Output specification

- **Input**: only `observation.state`[6] (joint position) + 2 cameras + language instruction (task)
- **Output**: only `action`[6] (joint position)
- The dataset's `ee_pos` / `gripper_binary` / `state.radian_urdf0` / `action.radian_urdf0` fields are excluded from training
- The SmolVLA policy has a fixed set of three camera slots (`camera1/2/3`), so a `camera3` slot exists in the config; because the dataset has only two cameras, data actually flows through only two of them.
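
The camera rename and the input restrictions above can be sketched as a small preprocessing step. This is a minimal illustration in plain Python, not part of LeRobot: the helper name `to_policy_inputs` is hypothetical, and it assumes each dataset frame is a dict that carries the language instruction under a `task` key. It uses an allow-list, so extra fields such as `ee_pos` are dropped without needing their exact key names.

```python
# Hypothetical helper (not a LeRobot API): maps one dataset frame to policy inputs.

# Dataset camera key -> SmolVLA policy camera key, as listed in the model card.
CAMERA_RENAME = {
    "observation.images.left_wrist": "observation.images.camera1",
    "observation.images.top": "observation.images.camera2",
}

# Non-image inputs passed through unchanged. "task" as the key holding the
# language instruction is an assumption about the frame layout.
PASSTHROUGH_KEYS = ("observation.state", "task")


def to_policy_inputs(frame: dict) -> dict:
    """Keep only the allowed inputs and rename the camera keys."""
    required = (*CAMERA_RENAME, *PASSTHROUGH_KEYS)
    missing = [key for key in required if key not in frame]
    if missing:
        raise KeyError(f"frame is missing required keys: {missing}")

    # Allow-list: anything not named above is left out of the result.
    inputs = {key: frame[key] for key in PASSTHROUGH_KEYS}
    for dataset_key, policy_key in CAMERA_RENAME.items():
        inputs[policy_key] = frame[dataset_key]
    return inputs
```

Applying the same function in both the training data path and the evaluation loop is one way to keep the rename consistent between the two.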
## Usage

```python
from lerobot.policies.smolvla.modeling_smolvla import SmolVLAPolicy

# Load the full fine-tuned checkpoint (not a LoRA adapter) from the Hugging Face Hub.
policy = SmolVLAPolicy.from_pretrained("CoRL2026-CSI/smolVLA-IsaacLab-Multi-Task-8epoch-mod")
```

## Citation / Acknowledgement

Built on top of [LeRobot](https://github.com/huggingface/lerobot) and the [SmolVLA](https://huggingface.co/lerobot/smolvla_base) base checkpoint.
Project: CoRL 2026 CSI submission.

### Framework versions

- LeRobot 0.5.2