---
license: apache-2.0
library_name: lerobot
base_model: lerobot/smolvla_base
pipeline_tag: robotics
tags:
  - lerobot
  - smolvla
  - robotics
  - so101
  - imitation-learning
  - isaaclab
  - sim
  - multi-task
  - code-as-policies
  - CoRL2026
datasets:
  - CoRL2026-CSI/Isaaclab-so101_11task_baseCaP_3300epi_10fps
---

# smolVLA · IsaacLab SO101 Multi-Task (11 tasks, 8 epochs)

A SmolVLA policy obtained by fine-tuning [lerobot/smolvla_base](https://huggingface.co/lerobot/smolvla_base) for 8 epochs on the IsaacLab-simulated SO101 **11-task multi-task** dataset
[CoRL2026-CSI/Isaaclab-so101_11task_baseCaP_3300epi_10fps](https://huggingface.co/datasets/CoRL2026-CSI/Isaaclab-so101_11task_baseCaP_3300epi_10fps).

This checkpoint is a **full model** (`model.safetensors`), not a LoRA adapter, and can be loaded and used as-is.

## Model details

- **Base model**: `lerobot/smolvla_base` (SmolVLM2-500M-Video-Instruct VLM + action expert)
- **Robot**: SO101 (6-DOF, including the gripper), simulated in IsaacLab
- **Cameras**: `top`, `left_wrist` (480×640), renamed to the policy keys `camera1` (left_wrist) and `camera2` (top)
- **Inputs**: `observation.state`[6] + 2 cameras + language instruction (task)
- **Output**: `action`[6] (joint position)
- **Action chunking**: `chunk_size=50`, `n_action_steps=50`
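
With `chunk_size=50` and `n_action_steps=50`, each predicted chunk is consumed in full before the policy is queried again. The sketch below illustrates that bookkeeping in plain Python; it is not LeRobot code. `predict_chunk` and `run_episode` are hypothetical stand-ins introduced here, and the 10 Hz control rate is an assumption taken from the dataset's 10 fps.

```python
from collections import deque

CHUNK_SIZE = 50       # actions predicted per forward pass (from this card)
N_ACTION_STEPS = 50   # actions executed before the policy is queried again (from this card)
CONTROL_HZ = 10       # assumption: the control loop runs at the dataset's 10 fps


def predict_chunk(step):
    """Hypothetical stand-in for the model: returns CHUNK_SIZE dummy 6-D actions."""
    return [[0.0] * 6 for _ in range(CHUNK_SIZE)]


def run_episode(num_steps):
    """Step through an episode and count how often the model is queried."""
    queue = deque()
    forward_passes = 0
    for step in range(num_steps):
        if not queue:
            # Queue is empty: predict a new chunk and keep its first N_ACTION_STEPS actions.
            queue.extend(predict_chunk(step)[:N_ACTION_STEPS])
            forward_passes += 1
        action = queue.popleft()  # the action that would be sent to the robot
        assert len(action) == 6
    return forward_passes


seconds_per_chunk = N_ACTION_STEPS / CONTROL_HZ
print(f"seconds of motion per forward pass: {seconds_per_chunk}")
print(f"forward passes for 300 steps: {run_episode(300)}")
```

Under these assumptions one forward pass covers 5 seconds of open-loop motion, so the policy cannot react to new observations within a chunk.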

## Training method

**VLM frozen + action expert only**: the official standard training setup for SmolVLA ([SmolVLA paper, arXiv:2506.01844](https://arxiv.org/abs/2506.01844)).

| Component | Status |
|---|---|
| VLM backbone (SmolVLM2) | ❄️ **Fully frozen** (`freeze_vision_encoder=true`) |
| Action expert | 🔥 **Trained** (`train_expert_only=true`) |
| PEFT / LoRA | Not used |

## Training hyperparameters

| Item | Value |
|---|---|
| Dataset | [Isaaclab-so101_11task_baseCaP_3300epi_10fps](https://huggingface.co/datasets/CoRL2026-CSI/Isaaclab-so101_11task_baseCaP_3300epi_10fps) — 3,300 episodes / 1,175,352 frames / 11 tasks / 10 fps |
| Epochs / Steps | 8 epochs / 36,800 steps |
| Global batch size | 256 (micro batch 128 × 2 GPU) |
| Optimizer | AdamW — lr `1e-4`, weight_decay `1e-10`, grad_clip_norm `10.0` |
| LR scheduler | cosine_decay_with_warmup — warmup 1,000 / decay 30,000 / peak_lr `1e-4` / decay_lr `2.5e-6` |
| chunk_size / n_action_steps | 50 / 50 |
| Seed | 1000 |
| Dataloader workers | 16 |
| Mixed precision | no (bf16 inference) |
| Image augmentation | ColorJitter (brightness/contrast/saturation/hue) + SharpnessJitter; **no geometric transforms (rotation/translation/flip)**, to preserve left/right semantics for the VLA |
| Hardware | 2 × NVIDIA H100 80GB |
| Final loss | 0.020 |
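
The schedule and step counts in the table above can be sanity-checked with a few lines of plain Python. This is a rough sketch, not LeRobot's scheduler: the linear warmup formula, the cosine phase being indexed from step 0, and the learning rate being held at `decay_lr` after 30,000 steps are all assumptions, and `lr_at` is a hypothetical helper.

```python
import math

# Values copied from the hyperparameter table.
PEAK_LR = 1e-4
DECAY_LR = 2.5e-6
WARMUP_STEPS = 1_000
DECAY_STEPS = 30_000
TOTAL_STEPS = 36_800
GLOBAL_BATCH = 256
DATASET_FRAMES = 1_175_352


def lr_at(step):
    """Approximate learning rate at an optimizer step (assumed schedule shape)."""
    if step < WARMUP_STEPS:
        # Assumption: linear ramp from near zero up to the peak learning rate.
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    # Assumption: cosine curve over DECAY_STEPS, then held at DECAY_LR.
    progress = min(step, DECAY_STEPS) / DECAY_STEPS
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return DECAY_LR + (PEAK_LR - DECAY_LR) * cosine


# How many passes over the dataset the reported step count corresponds to.
steps_per_epoch = DATASET_FRAMES / GLOBAL_BATCH
epochs_seen = TOTAL_STEPS / steps_per_epoch

for step in (0, 999, 15_000, 30_000, TOTAL_STEPS - 1):
    print(f"step {step:>6}: lr = {lr_at(step):.3e}")
print(f"steps per epoch: {steps_per_epoch:.1f}, epochs seen: {epochs_seen:.2f}")
```

Under these assumptions the last 6,800 steps run at the floor learning rate, and 36,800 steps at a global batch size of 256 corresponds to slightly more than 8 passes over the 1,175,352 frames.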

## Camera rename

Mapping between the LeRobot dataset camera keys and the SmolVLA policy keys:

| Dataset key | Policy key |
|---|---|
| `observation.images.left_wrist` | `observation.images.camera1` |
| `observation.images.top` | `observation.images.camera2` |

> The same rename must be applied at inference and evaluation time to stay consistent with training.
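
As a minimal illustration of the mapping above, the sketch below renames the camera keys of an observation dictionary before it is handed to the policy. `rename_cameras` is a hypothetical helper written for this card, not a LeRobot API, and the string placeholders stand in for image tensors.

```python
# Mapping taken from the table above: dataset key -> policy key.
CAMERA_RENAME = {
    "observation.images.left_wrist": "observation.images.camera1",
    "observation.images.top": "observation.images.camera2",
}


def rename_cameras(observation):
    """Return a copy of `observation` with dataset camera keys renamed to policy keys."""
    missing = [key for key in CAMERA_RENAME if key not in observation]
    if missing:
        raise KeyError(f"observation is missing camera keys: {missing}")
    # Keys that are not cameras (state, task, ...) pass through unchanged.
    return {CAMERA_RENAME.get(key, key): value for key, value in observation.items()}


# Placeholder observation; real inputs would hold tensors instead of strings.
raw_observation = {
    "observation.state": [0.0] * 6,
    "observation.images.left_wrist": "<left_wrist image>",
    "observation.images.top": "<top image>",
    "task": "<language instruction>",
}
policy_observation = rename_cameras(raw_observation)
print(sorted(policy_observation))
```

Raising on a missing camera key is a deliberate choice in this sketch: a silently absent camera would otherwise only show up as degraded policy behavior.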

## Input / Output specification

- **Input**: only `observation.state`[6] (joint position) + 2 cameras + language instruction (task)
- **Output**: only `action`[6] (joint position)
- The dataset's `ee_pos` / `gripper_binary` / `state.radian_urdf0` / `action.radian_urdf0` fields are excluded from training
- The SmolVLA policy has a fixed set of three camera slots (`camera1/2/3`), so a `camera3` slot exists in the config; because the dataset has only two cameras, only two cameras actually carry data.

## Usage

```python
from lerobot.policies.smolvla.modeling_smolvla import SmolVLAPolicy

policy = SmolVLAPolicy.from_pretrained("CoRL2026-CSI/smolVLA-IsaacLab-Multi-Task-8epoch-mod")
```

## Citation / Acknowledgement

Built on top of [LeRobot](https://github.com/huggingface/lerobot) and the
[SmolVLA](https://huggingface.co/lerobot/smolvla_base) base checkpoint. Project: CoRL 2026 CSI submission.

### Framework versions

- LeRobot 0.5.2