	---
	license: apache-2.0
	library_name: lerobot
	base_model: lerobot/smolvla_base
	pipeline_tag: robotics
	tags:
	- lerobot
	- smolvla
	- robotics
	- so101
	- imitation-learning
	- isaaclab
	- sim
	- multi-task
	- code-as-policies
	- CoRL2026
	datasets:
	- CoRL2026-CSI/Isaaclab-so101_11task_baseCaP_3300epi_10fps
	---

	# smolVLA · IsaacLab SO101 Multi-Task (11 tasks, 8 epochs)

	A SmolVLA policy obtained by fine-tuning [lerobot/smolvla_base](https://huggingface.co/lerobot/smolvla_base) for 8 epochs on the 11-task SO101 multi-task dataset collected in IsaacLab simulation,
	[CoRL2026-CSI/Isaaclab-so101_11task_baseCaP_3300epi_10fps](https://huggingface.co/datasets/CoRL2026-CSI/Isaaclab-so101_11task_baseCaP_3300epi_10fps).

	This checkpoint is a full model (`model.safetensors`), not a LoRA adapter, so it can be loaded and used as is.

	## Model details

	- Base model: `lerobot/smolvla_base` (SmolVLM2-500M-Video-Instruct VLM + action expert)
	- Robot: SO101 (6-DOF, including the gripper), simulated in IsaacLab
	- Cameras: `top`, `left_wrist` (480×640), renamed to the policy keys `camera1` (left_wrist) and `camera2` (top)
	- Inputs: `observation.state`[6] + 2 cameras + language instruction (task)
	- Output: `action`[6] (joint position)
	- Action chunking: `chunk_size=50`, `n_action_steps=50`
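
To illustrate what `chunk_size=50` with `n_action_steps=50` means at run time, here is a minimal sketch of an action-chunk queue: the model is queried once, all 50 predicted actions are executed, and only then is the model queried again. `ChunkedController` and `dummy_predict_chunk` are hypothetical names used for illustration and are not LeRobot APIs; the dummy predictor stands in for the real policy. At the dataset's 10 fps, one chunk corresponds to 5 seconds of control.

```python
from collections import deque
from typing import Callable, Deque, List

Action = List[float]  # one 6-dimensional joint-position command


class ChunkedController:
    """Hypothetical helper: replays a predicted action chunk step by step."""

    def __init__(self, predict_chunk: Callable[[], List[Action]], n_action_steps: int):
        self.predict_chunk = predict_chunk
        self.n_action_steps = n_action_steps
        self.queue: Deque[Action] = deque()
        self.num_model_calls = 0

    def step(self) -> Action:
        # Query the model only when the queue of pending actions is empty.
        if not self.queue:
            chunk = self.predict_chunk()
            self.num_model_calls += 1
            # Keep the first n_action_steps actions of the predicted chunk.
            self.queue.extend(chunk[: self.n_action_steps])
        return self.queue.popleft()


def dummy_predict_chunk(chunk_size: int = 50, action_dim: int = 6) -> List[Action]:
    # Stand-in for the real policy: returns chunk_size all-zero actions.
    return [[0.0] * action_dim for _ in range(chunk_size)]


controller = ChunkedController(dummy_predict_chunk, n_action_steps=50)
actions = [controller.step() for _ in range(120)]
print(controller.num_model_calls)  # 3 model calls cover 120 control steps
```

Executing the whole chunk before re-planning keeps the number of model calls low, at the cost of reacting to new observations only once per chunk.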

	## Training method

	VLM frozen + action expert only: the official standard training recipe for SmolVLA ([SmolVLA paper, arXiv:2506.01844](https://arxiv.org/abs/2506.01844)).

	| Component | Status |
	|---|---|
	| VLM backbone (SmolVLM2) | ❄️ Fully frozen (`freeze_vision_encoder=true`) |
	| Action expert | 🔥 Trained (`train_expert_only=true`) |
	| PEFT / LoRA | Not used |
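
The freezing scheme in the table above can be sketched in plain PyTorch as follows. This is a toy illustration, not LeRobot's implementation: `ToyVLA`, `vlm`, and `action_expert` are hypothetical stand-ins for the real modules, and the layer sizes are arbitrary. The optimizer and clipping values are the ones listed in this card.

```python
import torch
from torch import nn


class ToyVLA(nn.Module):
    """Hypothetical stand-in: a 'VLM' backbone followed by an 'action expert'."""

    def __init__(self) -> None:
        super().__init__()
        self.vlm = nn.Linear(16, 8)           # plays the role of the frozen backbone
        self.action_expert = nn.Linear(8, 6)  # plays the role of the trained head

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.action_expert(self.vlm(x))


model = ToyVLA()

# Freeze every backbone parameter so it receives no gradient updates.
model.vlm.requires_grad_(False)

# Hand only the still-trainable parameters to the optimizer.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-4, weight_decay=1e-10)

# One dummy training step on random data.
loss = model(torch.randn(4, 16)).pow(2).mean()
loss.backward()
torch.nn.utils.clip_grad_norm_(trainable, max_norm=10.0)
optimizer.step()

frozen_names = [n for n, p in model.named_parameters() if not p.requires_grad]
trainable_names = [n for n, p in model.named_parameters() if p.requires_grad]
print("frozen:", frozen_names)
print("trainable:", trainable_names)
```

Passing only the trainable parameters to the optimizer means no optimizer state is allocated for the frozen backbone.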

	## Training hyperparameters

	| Item | Value |
	|---|---|
	| Dataset | [Isaaclab-so101_11task_baseCaP_3300epi_10fps](https://huggingface.co/datasets/CoRL2026-CSI/Isaaclab-so101_11task_baseCaP_3300epi_10fps): 3,300 episodes / 1,175,352 frames / 11 tasks / 10 fps |
	| Epochs / Steps | 8 epochs / 36,800 steps |
	| Global batch size | 256 (micro batch 128 × 2 GPUs) |
	| Optimizer | AdamW: lr `1e-4`, weight_decay `1e-10`, grad_clip_norm `10.0` |
	| LR scheduler | cosine_decay_with_warmup: warmup 1,000 steps / decay 30,000 steps / peak_lr `1e-4` / decay_lr `2.5e-6` |
	| chunk_size / n_action_steps | 50 / 50 |
	| Seed | 1000 |
	| Dataloader workers | 16 |
	| Mixed precision | no (bf16 at inference) |
	| Image augmentation | ColorJitter (brightness/contrast/saturation/hue) + SharpnessJitter; no geometric transforms (rotation/translation/flip), to preserve left/right semantics for the VLA |
	| Hardware | 2 × NVIDIA H100 80GB |
	| Final loss | 0.020 |
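
As a sanity check on the numbers above, the sketch below derives the approximate step count from the dataset size and global batch size, and evaluates a generic linear-warmup plus cosine-decay schedule with the listed values. The function `lr_at_step` is a hypothetical illustration of the schedule's shape and is an assumption; LeRobot's `cosine_decay_with_warmup` scheduler may differ in details such as the warmup start value.

```python
import math

FRAMES = 1_175_352
GLOBAL_BATCH = 256
EPOCHS = 8

steps_per_epoch = FRAMES / GLOBAL_BATCH        # about 4,591 optimizer steps
approx_total_steps = steps_per_epoch * EPOCHS  # about 36,730, close to the 36,800 used


def lr_at_step(step, peak_lr=1e-4, decay_lr=2.5e-6, warmup_steps=1_000, decay_steps=30_000):
    """Generic warmup + cosine decay; an illustrative assumption, not LeRobot's code."""
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return peak_lr * step / warmup_steps
    # Fraction of the decay completed, measured from step 0 and held at 1.0
    # once decay_steps has been reached.
    progress = min(step, decay_steps) / decay_steps
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return decay_lr + (peak_lr - decay_lr) * cosine


for step in (0, 500, 1_000, 15_000, 30_000, 36_800):
    print(step, lr_at_step(step))
```

Under this sketch the learning rate reaches `decay_lr` at step 30,000 and stays there for the remaining steps of the 36,800-step run.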

	## Camera rename

	Mapping between the camera keys of the LeRobot dataset and the SmolVLA policy keys:

	| Dataset key | Policy key |
	|---|---|
	| `observation.images.left_wrist` | `observation.images.camera1` |
	| `observation.images.top` | `observation.images.camera2` |

	> The same rename must be applied at inference and evaluation time, so that inputs match what the policy saw during training.
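
A minimal sketch of applying this mapping to an observation dictionary before it reaches the policy. `rename_camera_keys` is a hypothetical helper written for illustration, not a LeRobot function, and the placeholder strings stand in for real image tensors.

```python
from typing import Any, Dict

# Mapping taken from the table above: dataset key -> policy key.
CAMERA_RENAME = {
    "observation.images.left_wrist": "observation.images.camera1",
    "observation.images.top": "observation.images.camera2",
}


def rename_camera_keys(observation: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the observation with camera keys renamed for the policy."""
    missing = [key for key in CAMERA_RENAME if key not in observation]
    if missing:
        raise KeyError(f"observation is missing camera keys: {missing}")
    # Keys without an entry in the mapping are passed through unchanged.
    return {CAMERA_RENAME.get(key, key): value for key, value in observation.items()}


raw_observation = {
    "observation.state": [0.0] * 6,
    "observation.images.left_wrist": "wrist-image-placeholder",
    "observation.images.top": "top-image-placeholder",
}
policy_observation = rename_camera_keys(raw_observation)
print(sorted(policy_observation))
```

Failing loudly on a missing camera key is a deliberate choice here: silently feeding the wrong view into `camera1` or `camera2` would be hard to notice from the policy's output alone.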

	## Input / Output specification

	- Input: only `observation.state`[6] (joint position) + 2 cameras + language instruction (task)
	- Output: only `action`[6] (joint position)
	- The dataset fields `ee_pos` / `gripper_binary` / `state.radian_urdf0` / `action.radian_urdf0` are excluded from training
	- The SmolVLA policy has a fixed set of three camera slots (`camera1/2/3`), so a `camera3` slot exists in the config, but because the dataset has only two cameras, only two cameras actually carry data.
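
The input rules above can be sketched as a small filter that keeps only the fields the policy consumes and checks the state dimension. `select_policy_inputs` is a hypothetical helper, and the exact key names (in particular `task` for the language instruction, and the excluded fields as written in this card) are assumptions that should be checked against the dataset's actual feature names.

```python
from typing import Any, Dict

# Fields the policy consumes, using dataset-side camera names (before the rename).
POLICY_INPUT_KEYS = (
    "observation.state",
    "observation.images.left_wrist",
    "observation.images.top",
    "task",
)
STATE_DIM = 6


def select_policy_inputs(frame: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only policy inputs; anything else (e.g. ee_pos) is dropped."""
    missing = [key for key in POLICY_INPUT_KEYS if key not in frame]
    if missing:
        raise KeyError(f"frame is missing required inputs: {missing}")
    if len(frame["observation.state"]) != STATE_DIM:
        raise ValueError(f"expected a {STATE_DIM}-dimensional state")
    return {key: frame[key] for key in POLICY_INPUT_KEYS}


frame = {
    "observation.state": [0.0] * 6,
    "observation.images.left_wrist": "wrist-image-placeholder",
    "observation.images.top": "top-image-placeholder",
    "task": "example instruction",
    "ee_pos": [0.0, 0.0, 0.0],  # excluded from training
    "gripper_binary": 0,        # excluded from training
}
inputs = select_policy_inputs(frame)
print(sorted(inputs))
```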

	## Usage

	```python
	from lerobot.policies.smolvla.modeling_smolvla import SmolVLAPolicy

	# Load the full fine-tuned weights from the Hub (this is not a LoRA adapter).
	policy = SmolVLAPolicy.from_pretrained("CoRL2026-CSI/smolVLA-IsaacLab-Multi-Task-8epoch-mod")
	```

	## Citation / Acknowledgement

	Built on top of [LeRobot](https://github.com/huggingface/lerobot) and the
	[SmolVLA](https://huggingface.co/lerobot/smolvla_base) base checkpoint. Project: CoRL 2026 CSI submission.

	### Framework versions

	- LeRobot 0.5.2