How to use CoRL2026-CSI/smolVLA-IsaacLab-Multi-Task-8epoch-mod with LeRobot:

```bash
# See https://github.com/huggingface/lerobot?tab=readme-ov-file#installation for more details
git clone https://github.com/huggingface/lerobot.git
cd lerobot
pip install -e ".[smolvla]"
```

```bash
# Launch finetuning on your dataset
python lerobot/scripts/train.py \
  --policy.path=CoRL2026-CSI/smolVLA-IsaacLab-Multi-Task-8epoch-mod \
  --dataset.repo_id=lerobot/svla_so101_pickplace \
  --batch_size=64 \
  --steps=20000 \
  --output_dir=outputs/train/my_smolvla \
  --job_name=my_smolvla_training \
  --policy.device=cuda \
  --wandb.enable=true
```

```bash
# Run the policy using the record function.
# Replace the port, robot id, and camera settings with your own.
# --dataset.single_task must match the task description used when recording your dataset.
# --dataset.repo_id is the dataset name that will be used on the HF Hub.
python -m lerobot.record \
  --robot.type=so101_follower \
  --robot.port=/dev/ttyACM0 \
  --robot.id=my_blue_follower_arm \
  --robot.cameras="{ front: {type: opencv, index_or_path: 8, width: 640, height: 480, fps: 30}}" \
  --dataset.single_task="Grasp a lego block and put it in the bin." \
  --dataset.repo_id=HF_USER/dataset_name \
  --dataset.episode_time_s=50 \
  --dataset.num_episodes=10 \
  --policy.path=CoRL2026-CSI/smolVLA-IsaacLab-Multi-Task-8epoch-mod
```
---
license: apache-2.0
library_name: lerobot
base_model: lerobot/smolvla_base
pipeline_tag: robotics
tags:
- lerobot
- smolvla
- robotics
- so101
- imitation-learning
- isaaclab
- sim
- multi-task
- code-as-policies
- CoRL2026
datasets:
- CoRL2026-CSI/Isaaclab-so101_11task_baseCaP_3300epi_10fps
---
# smolVLA · IsaacLab SO101 Multi-Task (11 tasks, 8 epochs)

A SmolVLA policy obtained by fine-tuning [lerobot/smolvla_base](https://huggingface.co/lerobot/smolvla_base) for 8 epochs on the IsaacLab-simulated SO101 **11-task multi-task** dataset
[CoRL2026-CSI/Isaaclab-so101_11task_baseCaP_3300epi_10fps](https://huggingface.co/datasets/CoRL2026-CSI/Isaaclab-so101_11task_baseCaP_3300epi_10fps).

This checkpoint is the **full model** (`model.safetensors`), not a LoRA adapter, so it can be loaded and used directly.

## Model details
- **Base model**: `lerobot/smolvla_base` (SmolVLM2-500M-Video-Instruct VLM + action expert)
- **Robot**: SO101 (6-DOF, including the gripper) in IsaacLab simulation
- **Cameras**: `top`, `left_wrist` (480×640), renamed to the policy keys `camera1` (left_wrist) / `camera2` (top)
- **Inputs**: `observation.state`[6] + 2 cameras + language instruction (task)
- **Output**: `action`[6] (joint position)
- **Action chunking**: `chunk_size=50`, `n_action_steps=50`
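
To make the action-chunking setting concrete: with `chunk_size=50` and `n_action_steps=50`, the policy predicts 50 actions per call and all 50 are executed before it is queried again. The sketch below imitates that behaviour with a plain queue. `predict_chunk` is a hypothetical stand-in for the real model call, and the 10 Hz control rate is an assumption taken from the dataset's 10 fps, not something this card states about deployment.

```python
from collections import deque

CHUNK_SIZE = 50       # actions predicted per model call (from this card)
N_ACTION_STEPS = 50   # actions executed before the model is queried again (from this card)
CONTROL_HZ = 10       # assumption: control rate equals the dataset's 10 fps
ACTION_DIM = 6        # SO101 joint positions


def predict_chunk(call_index):
    """Hypothetical stand-in for the policy: returns CHUNK_SIZE dummy actions."""
    return [[float(call_index)] * ACTION_DIM for _ in range(CHUNK_SIZE)]


def run(num_control_steps):
    """Run a control loop and count how often the model is queried."""
    queue = deque()
    model_calls = 0
    executed = []
    for _ in range(num_control_steps):
        if not queue:
            # Queue is empty: predict a new chunk and keep its first N_ACTION_STEPS actions.
            chunk = predict_chunk(model_calls)
            queue.extend(chunk[:N_ACTION_STEPS])
            model_calls += 1
        executed.append(queue.popleft())
    return model_calls, executed


model_calls, executed = run(num_control_steps=120)
seconds_per_chunk = N_ACTION_STEPS / CONTROL_HZ
print(model_calls, len(executed), seconds_per_chunk)
```

Under the 10 Hz assumption, each chunk covers 5 seconds of open-loop execution, so new observations only influence the robot at chunk boundaries.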
## Training method
**VLM frozen + action expert only**: the official standard SmolVLA training recipe ([SmolVLA paper, arXiv:2506.01844](https://arxiv.org/abs/2506.01844)).

| Component | Status |
|---|---|
| VLM backbone (SmolVLM2) | ❄️ **Fully frozen** (`freeze_vision_encoder=true`) |
| Action expert | 🔥 **Trained** (`train_expert_only=true`) |
| PEFT / LoRA | Not used |
## Training hyperparameters

| Item | Value |
|---|---|
| Dataset | [Isaaclab-so101_11task_baseCaP_3300epi_10fps](https://huggingface.co/datasets/CoRL2026-CSI/Isaaclab-so101_11task_baseCaP_3300epi_10fps) (3,300 episodes / 1,175,352 frames / 11 tasks / 10 fps) |
| Epochs / Steps | 8 epochs / 36,800 steps |
| Global batch size | 256 (micro batch 128 × 2 GPUs) |
| Optimizer | AdamW (lr `1e-4`, weight_decay `1e-10`, grad_clip_norm `10.0`) |
| LR scheduler | cosine_decay_with_warmup (warmup 1,000 steps / decay 30,000 steps / peak_lr `1e-4` / decay_lr `2.5e-6`) |
| chunk_size / n_action_steps | 50 / 50 |
| Seed | 1000 |
| Dataloader workers | 16 |
| Mixed precision | Not used for training (bf16 inference) |
| Image augmentation | ColorJitter (brightness/contrast/saturation/hue) + SharpnessJitter; **no geometric transforms (rotation/translation/flip)**, to preserve left/right semantics for the VLA |
| Hardware | 2 × NVIDIA H100 80GB |
| Final loss | 0.020 |
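
As a sanity check on the table, the sketch below relates steps to epochs and evaluates a learning-rate schedule at a few points. The constants are copied from the table. The `lr_at` function is an illustrative textbook formulation (linear warmup, then cosine decay measured from step 0), written under the assumption that it approximates `cosine_decay_with_warmup`. LeRobot's actual scheduler may differ in detail, for example in how warmup starts.

```python
import math

# Values copied from the hyperparameter table.
NUM_FRAMES = 1_175_352
GLOBAL_BATCH = 256
TOTAL_STEPS = 36_800
WARMUP_STEPS = 1_000
DECAY_STEPS = 30_000
PEAK_LR = 1e-4
DECAY_LR = 2.5e-6


def epochs_seen(steps, batch_size=GLOBAL_BATCH, num_frames=NUM_FRAMES):
    """Number of passes over the dataset after `steps` optimizer steps."""
    return steps * batch_size / num_frames


def lr_at(step):
    """Illustrative schedule (assumption): linear warmup, then cosine decay to DECAY_LR."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    # Past DECAY_STEPS the progress is clamped, so the rate stays at DECAY_LR.
    progress = min(step, DECAY_STEPS) / DECAY_STEPS
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return DECAY_LR + (PEAK_LR - DECAY_LR) * cosine


print(f"epochs after {TOTAL_STEPS} steps: {epochs_seen(TOTAL_STEPS):.3f}")
for step in (0, 500, 1_000, 15_000, 30_000, TOTAL_STEPS):
    print(step, f"{lr_at(step):.3e}")
```

With a global batch of 256, 36,800 steps correspond to slightly more than 8 passes over the 1,175,352 frames, which agrees with the "8 epochs" figure. Under this sketch the last 6,800 steps run at the floor rate `decay_lr`.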
## Camera rename
Mapping from LeRobot dataset camera keys to SmolVLA policy keys:

| Dataset key | Policy key |
|---|---|
| `observation.images.left_wrist` | `observation.images.camera1` |
| `observation.images.top` | `observation.images.camera2` |

> The same rename must be applied at inference and evaluation time (training-inference consistency).
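
The rename can be applied with a small dictionary remap before an observation is passed to the policy. This is a minimal sketch, not LeRobot's own mechanism: `rename_cameras` is a hypothetical helper, and placeholder strings stand in for image tensors.

```python
# Mapping taken from the table above: dataset key -> policy key.
RENAME_MAP = {
    "observation.images.left_wrist": "observation.images.camera1",
    "observation.images.top": "observation.images.camera2",
}


def rename_cameras(observation, rename_map=RENAME_MAP):
    """Return a copy of `observation` with dataset camera keys renamed to policy keys."""
    missing = [key for key in rename_map if key not in observation]
    if missing:
        raise KeyError(f"observation is missing camera keys: {missing}")
    return {rename_map.get(key, key): value for key, value in observation.items()}


# Placeholder strings stand in for image tensors in this example.
raw = {
    "observation.state": [0.0] * 6,
    "observation.images.left_wrist": "<left_wrist image>",
    "observation.images.top": "<top image>",
    "task": "<language instruction>",
}
renamed = rename_cameras(raw)
print(sorted(renamed))
```

Failing loudly on a missing camera key is a deliberate choice here: a silently skipped rename would feed the policy inputs that do not match what it saw during training.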
## Input / Output convention
- **Input**: only `observation.state`[6] (joint position) + 2 cameras + language instruction (task)
- **Output**: only `action`[6] (joint position)
- The dataset's `ee_pos` / `gripper_binary` / `state.radian_urdf0` / `action.radian_urdf0` features are excluded from training
- The SmolVLA policy has a fixed set of 3 camera slots (`camera1/2/3`), so a `camera3` slot exists in the config, but the dataset has only 2 cameras, so only 2 cameras actually carry data.
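
The convention above can be turned into a quick pre-flight check on an observation before it reaches the policy. The sketch below is a hypothetical helper, not part of LeRobot. It assumes the language instruction is stored under a `task` key and that the excluded features can be recognised by the name suffixes listed above, since the full dataset feature keys are not given on this card.

```python
STATE_DIM = 6
REQUIRED_CAMERAS = ("observation.images.camera1", "observation.images.camera2")
# Listed on this card as excluded from training. Matching by suffix is an assumption,
# because the full dataset feature keys may carry a prefix.
EXCLUDED_FEATURES = ("ee_pos", "gripper_binary", "state.radian_urdf0", "action.radian_urdf0")


def check_observation(observation):
    """Return a list of problems found in a policy-ready observation dict."""
    problems = []
    state = observation.get("observation.state")
    if state is None or len(state) != STATE_DIM:
        problems.append(f"observation.state must have {STATE_DIM} values")
    for key in REQUIRED_CAMERAS:
        if key not in observation:
            problems.append(f"missing camera: {key}")
    task = observation.get("task")  # assumption: instruction stored under "task"
    if not isinstance(task, str) or not task:
        problems.append("task must be a non-empty string")
    for key in observation:
        if any(key.endswith(name) for name in EXCLUDED_FEATURES):
            problems.append(f"feature not used by this policy: {key}")
    return problems


good = {
    "observation.state": [0.0] * 6,
    "observation.images.camera1": "<left_wrist image>",
    "observation.images.camera2": "<top image>",
    "task": "<language instruction>",
}
print(check_observation(good))
```

No check is made for `camera3`, which matches the note above that only two cameras actually carry data.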
## Usage
```python
from lerobot.policies.smolvla.modeling_smolvla import SmolVLAPolicy
policy = SmolVLAPolicy.from_pretrained("CoRL2026-CSI/smolVLA-IsaacLab-Multi-Task-8epoch-mod")
```
## Citation / Acknowledgement
Built on top of [LeRobot](https://github.com/huggingface/lerobot) and the
[SmolVLA](https://huggingface.co/lerobot/smolvla_base) base checkpoint. Project: CoRL 2026 CSI submission.

### Framework versions
- LeRobot 0.5.2