How to use Cache-SCA/smolVLA_UR7e_Stack_Block_50epoch_tp1 with LeRobot:
# See https://github.com/huggingface/lerobot?tab=readme-ov-file#installation for more details
git clone https://github.com/huggingface/lerobot.git
cd lerobot
pip install -e ".[smolvla]"

# Launch finetuning on your dataset
python lerobot/scripts/train.py \
  --policy.path=Cache-SCA/smolVLA_UR7e_Stack_Block_50epoch_tp1 \
  --dataset.repo_id=lerobot/svla_so101_pickplace \
  --batch_size=64 \
  --steps=20000 \
  --output_dir=outputs/train/my_smolvla \
  --job_name=my_smolvla_training \
  --policy.device=cuda \
  --wandb.enable=true

# Run the policy using the record function.
# Replace --robot.port, --robot.id, and --robot.cameras with your own values.
# Use the same --dataset.single_task description you used when recording your dataset.
# --dataset.repo_id will be the dataset name on the HF Hub.
python -m lerobot.record \
  --robot.type=so101_follower \
  --robot.port=/dev/ttyACM0 \
  --robot.id=my_blue_follower_arm \
  --robot.cameras="{ front: {type: opencv, index_or_path: 8, width: 640, height: 480, fps: 30}}" \
  --dataset.single_task="Grasp a lego block and put it in the bin." \
  --dataset.repo_id=HF_USER/dataset_name \
  --dataset.episode_time_s=50 \
  --dataset.num_episodes=10 \
  --policy.path=Cache-SCA/smolVLA_UR7e_Stack_Block_50epoch_tp1
metadata
license: apache-2.0
library_name: lerobot
base_model: lerobot/smolvla_base
pipeline_tag: robotics
tags:
- lerobot
- smolvla
- robotics
- ur7e
- imitation-learning
- code-as-policies
- CoRL2026
datasets:
- CoRL2026-CSI/UR7e-CaP-Stack_Block-100epi_10fps_state_tplus1_action
smolVLA · UR7e · Stack_Block (50 epoch, tp1)
A SmolVLA policy model obtained by fine-tuning lerobot/smolvla_base for 50 epochs on the CoRL2026-CSI/UR7e-CaP-Stack_Block-100epi_10fps_state_tplus1_action dataset.
Model details
- Base model: lerobot/smolvla_base (SmolVLM2-500M-Video-Instruct + action expert)
- Robot: UR7e (7-DOF, including the gripper)
- Cameras: realsense_topview, realsense_wrist (480×640 → 256×256 resize)
- Action: 7D joint positions (6 joints + gripper)
- State variant: state_tplus1_action (state at t+1, action at t)
Training
| Config | Value |
|---|---|
| Dataset | CoRL2026-CSI/UR7e-CaP-Stack_Block-100epi_10fps_state_tplus1_action (69932 frames, 100 episodes) |
| Steps | 13700 (= 50 epochs) |
| Global batch | 256 (BATCH=64 × NUM_GPUS=4) |
| Optimizer | AdamW (lerobot smolvla preset) |
| Mixed precision | no (bf16 inference) |
| Image augmentation | brightness / contrast / saturation / hue / sharpness / affine, at most 3 chosen at random |
| Hardware | 4× H100 80GB |
Training script: scripts/ur7e_tplus1/train_smolvla_stack_block.sh (CoRL2026 lerobot fork).
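The step count in the table can be reproduced from the frame count and the global batch size. The following is a minimal sketch of that arithmetic, assuming that a partial final batch in each epoch is counted as a full step (rounding up); the training script itself may compute the value differently.

```python
import math

# Values taken from the training table.
NUM_FRAMES = 69932      # total frames in the dataset
BATCH_PER_GPU = 64      # BATCH
NUM_GPUS = 4            # NUM_GPUS
EPOCHS = 50

# Global batch is the per-GPU batch times the number of GPUs.
global_batch = BATCH_PER_GPU * NUM_GPUS

# Assumption: steps per epoch are rounded up so every frame is seen once.
steps_per_epoch = math.ceil(NUM_FRAMES / global_batch)

total_steps = steps_per_epoch * EPOCHS
print(global_batch, steps_per_epoch, total_steps)
```

Under this rounding assumption the result matches the 13700 steps listed in the table.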
Camera rename
Mapping between the LeRobot dataset camera keys and the SmolVLA policy keys:
| Dataset key | Policy key |
|---|---|
| observation.images.realsense_wrist | observation.images.camera1 |
| observation.images.realsense_topview | observation.images.camera2 |
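As an illustration of the mapping above, the sketch below renames the dataset camera keys in an observation dictionary to the keys the policy expects. The helper `rename_camera_keys` and the placeholder frame are hypothetical and not part of LeRobot; the training script may apply the rename in a different way.

```python
# Dataset key -> policy key, copied from the camera rename table.
CAMERA_RENAME = {
    "observation.images.realsense_wrist": "observation.images.camera1",
    "observation.images.realsense_topview": "observation.images.camera2",
}


def rename_camera_keys(observation: dict) -> dict:
    """Return a copy of `observation` with dataset camera keys renamed.

    Hypothetical helper for illustration; keys not in CAMERA_RENAME
    are passed through unchanged.
    """
    return {CAMERA_RENAME.get(key, key): value for key, value in observation.items()}


# Placeholder frame: strings stand in for image tensors.
frame = {
    "observation.images.realsense_wrist": "wrist_image",
    "observation.images.realsense_topview": "topview_image",
    "observation.state": [0.0] * 7,
}
renamed = rename_camera_keys(frame)
print(sorted(renamed))
```

Returning a new dictionary leaves the original frame untouched, which keeps the dataset sample reusable with its original keys.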
Usage
from lerobot.policies.smolvla.modeling_smolvla import SmolVLAPolicy

# Load the fine-tuned SmolVLA policy weights from the Hugging Face Hub
policy = SmolVLAPolicy.from_pretrained("CoRL2026-CSI/smolVLA_UR7e_Stack_Block_50epoch_tp1")
Citation / Acknowledgement
Built on top of LeRobot and the SmolVLA checkpoint. Project: CoRL 2026 CSI submission.