---
license: apache-2.0
library_name: lerobot
base_model: lerobot/smolvla_base
pipeline_tag: robotics
tags:
- lerobot
- smolvla
- robotics
- ur7e
- imitation-learning
- code-as-policies
- CoRL2026
datasets:
- CoRL2026-CSI/UR7e-CaP-Stack_Block-100epi_10fps_state_tplus1_action
---
# smolVLA · UR7e · Stack_Block (50 epoch, tp1)

SmolVLA policy fine-tuned for 50 epochs from [lerobot/smolvla_base](https://huggingface.co/lerobot/smolvla_base) on the [CoRL2026-CSI/UR7e-CaP-Stack_Block-100epi_10fps_state_tplus1_action](https://huggingface.co/datasets/CoRL2026-CSI/UR7e-CaP-Stack_Block-100epi_10fps_state_tplus1_action) dataset.

## Model details

- **Base model**: `lerobot/smolvla_base` (SmolVLM2-500M-Video-Instruct + action expert)
- **Robot**: UR7e (7 DOF including the gripper)
- **Cameras**: `realsense_topview`, `realsense_wrist` (resized from 480×640 to 256×256)
- **Action**: 7D joint positions (6 joints + gripper)
- **State variant**: `state_tplus1_action` (state at t+1, action at t)
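
The card does not spell out how the `state_tplus1_action` variant was built. One plausible reading of the name, used here purely as an assumption, is that the 7D action label at step t is the 7D joint state recorded at step t+1. The hypothetical helper `build_tplus1_pairs` sketches that pairing on a plain list of state vectors; it is not taken from the dataset's conversion code.

```python
from typing import List, Sequence, Tuple

# 7D vector: 6 UR7e joint positions followed by 1 gripper value
Vector = List[float]


def build_tplus1_pairs(states: Sequence[Sequence[float]]) -> List[Tuple[Vector, Vector]]:
    """Pair each recorded state s_t with s_{t+1}, used as the action target a_t.

    ASSUMPTION: this is one reading of `state_tplus1_action`, not the
    dataset's actual conversion script. T states yield T-1 pairs because
    the final state has no successor.
    """
    pairs: List[Tuple[Vector, Vector]] = []
    for t in range(len(states) - 1):
        state_t = list(states[t])
        action_t = list(states[t + 1])  # hypothetical: a_t := s_{t+1}
        if len(state_t) != 7 or len(action_t) != 7:
            raise ValueError("expected 7D vectors (6 joints + gripper)")
        pairs.append((state_t, action_t))
    return pairs


# Toy trajectory of 4 states, where every entry of state i equals i
trajectory = [[float(i)] * 7 for i in range(4)]
pairs = build_tplus1_pairs(trajectory)
print(len(pairs))      # 3
print(pairs[0][1][0])  # 1.0
```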

## Training

| Config | Value |
|---|---|
| Dataset | [CoRL2026-CSI/UR7e-CaP-Stack_Block-100epi_10fps_state_tplus1_action](https://huggingface.co/datasets/CoRL2026-CSI/UR7e-CaP-Stack_Block-100epi_10fps_state_tplus1_action) (69,932 frames, 100 episodes) |
| Steps | 13,700 (= 50 epochs) |
| Global batch | 256 (BATCH=64 × NUM_GPUS=4) |
| Optimizer | AdamW (LeRobot SmolVLA preset) |
| Mixed precision | No (bf16 inference) |
| Image augmentation | brightness / contrast / saturation / hue / sharpness / affine; at most 3, chosen at random |
| Hardware | 4× H100 80GB |
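
The step count in the table can be checked with a short calculation. This sketch assumes an epoch is one pass over all 69,932 frames at the global batch size, with the final partial batch counted as a full step. The card does not state the rounding rule, but rounding up reproduces 13,700 exactly (rounding down would give 13,650).

```python
import math

# Values copied from the training table
num_frames = 69_932   # frames across the 100 episodes
per_gpu_batch = 64    # BATCH
num_gpus = 4          # NUM_GPUS
epochs = 50

global_batch = per_gpu_batch * num_gpus                  # samples per optimizer step
steps_per_epoch = math.ceil(num_frames / global_batch)   # last batch of an epoch is partial
total_steps = steps_per_epoch * epochs

print(global_batch, steps_per_epoch, total_steps)  # 256 274 13700
```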

Training script: `scripts/ur7e_tplus1/train_smolvla_stack_block.sh` (CoRL2026 LeRobot fork).
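
The "at most 3, chosen at random" augmentation entry in the training table can be sketched as picking a random subset of the listed transforms for each sample. This is a minimal sketch, assuming the entry behaves like LeRobot's `max_num_transforms` image-transform option; the helper `sample_transforms` is hypothetical, samples uniformly and in random order (the training run may weight or order transforms differently), and only selects transform names without implementing the image operations.

```python
import random
from typing import List, Sequence

# Augmentations listed in the training table (names only; no image ops here)
TRANSFORM_NAMES = ("brightness", "contrast", "saturation", "hue", "sharpness", "affine")


def sample_transforms(
    rng: random.Random,
    names: Sequence[str] = TRANSFORM_NAMES,
    max_num_transforms: int = 3,
) -> List[str]:
    """Choose min(max_num_transforms, len(names)) distinct transforms uniformly at random.

    The returned order is random; a real pipeline would apply the
    corresponding image ops in this order, each with a random magnitude.
    """
    k = min(max_num_transforms, len(names))
    return rng.sample(list(names), k)


rng = random.Random(0)
for _ in range(3):
    print(sample_transforms(rng))  # a fresh 3-transform subset per sample
```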

## Camera rename

Mapping from the LeRobot dataset camera keys to the SmolVLA policy keys:

| Dataset key | Policy key |
|---|---|
| `observation.images.realsense_wrist` | `observation.images.camera1` |
| `observation.images.realsense_topview` | `observation.images.camera2` |
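
The rename table amounts to a key substitution on each frame before it reaches the policy. The helper `rename_camera_keys` is hypothetical (the card does not show where the CoRL2026 fork applies the mapping), and the example uses placeholder strings instead of image tensors.

```python
from typing import Any, Dict

# Dataset key -> policy key, copied from the camera-rename table
CAMERA_RENAME = {
    "observation.images.realsense_wrist": "observation.images.camera1",
    "observation.images.realsense_topview": "observation.images.camera2",
}


def rename_camera_keys(frame: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `frame` with dataset camera keys replaced by policy keys.

    Keys not in CAMERA_RENAME (e.g. observation.state) pass through unchanged.
    """
    return {CAMERA_RENAME.get(key, key): value for key, value in frame.items()}


frame = {
    "observation.images.realsense_wrist": "wrist image",
    "observation.images.realsense_topview": "top-view image",
    "observation.state": [0.0] * 7,
}
policy_frame = rename_camera_keys(frame)
print(sorted(policy_frame))
# ['observation.images.camera1', 'observation.images.camera2', 'observation.state']
```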

## Usage

```python
from lerobot.policies.smolvla.modeling_smolvla import SmolVLAPolicy

# Download the fine-tuned checkpoint from the Hugging Face Hub and build the policy
policy = SmolVLAPolicy.from_pretrained("CoRL2026-CSI/smolVLA_UR7e_Stack_Block_50epoch_tp1")
```

## Citation / Acknowledgement

Built on top of [LeRobot](https://github.com/huggingface/lerobot) and the [SmolVLA](https://huggingface.co/lerobot/smolvla_base) checkpoint. Project: CoRL 2026 CSI submission.