---
license: apache-2.0
library_name: lerobot
pipeline_tag: robotics
base_model: CoRL2026-CSI/smolvla_isaaclab_so101_11task_basecap_3300epi_8ep
datasets:
- CoRL2026-CSI/SO101-cap_stack_RGBblock_on_bluedish_10fps
tags:
- lerobot
- robotics
- smolvla
- vla
- so101
- code-as-policies
- cap
- imitation-learning
- 50epochs
- single-arm
- dual-camera
- stack-block
- rgb-blocks
- blue-dish
---
# SmolVLA-CaP-StackBlock-50epochs

This repository contains a SmolVLA policy fine-tuned with LeRobot for the SO101 CAP task **Stack RGB Blocks on a Blue Dish**. The policy was initialized from `CoRL2026-CSI/smolvla_isaaclab_so101_11task_basecap_3300epi_8ep` and trained for 50 epochs on `CoRL2026-CSI/SO101-cap_stack_RGBblock_on_bluedish_10fps`.
## Model Details

| Field | Value |
|---|---|
| Policy type | `smolvla` |
| Task | Stack the red, green, and blue blocks on the blue dish, in that order from bottom to top |
| Robot | SO101 follower |
| Dataset | `CoRL2026-CSI/SO101-cap_stack_RGBblock_on_bluedish_10fps` |
| Base model | `CoRL2026-CSI/smolvla_isaaclab_so101_11task_basecap_3300epi_8ep` |
| Training steps | `17100` |
| Checkpoint step | `17100` |
| Batch size (per GPU) | `128` |
| Effective batch size | `256` (2 GPUs × 128) |
| Action chunk size | `50` |
| Action horizon | `50` |
| Observation steps | `1` |
| Inference denoising steps | `50` |
| Model weights | `model.safetensors` (864.7 MiB) |
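
The numbers in the table can be cross-checked with a little arithmetic. The sketch below assumes that an "epoch" means one full pass over every dataset frame and that the dataset runs at 10 fps (inferred only from the `_10fps` suffix in its name). Under those assumptions the run drew about 4.38M samples, which implies roughly 87.5k frames per epoch, and one 50-step action chunk covers about 5 s of motion. These are estimates, not reported figures, and the helper names are illustrative.

```python
# Back-of-the-envelope numbers derived from the Model Details table.
# Assumptions (not stated in the card): one "epoch" is one pass over every
# dataset frame, and the dataset is recorded at 10 fps (from its "_10fps" name).

PER_GPU_BATCH = 128
NUM_GPUS = 2             # "two CUDA processes" in Training Setup
TRAINING_STEPS = 17_100
EPOCHS = 50
CHUNK_SIZE = 50          # actions predicted per policy call
DATASET_FPS = 10         # assumption: inferred from the dataset name only


def effective_batch_size(per_gpu: int, num_gpus: int) -> int:
    """Samples consumed per optimizer step across all data-parallel workers."""
    return per_gpu * num_gpus


def samples_seen(steps: int, batch: int) -> int:
    """Total training samples drawn over the whole run."""
    return steps * batch


def implied_frames_per_epoch(total_samples: int, epochs: int) -> float:
    """Rough dataset size implied by the sample count and the epoch count."""
    return total_samples / epochs


def chunk_duration_s(chunk_size: int, fps: int) -> float:
    """Wall-clock time covered by one action chunk at the dataset frame rate."""
    return chunk_size / fps


batch = effective_batch_size(PER_GPU_BATCH, NUM_GPUS)
total = samples_seen(TRAINING_STEPS, batch)
print(batch)                                      # 256
print(total)                                      # 4377600
print(implied_frames_per_epoch(total, EPOCHS))    # 87552.0
print(chunk_duration_s(CHUNK_SIZE, DATASET_FPS))  # 5.0
```

If the training script counted epochs differently (for example, per process rather than across both GPUs), the implied dataset size changes accordingly.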
## Training Setup

The run used two CUDA processes with `batch_size=128` per process, image augmentation enabled, and camera key remapping from the dataset's raw cameras to the SmolVLA camera names:

```text
observation.images.left_wrist -> observation.images.camera1
observation.images.top -> observation.images.camera2
```
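
Observations fed to the policy outside the training script need the same renaming before preprocessing. A minimal sketch, assuming observations arrive as a flat dict keyed by the dataset's feature names; `remap_camera_keys` and `CAMERA_KEY_MAP` are hypothetical helpers written for this card, not LeRobot APIs.

```python
from typing import Any, Dict, Mapping

# Dataset camera keys -> camera keys used by the SmolVLA policy (from the mapping above).
# Hypothetical helper constant, not part of LeRobot.
CAMERA_KEY_MAP: Dict[str, str] = {
    "observation.images.left_wrist": "observation.images.camera1",
    "observation.images.top": "observation.images.camera2",
}


def remap_camera_keys(
    observation: Mapping[str, Any],
    key_map: Mapping[str, str] = CAMERA_KEY_MAP,
) -> Dict[str, Any]:
    """Return a copy of `observation` with dataset camera keys renamed.

    Keys not listed in `key_map` (e.g. joint state) pass through unchanged.
    Raises KeyError if two input keys would end up under the same name.
    """
    remapped: Dict[str, Any] = {}
    for key, value in observation.items():
        new_key = key_map.get(key, key)
        if new_key in remapped:
            raise KeyError(f"duplicate key after remapping: {new_key!r}")
        remapped[new_key] = value
    return remapped


# Usage with placeholder values standing in for image tensors and joint state.
obs = {
    "observation.images.left_wrist": "wrist_frame",
    "observation.images.top": "top_frame",
    "observation.state": [0.0] * 6,
}
print(sorted(remap_camera_keys(obs)))
# ['observation.images.camera1', 'observation.images.camera2', 'observation.state']
```

Raising on collisions, instead of silently overwriting, catches the case where a pipeline has already renamed some keys and would otherwise feed the wrong camera to a policy input.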

The checkpoint in this repository was saved at step `17100` and ships with LeRobot's preprocessor and postprocessor artifacts.

## Files

```text
model.safetensors
config.json
train_config.json
policy_preprocessor.json
policy_preprocessor_step_5_normalizer_processor.safetensors
policy_postprocessor.json
policy_postprocessor_step_0_unnormalizer_processor.safetensors
```
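
Because the processor files carry the normalization statistics, a deployment missing any of them will behave differently from training. A small pre-flight check, sketched under the assumption that the checkpoint has already been downloaded to a local folder; `missing_artifacts` is a hypothetical helper, not a LeRobot or Hub function.

```python
from pathlib import Path
from typing import Iterable, List, Union

# File names listed in the "Files" section of this card.
EXPECTED_FILES = (
    "model.safetensors",
    "config.json",
    "train_config.json",
    "policy_preprocessor.json",
    "policy_preprocessor_step_5_normalizer_processor.safetensors",
    "policy_postprocessor.json",
    "policy_postprocessor_step_0_unnormalizer_processor.safetensors",
)


def missing_artifacts(
    checkpoint_dir: Union[str, Path],
    expected: Iterable[str] = EXPECTED_FILES,
) -> List[str]:
    """Hypothetical helper: list expected files absent from a local checkpoint folder."""
    root = Path(checkpoint_dir)
    return [name for name in expected if not (root / name).is_file()]
```

For example, pass it the folder path returned by `huggingface_hub.snapshot_download(repo_id="CoRL2026-CSI/SmolVLA-CaP-StackBlock-50epochs")`; an empty list means every listed artifact is present.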

## Usage

```python
from lerobot.policies.smolvla.modeling_smolvla import SmolVLAPolicy

# Download the weights and config from the Hugging Face Hub and build the policy.
policy = SmolVLAPolicy.from_pretrained("CoRL2026-CSI/SmolVLA-CaP-StackBlock-50epochs")
```

For robot deployment, use the same camera mapping, normalization pipeline, and SO101 action/state conventions used by the training dataset.
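
Judging by the file names, the preprocessor normalizes inputs before they reach the model and the postprocessor unnormalizes the predicted actions. As an illustration only, the sketch below assumes mean/std normalization for state and action (a common SmolVLA setting in LeRobot, not confirmed by this card) and uses made-up statistics. The real statistics must come from this repository's processor `.safetensors` files, and the function names are hypothetical.

```python
from typing import List, Sequence

EPS = 1e-8  # guards against division by zero for constant dimensions


def normalize(x: Sequence[float], mean: Sequence[float], std: Sequence[float]) -> List[float]:
    """Assumed mean/std normalization: (x - mean) / (std + EPS)."""
    if not (len(x) == len(mean) == len(std)):
        raise ValueError("x, mean and std must have the same length")
    return [(xi - m) / (s + EPS) for xi, m, s in zip(x, mean, std)]


def unnormalize(z: Sequence[float], mean: Sequence[float], std: Sequence[float]) -> List[float]:
    """Inverse of `normalize`: z * (std + EPS) + mean."""
    if not (len(z) == len(mean) == len(std)):
        raise ValueError("z, mean and std must have the same length")
    return [zi * (s + EPS) + m for zi, m, s in zip(z, mean, std)]


# Hypothetical statistics for illustration only; never hard-code real deployment values.
mean = [0.0, 10.0, -5.0]
std = [1.0, 2.0, 4.0]
state = [1.0, 14.0, 3.0]

z = normalize(state, mean, std)
restored = unnormalize(z, mean, std)
print([round(v, 6) for v in z])         # [1.0, 2.0, 2.0]
print([round(v, 6) for v in restored])  # [1.0, 14.0, 3.0]
```

The round trip only recovers the original values when both directions use the same statistics, which is why mixing stats from another dataset or checkpoint silently shifts every commanded joint position.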

## Intended Use

This model is intended for imitation-learning experiments and SO101 tabletop manipulation research on the specified CAP task. It is not a general-purpose robot policy and should be validated in a controlled workspace before any hardware deployment.

## Limitations

The model was trained on a single-task dataset with fixed camera views, object set, action space, and workspace assumptions. This repository does not report an official evaluation success rate.