---
license: apache-2.0
library_name: lerobot
pipeline_tag: robotics
base_model: CoRL2026-CSI/smolvla_isaaclab_so101_11task_basecap_3300epi_8ep
datasets:
- CoRL2026-CSI/SO101-cap_stack_RGBblock_on_bluedish_10fps
tags:
- lerobot
- robotics
- smolvla
- vla
- so101
- code-as-policies
- cap
- imitation-learning
- 50epochs
- single-arm
- dual-camera
- stack-block
- rgb-blocks
- blue-dish
---
# SmolVLA-CaP-StackBlock-50epochs
This repository contains a SmolVLA policy fine-tuned with LeRobot for the SO101 CAP (code-as-policies) task **Stack RGB Blocks on a Blue Dish**. The policy was initialized from `CoRL2026-CSI/smolvla_isaaclab_so101_11task_basecap_3300epi_8ep` and trained for 50 epochs (17,100 steps) on `CoRL2026-CSI/SO101-cap_stack_RGBblock_on_bluedish_10fps`.
## Model Details
| Field | Value |
|---|---|
| Policy type | `smolvla` |
| Task | stack the red, green, and blue blocks on the blue dish, in that order from bottom to top |
| Robot | SO101 follower |
| Dataset | `CoRL2026-CSI/SO101-cap_stack_RGBblock_on_bluedish_10fps` |
| Base model | `CoRL2026-CSI/smolvla_isaaclab_so101_11task_basecap_3300epi_8ep` |
| Training steps | `17100` |
| Completed step | `17100` |
| Batch size | `128` per GPU |
| Effective batch size | `256` |
| Action chunk size | `50` |
| Action horizon | `50` |
| Observation steps | `1` |
| Inference denoising steps | `50` |
| Model weights | `model.safetensors` (864.7 MiB) |
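The step count and batch sizes in the table are consistent with the 50-epoch label. The arithmetic is sketched below, assuming an epoch means one full pass over every frame of the training dataset. The implied frame count is an estimate derived from these numbers, not a figure reported for the dataset.

```python
# Values taken from the Model Details table and Training Setup.
PER_GPU_BATCH = 128
NUM_PROCESSES = 2          # two CUDA processes
TOTAL_STEPS = 17_100
EPOCHS = 50

effective_batch = PER_GPU_BATCH * NUM_PROCESSES   # 256
steps_per_epoch = TOTAL_STEPS // EPOCHS            # 342
samples_seen = TOTAL_STEPS * effective_batch       # 4,377,600

# Assumption: one epoch = one pass over every frame, so the dataset
# holds roughly samples_seen / EPOCHS frames (estimate only).
approx_frames = samples_seen // EPOCHS             # 87,552

print(effective_batch, steps_per_epoch, samples_seen, approx_frames)
```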
## Training Setup
The run used two CUDA processes with `batch_size=128` per process, image augmentation enabled, and camera key remapping from the dataset's raw cameras to the SmolVLA camera names:
```text
observation.images.left_wrist -> observation.images.camera1
observation.images.top -> observation.images.camera2
```
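The remapping amounts to renaming two keys in each observation dictionary before it reaches the policy. Below is a minimal, framework-agnostic sketch. The helper name `remap_camera_keys` and the placeholder values are hypothetical and not part of LeRobot's API.

```python
from typing import Any, Dict

# Dataset camera keys -> camera names expected by this SmolVLA checkpoint.
CAMERA_KEY_MAP = {
    "observation.images.left_wrist": "observation.images.camera1",
    "observation.images.top": "observation.images.camera2",
}


def remap_camera_keys(obs: Dict[str, Any], key_map: Dict[str, str] = CAMERA_KEY_MAP) -> Dict[str, Any]:
    """Return a copy of `obs` with camera keys renamed; all other keys are kept as-is."""
    return {key_map.get(key, key): value for key, value in obs.items()}


# Placeholder values stand in for image tensors and the joint state.
raw = {
    "observation.images.left_wrist": "wrist_frame",
    "observation.images.top": "top_frame",
    "observation.state": [0.0] * 6,  # placeholder state vector
}
remapped = remap_camera_keys(raw)
print(sorted(remapped))
```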
This repository contains the checkpoint saved at step `17100`, together with LeRobot's preprocessor and postprocessor artifacts.
## Files
```text
model.safetensors
config.json
train_config.json
policy_preprocessor.json
policy_preprocessor_step_5_normalizer_processor.safetensors
policy_postprocessor.json
policy_postprocessor_step_0_unnormalizer_processor.safetensors
```
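The normalizer and unnormalizer states are stored in separate files, so before deployment you may want to confirm that a local copy of the repository contains every artifact listed above. Below is a minimal sketch that uses only the standard library. The function name `missing_artifacts` is hypothetical. How you obtain the local directory (for example, with `huggingface_hub.snapshot_download`) is left open.

```python
import tempfile
from pathlib import Path
from typing import List, Union

# File names copied from the list above.
EXPECTED_FILES = [
    "model.safetensors",
    "config.json",
    "train_config.json",
    "policy_preprocessor.json",
    "policy_preprocessor_step_5_normalizer_processor.safetensors",
    "policy_postprocessor.json",
    "policy_postprocessor_step_0_unnormalizer_processor.safetensors",
]


def missing_artifacts(local_dir: Union[str, Path]) -> List[str]:
    """Return the expected file names that are not present in `local_dir`."""
    root = Path(local_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


# Demo on a temporary directory that only contains the weights file.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "model.safetensors").touch()
    print(missing_artifacts(tmp))  # the six remaining files
```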
## Usage
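```python
from lerobot.policies.smolvla.modeling_smolvla import SmolVLAPolicy
# Downloads config.json and model.safetensors from the Hub; inputs must use the camera1/camera2 keys.
policy = SmolVLAPolicy.from_pretrained("CoRL2026-CSI/SmolVLA-CaP-StackBlock-50epochs")
```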
For robot deployment, apply the same camera key mapping, the normalization and unnormalization steps stored in the `policy_preprocessor*` and `policy_postprocessor*` files, and the SO101 action/state conventions of the training dataset.
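With an action chunk size and action horizon of 50, each policy call predicts 50 actions. Assuming "action horizon" means the number of actions executed from each chunk, all 50 are played back before the policy is queried again. At the 10 fps rate implied by the dataset name, that is about 5 seconds of motion per call. The sketch below illustrates this queue-based scheme with a stub in place of the real model. `ChunkedExecutor` and `stub_policy` are hypothetical names, and this is not LeRobot's implementation.

```python
from collections import deque
from typing import Callable, Deque, List

CHUNK_SIZE = 50      # actions predicted per policy call (table above)
ACTION_HORIZON = 50  # actions executed before re-querying (assumed meaning)
CONTROL_HZ = 10      # assumption: control rate matches the 10 fps dataset


class ChunkedExecutor:
    """Serve one action per control tick; call the policy only when the queue is empty."""

    def __init__(self, predict_chunk: Callable[[int], List[int]], horizon: int = ACTION_HORIZON):
        self.predict_chunk = predict_chunk
        self.horizon = horizon
        self.queue: Deque[int] = deque()
        self.policy_calls = 0

    def next_action(self, step: int) -> int:
        if not self.queue:
            chunk = self.predict_chunk(step)
            self.queue.extend(chunk[: self.horizon])  # keep the first `horizon` actions
            self.policy_calls += 1
        return self.queue.popleft()


def stub_policy(step: int) -> List[int]:
    """Stand-in model: returns CHUNK_SIZE dummy actions tagged with the requesting step."""
    return [step] * CHUNK_SIZE


executor = ChunkedExecutor(stub_policy)
actions = [executor.next_action(t) for t in range(120)]
print(executor.policy_calls, ACTION_HORIZON / CONTROL_HZ)  # 3 5.0
```

Because the horizon equals the chunk size, the robot runs open-loop for a full chunk. A shorter horizon would re-query the policy more often, at a higher inference cost.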
## Intended Use
This model is intended for imitation-learning experiments and SO101 tabletop manipulation research on the specified CAP task. It is not a general-purpose robot policy and should be validated in a controlled workspace before any hardware deployment.
## Limitations
The model was trained on a single-task dataset with fixed camera views, object set, action space, and workspace assumptions. This repository does not report an evaluation success rate.