SmolVLA-CaP-StackBlock-50epochs

This repository contains a SmolVLA policy fine-tuned with LeRobot for the SO101 CAP task Stack RGB Blocks on a Blue Dish. The policy was initialized from CoRL2026-CSI/smolvla_isaaclab_so101_11task_basecap_3300epi_8ep and trained for 50 epochs on CoRL2026-CSI/SO101-cap_stack_RGBblock_on_bluedish_10fps.

Model Details

Field                        Value
Policy type                  smolvla
Task                         Stack the red, green, and blue blocks on the blue dish, in that order from bottom to top
Robot                        SO101 follower
Dataset                      CoRL2026-CSI/SO101-cap_stack_RGBblock_on_bluedish_10fps
Base model                   CoRL2026-CSI/smolvla_isaaclab_so101_11task_basecap_3300epi_8ep
Configured training steps    17100
Completed steps              17100
Batch size                   128 per GPU
Effective batch size         256 (2 GPUs × 128)
Action chunk size            50
Action horizon               50
Observation steps            1
Inference denoising steps    50
Model weights                model.safetensors (864.7 MiB)
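As a consistency check, the step count, epoch count, and batch sizes in the table imply roughly how many frames one epoch covers. The sketch below assumes every optimizer step consumes one full effective batch. The resulting per-epoch frame count is therefore an estimate derived from the table, not a figure reported for the dataset.

```python
# Figures copied from the Model Details table.
TRAINING_STEPS = 17_100
EPOCHS = 50
BATCH_PER_GPU = 128
NUM_GPUS = 2

# Effective batch: per-GPU batch times the number of data-parallel processes.
effective_batch = BATCH_PER_GPU * NUM_GPUS

# Assumption: each step consumes exactly one full effective batch.
steps_per_epoch = TRAINING_STEPS / EPOCHS
frames_per_epoch = steps_per_epoch * effective_batch
total_frames_seen = TRAINING_STEPS * effective_batch

print(effective_batch, steps_per_epoch, frames_per_epoch, total_frames_seen)
```

Under that assumption, one epoch is about 342 steps, or roughly 87.5k training samples.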

Training Setup

Training ran as two CUDA processes (one per GPU) with batch_size=128 per process and image augmentation enabled. The dataset's raw camera keys were remapped to the camera names SmolVLA expects:

observation.images.left_wrist -> observation.images.camera1
observation.images.top        -> observation.images.camera2
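The remapping is a plain key rename applied to each observation. The sketch below shows only that mapping in plain Python. `CAMERA_RENAME_MAP` and `rename_cameras` are hypothetical names, not LeRobot APIs. LeRobot may handle the rename through its own configuration, so treat this as an illustration of what the mapping does rather than how the training run applied it.

```python
# Hypothetical helper; the mapping itself is copied from the training setup.
CAMERA_RENAME_MAP = {
    "observation.images.left_wrist": "observation.images.camera1",
    "observation.images.top": "observation.images.camera2",
}


def rename_cameras(observation: dict) -> dict:
    """Return a copy of `observation` with raw camera keys renamed to SmolVLA keys.

    Keys not in the map (for example observation.state) pass through unchanged.
    """
    return {CAMERA_RENAME_MAP.get(key, key): value for key, value in observation.items()}


# Placeholder values stand in for real image tensors and joint states.
raw = {
    "observation.images.left_wrist": "wrist_frame",
    "observation.images.top": "top_frame",
    "observation.state": [0.0] * 6,
}
print(sorted(rename_cameras(raw)))
```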

This repository contains the checkpoint from step 17100, together with LeRobot's preprocessor and postprocessor artifacts.

Files

model.safetensors
config.json
train_config.json
policy_preprocessor.json
policy_preprocessor_step_5_normalizer_processor.safetensors
policy_postprocessor.json
policy_postprocessor_step_0_unnormalizer_processor.safetensors
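The two *_processor.safetensors files store the dataset statistics used to normalize inputs and unnormalize the predicted actions. The sketch below illustrates mean/std normalization and its inverse. It assumes MEAN_STD mode for state and action; the mode actually used should be recorded in config.json. The statistics are made-up placeholders for a 6-dimensional joint vector, not values from this checkpoint, and `normalize`/`unnormalize` are hypothetical helper names.

```python
EPS = 1e-8  # guards against division by zero for near-constant dimensions


def normalize(x, mean, std):
    """Map raw values to zero-mean, unit-variance space (model input side)."""
    return [(xi - m) / (s + EPS) for xi, m, s in zip(x, mean, std)]


def unnormalize(z, mean, std):
    """Invert `normalize`, mapping model outputs back to raw units (action side)."""
    return [zi * (s + EPS) + m for zi, m, s in zip(z, mean, std)]


# Placeholder statistics (NOT read from the processor safetensors files).
mean = [0.0, 10.0, -5.0, 20.0, 0.0, 15.0]
std = [1.0, 2.0, 4.0, 5.0, 1.0, 3.0]

raw_action = [1.0, 12.0, -1.0, 25.0, 0.5, 18.0]
z = normalize(raw_action, mean, std)
recovered = unnormalize(z, mean, std)
print(z, recovered)
```

Deploying with statistics other than the bundled ones would shift every input and output, which is why the normalization pipeline has to match training.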

Usage

from lerobot.policies.smolvla.modeling_smolvla import SmolVLAPolicy

policy = SmolVLAPolicy.from_pretrained("CoRL2026-CSI/SmolVLA-CaP-StackBlock-50epochs")

For robot deployment, use the same camera mapping, normalization pipeline, and SO101 action/state conventions used by the training dataset.
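With an action chunk size and action horizon of 50, each inference call is expected to yield 50 actions, all of which are executed before the model is queried again. If the robot is driven at the dataset's 10 fps, one chunk covers about 5 seconds of motion. The sketch below illustrates that queueing pattern with a stub in place of the real policy. `stub_policy`, `run_episode`, and the 6-dimensional action size are hypothetical; a real deployment would call the loaded policy with properly preprocessed observations.

```python
from collections import deque

CHUNK_SIZE = 50      # actions predicted per inference call (Model Details)
N_ACTION_STEPS = 50  # actions executed before re-querying (action horizon)
FPS = 10             # dataset frame rate; assumed equal to the control rate


def stub_policy(observation, chunk_size=CHUNK_SIZE, action_dim=6):
    """Hypothetical stand-in for the real policy: returns a chunk of zero actions."""
    return [[0.0] * action_dim for _ in range(chunk_size)]


def run_episode(num_control_steps):
    """Execute queued actions, calling the policy only when the queue is empty."""
    queue = deque()
    inference_calls = 0
    executed = []
    for step in range(num_control_steps):
        if not queue:
            observation = {"step": step}  # placeholder for camera images and state
            chunk = stub_policy(observation)
            queue.extend(chunk[:N_ACTION_STEPS])
            inference_calls += 1
        executed.append(queue.popleft())  # send this action to the robot here
    return inference_calls, executed


calls, actions = run_episode(num_control_steps=120)
print(calls, len(actions), N_ACTION_STEPS / FPS)
```

Executing the full chunk keeps inference cost low but makes the robot less reactive between queries; a shorter horizon trades compute for responsiveness.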

Intended Use

This model is intended for imitation-learning experiments and SO101 tabletop manipulation research on the specified CAP task. It is not a general-purpose robot policy and should be validated in a controlled workspace before any hardware deployment.

Limitations

The model was trained on a single task dataset with fixed camera views, object set, action space, and workspace assumptions. No official evaluation success rate is included in this repository.

Model size: 0.5B params (Safetensors; tensor types F32 and BF16)