---
license: apache-2.0
library_name: lerobot
pipeline_tag: robotics
base_model: CoRL2026-CSI/smolvla_isaaclab_so101_11task_basecap_3300epi_8ep
datasets:
- CoRL2026-CSI/SO101-cap_stack_RGBblock_on_bluedish_10fps
tags:
- lerobot
- robotics
- smolvla
- vla
- so101
- code-as-policies
- cap
- imitation-learning
- 50epochs
- single-arm
- dual-camera
- stack-block
- rgb-blocks
- blue-dish
---
# SmolVLA-CaP-StackBlock-50epochs

This repository contains a SmolVLA policy fine-tuned with LeRobot for the SO101 CAP task **Stack RGB Blocks on a Blue Dish**. The policy was initialized from `CoRL2026-CSI/smolvla_isaaclab_so101_11task_basecap_3300epi_8ep` and trained for 50 epochs on `CoRL2026-CSI/SO101-cap_stack_RGBblock_on_bluedish_10fps`.

## Model Details

| Field | Value |
|---|---|
| Policy type | `smolvla` |
| Task | stack the red, green, and blue blocks on the blue dish, in that order from bottom to top |
| Robot | SO101 follower |
| Dataset | `CoRL2026-CSI/SO101-cap_stack_RGBblock_on_bluedish_10fps` |
| Base model | `CoRL2026-CSI/smolvla_isaaclab_so101_11task_basecap_3300epi_8ep` |
| Training steps | `17100` |
| Completed step | `17100` |
| Batch size | `128` per GPU |
| Effective batch size | `256` |
| Action chunk size | `50` |
| Action horizon | `50` |
| Observation steps | `1` |
| Inference denoising steps | `50` |
| Model weights | `model.safetensors` (864.7 MiB) |

## Training Setup

The run used two CUDA processes with `batch_size=128` per process, image augmentation enabled, and camera key remapping from the dataset's raw cameras to the SmolVLA camera names:

```text
observation.images.left_wrist -> observation.images.camera1
observation.images.top        -> observation.images.camera2
```
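
The remapping above amounts to renaming two keys in each observation before it reaches the policy. A minimal sketch, assuming observations are plain dictionaries keyed by feature name; `remap_camera_keys` is a hypothetical helper written for illustration, not a LeRobot function, and the string values stand in for real image tensors:

```python
# Mapping copied from this model card: dataset camera key -> SmolVLA camera key.
CAMERA_KEY_MAP = {
    "observation.images.left_wrist": "observation.images.camera1",
    "observation.images.top": "observation.images.camera2",
}


def remap_camera_keys(observation: dict) -> dict:
    """Return a copy of `observation` with camera keys renamed.

    Hypothetical helper: keys not listed in CAMERA_KEY_MAP pass through unchanged.
    """
    return {CAMERA_KEY_MAP.get(key, key): value for key, value in observation.items()}


# Placeholder values stand in for image tensors and the robot state vector.
raw_observation = {
    "observation.images.left_wrist": "wrist_frame",
    "observation.images.top": "top_frame",
    "observation.state": "state_placeholder",
}
remapped = remap_camera_keys(raw_observation)
print(sorted(remapped))
```

At deployment time the same renaming would need to be applied to live camera frames, so that the wrist camera feeds `camera1` and the top camera feeds `camera2`.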

The checkpoint was saved at step `17100`. LeRobot's preprocessor and postprocessor artifacts from that checkpoint are included in this repository.
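
The figures in this card can be cross-checked with some simple arithmetic. The sketch below rests on assumptions not stated in the card: that each of the 50 epochs is one full pass over the dataset, that the steps divide evenly across epochs, and that the `_10fps` suffix in the dataset name is the rate at which actions are executed.

```python
# Values copied from the Model Details table and the Training Setup section.
training_steps = 17_100
epochs = 50
per_process_batch = 128
num_processes = 2        # "two CUDA processes"
action_chunk_size = 50
dataset_fps = 10         # assumption: read from the "_10fps" dataset name suffix

# Effective batch size is the per-process batch times the number of processes.
effective_batch = per_process_batch * num_processes

# Assumption: every epoch runs the same number of optimizer steps.
steps_per_epoch = training_steps // epochs

# Approximate number of training samples seen in one epoch.
samples_per_epoch = steps_per_epoch * effective_batch

# Duration covered by one action chunk if actions run at the dataset rate.
chunk_seconds = action_chunk_size / dataset_fps

print(effective_batch, steps_per_epoch, samples_per_epoch, chunk_seconds)
# prints: 256 342 87552 5.0
```

Under these assumptions, one epoch covers roughly 87,552 samples and one predicted action chunk spans about 5 seconds of motion.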

## Files

```text
model.safetensors
config.json
train_config.json
policy_preprocessor.json
policy_preprocessor_step_5_normalizer_processor.safetensors
policy_postprocessor.json
policy_postprocessor_step_0_unnormalizer_processor.safetensors
```
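
Before loading a local copy of the checkpoint, it can help to confirm that every file listed above is present. A minimal sketch using only the standard library; `missing_files` is a hypothetical helper and the directory path in the usage line is a placeholder:

```python
from pathlib import Path

# File names copied from the list in this model card.
EXPECTED_FILES = [
    "model.safetensors",
    "config.json",
    "train_config.json",
    "policy_preprocessor.json",
    "policy_preprocessor_step_5_normalizer_processor.safetensors",
    "policy_postprocessor.json",
    "policy_postprocessor_step_0_unnormalizer_processor.safetensors",
]


def missing_files(checkpoint_dir: str) -> list[str]:
    """Return the expected file names that are absent from `checkpoint_dir`."""
    root = Path(checkpoint_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


# Placeholder path: point this at the local download of the repository.
print(missing_files("path/to/local/checkpoint"))
```

An empty list means all seven files were found. A missing preprocessor or postprocessor file would mean the normalization statistics used in training are unavailable.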

## Usage

```python
from lerobot.policies.smolvla.modeling_smolvla import SmolVLAPolicy

policy = SmolVLAPolicy.from_pretrained("CoRL2026-CSI/SmolVLA-CaP-StackBlock-50epochs")
```

For robot deployment, use the same camera mapping, normalization pipeline, and SO101 action/state conventions used by the training dataset.

## Intended Use

This model is intended for imitation-learning experiments and SO101 tabletop manipulation research on the specified CAP task. It is not a general-purpose robot policy and should be validated in a controlled workspace before any hardware deployment.

## Limitations

The model was trained on a single task dataset with fixed camera views, object set, action space, and workspace assumptions. No official evaluation success rate is included in this repository.