	---
	license: apache-2.0
	library_name: lerobot
	pipeline_tag: robotics
	base_model: CoRL2026-CSI/smolvla_isaaclab_so101_11task_basecap_3300epi_8ep
	datasets:
	- CoRL2026-CSI/SO101-cap_stack_RGBblock_on_bluedish_10fps
	tags:
	- lerobot
	- robotics
	- smolvla
	- vla
	- so101
	- code-as-policies
	- cap
	- imitation-learning
	- 50epochs
	- single-arm
	- dual-camera
	- stack-block
	- rgb-blocks
	- blue-dish
	---
	# SmolVLA-CaP-StackBlock-50epochs

	This repository contains a SmolVLA policy fine-tuned with LeRobot on the SO101 code-as-policies (CaP) task "Stack RGB Blocks on a Blue Dish". The policy was initialized from `CoRL2026-CSI/smolvla_isaaclab_so101_11task_basecap_3300epi_8ep` and trained for 50 epochs on `CoRL2026-CSI/SO101-cap_stack_RGBblock_on_bluedish_10fps`.

	## Model Details

	| Field | Value |
	|---|---|
	| Policy type | `smolvla` |
	| Task | stack red, green, and blue blocks on the blue dish from bottom to top |
	| Robot | SO101 follower |
	| Dataset | `CoRL2026-CSI/SO101-cap_stack_RGBblock_on_bluedish_10fps` |
	| Base model | `CoRL2026-CSI/smolvla_isaaclab_so101_11task_basecap_3300epi_8ep` |
	| Training steps | `17100` |
	| Completed step | `17100` |
	| Batch size | `128` per GPU |
	| Effective batch size | `256` |
	| Action chunk size | `50` |
	| Action horizon | `50` |
	| Observation steps | `1` |
	| Inference denoising steps | `50` |
	| Model weights | `model.safetensors` (864.7 MiB) |
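
The figures in the table can be cross-checked with a little arithmetic. The sketch below assumes that the effective batch size is the per-process batch multiplied by the two processes described under Training Setup, and that the 50 epochs divide evenly into the 17100 steps. The samples-per-epoch value is a derived estimate, not a number stated in this repository.

```python
# Cross-check of the training figures listed in Model Details.
per_process_batch = 128   # "Batch size" row
num_processes = 2         # two CUDA processes (see Training Setup)
total_steps = 17_100      # "Training steps" row
epochs = 50               # from the model name and description

effective_batch = per_process_batch * num_processes
steps_per_epoch = total_steps // epochs

# Estimate only: assumes every step consumes one full effective batch.
approx_samples_per_epoch = steps_per_epoch * effective_batch

print(effective_batch)           # 256
print(steps_per_epoch)           # 342
print(approx_samples_per_epoch)  # 87552
```

The first result matches the "Effective batch size" row. The last is only an upper-bound style estimate of how many training samples one pass over the dataset draws.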

	## Training Setup

	The run used two CUDA processes with `batch_size=128` per process, image augmentation enabled, and camera key remapping from the dataset's raw cameras to the SmolVLA camera names:

	```text
	observation.images.left_wrist -> observation.images.camera1
	observation.images.top -> observation.images.camera2
	```
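
The same renaming has to be applied to live observations before they reach the policy. A minimal plain-Python sketch of that remapping follows. The helper `remap_observation_keys` is hypothetical and not part of LeRobot, and the string values stand in for real image and state tensors.

```python
# Hypothetical helper (not a LeRobot API): rename dataset camera keys
# to the camera names this checkpoint was trained with.
RENAME_MAP = {
    "observation.images.left_wrist": "observation.images.camera1",
    "observation.images.top": "observation.images.camera2",
}


def remap_observation_keys(observation: dict, rename_map: dict = RENAME_MAP) -> dict:
    """Return a copy of `observation` with keys renamed per `rename_map`.

    Keys that are not in the map (for example the robot state) pass through unchanged.
    """
    return {rename_map.get(key, key): value for key, value in observation.items()}


# Placeholder values stand in for image and state tensors.
raw = {
    "observation.images.left_wrist": "wrist_frame",
    "observation.images.top": "top_frame",
    "observation.state": "joint_positions",
}
remapped = remap_observation_keys(raw)
print(sorted(remapped))
# ['observation.images.camera1', 'observation.images.camera2', 'observation.state']
```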

	The checkpoint was saved at step `17100`. LeRobot's preprocessor and postprocessor artifacts from the same run are included in this repository alongside the weights.

	## Files

	```text
	model.safetensors
	config.json
	train_config.json
	policy_preprocessor.json
	policy_preprocessor_step_5_normalizer_processor.safetensors
	policy_postprocessor.json
	policy_postprocessor_step_0_unnormalizer_processor.safetensors
	```
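
A quick way to confirm that a download or local copy is complete is to compare it against the list above. The sketch below is one possible check. `missing_files` is a hypothetical helper, and the caller supplies the list of available file names, for example from `os.listdir` on a local checkout or from `huggingface_hub.list_repo_files`.

```python
# File names copied from the list above.
EXPECTED_FILES = [
    "model.safetensors",
    "config.json",
    "train_config.json",
    "policy_preprocessor.json",
    "policy_preprocessor_step_5_normalizer_processor.safetensors",
    "policy_postprocessor.json",
    "policy_postprocessor_step_0_unnormalizer_processor.safetensors",
]


def missing_files(available) -> list:
    """Return the expected files that are absent from `available`, in list order."""
    present = set(available)
    return [name for name in EXPECTED_FILES if name not in present]


# Example with a deliberately incomplete listing.
partial = ["model.safetensors", "config.json", "README.md"]
print(len(missing_files(partial)))    # 5
print(missing_files(EXPECTED_FILES))  # []
```

Extra files such as `README.md` are ignored, so the check only reports what is missing.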

	## Usage

	```python
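	# Import path for recent LeRobot releases; older versions may use a different module layout.
	from lerobot.policies.smolvla.modeling_smolvla import SmolVLAPolicy

	# Downloads the config and weights from the Hub on first use.
	policy = SmolVLAPolicy.from_pretrained("CoRL2026-CSI/SmolVLA-CaP-StackBlock-50epochs")
	policy.eval()  # inference mode: disables dropout and other training-only behavior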
	```

	For robot deployment, use the same camera mapping, normalization pipeline, and SO101 action/state conventions used by the training dataset.
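
Before connecting real hardware, it can help to verify these conventions in code. The sketch below rests on two assumptions: the robot state is passed under LeRobot's usual `observation.state` key, and the control rate matches the `10fps` suffix of the dataset name. Both helper functions are hypothetical and not part of LeRobot.

```python
# Keys the policy input is assumed to need after camera remapping.
# The camera names come from the mapping in Training Setup;
# "observation.state" is an assumption based on LeRobot's usual naming.
REQUIRED_KEYS = (
    "observation.images.camera1",
    "observation.images.camera2",
    "observation.state",
)

CHUNK_SIZE = 50   # "Action chunk size" from Model Details
CONTROL_HZ = 10   # assumption: taken from the "10fps" dataset name suffix


def missing_observation_keys(observation: dict) -> list:
    """List required keys that are absent from an observation dict."""
    return [key for key in REQUIRED_KEYS if key not in observation]


def chunk_duration_seconds(chunk_size: int = CHUNK_SIZE, control_hz: int = CONTROL_HZ) -> float:
    """Seconds of motion covered by one predicted action chunk."""
    return chunk_size / control_hz


# Placeholder values stand in for image and state tensors.
obs = {
    "observation.images.camera1": "wrist_frame",
    "observation.state": "joint_positions",
}
print(missing_observation_keys(obs))  # ['observation.images.camera2']
print(chunk_duration_seconds())       # 5.0
```

Under these assumptions one 50-action chunk covers about five seconds of motion, so driving the robot at a different rate would stretch or compress the trajectories relative to the training data.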

	## Intended Use

	This model is intended for imitation-learning experiments and SO101 tabletop manipulation research on the specified CAP task. It is not a general-purpose robot policy and should be validated in a controlled workspace before any hardware deployment.

	## Limitations

	The model was trained on a single task dataset with fixed camera views, object set, action space, and workspace assumptions. No official evaluation success rate is included in this repository.