LeHome Submission - SmolVLA 70k

Method

This submission fine-tunes lerobot/smolvla_base on the official LeHome merged four-garment dataset:

Datasets/example/four_types_merged

The final checkpoint is selected from a continued-finetuning run:

30k baseline -> +20k -> +20k

We submit the 70k-equivalent checkpoint because it achieved the best average self-reported validation score (52.08%) among the tested 30k, 70k, and 90k checkpoints.

The policy inputs are:

observation.state: 12-dim bimanual robot state
observation.images.top_rgb: top RGB image
observation.images.left_rgb: left wrist RGB image
observation.images.right_rgb: right wrist RGB image

The policy output is:

action: 12-dim bimanual joint action
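
As a minimal sketch of the interface listed above, the following check confirms that an observation dictionary carries the four expected keys and a 12-dimensional state. The helper name check_observation is hypothetical and not part of the submission; image shapes are not specified in this README, so they are not checked.

```python
# Keys taken from the policy input list above.
EXPECTED_IMAGE_KEYS = (
    "observation.images.top_rgb",
    "observation.images.left_rgb",
    "observation.images.right_rgb",
)
STATE_KEY = "observation.state"
STATE_DIM = 12  # 12-dim bimanual robot state; the action has the same size


def check_observation(obs):
    """Hypothetical helper: return a list of problems found in `obs`."""
    problems = []
    for key in (STATE_KEY, *EXPECTED_IMAGE_KEYS):
        if key not in obs:
            problems.append(f"missing key: {key}")
    state = obs.get(STATE_KEY)
    if state is not None and len(state) != STATE_DIM:
        problems.append(
            f"{STATE_KEY} has {len(state)} values, expected {STATE_DIM}"
        )
    return problems
```

An empty list means the observation matches the documented input layout.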

Camera Key Remapping

The trained SmolVLA checkpoint expects the following image feature names:

observation.images.camera1
observation.images.camera2
observation.images.camera3

The LeHome environment provides:

observation.images.top_rgb
observation.images.left_rgb
observation.images.right_rgb

Therefore this submission includes custom_smolvla_policy.py, which remaps:

observation.images.top_rgb   -> observation.images.camera1
observation.images.left_rgb  -> observation.images.camera2
observation.images.right_rgb -> observation.images.camera3
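
The remapping above can be sketched as a plain dictionary rename. This is an illustration only, not the contents of custom_smolvla_policy.py; the names CAMERA_KEY_MAP and remap_camera_keys are assumptions.

```python
# Mapping copied from the list above: LeHome key -> checkpoint key.
CAMERA_KEY_MAP = {
    "observation.images.top_rgb": "observation.images.camera1",
    "observation.images.left_rgb": "observation.images.camera2",
    "observation.images.right_rgb": "observation.images.camera3",
}


def remap_camera_keys(observation):
    """Return a new dict with LeHome camera keys renamed.

    Keys that are not in the mapping (for example observation.state)
    pass through unchanged.
    """
    return {
        CAMERA_KEY_MAP.get(key, key): value
        for key, value in observation.items()
    }
```

Returning a new dictionary leaves the environment's original observation untouched, which keeps the rename free of side effects.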

Files

The Hugging Face repository should contain:

pretrained_model/
custom_smolvla_policy.py
README.md
rollout_results.txt
source_code_link.txt

Place custom_smolvla_policy.py under:

scripts/eval_policy/custom_smolvla_policy.py

and import it in:

scripts/eval_policy/__init__.py

so that the policy can be selected with:

--policy_type custom_smolvla

Environment

Python 3.11
Official LeHome Challenge repository
LeRobot / SmolVLA
Isaac Sim based LeHome evaluation pipeline
CPU simulation device was used for self-reported evaluation

Evaluation Command

Set the dataset root environment variable:

export LEHOME_DATASET_ROOT=Datasets/example/four_types_merged

Run evaluation with:

python -m scripts.eval \
  --headless \
  --policy_type custom_smolvla \
  --policy_path <PATH_TO_PRETRAINED_MODEL> \
  --garment_type top_long \
  --dataset_root Datasets/example/four_types_merged \
  --num_episodes 1 \
  --enable_cameras \
  --device cpu

To evaluate the other garment types, replace top_long with one of:

top_short
pant_long
pant_short
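
To run all four garment types in sequence, a small driver along the following lines may help. It is a sketch that reuses only the flags from the command above; the function names are hypothetical, sys.executable stands in for python, and LEHOME_DATASET_ROOT is assumed to be exported already in the calling shell.

```python
import subprocess
import sys

GARMENT_TYPES = ["top_long", "top_short", "pant_long", "pant_short"]
DATASET_ROOT = "Datasets/example/four_types_merged"


def build_eval_command(garment_type, policy_path):
    """Build the argv list for one evaluation run (flags as in the command above)."""
    return [
        sys.executable, "-m", "scripts.eval",
        "--headless",
        "--policy_type", "custom_smolvla",
        "--policy_path", policy_path,
        "--garment_type", garment_type,
        "--dataset_root", DATASET_ROOT,
        "--num_episodes", "1",
        "--enable_cameras",
        "--device", "cpu",
    ]


def run_all(policy_path):
    """Run the four evaluations one after another; stop on the first failure."""
    for garment_type in GARMENT_TYPES:
        subprocess.run(build_eval_command(garment_type, policy_path), check=True)
```

Call run_all with the path to pretrained_model/ from the root of the LeHome Challenge repository so that scripts.eval resolves as a module.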

Self-Reported Results

Self-reported evaluation used num_episodes=1, headless mode, CPU simulation, and all 12 garments for each category.

Checkpoint	top_long	top_short	pant_long	pant_short	Average
SmolVLA 30k	23.50%	16.50%	17.50%	70.00%	31.88%
SmolVLA 70k	58.33%	25.00%	41.67%	83.33%	52.08%
SmolVLA 90k	41.67%	25.00%	33.33%	83.33%	45.83%

The 70k-equivalent checkpoint is submitted.
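
The Average column and the checkpoint choice can be reproduced from the per-category numbers with a few lines of Python. This is a sketch for checking the arithmetic, not part of the evaluation pipeline.

```python
# Per-category scores (%) copied from the table above, in the order
# top_long, top_short, pant_long, pant_short.
RESULTS = {
    "SmolVLA 30k": [23.50, 16.50, 17.50, 70.00],
    "SmolVLA 70k": [58.33, 25.00, 41.67, 83.33],
    "SmolVLA 90k": [41.67, 25.00, 33.33, 83.33],
}

# Unweighted mean over the four garment categories.
averages = {name: sum(scores) / len(scores) for name, scores in RESULTS.items()}

# Checkpoint with the highest average score.
best = max(averages, key=averages.get)

for name, avg in averages.items():
    print(f"{name}: {avg:.2f}%")
print("best:", best)
```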

Notes

Inference does not require the training dataset.
If the datasets are removed from the evaluation environment, please keep the metadata folders that LeRobot metadata loading needs, in particular Datasets/example/four_types_merged/meta.
custom_smolvla_policy.py is required because the LeHome camera feature names differ from the feature names the SmolVLA checkpoint expects.
The wrapper uses its default task description, "Bimanual garment manipulation".
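
Before running evaluation in a trimmed environment, a quick check that the metadata folder is still present may save a failed run. The sketch below only tests that the meta directory exists; which files inside it LeRobot actually reads is not covered here, and the helper name metadata_dir_present is hypothetical.

```python
import os
from pathlib import Path


def metadata_dir_present(dataset_root=None):
    """Return True if <dataset_root>/meta exists as a directory.

    Falls back to LEHOME_DATASET_ROOT, then to the path used in this README.
    """
    root = dataset_root or os.environ.get(
        "LEHOME_DATASET_ROOT", "Datasets/example/four_types_merged"
    )
    return (Path(root) / "meta").is_dir()
```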
