LeHome Submission - SmolVLA 70k

Method

This submission fine-tunes lerobot/smolvla_base on the official LeHome merged four-garment dataset:

Datasets/example/four_types_merged

The final checkpoint is selected from a continued-finetuning run:

30k baseline -> +20k -> +20k

We submit the 70k-equivalent checkpoint (30k + 20k + 20k) because it achieved the best self-reported validation score among the tested 30k, 70k, and 90k checkpoints: a 52.08% average across the four garment categories.

The policy input is:

  • observation.state: 12-dim bimanual robot state
  • observation.images.top_rgb: top RGB image
  • observation.images.left_rgb: left wrist RGB image
  • observation.images.right_rgb: right wrist RGB image

The policy output is:

  • action: 12-dim bimanual joint action
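
As a minimal sketch of the interface listed above, the following checks that an observation dictionary carries the four expected keys and a 12-value state. The helper name check_observation is hypothetical and is not part of LeHome or LeRobot. Image shapes are not stated in this README, so they are deliberately not checked.

```python
# Hypothetical helper for illustration; not part of LeHome or LeRobot.
STATE_KEY = "observation.state"
IMAGE_KEYS = (
    "observation.images.top_rgb",
    "observation.images.left_rgb",
    "observation.images.right_rgb",
)
STATE_DIM = 12  # bimanual robot state, as listed above


def check_observation(obs):
    """Return a list of problems found in an observation dict (empty if none)."""
    problems = []
    # Every key the policy consumes must be present.
    for key in (STATE_KEY, *IMAGE_KEYS):
        if key not in obs:
            problems.append(f"missing key: {key}")
    # The state must have exactly STATE_DIM values when it is present.
    state = obs.get(STATE_KEY)
    if state is not None and len(state) != STATE_DIM:
        problems.append(
            f"{STATE_KEY} has {len(state)} values, expected {STATE_DIM}"
        )
    return problems
```

An empty return value means the observation matches the listed input layout; any returned strings name the mismatches.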

Camera Key Remapping

The trained SmolVLA checkpoint expects image feature names:

  • observation.images.camera1
  • observation.images.camera2
  • observation.images.camera3

The LeHome environment provides:

  • observation.images.top_rgb
  • observation.images.left_rgb
  • observation.images.right_rgb

Therefore this submission includes custom_smolvla_policy.py, which remaps:

observation.images.top_rgb   -> observation.images.camera1
observation.images.left_rgb  -> observation.images.camera2
observation.images.right_rgb -> observation.images.camera3
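
The remapping can be sketched as a plain dictionary rename. This is an illustration under assumptions, not the contents of custom_smolvla_policy.py: the names CAMERA_KEY_MAP and remap_camera_keys are hypothetical, and the real wrapper may apply the rename at a different point in the pipeline.

```python
# Illustrative sketch only; the actual custom_smolvla_policy.py may differ.
CAMERA_KEY_MAP = {
    "observation.images.top_rgb": "observation.images.camera1",
    "observation.images.left_rgb": "observation.images.camera2",
    "observation.images.right_rgb": "observation.images.camera3",
}


def remap_camera_keys(observation):
    """Return a copy of the observation with LeHome camera keys renamed.

    Keys that are not in CAMERA_KEY_MAP (for example observation.state)
    are passed through unchanged.
    """
    return {
        CAMERA_KEY_MAP.get(key, key): value
        for key, value in observation.items()
    }
```

Returning a new dictionary instead of renaming in place leaves the environment's original observation untouched, which avoids surprising other consumers of the same dictionary.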

Files

The Hugging Face repository should contain:

pretrained_model/
custom_smolvla_policy.py
README.md
rollout_results.txt
source_code_link.txt

Place custom_smolvla_policy.py under:

scripts/eval_policy/custom_smolvla_policy.py

and import it in:

scripts/eval_policy/__init__.py

so that the policy can be selected with:

--policy_type custom_smolvla

Environment

  • Python 3.11
  • Official LeHome Challenge repository
  • LeRobot / SmolVLA
  • Isaac Sim based LeHome evaluation pipeline
  • The self-reported evaluation was run with the CPU simulation device (--device cpu)

Evaluation Command

Set the dataset root environment variable:

export LEHOME_DATASET_ROOT=Datasets/example/four_types_merged

Run evaluation with:

python -m scripts.eval \
  --headless \
  --policy_type custom_smolvla \
  --policy_path <PATH_TO_PRETRAINED_MODEL> \
  --garment_type top_long \
  --dataset_root Datasets/example/four_types_merged \
  --num_episodes 1 \
  --enable_cameras \
  --device cpu

To evaluate the other garment categories, replace top_long with one of:

top_short
pant_long
pant_short
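
To run all four categories without editing the command by hand, the argument list can be generated per garment type. The sketch below only builds and prints the commands; the helper name build_eval_command and the placeholder path "pretrained_model" are assumptions for illustration, and the flags are copied from the command above.

```python
import shlex

GARMENT_TYPES = ["top_long", "top_short", "pant_long", "pant_short"]
DATASET_ROOT = "Datasets/example/four_types_merged"


def build_eval_command(policy_path, garment_type):
    """Build the evaluation command shown above for one garment type."""
    return [
        "python", "-m", "scripts.eval",
        "--headless",
        "--policy_type", "custom_smolvla",
        "--policy_path", policy_path,
        "--garment_type", garment_type,
        "--dataset_root", DATASET_ROOT,
        "--num_episodes", "1",
        "--enable_cameras",
        "--device", "cpu",
    ]


if __name__ == "__main__":
    # "pretrained_model" is a placeholder; substitute the real checkpoint path.
    for garment in GARMENT_TYPES:
        print(shlex.join(build_eval_command("pretrained_model", garment)))
```

Each printed line can be pasted into a shell, or the list can be passed to subprocess.run(cmd, check=True) from the repository root with LEHOME_DATASET_ROOT already exported.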

Self-Reported Results

Self-reported evaluation used num_episodes=1, headless mode, CPU simulation, and all 12 garments for each category.

Checkpoint    top_long  top_short  pant_long  pant_short  Average
SmolVLA 30k   23.50%    16.50%     17.50%     70.00%      31.88%
SmolVLA 70k   58.33%    25.00%     41.67%     83.33%      52.08%
SmolVLA 90k   41.67%    25.00%     33.33%     83.33%      45.83%

The 70k-equivalent checkpoint is submitted.
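
The Average column and the checkpoint choice can be reproduced from the per-category scores. The sketch below assumes the average is an unweighted mean over the four categories, which matches the reported numbers; the variable and function names are illustrative.

```python
# Per-category scores (percent) copied from the results table.
RESULTS = {
    "SmolVLA 30k": {"top_long": 23.50, "top_short": 16.50,
                    "pant_long": 17.50, "pant_short": 70.00},
    "SmolVLA 70k": {"top_long": 58.33, "top_short": 25.00,
                    "pant_long": 41.67, "pant_short": 83.33},
    "SmolVLA 90k": {"top_long": 41.67, "top_short": 25.00,
                    "pant_long": 33.33, "pant_short": 83.33},
}


def average_score(scores):
    """Unweighted mean over garment categories (assumed averaging rule)."""
    return sum(scores.values()) / len(scores)


# Round to two decimals to match the precision used in the table.
averages = {name: round(average_score(s), 2) for name, s in RESULTS.items()}
best = max(averages, key=averages.get)

if __name__ == "__main__":
    for name, avg in averages.items():
        print(f"{name}: {avg:.2f}%")
    print("selected:", best)
```

Under this rule the 70k checkpoint has the highest average, consistent with the submission choice.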

Notes

  • The submission does not require the training dataset for inference.
  • If datasets are removed from the evaluation environment, please keep the dataset metadata folders needed by LeRobot metadata loading, especially Datasets/example/four_types_merged/meta.
  • The included custom_smolvla_policy.py is required because the LeHome camera feature names differ from the SmolVLA checkpoint feature names.
  • The wrapper uses its default task description: "Bimanual garment manipulation".
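
Before running evaluation in a trimmed environment, it may help to confirm that the metadata folder mentioned above is still present. This is a minimal sketch: the helper name has_dataset_metadata is hypothetical, and treating "present" as "a non-empty meta directory" is an assumption, since the exact files LeRobot needs are not listed here.

```python
from pathlib import Path


def has_dataset_metadata(dataset_root):
    """Return True if <dataset_root>/meta is an existing, non-empty directory.

    Which files inside meta/ are actually required is not specified in this
    README, so only presence and non-emptiness are checked (an assumption).
    """
    meta_dir = Path(dataset_root) / "meta"
    return meta_dir.is_dir() and any(meta_dir.iterdir())


if __name__ == "__main__":
    root = "Datasets/example/four_types_merged"
    status = "found" if has_dataset_metadata(root) else "missing or empty"
    print(f"{root}/meta: {status}")
```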