LeHome Submission - SmolVLA 70k
Method
This submission fine-tunes lerobot/smolvla_base on the official LeHome merged four-garment dataset:
Datasets/example/four_types_merged
The final checkpoint is selected from a continued-finetuning run:
30k baseline -> +20k -> +20k
We use the 70k-equivalent checkpoint because it achieved the best self-reported validation score among the tested 30k, 70k, and 90k checkpoints.
The policy input is:
observation.state: 12-dim bimanual robot state
observation.images.top_rgb: top RGB image
observation.images.left_rgb: left wrist RGB image
observation.images.right_rgb: right wrist RGB image
The policy output is:
action: 12-dim bimanual joint action
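As a minimal sketch of the interface listed above, the following helper checks that an observation dictionary carries the four listed features and a 12-element state. The function name `check_observation` is hypothetical and is not part of LeHome or LeRobot; image shapes are not specified in this card, so they are deliberately left unchecked.

```python
from typing import Any, Mapping

# Feature names and dimensions as listed in this card.
STATE_KEY = "observation.state"
IMAGE_KEYS = (
    "observation.images.top_rgb",
    "observation.images.left_rgb",
    "observation.images.right_rgb",
)
STATE_DIM = 12
ACTION_DIM = 12  # the policy output has the same dimensionality as the state


def check_observation(obs: Mapping[str, Any]) -> None:
    """Hypothetical helper: raise ValueError if a listed feature is missing
    or the state vector does not have 12 elements."""
    missing = [key for key in (STATE_KEY, *IMAGE_KEYS) if key not in obs]
    if missing:
        raise ValueError(f"missing observation keys: {missing}")
    state = obs[STATE_KEY]
    if len(state) != STATE_DIM:
        raise ValueError(f"expected {STATE_DIM}-dim state, got {len(state)}")
```

A check like this can be run once on the first environment observation to catch key or dimension mismatches before loading the policy.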
Camera Key Remapping
The trained SmolVLA checkpoint expects image feature names:
observation.images.camera1
observation.images.camera2
observation.images.camera3
The LeHome environment provides:
observation.images.top_rgb
observation.images.left_rgb
observation.images.right_rgb
Therefore this submission includes custom_smolvla_policy.py, which remaps:
observation.images.top_rgb -> observation.images.camera1
observation.images.left_rgb -> observation.images.camera2
observation.images.right_rgb -> observation.images.camera3
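The remapping above amounts to renaming three dictionary keys before the observation reaches the policy. The snippet below is only an illustrative sketch of that idea, not the actual contents of custom_smolvla_policy.py; the names `CAMERA_KEY_MAP` and `remap_camera_keys` are assumptions introduced here.

```python
from typing import Any, Dict, Mapping

# LeHome camera key -> key expected by the trained SmolVLA checkpoint.
CAMERA_KEY_MAP = {
    "observation.images.top_rgb": "observation.images.camera1",
    "observation.images.left_rgb": "observation.images.camera2",
    "observation.images.right_rgb": "observation.images.camera3",
}


def remap_camera_keys(observation: Mapping[str, Any]) -> Dict[str, Any]:
    """Return a new dict with the LeHome camera keys renamed.

    Keys that are not in CAMERA_KEY_MAP (for example observation.state)
    are passed through unchanged.
    """
    return {CAMERA_KEY_MAP.get(key, key): value for key, value in observation.items()}
```

Returning a new dictionary rather than mutating the input keeps the environment's own observation untouched, which is convenient if the same observation is also logged or recorded.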
Files
The Hugging Face repository should contain:
pretrained_model/
custom_smolvla_policy.py
README.md
rollout_results.txt
source_code_link.txt
Place custom_smolvla_policy.py under:
scripts/eval_policy/custom_smolvla_policy.py
and import it in:
scripts/eval_policy/__init__.py
so that the policy can be selected with:
--policy_type custom_smolvla
Environment
- Python 3.11
- Official LeHome Challenge repository
- LeRobot / SmolVLA
- Isaac Sim based LeHome evaluation pipeline
- CPU simulation device was used for self-reported evaluation
Evaluation Command
Set the dataset root environment variable:
export LEHOME_DATASET_ROOT=Datasets/example/four_types_merged
Run evaluation with:
python -m scripts.eval \
--headless \
--policy_type custom_smolvla \
--policy_path <PATH_TO_PRETRAINED_MODEL> \
--garment_type top_long \
--dataset_root Datasets/example/four_types_merged \
--num_episodes 1 \
--enable_cameras \
--device cpu
To evaluate the other garment types, replace top_long with one of:
top_short
pant_long
pant_short
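To run all four garment types in sequence, the command above can be generated per garment type. The sketch below only builds and prints the commands, using the same flags as shown above; the helper name `build_eval_command` is hypothetical, and `<PATH_TO_PRETRAINED_MODEL>` remains a placeholder to fill in.

```python
import shlex
from typing import List

GARMENT_TYPES = ["top_long", "top_short", "pant_long", "pant_short"]
DATASET_ROOT = "Datasets/example/four_types_merged"


def build_eval_command(policy_path: str, garment_type: str) -> List[str]:
    """Hypothetical helper: build the evaluation command for one garment type."""
    return [
        "python", "-m", "scripts.eval",
        "--headless",
        "--policy_type", "custom_smolvla",
        "--policy_path", policy_path,
        "--garment_type", garment_type,
        "--dataset_root", DATASET_ROOT,
        "--num_episodes", "1",
        "--enable_cameras",
        "--device", "cpu",
    ]


if __name__ == "__main__":
    # Print one shell-quoted command per garment type.
    for garment in GARMENT_TYPES:
        print(shlex.join(build_eval_command("<PATH_TO_PRETRAINED_MODEL>", garment)))
```

Each printed command can be pasted into a shell, or the argument list can be passed to `subprocess.run(..., check=True)` from the LeHome repository root once a real checkpoint path is supplied.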
Self-Reported Results
Self-reported evaluation used num_episodes=1, headless mode, CPU simulation, and all 12 garments for each category.
| Checkpoint | top_long | top_short | pant_long | pant_short | Average |
|---|---|---|---|---|---|
| SmolVLA 30k | 23.50% | 16.50% | 17.50% | 70.00% | 31.88% |
| SmolVLA 70k | 58.33% | 25.00% | 41.67% | 83.33% | 52.08% |
| SmolVLA 90k | 41.67% | 25.00% | 33.33% | 83.33% | 45.83% |
The 70k-equivalent checkpoint is submitted.
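The Average column is the unweighted mean of the four per-category success rates. As a small worked check, assuming that is the only aggregation applied, the snippet below recomputes the averages from the table and picks the checkpoint with the highest one.

```python
# Per-category success rates (%) copied from the table above, in the order
# top_long, top_short, pant_long, pant_short.
RESULTS = {
    "SmolVLA 30k": [23.50, 16.50, 17.50, 70.00],
    "SmolVLA 70k": [58.33, 25.00, 41.67, 83.33],
    "SmolVLA 90k": [41.67, 25.00, 33.33, 83.33],
}


def average(scores):
    """Unweighted mean over the four garment categories."""
    return sum(scores) / len(scores)


averages = {name: average(scores) for name, scores in RESULTS.items()}
best = max(averages, key=averages.get)

if __name__ == "__main__":
    for name, value in averages.items():
        print(f"{name}: {value:.2f}%")
    print(f"best checkpoint by average: {best}")
```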
Notes
- The submission does not require the training dataset for inference.
- If datasets are removed from the evaluation environment, please keep the dataset metadata folders needed by LeRobot metadata loading, especially Datasets/example/four_types_merged/meta.
- The included custom_smolvla_policy.py is required because the LeHome camera feature names differ from the SmolVLA checkpoint feature names.
- The wrapper uses its default task description, Bimanual garment manipulation.