SmolVLA: A vision-language-action model for affordable and efficient robotics

Resources and technical documentation:

This model has 450M parameters in total. You can use it with the LeRobot library.

Before proceeding, set up the environment by following the Installation Guide in the docs.

Install the SmolVLA extra dependencies:

pip install -e ".[smolvla]"

Example of fine-tuning the pretrained SmolVLA model (smolvla_base):

python lerobot/scripts/train.py \
  --policy.path=lerobot/smolvla_base \
  --dataset.repo_id=lerobot/svla_so101_pickplace \
  --batch_size=64 \
  --steps=20000 \
  --output_dir=outputs/train/my_smolvla \
  --job_name=my_smolvla_training \
  --policy.device=cuda \
  --wandb.enable=true
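
When the same command is launched repeatedly with a different dataset or step count, it can help to wrap it in a small shell function. The following is a minimal sketch, not part of LeRobot: the function name `smolvla_finetune` and the variables `DATASET`, `STEPS`, `RUN_NAME`, and `DRY_RUN` are hypothetical, and it uses only the flags shown in the command above. By default it prints the command instead of running it.

```shell
# Hypothetical wrapper around the fine-tuning command above.
# Variable and function names are illustrative, not part of LeRobot.
DATASET="${DATASET:-lerobot/svla_so101_pickplace}"
STEPS="${STEPS:-20000}"
RUN_NAME="${RUN_NAME:-my_smolvla}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the command, 0 = run it

smolvla_finetune() {
  # Build the argument list; any extra flags passed by the caller are appended.
  set -- python lerobot/scripts/train.py \
    --policy.path=lerobot/smolvla_base \
    --dataset.repo_id="$DATASET" \
    --batch_size=64 \
    --steps="$STEPS" \
    --output_dir="outputs/train/$RUN_NAME" \
    --job_name="${RUN_NAME}_training" \
    --policy.device=cuda \
    --wandb.enable=true \
    "$@"
  if [ "$DRY_RUN" = "1" ]; then
    echo "$@"   # show what would be executed
  else
    "$@"        # launch training
  fi
}

smolvla_finetune
```

Setting `DRY_RUN=0` launches the training run; overriding `RUN_NAME` gives each run its own output directory and job name.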

Example of training the SmolVLA network with a pretrained VLM and an action expert initialized from scratch:

python lerobot/scripts/train.py \
  --policy.type=smolvla \
  --dataset.repo_id=lerobot/svla_so101_pickplace \
  --batch_size=64 \
  --steps=200000 \
  --output_dir=outputs/train/my_smolvla \
  --job_name=my_smolvla_training \
  --policy.device=cuda \
  --wandb.enable=true



Paper for y1y2y3/smolvla_base1

SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics (arXiv:2506.01844, published Jun 2, 2025)