SmolVLA is a compact, efficient vision-language-action model that achieves competitive performance at reduced computational costs and can be deployed on consumer-grade hardware.
This policy has been trained and pushed to the Hub using LeRobot.
See the full documentation at LeRobot Docs.
How to Get Started with the Model
For a complete walkthrough, see the training guide.
Below is a short version of how to train the policy and run inference and evaluation: