---
license: apache-2.0
pipeline_tag: robotics
---

# A Pragmatic VLA Foundation Model
LingBot-VLA is a Vision-Language-Action (VLA) foundation model designed for robotic manipulation, emphasizing pragmatic deployment, efficiency, and strong generalization across tasks and platforms.
- Paper: A Pragmatic VLA Foundation Model
- Repository: https://github.com/robbyant/lingbot-vla
- Project Page: https://technology.robbyant.com/lingbot-vla
## Highlights
- Large-scale Pre-training Data: Trained on 20,000 hours of real-world data from 9 popular dual-arm robot configurations.
- Strong Performance: Outperforms competing VLA models on both simulation and real-world benchmarks (GM-100 and RoboTwin 2.0).
- Training Efficiency: Delivers a 1.5–2.8× training speedup over existing VLA-oriented codebases, making it well suited for real-world deployment.
## Related Models
| Model Name | Hugging Face | ModelScope | Description |
|---|---|---|---|
| LingBot-VLA-4B | 🤗 lingbot-vla-4b | 🤖 lingbot-vla-4b | LingBot-VLA w/o Depth |
| LingBot-VLA-4B-Depth | 🤗 lingbot-vla-4b-depth | 🤖 lingbot-vla-4b-depth | LingBot-VLA w/ Depth |
## Citation

```bibtex
@article{wu2026pragmatic,
  title={A Pragmatic VLA Foundation Model},
  author={Wei Wu and Fan Lu and Yunnan Wang and Shuai Yang and Shi Liu and Fangjing Wang and Shuailei Ma and He Sun and Yong Wang and Zhenqi Qiu and Houlong Xiong and Ziyu Wang and Shuai Zhou and Yiyu Ren and Kejia Zhang and Hui Yu and Jingmei Zhao and Qian Zhu and Ran Cheng and Yong-Lu Li and Yongtao Huang and Xing Zhu and Yujun Shen and Kecheng Zheng},
  journal={arXiv preprint arXiv:2601.18692},
  year={2026}
}
```
## License Agreement
This project is licensed under the Apache-2.0 License.
## Acknowledgement
This codebase is built on the VeOmni and LeRobot projects. We thank their authors for the excellent work!