---
license: apache-2.0
pipeline_tag: robotics
---

A Pragmatic VLA Foundation Model

LingBot-VLA is a Vision-Language-Action (VLA) foundation model designed for robotic manipulation, emphasizing pragmatic deployment, efficiency, and strong generalization across tasks and platforms.

Highlights

  • Large-scale Pre-training Data: Trained on 20,000 hours of real-world data from 9 popular dual-arm robot configurations.
  • Strong Performance: Outperforms competing models on both simulation and real-world benchmarks (GM-100 and RoboTwin 2.0).
  • Training Efficiency: Delivers a 1.5x to 2.8x training speedup over existing VLA-oriented codebases, making it well-suited for real-world deployment.

Related Models

| Model Name | Hugging Face | ModelScope | Description |
|---|---|---|---|
| LingBot-VLA-4B | 🤗 lingbot-vla-4b | 🤖 lingbot-vla-4b | LingBot-VLA w/o Depth |
| LingBot-VLA-4B-Depth | 🤗 lingbot-vla-4b-depth | 🤖 lingbot-vla-4b-depth | LingBot-VLA w/ Depth |
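The checkpoints above can be fetched with the standard Hugging Face CLI. Note that the card does not state the full repository path, so the `lingbot/` namespace below is a placeholder; substitute the actual organization name from the model page:

```shell
# Download the LingBot-VLA-4B checkpoint to a local directory.
# "lingbot/lingbot-vla-4b" is a hypothetical repo id -- replace the
# namespace with the model's actual Hugging Face organization.
huggingface-cli download lingbot/lingbot-vla-4b --local-dir ./lingbot-vla-4b
```

The same command with `lingbot-vla-4b-depth` retrieves the depth-enabled variant.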

Citation

@article{wu2026pragmatic,
  title={A Pragmatic VLA Foundation Model},
  author={Wei Wu and Fan Lu and Yunnan Wang and Shuai Yang and Shi Liu and Fangjing Wang and Shuailei Ma and He Sun and Yong Wang and Zhenqi Qiu and Houlong Xiong and Ziyu Wang and Shuai Zhou and Yiyu Ren and Kejia Zhang and Hui Yu and Jingmei Zhao and Qian Zhu and Ran Cheng and Yong-Lu Li and Yongtao Huang and Xing Zhu and Yujun Shen and Kecheng Zheng},
  journal={arXiv preprint arXiv:2601.18692},
  year={2026}
}

License Agreement

This project is licensed under the Apache-2.0 License.

Acknowledgement

This codebase is built on the VeOmni and LeRobot projects. We thank the authors for their excellent work!