---
license: other
pipeline_tag: robotics
tags:
- robotics
- vision-language-action models
---
# VLANeXt: Recipes for Building Strong VLA Models

[Paper](https://huggingface.co/papers/2602.18532) · [Project Page](https://dravenalg.github.io/VLANeXt) · [Code](https://github.com/DravenALG/VLANeXt) · [Awesome VLA](https://github.com/DravenALG/awesome-vla)
VLANeXt is a Vision-Language-Action (VLA) model for general-purpose robotic policy learning. By systematically reexamining the VLA design space, the authors distill 12 practical findings that improve model performance and generalization on benchmarks such as LIBERO and LIBERO-plus.
## 📖 Abstract

Following the rise of large foundation models, Vision–Language–Action models (VLAs) emerged, leveraging strong visual and language understanding for general-purpose policy learning. Yet, the current VLA landscape remains fragmented and exploratory. VLANeXt reexamines the VLA design space under a unified framework and evaluation setup, dissecting design choices along three dimensions: foundational components, perception essentials, and action modelling perspectives. The resulting model outperforms prior state-of-the-art methods and demonstrates strong generalization in real-world experiments.
## 🛠️ Usage

This repository hosts the checkpoints for evaluation on the LIBERO and LIBERO-plus benchmark suites. For environment setup, training, and evaluation instructions, please refer to the official [VLANeXt GitHub repository](https://github.com/DravenALG/VLANeXt).
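
As a rough sketch of fetching the checkpoints locally before following the GitHub instructions, the snippet below uses `snapshot_download` from the `huggingface_hub` library. The repository id is a placeholder that must be replaced with this model card's actual id, and the `download_checkpoints` helper and the `checkpoints/vlanext` target directory are hypothetical names chosen for illustration. The official repository remains the reference for the checkpoint layout the evaluation scripts expect.

```python
from pathlib import Path

# Placeholder (assumption): replace with the actual repo id of this model card.
REPO_ID = "<org>/<vlanext-checkpoint-repo>"


def download_checkpoints(repo_id: str, local_dir: str = "checkpoints/vlanext") -> Path:
    """Download every file of a Hub model repo into local_dir and return its path.

    `local_dir` is an illustrative default, not a path required by VLANeXt.
    """
    # Refuse to run with the unfilled placeholder instead of hitting the network.
    if "<" in repo_id or ">" in repo_id:
        raise ValueError("Set repo_id to the real Hugging Face repository id.")

    # Imported lazily so the guard above works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return Path(snapshot_download(repo_id=repo_id, local_dir=local_dir))


# Example call once REPO_ID has been filled in:
# ckpt_dir = download_checkpoints(REPO_ID)
```

The returned directory can then be passed to whatever checkpoint argument the evaluation scripts in the GitHub repository expect.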
## 📚 Citation

If you find VLANeXt useful for your research or applications, please cite the paper:

```bibtex
@article{wu2026vlanext,
  title={VLANeXt: Recipes for Building Strong VLA Models},
  author={Xiao-Ming Wu and Bin Fan and Kang Liao and Jian-Jian Jiang and Runze Yang and Yihang Luo and Zhonghua Wu and Wei-Shi Zheng and Chen Change Loy},
  journal={arXiv preprint arXiv:2602.18532},
  year={2026}
}
```
## 🗞️ License

This project is licensed under the [NTU S-Lab License 1.0](https://github.com/DravenALG/VLANeXt/blob/main/LICENSE).