---
license: apache-2.0
language:
- en
- zh
library_name: transformers
tags:
- robotics
- vision-language-action
- reinforcement-learning
- embodied-ai
- openpi
- rlinf
pipeline_tag: reinforcement-learning
---

# SA-VLA: Spatially-Aware Reinforcement Learning for Flow-Matching VLA Models

SA-VLA is a spatially-aware reinforcement learning approach for flow-matching Vision-Language-Action (VLA) models. It is developed on top of the RLinf framework and targets robust embodied manipulation with stronger spatial generalization.

- 📄 Paper: https://arxiv.org/abs/2602.00743
- 🌐 Project Page: https://xupan.top/Projects/savla
- 🧩 Codebase: https://github.com/TwSphinx54/SA-VLA
- 🏗️ RL Framework: https://github.com/RLinf/RLinf

---

## Model Summary

SA-VLA fuses visual tokens and spatial tokens into geometry-aware embeddings, then optimizes the policy via:

1. **Step-level dense rewards**
2. **Spatially-conditioned exploration (SCAN)**
3. **RL fine-tuning on embodied benchmarks**

This repository provides the model weights used in the SA-VLA experiments.
---

## Intended Use

- RL fine-tuning and evaluation for embodied manipulation tasks
- Experiments on LIBERO / LIBERO-PLUS style benchmarks
- Research on spatial reasoning in VLA post-training

> For complete environment setup, training scripts, and benchmark integration, use the full code repository:
> https://github.com/TwSphinx54/SA-VLA

---

## Quick Start (with the SA-VLA codebase)

### 1) Clone the project

```bash
git clone https://github.com/TwSphinx54/SA-VLA.git
cd SA-VLA
```

### 2) Set up the environment

Follow the RLinf setup in:

- `README.RLinf.md` (framework/environment)
- `scripts/setup_container.sh` (extra container setup)

### 3) Place the weights

Put the downloaded checkpoints under:

```text
weights/
```

### 4) Run training / evaluation

```bash
# RL training
bash examples/embodiment/run_embodiment.sh libero_spatial_ppo_openpi_pi05

# Evaluation
bash examples/embodiment/eval_embodiment.sh libero_spatial_ppo_openpi_pi05_eval
```

---

## Recommended Weight Layout

```text
weights
|-- Pi05-LIBERO
|-- Pi05-VGGT-LIBERO-FUSER-SFT_BF16
`-- RLinf-Pi05-SFT
```

---

## Dataset Notes

The SA-VLA experiments rely on LIBERO-family data and benchmark configs. To switch between subset and full-set evaluation, modify the benchmark mapping in your OpenPi LIBERO installation as documented in the main repository.
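To make the layout above concrete, here is a minimal shell sketch that prepares the expected `weights/` tree. It assumes the checkpoints are fetched from the Hugging Face Hub; the `<org>/<checkpoint-repo>` repo ID in the commented download line is a placeholder, not a confirmed name:

```shell
#!/usr/bin/env bash
# Sketch: prepare the weight layout expected by the SA-VLA scripts.
# Directory names match the "Recommended Weight Layout" section above.
set -euo pipefail

mkdir -p weights/Pi05-LIBERO \
         weights/Pi05-VGGT-LIBERO-FUSER-SFT_BF16 \
         weights/RLinf-Pi05-SFT

# Hypothetical download step -- replace the repo ID with the real one
# for each checkpoint before running:
# huggingface-cli download <org>/<checkpoint-repo> --local-dir weights/Pi05-LIBERO

# Show the resulting layout
ls weights
```

Run this from the repository root so the training/evaluation scripts can resolve the checkpoints relative to `weights/`.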
---

## Limitations

- Requires a non-trivial robotics simulation setup
- Performance depends on environment/version consistency
- Not intended for safety-critical real-world deployment without additional validation

---

## Citation

```bibtex
@misc{pan2026savlaspatiallyawareflowmatchingvisionlanguageaction,
  title={SA-VLA: Spatially-Aware Flow-Matching for Vision-Language-Action Reinforcement Learning},
  author={Xu Pan and Zhenglin Wan and Xingrui Yu and Xianwei Zheng and Youkai Ke and Ming Sun and Rui Wang and Ziwei Wang and Ivor Tsang},
  year={2026},
  eprint={2602.00743},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2602.00743}
}
```

---

## License

Apache-2.0

---

## Acknowledgments

Built upon:

- RLinf: https://github.com/RLinf/RLinf
- OpenPi: https://github.com/Physical-Intelligence/openpi
- LIBERO: https://github.com/Lifelong-Robot-Learning/LIBERO
- LIBERO-PLUS: https://github.com/sylvestf/LIBERO-plus
- VGGT: https://github.com/facebookresearch/vggt