---
license: apache-2.0
language:
- en
- zh
library_name: transformers
tags:
- robotics
- vision-language-action
- reinforcement-learning
- embodied-ai
- openpi
- rlinf
pipeline_tag: reinforcement-learning
---

# SA-VLA: Spatially-Aware Reinforcement Learning for Flow-Matching VLA Models

SA-VLA is a spatially-aware reinforcement learning approach for flow-matching Vision-Language-Action (VLA) models. It is developed on top of the RLinf framework and targets robust embodied manipulation with stronger spatial generalization.

- 📄 Paper: https://arxiv.org/abs/2602.00743
- 🌐 Project Page: https://xupan.top/Projects/savla
- 🧩 Codebase: https://github.com/TwSphinx54/SA-VLA
- 🏗️ RL Framework: https://github.com/RLinf/RLinf


---

## Model Summary

SA-VLA fuses visual tokens and spatial tokens into geometry-aware embeddings, then optimizes the policy via:
1. **Step-level dense rewards**
2. **Spatially-conditioned exploration (SCAN)**
3. **RL fine-tuning on embodied benchmarks**
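
As an illustration of item 1, a step-level dense reward assigns a learning signal to every transition rather than only a terminal one. The sketch below is illustrative only: `distance_to_goal` inputs and the shaping coefficients are hypothetical, not SA-VLA's actual reward definition.

```python
import numpy as np

def step_dense_rewards(distances_to_goal, success,
                       progress_coef=1.0, success_bonus=10.0):
    """Illustrative step-level dense reward (hypothetical, not the SA-VLA
    implementation): reward per-step progress toward the goal, plus a
    terminal bonus on task success.

    distances_to_goal: length T+1 sequence of goal distances, one per state.
    success: whether the final state solves the task.
    """
    d = np.asarray(distances_to_goal, dtype=np.float64)
    # Positive reward whenever the policy moves closer to the goal.
    rewards = progress_coef * (d[:-1] - d[1:])
    if success:
        rewards[-1] += success_bonus
    return rewards
```

Compared with a single sparse terminal reward, each of the T transitions now carries gradient signal, which is what makes step-level credit assignment tractable in long-horizon manipulation.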

This repository provides model weights used in SA-VLA experiments.

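
The visual-spatial token fusion described above can be sketched as a simple concatenate-project-normalize step. This is a minimal sketch under assumptions: the function name, projection scheme, and dimensions are illustrative, not the actual SA-VLA fusion module.

```python
import numpy as np

rng = np.random.default_rng(0)

def fuse_tokens(visual_tokens, spatial_tokens, proj_w, proj_b):
    """Illustrative geometry-aware fusion (hypothetical): concatenate
    per-token visual and spatial features, project to a shared width,
    then layer-normalize each token embedding."""
    fused = np.concatenate([visual_tokens, spatial_tokens], axis=-1)  # (B, N, Dv+Ds)
    x = fused @ proj_w + proj_b                                       # (B, N, D)
    mu = x.mean(axis=-1, keepdims=True)
    sigma = x.std(axis=-1, keepdims=True)
    return (x - mu) / (sigma + 1e-6)

# Toy shapes: 2 samples, 16 tokens, 768-d visual + 128-d spatial -> 768-d fused.
visual = rng.standard_normal((2, 16, 768))
spatial = rng.standard_normal((2, 16, 128))
w = rng.standard_normal((768 + 128, 768)) * 0.02
b = np.zeros(768)
fused = fuse_tokens(visual, spatial, w, b)
print(fused.shape)  # (2, 16, 768)
```

The design point is only that spatial features enter the token stream before the policy head, so downstream attention operates on geometry-aware embeddings.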

---

## Intended Use

- RL fine-tuning and evaluation for embodied manipulation tasks
- Experiments on LIBERO / LIBERO-PLUS style benchmarks
- Research on spatial reasoning in VLA post-training

> For complete environment setup, training scripts, and benchmark integration, use the full code repository:
> https://github.com/TwSphinx54/SA-VLA


---

## Quick Start (with SA-VLA codebase)

### 1) Clone the project
```bash
git clone https://github.com/TwSphinx54/SA-VLA.git
cd SA-VLA
```

### 2) Set up the environment
Follow the RLinf setup in:
- `README.RLinf.md` (framework/environment)
- `scripts/setup_container.sh` (extra container setup)
|
| | ### 3) Place weights |
| | Put downloaded checkpoints under: |
| | ```text |
| | weights/ |
| | ``` |
| |
|
### 4) Run training / evaluation
```bash
# RL training
bash examples/embodiment/run_embodiment.sh libero_spatial_ppo_openpi_pi05

# Evaluation
bash examples/embodiment/eval_embodiment.sh libero_spatial_ppo_openpi_pi05_eval
```


---

## Recommended Weight Layout

```text
weights
|-- Pi05-LIBERO
|-- Pi05-VGGT-LIBERO-FUSER-SFT_BF16
`-- RLinf-Pi05-SFT
```
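
Before launching training, it can be worth sanity-checking that the expected checkpoint directories are in place. A minimal sketch (the directory names follow the layout above; the script itself is not part of the SA-VLA repo):

```python
from pathlib import Path

# Expected checkpoint directories, following the recommended layout above.
EXPECTED = [
    "Pi05-LIBERO",
    "Pi05-VGGT-LIBERO-FUSER-SFT_BF16",
    "RLinf-Pi05-SFT",
]

def check_weights(root="weights"):
    """Return the expected checkpoint directories missing under root."""
    root = Path(root)
    return [name for name in EXPECTED if not (root / name).is_dir()]

missing = check_weights()
if missing:
    print("Missing checkpoints:", ", ".join(missing))
else:
    print("All expected checkpoints found.")
```
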

---

## Dataset Notes

The SA-VLA experiments rely on LIBERO-family data and benchmark configs. To switch between subset and full benchmark sets, modify the benchmark mapping in your OpenPi LIBERO installation as documented in the main repo.

---

## Limitations

- Requires non-trivial robotics simulation setup
- Performance depends on environment/version consistency
- Not intended for safety-critical real-world deployment without additional validation

---

## Citation

```bibtex
@misc{pan2026savlaspatiallyawareflowmatchingvisionlanguageaction,
  title={SA-VLA: Spatially-Aware Flow-Matching for Vision-Language-Action Reinforcement Learning},
  author={Xu Pan and Zhenglin Wan and Xingrui Yu and Xianwei Zheng and Youkai Ke and Ming Sun and Rui Wang and Ziwei Wang and Ivor Tsang},
  year={2026},
  eprint={2602.00743},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2602.00743}
}
```

---

## License

Apache-2.0

---

## Acknowledgments

Built upon:
- RLinf: https://github.com/RLinf/RLinf
- OpenPi: https://github.com/Physical-Intelligence/openpi
- LIBERO: https://github.com/Lifelong-Robot-Learning/LIBERO
- LIBERO-PLUS: https://github.com/sylvestf/LIBERO-plus
- VGGT: https://github.com/facebookresearch/vggt