+---
+license: apache-2.0
+language:
+- en
+- zh
+library_name: transformers
+tags:
+- robotics
+- vision-language-action
+- reinforcement-learning
+- embodied-ai
+- openpi
+- rlinf
+pipeline_tag: reinforcement-learning
+---
+# SA-VLA: Spatially-Aware Reinforcement Learning for Flow-Matching VLA Models
+SA-VLA is a spatially-aware reinforcement learning approach for flow-matching Vision-Language-Action (VLA) models.
+It is developed on top of the RLinf framework and targets robust embodied manipulation with stronger spatial generalization.
+- 📄 Paper: https://arxiv.org/abs/2602.00743
+- 🌐 Project Page: https://xupan.top/Projects/savla
+- 🧩 Codebase: https://github.com/TwSphinx54/SA-VLA
+- 🏗️ RL Framework: https://github.com/RLinf/RLinf
+---
+## Model Summary
+SA-VLA fuses visual tokens and spatial tokens into geometry-aware embeddings, then optimizes the policy via:
+1. **Step-level dense rewards**
+2. **Spatially-conditioned exploration (SCAN)**
+3. **RL fine-tuning on embodied benchmarks**
+
+This repository provides the model weights used in the SA-VLA experiments.
+---
+## Intended Use
+- RL fine-tuning and evaluation for embodied manipulation tasks
+- Experiments on LIBERO / LIBERO-PLUS style benchmarks
+- Research on spatial reasoning in VLA post-training
+> For complete environment setup, training scripts, and benchmark integration, use the full code repository:
+> https://github.com/TwSphinx54/SA-VLA
+---
+## Quick Start (with SA-VLA codebase)
+### 1) Clone project
+```bash
+git clone https://github.com/TwSphinx54/SA-VLA.git
+cd SA-VLA
+```
+### 2) Set up environment
+Follow the RLinf setup in:
+- `README.RLinf.md` (framework/environment)
+- `scripts/setup_container.sh` (extra container setup)
+### 3) Place weights
+Put downloaded checkpoints under:
+```text
+weights/
+```
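One possible way to fetch checkpoints into `weights/` is the Hugging Face Hub CLI. The sketch below is an assumption, not part of the SA-VLA codebase: the function name `download_weights` and the `DRY_RUN` switch are hypothetical, and the repository id must be supplied by you. Whether the checkpoints live in one Hub repository or several is not stated here.

```shell
# Sketch: download a Hub repository into a local weights directory.
# download_weights and DRY_RUN are hypothetical names used for illustration.
download_weights() {
  local repo_id="${1:-}"
  local target="${2:-weights}"
  if [ -z "$repo_id" ]; then
    echo "usage: download_weights <repo_id> [target_dir]" >&2
    return 2
  fi
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the command instead of running it.
    echo "huggingface-cli download $repo_id --local-dir $target"
    return 0
  fi
  mkdir -p "$target"
  # Requires the huggingface_hub CLI to be installed and on PATH.
  huggingface-cli download "$repo_id" --local-dir "$target"
}
```

Running `DRY_RUN=1 download_weights <repo_id> weights` first shows the exact command before any network access happens.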
+### 4) Run training / evaluation
+```bash
+# RL training
+bash examples/embodiment/run_embodiment.sh libero_spatial_ppo_openpi_pi05
+# Evaluation
+bash examples/embodiment/eval_embodiment.sh libero_spatial_ppo_openpi_pi05_eval
+```
+---
+## Recommended Weight Layout
+```text
+weights
+|-- Pi05-LIBERO
+|-- Pi05-VGGT-LIBERO-FUSER-SFT_BF16
+`-- RLinf-Pi05-SFT
+```
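Before launching training, it may help to confirm that the three directories in the layout above are present. This is a minimal sketch: the directory names come from the layout, while the function name `check_weights_layout` is hypothetical and not part of the SA-VLA or RLinf scripts. It only checks that the directories exist, not that their contents are valid checkpoints.

```shell
# Sketch: verify the recommended checkpoint directories exist under a root.
# check_weights_layout is a hypothetical helper, not a script from the repo.
check_weights_layout() {
  local root="${1:-weights}"
  local missing=0
  local name
  for name in Pi05-LIBERO Pi05-VGGT-LIBERO-FUSER-SFT_BF16 RLinf-Pi05-SFT; do
    if [ -d "$root/$name" ]; then
      echo "ok       $root/$name"
    else
      echo "missing  $root/$name"
      missing=1
    fi
  done
  # Exit status is 0 only when every directory was found.
  return "$missing"
}
```

For example, `check_weights_layout weights && echo "layout looks complete"` prints one status line per directory and the final message only if none is missing.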
+---
+## Dataset Notes
+The SA-VLA experiments rely on LIBERO-family data and benchmark configs.
+To switch between the subset and the full set, modify the benchmark mapping in your OpenPi LIBERO installation, as documented in the main repo.
+
+---
+## Limitations
+- Requires non-trivial robotics simulation setup
+- Results depend on a consistent environment setup and consistent software versions
+- Not intended for safety-critical real-world deployment without additional validation
+---
+## Citation
+```bibtex
+@misc{pan2026savlaspatiallyawareflowmatchingvisionlanguageaction,
+  title={SA-VLA: Spatially-Aware Flow-Matching for Vision-Language-Action Reinforcement Learning},
+  author={Xu Pan and Zhenglin Wan and Xingrui Yu and Xianwei Zheng and Youkai Ke and Ming Sun and Rui Wang and Ziwei Wang and Ivor Tsang},
+  year={2026},
+  eprint={2602.00743},
+  archivePrefix={arXiv},
+  primaryClass={cs.RO},
+  url={https://arxiv.org/abs/2602.00743}
+}
+```
+---
+## License
+Apache-2.0
+
+---
+## Acknowledgments
+Built upon:
+- RLinf: https://github.com/RLinf/RLinf
+- OpenPi: https://github.com/Physical-Intelligence/openpi
+- LIBERO: https://github.com/Lifelong-Robot-Learning/LIBERO
+- LIBERO-PLUS: https://github.com/sylvestf/LIBERO-plus
+- VGGT: https://github.com/facebookresearch/vggt