---
license: apache-2.0
language:
- en
- zh
library_name: transformers
tags:
- robotics
- vision-language-action
- reinforcement-learning
- embodied-ai
- openpi
- rlinf
pipeline_tag: reinforcement-learning
---
# SA-VLA: Spatially-Aware Reinforcement Learning for Flow-Matching VLA Models
SA-VLA is a spatially-aware reinforcement learning approach for flow-matching Vision-Language-Action (VLA) models.
It is developed on top of the RLinf framework and targets robust embodied manipulation with stronger spatial generalization.
- 📄 Paper: https://arxiv.org/abs/2602.00743
- 🌐 Project Page: https://xupan.top/Projects/savla
- 🧩 Codebase: https://github.com/TwSphinx54/SA-VLA
- 🏗️ RL Framework: https://github.com/RLinf/RLinf
---
## Model Summary
SA-VLA fuses visual tokens and spatial tokens into geometry-aware embeddings, then optimizes the policy via:
1. **Step-level dense rewards**
2. **Spatially-conditioned exploration (SCAN)**
3. **RL fine-tuning on embodied benchmarks**
This repository provides model weights used in SA-VLA experiments.
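To make the fusion idea concrete, here is a minimal, self-contained sketch of combining per-patch visual features with spatial features into a single embedding. All names (`fuse_tokens`, the weighting scheme, the toy feature values) are illustrative assumptions for this card, not the actual SA-VLA implementation; see the codebase for the real fusion module.

```python
# Illustrative sketch only: a weighted element-wise fusion of visual and
# spatial token features. The real SA-VLA fuser is defined in the codebase.
def fuse_tokens(visual_tokens, spatial_tokens, w_v=0.5, w_s=0.5):
    """Fuse aligned per-patch visual and spatial features by weighted sum."""
    assert len(visual_tokens) == len(spatial_tokens), "token counts must match"
    fused = []
    for v, s in zip(visual_tokens, spatial_tokens):
        fused.append([w_v * vi + w_s * si for vi, si in zip(v, s)])
    return fused

# Toy 2-patch example (hypothetical values):
visual = [[1.0, 2.0], [3.0, 4.0]]    # e.g. vision-backbone patch features
spatial = [[0.5, 0.5], [1.0, 1.0]]   # e.g. geometry features for the same patches
print(fuse_tokens(visual, spatial))  # -> [[0.75, 1.25], [2.0, 2.5]]
```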
---
## Intended Use
- RL fine-tuning and evaluation for embodied manipulation tasks
- Experiments on LIBERO / LIBERO-PLUS style benchmarks
- Research on spatial reasoning in VLA post-training
> For complete environment setup, training scripts, and benchmark integration, use the full code repository:
> https://github.com/TwSphinx54/SA-VLA
---
## Quick Start (with SA-VLA codebase)
### 1) Clone project
```bash
git clone https://github.com/TwSphinx54/SA-VLA.git
cd SA-VLA
```
### 2) Setup environment
Follow the RLinf setup in:
- `README.RLinf.md` (framework/environment)
- `scripts/setup_container.sh` (extra container setup)
### 3) Place weights
Put downloaded checkpoints under:
```text
weights/
```
### 4) Run training / evaluation
```bash
# RL training
bash examples/embodiment/run_embodiment.sh libero_spatial_ppo_openpi_pi05
# Evaluation
bash examples/embodiment/eval_embodiment.sh libero_spatial_ppo_openpi_pi05_eval
```
---
## Recommended Weight Layout
```text
weights
|-- Pi05-LIBERO
|-- Pi05-VGGT-LIBERO-FUSER-SFT_BF16
`-- RLinf-Pi05-SFT
```
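A quick way to catch path mistakes before launching a run is to verify that the directories above exist. This is a small stdlib-only helper (the `check_weights` name is ours, not part of the SA-VLA tooling); the directory names are taken verbatim from the layout above.

```python
from pathlib import Path

# Expected checkpoint directories, as listed in "Recommended Weight Layout".
EXPECTED = [
    "Pi05-LIBERO",
    "Pi05-VGGT-LIBERO-FUSER-SFT_BF16",
    "RLinf-Pi05-SFT",
]

def check_weights(root="weights"):
    """Return the names of expected checkpoint dirs missing under `root`."""
    root = Path(root)
    return [name for name in EXPECTED if not (root / name).is_dir()]

if __name__ == "__main__":
    missing = check_weights()
    print("OK" if not missing else f"Missing: {missing}")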
---
## Dataset Notes
The SA-VLA experiments rely on LIBERO-family data and benchmark configs.
To switch between subset and full-set evaluation, modify the benchmark mapping in your OpenPi LIBERO installation, as documented in the main repository.
---
## Limitations
- Requires non-trivial robotics simulation setup
- Performance depends on environment/version consistency
- Not intended for safety-critical real-world deployment without additional validation
---
## Citation
```bibtex
@misc{pan2026savlaspatiallyawareflowmatchingvisionlanguageaction,
  title={SA-VLA: Spatially-Aware Flow-Matching for Vision-Language-Action Reinforcement Learning},
  author={Xu Pan and Zhenglin Wan and Xingrui Yu and Xianwei Zheng and Youkai Ke and Ming Sun and Rui Wang and Ziwei Wang and Ivor Tsang},
  year={2026},
  eprint={2602.00743},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2602.00743}
}
```
---
## License
Apache-2.0
---
## Acknowledgments
Built upon:
- RLinf: https://github.com/RLinf/RLinf
- OpenPi: https://github.com/Physical-Intelligence/openpi
- LIBERO: https://github.com/Lifelong-Robot-Learning/LIBERO
- LIBERO-PLUS: https://github.com/sylvestf/LIBERO-plus
- VGGT: https://github.com/facebookresearch/vggt