---
license: apache-2.0
language:
- en
- zh
library_name: transformers
tags:
- robotics
- vision-language-action
- reinforcement-learning
- embodied-ai
- openpi
- rlinf
pipeline_tag: reinforcement-learning
---

# SA-VLA: Spatially-Aware Reinforcement Learning for Flow-Matching VLA Models

SA-VLA is a spatially-aware reinforcement learning approach for flow-matching Vision-Language-Action (VLA) models.  
It is developed on top of the RLinf framework and targets robust embodied manipulation with stronger spatial generalization.

- 📄 Paper: https://arxiv.org/abs/2602.00743  
- 🌐 Project Page: https://xupan.top/Projects/savla  
- 🧩 Codebase: https://github.com/TwSphinx54/SA-VLA  
- 🏗️ RL Framework: https://github.com/RLinf/RLinf

---

## Model Summary

SA-VLA fuses visual tokens and spatial tokens into geometry-aware embeddings, then optimizes the policy via:
1. **Step-level dense rewards**
2. **Spatially-conditioned exploration (SCAN)**
3. **RL fine-tuning on embodied benchmarks**

This repository provides model weights used in SA-VLA experiments.
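As a rough illustration of the fusion step (this is our own sketch, not the actual SA-VLA implementation; the token counts, feature dimensions, and the projection matrix are all hypothetical), visual and spatial tokens can be concatenated per token along the feature axis and projected into a shared geometry-aware embedding space:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: 256 visual tokens and 256 spatial tokens,
# each carrying a 64-dim feature vector (illustrative only).
visual_tokens = rng.standard_normal((256, 64))
spatial_tokens = rng.standard_normal((256, 64))

# Concatenate per-token features, then project with a random,
# untrained matrix standing in for a learned fusion layer.
fused = np.concatenate([visual_tokens, spatial_tokens], axis=-1)  # (256, 128)
proj = rng.standard_normal((128, 64)) / np.sqrt(128)
geometry_aware = fused @ proj                                     # (256, 64)
```

In the real model the projection is learned jointly with the policy; the sketch only shows the tensor-shape bookkeeping.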

---

## Intended Use

- RL fine-tuning and evaluation for embodied manipulation tasks
- Experiments on LIBERO / LIBERO-PLUS style benchmarks
- Research on spatial reasoning in VLA post-training

> For complete environment setup, training scripts, and benchmark integration, use the full code repository:
> https://github.com/TwSphinx54/SA-VLA

---

## Quick Start (with SA-VLA codebase)

### 1) Clone project
```bash
git clone https://github.com/TwSphinx54/SA-VLA.git
cd SA-VLA
```

### 2) Setup environment
Follow the RLinf setup in:
- `README.RLinf.md` (framework/environment)
- `scripts/setup_container.sh` (extra container setup)

### 3) Place weights
Put downloaded checkpoints under:
```text
weights/
```

### 4) Run training / evaluation
```bash
# RL training
bash examples/embodiment/run_embodiment.sh libero_spatial_ppo_openpi_pi05

# Evaluation
bash examples/embodiment/eval_embodiment.sh libero_spatial_ppo_openpi_pi05_eval
```

---

## Recommended Weight Layout

```text
weights
|-- Pi05-LIBERO
|-- Pi05-VGGT-LIBERO-FUSER-SFT_BF16
`-- RLinf-Pi05-SFT
```
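After downloading, a small script like the one below can confirm the layout matches the tree above (the helper is our own illustration, not part of the SA-VLA codebase):

```python
from pathlib import Path

# Directory names taken from the recommended layout above.
EXPECTED = [
    "Pi05-LIBERO",
    "Pi05-VGGT-LIBERO-FUSER-SFT_BF16",
    "RLinf-Pi05-SFT",
]

def missing_checkpoints(root="weights"):
    """Return the expected checkpoint directories missing under `root`."""
    root = Path(root)
    return [name for name in EXPECTED if not (root / name).is_dir()]

if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing checkpoint dirs:", ", ".join(missing))
    else:
        print("All expected checkpoints found.")
```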

---

## Dataset Notes

The SA-VLA experiments rely on LIBERO-family data and benchmark configurations.  
To switch between subset and full-set benchmarks, modify the benchmark mapping in your OpenPi LIBERO installation as documented in the main repository.

---

## Limitations

- Requires non-trivial robotics simulation setup
- Performance depends on environment/version consistency
- Not intended for safety-critical real-world deployment without additional validation

---

## Citation

```bibtex
@misc{pan2026savlaspatiallyawareflowmatchingvisionlanguageaction,
  title={SA-VLA: Spatially-Aware Flow-Matching for Vision-Language-Action Reinforcement Learning},
  author={Xu Pan and Zhenglin Wan and Xingrui Yu and Xianwei Zheng and Youkai Ke and Ming Sun and Rui Wang and Ziwei Wang and Ivor Tsang},
  year={2026},
  eprint={2602.00743},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2602.00743}
}
```

---

## License

Apache-2.0

---

## Acknowledgments

Built upon:
- RLinf: https://github.com/RLinf/RLinf
- OpenPi: https://github.com/Physical-Intelligence/openpi
- LIBERO: https://github.com/Lifelong-Robot-Learning/LIBERO
- LIBERO-PLUS: https://github.com/sylvestf/LIBERO-plus
- VGGT: https://github.com/facebookresearch/vggt