---
license: mit
tags:
- robotics
- multimodal
- finetuning
- vla
---
# Model Card
These are the model checkpoints used in the paper *VLA-OS: Structuring and Dissecting Planning Representations and Paradigms in Vision-Language-Action Models*.
We currently release the Qwen2.5 VLM checkpoints as well as the necessary networks for training. We will release all checkpoints after the paper is accepted.
## Source
- Project Page: https://nus-lins-lab.github.io/vlaos/
- Paper: https://arxiv.org/abs/2506.17561
- Code: https://github.com/HeegerGao/VLA-OS
- Data: https://huggingface.co/datasets/Linslab/VLA-OS-Dataset
## Usage
Ensure you have Git LFS installed:
```bash
curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
sudo apt-get install git-lfs
git lfs install
```
Then clone this repository:
```bash
git clone https://huggingface.co/Linslab/VLA-OS
```
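Alternatively, if you prefer not to use Git LFS, the checkpoints can be fetched with the Hugging Face Hub CLI. This is a sketch, not part of the official instructions; the local directory name is arbitrary:

```shell
# Install the Hugging Face Hub client (provides the huggingface-cli tool)
pip install -U huggingface_hub

# Download the full VLA-OS model repository into a local directory
huggingface-cli download Linslab/VLA-OS --local-dir VLA-OS
```

The CLI resumes interrupted downloads, which can be convenient for large checkpoint files.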
## Model Description
Please refer to the codebase for a detailed description and usage instructions.
## Citation
If you find our work helpful, please cite us:
```bibtex
@article{gao2025vlaos,
  title   = {VLA-OS: Structuring and Dissecting Planning Representations and Paradigms in Vision-Language-Action Models},
  author  = {Gao, Chongkai and Liu, Zixuan and Chi, Zhenghao and Huang, Junshan and Fei, Xin and Hou, Yiwen and Zhang, Yuxuan and Lin, Yudi and Fang, Zhirui and Jiang, Zeyu and Shao, Lin},
  journal = {arXiv preprint arXiv:2506.17561},
  year    = {2025},
  url     = {https://arxiv.org/abs/2506.17561}
}
```
Thank you!