| --- |
| license: apache-2.0 |
| pipeline_tag: robotics |
| --- |
| |
| # CapVector: Learning Transferable Capability Vectors in Parametric Space for Vision-Language-Action Models |
|
|
| [CapVector](https://capvector.github.io/) is a training recipe for vision-language-action (VLA) models. It extracts a transferable capability vector from the parameter difference between a model finetuned with an auxiliary-objective SFT method and one finetuned with standard SFT. This vector is merged into a pretrained VLA to form a stronger initialization, and downstream adaptation then uses standard SFT with a lightweight orthogonal regularization loss to preserve the injected capability. |
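
The extraction and merging steps described above can be illustrated with a minimal sketch. It is not the released implementation: the function names, the toy weights, and the scaling factor `alpha` are assumptions made for illustration, and plain Python lists stand in for model tensors. The orthogonal regularization loss is not shown, since its exact form is not specified here.

```python
from typing import Dict, List

# Toy stand-in for a model state dict: parameter name -> flat list of weights.
Params = Dict[str, List[float]]


def extract_capability_vector(aux_sft: Params, std_sft: Params) -> Params:
    """Per-parameter difference: auxiliary-objective SFT minus standard SFT."""
    return {
        name: [a - s for a, s in zip(aux_sft[name], std_sft[name])]
        for name in aux_sft
    }


def merge_into_pretrained(
    pretrained: Params, cap_vector: Params, alpha: float = 1.0
) -> Params:
    """Add the capability vector to the pretrained weights.

    `alpha` is a hypothetical scaling factor, not a documented CapVector option.
    """
    return {
        name: [p + alpha * v for p, v in zip(pretrained[name], cap_vector[name])]
        for name in pretrained
    }


# Toy weights, made up for illustration only.
pretrained = {"layer.weight": [1.0, 2.0, 3.0]}
std_sft = {"layer.weight": [1.5, 2.0, 2.5]}
aux_sft = {"layer.weight": [2.0, 2.5, 2.5]}

cap_vector = extract_capability_vector(aux_sft, std_sft)
merged = merge_into_pretrained(pretrained, cap_vector)
```

Under these assumptions, `merged` would serve as the initialization for downstream standard SFT. With real checkpoints the same arithmetic would be applied tensor by tensor over matching parameter names.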
|
|
| - **Paper:** [CapVector: Learning Transferable Capability Vectors in Parametric Space for Vision-Language-Action Models](https://arxiv.org/abs/2605.10903) |
| - **Project Page:** [https://capvector.github.io](https://capvector.github.io) |
| - **Code:** [https://github.com/OpenHelix-Team/CapVector](https://github.com/OpenHelix-Team/CapVector) |
|
|
| ## Summary |
|
|
| CapVector addresses a common problem: under standard supervised finetuning (SFT), pretrained VLA models often fail to improve performance effectively or to reduce adaptation cost. Auxiliary-objective SFT pursues two core objectives (enhancing general capabilities and fitting task-specific action distributions), and CapVector decouples them in parameter space to isolate a "capability vector." When this vector is merged into the pretrained parameters and downstream training adds a lightweight orthogonal regularization loss, the model achieves performance comparable to auxiliary-finetuned baselines with significantly reduced computational overhead. |
|
|
| ## 🌟 Key Features |
| - **Efficient downstream adaptation**: CapVector recovers much of the benefit of auxiliary-objective SFT methods, while keeping the downstream overhead close to standard SFT. |
| - **Versatility**: CapVector works with OpenVLA-based, OpenPI-based, and StarVLA-based backbones. |
| - **Generalization**: CapVector is designed to transfer across tasks, environments, and robot embodiments. |
|
|
| ## Citation |
|
|
| If you find this work useful, please cite: |
|
|
| ```bibtex |
| @article{song2026capvector, |
| title = {CapVector: Learning Transferable Capability Vectors in Parametric Space for Vision-Language-Action Models}, |
| author = {Song, Wenxuan and Zhao, Han and Li, Fuhao and Zhou, Ziyang and Wang, Xi and Lyu, Jing and Ding, Pengxiang and Wang, Yan and Wang, Donglin and Li, Haoang}, |
| journal = {arXiv preprint arXiv:2605.10903}, |
| year = {2026} |
| } |
| ``` |
|
|
| ## Acknowledgments |
|
|
| CapVector builds on and interfaces with several open-source projects, including [OpenVLA-OFT](https://github.com/moojink/openvla-oft) and [OpenPI](https://github.com/Physical-Intelligence/openpi). |