Improve model card structure and description (#1)
opened by nielsr (HF Staff)

README.md
---
license: apache-2.0
pipeline_tag: robotics
---

This repository contains the official CapVector checkpoints.
# CapVector: Learning Transferable Capability Vectors in Parametric Space for Vision-Language-Action Models

[CapVector](https://capvector.github.io/) is a training recipe for vision-language-action (VLA) models. It extracts a transferable capability vector from the parameter difference between auxiliary-objective SFT and standard SFT, merges this vector into a pretrained VLA to form a stronger initialization, and then adapts downstream with standard SFT plus a lightweight orthogonal regularization loss that preserves the injected capability.

- **Paper:** [CapVector: Learning Transferable Capability Vectors in Parametric Space for Vision-Language-Action Models](https://arxiv.org/abs/2605.10903)
- **Project Page:** [https://capvector.github.io](https://capvector.github.io)
- **Code:** [https://github.com/OpenHelix-Team/CapVector](https://github.com/OpenHelix-Team/CapVector)
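The extraction-and-merge step described above can be sketched as simple parameter arithmetic. This is a minimal illustration with NumPy arrays standing in for full model state dicts; the variable names and the scaling factor `alpha` are assumptions for illustration, not the released API:

```python
import numpy as np

# Toy stand-ins for model parameters: one flat weight array per "model".
# In practice each would be the full parameter set of a VLA backbone.
rng = np.random.default_rng(0)
theta_pre = rng.normal(size=1000)                          # pretrained VLA
theta_sft = theta_pre + rng.normal(scale=0.01, size=1000)  # standard SFT
theta_aux = theta_pre + rng.normal(scale=0.01, size=1000)  # auxiliary-objective SFT

# Capability vector: the parameter difference between the auxiliary-objective
# SFT model and the standard SFT model, intended to isolate the "general
# capability" component from task-specific action fitting.
cap_vector = theta_aux - theta_sft

# Merge the capability vector into the pretrained parameters to form a
# stronger initialization; alpha is an assumed scaling hyperparameter.
alpha = 1.0
theta_init = theta_pre + alpha * cap_vector
```

The resulting `theta_init` is then fine-tuned with standard SFT on the downstream task.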

## Summary

CapVector addresses a common failure mode of pretrained VLA models: standard supervised fine-tuning (SFT) often neither improves downstream performance effectively nor reduces adaptation cost. By decoupling the two core objectives of auxiliary-objective SFT (enhancing general capabilities and fitting task-specific action distributions) within the parameter space, CapVector isolates a "capability vector." Merging this vector into the pretrained parameters, together with a lightweight orthogonal regularization loss during downstream SFT, yields performance comparable to auxiliary fine-tuned baselines at significantly reduced computational overhead.
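One plausible form of the orthogonal regularization mentioned above penalizes the component of the fine-tuning update that lies along the capability vector, so downstream SFT moves the parameters orthogonally to the injected capability. The squared-cosine penalty below is an assumed sketch of this idea, not the paper's exact loss:

```python
import numpy as np

def orthogonal_reg_loss(theta, theta_init, cap_vector):
    """Squared cosine similarity between the fine-tuning update
    (theta - theta_init) and the capability vector: updates parallel
    to the capability vector are penalized, orthogonal ones are free."""
    delta = theta - theta_init
    cos = np.dot(delta, cap_vector) / (
        np.linalg.norm(delta) * np.linalg.norm(cap_vector) + 1e-8
    )
    return cos ** 2

# A 2-D example: an update along the capability vector is penalized
# heavily, while an orthogonal update incurs (near) zero penalty.
cap = np.array([1.0, 0.0])
theta0 = np.zeros(2)
parallel = orthogonal_reg_loss(theta0 + np.array([0.5, 0.0]), theta0, cap)
orthogonal = orthogonal_reg_loss(theta0 + np.array([0.0, 0.5]), theta0, cap)
```

In training, a term like this would be added to the standard SFT loss with a small weight, keeping the downstream update from overwriting the merged capability.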
## 🌟 Key Features

- **Efficient downstream adaptation**: CapVector recovers much of the benefit of auxiliary-objective SFT methods while keeping downstream overhead close to that of standard SFT.
- **Versatility**: CapVector supports OpenVLA-based, OpenPi-based, and StarVLA-based backbones.
- **Generalization**: CapVector is designed to transfer across tasks, environments, and robot embodiments.
## Citation

If you find this work useful, please cite:

```bibtex
@article{song2026capvector,
  title   = {CapVector: Learning Transferable Capability Vectors in Parametric Space for Vision-Language-Action Models},
  author  = {Song, Wenxuan and Zhao, Han and Li, Fuhao and Zhou, Ziyang and Wang, Xi and Lyu, Jing and Ding, Pengxiang and Wang, Yan and Wang, Donglin and Li, Haoang},
  journal = {arXiv preprint arXiv:2605.10903},
  year    = {2026}
}
```

## Acknowledgments

CapVector builds on and interfaces with several open-source projects, including [OpenVLA-OFT](https://github.com/moojink/openvla-oft) and [OpenPI](https://github.com/Physical-Intelligence/openpi).