Improve model card structure and description
Hi! I'm Niels from the Hugging Face community science team. I'm opening this PR to enhance your model card. I've added a detailed summary of the CapVector method and its key features based on your research paper and GitHub repository. This will help the community better understand how capability vectors enable more efficient and versatile VLA fine-tuning across different environments and embodiments.
README.md
CHANGED
|
@@ -2,10 +2,37 @@
|
|
| 2 |
license: apache-2.0
|
| 3 |
pipeline_tag: robotics
|
| 4 |
---
|
| 5 |
-
This repository contains the CapVector official checkpoints.
|
| 6 |
|
| 7 |
-
|
| 8 |
|
| 9 |
-
|
| 10 |
|
| 11 |
-
| 2 |
license: apache-2.0
|
| 3 |
pipeline_tag: robotics
|
| 4 |
---
|
|
|
|
| 5 |
|
| 6 |
+
# CapVector: Learning Transferable Capability Vectors in Parametric Space for Vision-Language-Action Models
|
| 7 |
|
| 8 |
+
[CapVector](https://capvector.github.io/) is a training recipe for vision-language-action (VLA) models. It extracts a transferable capability vector from the parameter difference between a model trained with auxiliary-objective supervised fine-tuning (SFT) and one trained with standard SFT. This vector is merged into a pretrained VLA to form a stronger initialization, and downstream adaptation then uses standard SFT with a lightweight orthogonal regularization loss to preserve the injected capability.
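As a rough illustration of the extract-and-merge idea described above, here is a minimal, dependency-free sketch. It is not the repository's implementation: the function names, the `scale` factor, and the toy parameter dictionaries are all hypothetical, and real checkpoints would hold tensors rather than Python lists.

```python
# Toy stand-ins for checkpoints: each "model" maps a parameter name to a
# flat list of floats. All names and values here are hypothetical.

def extract_capability_vector(aux_sft, std_sft):
    """Per-parameter difference: auxiliary-objective SFT minus standard SFT."""
    return {
        name: [a - s for a, s in zip(aux_sft[name], std_sft[name])]
        for name in aux_sft
    }


def merge_capability_vector(pretrained, cap_vector, scale=1.0):
    """Add the (optionally scaled) capability vector to pretrained weights.

    `scale` is an assumed knob for illustration, not a documented option.
    """
    return {
        name: [p + scale * v for p, v in zip(pretrained[name], cap_vector[name])]
        for name in pretrained
    }


pretrained = {"w": [0.0, 1.0, 2.0]}
std_sft = {"w": [0.5, 1.0, 2.5]}   # fine-tuned with standard SFT
aux_sft = {"w": [0.5, 2.0, 2.0]}   # fine-tuned with an auxiliary objective

cap_vector = extract_capability_vector(aux_sft, std_sft)
init = merge_capability_vector(pretrained, cap_vector)
print(cap_vector)  # {'w': [0.0, 1.0, -0.5]}
print(init)        # {'w': [0.0, 2.0, 1.5]}
```

The merged `init` would then serve as the starting point for downstream SFT in place of the original pretrained weights.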
|
| 9 |
|
| 10 |
+
- **Paper:** [CapVector: Learning Transferable Capability Vectors in Parametric Space for Vision-Language-Action Models](https://arxiv.org/abs/2605.10903)
|
| 11 |
+
- **Project Page:** [https://capvector.github.io](https://capvector.github.io)
|
| 12 |
+
- **Code:** [https://github.com/OpenHelix-Team/CapVector](https://github.com/OpenHelix-Team/CapVector)
|
| 13 |
+
|
| 14 |
+
## Summary
|
| 15 |
+
|
| 16 |
+
CapVector addresses a common limitation: under standard supervised fine-tuning (SFT), pretrained VLA models often fail to improve performance or reduce adaptation cost effectively. Auxiliary-objective SFT pursues two goals at once, enhancing general capabilities and fitting task-specific action distributions. CapVector decouples these two goals in parameter space and isolates the first as a "capability vector." When this vector is merged into the pretrained parameters and downstream training adds a lightweight orthogonal regularization loss, the model achieves performance comparable to auxiliary-finetuned baselines with significantly reduced computational overhead.
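This card does not spell out the form of the orthogonal regularization loss. One plausible reading, shown here purely as an assumption, penalizes the component of the fine-tuning update that lies along the capability vector, so that downstream SFT moves the weights mostly in directions orthogonal to it. The function names, the `weight` coefficient, and the flat-list parameters are hypothetical; see the paper and code for the actual formulation.

```python
def dot(a, b):
    """Inner product of two equal-length lists of floats."""
    return sum(x * y for x, y in zip(a, b))


def orthogonal_penalty(theta, theta_init, cap_vector, eps=1e-12):
    """Squared length of the update's projection onto the capability vector.

    Assumed form for illustration: zero when (theta - theta_init) is
    orthogonal to cap_vector, growing as the update moves along it.
    """
    delta = [t - t0 for t, t0 in zip(theta, theta_init)]
    return dot(delta, cap_vector) ** 2 / (dot(cap_vector, cap_vector) + eps)


def total_loss(sft_loss, theta, theta_init, cap_vector, weight=0.1):
    """Standard SFT loss plus the weighted penalty (weight is hypothetical)."""
    return sft_loss + weight * orthogonal_penalty(theta, theta_init, cap_vector)


theta_init = [1.0, 1.0]
cap_vector = [1.0, 0.0]

orthogonal_update = [1.0, 4.0]  # moves only orthogonally to cap_vector
aligned_update = [3.0, 1.0]     # moves only along cap_vector

penalty_orth = orthogonal_penalty(orthogonal_update, theta_init, cap_vector)
penalty_aligned = orthogonal_penalty(aligned_update, theta_init, cap_vector)
```

In this toy case the orthogonal update incurs no penalty, while the aligned update is penalized by roughly the squared distance it travels along the capability direction.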
|
| 17 |
+
|
| 18 |
+
## 🌟 Key Features
|
| 19 |
+
- **Efficient downstream adaptation**: CapVector recovers much of the benefit of auxiliary-objective SFT methods, while keeping the downstream overhead close to standard SFT.
|
| 20 |
+
- **Versatility**: CapVector works with OpenVLA-based, OpenPi-based, and StarVLA-based backbones.
|
| 21 |
+
- **Generalization**: CapVector is designed to transfer across tasks, environments, and robot embodiments.
|
| 22 |
+
|
| 23 |
+
## Citation
|
| 24 |
+
|
| 25 |
+
If you find this work useful, please cite:
|
| 26 |
+
|
| 27 |
+
```bibtex
|
| 28 |
+
@article{song2026capvector,
|
| 29 |
+
title = {CapVector: Learning Transferable Capability Vectors in Parametric Space for Vision-Language-Action Models},
|
| 30 |
+
author = {Song, Wenxuan and Zhao, Han and Li, Fuhao and Zhou, Ziyang and Wang, Xi and Lyu, Jing and Ding, Pengxiang and Wang, Yan and Wang, Donglin and Li, Haoang},
|
| 31 |
+
journal = {arXiv preprint arXiv:2605.10903},
|
| 32 |
+
year = {2026}
|
| 33 |
+
}
|
| 34 |
+
```
|
| 35 |
+
|
| 36 |
+
## Acknowledgments
|
| 37 |
+
|
| 38 |
+
CapVector builds on and interfaces with several open-source projects, including [OpenVLA-OFT](https://github.com/moojink/openvla-oft) and [OpenPI](https://github.com/Physical-Intelligence/openpi).
|