Improve model card structure and description (#1)
opened by nielsr (HF Staff)

README.md
---
license: apache-2.0
pipeline_tag: robotics
---

This repository contains the official CapVector checkpoints.
# CapVector: Learning Transferable Capability Vectors in Parametric Space for Vision-Language-Action Models

[CapVector](https://capvector.github.io/) is a training recipe for vision-language-action (VLA) models. It extracts a transferable capability vector from the parameter difference between auxiliary-objective SFT and standard SFT, merges this vector into a pretrained VLA to form a stronger initialization, and then adapts downstream with standard SFT plus a lightweight orthogonal regularization loss that preserves the injected capability.

- **Paper:** [CapVector: Learning Transferable Capability Vectors in Parametric Space for Vision-Language-Action Models](https://arxiv.org/abs/2605.10903)
- **Project Page:** [https://capvector.github.io](https://capvector.github.io)
- **Code:** [https://github.com/OpenHelix-Team/CapVector](https://github.com/OpenHelix-Team/CapVector)
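The extraction-and-merge step described above can be sketched as simple parameter arithmetic. This is a minimal illustration with NumPy arrays standing in for full model state dicts; the variable names and the scaling factor `alpha` are assumptions for illustration, not the released API:

```python
import numpy as np

# Toy stand-ins for model parameters: one flat weight array per "model".
# In practice each would be the full parameter set of a VLA backbone.
rng = np.random.default_rng(0)
theta_pre = rng.normal(size=1000)                          # pretrained VLA
theta_sft = theta_pre + rng.normal(scale=0.01, size=1000)  # standard SFT
theta_aux = theta_pre + rng.normal(scale=0.01, size=1000)  # auxiliary-objective SFT

# Capability vector: the parameter difference between the auxiliary-objective
# SFT model and the standard SFT model, intended to isolate the "general
# capability" component from task-specific action fitting.
cap_vector = theta_aux - theta_sft

# Merge the capability vector into the pretrained parameters to form a
# stronger initialization; alpha is an assumed scaling hyperparameter.
alpha = 1.0
theta_init = theta_pre + alpha * cap_vector
```

The resulting `theta_init` is then fine-tuned with standard SFT on the downstream task.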

## Summary

CapVector addresses a common failure mode of pretrained VLA models: standard supervised fine-tuning (SFT) often neither improves downstream performance effectively nor reduces adaptation cost. By decoupling the two core objectives of auxiliary-objective SFT (enhancing general capabilities and fitting task-specific action distributions) within the parameter space, CapVector isolates a "capability vector." Merging this vector into the pretrained parameters, together with a lightweight orthogonal regularization loss during downstream SFT, yields performance comparable to auxiliary fine-tuned baselines at significantly reduced computational overhead.
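One plausible form of the orthogonal regularization mentioned above penalizes the component of the fine-tuning update that lies along the capability vector, so downstream SFT moves the parameters orthogonally to the injected capability. The squared-cosine penalty below is an assumed sketch of this idea, not the paper's exact loss:

```python
import numpy as np

def orthogonal_reg_loss(theta, theta_init, cap_vector):
    """Squared cosine similarity between the fine-tuning update
    (theta - theta_init) and the capability vector: updates parallel
    to the capability vector are penalized, orthogonal ones are free."""
    delta = theta - theta_init
    cos = np.dot(delta, cap_vector) / (
        np.linalg.norm(delta) * np.linalg.norm(cap_vector) + 1e-8
    )
    return cos ** 2

# A 2-D example: an update along the capability vector is penalized
# heavily, while an orthogonal update incurs (near) zero penalty.
cap = np.array([1.0, 0.0])
theta0 = np.zeros(2)
parallel = orthogonal_reg_loss(theta0 + np.array([0.5, 0.0]), theta0, cap)
orthogonal = orthogonal_reg_loss(theta0 + np.array([0.0, 0.5]), theta0, cap)
```

In training, a term like this would be added to the standard SFT loss with a small weight, keeping the downstream update from overwriting the merged capability.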
## 🌟 Key Features

- **Efficient downstream adaptation**: CapVector recovers much of the benefit of auxiliary-objective SFT methods while keeping downstream overhead close to that of standard SFT.
- **Versatility**: CapVector supports OpenVLA-based, OpenPi-based, and StarVLA-based backbones.
- **Generalization**: CapVector is designed to transfer across tasks, environments, and robot embodiments.
## Citation

If you find this work useful, please cite:

```bibtex
@article{song2026capvector,
  title   = {CapVector: Learning Transferable Capability Vectors in Parametric Space for Vision-Language-Action Models},
  author  = {Song, Wenxuan and Zhao, Han and Li, Fuhao and Zhou, Ziyang and Wang, Xi and Lyu, Jing and Ding, Pengxiang and Wang, Yan and Wang, Donglin and Li, Haoang},
  journal = {arXiv preprint arXiv:2605.10903},
  year    = {2026}
}
```

## Acknowledgments

CapVector builds on and interfaces with several open-source projects, including [OpenVLA-OFT](https://github.com/moojink/openvla-oft) and [OpenPI](https://github.com/Physical-Intelligence/openpi).