license: apache-2.0
pipeline_tag: robotics

CapVector: Learning Transferable Capability Vectors in Parametric Space for Vision-Language-Action Models

CapVector is a training recipe for vision-language-action (VLA) models. It extracts a transferable capability vector from the parameter difference between a model finetuned with an auxiliary-objective SFT method and a model finetuned with standard SFT. This vector is merged into a pretrained VLA to form a stronger initialization. Downstream adaptation then uses standard SFT plus a lightweight orthogonal regularization loss that preserves the injected capability.
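The extraction and merge steps can be read as task-vector arithmetic on model weights. The sketch below illustrates that reading under stated assumptions and is not the CapVector implementation. It assumes both finetuned checkpoints start from the same pretrained weights and share parameter names, and it represents each tensor as a flat list of floats. The function names and the `scale` coefficient are hypothetical illustrations, not taken from the CapVector code.

```python
from typing import Dict, List

# Hypothetical representation: each parameter tensor is flattened into a list of
# floats and keyed by its name. A real VLA checkpoint would use framework tensors
# (e.g. a PyTorch state_dict) rather than Python lists.
Params = Dict[str, List[float]]


def capability_vector(theta_aux: Params, theta_sft: Params) -> Params:
    """Element-wise difference theta_aux - theta_sft for every parameter.

    Assumes both models were finetuned from the same pretrained weights with the
    same architecture, so parameter names and lengths line up.
    """
    if theta_aux.keys() != theta_sft.keys():
        raise ValueError("checkpoints must have identical parameter names")
    return {
        name: [a - s for a, s in zip(theta_aux[name], theta_sft[name])]
        for name in theta_sft
    }


def merge_into_pretrained(theta_pre: Params, vector: Params, scale: float = 1.0) -> Params:
    """Return theta_pre + scale * vector, used as the new initialization.

    `scale` is a hypothetical merge coefficient; the card does not specify one.
    """
    if theta_pre.keys() != vector.keys():
        raise ValueError("vector must cover the same parameters as the pretrained model")
    return {
        name: [p + scale * v for p, v in zip(theta_pre[name], vector[name])]
        for name in theta_pre
    }


if __name__ == "__main__":
    # Toy single-parameter "models" to show the arithmetic.
    theta_pre = {"w": [1.0, 2.0]}   # pretrained VLA
    theta_sft = {"w": [1.5, 2.5]}   # standard SFT result
    theta_aux = {"w": [2.0, 2.0]}   # auxiliary-objective SFT result
    v = capability_vector(theta_aux, theta_sft)
    print(v)                                    # {'w': [0.5, -0.5]}
    print(merge_into_pretrained(theta_pre, v))  # {'w': [1.5, 1.5]}
```

Subtracting the standard-SFT weights from the auxiliary-objective weights is intended to cancel the shared task-fitting update. Under this reading, the difference keeps whatever the auxiliary objective added.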

Paper: CapVector: Learning Transferable Capability Vectors in Parametric Space for Vision-Language-Action Models (arXiv:2605.10903)
Project Page: https://capvector.github.io
Code: https://github.com/OpenHelix-Team/CapVector

Summary

CapVector addresses a common problem: during standard supervised finetuning (SFT), pretrained VLA models often fail to improve performance effectively or to reduce adaptation cost. Auxiliary-objective SFT serves two core objectives, enhancing general capabilities and fitting task-specific action distributions. CapVector decouples these objectives in parameter space and captures the capability-enhancing component as a "capability vector." When this vector is merged into the pretrained parameters and combined with a lightweight orthogonal regularization loss, the model achieves performance comparable to auxiliary-objective finetuned baselines with significantly lower computational overhead.
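The card does not give the exact form of the orthogonal regularization loss. The sketch below shows one plausible version, which is an assumption and not the paper's definition. For each parameter tensor, it takes the downstream update (current weights minus the merged initialization) and penalizes its squared cosine similarity with the capability vector. The penalty is zero when every update is orthogonal to the vector. The names `orthogonal_penalty`, `weight`, and `eps` are hypothetical. A real training loop would compute this penalty on autograd tensors and add it to the SFT loss.

```python
import math
from typing import Dict, List

# Hypothetical flat-list representation of parameter tensors, keyed by name.
Params = Dict[str, List[float]]


def orthogonal_penalty(theta: Params, theta_init: Params, vector: Params,
                       weight: float = 1.0, eps: float = 1e-12) -> float:
    """Assumed penalty: weight * sum over tensors of cos^2(theta - theta_init, vector).

    theta_init is the merged initialization (pretrained + capability vector);
    eps avoids division by zero when the update or the vector is all zeros.
    """
    total = 0.0
    for name, v in vector.items():
        # Downstream update for this tensor since the merged initialization.
        delta = [t - i for t, i in zip(theta[name], theta_init[name])]
        dot = sum(d * x for d, x in zip(delta, v))
        norm_delta = math.sqrt(sum(d * d for d in delta))
        norm_v = math.sqrt(sum(x * x for x in v))
        cos = dot / (norm_delta * norm_v + eps)
        total += cos * cos
    return weight * total


if __name__ == "__main__":
    theta_init = {"w": [0.0, 0.0]}
    vector = {"w": [1.0, 0.0]}
    # Update orthogonal to the capability vector: no penalty.
    print(orthogonal_penalty({"w": [0.0, 3.0]}, theta_init, vector))  # 0.0
    # Update aligned with the capability vector: penalty close to 1.0.
    print(orthogonal_penalty({"w": [2.0, 0.0]}, theta_init, vector))
```

Using cosine similarity instead of a raw dot product makes the penalty independent of update magnitude, so it discourages only the direction that would overwrite the injected capability. Summing per tensor rather than over one flattened vector is another design choice made here for illustration.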

🌟 Key Features

Efficient downstream adaptation: CapVector recovers much of the benefit of auxiliary-objective SFT methods, while keeping the downstream overhead close to standard SFT.
Versatility: CapVector works with OpenVLA-based, OpenPi-based, and StarVLA-based backbones.
Generalization: CapVector is designed to transfer across tasks, environments, and robot embodiments.

Citation

If you find this work useful, please cite:

@article{song2026capvector,
  title   = {CapVector: Learning Transferable Capability Vectors in Parametric Space for Vision-Language-Action Models},
  author  = {Song, Wenxuan and Zhao, Han and Li, Fuhao and Zhou, Ziyang and Wang, Xi and Lyu, Jing and Ding, Pengxiang and Wang, Yan and Wang, Donglin and Li, Haoang},
  journal = {arXiv preprint arXiv:2605.10903},
  year    = {2026}
}

Acknowledgments

CapVector builds on and interfaces with several open-source projects, including OpenVLA-OFT and OpenPi.