	---
	license: apache-2.0
	pipeline_tag: robotics
	---

	# CapVector: Learning Transferable Capability Vectors in Parametric Space for Vision-Language-Action Models

	[CapVector](https://capvector.github.io/) is a training recipe for vision-language-action (VLA) models. It extracts a transferable capability vector from the parameter difference between models finetuned with auxiliary-objective SFT and models finetuned with standard SFT. This vector is merged into a pretrained VLA to form a stronger initialization, and downstream adaptation then uses standard SFT with a lightweight orthogonal regularization loss to preserve the injected capability.
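The extraction and merging steps can be illustrated with a minimal sketch. This is not the released implementation. It assumes that both SFT checkpoints share the same parameter names and shapes, that the difference is taken as auxiliary-objective SFT minus standard SFT, and that weights are stored as flat lists of floats. The names `extract_capability_vector`, `merge_into_pretrained`, and the `scale` factor are hypothetical and chosen only for illustration.

```python
from typing import Dict, List

# Hypothetical flat representation: parameter name -> flattened weights.
Params = Dict[str, List[float]]


def extract_capability_vector(aux_sft: Params, std_sft: Params) -> Params:
    """Per-parameter difference: auxiliary-objective SFT minus standard SFT (assumed direction)."""
    return {
        name: [a - s for a, s in zip(aux_sft[name], std_sft[name])]
        for name in aux_sft
    }


def merge_into_pretrained(pretrained: Params, cap_vector: Params, scale: float = 1.0) -> Params:
    """Add the capability vector to the pretrained weights; `scale` is a hypothetical knob."""
    return {
        name: [p + scale * v for p, v in zip(pretrained[name], cap_vector[name])]
        for name in pretrained
    }


# Toy weights standing in for real checkpoints.
pretrained = {"layer.weight": [0.0, 1.0, 2.0]}
std_sft = {"layer.weight": [0.5, 1.0, 2.5]}
aux_sft = {"layer.weight": [1.0, 1.5, 2.5]}

cap_vector = extract_capability_vector(aux_sft, std_sft)
init = merge_into_pretrained(pretrained, cap_vector)

print(cap_vector)  # {'layer.weight': [0.5, 0.5, 0.0]}
print(init)        # {'layer.weight': [0.5, 1.5, 2.0]}
```

With real checkpoints the same arithmetic would be applied tensor by tensor. See the code repository for the procedure the authors actually use.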

	- Paper: [CapVector: Learning Transferable Capability Vectors in Parametric Space for Vision-Language-Action Models](https://arxiv.org/abs/2605.10903)
	- Project Page: [https://capvector.github.io](https://capvector.github.io)
	- Code: [https://github.com/OpenHelix-Team/CapVector](https://github.com/OpenHelix-Team/CapVector)

	## Summary

	CapVector addresses the problem that pretrained VLA models often fail to improve performance or reduce adaptation cost under standard supervised finetuning (SFT). Auxiliary-objective SFT has two core objectives: enhancing general capabilities and fitting task-specific action distributions. CapVector decouples these two objectives in parameter space to obtain a "capability vector." When this vector is merged into the pretrained parameters and downstream finetuning adds a lightweight orthogonal regularization loss, the model achieves performance comparable to auxiliary-finetuned baselines with significantly lower computational overhead.
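This card does not give the exact form of the orthogonal regularization loss, so the following is only one plausible sketch. It assumes the regularizer penalizes the squared cosine similarity between the downstream finetuning update and the capability vector, which discourages updates along the injected direction. The names `orthogonal_penalty`, `total_loss`, and `reg_weight` are hypothetical.

```python
import math
from typing import List


def orthogonal_penalty(update: List[float], cap_vector: List[float], eps: float = 1e-12) -> float:
    """Squared cosine similarity between the finetuning update and the capability vector.

    Zero when the two are orthogonal, close to one when they are parallel.
    This form is an assumption, not the loss defined in the paper.
    """
    dot = sum(u * v for u, v in zip(update, cap_vector))
    norm_u = math.sqrt(sum(u * u for u in update))
    norm_v = math.sqrt(sum(v * v for v in cap_vector))
    cos = dot / (norm_u * norm_v + eps)  # eps guards against a zero-length update
    return cos * cos


def total_loss(sft_loss: float, update: List[float], cap_vector: List[float],
               reg_weight: float = 0.1) -> float:
    """Standard SFT loss plus the weighted penalty; `reg_weight` is a hypothetical coefficient."""
    return sft_loss + reg_weight * orthogonal_penalty(update, cap_vector)


cap_vector = [1.0, 0.0]
orthogonal_update = [0.0, 2.0]  # moves only in directions the vector does not use
parallel_update = [3.0, 0.0]    # moves along the capability vector

print(orthogonal_penalty(orthogonal_update, cap_vector))  # 0.0
print(total_loss(1.0, orthogonal_update, cap_vector))     # 1.0
print(total_loss(1.0, parallel_update, cap_vector) > 1.0)  # True
```

Here `update` stands for the current weights minus the merged initialization. An update orthogonal to the capability vector incurs no penalty, while an update along it is penalized.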

	## 🌟 Key Features
	- Efficient downstream adaptation: CapVector recovers much of the benefit of auxiliary-objective SFT methods, while keeping the downstream overhead close to standard SFT.
	- Versatility: CapVector works with OpenVLA-based, OpenPI-based, and StarVLA-based backbones.
	- Generalization: CapVector is designed to transfer across tasks, environments, and robot embodiments.

	## Citation

	If you find this work useful, please cite:

	```bibtex
	@article{song2026capvector,
	title = {CapVector: Learning Transferable Capability Vectors in Parametric Space for Vision-Language-Action Models},
	author = {Song, Wenxuan and Zhao, Han and Li, Fuhao and Zhou, Ziyang and Wang, Xi and Lyu, Jing and Ding, Pengxiang and Wang, Yan and Wang, Donglin and Li, Haoang},
	journal = {arXiv preprint arXiv:2605.10903},
	year = {2026}
	}
	```

	## Acknowledgments

	CapVector builds on and interfaces with several open-source projects, including [OpenVLA-OFT](https://github.com/moojink/openvla-oft) and [OpenPI](https://github.com/Physical-Intelligence/openpi).