
AVA-VLA: Improving Vision-Language-Action models with Active Visual Attention

This repository contains the AVA-VLA checkpoint trained on the CALVIN ABC→D setting, as described in the paper AVA-VLA: Improving Vision-Language-Action models with Active Visual Attention. AVA-VLA reformulates vision-language-action policy learning from a partially observable perspective and uses a recurrent state to summarize task history for action generation.
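To make the idea of a recurrent state that summarizes task history more concrete, here is a minimal toy sketch in plain Python. It is not the AVA-VLA implementation: the function names (`attend`, `update_state`, `rollout`), the dot-product attention over visual tokens, and the exponential-moving-average state update are all assumptions chosen for illustration only. Please refer to the linked code repository for the actual model.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(state: Vector, tokens: List[Vector]) -> Tuple[Vector, List[float]]:
    """Hypothetical attention step: weight each visual token by its
    dot-product similarity to the recurrent state, then pool."""
    scores = [sum(s * t for s, t in zip(state, tok)) for tok in tokens]
    weights = softmax(scores)
    pooled = [
        sum(w * tok[i] for w, tok in zip(weights, tokens))
        for i in range(len(state))
    ]
    return pooled, weights


def update_state(state: Vector, pooled: Vector, alpha: float = 0.5) -> Vector:
    """Hypothetical recurrence: blend the previous state with the newly
    pooled visual features (an exponential moving average)."""
    return [(1 - alpha) * s + alpha * p for s, p in zip(state, pooled)]


def rollout(frames: List[List[Vector]], dim: int) -> List[Tuple[Vector, List[float]]]:
    """Run the recurrence over a sequence of frames.

    Each frame is a list of visual token vectors of length `dim`.
    Returns the state and attention weights after every step; an action
    head (not shown) would consume the state at each step.
    """
    state = [0.0] * dim  # no history before the first observation
    history = []
    for tokens in frames:
        pooled, weights = attend(state, tokens)
        state = update_state(state, pooled)
        history.append((state, weights))
    return history


if __name__ == "__main__":
    # Two frames with 2-dimensional toy tokens.
    frames = [
        [[1.0, 0.0]],
        [[1.0, 0.0], [0.0, 1.0]],
    ]
    for step, (state, weights) in enumerate(rollout(frames, dim=2)):
        print(step, state, weights)
```

In this toy version, the state built from the first frame biases the attention weights in the second frame toward tokens that resemble what was seen before. That is the general sense in which a recurrent state can carry history forward in a partially observable setting.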

Project Page: https://liauto-dsr.github.io/AVA-VLA-Page/

Code: https://github.com/LiAuto-DSR/AVA-VLA

Citation

@article{xiao2025ava,
  title={AVA-VLA: Improving Vision-Language-Action models with Active Visual Attention},
  author={Xiao, Lei and Li, Jifeng and Gao, Juntao and Ye, Feiyang and Jin, Yan and Qian, Jingjing and Zhang, Jing and Wu, Yong and Yu, Xiaoyuan},
  journal={arXiv preprint arXiv:2511.18960},
  year={2025}
}
