AVA-VLA: Improving Vision-Language-Action models with Active Visual Attention

This repository contains the AVA-VLA checkpoint trained on the CALVIN ABC→D setting (trained on environments A, B, and C, evaluated on the unseen environment D), as described in AVA-VLA: Improving Vision-Language-Action models with Active Visual Attention. AVA-VLA reformulates vision-language-action policy learning from a partially observable perspective and uses a recurrent state to summarize task history for action generation.
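The idea of a recurrent state that summarizes task history can be illustrated with a toy example. The sketch below is not the AVA-VLA architecture: the function names (`update_state`, `select_action`, `rollout`), the `decay` parameter, and the update rule are all hypothetical, chosen only to show how two episodes that end in the same observation can yield different actions when the policy also conditions on a carried state.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def update_state(prev_state: Vector, observation: Vector, decay: float = 0.5) -> Vector:
    """Toy recurrence (an assumption, not the paper's update rule):
    blend the previous state with the new observation and squash with tanh."""
    return [
        math.tanh(decay * h + (1.0 - decay) * o)
        for h, o in zip(prev_state, observation)
    ]


def select_action(state: Vector, observation: Vector) -> Vector:
    """Toy action head: the action depends on the history summary
    as well as on the current observation."""
    return [h + o for h, o in zip(state, observation)]


def rollout(observations: Sequence[Vector], state_dim: int) -> Tuple[List[Vector], Vector]:
    """Run the toy policy over an episode, carrying the state across steps."""
    state = [0.0] * state_dim
    actions = []
    for obs in observations:
        state = update_state(state, obs)
        actions.append(select_action(state, obs))
    return actions, state


# Two episodes whose final observation is identical but whose histories differ.
episode_a = [[1.0, 0.0], [0.0, 0.0]]
episode_b = [[-1.0, 0.0], [0.0, 0.0]]
actions_a, _ = rollout(episode_a, state_dim=2)
actions_b, _ = rollout(episode_b, state_dim=2)
```

A policy that looked only at the current observation would output the same final action for both episodes. Here the carried state makes `actions_a[-1]` and `actions_b[-1]` differ, which is the sense in which a recurrent state helps under partial observability.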

Project Page: https://liauto-dsr.github.io/AVA-VLA-Page/

Code: https://github.com/LiAuto-DSR/AVA-VLA
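To fetch the checkpoint files locally, one option is the `snapshot_download` function from the `huggingface_hub` package. The sketch below assumes that package is installed and uses the repository ID shown on this page (`LiAuto-DSR/avavla-calvin-abc2d`). The helper name `download_checkpoint` is hypothetical. Loading the weights for inference and evaluation is handled by the code repository linked above and is not shown here.

```python
from typing import Optional

# Repository ID as shown on this model page.
REPO_ID = "LiAuto-DSR/avavla-calvin-abc2d"


def download_checkpoint(local_dir: Optional[str] = None) -> str:
    """Download all files of the checkpoint repository and return the local path.

    The import is kept inside the function so the module can be imported
    even when huggingface_hub is not installed.
    """
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)


# Example usage (requires network access and several GB of disk space):
# path = download_checkpoint("./avavla-calvin-abc2d")
# print(path)
```

With `local_dir` left as `None`, the files go to the default Hugging Face cache directory.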

Citation

@article{xiao2025ava,
  title={AVA-VLA: Improving Vision-Language-Action models with Active Visual Attention},
  author={Xiao, Lei and Li, Jifeng and Gao, Juntao and Ye, Feiyang and Jin, Yan and Qian, Jingjing and Zhang, Jing and Wu, Yong and Yu, Xiaoyuan},
  journal={arXiv preprint arXiv:2511.18960},
  year={2025}
}

Safetensors

Model size: 8B params

Tensor type: BF16

Robotics

Paper for LiAuto-DSR/avavla-calvin-abc2d: arXiv 2511.18960, published Apr 10