AVA-VLA: Improving Vision-Language-Action models with Active Visual Attention
Paper: arXiv:2511.18960
How to use LiAuto-DSR/avavla-calvin-abc2d with Transformers:
# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("LiAuto-DSR/avavla-calvin-abc2d", dtype="auto")

This repository contains the AVA-VLA checkpoint trained on the CALVIN ABC→D setting, as described in AVA-VLA: Improving Vision-Language-Action models with Active Visual Attention. AVA-VLA reformulates vision-language-action policy learning from a partially observable perspective and uses a recurrent state to summarize task history for action generation.
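The recurrent-state idea can be illustrated with a toy sketch. This is not the AVA-VLA architecture or its API: `update_state`, `select_action`, `rollout`, and the `decay` parameter are hypothetical names chosen for illustration, and the arithmetic is a placeholder for learned networks. It only shows how a state carried across timesteps lets the action depend on task history rather than on the current observation alone.

```python
import math

def update_state(state, observation, decay=0.5):
    """Fold the new observation into a running summary of past observations.

    Hypothetical update rule: a real model would use a learned recurrent module.
    """
    return [math.tanh(decay * s + (1.0 - decay) * o)
            for s, o in zip(state, observation)]

def select_action(state, observation):
    """Toy action head that combines the current observation with the state."""
    return sum(s + o for s, o in zip(state, observation))

def rollout(observations, state_dim=2):
    """Run the toy policy over an episode, carrying the state between steps."""
    state = [0.0] * state_dim  # empty history at the start of an episode
    actions = []
    for obs in observations:
        state = update_state(state, obs)
        actions.append(select_action(state, obs))
    return actions, state

# Two episodes that end with the same observation but differ earlier on.
history_a = [[1.0, 0.0], [0.0, 1.0]]
history_b = [[-1.0, 0.0], [0.0, 1.0]]
actions_a, _ = rollout(history_a)
actions_b, _ = rollout(history_b)
print(actions_a[-1], actions_b[-1])
```

Because the state summarizes earlier steps, the two final actions differ even though the final observations are identical, which is the behavior a memoryless policy cannot produce under partial observability.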
Project Page: https://liauto-dsr.github.io/AVA-VLA-Page/
Code: https://github.com/LiAuto-DSR/AVA-VLA
@article{xiao2025ava,
title={AVA-VLA: Improving Vision-Language-Action models with Active Visual Attention},
author={Xiao, Lei and Li, Jifeng and Gao, Juntao and Ye, Feiyang and Jin, Yan and Qian, Jingjing and Zhang, Jing and Wu, Yong and Yu, Xiaoyuan},
journal={arXiv preprint arXiv:2511.18960},
year={2025}
}