---
license: apache-2.0
language:
- en
tags:
- gui-agent
- multimodal
- qwen2-vl
base_model: Qwen/Qwen2-VL-2B-Instruct
pipeline_tag: image-text-to-text
---
iSHIFT: Lightweight Slow-Fast GUI Agent with Adaptive Perception (CVPR'26)
This is the checkpoint for iSHIFT, a 2.5B-parameter GUI agent that combines latent thinking (implicit chain-of-thought reasoning) with an adaptive visual perception module built on DINOv2-Large.
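The 2.5B figure is consistent with a rough parameter budget for the listed components. The sketch below is a back-of-envelope check only. It assumes, without confirmation from this card, that the model is approximately the Qwen2-VL-2B-Instruct base plus a DINOv2-Large encoder, and it uses approximate publicly reported sizes for those two models (about 2.21B and 0.304B). The helper name `approx_total_params` is hypothetical, and any projectors or adapters iSHIFT adds are ignored.

```python
# Back-of-envelope parameter budget (in billions).
# ASSUMPTION: iSHIFT is roughly the Qwen2-VL-2B-Instruct base plus a
# DINOv2-Large perception encoder; extra projector/adapter weights are ignored.
QWEN2_VL_2B = 2.21    # approx.: ~1.5B language model + ~0.675B ViT encoder
DINOV2_LARGE = 0.304  # approx.: DINOv2 ViT-L/14 image encoder


def approx_total_params(components: dict[str, float]) -> float:
    """Hypothetical helper: sum component sizes given in billions."""
    return sum(components.values())


total = approx_total_params({"backbone": QWEN2_VL_2B, "perception": DINOV2_LARGE})
print(f"~{total:.2f}B parameters")
```

The sum lands at about 2.5B, which matches the stated model size to one decimal place.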
Citation
@article{mehrotra2025ishift,
  title={iSHIFT: Lightweight Slow-Fast GUI Agent with Adaptive Perception},
  author={Mehrotra, Sarthak and Rebbapragada, Sairam VC and Bonthu, Mani Hemanth Reddy and Balasubramanian, Vineeth N},
  journal={arXiv preprint arXiv:2512.22009},
  year={2025}
}