---
license: apache-2.0
language:
  - en
tags:
  - gui-agent
  - multimodal
  - qwen2-vl
base_model: Qwen/Qwen2-VL-2B-Instruct
pipeline_tag: image-text-to-text
---

iSHIFT: Lightweight Slow-Fast GUI Agent with Adaptive Perception (CVPR'26)


This is the checkpoint for iSHIFT, a 2.5B-parameter GUI agent that combines latent thinking (implicit chain-of-thought) with an adaptive visual perception module built on DINOv2-Large.
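
As a starting point, the checkpoint files can be fetched with `huggingface_hub`. The snippet below is a minimal sketch, not the official loading procedure. The repo id `SarthakM320/iSHIFT` is an assumption inferred from the uploader name and repository title, and `download_checkpoint` and `list_weight_files` are hypothetical helper names. Because iSHIFT adds its own perception and latent-thinking components on top of Qwen2-VL, actually running the model most likely requires the model code from the project's GitHub repository rather than a stock `transformers` class.

```python
from pathlib import Path
from typing import List

# Assumption: repo id inferred from the uploader and repo name on the model page.
REPO_ID = "SarthakM320/iSHIFT"


def download_checkpoint(repo_id: str = REPO_ID, local_dir: str = "ishift_ckpt") -> Path:
    """Download every file in the model repo and return the local directory."""
    # Imported lazily so the helpers below work without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return Path(snapshot_download(repo_id=repo_id, local_dir=local_dir))


def list_weight_files(ckpt_dir) -> List[str]:
    """Return the sorted names of weight files found directly in ckpt_dir."""
    weight_suffixes = {".safetensors", ".bin", ".pt"}
    return sorted(
        p.name for p in Path(ckpt_dir).iterdir() if p.suffix in weight_suffixes
    )
```

Calling `list_weight_files(download_checkpoint())` downloads the repository once and shows which weight files it contains, which is a quick way to check the download before wiring the checkpoint into the project's own inference code.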

Citation

@article{mehrotra2024ishift,
  title={iSHIFT: Lightweight Slow-Fast GUI Agent with Adaptive Perception},
  author={Mehrotra, Sarthak and Rebbapragada, Sairam VC and Bonthu, Mani Hemanth Reddy and Balasubramanian, Vineeth N},
  journal={arXiv preprint arXiv:2512.22009},
  year={2025}
}