iSHIFT: Lightweight Slow-Fast GUI Agent with Adaptive Perception (CVPR'26)


This is the checkpoint for iSHIFT, a 2.5B-parameter GUI agent that combines latent thinking (implicit chain-of-thought) with an adaptive visual perception module built on DINOv2-Large.
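As a minimal sketch of how the checkpoint files could be fetched, the snippet below uses `snapshot_download` from the `huggingface_hub` library with the repository id shown on this page. The helper name `download_checkpoint` is hypothetical, and the snippet only downloads files: it assumes nothing about how the weights are loaded or run, which this page does not describe.

```python
REPO_ID = "SarthakM320/iSHIFT"  # repository id as shown on this model page


def download_checkpoint(local_dir=None):
    """Download every file in the checkpoint repository and return the local path.

    Hypothetical helper for illustration; requires `pip install huggingface_hub`
    and network access when called.
    """
    # Imported lazily so the module can be loaded without the dependency installed.
    from huggingface_hub import snapshot_download

    # With local_dir=None the files land in the default Hugging Face cache.
    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)
```

Calling `download_checkpoint()` returns the directory that holds the downloaded files; passing `local_dir="./ishift"` would place them in that folder instead of the cache.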

Citation

@article{mehrotra2024ishift,
  title={iSHIFT: Lightweight Slow-Fast GUI Agent with Adaptive Perception},
  author={Mehrotra, Sarthak and Rebbapragada, Sairam VC and Bonthu, Mani Hemanth Reddy and Balasubramanian, Vineeth N},
  journal={arXiv preprint arXiv:2512.22009},
  year={2025}
}

Base model: Qwen/Qwen2-VL-2B (this model is a fine-tune of it)