FocusUI-7B

This model was introduced in the paper:

FocusUI: Efficient UI Grounding via Position-Preserving Visual Token Selection

🖼️ Project Page: https://showlab.github.io/FocusUI/
🏠 Github Repo: https://github.com/showlab/FocusUI
📝 Paper: https://arxiv.org/pdf/2601.03928

Model Zoo

Model	Backbone	🤗 HuggingFace
FocusUI-3B	Qwen2.5-VL-3B	https://huggingface.co/yyyang/FocusUI-3B
FocusUI-7B	Qwen2.5-VL-7B	https://huggingface.co/yyyang/FocusUI-7B
FocusUI-2B	Qwen3-VL-2B	https://huggingface.co/yyyang/FocusUI-Qwen3-VL-2B
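
The checkpoints listed above can be fetched programmatically. The following is a minimal sketch, assuming the `huggingface_hub` package is installed; `MODEL_ZOO`, `repo_id_for`, and `download_checkpoint` are hypothetical names introduced here for illustration and are not part of the FocusUI codebase. The repo ids are taken from the table's URLs.

```python
# Repo ids copied from the Model Zoo table (the path part of each URL).
MODEL_ZOO = {
    "FocusUI-3B": "yyyang/FocusUI-3B",
    "FocusUI-7B": "yyyang/FocusUI-7B",
    "FocusUI-2B": "yyyang/FocusUI-Qwen3-VL-2B",
}


def repo_id_for(name: str) -> str:
    """Return the Hugging Face repo id for a Model Zoo entry (hypothetical helper)."""
    try:
        return MODEL_ZOO[name]
    except KeyError:
        raise ValueError(
            f"unknown model {name!r}; choose from {sorted(MODEL_ZOO)}"
        ) from None


def download_checkpoint(name: str) -> str:
    """Download every file of one checkpoint and return the local directory path."""
    # Imported lazily so the lookup helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id_for(name))
```

Calling `download_checkpoint("FocusUI-7B")` requires network access and only fetches the files; the GitHub repo linked above is the place to look for the model code needed to run them.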

Dataset & Benchmarks

For the training and evaluation data, see FocusUI-Training-Data and UI-Grounding-Benchmarks.

Citation

@article{ouyang2025focusui,
  title   = {FocusUI: Efficient UI Grounding via Position-Preserving Visual Token Selection},
  author  = {Ouyang, Mingyu and Lin, Kevin Qinghong and Shou, Mike Zheng and Ng, Hwee Tou},
  year    = {2025},
  journal = {arXiv preprint arXiv:2601.03928},
}