# FocusUI-3B

This model was introduced in the paper:

**FocusUI: Efficient UI Grounding via Position-Preserving Visual Token Selection**
- 🖼️ Project Page: https://showlab.github.io/FocusUI/
- 🏠 GitHub Repo: https://github.com/showlab/FocusUI
- 📝 Paper: https://arxiv.org/pdf/2601.03928

### Model Zoo

| Model | Backbone | 🤗 HuggingFace |
|-------|----------|-------------|
| FocusUI-3B | Qwen2.5-VL-3B | [https://huggingface.co/yyyang/FocusUI-3B](https://huggingface.co/yyyang/FocusUI-3B) |
| FocusUI-7B | Qwen2.5-VL-7B | [https://huggingface.co/yyyang/FocusUI-7B](https://huggingface.co/yyyang/FocusUI-7B) |
| FocusUI-2B | Qwen3-VL-2B | [https://huggingface.co/yyyang/FocusUI-Qwen3-VL-2B](https://huggingface.co/yyyang/FocusUI-Qwen3-VL-2B) |
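
To fetch the files for one of the checkpoints listed above, something like the following may work. This is a minimal sketch, not an official loading recipe: it assumes the `huggingface_hub` package is installed, and the names `FOCUSUI_REPOS` and `fetch_checkpoint` are hypothetical helpers introduced here for illustration. It only downloads the repository files; see the GitHub repo for how to actually run inference.

```python
# Repo IDs copied from the Model Zoo table; the dict keys are just labels.
FOCUSUI_REPOS = {
    "FocusUI-3B": "yyyang/FocusUI-3B",
    "FocusUI-7B": "yyyang/FocusUI-7B",
    "FocusUI-2B": "yyyang/FocusUI-Qwen3-VL-2B",
}


def fetch_checkpoint(name: str) -> str:
    """Download one checkpoint's files and return the local directory path.

    Hypothetical helper: raises KeyError if `name` is not in FOCUSUI_REPOS.
    """
    # Imported lazily so the mapping above is usable without huggingface_hub.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=FOCUSUI_REPOS[name])
```

Calling `fetch_checkpoint("FocusUI-3B")` would download the snapshot into the local Hugging Face cache and return its path.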

### Dataset & Benchmarks

For the training and evaluation data, see [FocusUI-Training-Data](https://huggingface.co/datasets/yyyang/FocusUI-Training-Data) and [UI-Grounding-Benchmarks](https://huggingface.co/datasets/yyyang/UI-Grounding-Benchmarks/).

### Citation
```bibtex
@article{ouyang2025focusui,
  title   = {FocusUI: Efficient UI Grounding via Position-Preserving Visual Token Selection},
  author  = {Ouyang, Mingyu and Lin, Kevin Qinghong and Shou, Mike Zheng and Ng, Hwee Tou},
  year    = {2025},
  journal = {arXiv preprint},
}
```