---
language:
- en
library_name: lerobot
pipeline_tag: robotics
tags:
- vision-language-action
- gui-agent
- flow-matching
- drag-and-drop
- lerobot
inference: false
---

# ShowUI-π

ShowUI-π is a Vision-Language-Action model for GUI drag-and-drop, built on [SmolVLA](https://huggingface.co/lerobot/smolvla_base) (500M parameters). It uses a flow-matching action head to predict drag trajectories from a single screenshot and a natural-language instruction.

**Paper:** [ShowUI-π: Flow-based Generative Models as GUI Dexterous Hands](https://arxiv.org/abs/2512.24965)

**Code:** [https://github.com/showlab/showui-pi](https://github.com/showlab/showui-pi)

**Training Data:** [showlab/ShowUI-pi-data](https://huggingface.co/datasets/showlab/ShowUI-pi-data)

**Evaluation Benchmark:** [h-siyuan/ScreenDrag](https://huggingface.co/datasets/h-siyuan/ScreenDrag)
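
For readers new to flow matching, the toy sketch below illustrates the general idea behind a flow-matching action head: start from random noise and integrate a velocity field over a few Euler steps until the sample becomes an action trajectory. It is not the ShowUI-π implementation. The helper names (`make_target_drag`, `oracle_velocity`, `sample_trajectory`), the straight-line target, the normalized coordinates, and the time convention (noise at `t = 0`, data at `t = 1`) are all assumptions made for illustration. In the real model a neural network conditioned on the screenshot and the instruction would take the place of `oracle_velocity`.

```python
import random


def make_target_drag(start, end, horizon):
    """Hypothetical drag path: a straight line of (x, y) points from start to end."""
    return [
        (start[0] + (end[0] - start[0]) * i / (horizon - 1),
         start[1] + (end[1] - start[1]) * i / (horizon - 1))
        for i in range(horizon)
    ]


def oracle_velocity(x, t, target):
    """Stand-in for the learned network: velocity of the straight path from x to target."""
    return [
        ((tx - px) / (1.0 - t), (ty - py) / (1.0 - t))
        for (px, py), (tx, ty) in zip(x, target)
    ]


def sample_trajectory(target, num_steps=10, seed=0):
    """Integrate the velocity field from noise (t = 0) towards data (t = 1)."""
    rng = random.Random(seed)
    # Start from Gaussian noise with the same shape as the action trajectory.
    x = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in target]
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t stays below 1, so the oracle never divides by zero
        v = oracle_velocity(x, t, target)
        # One explicit Euler step along the velocity field.
        x = [(px + dt * vx, py + dt * vy) for (px, py), (vx, vy) in zip(x, v)]
    return x


# Toy usage: a 5-point drag in normalized screen coordinates (assumed convention).
target = make_target_drag(start=(0.2, 0.3), end=(0.8, 0.6), horizon=5)
trajectory = sample_trajectory(target)
```

With the oracle velocity the sampled `trajectory` lands on `target` up to floating-point error. A trained model only approximates this field, so its samples are approximate as well.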

## Quick start

```bash
git clone https://github.com/showlab/showui-pi.git
cd showui-pi
pip install -e .
```

### Inference

```python
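import torch
from lerobot.policies.smolvla.modeling_smolvla import SmolVLAPolicy
from lerobot.policies.factory import make_pre_post_processors

# Use the GPU when one is available, otherwise fall back to CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the pretrained policy and switch it to evaluation mode.
policy = SmolVLAPolicy.from_pretrained("showlab/ShowUI-pi").to(device).eval()

# Load the pre- and post-processors for this checkpoint on the same device.
preprocessor, postprocessor = make_pre_post_processors(
    policy.config,
    "showlab/ShowUI-pi",
    preprocessor_overrides={"device_processor": {"device": device}},
)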
```

## Training

```bash
bash scripts/train_showui_pi.sh
```

See the [training script](https://github.com/showlab/showui-pi/blob/main/scripts/train_showui_pi.sh) for all flags and defaults.

## Evaluation

### DEX Benchmark

```bash
PYTHONPATH=lerobot/src \
python scripts/eval_dex.py \
    --ckpt <path/to/checkpoint> \
    --output_dir outputs/eval_dex
```

### ScreenSpot-Pro

```bash
PYTHONPATH=lerobot/src \
python scripts/eval_screenspot_pro.py \
    --ckpt <path/to/checkpoint> \
    --annotations_root <path/to/ScreenSpot-Pro/annotations> \
    --images_root <path/to/ScreenSpot-Pro/images>
```

## Citation

```bibtex
@article{hu2025showui,
  title={ShowUI-$\pi$: Flow-based Generative Models as GUI Dexterous Hands},
  author={Hu, Siyuan and Lin, Kevin Qinghong and Shou, Mike Zheng},
  journal={arXiv preprint arXiv:2512.24965},
  year={2025}
}
```