	---
	language:
	- en
	library_name: lerobot
	pipeline_tag: robotics
	tags:
	- vision-language-action
	- gui-agent
	- flow-matching
	- drag-and-drop
	- lerobot
	inference: false
	---

	# ShowUI-π

	ShowUI-π is a Vision-Language-Action model for GUI drag-and-drop, built on [SmolVLA](https://huggingface.co/lerobot/smolvla_base) (500M parameters). It uses a flow-matching action head to predict drag trajectories from a single screenshot and a natural-language instruction.
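
To give a feel for what a flow-matching action head does at inference time, the sketch below integrates a velocity field from Gaussian noise to a short 2D trajectory using Euler steps. It is an illustrative toy under stated assumptions, not the ShowUI-π implementation: `sample_trajectory` and `make_toy_velocity` are hypothetical names, the closed-form velocity stands in for the learned network (which would also condition on the screenshot and instruction), and the time convention (noise at `t=0`, data at `t=1`) is chosen for readability and may differ from the one used in the codebase.

```python
import random


def sample_trajectory(velocity_fn, horizon, action_dim, num_steps=10, seed=0):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 (noise) to t=1 with Euler steps."""
    rng = random.Random(seed)
    # Start from Gaussian noise with shape (horizon, action_dim).
    x = [[rng.gauss(0.0, 1.0) for _ in range(action_dim)] for _ in range(horizon)]
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        v = velocity_fn(x, t)
        # Euler update: move every waypoint a small step along its velocity.
        x = [
            [xi + dt * vi for xi, vi in zip(x_row, v_row)]
            for x_row, v_row in zip(x, v)
        ]
    return x


def make_toy_velocity(target):
    """Hypothetical stand-in for a learned network.

    Returns the velocity of the straight-line path from the current
    sample to `target`, i.e. (target - x) / (1 - t).
    """
    def velocity_fn(x, t):
        return [
            [(ti - xi) / (1.0 - t) for xi, ti in zip(x_row, t_row)]
            for x_row, t_row in zip(x, target)
        ]
    return velocity_fn


# Toy drag: five waypoints on a straight line from (0.2, 0.3) to (0.8, 0.7),
# expressed in normalized screen coordinates (an assumption for this example).
horizon = 5
target = [
    [0.2 + 0.6 * i / (horizon - 1), 0.3 + 0.4 * i / (horizon - 1)]
    for i in range(horizon)
]
trajectory = sample_trajectory(make_toy_velocity(target), horizon, action_dim=2)
```

With this toy velocity field the integration lands on `target` up to floating-point error, regardless of the starting noise. A trained model replaces `velocity_fn` with a network, so the end point is a predicted trajectory rather than a known one.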

	Paper: [ShowUI-π: Flow-based Generative Models as GUI Dexterous Hands](https://arxiv.org/abs/2512.24965)

	Code: [https://github.com/showlab/showui-pi](https://github.com/showlab/showui-pi)

	Training Data: [showlab/ShowUI-pi-data](https://huggingface.co/datasets/showlab/ShowUI-pi-data)

	Evaluation Benchmark: [h-siyuan/ScreenDrag](https://huggingface.co/datasets/h-siyuan/ScreenDrag)

	## Quick start

	```bash
	git clone https://github.com/showlab/showui-pi.git
	cd showui-pi
	pip install -e .
	```

	### Inference

	```python
	import torch
	from lerobot.policies.smolvla.modeling_smolvla import SmolVLAPolicy
	from lerobot.policies.factory import make_pre_post_processors

	# Load the pretrained policy from the Hub and switch to inference mode.
	policy = SmolVLAPolicy.from_pretrained("showlab/ShowUI-pi").to("cuda").eval()

	# Build the pre- and post-processors for this checkpoint, placing tensors on the GPU.
	preprocessor, postprocessor = make_pre_post_processors(
	    policy.config,
	    "showlab/ShowUI-pi",
	    preprocessor_overrides={"device_processor": {"device": "cuda"}},
	)
	```

	## Training

	```bash
	bash scripts/train_showui_pi.sh
	```

	See the [training script](https://github.com/showlab/showui-pi/blob/main/scripts/train_showui_pi.sh) for all flags and defaults.

	## Evaluation

	### DEX Benchmark

	```bash
	PYTHONPATH=lerobot/src \
	python scripts/eval_dex.py \
	    --ckpt <path/to/checkpoint> \
	    --output_dir outputs/eval_dex
	```

	### ScreenSpot-Pro

	```bash
	PYTHONPATH=lerobot/src \
	python scripts/eval_screenspot_pro.py \
	    --ckpt <path/to/checkpoint> \
	    --annotations_root <path/to/ScreenSpot-Pro/annotations> \
	    --images_root <path/to/ScreenSpot-Pro/images>
	```

	## Citation

	```bibtex
	@article{hu2025showui,
	  title={ShowUI-$\pi$: Flow-based Generative Models as GUI Dexterous Hands},
	  author={Hu, Siyuan and Lin, Kevin Qinghong and Shou, Mike Zheng},
	  journal={arXiv preprint arXiv:2512.24965},
	  year={2025}
	}
	```