metadata
language:
- en
library_name: lerobot
pipeline_tag: robotics
tags:
- vision-language-action
- gui-agent
- flow-matching
- drag-and-drop
- lerobot
inference: false
ShowUI-π
ShowUI-π is a Vision-Language-Action model for GUI drag-and-drop, built on SmolVLA (500M). It uses a flow-matching action head to predict drag trajectories from a single screenshot and a natural-language instruction.
Paper: ShowUI-π: Flow-based Generative Models as GUI Dexterous Hands
Code: https://github.com/showlab/showui-pi
Training Data: showlab/ShowUI-pi-data
Evaluation Benchmark: h-siyuan/ScreenDrag
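To make the flow-matching idea in the description concrete, here is a minimal, dependency-free sketch of how a sampler can turn noise into a drag trajectory by integrating a velocity field. It is not the ShowUI-π implementation: `toy_velocity` is a hypothetical stand-in for the learned action head, the time convention (noise at t=0, trajectory at t=1) and the normalized (x, y) action format are assumptions, and the real model conditions on the screenshot and instruction rather than on a known target.

```python
import random


def toy_velocity(x, t, target):
    """Hypothetical stand-in for the learned action head.

    Returns the velocity that moves each point along a straight line
    toward a known target trajectory (a real model predicts this from
    the screenshot and the instruction).
    """
    return [
        [(tx - px) / (1.0 - t), (ty - py) / (1.0 - t)]
        for (px, py), (tx, ty) in zip(x, target)
    ]


def sample_trajectory(velocity_fn, horizon, num_steps=10, seed=0):
    """Euler-integrate dx/dt = v(x, t) from t=0 (noise) to t=1 (actions)."""
    rng = random.Random(seed)
    # Start from Gaussian noise: one (x, y) point per trajectory step.
    x = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(horizon)]
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        v = velocity_fn(x, t)
        x = [[px + dt * vx, py + dt * vy] for (px, py), (vx, vy) in zip(x, v)]
    return x


# Assumed example: a 5-point straight drag in normalized screen coordinates.
target = [[0.2 + 0.6 * i / 4, 0.3 + 0.3 * i / 4] for i in range(5)]
traj = sample_trajectory(lambda x, t: toy_velocity(x, t, target), horizon=5)
```

With this toy field the integration lands on `target` up to floating-point error. With a learned field, the number of integration steps trades speed against trajectory accuracy.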
Quick start
git clone https://github.com/showlab/showui-pi.git
cd showui-pi
pip install -e .
Inference
import torch
from lerobot.policies.smolvla.modeling_smolvla import SmolVLAPolicy
from lerobot.policies.factory import make_pre_post_processors
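# Load the pretrained policy, move it to the GPU, and switch to eval mode.
policy = SmolVLAPolicy.from_pretrained("showlab/ShowUI-pi").to("cuda").eval()

# Build the matching pre- and post-processors from the same checkpoint,
# placing preprocessed inputs on the same device as the policy.
preprocessor, postprocessor = make_pre_post_processors(
    policy.config,
    "showlab/ShowUI-pi",
    preprocessor_overrides={"device_processor": {"device": "cuda"}},
)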
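The snippet above stops once the policy and processors are loaded. A single prediction step might then look like the sketch below, which chains the three objects in the usual LeRobot order (preprocess, `select_action`, postprocess). The helper name `predict_action` and the observation keys in the comment are assumptions for illustration. Check the checkpoint's config for the feature names this model actually expects.

```python
def predict_action(policy, preprocessor, postprocessor, observation):
    """Run one preprocess -> policy -> postprocess step.

    `observation` is a dict of raw inputs. The keys are an assumption
    here, for example:
        {
            "observation.images.screenshot": image_tensor,  # hypothetical key
            "task": "Drag the slider to 50%",
        }
    """
    batch = preprocessor(observation)      # normalize inputs, move to device
    action = policy.select_action(batch)   # next action from the policy
    return postprocessor(action)           # map back to the output space
```

In real use you would likely call this inside `torch.inference_mode()` so that no gradients are tracked.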
Training
bash scripts/train_showui_pi.sh
See the training script for all flags and defaults.
Evaluation
DEX Benchmark
PYTHONPATH=lerobot/src \
python scripts/eval_dex.py \
--ckpt <path/to/checkpoint> \
--output_dir outputs/eval_dex
ScreenSpot-Pro
PYTHONPATH=lerobot/src \
python scripts/eval_screenspot_pro.py \
--ckpt <path/to/checkpoint> \
--annotations_root <path/to/ScreenSpot-Pro/annotations> \
--images_root <path/to/ScreenSpot-Pro/images>
Citation
@article{hu2025showui,
title={ShowUI-$\pi$: Flow-based Generative Models as GUI Dexterous Hands},
author={Hu, Siyuan and Lin, Kevin Qinghong and Shou, Mike Zheng},
journal={arXiv preprint arXiv:2512.24965},
year={2025}
}