---
license: mit
library_name: ultralytics
tags:
- object-detection
- yolo
- gui
- ui-detection
- omniparser
pipeline_tag: object-detection
---
GPA-GUI-Detector
A YOLO-based GUI element detection model that detects interactive UI elements (icons, buttons, etc.) on screen for GUI Process Automation. The model is fine-tuned from the OmniParser ecosystem.
Model
The model weight file is model.pt. It is a YOLO model trained with the Ultralytics framework.
Installation
pip install ultralytics
Usage
Basic Inference
from ultralytics import YOLO
model = YOLO("model.pt")
results = model("screenshot.png")
Detection with Custom Parameters
from ultralytics import YOLO
from PIL import Image, ImageDraw
# Load the model
model = YOLO("model.pt")
# Run inference with custom confidence and image size
results = model.predict(
    source="screenshot.png",
    conf=0.05,  # confidence threshold
    imgsz=640,  # input image size
    iou=0.7,  # NMS IoU threshold
)
# Parse results
boxes = results[0].boxes.xyxy.cpu().numpy()  # bounding boxes in [x1, y1, x2, y2]
scores = results[0].boxes.conf.cpu().numpy()  # confidence scores
# Draw results on image
img = Image.open("screenshot.png").convert("RGB")
draw = ImageDraw.Draw(img)
for box, score in zip(boxes, scores):
    x1, y1, x2, y2 = box.tolist()  # plain Python floats
    draw.rectangle([x1, y1, x2, y2], outline="red", width=2)
    print(f"Detected UI element at [{x1:.0f}, {y1:.0f}, {x2:.0f}, {y2:.0f}] (conf: {score:.2f})")
img.save("result_custom.png")  # hypothetical output path
# Or save the annotated image directly
results[0].save("result.png")
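The `imgsz` setting matters for small UI elements. As a rough sketch, assuming the screenshot is resized with its aspect ratio preserved so that it fits inside an `imgsz` x `imgsz` square (the helper name `letterbox_scale` and the 24-pixel icon are illustrative, not part of this model or of Ultralytics), the effective downscaling can be estimated like this:

```python
def letterbox_scale(width: int, height: int, imgsz: int) -> float:
    """Scale factor for an aspect-preserving resize into an imgsz x imgsz square."""
    return min(imgsz / width, imgsz / height)


# A 1920x1080 screenshot containing a hypothetical 24-pixel-wide icon
for imgsz in (640, 1280):
    scale = letterbox_scale(1920, 1080, imgsz)
    print(f"imgsz={imgsz}: scale={scale:.3f}, 24 px icon -> {24 * scale:.1f} px")
# imgsz=640: scale=0.333, 24 px icon -> 8.0 px
# imgsz=1280: scale=0.667, 24 px icon -> 16.0 px
```

Under this assumption, a larger `imgsz` keeps small icons at a more detectable size, at the cost of slower inference.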
Integration with OmniParser
import sys
sys.path.append("/path/to/OmniParser")
from util.utils import get_yolo_model, predict_yolo
from PIL import Image
model = get_yolo_model("model.pt")
image = Image.open("screenshot.png")
boxes, confidences, phrases = predict_yolo(
    model=model,
    image=image,
    box_threshold=0.05,
    imgsz=640,
    scale_img=False,
    iou_threshold=0.7,
)
for i, (box, conf) in enumerate(zip(boxes, confidences)):
    print(f"Element {i}: box={box.tolist()}, confidence={conf:.2f}")
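For GUI automation, detected boxes usually need to be turned into click targets. The following is a minimal sketch, assuming boxes are given as `[x1, y1, x2, y2]` in pixel coordinates of the original screenshot; the helper name `box_centers` and the sample detections are hypothetical and not part of this model, Ultralytics, or OmniParser.

```python
from typing import Iterable, List, Sequence, Tuple


def box_centers(
    boxes: Iterable[Sequence[float]],
    scores: Iterable[float],
    min_conf: float = 0.05,
) -> List[Tuple[int, int, float]]:
    """Return (cx, cy, score) for each box with score >= min_conf, best score first."""
    targets = []
    for (x1, y1, x2, y2), score in zip(boxes, scores):
        if score < min_conf:
            continue
        cx = int(round((float(x1) + float(x2)) / 2))
        cy = int(round((float(y1) + float(y2)) / 2))
        targets.append((cx, cy, float(score)))
    return sorted(targets, key=lambda t: t[2], reverse=True)


# Hypothetical detections standing in for real model output
example_boxes = [[10.0, 20.0, 50.0, 60.0], [100.0, 100.0, 140.0, 120.0]]
example_scores = [0.42, 0.91]
print(box_centers(example_boxes, example_scores))
# [(120, 110, 0.91), (30, 40, 0.42)]
```

The same helper should accept the NumPy arrays from the Ultralytics example if they are converted with `.tolist()` first.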
Example
Detection results on a sample screenshot (1920x1080) from the ScreenSpot-Pro benchmark (conf=0.05, iou=0.1, imgsz=1280).
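The `iou` value here is the NMS IoU threshold: a box is dropped when its overlap with a higher-scoring kept box exceeds the threshold, so `iou=0.1` suppresses overlapping boxes far more aggressively than `iou=0.7`. The sketch below illustrates the idea with a plain greedy NMS on made-up boxes; it is an illustration of the concept only, not the implementation Ultralytics uses, and the function names are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold):
    """Return indices of kept boxes, visiting boxes from highest to lowest score."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep the box only if it does not overlap any kept box too much
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Made-up boxes: the first two overlap with IoU = 1/3, the third is isolated
demo_boxes = [[0, 0, 100, 100], [50, 0, 150, 100], [300, 300, 340, 340]]
demo_scores = [0.9, 0.8, 0.7]
print(greedy_nms(demo_boxes, demo_scores, iou_threshold=0.7))  # [0, 1, 2]
print(greedy_nms(demo_boxes, demo_scores, iou_threshold=0.1))  # [0, 2]
```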
Input Screenshot
| OmniParser V2 | GPA-GUI-Detector |
|---|---|
| ![OmniParser V2 detections]() | ![GPA-GUI-Detector detections]() |
License
This model is released under the MIT License.

