---
license: mit
library_name: ultralytics
tags:
- object-detection
- yolo
- gui
- ui-detection
- omniparser
pipeline_tag: object-detection
---

# GPA-GUI-Detector

A YOLO-based GUI element detection model that detects interactive UI elements (icons, buttons, etc.) on screen for GUI Process Automation. This model is fine-tuned from the [OmniParser](https://github.com/microsoft/OmniParser) ecosystem.

## Model

The model weight file is `model.pt`. It is a YOLO model trained with the [Ultralytics](https://github.com/ultralytics/ultralytics) framework.

## Installation

```bash
pip install ultralytics
```

## Usage

### Basic Inference

```python
from ultralytics import YOLO

model = YOLO("model.pt")
results = model("screenshot.png")
```

### Detection with Custom Parameters

```python
from ultralytics import YOLO

# Load the model
model = YOLO("model.pt")

# Run inference with custom confidence and image size
results = model.predict(
    source="screenshot.png",
    conf=0.05,   # confidence threshold
    imgsz=640,   # input image size
    iou=0.7,     # NMS IoU threshold
)

# Parse results
boxes = results[0].boxes.xyxy.cpu().numpy()   # bounding boxes in [x1, y1, x2, y2]
scores = results[0].boxes.conf.cpu().numpy()  # confidence scores

# Print each detection
for box, score in zip(boxes, scores):
    x1, y1, x2, y2 = box
    print(f"Detected UI element at [{x1:.0f}, {y1:.0f}, {x2:.0f}, {y2:.0f}] (conf: {score:.2f})")

# Save the annotated image (boxes are drawn by Ultralytics)
results[0].save("result.png")
```

### Integration with OmniParser

```python
import sys
sys.path.append("/path/to/OmniParser")

from util.utils import get_yolo_model, predict_yolo
from PIL import Image

model = get_yolo_model("model.pt")
image = Image.open("screenshot.png")

boxes, confidences, phrases = predict_yolo(
    model=model,
    image=image,
    box_threshold=0.05,
    imgsz=640,
    scale_img=False,
    iou_threshold=0.7,
)

for i, (box, conf) in enumerate(zip(boxes, confidences)):
    print(f"Element {i}: box={box.tolist()}, confidence={conf:.2f}")
```

## Example

Detection results on a sample screenshot (1920x1080) from the [ScreenSpot-Pro](https://github.com/likaixin2000/ScreenSpot-Pro-GUI-Grounding) benchmark (`conf=0.05`, `iou=0.1`, `imgsz=1280`).

**Input Screenshot**

| OmniParser V2 | GPA-GUI-Detector |
|---|---|
| ![OmniParser V2 detection result]() | ![GPA-GUI-Detector detection result]() |
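
The example above uses `iou=0.1`, while the usage snippets use `iou=0.7`. Because `iou` is the NMS IoU threshold, a lower value suppresses more of the overlapping boxes. The following is a minimal pure-Python sketch of greedy NMS meant only to illustrate that effect. It is a simplified stand-in, not the Ultralytics implementation, and the boxes and scores are hypothetical values chosen for illustration.

```python
def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold):
    """Return indices of kept boxes, highest score first.

    A box is dropped when its IoU with an already-kept box
    exceeds iou_threshold.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in kept):
            kept.append(i)
    return kept


# Hypothetical detections: two overlapping boxes on one button
# (IoU = 2/3) and one separate icon.
boxes = [[100, 100, 200, 140], [120, 100, 220, 140], [400, 300, 432, 332]]
scores = [0.90, 0.60, 0.80]

print(greedy_nms(boxes, scores, iou_threshold=0.7))  # → [0, 2, 1]
print(greedy_nms(boxes, scores, iou_threshold=0.1))  # → [0, 2]
```

With the looser threshold both boxes on the button survive. With `iou_threshold=0.1` the lower-scoring duplicate is removed, which is usually preferable on dense UIs where each element should map to a single click target.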