---

license: mit
library_name: ultralytics
tags:
  - object-detection
  - yolo
  - gui
  - ui-detection
  - omniparser
pipeline_tag: object-detection
---


# GPA-GUI-Detector

A YOLO-based GUI element detection model for detecting interactive UI elements (icons, buttons, etc.) on screen for GUI Process Automation. This model is finetuned from the [OmniParser](https://github.com/microsoft/OmniParser) ecosystem.

## Model

The model weight file is `model.pt`. It is a YOLO model trained with the [Ultralytics](https://github.com/ultralytics/ultralytics) framework.

## Installation

```bash
pip install ultralytics
```

## Usage

### Basic Inference

```python
from ultralytics import YOLO

model = YOLO("model.pt")
results = model("screenshot.png")
```

### Detection with Custom Parameters

```python
from ultralytics import YOLO

# Load the model
model = YOLO("model.pt")

# Run inference with custom confidence and image size
results = model.predict(
    source="screenshot.png",
    conf=0.05,   # confidence threshold
    imgsz=640,   # input image size
    iou=0.7,     # NMS IoU threshold
)

# Parse results
boxes = results[0].boxes.xyxy.cpu().numpy()   # bounding boxes as [x1, y1, x2, y2]
scores = results[0].boxes.conf.cpu().numpy()  # confidence scores

# Print each detection
for box, score in zip(boxes, scores):
    x1, y1, x2, y2 = box
    print(f"Detected UI element at [{x1:.0f}, {y1:.0f}, {x2:.0f}, {y2:.0f}] (conf: {score:.2f})")

# Or save the annotated image directly
results[0].save("result.png")
```
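For GUI Process Automation, the usual next step after detection is clicking the element; the natural click target is the center of its bounding box. Below is a minimal, hypothetical helper (`box_center` is not part of Ultralytics) that converts an `[x1, y1, x2, y2]` box into integer pixel coordinates:

```python
def box_center(box):
    """Return the integer pixel center of an [x1, y1, x2, y2] box.

    The center is a natural click target for GUI automation tools.
    """
    x1, y1, x2, y2 = box
    return (int(round((x1 + x2) / 2)), int(round((y1 + y2) / 2)))


# Example: a detected button spanning x 100-200, y 40-80
print(box_center([100, 40, 200, 80]))  # -> (150, 60)
```

The resulting coordinates can be passed to an automation library of your choice (e.g. `pyautogui.click(x, y)`).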

### Integration with OmniParser

```python
import sys
sys.path.append("/path/to/OmniParser")

from util.utils import get_yolo_model, predict_yolo
from PIL import Image

model = get_yolo_model("model.pt")
image = Image.open("screenshot.png")

boxes, confidences, phrases = predict_yolo(
    model=model,
    image=image,
    box_threshold=0.05,
    imgsz=640,
    scale_img=False,
    iou_threshold=0.7,
)

for i, (box, conf) in enumerate(zip(boxes, confidences)):
    print(f"Element {i}: box={box.tolist()}, confidence={conf:.2f}")
```
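A low `box_threshold` such as 0.05 can return many weak detections. If you only need the most reliable elements, a small helper (hypothetical, not part of OmniParser) can rank the detections by confidence and keep the top few:

```python
def top_k_detections(boxes, confidences, k=5):
    """Return the k highest-confidence (box, confidence) pairs, best first."""
    ranked = sorted(zip(boxes, confidences), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Example with plain lists standing in for the model output
boxes = [[0, 0, 10, 10], [5, 5, 20, 20], [1, 1, 2, 2]]
confs = [0.3, 0.9, 0.6]
print(top_k_detections(boxes, confs, k=2))
# -> [([5, 5, 20, 20], 0.9), ([1, 1, 2, 2], 0.6)]
```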

## Example

Detection results on a sample screenshot (1920x1080) from the [ScreenSpot-Pro](https://github.com/likaixin2000/ScreenSpot-Pro-GUI-Grounding) benchmark (`conf=0.05`, `iou=0.1`, `imgsz=1280`).

**Input Screenshot**

<p align="center">
  <img src="images/example_input.png" width="80%" alt="Input Screenshot"/>
</p>

<table>
  <tr>
    <th align="center">OmniParser V2</th>
    <th align="center">GPA-GUI-Detector</th>
  </tr>
  <tr>
    <td align="center"><img src="images/example_omniparser.png" width="92%" alt="OmniParser V2"/></td>
    <td align="center"><img src="images/example_gpa.png" width="99%" alt="GPA-GUI-Detector"/></td>
  </tr>
</table>


## License

This model is released under the [MIT License](https://opensource.org/licenses/MIT).