HelloKKMe committed · verified
Commit 2115c3c · 1 Parent(s): 884e333

Upload folder using huggingface_hub

.gitattributes CHANGED
@@ -33,3 +33,6 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ images/example_gpa.png filter=lfs diff=lfs merge=lfs -text
+ images/example_input.png filter=lfs diff=lfs merge=lfs -text
+ images/example_omniparser.png filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,117 @@
---
license: mit
library_name: ultralytics
tags:
- object-detection
- yolo
- gui
- ui-detection
- omniparser
pipeline_tag: object-detection
---

# GPA-GUI-Detector

A YOLO-based model for detecting interactive UI elements (icons, buttons, etc.) in screenshots for GUI Process Automation, fine-tuned from the [OmniParser](https://github.com/microsoft/OmniParser) ecosystem.

## Model

The model weights are in `model.pt`, a YOLO model trained with the [Ultralytics](https://github.com/ultralytics/ultralytics) framework.

## Installation

```bash
pip install ultralytics
```

## Usage

### Basic Inference

```python
from ultralytics import YOLO

model = YOLO("model.pt")
results = model("screenshot.png")
```

### Detection with Custom Parameters

```python
from ultralytics import YOLO

# Load the model
model = YOLO("model.pt")

# Run inference with custom confidence and image size
results = model.predict(
    source="screenshot.png",
    conf=0.05,   # confidence threshold
    imgsz=640,   # input image size
    iou=0.7,     # NMS IoU threshold
)

# Parse results
boxes = results[0].boxes.xyxy.cpu().numpy()   # bounding boxes in [x1, y1, x2, y2]
scores = results[0].boxes.conf.cpu().numpy()  # confidence scores

# Print each detection
for box, score in zip(boxes, scores):
    x1, y1, x2, y2 = box
    print(f"Detected UI element at [{x1:.0f}, {y1:.0f}, {x2:.0f}, {y2:.0f}] (conf: {score:.2f})")

# Or save the annotated image directly
results[0].save("result.png")
```
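For GUI Process Automation, detections are usually reduced to click targets. A minimal sketch, assuming `boxes` is the `[x1, y1, x2, y2]` array produced above (the helper name `box_centers` is illustrative, not part of this repository):

```python
import numpy as np

def box_centers(boxes_xyxy: np.ndarray) -> np.ndarray:
    """Return the (x, y) click point at the center of each [x1, y1, x2, y2] box."""
    x1, y1, x2, y2 = boxes_xyxy.T
    return np.stack([(x1 + x2) / 2, (y1 + y2) / 2], axis=1)

# Example: two detected boxes -> centers (20, 40) and (120, 110)
boxes = np.array([[10.0, 20.0, 30.0, 60.0],
                  [100.0, 100.0, 140.0, 120.0]])
print(box_centers(boxes))
```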

### Integration with OmniParser

```python
import sys
sys.path.append("/path/to/OmniParser")

from util.utils import get_yolo_model, predict_yolo
from PIL import Image

model = get_yolo_model("model.pt")
image = Image.open("screenshot.png")

boxes, confidences, phrases = predict_yolo(
    model=model,
    image=image,
    box_threshold=0.05,
    imgsz=640,
    scale_img=False,
    iou_threshold=0.7,
)

for i, (box, conf) in enumerate(zip(boxes, confidences)):
    print(f"Element {i}: box={box.tolist()}, confidence={conf:.2f}")
```
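Both code paths expose an IoU threshold for non-maximum suppression. As a reference for what that threshold compares, here is a minimal pairwise IoU computation over `[x1, y1, x2, y2]` boxes (illustrative only, not the implementation used by Ultralytics or OmniParser):

```python
def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Two boxes sharing half their area overlap with IoU 1/3
print(iou([0, 0, 10, 10], [5, 0, 15, 10]))
```

Two detections whose IoU exceeds the threshold are treated as duplicates, and only the higher-confidence one is kept.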

## Example

Detection results on a sample 1920×1080 screenshot from the [ScreenSpot-Pro](https://github.com/likaixin2000/ScreenSpot-Pro-GUI-Grounding) benchmark (`conf=0.05`, `iou=0.1`, `imgsz=640`).

**Input Screenshot**

<p align="center">
  <img src="images/example_input.png" width="80%" alt="Input Screenshot"/>
</p>

<table>
  <tr>
    <th align="center">OmniParser V2</th>
    <th align="center">GPA-GUI-Detector</th>
  </tr>
  <tr>
    <td align="center"><img src="images/example_omniparser.png" width="92%" alt="OmniParser V2"/></td>
    <td align="center"><img src="images/example_gpa.png" width="99%" alt="GPA-GUI-Detector"/></td>
  </tr>
</table>
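Some grounding pipelines express boxes in normalized `[0, 1]` coordinates rather than pixels; converting is a simple rescale by the screenshot size. A minimal sketch for a 1920×1080 screenshot like the one above (the helper name `normalize_box` is illustrative, not part of this repository):

```python
def normalize_box(box, width, height):
    """Scale a pixel-space [x1, y1, x2, y2] box to [0, 1] coordinates."""
    x1, y1, x2, y2 = box
    return [x1 / width, y1 / height, x2 / width, y2 / height]

# A box around the screen center of a 1920x1080 screenshot
print(normalize_box([960, 540, 1152, 648], 1920, 1080))
```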
114
+
115
+ ## License
116
+
117
+ This model is released under the [MIT License](https://opensource.org/licenses/MIT).
images/example_gpa.png ADDED

Git LFS Details

  • SHA256: 8683810a8c7a85e1802a6f7720a108feee60945bd1ae12069d4cddbc94303115
  • Pointer size: 131 Bytes
  • Size of remote file: 990 kB
images/example_input.png ADDED

Git LFS Details

  • SHA256: 50dbcdfb81ddb4fd01e7dd6c47cfe3a03a362326277d35ea1149e735ef168eab
  • Pointer size: 131 Bytes
  • Size of remote file: 949 kB
images/example_omniparser.png ADDED

Git LFS Details

  • SHA256: 7941784c459b3547e76512715cf744fc3e8ef8656879c6a83f3ea60e37793d34
  • Pointer size: 131 Bytes
  • Size of remote file: 994 kB
model.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:dd404f25b7f329998c7ed97a67827a174dd626dc2683f6844ab33a5219c05f71
+ size 40572716