HelloKKMe committed · verified
Commit 2115c3c · 1 Parent(s): 884e333

Upload folder using huggingface_hub

.gitattributes CHANGED
@@ -33,3 +33,6 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ images/example_gpa.png filter=lfs diff=lfs merge=lfs -text
+ images/example_input.png filter=lfs diff=lfs merge=lfs -text
+ images/example_omniparser.png filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,117 @@
---
license: mit
library_name: ultralytics
tags:
- object-detection
- yolo
- gui
- ui-detection
- omniparser
pipeline_tag: object-detection
---

# GPA-GUI-Detector

A YOLO-based model for detecting interactive UI elements (icons, buttons, etc.) in screenshots for GUI Process Automation, fine-tuned from the [OmniParser](https://github.com/microsoft/OmniParser) ecosystem.

## Model

The model weights are in `model.pt`, a YOLO model trained with the [Ultralytics](https://github.com/ultralytics/ultralytics) framework.

## Installation

```bash
pip install ultralytics
```

## Usage

### Basic Inference

```python
from ultralytics import YOLO

model = YOLO("model.pt")
results = model("screenshot.png")
```

### Detection with Custom Parameters

```python
from ultralytics import YOLO

# Load the model
model = YOLO("model.pt")

# Run inference with custom confidence and image size
results = model.predict(
    source="screenshot.png",
    conf=0.05,   # confidence threshold
    imgsz=640,   # input image size
    iou=0.7,     # NMS IoU threshold
)

# Parse results
boxes = results[0].boxes.xyxy.cpu().numpy()   # bounding boxes in [x1, y1, x2, y2]
scores = results[0].boxes.conf.cpu().numpy()  # confidence scores

# Print each detection
for box, score in zip(boxes, scores):
    x1, y1, x2, y2 = box
    print(f"Detected UI element at [{x1:.0f}, {y1:.0f}, {x2:.0f}, {y2:.0f}] (conf: {score:.2f})")

# Or save the annotated image directly
results[0].save("result.png")
```
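For GUI Process Automation, detections are usually reduced to click targets. A minimal sketch, assuming `boxes` is the `[x1, y1, x2, y2]` array produced above (the helper name `box_centers` is illustrative, not part of this repository):

```python
import numpy as np

def box_centers(boxes_xyxy: np.ndarray) -> np.ndarray:
    """Return the (x, y) click point at the center of each [x1, y1, x2, y2] box."""
    x1, y1, x2, y2 = boxes_xyxy.T
    return np.stack([(x1 + x2) / 2, (y1 + y2) / 2], axis=1)

# Example: two detected boxes -> centers (20, 40) and (120, 110)
boxes = np.array([[10.0, 20.0, 30.0, 60.0],
                  [100.0, 100.0, 140.0, 120.0]])
print(box_centers(boxes))
```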

### Integration with OmniParser

```python
import sys
sys.path.append("/path/to/OmniParser")

from util.utils import get_yolo_model, predict_yolo
from PIL import Image

model = get_yolo_model("model.pt")
image = Image.open("screenshot.png")

boxes, confidences, phrases = predict_yolo(
    model=model,
    image=image,
    box_threshold=0.05,
    imgsz=640,
    scale_img=False,
    iou_threshold=0.7,
)

for i, (box, conf) in enumerate(zip(boxes, confidences)):
    print(f"Element {i}: box={box.tolist()}, confidence={conf:.2f}")
```
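Both code paths expose an IoU threshold for non-maximum suppression. As a reference for what that threshold compares, here is a minimal pairwise IoU computation over `[x1, y1, x2, y2]` boxes (illustrative only, not the implementation used by Ultralytics or OmniParser):

```python
def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Two boxes sharing half their area overlap with IoU 1/3
print(iou([0, 0, 10, 10], [5, 0, 15, 10]))
```

Two detections whose IoU exceeds the threshold are treated as duplicates, and only the higher-confidence one is kept.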

## Example

Detection results on a sample 1920×1080 screenshot from the [ScreenSpot-Pro](https://github.com/likaixin2000/ScreenSpot-Pro-GUI-Grounding) benchmark (`conf=0.05`, `iou=0.1`, `imgsz=640`).

**Input Screenshot**

<p align="center">
  <img src="images/example_input.png" width="80%" alt="Input Screenshot"/>
</p>

<table>
  <tr>
    <th align="center">OmniParser V2</th>
    <th align="center">GPA-GUI-Detector</th>
  </tr>
  <tr>
    <td align="center"><img src="images/example_omniparser.png" width="92%" alt="OmniParser V2"/></td>
    <td align="center"><img src="images/example_gpa.png" width="99%" alt="GPA-GUI-Detector"/></td>
  </tr>
</table>
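Some grounding pipelines express boxes in normalized `[0, 1]` coordinates rather than pixels; converting is a simple rescale by the screenshot size. A minimal sketch for a 1920×1080 screenshot like the one above (the helper name `normalize_box` is illustrative, not part of this repository):

```python
def normalize_box(box, width, height):
    """Scale a pixel-space [x1, y1, x2, y2] box to [0, 1] coordinates."""
    x1, y1, x2, y2 = box
    return [x1 / width, y1 / height, x2 / width, y2 / height]

# A box around the screen center of a 1920x1080 screenshot
print(normalize_box([960, 540, 1152, 648], 1920, 1080))
```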
114
+
115
+ ## License
116
+
117
+ This model is released under the [MIT License](https://opensource.org/licenses/MIT).
images/example_gpa.png ADDED

Git LFS Details

  • SHA256: 8683810a8c7a85e1802a6f7720a108feee60945bd1ae12069d4cddbc94303115
  • Pointer size: 131 Bytes
  • Size of remote file: 990 kB
images/example_input.png ADDED

Git LFS Details

  • SHA256: 50dbcdfb81ddb4fd01e7dd6c47cfe3a03a362326277d35ea1149e735ef168eab
  • Pointer size: 131 Bytes
  • Size of remote file: 949 kB
images/example_omniparser.png ADDED

Git LFS Details

  • SHA256: 7941784c459b3547e76512715cf744fc3e8ef8656879c6a83f3ea60e37793d34
  • Pointer size: 131 Bytes
  • Size of remote file: 994 kB
model.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:dd404f25b7f329998c7ed97a67827a174dd626dc2683f6844ab33a5219c05f71
+ size 40572716