woofah committed · Commit fed44cb (verified) · Parent: acee903

Upload README.md with huggingface_hub

---
license: mit
tags:
- object-detection
- yolo11
- ui-elements
- windows
- ultralytics
datasets:
- ui_synth_v2
pipeline_tag: object-detection
---

# Local UI Locator — YOLO11s for Windows UI Elements

## Model Summary

A YOLO11s (small) model fine-tuned on 3,000 synthetic Windows-style UI screenshots to detect interactive UI elements. Designed as a lightweight computer-vision fallback for Windows UI automation agents when native UI Automation APIs fail.

## Classes

| ID | Class     |
|----|-----------|
| 0  | button    |
| 1  | textbox   |
| 2  | checkbox  |
| 3  | dropdown  |
| 4  | icon      |
| 5  | tab       |
| 6  | menu_item |

+ ## Training Data
33
+
34
+ Trained on `ui_synth_v2`, a synthetic dataset of 3 000 Windows-style UI screenshots generated via HTML/CSS templates rendered with Playwright. Includes domain randomization (themes, fonts, scaling, noise).
35
+
36
+ ## Metrics
37
+
38
+ | Metric | Value |
39
+ |--------------|--------|
40
+ | mAP50 | 0.9886 |
41
+ | mAP50-95 | 0.9543 |
42
+ | Precision | 0.9959 |
43
+ | Recall | 0.9730 |
44
+
45
+ ### Per-Class AP@50
46
+
47
+ | Class | AP@50 |
48
+ |------------|--------|
49
+ | button | 0.9919 |
50
+ | textbox | 0.9771 |
51
+ | checkbox | 0.9864 |
52
+ | dropdown | 0.9829 |
53
+ | icon | 0.9950 |
54
+ | tab | 0.9950 |
55
+ | menu_item | 0.9915 |
56
+
57
+ ## Usage
58
+
59
+ ```python
60
+ from local_ui_locator import detect_elements, find_by_text, safe_click_point
61
+
62
+ # Detect all UI elements in a screenshot
63
+ detections = detect_elements("screenshot.png", conf=0.3)
64
+ for det in detections:
65
+ print(f"{det.type}: {det.bbox} score={det.score:.2f}")
66
+
67
+ # Find element by text
68
+ match = find_by_text("screenshot.png", query="Submit")
69
+ if match:
70
+ x, y = safe_click_point(match.bbox)
71
+ print(f"Click at ({x}, {y})")
72
+ ```
73
+
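A helper like `safe_click_point` plausibly just returns the bounding-box center; the sketch below is an assumption about its behavior, not the actual `local_ui_locator` implementation:

```python
def safe_click_point(bbox):
    """Return an integer click point inside a box.

    bbox is (x1, y1, x2, y2) in pixels. The box center is the natural
    choice, since it stays inside the element even for small boxes.
    Hypothetical stand-in for the package helper of the same name.
    """
    x1, y1, x2, y2 = bbox
    return int((x1 + x2) / 2), int((y1 + y2) / 2)

print(safe_click_point((100, 40, 180, 72)))  # (140, 56)
```
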
### Direct Ultralytics usage

```python
from ultralytics import YOLO

model = YOLO("best.pt")
results = model.predict("screenshot.png", conf=0.3)
```

## Architecture

- **Base model:** YOLO11s (Ultralytics)
- **Input size:** 640 px
- **Parameters:** ~9.4M
- **GFLOPs:** ~21.3
- **Inference speed:** ~44-80 ms on CPU (M2 Pro), ~2-5 ms on GPU (RTX 5060)

## Training

- **GPU:** NVIDIA RTX 5060 8 GB (Blackwell)
- **Dataset:** 3,000 synthetic images (2,400 train / 300 val / 300 test)
- **Epochs:** 120 (early stopping with patience=25)
- **Batch size:** 16
- **Image size:** 640 px
- **Optimizer:** SGD with cosine LR scheduler

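Assuming a standard Ultralytics setup, a run with the hyperparameters above would look roughly like this (the dataset YAML name is hypothetical; the actual training invocation is not published):

```bash
yolo detect train model=yolo11s.pt data=ui_synth_v2.yaml \
  epochs=120 patience=25 batch=16 imgsz=640 optimizer=SGD cos_lr=True
```
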
## Limitations

- Trained on synthetic data only; expect a domain gap on real-world Windows UIs
- Best on standard Windows 10/11 UI; custom-styled applications may perform worse
- Does not read text content (pair it with OCR for that)
- Only the 7 classes above; more complex widget types are not supported

## License

MIT