---
license: mit
tags:
- object-detection
- yolo11
- ui-elements
- windows
- ultralytics
datasets:
- ui_synth_v2
pipeline_tag: object-detection
---

# Windows UI Element Detector (YOLO11s)

## Model Summary

A YOLO11s (small) model fine-tuned on 3 000 synthetic Windows-style UI screenshots to detect interactive UI elements. It is designed as a lightweight computer-vision fallback for Windows UI automation agents when the native UI Automation APIs fail.

## Classes

| ID | Class     |
|----|-----------|
| 0  | button    |
| 1  | textbox   |
| 2  | checkbox  |
| 3  | dropdown  |
| 4  | icon      |
| 5  | tab       |
| 6  | menu_item |
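
The class table can be mirrored as a Python mapping, for example to build the `classes=` filter that Ultralytics `predict` accepts (a list of integer class IDs). This is a minimal sketch: `CLASS_NAMES` and `class_ids` are illustrative names, not part of the released model, and a loaded Ultralytics model exposes the same ID-to-name mapping as `model.names`.

```python
# Class IDs as listed in the Classes table (illustrative constant, not shipped with the model)
CLASS_NAMES = {
    0: "button",
    1: "textbox",
    2: "checkbox",
    3: "dropdown",
    4: "icon",
    5: "tab",
    6: "menu_item",
}
NAME_TO_ID = {name: idx for idx, name in CLASS_NAMES.items()}


def class_ids(names):
    """Translate class names into the sorted integer IDs used by predict(classes=...)."""
    unknown = [n for n in names if n not in NAME_TO_ID]
    if unknown:
        raise ValueError(f"unknown class names: {unknown}")
    return sorted(NAME_TO_ID[n] for n in names)


print(class_ids(["textbox", "button"]))  # [0, 1]
```

With a loaded model, `model.predict("screenshot.png", classes=class_ids(["button", "textbox"]))` would then return only buttons and text boxes.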

## Training Data

Trained on `ui_synth_v2`, a synthetic dataset of 3 000 Windows-style UI screenshots generated from HTML/CSS templates rendered with Playwright. The rendering applies domain randomization (themes, fonts, scaling, noise).
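
The card does not show how labels were produced. As an assumption-labelled sketch: if element boxes are read from the rendered page with Playwright's `locator.bounding_box()` (which returns `x`, `y`, `width`, `height` in CSS pixels), each box can be converted into a line of the YOLO label format (`class cx cy w h`, normalized to the image size). The `scale` argument covers scaling randomization, where the screenshot is captured at a device scale factor above 1. `to_yolo_line` is a hypothetical helper, not the dataset's actual tooling.

```python
def to_yolo_line(class_id, box, img_w, img_h, scale=1.0):
    """Hypothetical helper: convert a CSS-pixel box dict into a normalized YOLO label line.

    box:          {"x", "y", "width", "height"} in CSS pixels (the shape Playwright returns).
    img_w, img_h: screenshot size in device pixels.
    scale:        device scale factor used for the screenshot (e.g. 1.25 for 125 % scaling).
    Returns None if the box lies entirely outside the screenshot.
    """
    # CSS pixels -> screenshot pixels
    x0 = box["x"] * scale
    y0 = box["y"] * scale
    x1 = x0 + box["width"] * scale
    y1 = y0 + box["height"] * scale

    # Clip to the visible image so partially off-screen elements stay valid
    x0, y0 = max(0.0, x0), max(0.0, y0)
    x1, y1 = min(float(img_w), x1), min(float(img_h), y1)
    if x1 <= x0 or y1 <= y0:
        return None

    # YOLO format: class, center x, center y, width, height (all normalized to 0..1)
    cx = (x0 + x1) / 2 / img_w
    cy = (y0 + y1) / 2 / img_h
    w = (x1 - x0) / img_w
    h = (y1 - y0) / img_h
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


box = {"x": 100, "y": 50, "width": 80, "height": 30}
print(to_yolo_line(0, box, 1280, 720))  # 0 0.109375 0.090278 0.062500 0.041667
```

Because the output is normalized, the same element rendered at 200 % scaling into a 2560x1440 screenshot (`scale=2.0`) yields the identical label line.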

## Metrics

| Metric    | Value  |
|-----------|--------|
| mAP50     | 0.9886 |
| mAP50-95  | 0.9543 |
| Precision | 0.9959 |
| Recall    | 0.9730 |

### Per-Class AP@50

| Class     | AP@50  |
|-----------|--------|
| button    | 0.9919 |
| textbox   | 0.9771 |
| checkbox  | 0.9864 |
| dropdown  | 0.9829 |
| icon      | 0.9950 |
| tab       | 0.9950 |
| menu_item | 0.9915 |
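
mAP50 is the mean of the per-class AP@50 values, so the two tables can be cross-checked. The per-class figures are rounded to four decimals, so agreement is only expected to roughly that precision.

```python
# Per-class AP@50 values copied from the Per-Class AP@50 table
per_class_ap50 = {
    "button": 0.9919,
    "textbox": 0.9771,
    "checkbox": 0.9864,
    "dropdown": 0.9829,
    "icon": 0.9950,
    "tab": 0.9950,
    "menu_item": 0.9915,
}

# mAP50 = unweighted mean over classes
map50 = sum(per_class_ap50.values()) / len(per_class_ap50)
print(f"mean AP@50 = {map50:.4f}")  # mean AP@50 = 0.9885 (reported mAP50: 0.9886)
```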

## Usage

```python
from local_ui_locator import detect_elements, find_by_text, safe_click_point

# Detect all UI elements in a screenshot
detections = detect_elements("screenshot.png", conf=0.3)
for det in detections:
    print(f"{det.type}: {det.bbox} score={det.score:.2f}")

# Find an element by its visible text
match = find_by_text("screenshot.png", query="Submit")
if match:
    x, y = safe_click_point(match.bbox)
    print(f"Click at ({x}, {y})")
```
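
The implementation of `safe_click_point` in `local_ui_locator` is not shown in this card. As a hypothetical illustration of the idea only: take the box center, which maximizes the distance to the nearest edge and so tolerates small localization errors, and optionally divide by the Windows display-scaling factor when the screenshot is in physical pixels but the click API expects logical coordinates. The sketch assumes `bbox` is `(x1, y1, x2, y2)` in screenshot pixels; `click_point` and `dpi_scale` are made-up names.

```python
def click_point(bbox, dpi_scale=1.0):
    """Hypothetical helper: integer (x, y) click target for a detection box.

    bbox:      (x1, y1, x2, y2) in screenshot pixels (assumed format).
    dpi_scale: Windows display scaling (1.0, 1.25, 1.5, ...). Divide by it only if the
               screenshot is in physical pixels and the click API expects logical ones.
    """
    x1, y1, x2, y2 = bbox
    if x2 <= x1 or y2 <= y1:
        raise ValueError(f"degenerate box: {bbox}")
    # The center maximizes the distance to the nearest edge of the box
    cx = (x1 + x2) / 2
    cy = (y1 + y2) / 2
    return round(cx / dpi_scale), round(cy / dpi_scale)


print(click_point((100, 40, 180, 70)))        # (140, 55)
print(click_point((100, 40, 180, 70), 1.25))  # (112, 44)
```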

### Direct Ultralytics usage

```python
from ultralytics import YOLO

model = YOLO("best.pt")
results = model.predict("screenshot.png", conf=0.3)
```
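
`predict` returns one `Results` object per image; its `boxes` attribute holds `xyxy` coordinates (in original-image pixels), `conf` scores and `cls` class IDs as tensors, and `model.names` maps IDs to class names. A minimal sketch that flattens this into plain dicts sorted by score; `to_detections` and `detect` are illustrative names, and the pure-Python part is kept separate so it can be used without Ultralytics installed.

```python
def to_detections(xyxy, conf, cls, names):
    """Turn parallel box/score/class lists into dicts, highest score first."""
    dets = [
        {"type": names[int(c)], "bbox": tuple(float(v) for v in b), "score": float(s)}
        for b, s, c in zip(xyxy, conf, cls)
    ]
    return sorted(dets, key=lambda d: d["score"], reverse=True)


def detect(model, path, conf=0.3):
    """Run an Ultralytics YOLO model on one image and return plain-Python detections."""
    boxes = model.predict(path, conf=conf)[0].boxes
    # xyxy / conf / cls are tensors; .tolist() converts them to nested Python lists
    return to_detections(boxes.xyxy.tolist(), boxes.conf.tolist(), boxes.cls.tolist(), model.names)
```

Usage: `detect(YOLO("best.pt"), "screenshot.png")`. Passing the model in avoids reloading the weights on every call.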

## Architecture

- Base model: YOLO11s (Ultralytics)
- Input size: 640 px
- Parameters: ~9.4M
- GFLOPs: ~21.3
- Inference speed: ~44-80 ms per image on CPU (Apple M2 Pro), ~2-5 ms on GPU (NVIDIA RTX 5060)
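
With a 640 px input, a full-resolution screenshot is shrunk considerably before the network sees it, which matters for small targets such as icons and checkboxes. The sketch computes the resize ratio and padding of a square 640x640 letterbox (aspect ratio preserved, remainder padded). Ultralytics performs this preprocessing internally and maps boxes back to the original image; its inference path may pad only to a multiple of the model stride, so the exact padding can differ. `letterbox_params` is an illustrative name.

```python
def letterbox_params(img_w, img_h, size=640):
    """Resize ratio, resized (w, h) and (left, top) padding for a square letterbox."""
    r = min(size / img_w, size / img_h)        # shrink so the longer side fits
    new_w, new_h = round(img_w * r), round(img_h * r)
    pad_w, pad_h = size - new_w, size - new_h  # total padding per axis
    return r, (new_w, new_h), (pad_w // 2, pad_h // 2)


r, resized, pad = letterbox_params(1920, 1080)
print(resized, pad)      # (640, 360) (0, 140)
print(round(16 * r, 1))  # 5.3: a 16 px icon shrinks to about 5 px at network input
```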

## Training

- GPU: NVIDIA RTX 5060 8 GB (Blackwell)
- Dataset: 3 000 synthetic images (2 400 train / 300 val / 300 test)
- Epochs: up to 120 (early stopping with patience=25)
- Batch size: 16
- Image size: 640 px
- Optimizer: SGD with a cosine learning-rate schedule
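
The listed hyperparameters map onto standard Ultralytics `train()` arguments. A sketch of how such a run could be launched; the dataset YAML path is a placeholder (the card does not publish one), and `epochs` acts as an upper bound because `patience` can stop training earlier.

```python
# Hyperparameters from the Training section, expressed as Ultralytics train() arguments
TRAIN_ARGS = {
    "data": "ui_synth_v2/data.yaml",  # placeholder: a YOLO dataset YAML listing the 7 classes
    "epochs": 120,       # maximum; early stopping may end the run sooner
    "patience": 25,      # epochs without validation improvement before stopping
    "batch": 16,
    "imgsz": 640,
    "optimizer": "SGD",
    "cos_lr": True,      # cosine learning-rate schedule
}


def train(base_weights="yolo11s.pt"):
    """Fine-tune YOLO11s with the settings above (requires ultralytics and the dataset)."""
    from ultralytics import YOLO  # imported here so TRAIN_ARGS can be inspected without it

    model = YOLO(base_weights)
    return model.train(**TRAIN_ARGS)
```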

## Limitations

- Trained on synthetic data only; real Windows applications may show a domain gap.
- Works best on standard Windows 10/11 styling; custom-styled applications may see lower accuracy.
- Does not read text content; pair it with OCR for text-based lookup.
- Covers only the 7 classes listed above; other control types (e.g. sliders, tree views) are not detected.
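
Because the detector returns boxes without text, a text-based lookup has to combine its output with an OCR engine. A minimal, engine-agnostic sketch: given OCR words as `(text, (x1, y1, x2, y2))` tuples (an assumed format; adapt it to the OCR library in use), pick the smallest detection whose box contains the center of a matching word. This is one plausible strategy, not necessarily how `find_by_text` works; `match_text` is a hypothetical name.

```python
def _area(bbox):
    x1, y1, x2, y2 = bbox
    return (x2 - x1) * (y2 - y1)


def match_text(words, detections, query):
    """Return the smallest detection containing the center of an OCR word matching `query`.

    words:      [(text, (x1, y1, x2, y2)), ...] from any OCR engine (assumed format).
    detections: [{"type": str, "bbox": (x1, y1, x2, y2), "score": float}, ...]
    """
    q = query.casefold()
    for text, (wx1, wy1, wx2, wy2) in words:
        if q not in text.casefold():
            continue
        cx, cy = (wx1 + wx2) / 2, (wy1 + wy2) / 2
        hits = [
            d for d in detections
            if d["bbox"][0] <= cx <= d["bbox"][2] and d["bbox"][1] <= cy <= d["bbox"][3]
        ]
        if hits:
            # With nested boxes, the innermost one is usually the actual target
            return min(hits, key=lambda d: _area(d["bbox"]))
    return None
```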

## License

MIT