	---
	license: gpl-3.0
	tags:
	- ultralytics
	- yolo
	- object-detection
	- ui-detection
	- computer-vision
	- agent
	---

	# deki-yolo: Mobile UI Element Detection Model

	This is a YOLO model trained to identify common UI elements in mobile
	screenshots. It is the core detection model for deki, which is available as a
	[Hugging Face Space](https://huggingface.co/spaces/orasul/deki) and on
	[GitHub](https://github.com/RasulOs/deki).

	## Model Description

	The model is trained to detect the following four classes of UI elements:
	* `View`: General-purpose containers.
	* `ImageView`: Icons and images.
	* `Text`: Text elements.
	* `Line`: Separators and lines.

	This model can be used as a foundational component for applications that need
	to understand screen layouts, such as AI agents for mobile automation,
	accessibility tools, and UI code generation.
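
As a rough illustration of these use cases, the sketch below loads the weights with the `ultralytics` package (listed in the tags of this model card) and groups the detections by class name. The weights filename `best.pt`, the confidence threshold, and the helper names `detect_ui_elements` and `group_by_class` are assumptions made for illustration, not a documented API of this repository. Class names are read from the loaded model rather than hard-coded.

```python
from collections import defaultdict


def detect_ui_elements(image_path, weights_path="best.pt", conf=0.25):
    """Run the detector on one screenshot and return plain-Python detections.

    weights_path is a hypothetical local filename; point it at the weights
    file downloaded from this repository.
    """
    # Imported lazily so group_by_class works without ultralytics installed.
    from ultralytics import YOLO

    model = YOLO(weights_path)
    result = model.predict(image_path, conf=conf)[0]
    detections = []
    for box in result.boxes:
        x1, y1, x2, y2 = box.xyxy[0].tolist()
        detections.append({
            "class": result.names[int(box.cls[0])],
            "confidence": float(box.conf[0]),
            "box": (x1, y1, x2, y2),
        })
    return detections


def group_by_class(detections):
    """Group detection dicts by class name, e.g. to pull out all Text boxes."""
    grouped = defaultdict(list)
    for det in detections:
        grouped[det["class"]].append(det)
    return dict(grouped)
```

An agent could, for example, call `group_by_class(detect_ui_elements("screenshot.png"))` and pass only the `Text` boxes on to an OCR step.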

	---

	## YOLO examples

	Bounding boxes with classes for bb_1:

	<img src="res/bb_1_yolo.jpeg" alt="example1" width="60%">

	Bounding boxes for bb_1 after non-maximum suppression (NMS), labeled with IDs instead of classes:

	<img src="res/bb_1_yolo_updated.jpeg" alt="example2" width="60%">

	Bounding boxes with classes for bb_2:

	<img src="res/bb_2_yolo.jpeg" alt="example3" width="60%">

	Bounding boxes for bb_2 after non-maximum suppression (NMS), labeled with IDs instead of classes:

	<img src="res/bb_2_yolo_updated.jpeg" alt="example4" width="60%">
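
The "IDs after NMS" images can be approximated with a class-agnostic non-maximum suppression pass followed by sequential numbering. This is a minimal sketch and not necessarily the post-processing that deki itself uses: the IoU threshold of 0.5, the `(x1, y1, x2, y2, score)` tuple layout, and the function names `iou` and `nms_with_ids` are all assumptions.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms_with_ids(detections, iou_threshold=0.5):
    """Class-agnostic NMS; returns {id: box} for the surviving boxes.

    detections: list of (x1, y1, x2, y2, score) tuples.
    """
    # Visit boxes from most to least confident.
    ordered = sorted(detections, key=lambda d: d[4], reverse=True)
    kept = []
    for det in ordered:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(det[:4], k[:4]) <= iou_threshold for k in kept):
            kept.append(det)
    # IDs start at 1 and follow confidence order (an arbitrary choice here).
    return {i: det[:4] for i, det in enumerate(kept, start=1)}
```

Dropping the class label and keeping only an ID gives downstream consumers, such as an agent prompt, a compact way to refer to a specific element on screen.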

	---

	## YOLO model accuracy

	The model was trained on 486 images and tested on 60 images.

	Current YOLO model accuracy:
	![example5](./res/YOLO_accuracy.png)