Upload folder using huggingface_hub
README.md
ADDED
@@ -0,0 +1,106 @@
---
license: apache-2.0
datasets:
- docling-project/screenparse
tags:
- object-detection
- yolo
- ui-understanding
- screen-parsing
- grounding
- web
- ultralytics
language:
- en
pipeline_tag: object-detection
library_name: ultralytics
---

# ScreenParser

**ScreenParser** is a YOLO-based UI element detector fine-tuned on [ScreenParse](https://huggingface.co/datasets/docling-project/screenparse), a large-scale dataset of 771K web page screenshots with dense annotations across **55 UI element classes**. Given a screenshot, it detects the visible UI components and returns a class label, bounding box, and confidence score for each.

- **Developed by**: IBM Research - ETH Zurich
- **Model type**: Object detection (YOLO11-L)
- **License**: [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
- **Paper**: [ScreenParse: Moving Beyond Sparse Grounding with Complete Screen Parsing](TODO)
- **Code**: [GitHub](TODO)
- **Dataset**: [docling-project/screenparse](https://huggingface.co/datasets/docling-project/screenparse)

## Model Summary

ScreenParser is a [YOLO11-Large](https://docs.ultralytics.com/models/yolo11/) model (25.4M parameters) fine-tuned at 1280px resolution on ScreenParse.

### Supported Classes (55)

Table, Column/Browser, Button, Utility Button, App Icon, Navigation Bar, Status Bar, Search Field, Toolbar, Tooltip, Video, Tab Bar, Side Bar, Slider, Picker, ContextMenu, DockMenu, EditMenu, Image, Scroll, Switch, File Icon, Chart, Window, Screen, List, List Item, PopUp Menu, Steppers, Toggles, Text Input, Rating Indicator, Checkbox, Radiobox, Select, Avatar, Badge, Alert, Progress bar, Bottom navigation, Breadcrumb, Page control, Link, Menu, Pagination, Tab, Search Bar, Date-Time picker, Calendar, Text, Heading, Code snippet, Carousel, Notification, Logo

## Usage

### Single Image Inference

```python
from huggingface_hub import hf_hub_download
from ultralytics import YOLO

# Assumption: YOLO() needs a local weights path, so download best.pt from this repo first.
weights = hf_hub_download(repo_id="docling-project/ScreenParser", filename="best.pt")
model = YOLO(weights)

results = model.predict("screenshot.png", imgsz=1280, conf=0.10, iou=0.10)

for r in results:
    for box, cls_id, conf in zip(r.boxes.xyxy, r.boxes.cls, r.boxes.conf):
        x1, y1, x2, y2 = box.tolist()
        label = model.names[int(cls_id)]
        # Print each box as (x, y, width, height) in pixels.
        print(f"{label:20s} conf={conf:.2f} bbox=({int(x1)}, {int(y1)}, {int(x2-x1)}, {int(y2-y1)})")
```

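The loop above converts each box from the corner format in `r.boxes.xyxy` to (x, y, width, height) before printing. A minimal sketch of that conversion as a reusable helper is shown here; `Detection` and `to_xywh` are hypothetical names that are not part of Ultralytics, and the sample values are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One detected UI element (hypothetical container, not an Ultralytics type)."""
    label: str
    conf: float
    xyxy: tuple  # (x1, y1, x2, y2) in pixels, the layout used by r.boxes.xyxy


def to_xywh(xyxy):
    """Convert corners (x1, y1, x2, y2) to integer (x, y, width, height)."""
    x1, y1, x2, y2 = xyxy
    return int(x1), int(y1), int(x2 - x1), int(y2 - y1)


# Made-up values standing in for one row of model output.
det = Detection(label="Button", conf=0.87, xyxy=(10.0, 20.0, 110.0, 60.0))
print(det.label, to_xywh(det.xyxy))
```

Keeping the raw corners on the record and converting only at output time avoids accumulating rounding error if the boxes are processed further.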
### Batch Inference

```python
import os

from huggingface_hub import hf_hub_download
from ultralytics import YOLO

# Assumption: YOLO() needs a local weights path, so download best.pt from this repo first.
weights = hf_hub_download(repo_id="docling-project/ScreenParser", filename="best.pt")
model = YOLO(weights)
IMAGE_DIR = "screenshots/"

# Collect image paths in a stable, sorted order.
images = sorted(
    os.path.join(IMAGE_DIR, f) for f in os.listdir(IMAGE_DIR)
    if f.lower().endswith((".png", ".jpg", ".jpeg"))
)

results = model.predict(images, imgsz=1280, conf=0.10, iou=0.10, batch=16)

for path, r in zip(images, results):
    print(f"--- {os.path.basename(path)} ({len(r.boxes)} elements) ---")
    for box, cls_id, conf in zip(r.boxes.xyxy, r.boxes.cls, r.boxes.conf):
        x1, y1, x2, y2 = box.tolist()
        label = model.names[int(cls_id)]
        # Print each box as (x, y, width, height) in pixels.
        print(f"  {label:20s} conf={conf:.2f} bbox=({int(x1)}, {int(y1)}, {int(x2-x1)}, {int(y2-y1)})")
```

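With `conf=0.10`, the batch output includes many low-confidence boxes, so a downstream step may want to filter and summarize them. The sketch below counts detections per label from plain `(label, confidence)` pairs; `summarize`, `min_conf`, and `keep_labels` are hypothetical names and values chosen for illustration, not settings from this README.

```python
from collections import Counter


def summarize(detections, min_conf=0.25, keep_labels=None):
    """Count detections per label after optional filtering.

    detections: iterable of (label, confidence) pairs.
    min_conf and keep_labels are illustrative parameters, not README settings.
    """
    counts = Counter()
    for label, conf in detections:
        if conf < min_conf:
            continue  # drop low-confidence boxes
        if keep_labels is not None and label not in keep_labels:
            continue  # drop classes the caller did not ask for
        counts[label] += 1
    return counts


# Made-up detections for one screenshot.
sample = [("Button", 0.91), ("Button", 0.12), ("Link", 0.55), ("Image", 0.40)]
print(summarize(sample))
print(summarize(sample, keep_labels={"Button", "Link"}))
```

In practice the pairs could be built from `model.names[int(cls_id)]` and `float(conf)` inside the loop shown above.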
### Save Visualizations

```python
from huggingface_hub import hf_hub_download
from ultralytics import YOLO

# Assumption: YOLO() needs a local weights path, so download best.pt from this repo first.
weights = hf_hub_download(repo_id="docling-project/ScreenParser", filename="best.pt")
model = YOLO(weights)
results = model.predict("screenshot.png", imgsz=1280, conf=0.10, iou=0.10, save=True)
# Annotated image is saved under runs/detect/predict/ (predict2/, predict3/, ... on later runs)
```

**Training data**: [ScreenParse](https://huggingface.co/datasets/docling-project/screenparse), 771K web page screenshots with dense annotations across 55 UI element classes. Annotations were generated through automated DOM extraction, IoU-based filtering, and VLM-based refinement.

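Intersection over union (IoU) measures how much two boxes overlap: the area of their intersection divided by the area of their union. The README does not say which boxes the filtering step compares or which threshold it uses, so the sketch below illustrates only the metric itself; the function name `iou` is illustrative. The same metric underlies the `iou` argument passed to `model.predict` in the usage snippets, where Ultralytics applies it as the non-maximum suppression threshold.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


print(iou((0, 0, 10, 10), (5, 0, 15, 10)))    # half-overlapping boxes
print(iou((0, 0, 10, 10), (20, 20, 30, 30)))  # disjoint boxes
```
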
## Limitations

- Does not produce text content for detected elements (bounding boxes and labels only). Pair it with an OCR model or [ScreenVLM](https://huggingface.co/docling-project/ScreenVLM) for text extraction.

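One way to pair the detector with an OCR model is to crop each detected region and pass the crop on for text recognition. The sketch below covers only the geometry: it turns a detection's corner coordinates into a padded integer crop rectangle clamped to the image bounds. `crop_box` and its `pad` margin are hypothetical, and the OCR step itself is left out because this README does not name a specific OCR tool.

```python
import math


def crop_box(xyxy, image_size, pad=2):
    """Return an integer (left, top, right, bottom) crop box, padded and clamped.

    xyxy: detection corners in pixels; image_size: (width, height).
    pad is an illustrative margin, not a value from the README.
    """
    x1, y1, x2, y2 = xyxy
    width, height = image_size
    left = max(0, math.floor(x1) - pad)
    top = max(0, math.floor(y1) - pad)
    right = min(width, math.ceil(x2) + pad)
    bottom = min(height, math.ceil(y2) + pad)
    return left, top, right, bottom


# Made-up detection near the top-left corner of a 1280x720 screenshot.
print(crop_box((1.5, 0.0, 120.7, 40.2), (1280, 720)))
```

The returned tuple uses the (left, upper, right, lower) layout that `PIL.Image.Image.crop` accepts, so it can be passed to Pillow directly.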
## Citation

```bibtex
@inproceedings{screenparse2026,
  title={ScreenParse: Moving Beyond Sparse Grounding with Complete Screen Parsing},
  author={TODO},
  booktitle={Proceedings of the 43rd International Conference on Machine Learning (ICML)},
  year={2026}
}
```

best.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d1d16bb335e6f38280dafbb3d2f2937975b62d8ef68ab3cf474b15b145b73286
size 51358361
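The three lines committed for `best.pt` are a Git LFS pointer: they record the SHA-256 digest and byte size of the real weights file, which is stored separately. A minimal sketch of checking a downloaded file against that pointer is shown here; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, not part of Git LFS or `huggingface_hub`.

```python
import hashlib


def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into its fields."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"version": fields["version"], "algo": algo,
            "digest": digest, "size": int(fields["size"])}


def matches_pointer(data, pointer):
    """Check raw file bytes against the size and SHA-256 digest in a pointer."""
    if pointer["algo"] != "sha256":
        raise ValueError(f"unsupported hash: {pointer['algo']}")
    return (len(data) == pointer["size"]
            and hashlib.sha256(data).hexdigest() == pointer["digest"])


POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:d1d16bb335e6f38280dafbb3d2f2937975b62d8ef68ab3cf474b15b145b73286
size 51358361
"""

info = parse_lfs_pointer(POINTER)
print(info["size"], info["digest"][:12])
```

To verify a local copy, read it with `open(path, "rb").read()` and pass the bytes to `matches_pointer`; for much larger files, hashing in chunks would use less memory.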