---
language:
- en
license: mit
library_name: ultralytics
tags:
- yolo11
- object-detection
- document-ai
- form-understanding
- vision
pipeline_tag: object-detection
---
# YOLO11m Widget Detector
YOLO11m Widget Detector is a 20.1-million-parameter object detector trained on the dataset from the paper *CommonForms: A Large, Diverse Dataset for Form Field Detection*. The model detects three classes of form widgets: TextBoxes (`text_input`), ChoiceButtons (`choice_button`, i.e. checkboxes), and Signature fields (`signature`).
## Results
| Model | Text | Choice | Signature | mAP@50 (↑) |
|---|---|---|---|---|
| YOLO11m v3 (1024px) | 81.4 | 70.9 | 83.8 | 78.7 |
| **YOLO11m v4 (1024px)** | **83.9** | **72.1** | **86.6** | **80.9** |
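
The mAP@50 column appears to be the unweighted mean of the three per-class scores. That is an inference from the numbers in the table, not something the source states, so treat the sketch below as a sanity check rather than the evaluation procedure. The dictionary layout and the `mean_ap50` helper are illustrative names only.

```python
# Per-class scores copied from the results table above.
per_class_ap50 = {
    "YOLO11m v3 (1024px)": {"text": 81.4, "choice": 70.9, "signature": 83.8},
    "YOLO11m v4 (1024px)": {"text": 83.9, "choice": 72.1, "signature": 86.6},
}


def mean_ap50(per_class):
    """Unweighted mean over classes (assumed definition of the mAP@50 column)."""
    return sum(per_class.values()) / len(per_class)


for model, scores in per_class_ap50.items():
    print(f"{model}: mAP@50 = {mean_ap50(scores):.1f}")
```

Rounded to one decimal place, both rows reproduce the reported mAP@50 values.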
## Installation
The `psynx-widget-detector` package can be installed with either `uv` or `pip`, whichever package manager you prefer. The `uv` command:
```bash
uv pip install psynx-widget-detector
```
The `pip` command:
```bash
pip install psynx-widget-detector
```
Once it is installed, you should be able to run inference on almost any PDF.
## Python API
The simplest usage runs inference with the suggested default settings. The model weights download automatically from Hugging Face on the first run.
```python
from widget_detector import WidgetDetector

# Initialize the detector
# (Downloads PSynx/widget-detector-yolo automatically)
detector = WidgetDetector(
    conf=0.25,     # Confidence threshold
    iou=0.45,      # NMS IoU threshold
    imgsz=1024,    # Inference resolution
    device="cpu",  # "cuda" for GPU, "cpu" for CPU
)

# Process a PDF or image
result = detector.detect_path("input.pdf")

# Print results
for page in result.pages:
    print(f"Page {page.page}: Found {len(page.widgets)} widgets")
    for w in page.widgets:
        print(f"  - {w.class_name} ({w.confidence:.2f})")

# Save output to JSON
result.save("output.json")
```
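
As a possible follow-up to the loop above, here is a minimal sketch of counting detections per class with a stricter confidence cutoff than the one used at inference time. It relies only on the attributes that appear in the example (`page.widgets`, `w.class_name`, `w.confidence`). The `Widget` and `Page` dataclasses are hypothetical stand-ins so the snippet runs without the package or model weights, and `count_widgets_by_class` is an illustrative helper, not part of the `widget_detector` API.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Widget:
    """Hypothetical stand-in exposing the two attributes used in the example."""
    class_name: str
    confidence: float


@dataclass
class Page:
    """Hypothetical stand-in for one entry of result.pages."""
    page: int
    widgets: list = field(default_factory=list)


def count_widgets_by_class(pages, min_confidence=0.0):
    """Count widgets per class, keeping only those at or above min_confidence."""
    counts = Counter()
    for page in pages:
        for w in page.widgets:
            if w.confidence >= min_confidence:
                counts[w.class_name] += 1
    return counts


# Made-up detections for illustration only (not real model output).
sample_pages = [
    Page(1, [Widget("text_input", 0.91), Widget("choice_button", 0.42),
             Widget("signature", 0.77)]),
    Page(2, [Widget("text_input", 0.88), Widget("text_input", 0.30)]),
]

summary = count_widgets_by_class(sample_pages, min_confidence=0.5)
print(dict(summary))
```

With a real run, passing `result.pages` in place of `sample_pages` should work as long as the objects expose the same attributes.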
## Example Output
Here is an example of the model's output on a sample document:
![Sample Detection Output](sample_output.jpg)
## References
*CommonForms: A Large, Diverse Dataset for Form Field Detection*