---
language:
- en
license: mit
library_name: ultralytics
tags:
- yolo11
- object-detection
- document-ai
- form-understanding
- vision
pipeline_tag: object-detection
---

## YOLO11m Widget Detector

YOLO11m Widget Detector is a lightweight, high-performance document widget detector designed for scanned forms and PDFs.

The model detects three common form widget types:

- text_input
- choice_button
- signature

It is optimized for:

- scanned forms
- enterprise PDFs
- OCR pipelines
- intelligent document processing (IDP)
- form digitization workflows

The detector supports both CPU and GPU inference and can process PDFs or images directly.

## Features
- Detects form fields from scanned PDFs and images
- Supports text boxes, checkboxes/radio buttons, and signatures
- Works directly on PDFs
- Optimized for document layouts
- JSON export support
- CPU and GPU compatible
- Hugging Face auto-download support

## Results

| Model | Text AP@50 | Choice AP@50 | Signature AP@50 | mAP@50 (↑) |
|---|---|---|---|---|
| YOLO11m v3 (1024px) | 81.4 | 70.9 | 83.8 | 78.7 |
| **YOLO11m v4 (1024px)** | **83.9** | **72.1** | **86.6** | **80.9** |
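
The mAP@50 column is consistent with a plain average of the three per-class columns. The check below is a minimal sketch, assuming those columns hold per-class AP@50 values in percent; the dictionary keys and the `mean_ap` helper are illustrative names, not part of the package.

```python
# Per-class scores copied from the results table.
# Assumption: each value is that class's AP@50, in percent.
per_class_ap50 = {
    "YOLO11m v3": {"text": 81.4, "choice": 70.9, "signature": 83.8},
    "YOLO11m v4": {"text": 83.9, "choice": 72.1, "signature": 86.6},
}


def mean_ap(per_class):
    """Unweighted mean over classes (hypothetical helper for this check)."""
    return sum(per_class.values()) / len(per_class)


# Round to one decimal place to match the precision used in the table.
map50 = {name: round(mean_ap(scores), 1) for name, scores in per_class_ap50.items()}
print(map50)
```

Under that assumption, the choice_button class is the weakest of the three in both versions and pulls the average down the most.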

## Installation

The `psynx-widget-detector` package can be installed with either `uv` or `pip`; use whichever package manager you prefer. The `uv` command:

```bash
uv pip install psynx-widget-detector
```

The `pip` command:

```bash
pip install psynx-widget-detector
```

Once the package is installed, you should be able to run inference on most PDFs.

## Python API

The simplest usage will run inference using the default suggested settings. The model weights will automatically download from Hugging Face on your first run.

```python
from widget_detector import WidgetDetector

# Initialize the detector
# (Downloads PSynx/widget-detector-yolo automatically)
detector = WidgetDetector(
    conf=0.25,        # Confidence threshold
    iou=0.45,         # NMS IoU threshold
    imgsz=1024,       # Inference resolution
    device="cpu"      # "cuda" for GPU, "cpu" for CPU
)

# Process a PDF or Image
result = detector.detect_path("input.pdf")

# Print results
for page in result.pages:
    print(f"Page {page.page}: Found {len(page.widgets)} widgets")
    for w in page.widgets:
        print(f" - {w.class_name} ({w.confidence:.2f})")

# Save output to JSON
result.save("output.json")
```
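
To summarize a result rather than print every detection, the same attributes used above (`result.pages`, `page.widgets`, `w.class_name`, `w.confidence`) can be tallied per class. This is a minimal sketch: `count_widgets_by_class` and its `min_conf` parameter are hypothetical names, and `demo_result` is a stand-in object built with `SimpleNamespace` so that the snippet runs without the package or model weights.

```python
from collections import Counter
from types import SimpleNamespace


def count_widgets_by_class(result, min_conf=0.0):
    """Count widgets per class across all pages, skipping low-confidence ones.

    Hypothetical helper: relies only on the attributes shown in the
    example above (pages, widgets, class_name, confidence).
    """
    counts = Counter()
    for page in result.pages:
        for w in page.widgets:
            if w.confidence >= min_conf:
                counts[w.class_name] += 1
    return counts


# Stand-in for a detection result (illustrative values, not real model output).
demo_result = SimpleNamespace(
    pages=[
        SimpleNamespace(page=1, widgets=[
            SimpleNamespace(class_name="text_input", confidence=0.91),
            SimpleNamespace(class_name="choice_button", confidence=0.40),
        ]),
        SimpleNamespace(page=2, widgets=[
            SimpleNamespace(class_name="signature", confidence=0.77),
            SimpleNamespace(class_name="text_input", confidence=0.30),
        ]),
    ]
)

print(count_widgets_by_class(demo_result, min_conf=0.5))
```

With a real detection, pass the object returned by `detector.detect_path(...)` in place of `demo_result`.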

## Example Input and Output

Here is an example of a document before and after widget detection:

**Input Document:**
![Input Image](006b4939d67f3f1f190092be507996e5058c27f7bfbecb2e1d35ab665a0a658a.png)

**Detection Output:**
![Output Image](image.webp)

## References
*CommonForms: A Large, Diverse Dataset for Form Field Detection*