	---
	language:
	- en
	license: mit
	library_name: ultralytics
	tags:
	- yolo11
	- object-detection
	- document-ai
	- form-understanding
	- vision
	pipeline_tag: object-detection
	---

	# YOLO11m Widget Detector

	YOLO11m Widget Detector is an object detector with 20.1 million parameters, trained on the dataset introduced in the paper *CommonForms: A Large, Diverse Dataset for Form Field Detection*. The model detects form widgets in three classes: TextBoxes (`text_input`), ChoiceButtons such as checkboxes (`choice_button`), and Signature fields (`signature`).

	## Results

	| Model | Text | Choice | Signature | mAP@50 (↑) |
	|---|---|---|---|---|
	| YOLO11m v3 (1024px) | 81.4 | 70.9 | 83.8 | 78.7 |
	| YOLO11m v4 (1024px) | 83.9 | 72.1 | 86.6 | 80.9 |
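
The mAP@50 column is consistent with an unweighted mean of the three per-class columns. This is an inference from the numbers, assuming the per-class columns are AP@50 values; the model card does not state how mAP@50 was computed. A quick check of that assumption:

```python
# Per-class scores copied from the results table (Text, Choice, Signature).
per_class = {
    "YOLO11m v3 (1024px)": [81.4, 70.9, 83.8],
    "YOLO11m v4 (1024px)": [83.9, 72.1, 86.6],
}


def mean_ap(scores):
    """Unweighted mean over classes, rounded to one decimal like the table."""
    return round(sum(scores) / len(scores), 1)


means = {name: mean_ap(scores) for name, scores in per_class.items()}
for name, value in means.items():
    print(f"{name}: {value}")
```

Both rows reproduce the reported mAP@50 values (78.7 and 80.9) under this assumption.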

	## Installation

	The `psynx-widget-detector` package can be installed with either `uv` or `pip`; use whichever package manager you prefer. The `uv` command:

	```bash
	uv pip install psynx-widget-detector
	```

	The `pip` command:

	```bash
	pip install psynx-widget-detector
	```

	Once it's installed, you should be able to run inference on almost any PDF.

	## Python API

	The simplest usage will run inference using the default suggested settings. The model weights will automatically download from Hugging Face on your first run.

	```python
	from widget_detector import WidgetDetector

	# Initialize the detector
	# (Downloads PSynx/widget-detector-yolo automatically)
	detector = WidgetDetector(
	    conf=0.25,     # Confidence threshold
	    iou=0.45,      # NMS IoU threshold
	    imgsz=1024,    # Inference resolution
	    device="cpu",  # "cuda" for GPU, "cpu" for CPU
	)

	# Process a PDF or Image
	result = detector.detect_path("input.pdf")

	# Print results
	for page in result.pages:
	    print(f"Page {page.page}: Found {len(page.widgets)} widgets")
	    for w in page.widgets:
	        print(f" - {w.class_name} ({w.confidence:.2f})")

	# Save output to JSON
	result.save("output.json")
	```
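
To give a feel for what the `conf` and `iou` settings control, here is a minimal, class-agnostic sketch of confidence filtering followed by greedy non-maximum suppression (NMS). It is an illustration only, not the code used by `psynx-widget-detector` or Ultralytics; the function names, box format, and sample boxes are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), hypothetical format


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_detections(
    boxes: Sequence[Box],
    scores: Sequence[float],
    conf: float = 0.25,
    iou: float = 0.45,
) -> List[int]:
    """Return indices of boxes kept after thresholding and greedy NMS."""
    # 1. Drop boxes scoring below the confidence threshold.
    candidates = [i for i, s in enumerate(scores) if s >= conf]
    # 2. Visit the rest from highest to lowest score.
    candidates.sort(key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    for i in candidates:
        # 3. Keep a box only if it does not overlap a kept box too much.
        if all(box_iou(boxes[i], boxes[j]) <= iou for j in kept):
            kept.append(i)
    return kept


# Made-up boxes: 0 and 1 overlap heavily, 3 has a low score.
boxes = [(0, 0, 100, 20), (2, 1, 101, 21), (0, 50, 20, 70), (200, 200, 220, 220)]
scores = [0.9, 0.6, 0.8, 0.1]
kept = filter_detections(boxes, scores)
print(kept)  # [0, 2]
```

Raising `conf` discards more low-scoring boxes before suppression; lowering `iou` suppresses overlapping boxes more aggressively.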

	## Example Output

	Here is an example of the model's output on a sample document:

	![Sample Detection Output](sample_output.jpg)

	## References

	CommonForms: A Large, Diverse Dataset for Form Field Detection