YOLO11m Widget Detector

YOLO11m Widget Detector is a document widget detector for scanned forms and PDFs, built on the medium-size YOLO11m object detection model.

The model detects three common form widget types:

  • text_input
  • choice_button
  • signature

It is optimized for:

  • scanned forms
  • enterprise PDFs
  • OCR pipelines
  • intelligent document processing (IDP)
  • form digitization workflows

The detector supports both CPU and GPU inference and can process PDFs or images directly.
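The device argument of the Python API accepts "cuda" or "cpu". The following is a minimal sketch for choosing between them automatically. It assumes the package runs on PyTorch, as Ultralytics YOLO11 models do. pick_device is a hypothetical helper and is not part of widget_detector:

```python
def pick_device() -> str:
    """Return "cuda" if PyTorch can see a CUDA GPU, otherwise "cpu".

    Assumption: the detector is PyTorch-based, so torch.cuda.is_available()
    reflects whether GPU inference is possible.
    """
    try:
        import torch  # only used to probe for a GPU
    except ImportError:
        # Without PyTorch there is no way to probe; fall back to CPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(f"Selected device: {pick_device()}")
```

The result could then be passed as WidgetDetector(device=pick_device()). Because the helper falls back to "cpu", the same script runs unchanged on machines without a GPU.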

Features

  • Detects form fields from scanned PDFs and images
  • Supports text boxes, checkboxes/radio buttons, and signatures
  • Works directly on PDFs
  • Optimized for document layouts
  • JSON export support
  • CPU and GPU compatible
  • Hugging Face auto-download support
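This card does not document the schema that JSON export produces. The sketch below assumes a hypothetical layout that mirrors the Python API objects: a "pages" list whose entries carry "page" and "widgets", and widgets with "class_name" and "confidence" fields. Under that assumption, a downstream step could tally detections per class like this:

```python
import json
from collections import Counter
from typing import Any, Dict


def count_widgets(doc: Dict[str, Any], min_conf: float = 0.0) -> Counter:
    """Count widgets per class across all pages.

    Hypothetical schema: {"pages": [{"page": int, "widgets": [
        {"class_name": str, "confidence": float}, ...]}, ...]}
    """
    counts: Counter = Counter()
    for page in doc.get("pages", []):
        for widget in page.get("widgets", []):
            # Optionally apply a stricter confidence cut than the detector used.
            if widget.get("confidence", 0.0) >= min_conf:
                counts[widget["class_name"]] += 1
    return counts


if __name__ == "__main__":
    # Inline sample in the assumed schema; with a real file you would use
    # json.load(open("output.json")) instead.
    sample = json.loads("""
    {"pages": [
      {"page": 1, "widgets": [
        {"class_name": "text_input", "confidence": 0.91},
        {"class_name": "choice_button", "confidence": 0.64},
        {"class_name": "signature", "confidence": 0.30}]},
      {"page": 2, "widgets": [
        {"class_name": "text_input", "confidence": 0.88}]}
    ]}
    """)
    print(count_widgets(sample))
    print(count_widgets(sample, min_conf=0.5))
```

Check the keys against a real output.json before relying on this. count_widgets and the sample data are illustrative only.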

Results

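Model                  Text   Choice   Signature   mAP@50 (↑)
YOLO11m v3 (1024px)    81.4   70.9     83.8        78.7
YOLO11m v4 (1024px)    83.9   72.1     86.6        80.9

Text, Choice and Signature are per-class AP@50 scores (%) for text_input, choice_button and signature. Higher is better.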
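Both rows are consistent with mAP@50 being the unweighted mean of the three per-class AP@50 scores. The check below simply restates the table as a dictionary and recomputes the last column:

```python
from typing import Dict

# Per-class AP@50 values copied from the results table.
RESULTS: Dict[str, Dict[str, float]] = {
    "YOLO11m v3 (1024px)": {"text_input": 81.4, "choice_button": 70.9, "signature": 83.8},
    "YOLO11m v4 (1024px)": {"text_input": 83.9, "choice_button": 72.1, "signature": 86.6},
}


def mean_ap(per_class: Dict[str, float]) -> float:
    """Unweighted mean of per-class AP values (every class counts equally)."""
    return sum(per_class.values()) / len(per_class)


if __name__ == "__main__":
    for model, aps in RESULTS.items():
        print(f"{model}: mAP@50 = {mean_ap(aps):.1f}")
```

Because every class is weighted equally, the weakest class (choice_button) pulls the mean down more than its share of instances might suggest.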

Installation

The psynx-widget-detector package can be installed with either uv or pip. With uv:

uv pip install psynx-widget-detector

The pip command:

pip install psynx-widget-detector

Once it is installed, you can run inference on PDFs or images.
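To confirm the installation, a standard-library-only check can look up two names: the distribution name used in the install commands above, and the module name imported in the Python API. The helper functions are hypothetical:

```python
from importlib import metadata, util
from typing import Optional


def installed_version(dist_name: str = "psynx-widget-detector") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def module_importable(module_name: str = "widget_detector") -> bool:
    """Return True if the top-level module can be found on the import path."""
    # find_spec returns None when no importable module has that name.
    return util.find_spec(module_name) is not None


if __name__ == "__main__":
    print("psynx-widget-detector version:", installed_version())
    print("widget_detector importable:", module_importable())
```

Checking both names catches the common case where the package was installed into a different environment from the one running the script.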

Python API

The simplest usage will run inference using the default suggested settings. The model weights will automatically download from Hugging Face on your first run.

from widget_detector import WidgetDetector

# Initialize the detector
# (Downloads PSynx/widget-detector-yolo automatically)
detector = WidgetDetector(
    conf=0.25,        # Confidence threshold
    iou=0.45,         # NMS IoU threshold
    imgsz=1024,       # Inference resolution
    device="cpu"      # "cuda" for GPU, "cpu" for CPU
)

# Process a PDF or image
result = detector.detect_path("input.pdf")

# Print results
for page in result.pages:
    print(f"Page {page.page}: Found {len(page.widgets)} widgets")
    for w in page.widgets:
        print(f" - {w.class_name} ({w.confidence:.2f})")

# Save output to JSON
result.save("output.json")
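The conf and iou arguments control post-processing. Detections scoring below conf are discarded, and non-maximum suppression (NMS) removes boxes that overlap a higher-scoring box by more than iou. The code below illustrates those two thresholds with a generic, class-agnostic greedy NMS in plain Python, using the default values shown above. YOLO post-processing in Ultralytics is typically class-aware and vectorized, so the library's exact behavior may differ:

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_and_nms(boxes: List[Box], scores: List[float],
                   conf: float = 0.25, iou_thr: float = 0.45) -> List[int]:
    """Return indices of boxes kept after confidence filtering and greedy NMS."""
    # 1. Drop detections below the confidence threshold.
    candidates = [i for i, s in enumerate(scores) if s >= conf]
    # 2. Visit the remaining detections from highest to lowest score.
    candidates.sort(key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    for i in candidates:
        # 3. Keep a box only if it does not overlap any kept box too much.
        if all(iou(boxes[i], boxes[k]) <= iou_thr for k in kept):
            kept.append(i)
    return kept


if __name__ == "__main__":
    boxes = [(10, 10, 110, 40), (12, 11, 112, 41), (10, 60, 30, 80), (200, 200, 260, 230)]
    scores = [0.90, 0.80, 0.70, 0.10]
    print(filter_and_nms(boxes, scores))
```

In this toy input, the second box nearly duplicates the first (IoU of about 0.9), so NMS suppresses it. The fourth box falls below conf=0.25. Raising iou keeps more overlapping boxes, and lowering conf admits weaker detections.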

Example Input and Output

Here is an example of a document before and after widget detection:

Input document: [image]

Detection output: [image]

