---
language:
- en
license: mit
library_name: ultralytics
tags:
- yolo11
- object-detection
- document-ai
- form-understanding
- vision
pipeline_tag: object-detection
---

# YOLO11m Widget Detector

YOLO11m Widget Detector is a 20.1 million parameter object detector trained on the dataset from the paper *CommonForms: A Large, Diverse Dataset for Form Field Detection*. The model detects three classes of form widgets: TextBoxes (`text_input`), ChoiceButtons (`choice_button` / checkboxes), and Signature fields (`signature`).

## Results

| Model | Text | Choice | Signature | mAP@50 (↑) |
|---|---|---|---|---|
| YOLO11m v3 (1024px) | 81.4 | 70.9 | 83.8 | 78.7 |
| **YOLO11m v4 (1024px)** | **83.9** | **72.1** | **86.6** | **80.9** |

## Installation

The `psynx-widget-detector` package can be installed with either `uv` or `pip`, whichever package manager you prefer.

The `uv` command:

```bash
uv pip install psynx-widget-detector
```

The `pip` command:

```bash
pip install psynx-widget-detector
```

Once it's installed, you should be able to run inference on almost any PDF.

## Python API

The simplest usage runs inference with the suggested default settings. The model weights are downloaded automatically from Hugging Face on your first run.

```python
from widget_detector import WidgetDetector

# Initialize the detector
# (Downloads PSynx/widget-detector-yolo automatically)
detector = WidgetDetector(
    conf=0.25,     # Confidence threshold
    iou=0.45,      # NMS IoU threshold
    imgsz=1024,    # Inference resolution
    device="cpu"   # "cuda" for GPU, "cpu" for CPU
)

# Process a PDF or Image
result = detector.detect_path("input.pdf")

# Print results
for page in result.pages:
    print(f"Page {page.page}: Found {len(page.widgets)} widgets")
    for w in page.widgets:
        print(f" - {w.class_name} ({w.confidence:.2f})")

# Save output to JSON
result.save("output.json")
```

## Example Output

Here is an example of the model's output on a sample document:

![Sample Detection Output](sample_output.jpg)

## References

*CommonForms: A Large, Diverse Dataset for Form Field Detection*
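
## Note on the `iou` threshold

The `iou=0.45` setting in the Python API example is described there as the NMS IoU threshold. As a rough illustration of what that value means, here is a minimal sketch of intersection-over-union and greedy non-maximum suppression in plain Python. It assumes boxes in `(x1, y1, x2, y2)` pixel coordinates and ignores class labels for simplicity. The names `box_iou` and `greedy_nms` are hypothetical helpers written for this note; they are not part of `psynx-widget-detector`, and the package's own suppression step may differ in detail.

```python
def box_iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    # Overlap rectangle; width and height clamp to 0 when the boxes are disjoint.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(detections, iou_threshold=0.45):
    """Keep the highest-confidence boxes and drop boxes that overlap a kept one.

    detections: list of (box, confidence) tuples (illustrative format only).
    """
    kept = []
    # Visit detections from most to least confident.
    for box, conf in sorted(detections, key=lambda d: d[1], reverse=True):
        # Keep the box only if it does not overlap any kept box too strongly.
        if all(box_iou(box, k[0]) <= iou_threshold for k in kept):
            kept.append((box, conf))
    return kept


# Made-up detections for illustration, not real model output.
detections = [
    ((100, 100, 300, 140), 0.92),  # a text field
    ((104, 102, 298, 141), 0.61),  # near-duplicate of the first box
    ((100, 200, 120, 220), 0.80),  # a separate checkbox
]
kept = greedy_nms(detections)
for box, conf in kept:
    print(box, conf)
```

In this sketch the second box overlaps the first well above the 0.45 threshold, so only the first and third detections survive. Lowering `iou` suppresses more overlapping boxes; raising it keeps more of them.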