PSynx/widget-detector-yolo

@@ -12,70 +12,66 @@ tags:
 pipeline_tag: object-detection
 ---
-# YOLO11m Document Widget Detector
-This is a fine-tuned YOLO11m model for detecting interactive form widgets (text inputs, checkboxes/radio buttons, and signatures) in document images and PDFs.
-It was trained on the [CommonForms](https://huggingface.co/datasets/jbarrow/CommonForms) dataset (100,000 document images) and achieves high accuracy across diverse document layouts.
-## Model Details
-- **Architecture:** YOLO11m
-- **Task:** Object Detection (Document Widgets)
-- **Classes:**
-  - `0`: `text_input`
-  - `1`: `choice_button` (checkboxes & radio buttons)
-  - `2`: `signature`
-- **Input Size:** 1024x1024
-## Performance (mAP@50)
-- **text_input:** 0.814
-- **choice_button:** 0.709
-- **signature:** 0.838
-- **Overall mAP@50:** 0.787
-## Usage
-### Using the Python Package
-You can install the official inference package to automatically download this model and process PDFs or images.
 ```bash
 pip install psynx-widget-detector
 ```
 ```python
 from widget_detector import WidgetDetector
-# Initialize without a path to auto-download from Hugging Face
-detector = WidgetDetector()
-# Run inference on a PDF (auto-renders pages to images)
-result = detector.detect_path("sample_form.pdf")
 # Print results
 for page in result.pages:
     print(f"Page {page.page}: Found {len(page.widgets)} widgets")
     for w in page.widgets:
-        print(f" - {w.class_name} ({w.confidence:.2f}) at {w.bbox.x1:.1f}, {w.bbox.y1:.1f}")
-# Save to JSON
 result.save("output.json")
 ```
-### Using Ultralytics Directly
-If you prefer to use the raw Ultralytics library:
-```python
-from ultralytics import YOLO
-from huggingface_hub import hf_hub_download
-# Download the model weights
-model_path = hf_hub_download(repo_id="PSynx/widget-detector-yolo", filename="best.pt")
-# Load the model
-model = YOLO(model_path)
-# Run inference
-results = model("document_image.png", imgsz=1024, conf=0.25)
-```

 pipeline_tag: object-detection
 ---
+# YOLO11m Widget Detector
+YOLO11m Widget Detector is a 20.1-million-parameter object detector trained on the dataset from the paper *CommonForms: A Large, Diverse Dataset for Form Field Detection*. The model detects three classes of widget: TextBoxes (`text_input`), ChoiceButtons (`choice_button`: checkboxes and radio buttons), and Signature fields (`signature`).
+## Results
+| Model | Text | Choice | Signature | mAP@50 (↑) |
+|---|---|---|---|---|
+| YOLO11m (1024px) | 81.4 | 70.9 | 83.8 | 78.7 |
+## Installation
+The `psynx-widget-detector` package can be installed with either `uv` or `pip`; use whichever package manager you prefer. The `uv` command:
+```bash
+uv pip install psynx-widget-detector
+```
+The `pip` command:
 ```bash
 pip install psynx-widget-detector
 ```
+Once it's installed, you should be able to run inference on almost any PDF.
+## Python API
+The simplest usage runs inference with the suggested default settings. The model weights download automatically from Hugging Face on the first run.
 ```python
 from widget_detector import WidgetDetector
+# Initialize the detector
+# (Downloads PSynx/widget-detector-yolo automatically)
+detector = WidgetDetector(
+    conf=0.25,        # Confidence threshold
+    iou=0.45,         # NMS IoU threshold
+    imgsz=1024,       # Inference resolution
+    device="cpu"      # "cuda" for GPU, "cpu" for CPU
+)
+# Process a PDF or image
+result = detector.detect_path("input.pdf")
 # Print results
 for page in result.pages:
     print(f"Page {page.page}: Found {len(page.widgets)} widgets")
     for w in page.widgets:
+        print(f" - {w.class_name} ({w.confidence:.2f})")
+# Save output to JSON
 result.save("output.json")
 ```
+## Example Output
+Here is an example of the model's output on a sample document:
+![Sample Detection Output](sample_output.jpg)
+## References
+*CommonForms: A Large, Diverse Dataset for Form Field Detection*
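As a quick sanity check on the Results table in the new README, the overall mAP@50 appears to be the unweighted mean of the three per-class scores. The sketch below assumes that definition (the diff does not state how the overall figure was computed) and simply copies the values from the table; the variable names are illustrative.

```python
# Per-class AP@50 values copied from the Results table (in percent).
per_class_ap50 = {
    "text_input": 81.4,
    "choice_button": 70.9,
    "signature": 83.8,
}

# Assumption: overall mAP@50 is the unweighted mean over the three classes.
map50 = sum(per_class_ap50.values()) / len(per_class_ap50)

print(f"mAP@50 = {map50:.1f}")  # prints "mAP@50 = 78.7"
```

The result matches both the 78.7 reported in the new table and the 0.787 listed in the old Performance section, so the two versions of the README agree on the numbers and differ only in scale (percent versus fraction).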