Commit ea93f6c by PSynx · verified · 1 Parent(s): 1073421

Upload README.md with huggingface_hub

  1. README.md +35 -39
README.md CHANGED
@@ -12,70 +12,66 @@ tags:
  pipeline_tag: object-detection
  ---

- # YOLO11m Document Widget Detector

- This is a fine-tuned YOLO11m model for detecting interactive form widgets (text inputs, checkboxes/radio buttons, and signatures) in document images and PDFs.

- It was trained on the [CommonForms](https://huggingface.co/datasets/jbarrow/CommonForms) dataset (100,000 document images) and achieves high accuracy across diverse document layouts.

- ## Model Details
- - **Architecture:** YOLO11m
- - **Task:** Object Detection (Document Widgets)
- - **Classes:**
-   - `0`: `text_input`
-   - `1`: `choice_button` (checkboxes & radio buttons)
-   - `2`: `signature`
- - **Input Size:** 1024x1024

- ## Performance (mAP@50)
- - **text_input:** 0.814
- - **choice_button:** 0.709
- - **signature:** 0.838
- - **Overall mAP@50:** 0.787

- ## Usage

- ### Using the Python Package

- You can install the official inference package to automatically download this model and process PDFs or images.

  ```bash
  pip install psynx-widget-detector
  ```

  ```python
  from widget_detector import WidgetDetector

- # Initialize without a path to auto-download from Hugging Face
- detector = WidgetDetector()

- # Run inference on a PDF (auto-renders pages to images)
- result = detector.detect_path("sample_form.pdf")

  # Print results
  for page in result.pages:
      print(f"Page {page.page}: Found {len(page.widgets)} widgets")
      for w in page.widgets:
-         print(f" - {w.class_name} ({w.confidence:.2f}) at {w.bbox.x1:.1f}, {w.bbox.y1:.1f}")

- # Save to JSON
  result.save("output.json")
  ```

- ### Using Ultralytics Directly

- If you prefer to use the raw Ultralytics library:

- ```python
- from ultralytics import YOLO
- from huggingface_hub import hf_hub_download
-
- # Download the model weights
- model_path = hf_hub_download(repo_id="PSynx/widget-detector-yolo", filename="best.pt")

- # Load the model
- model = YOLO(model_path)
-
- # Run inference
- results = model("document_image.png", imgsz=1024, conf=0.25)
- ```
 
  pipeline_tag: object-detection
  ---

+ # YOLO11m Widget Detector

+ YOLO11m Widget Detector is a 20.1-million-parameter object detector trained on the dataset from the paper *CommonForms: A Large, Diverse Dataset for Form Field Detection*. The model detects widgets in three classes: TextBoxes (`text_input`), ChoiceButtons (`choice_button` / checkboxes), and Signature fields (`signature`).

+ ## Results

+ | Model | Text | Choice | Signature | mAP@50 (↑) |
+ |---|---|---|---|---|
+ | YOLO11m (1024px) | 81.4 | 70.9 | 83.8 | 78.7 |

+ ## Installation

+ The `psynx-widget-detector` package can be installed with either `uv` or `pip`, whichever package manager you prefer. The `uv` command:

+ ```bash
+ uv pip install psynx-widget-detector
+ ```

+ The `pip` command:

  ```bash
  pip install psynx-widget-detector
  ```

+ Once it's installed, you should be able to run inference on almost any PDF.
+
+ ## Python API
+
+ The simplest usage runs inference with the default suggested settings. The model weights download automatically from Hugging Face on your first run.
+
  ```python
  from widget_detector import WidgetDetector

+ # Initialize the detector
+ # (Downloads PSynx/widget-detector-yolo automatically)
+ detector = WidgetDetector(
+     conf=0.25,    # Confidence threshold
+     iou=0.45,     # NMS IoU threshold
+     imgsz=1024,   # Inference resolution
+     device="cpu"  # "cuda" for GPU, "cpu" for CPU
+ )

+ # Process a PDF or Image
+ result = detector.detect_path("input.pdf")

  # Print results
  for page in result.pages:
      print(f"Page {page.page}: Found {len(page.widgets)} widgets")
      for w in page.widgets:
+         print(f" - {w.class_name} ({w.confidence:.2f})")

+ # Save output to JSON
  result.save("output.json")
  ```

+ ## Example Output

+ Here is an example of the model's output on a sample document:

+ ![Sample Detection Output](sample_output.jpg)

+ ## References
+ *CommonForms: A Large, Diverse Dataset for Form Field Detection*
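
The added README example passes `conf=0.25` and `iou=0.45` to `WidgetDetector` and describes them as a confidence threshold and an NMS IoU threshold. As a rough illustration of what such thresholds usually mean, here is a minimal, class-agnostic sketch of confidence filtering followed by greedy non-maximum suppression. It is a generic textbook version, not the package's or Ultralytics' actual implementation; the `(x1, y1, x2, y2)` box layout and the helper names `box_iou` and `filter_detections` are assumptions made for this example.

```python
from typing import List, Tuple

# Hypothetical box layout for this sketch: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]
Detection = Tuple[Box, float]  # (box, confidence score)


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Width and height clamp to zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_detections(
    dets: List[Detection], conf: float = 0.25, iou: float = 0.45
) -> List[Detection]:
    """Drop low-confidence boxes, then apply greedy NMS."""
    kept: List[Detection] = []
    # Visit detections from most to least confident.
    for box, score in sorted(dets, key=lambda d: d[1], reverse=True):
        if score < conf:
            break  # everything after this is also below the threshold
        # Keep the box only if it does not overlap a kept box too much.
        if all(box_iou(box, k[0]) <= iou for k in kept):
            kept.append((box, score))
    return kept


dets = [
    ((0, 0, 10, 10), 0.9),    # kept: highest score
    ((1, 1, 11, 11), 0.8),    # suppressed: IoU with the first box is about 0.68
    ((50, 50, 60, 60), 0.3),  # kept: no overlap with a kept box
    ((0, 0, 5, 5), 0.1),      # dropped: below the confidence threshold
]
kept = filter_detections(dets)
```

Under this reading, raising `conf` trades recall for precision, while lowering `iou` suppresses more overlapping boxes. Detectors typically apply NMS per class, which this sketch omits for brevity.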