---
license: apache-2.0
tags:
- checkbox-detection
- document-ai
- yolo
- onnx
- object-detection
library_name: natural-pdf
pipeline_tag: object-detection
---

# checkbox-detector

A YOLO12n model that detects **checked** and **unchecked** checkboxes in document images. Exported to ONNX for fast CPU inference with no PyTorch dependency.

## Quick start

```python
import natural_pdf as npdf

pdf = npdf.PDF("form.pdf")
checkboxes = pdf.pages[0].detect_checkboxes()

for cb in checkboxes:
    print(cb.is_checked, cb.confidence, cb.bbox)
```

The model downloads automatically via `huggingface_hub`.

## Model details

| | |
|---|---|
| Architecture | YOLO12n (Ultralytics) |
| Format | ONNX (opset 18, onnxslim) |
| Input | 1024 x 1024 RGB |
| Output | 2 classes: `checkbox_checked`, `checkbox_unchecked` |
| Size | 10.3 MB |
| Runtime | onnxruntime (CPU) |

## Training data

~5,100 document page images from two sources:

- **DocumentCloud**: Public government forms, medical intake forms, inspection checklists, voter registration forms, etc. Searched with queries like `"check all that apply"` and `"inspection checklist"`. Pages were annotated with Gemini (bounding boxes for checked/unchecked checkboxes), then validated with size, aspect ratio, and duplicate filters.
- **Derived from [CommonForms](https://huggingface.co/datasets/jbarrow/CommonForms)** (Apache 2.0): We took a subset of their form page images, re-annotated them for our 2-class task, and synthetically filled in a portion of the unchecked checkboxes to create checked examples.

The combined dataset was tiled with SAHI-style 1024x1024 sliding windows (20% overlap) to handle small checkboxes on full-page scans. The final class ratio is roughly 1:1.8 (checked:unchecked).

| Split | Source images | Tiles |
|-------|---------------|-------|
| Train | 4,095 | 16,243 |
| Val | 1,026 | 4,026 |
| Test | 37 | 37 (untiled) |
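
The sliding-window tiling described above reduces to a one-axis offset computation. The sketch below is illustrative, not natural-pdf's actual code: `tile_offsets` is a hypothetical helper, assuming a stride of `tile * (1 - overlap)` with the final window shifted flush to the image edge.

```python
def tile_offsets(size: int, tile: int = 1024, overlap: float = 0.2) -> list:
    """Return start offsets for SAHI-style sliding windows along one axis.

    Windows are `tile` px wide; consecutive windows overlap by
    `overlap * tile` px, and the last window ends exactly at `size`.
    """
    if size <= tile:
        return [0]  # image fits in a single tile along this axis
    stride = int(tile * (1 - overlap))  # 819 px for a 1024 tile at 20% overlap
    offsets = list(range(0, size - tile + 1, stride))
    if offsets[-1] + tile < size:
        offsets.append(size - tile)  # final window flush with the edge
    return offsets

# A 2200 px-tall page scan yields three vertical tile rows:
print(tile_offsets(2200))  # [0, 819, 1176]
```

The same helper applies to both axes; the full tile grid is the cross product of the horizontal and vertical offsets.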

## Performance

Final validation metrics (best checkpoint, 200-epoch run on an A100 80GB):

| Class | Precision | Recall | mAP50 | mAP50-95 |
|-------|-----------|--------|-------|----------|
| All | 0.945 | 0.912 | 0.941 | 0.657 |
| checkbox_checked | 0.964 | 0.962 | 0.975 | 0.684 |
| checkbox_unchecked | 0.926 | 0.862 | 0.915 | 0.635 |

## Inference details

natural-pdf renders pages at 72 DPI so that checkboxes appear at ~26-35px — close to the training distribution (~30-35px from ~1000px full-page images). If the rendered image exceeds 1024px in either dimension, it is sliced into overlapping 1024x1024 tiles (SAHI-style, 20% overlap) and detections are merged with NMS. Smaller images are letterboxed directly to 1024x1024.
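
The letterboxing step can be illustrated with a small geometry helper. `letterbox_params` is a hypothetical name, and the exact rounding and padding policy natural-pdf uses may differ; this is a sketch of the standard center-padded approach.

```python
def letterbox_params(w: int, h: int, target: int = 1024):
    """Return the scale factor and (x, y) padding that fit a w x h image
    into a target x target square while preserving aspect ratio."""
    scale = min(target / w, target / h)
    new_w, new_h = round(w * scale), round(h * scale)
    # Center the resized image; the remainder is fill (the letterbox bars).
    pad_x = (target - new_w) // 2
    pad_y = (target - new_h) // 2
    return scale, (pad_x, pad_y)

# A 612 x 792 page (US Letter rendered at 72 DPI) is scaled by 1024/792
# and padded on the left and right only:
scale, (pad_x, pad_y) = letterbox_params(612, 792)
```

Detections would then be mapped back to page coordinates by subtracting the padding and dividing by the scale.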

Inference requires only `onnxruntime`, `numpy`, `Pillow`, and `huggingface_hub` — no PyTorch or Ultralytics at inference time.
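
Consistent with that numpy-only footprint, the NMS merge used when combining tile detections can be done in plain numpy. This greedy IoU-based sketch is illustrative and not necessarily the exact variant the library uses.

```python
import numpy as np

def nms(boxes: np.ndarray, scores: np.ndarray, iou_thresh: float = 0.5) -> list:
    """Greedy non-maximum suppression; boxes are (N, 4) in x1, y1, x2, y2."""
    order = scores.argsort()[::-1]  # indices sorted by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        if order.size == 1:
            break
        rest = order[1:]
        # Intersection of the highest-scoring box with the remaining boxes.
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        # Drop boxes that overlap the kept box too much; keep the rest.
        order = rest[iou <= iou_thresh]
    return keep
```

Two near-duplicate detections of the same checkbox from overlapping tiles collapse to the higher-confidence one, while distant boxes survive.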

## Background

This model was inspired by [FFDNet-L](https://huggingface.co/jbarrow/FFDNet-L-cpu), a form field detector that can find unchecked checkboxes (as `choice_button`) but doesn't distinguish checked from unchecked. We needed both states for document processing, so we built a dedicated 2-class detector.

## License

Apache 2.0. Training data derived in part from [CommonForms](https://huggingface.co/datasets/jbarrow/CommonForms) (Apache 2.0).