checkbox-detector

A YOLO12n model that detects checked and unchecked checkboxes in document images. Exported to ONNX for fast CPU inference with no PyTorch dependency.

Quick start

import natural_pdf as npdf

pdf = npdf.PDF("form.pdf")
checkboxes = pdf.pages[0].detect_checkboxes()

for cb in checkboxes:
    print(cb.is_checked, cb.confidence, cb.bbox)

The model downloads automatically via huggingface_hub.

Model details

Architecture   YOLO12n (Ultralytics)
Format         ONNX (opset 18, slimmed with onnxslim)
Input          1024 x 1024 RGB
Output         2 classes: checkbox_checked, checkbox_unchecked
Size           10.3 MB
Runtime        onnxruntime (CPU)
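Since the model takes a fixed 1024 x 1024 RGB input, smaller page renders have to be letterboxed before inference. The sketch below is illustrative, not natural-pdf's actual code; the gray padding value (114) and the CHW, [0, 1] float layout are assumptions borrowed from common YOLO export conventions.

```python
import numpy as np
from PIL import Image

def letterbox(img: Image.Image, size: int = 1024, fill=(114, 114, 114)):
    """Fit an image inside a size x size square, padding the remainder.

    Returns the CHW float tensor plus the scale and offsets needed to
    map detections back to the original image coordinates.
    """
    scale = min(size / img.width, size / img.height)
    new_w, new_h = round(img.width * scale), round(img.height * scale)
    resized = img.convert("RGB").resize((new_w, new_h), Image.BILINEAR)
    canvas = Image.new("RGB", (size, size), fill)
    pad_x, pad_y = (size - new_w) // 2, (size - new_h) // 2
    canvas.paste(resized, (pad_x, pad_y))
    # CHW float tensor in [0, 1], the usual layout for exported YOLO models
    tensor = np.asarray(canvas, dtype=np.float32).transpose(2, 0, 1) / 255.0
    return tensor, scale, pad_x, pad_y
```

A detection at (x, y) in letterboxed coordinates maps back to the page as ((x - pad_x) / scale, (y - pad_y) / scale).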

Training data

~5,100 document page images from two sources:

  • DocumentCloud: Public government forms, medical intake forms, inspection checklists, voter registration forms, etc. Searched with queries like "check all that apply" and "inspection checklist". Pages were annotated with Gemini (bounding boxes for checked/unchecked checkboxes), then validated with size, aspect ratio, and duplicate filters.
  • Derived from CommonForms (Apache 2.0): We took a subset of their form page images, re-annotated them for our 2-class task, and synthetically filled in a portion of the unchecked checkboxes to create checked examples.

The combined dataset was tiled with SAHI-style 1024x1024 sliding windows (20% overlap) to handle small checkboxes on full-page scans. The final class ratio is roughly 1:1.8 (checked:unchecked).
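The sliding-window layout described above can be sketched as follows. This is a minimal illustration of 1024 px windows with 20% overlap, not the SAHI library's own code; the function names are hypothetical.

```python
def tile_origins(length: int, tile: int = 1024, overlap: float = 0.2):
    """Top-left offsets of sliding windows along one axis.

    Windows are `tile` long with `overlap * tile` shared between
    neighbours; a final window is shifted back so it ends at `length`.
    """
    if length <= tile:
        return [0]
    stride = int(tile * (1 - overlap))
    origins = list(range(0, length - tile + 1, stride))
    if origins[-1] + tile < length:
        origins.append(length - tile)
    return origins

def tile_boxes(width: int, height: int, tile: int = 1024, overlap: float = 0.2):
    """All (x0, y0, x1, y1) tile windows covering a width x height page."""
    return [(x, y, x + tile, y + tile)
            for y in tile_origins(height, tile, overlap)
            for x in tile_origins(width, tile, overlap)]
```

For a 2000 x 1000 px render this yields three tiles in a single row, each overlapping its neighbour by roughly 200 px.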

Split   Source images   Tiles
Train   4,095           16,243
Val     1,026           4,026
Test    37              37 (untiled)

Performance

Final validation metrics (best checkpoint, 200-epoch run on an A100 80GB):

Class                Precision   Recall   mAP50   mAP50-95
All                  0.945       0.912    0.941   0.657
checkbox_checked     0.964       0.962    0.975   0.684
checkbox_unchecked   0.926       0.862    0.915   0.635

Inference details

natural-pdf renders pages at 72 DPI so that checkboxes appear at 26-35 px, close to the training distribution (30-35 px from ~1000 px full-page images). If the rendered image exceeds 1024 px in either dimension, it is sliced into overlapping 1024x1024 tiles (SAHI-style, 20% overlap) and detections are merged with NMS. Smaller images are letterboxed directly to 1024x1024.
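Merging detections from overlapping tiles is typically done with greedy non-maximum suppression once all boxes are mapped back to page coordinates. The sketch below shows the standard greedy algorithm; it is an assumption about the merge step, not natural-pdf's actual implementation.

```python
import numpy as np

def nms(boxes: np.ndarray, scores: np.ndarray, iou_thresh: float = 0.5):
    """Greedy non-maximum suppression over (x0, y0, x1, y1) boxes.

    Keeps the highest-scoring box, drops every remaining box whose IoU
    with it exceeds `iou_thresh`, and repeats until no boxes are left.
    """
    order = scores.argsort()[::-1]  # indices sorted by descending score
    keep = []
    while order.size:
        i = order[0]
        keep.append(int(i))
        # Intersection of the kept box with all remaining boxes
        xx0 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy0 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx1 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy1 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(xx1 - xx0, 0, None) * np.clip(yy1 - yy0, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = ((boxes[order[1:], 2] - boxes[order[1:], 0]) *
                 (boxes[order[1:], 3] - boxes[order[1:], 1]))
        iou = inter / (area_i + areas - inter)
        order = order[1:][iou <= iou_thresh]
    return keep
```

In the tiled case the same checkbox can be detected in two adjacent tiles; after mapping both boxes to page coordinates their IoU is high, so only the higher-confidence copy survives.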

Inference requires only onnxruntime, numpy, Pillow, and huggingface_hub; no PyTorch or Ultralytics.

Background

This model was inspired by FFDNet-L, a form field detector that can find unchecked checkboxes (as choice_button) but doesn't distinguish checked from unchecked. We needed both states for document processing, so we built a dedicated 2-class detector.

License

Apache 2.0. Training data derived in part from CommonForms (Apache 2.0).
