# checkbox-detector
A YOLO12n model that detects checked and unchecked checkboxes in document images. Exported to ONNX for fast CPU inference with no PyTorch dependency.
## Quick start

```python
import natural_pdf as npdf

pdf = npdf.PDF("form.pdf")
checkboxes = pdf.pages[0].detect_checkboxes()
for cb in checkboxes:
    print(cb.is_checked, cb.confidence, cb.bbox)
```
The model downloads automatically via huggingface_hub.
## Model details

| Property | Value |
|---|---|
| Architecture | YOLO12n (Ultralytics) |
| Format | ONNX (opset 18, onnxslim) |
| Input | 1024 x 1024 RGB |
| Output | 2 classes: checkbox_checked, checkbox_unchecked |
| Size | 10.3 MB |
| Runtime | onnxruntime (CPU) |
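As a rough illustration of what the CPU runtime involves, here is a minimal sketch of feeding the fixed-size input to the exported model with onnxruntime. The helper names are mine, the naive resize ignores the letterboxing described later, and output decoding is omitted; this is not the natural-pdf implementation.

```python
import numpy as np
from PIL import Image


def preprocess(img: Image.Image) -> np.ndarray:
    """Convert a PIL image to the model's (1, 3, 1024, 1024) float32 input."""
    img = img.convert("RGB").resize((1024, 1024))
    arr = np.asarray(img, dtype=np.float32) / 255.0  # scale pixels to [0, 1]
    return arr.transpose(2, 0, 1)[np.newaxis, ...]   # HWC -> NCHW, add batch dim


def load_session(model_path: str):
    """Open a CPU-only onnxruntime session for the exported model."""
    import onnxruntime as ort  # imported here so preprocessing alone needs no ORT
    return ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
```

A session's `run(None, {input_name: preprocess(img)})` then returns the raw YOLO detection tensor, which still needs confidence filtering and NMS.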
## Training data
~5,100 document page images from two sources:
- DocumentCloud: Public government forms, medical intake forms, inspection checklists, voter registration forms, etc. Searched with queries like "check all that apply" and "inspection checklist". Pages were annotated with Gemini (bounding boxes for checked/unchecked checkboxes), then validated with size, aspect ratio, and duplicate filters.
- Derived from CommonForms (Apache 2.0): We took a subset of their form page images, re-annotated them for our 2-class task, and synthetically filled in a portion of the unchecked checkboxes to create checked examples.
The combined dataset was tiled with SAHI-style 1024x1024 sliding windows (20% overlap) to handle small checkboxes on full-page scans. The final class ratio is roughly 1:1.8 (checked:unchecked).
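The sliding-window geometry described above can be sketched in a few lines. This computes tile origins for a 1024x1024 window with 20% overlap, always placing a final tile flush against the edge so nothing is cropped; the function names are illustrative, not part of the training pipeline's code.

```python
def tile_origins(length: int, tile: int = 1024, overlap: float = 0.2) -> list[int]:
    """Start offsets along one axis so that tiles of `tile` px cover [0, length)."""
    if length <= tile:
        return [0]
    stride = int(tile * (1 - overlap))     # 819 px step for 20% overlap
    origins = list(range(0, length - tile, stride))
    origins.append(length - tile)          # final tile flush with the far edge
    return origins


def tiles_for(width: int, height: int, tile: int = 1024) -> list[tuple[int, int]]:
    """All (x, y) tile origins for an image of the given size."""
    return [(x, y) for y in tile_origins(height, tile) for x in tile_origins(width, tile)]
```

For a 2048 px axis this yields origins at 0, 819, and 1024, so adjacent tiles share a 205 px band where a checkbox split by one tile boundary is whole in the neighbor.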
| Split | Source images | Tiles |
|---|---|---|
| Train | 4,095 | 16,243 |
| Val | 1,026 | 4,026 |
| Test | 37 | 37 (untiled) |
## Performance

Final validation metrics (best checkpoint from a 200-epoch run on an A100 80GB):
| Class | Precision | Recall | mAP50 | mAP50-95 |
|---|---|---|---|---|
| All | 0.945 | 0.912 | 0.941 | 0.657 |
| checkbox_checked | 0.964 | 0.962 | 0.975 | 0.684 |
| checkbox_unchecked | 0.926 | 0.862 | 0.915 | 0.635 |
## Inference details

natural-pdf renders pages at 72 DPI so that checkboxes appear at 26-35 px, close to the training distribution (30-35 px on ~1000 px full-page images). If the rendered image exceeds 1024 px in either dimension, it is sliced into overlapping 1024x1024 tiles (SAHI-style, 20% overlap) and detections are merged with NMS. Smaller images are letterboxed directly to 1024x1024.
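The letterboxing step for small renders can be sketched as follows. The function name and gray padding color are assumptions; only the geometry (scale to fit, center, pad to 1024x1024, and return the mapping needed to project boxes back to page coordinates) follows the description above.

```python
from PIL import Image


def letterbox(img: Image.Image, size: int = 1024, fill=(114, 114, 114)):
    """Fit `img` inside a size x size canvas without distortion.

    Returns the padded canvas plus the scale and (dx, dy) offsets needed
    to map detected boxes back to the original image's coordinates.
    """
    scale = min(size / img.width, size / img.height)
    new_w, new_h = round(img.width * scale), round(img.height * scale)
    resized = img.resize((new_w, new_h))
    canvas = Image.new("RGB", (size, size), fill)
    dx, dy = (size - new_w) // 2, (size - new_h) // 2
    canvas.paste(resized, (dx, dy))
    return canvas, scale, dx, dy
```

A detected box `(x1, y1, x2, y2)` on the canvas then maps back to the page via `(x - dx) / scale` and `(y - dy) / scale`.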
Inference requires only onnxruntime, numpy, Pillow, and huggingface_hub; no PyTorch or Ultralytics is needed at inference time.
## Background
This model was inspired by FFDNet-L, a form field detector that can find unchecked checkboxes (as choice_button) but doesn't distinguish checked from unchecked. We needed both states for document processing, so we built a dedicated 2-class detector.
## License
Apache 2.0. Training data derived in part from CommonForms (Apache 2.0).