---
license: apache-2.0
tags:
- checkbox-detection
- document-ai
- yolo
- onnx
- object-detection
library_name: natural-pdf
pipeline_tag: object-detection
---
# checkbox-detector
A YOLO12n model that detects **checked** and **unchecked** checkboxes in document images. Exported to ONNX for fast CPU inference with no PyTorch dependency.
## Quick start
```python
import natural_pdf as npdf
pdf = npdf.PDF("form.pdf")
checkboxes = pdf.pages[0].detect_checkboxes()
for cb in checkboxes:
    print(cb.is_checked, cb.confidence, cb.bbox)
```
The model downloads automatically via `huggingface_hub`.
## Model details
| Property | Value |
|---|---|
| Architecture | YOLO12n (Ultralytics) |
| Format | ONNX (opset 18, onnxslim) |
| Input | 1024 x 1024 RGB |
| Output | 2 classes: `checkbox_checked`, `checkbox_unchecked` |
| Size | 10.3 MB |
| Runtime | onnxruntime (CPU) |
## Training data
~5,100 document page images from two sources:
- **DocumentCloud**: Public government forms, medical intake forms, inspection checklists, voter registration forms, etc. Searched with queries like `"check all that apply"` and `"inspection checklist"`. Pages were annotated with Gemini (bounding boxes for checked/unchecked checkboxes), then validated with size, aspect ratio, and duplicate filters.
- **Derived from [CommonForms](https://huggingface.co/datasets/jbarrow/CommonForms)** (Apache 2.0): We took a subset of their form page images, re-annotated them for our 2-class task, and synthetically filled in a portion of the unchecked checkboxes to create checked examples.
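The size and aspect-ratio validation applied to the Gemini annotations can be sketched as a simple plausibility filter. This is illustrative only: `plausible_checkbox` is a hypothetical helper, and the thresholds below are assumptions, not the values used during training.

```python
def plausible_checkbox(box, min_px=8, max_px=80, max_aspect=1.5):
    """Reject boxes too small, too large, or too elongated to be a checkbox.

    box: (x0, y0, x1, y1) in pixels. Thresholds are illustrative.
    """
    w, h = box[2] - box[0], box[3] - box[1]
    # Size filter: both sides must fall in a plausible checkbox range.
    if not (min_px <= w <= max_px and min_px <= h <= max_px):
        return False
    # Aspect-ratio filter: checkboxes are roughly square.
    return max(w, h) / min(w, h) <= max_aspect
```

A duplicate filter (e.g. dropping boxes with high overlap against an already-accepted box) would run alongside this per-box check.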
The combined dataset was tiled with SAHI-style 1024x1024 sliding windows (20% overlap) to handle small checkboxes on full-page scans. The final class ratio is roughly 1:1.8 (checked:unchecked).
| Split | Source images | Tiles |
|-------|--------------|-------|
| Train | 4,095 | 16,243 |
| Val | 1,026 | 4,026 |
| Test | 37 | 37 (untiled) |
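The SAHI-style tiling with 20% overlap amounts to a sliding-window offset computation along each axis. The sketch below is a hypothetical helper showing the idea, not the pipeline's actual code:

```python
def tile_origins(image_dim: int, tile: int = 1024, overlap: float = 0.2) -> list:
    """Top-left offsets of sliding windows along one axis.

    Windows are `tile` px wide with `overlap` fractional overlap; the last
    window is clamped flush with the image edge so nothing is missed.
    """
    if image_dim <= tile:
        return [0]  # image fits in a single tile along this axis
    stride = int(tile * (1 - overlap))  # 819 px for a 1024 tile at 20% overlap
    origins = list(range(0, image_dim - tile, stride))
    origins.append(image_dim - tile)  # final tile flush with the edge
    return origins
```

For a 2000 px dimension this yields origins `[0, 819, 976]`, so adjacent tiles share at least 20% of their width.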
## Performance
Final validation metrics (best checkpoint from a 200-epoch run on an A100 80GB):
| Class | Precision | Recall | mAP50 | mAP50-95 |
|-------|-----------|--------|-------|----------|
| All | 0.945 | 0.912 | 0.941 | 0.657 |
| checkbox_checked | 0.964 | 0.962 | 0.975 | 0.684 |
| checkbox_unchecked | 0.926 | 0.862 | 0.915 | 0.635 |
## Inference details
natural-pdf renders pages at 72 DPI so that checkboxes appear at ~26-35px — close to the training distribution (~30-35px from ~1000px full-page images). If the rendered image exceeds 1024px in either dimension, it is sliced into overlapping 1024x1024 tiles (SAHI-style, 20% overlap) and detections are merged with NMS. Smaller images are letterboxed directly to 1024x1024.
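The NMS merge across overlapping tiles works on IoU between boxes in page coordinates. Here is a minimal pure-Python sketch of that step (illustrative; the library's actual implementation may differ):

```python
def iou(a, b):
    """Intersection-over-union of two (x0, y0, x1, y1) boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: keep highest-scoring boxes, drop near-duplicates.

    Detections of the same checkbox from two overlapping tiles collapse
    to the single highest-confidence box. Returns kept indices.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_thresh for j in keep):
            keep.append(i)
    return keep
```

In the real pipeline, per-tile detections would first be shifted by their tile's origin into page coordinates before this merge runs.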
Inference requires only `onnxruntime`, `numpy`, `Pillow`, and `huggingface_hub` — no PyTorch or Ultralytics.
## Background
This model was inspired by [FFDNet-L](https://huggingface.co/jbarrow/FFDNet-L-cpu), a form field detector that can find unchecked checkboxes (as `choice_button`) but doesn't distinguish checked from unchecked. We needed both states for document processing, so we built a dedicated 2-class detector.
## License
Apache 2.0. Training data derived in part from [CommonForms](https://huggingface.co/datasets/jbarrow/CommonForms) (Apache 2.0).