---
license: apache-2.0
tags:
  - checkbox-detection
  - document-ai
  - yolo
  - onnx
  - object-detection
library_name: natural-pdf
pipeline_tag: object-detection
---

# checkbox-detector

A YOLO12n model that detects **checked** and **unchecked** checkboxes in document images. Exported to ONNX for fast CPU inference with no PyTorch dependency.

## Quick start

```python
import natural_pdf as npdf

pdf = npdf.PDF("form.pdf")
checkboxes = pdf.pages[0].detect_checkboxes()

for cb in checkboxes:
    print(cb.is_checked, cb.confidence, cb.bbox)
```

The model downloads automatically via `huggingface_hub`.

## Model details

| | |
|---|---|
| Architecture | YOLO12n (Ultralytics) |
| Format | ONNX (opset 18, onnxslim) |
| Input | 1024 x 1024 RGB |
| Output | 2 classes: `checkbox_checked`, `checkbox_unchecked` |
| Size | 10.3 MB |
| Runtime | onnxruntime (CPU) |

## Training data

~5,100 document page images from two sources:

- **DocumentCloud**: Public government forms, medical intake forms, inspection checklists, voter registration forms, etc., found with search queries like `"check all that apply"` and `"inspection checklist"`. Pages were annotated with Gemini (bounding boxes for checked/unchecked checkboxes), then validated with size, aspect-ratio, and duplicate filters.
- **Derived from [CommonForms](https://huggingface.co/datasets/jbarrow/CommonForms)** (Apache 2.0): We took a subset of their form page images, re-annotated them for our 2-class task, and synthetically filled in a portion of the unchecked checkboxes to create checked examples.

The combined dataset was tiled with SAHI-style 1024x1024 sliding windows (20% overlap) to handle small checkboxes on full-page scans. The final class ratio is roughly 1:1.8 (checked:unchecked).

| Split | Source images | Tiles |
|-------|--------------|-------|
| Train | 4,095 | 16,243 |
| Val | 1,026 | 4,026 |
| Test | 37 | 37 (untiled) |
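The sliding-window layout described above can be sketched roughly as follows. This is an illustration of the 1024 px / 20%-overlap scheme, not natural-pdf's actual tiler:

```python
def tile_origins(length: int, tile: int = 1024, overlap: float = 0.2) -> list[int]:
    """1-D start positions for tiles covering [0, length) with the given overlap.

    Illustrative sketch only; the real tiling code may differ in edge handling.
    """
    if length <= tile:
        return [0]
    stride = int(tile * (1 - overlap))  # 819 px for 1024 px tiles at 20% overlap
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # last tile sits flush with the edge
    return starts

# Tile a hypothetical 2550 x 3300 px page (US Letter scanned at 300 DPI)
boxes = [
    (x, y, x + 1024, y + 1024)
    for y in tile_origins(3300)
    for x in tile_origins(2550)
]
```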

## Performance

Final validation metrics (best checkpoint, 200-epoch run on an A100 80GB):

| Class | Precision | Recall | mAP50 | mAP50-95 |
|-------|-----------|--------|-------|----------|
| All | 0.945 | 0.912 | 0.941 | 0.657 |
| checkbox_checked | 0.964 | 0.962 | 0.975 | 0.684 |
| checkbox_unchecked | 0.926 | 0.862 | 0.915 | 0.635 |

## Inference details

natural-pdf renders pages at 72 DPI so that checkboxes appear at ~26-35px — close to the training distribution (~30-35px from ~1000px full-page images). If the rendered image exceeds 1024px in either dimension, it is sliced into overlapping 1024x1024 tiles (SAHI-style, 20% overlap) and detections are merged with NMS. Smaller images are letterboxed directly to 1024x1024.
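A minimal sketch of the letterbox step for smaller images. The gray fill value, top-left placement, and [0, 1] normalization are illustrative assumptions, not necessarily natural-pdf's exact preprocessing:

```python
import numpy as np
from PIL import Image

def letterbox(img: Image.Image, size: int = 1024) -> np.ndarray:
    """Resize preserving aspect ratio, pad to size x size, return NCHW float32.

    Assumptions: YOLO-style gray padding, top-left placement, [0, 1] scaling.
    """
    scale = size / max(img.width, img.height)
    resized = img.convert("RGB").resize(
        (max(1, round(img.width * scale)), max(1, round(img.height * scale)))
    )
    canvas = Image.new("RGB", (size, size), (114, 114, 114))  # gray fill
    canvas.paste(resized, (0, 0))
    arr = np.asarray(canvas, dtype=np.float32) / 255.0  # HWC in [0, 1]
    return arr.transpose(2, 0, 1)[None]  # -> (1, 3, size, size)
```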

Inference requires only `onnxruntime`, `numpy`, `Pillow`, and `huggingface_hub`; PyTorch and Ultralytics are not needed at runtime.
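Merging detections across overlapping tiles uses standard greedy non-maximum suppression, which looks like the following (a generic sketch over `(x1, y1, x2, y2)` boxes, not natural-pdf's internal code):

```python
import numpy as np

def nms(boxes: np.ndarray, scores: np.ndarray, iou_thresh: float = 0.5) -> list[int]:
    """Greedy NMS: keep the highest-scoring box, drop overlapping rivals, repeat.

    Returns indices of kept boxes, highest score first.
    """
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # Intersection of box i with each remaining box
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_thresh]
    return keep
```

A checkbox detected near a tile boundary typically appears in two adjacent tiles; after mapping tile-local boxes back to page coordinates, NMS keeps only the higher-confidence copy.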

## Background

This model was inspired by [FFDNet-L](https://huggingface.co/jbarrow/FFDNet-L-cpu), a form field detector that can find unchecked checkboxes (as `choice_button`) but doesn't distinguish checked from unchecked. We needed both states for document processing, so we built a dedicated 2-class detector.

## License

Apache 2.0. Training data derived in part from [CommonForms](https://huggingface.co/datasets/jbarrow/CommonForms) (Apache 2.0).