---
license: apache-2.0
tags:
- checkbox-detection
- document-ai
- yolo
- onnx
- object-detection
library_name: natural-pdf
pipeline_tag: object-detection
---
# checkbox-detector
A YOLO12n model that detects **checked** and **unchecked** checkboxes in document images. Exported to ONNX for fast CPU inference with no PyTorch dependency.
## Quick start
```python
import natural_pdf as npdf
pdf = npdf.PDF("form.pdf")
checkboxes = pdf.pages[0].detect_checkboxes()
for cb in checkboxes:
    print(cb.is_checked, cb.confidence, cb.bbox)
```
The model downloads automatically via `huggingface_hub`.
## Model details
| Property | Value |
|---|---|
| Architecture | YOLO12n (Ultralytics) |
| Format | ONNX (opset 18, onnxslim) |
| Input | 1024 x 1024 RGB |
| Output | 2 classes: `checkbox_checked`, `checkbox_unchecked` |
| Size | 10.3 MB |
| Runtime | onnxruntime (CPU) |
## Training data
~5,100 document page images from two sources:
- **DocumentCloud**: Public government forms, medical intake forms, inspection checklists, voter registration forms, etc. Searched with queries like `"check all that apply"` and `"inspection checklist"`. Pages were annotated with Gemini (bounding boxes for checked/unchecked checkboxes), then validated with size, aspect ratio, and duplicate filters.
- **Derived from [CommonForms](https://huggingface.co/datasets/jbarrow/CommonForms)** (Apache 2.0): We took a subset of their form page images, re-annotated them for our 2-class task, and synthetically filled in a portion of the unchecked checkboxes to create checked examples.
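The size and aspect-ratio validation applied to the Gemini annotations can be sketched as a simple plausibility filter. This is illustrative only: `plausible_checkbox` is a hypothetical helper, and the thresholds below are assumptions, not the values used during training.

```python
def plausible_checkbox(box, min_px=8, max_px=80, max_aspect=1.5):
    """Reject boxes too small, too large, or too elongated to be a checkbox.

    box: (x0, y0, x1, y1) in pixels. Thresholds are illustrative.
    """
    w, h = box[2] - box[0], box[3] - box[1]
    # Size filter: both sides must fall in a plausible checkbox range.
    if not (min_px <= w <= max_px and min_px <= h <= max_px):
        return False
    # Aspect-ratio filter: checkboxes are roughly square.
    return max(w, h) / min(w, h) <= max_aspect
```

A duplicate filter (e.g. dropping boxes with high overlap against an already-accepted box) would run alongside this per-box check.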
The combined dataset was tiled with SAHI-style 1024x1024 sliding windows (20% overlap) to handle small checkboxes on full-page scans. The final class ratio is roughly 1:1.8 (checked:unchecked).
| Split | Source images | Tiles |
|-------|--------------|-------|
| Train | 4,095 | 16,243 |
| Val | 1,026 | 4,026 |
| Test | 37 | 37 (untiled) |
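The SAHI-style tiling with 20% overlap amounts to a sliding-window offset computation along each axis. The sketch below is a hypothetical helper showing the idea, not the pipeline's actual code:

```python
def tile_origins(image_dim: int, tile: int = 1024, overlap: float = 0.2) -> list:
    """Top-left offsets of sliding windows along one axis.

    Windows are `tile` px wide with `overlap` fractional overlap; the last
    window is clamped flush with the image edge so nothing is missed.
    """
    if image_dim <= tile:
        return [0]  # image fits in a single tile along this axis
    stride = int(tile * (1 - overlap))  # 819 px for a 1024 tile at 20% overlap
    origins = list(range(0, image_dim - tile, stride))
    origins.append(image_dim - tile)  # final tile flush with the edge
    return origins
```

For a 2000 px dimension this yields origins `[0, 819, 976]`, so adjacent tiles share at least 20% of their width.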
## Performance
Final validation metrics (best checkpoint from a 200-epoch run on an A100 80GB):
| Class | Precision | Recall | mAP50 | mAP50-95 |
|-------|-----------|--------|-------|----------|
| All | 0.945 | 0.912 | 0.941 | 0.657 |
| checkbox_checked | 0.964 | 0.962 | 0.975 | 0.684 |
| checkbox_unchecked | 0.926 | 0.862 | 0.915 | 0.635 |
## Inference details
natural-pdf renders pages at 72 DPI so that checkboxes appear at ~26-35px — close to the training distribution (~30-35px from ~1000px full-page images). If the rendered image exceeds 1024px in either dimension, it is sliced into overlapping 1024x1024 tiles (SAHI-style, 20% overlap) and detections are merged with NMS. Smaller images are letterboxed directly to 1024x1024.
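The NMS merge across overlapping tiles works on IoU between boxes in page coordinates. Here is a minimal pure-Python sketch of that step (illustrative; the library's actual implementation may differ):

```python
def iou(a, b):
    """Intersection-over-union of two (x0, y0, x1, y1) boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: keep highest-scoring boxes, drop near-duplicates.

    Detections of the same checkbox from two overlapping tiles collapse
    to the single highest-confidence box. Returns kept indices.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_thresh for j in keep):
            keep.append(i)
    return keep
```

In the real pipeline, per-tile detections would first be shifted by their tile's origin into page coordinates before this merge runs.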
Inference requires only `onnxruntime`, `numpy`, `Pillow`, and `huggingface_hub` — no PyTorch or Ultralytics.
## Background
This model was inspired by [FFDNet-L](https://huggingface.co/jbarrow/FFDNet-L-cpu), a form field detector that can find unchecked checkboxes (as `choice_button`) but doesn't distinguish checked from unchecked. We needed both states for document processing, so we built a dedicated 2-class detector.
## License
Apache 2.0. Training data derived in part from [CommonForms](https://huggingface.co/datasets/jbarrow/CommonForms) (Apache 2.0).