Upload folder using huggingface_hub

- README.md +79 -0
- best_run2_interim.pt +3 -0
- checkbox_yolo12n.onnx +3 -0

README.md
ADDED
@@ -0,0 +1,79 @@
---
license: apache-2.0
tags:
- checkbox-detection
- document-ai
- yolo
- onnx
- object-detection
library_name: natural-pdf
pipeline_tag: object-detection
---

# checkbox-detector

A YOLO12n model that detects **checked** and **unchecked** checkboxes in document images. Exported to ONNX for fast CPU inference with no PyTorch dependency.

## Quick start

```python
import natural_pdf as npdf

pdf = npdf.PDF("form.pdf")
checkboxes = pdf.pages[0].detect_checkboxes()

for cb in checkboxes:
    print(cb.is_checked, cb.confidence, cb.bbox)
```

The model downloads automatically via `huggingface_hub`.

## Model details

| Property | Value |
|---|---|
| Architecture | YOLO12n (Ultralytics) |
| Format | ONNX (opset 18, onnxslim) |
| Input | 1024 x 1024 RGB |
| Output | 2 classes: `checkbox_checked`, `checkbox_unchecked` |
| Size | 10.3 MB |
| Runtime | onnxruntime (CPU) |

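The 1024 x 1024 input follows the usual Ultralytics YOLO convention: letterbox resize, pixel values scaled to [0, 1], NCHW float32 layout. A minimal numpy sketch of that preprocessing, assuming this model uses the standard convention (the `letterbox` helper is illustrative, not part of natural-pdf):

```python
import numpy as np

def letterbox(img, size=1024, pad_value=114):
    """Resize an HxWx3 uint8 image to fit in size x size (nearest-neighbor),
    pad the remainder, and return an NCHW float32 tensor in [0, 1]."""
    h, w = img.shape[:2]
    scale = min(size / h, size / w)
    nh, nw = int(round(h * scale)), int(round(w * scale))
    # Nearest-neighbor resize via index lookup (no external deps).
    ys = (np.arange(nh) / scale).astype(int).clip(0, h - 1)
    xs = (np.arange(nw) / scale).astype(int).clip(0, w - 1)
    resized = img[ys[:, None], xs]
    canvas = np.full((size, size, 3), pad_value, dtype=np.uint8)
    canvas[:nh, :nw] = resized
    x = canvas.astype(np.float32) / 255.0   # scale to [0, 1]
    return x.transpose(2, 0, 1)[None]       # HWC -> NCHW, add batch dim
```

In a real pipeline you would use Pillow's resampling instead of the nearest-neighbor lookup; the shape and dtype handling is the relevant part.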
## Training data

~5,100 document page images from two sources:

- **DocumentCloud**: Public government forms, medical intake forms, inspection checklists, voter registration forms, etc. Searched with queries like `"check all that apply"` and `"inspection checklist"`. Pages were annotated with Gemini (bounding boxes for checked/unchecked checkboxes), then validated with size, aspect ratio, and duplicate filters.
- **Derived from [CommonForms](https://huggingface.co/datasets/jbarrow/CommonForms)** (Apache 2.0): We took a subset of their form page images, re-annotated them for our 2-class task, and synthetically filled in a portion of the unchecked checkboxes to create checked examples.

The combined dataset was tiled with SAHI-style 1024x1024 sliding windows (20% overlap) to handle small checkboxes on full-page scans. The final class ratio is roughly 1:1.8 (checked:unchecked).

| Split | Source images | Tiles |
|-------|--------------|-------|
| Train | 4,095 | 16,243 |
| Val | 1,026 | 4,026 |
| Test | 37 | 37 (untiled) |

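The window arithmetic behind that tiling is simple: 20% overlap on 1024 px tiles means a stride of 819 px, with a final window snapped to the right/bottom edge so no pixels are missed. A sketch (the `tile_windows` helper is hypothetical, not the actual tiling code):

```python
def tile_windows(width, height, tile=1024, overlap=0.2):
    """Return (x1, y1, x2, y2) corners of SAHI-style sliding windows."""
    stride = int(tile * (1 - overlap))  # 819 px for 20% overlap
    xs = list(range(0, max(width - tile, 0) + 1, stride))
    ys = list(range(0, max(height - tile, 0) + 1, stride))
    # Ensure the right and bottom edges are always covered.
    if xs[-1] + tile < width:
        xs.append(width - tile)
    if ys[-1] + tile < height:
        ys.append(height - tile)
    return [(x, y, x + tile, y + tile) for y in ys for x in xs]
```

A 2000 x 1024 page, for example, yields three windows: two on the regular stride and one snapped to the right edge.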
| 57 |
+
## Performance
|
| 58 |
+
|
| 59 |
+
Final validation metrics (best checkpoint, 200 epoch run on A100 80GB):
|
| 60 |
+
|
| 61 |
+
| Class | Precision | Recall | mAP50 | mAP50-95 |
|
| 62 |
+
|-------|-----------|--------|-------|----------|
|
| 63 |
+
| All | 0.945 | 0.912 | 0.941 | 0.657 |
|
| 64 |
+
| checkbox_checked | 0.964 | 0.962 | 0.975 | 0.684 |
|
| 65 |
+
| checkbox_unchecked | 0.926 | 0.862 | 0.915 | 0.635 |
|
| 66 |
+
|
## Inference details

natural-pdf uses SAHI-style tiling at inference: the page is rendered at 150 DPI, sliced into overlapping 1024x1024 tiles, each tile is run through the model, and detections are merged with NMS. This matches the training pipeline and is important for detecting small checkboxes on full-page images.

Inference requires only `onnxruntime`, `numpy`, `Pillow`, and `huggingface_hub`; no PyTorch or Ultralytics is needed.

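The merge step amounts to greedy NMS over all tile detections once their boxes are mapped back to page coordinates. A minimal numpy sketch of greedy NMS (illustrative; natural-pdf's actual merge logic may differ):

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression over [x1, y1, x2, y2] boxes.
    Returns indices of kept boxes, highest score first."""
    boxes = np.asarray(boxes, dtype=np.float64)
    order = np.argsort(scores)[::-1]  # descending by confidence
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # Intersection of the top box with all remaining boxes.
        x1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        y1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        x2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        y2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter)
        order = order[1:][iou <= iou_thresh]  # drop heavy overlaps
    return keep
```

With overlapping tiles, the same checkbox is often detected twice near tile borders; NMS keeps only the higher-confidence copy.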
| 73 |
+
## Background
|
| 74 |
+
|
| 75 |
+
This model was inspired by [FFDNet-L](https://huggingface.co/jbarrow/FFDNet-L-cpu), a form field detector that can find unchecked checkboxes (as `choice_button`) but doesn't distinguish checked from unchecked. We needed both states for document processing, so we built a dedicated 2-class detector.
|
| 76 |
+
|
| 77 |
+
## License
|
| 78 |
+
|
| 79 |
+
Apache 2.0. Training data derived in part from [CommonForms](https://huggingface.co/datasets/jbarrow/CommonForms) (Apache 2.0).
|
best_run2_interim.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:25cf34289f794ed5817669e7c0b0142631c7a2597b5d189da122255810d4fbe6
size 10907058
checkbox_yolo12n.onnx
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:962011a5fcf171731155aafbfe5eb4e391bcbb2bd63763e2300ea658265cdc11
size 10585046