Upload README.md with huggingface_hub
README.md
ADDED
@@ -0,0 +1,86 @@
---
license: apache-2.0
tags:
- document-scanner
- document-detection
- perspective-correction
- object-detection
- segmentation
- yolo
- sam
library_name: pagescan
pipeline_tag: object-detection
---

# pagescan weights

Pre-trained model weights used by [pagescan](https://github.com/7RPlus-GmbH/pagescan), a privacy-first document scanner that turns phone photos into clean, deskewed, print-ready PDFs.

These weights power the default detection cascade:

1. **YOLO11** detects a coarse axis-aligned bounding box around the document.
2. **HQ-SAM ViT-B**, prompted by that bbox, returns a precise binary mask.
3. The mask is fitted to a 4-corner quadrilateral for perspective correction.
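
Step 3 can be illustrated with a minimal, dependency-free sketch. This is not pagescan's actual fitting code: the function name `mask_to_quad`, the nested-list mask format, and the extreme-point heuristic are all assumptions chosen for illustration. It picks the four corners of a binary mask by looking at the extremes of `x + y` and `x - y` over the foreground pixels.

```python
from typing import List, Tuple

Point = Tuple[int, int]  # (x, y) in pixel coordinates


def mask_to_quad(mask: List[List[int]]) -> List[Point]:
    """Hypothetical helper: corners ordered TL, TR, BR, BL.

    Heuristic (an assumption, not pagescan's method): among foreground
    pixels, top-left minimises x + y, bottom-right maximises x + y,
    top-right maximises x - y and bottom-left minimises x - y.
    """
    points = [
        (x, y)
        for y, row in enumerate(mask)
        for x, value in enumerate(row)
        if value
    ]
    if not points:
        raise ValueError("mask contains no foreground pixels")

    top_left = min(points, key=lambda p: p[0] + p[1])
    bottom_right = max(points, key=lambda p: p[0] + p[1])
    top_right = max(points, key=lambda p: p[0] - p[1])
    bottom_left = min(points, key=lambda p: p[0] - p[1])
    return [top_left, top_right, bottom_right, bottom_left]


# Toy mask: a trapezoid, roughly what a page looks like when shot from below.
demo_mask = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 1, 0, 0],
    [0, 0, 1, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
print(mask_to_quad(demo_mask))  # [(2, 1), (5, 1), (6, 4), (1, 4)]
```

The heuristic is cheap and needs no image library, but it only works when the page is rotated by well under 45 degrees. A production implementation would more likely fit a polygon to the mask contour.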

## Files

| File | Size | Description |
|---|---|---|
| `yolo_doc_v1.onnx` | 11 MB | Exported YOLO11 detector, used at inference time by `pagescan.detector` (no torch required). |
| `yolo_doc_v1.pt` | 5.3 MB | Original PyTorch checkpoint, kept for re-export and fine-tuning. |
| `sam_hq_vit_b.pth` | 362 MB | HQ-SAM ViT-B checkpoint, re-hosted copy (see attribution below). |

## Usage

These weights are downloaded automatically by pagescan on first use:

```bash
pip install pagescan
python -c "from pagescan import scan; scan('photo.jpg', 'out.pdf')"
```

To pin a specific revision or use a local cache, download the file yourself with `huggingface_hub`:

```python
from huggingface_hub import hf_hub_download

onnx_path = hf_hub_download(
    repo_id="7rplus/pagescan-weights",
    filename="yolo_doc_v1.onnx",
    revision="main",  # replace with a tag or commit hash to pin a revision
    cache_dir=None,   # or a directory path to use a specific local cache
)
```

## Training

`yolo_doc_v1` was trained from `yolo11n-obb.pt` on a 1000-photo private corpus
of phone-captured documents (Dec 2025) with oriented bounding box (OBB) labels.
The training script lives in [`training/yolo/`](https://github.com/7RPlus-GmbH/pagescan/tree/main/training/yolo).
A v2 trained on an extended, distribution-balanced corpus is in progress.
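
The detector is trained on OBB labels, while the cascade describes its output as an axis-aligned box. This README does not say how the two are connected. As a hedged illustration only, the sketch below computes the axis-aligned envelope of an oriented box. The function name `obb_to_aabb` and the `(cx, cy, w, h, angle)` parameterisation are assumptions, not pagescan's API.

```python
import math
from typing import Tuple


def obb_to_aabb(
    cx: float, cy: float, w: float, h: float, angle_rad: float
) -> Tuple[float, float, float, float]:
    """Hypothetical helper: smallest axis-aligned box containing a rotated box.

    The rotated box is centred at (cx, cy), has size (w, h) and is rotated
    by angle_rad radians. Returns (x_min, y_min, x_max, y_max).
    """
    cos_a = abs(math.cos(angle_rad))
    sin_a = abs(math.sin(angle_rad))
    # Project both box sides onto the image axes to get the envelope extents.
    half_w = (w * cos_a + h * sin_a) / 2.0
    half_h = (w * sin_a + h * cos_a) / 2.0
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


# With no rotation the envelope is the box itself.
print(obb_to_aabb(100.0, 50.0, 40.0, 20.0, 0.0))  # (80.0, 40.0, 120.0, 60.0)
```

An envelope like this is always at least as large as the rotated box, which suits a coarse prompt that a segmentation model then refines.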

## Attribution: HQ-SAM

`sam_hq_vit_b.pth` is **not original work from this repository**. It is a re-hosted copy of the ViT-B HQ-SAM checkpoint from the HQ-SAM authors, included here so pagescan installs ship with a single, stable weight source.

> **HQ-SAM**: *Segment Anything in High Quality*
> Lei Ke, Mingqiao Ye, Martin Danelljan, Yifan Liu, Yu-Wing Tai, Chi-Keung Tang, Fisher Yu. *NeurIPS 2023*.
> Paper: <https://arxiv.org/abs/2306.01567>
> Code & original weights: <https://github.com/SysCV/sam-hq>
> Original Hugging Face mirror: <https://huggingface.co/lkeab/hq-sam>

HQ-SAM is released under the **Apache 2.0** license. We re-host the ViT-B checkpoint unchanged; all credit for the model itself belongs to the original authors. If you use pagescan in research that depends on the cascade's segmentation quality, please cite the HQ-SAM paper:

```bibtex
@inproceedings{ke2023segment,
  title = {Segment Anything in High Quality},
  author = {Ke, Lei and Ye, Mingqiao and Danelljan, Martin and Liu, Yifan and Tai, Yu-Wing and Tang, Chi-Keung and Yu, Fisher},
  booktitle = {Advances in Neural Information Processing Systems},
  year = {2023}
}
```

## License

- `yolo_doc_v1.{pt,onnx}`: released under Apache 2.0 by 7R+ GmbH.
- `sam_hq_vit_b.pth`: Apache 2.0, original authors (see above).

The pagescan package itself is MIT-licensed; see the [main repository](https://github.com/7RPlus-GmbH/pagescan).