Commit 6c8e312 by xaitalk (verified) · 1 Parent(s): 1cc5dae

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +86 -0
README.md ADDED
@@ -0,0 +1,86 @@
+ ---
+ license: apache-2.0
+ tags:
+ - document-scanner
+ - document-detection
+ - perspective-correction
+ - object-detection
+ - segmentation
+ - yolo
+ - sam
+ library_name: pagescan
+ pipeline_tag: object-detection
+ ---
+
+ # pagescan weights
+
+ Pre-trained model weights used by [pagescan](https://github.com/7RPlus-GmbH/pagescan), a privacy-first document scanner that turns phone photos into clean, deskewed, print-ready PDFs.
+
+ These weights power the default detection cascade:
+
+ 1. **YOLO11** detects a coarse axis-aligned bounding box around the document.
+ 2. **HQ-SAM ViT-B**, prompted by that bounding box, returns a precise binary mask.
+ 3. The mask is fitted to a 4-corner quadrilateral for perspective correction.
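
Step 3 of the cascade above can be illustrated with a small, dependency-free sketch. It is not pagescan's implementation: the function name `mask_to_quad` is hypothetical, and the corner rule used here (taking the foreground pixels that extremise `x + y` and `x - y`) is a common stand-in heuristic that assumes the document is roughly upright in the frame.

```python
from typing import List, Sequence, Tuple

Point = Tuple[int, int]  # (x, y) pixel coordinates


def mask_to_quad(mask: Sequence[Sequence[int]]) -> List[Point]:
    """Approximate the four corners of a binary mask.

    Hypothetical helper, not part of pagescan. Returns the corners in the
    order [top-left, top-right, bottom-right, bottom-left].
    """
    # Collect the coordinates of every foreground (non-zero) pixel.
    points = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not points:
        raise ValueError("mask contains no foreground pixels")

    # x + y is smallest at the top-left corner and largest at the bottom-right;
    # x - y is largest at the top-right corner and smallest at the bottom-left.
    top_left = min(points, key=lambda p: p[0] + p[1])
    bottom_right = max(points, key=lambda p: p[0] + p[1])
    top_right = max(points, key=lambda p: p[0] - p[1])
    bottom_left = min(points, key=lambda p: p[0] - p[1])
    return [top_left, top_right, bottom_right, bottom_left]


# Toy 5x6 mask with a filled rectangle spanning x = 1..4, y = 1..3.
demo_mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
quad = mask_to_quad(demo_mask)  # [(1, 1), (4, 1), (4, 3), (1, 3)]
```

The four points returned this way are the kind of input a perspective transform needs. A production pipeline would more likely fit the quadrilateral to the mask contour, which copes better with strongly rotated pages.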
+
+ ## Files
+
+ | File | Size | Description |
+ |---|---|---|
+ | `yolo_doc_v1.onnx` | 11 MB | Exported YOLO11 detector, used at inference time by `pagescan.detector` (no torch required). |
+ | `yolo_doc_v1.pt` | 5.3 MB | Original PyTorch checkpoint, kept for re-export and fine-tuning. |
+ | `sam_hq_vit_b.pth` | 362 MB | HQ-SAM ViT-B checkpoint, a re-hosted copy (see attribution below). |
+
+ ## Usage
+
+ pagescan downloads these weights automatically on first use:
+
+ ```bash
+ pip install pagescan
+ python -c "from pagescan import scan; scan('photo.jpg', 'out.pdf')"
+ ```
+
+ To fetch a file directly, for example to pin a specific revision (`revision=`) or use a custom cache directory (`cache_dir=`), call `hf_hub_download`:
+
+ ```python
+ from huggingface_hub import hf_hub_download
+
+ onnx_path = hf_hub_download(
+     repo_id="7rplus/pagescan-weights",
+     filename="yolo_doc_v1.onnx",
+ )
+ ```
+
+ ## Training
+
+ `yolo_doc_v1` was trained starting from `yolo11n-obb.pt` on a private corpus of 1,000
+ phone-captured document photos (Dec 2025) with oriented bounding box (OBB) labels.
+ The training script lives in [`training/yolo/`](https://github.com/7RPlus-GmbH/pagescan/tree/main/training/yolo).
+ A v2 trained on an extended, distribution-balanced corpus is in progress.
+
+ ## Attribution: HQ-SAM
+
+ `sam_hq_vit_b.pth` is **not original work from this repository**. It is a re-hosted copy of the ViT-B HQ-SAM checkpoint from the HQ-SAM authors, included here so that pagescan installs can rely on a single, stable weight source.
+
+ > **HQ-SAM**: *Segment Anything in High Quality*
+ > Lei Ke, Mingqiao Ye, Martin Danelljan, Yifan Liu, Yu-Wing Tai, Chi-Keung Tang, Fisher Yu. *NeurIPS 2023*.
+ > Paper: <https://arxiv.org/abs/2306.01567>
+ > Code & original weights: <https://github.com/SysCV/sam-hq>
+ > Original Hugging Face mirror: <https://huggingface.co/lkeab/hq-sam>
+
+ HQ-SAM is released under the **Apache 2.0** license. We re-host the ViT-B checkpoint unchanged; all credit for the model itself belongs to the original authors. If you use pagescan in research that depends on the cascade's segmentation quality, please cite the HQ-SAM paper:
+
+ ```bibtex
+ @inproceedings{ke2023segment,
+   title = {Segment Anything in High Quality},
+   author = {Ke, Lei and Ye, Mingqiao and Danelljan, Martin and Liu, Yifan and Tai, Yu-Wing and Tang, Chi-Keung and Yu, Fisher},
+   booktitle = {Advances in Neural Information Processing Systems},
+   year = {2023}
+ }
+ ```
+
+ ## License
+
+ - `yolo_doc_v1.{pt,onnx}`: released under Apache 2.0 by 7R+ GmbH.
+ - `sam_hq_vit_b.pth`: Apache 2.0, original authors (see above).
+
+ The pagescan package itself is MIT-licensed; see the [main repository](https://github.com/7RPlus-GmbH/pagescan).