mapo80 committed on
Commit
35b6636
·
verified ·
1 Parent(s): 3a6d599

Upload README.md with huggingface_hub

  1. README.md +209 -0
README.md ADDED
@@ -0,0 +1,209 @@
+ ---
+ license: apache-2.0
+ language:
+ - en
+ tags:
+ - image-quality-assessment
+ - document-quality
+ - mplug-owl2
+ - vision-language
+ - document-analysis
+ - IQA
+ pipeline_tag: image-to-text
+ library_name: transformers
+ ---
+
+ # DeQA-Doc-Overall: Document Image Quality Assessment
+
+ **DeQA-Doc-Overall** is a vision-language model for assessing the **overall quality** of document images. It provides a quality score from 1 (bad) to 5 (excellent) that reflects the general visual quality of scanned or photographed documents.
+
+ ## Model Family
+
+ This model is part of the **DeQA-Doc** family, which includes three specialized models:
+
+ | Model | Description | HuggingFace |
+ |-------|-------------|-------------|
+ | **DeQA-Doc-Overall** | Overall document quality (this model) | [mapo80/DeQA-Doc-Overall](https://huggingface.co/mapo80/DeQA-Doc-Overall) |
+ | **DeQA-Doc-Color** | Color quality assessment | [mapo80/DeQA-Doc-Color](https://huggingface.co/mapo80/DeQA-Doc-Color) |
+ | **DeQA-Doc-Sharpness** | Sharpness/clarity assessment | [mapo80/DeQA-Doc-Sharpness](https://huggingface.co/mapo80/DeQA-Doc-Sharpness) |
+
+ ## Quick Start
+
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM
+ from PIL import Image
+
+ # Load the model (trust_remote_code=True runs the custom code shipped with the repo)
+ model = AutoModelForCausalLM.from_pretrained(
+     "mapo80/DeQA-Doc-Overall",
+     trust_remote_code=True,
+     torch_dtype=torch.float16,
+     device_map="auto",
+ )
+
+ # Score an image
+ image = Image.open("document.jpg").convert("RGB")
+ score = model.score([image])
+ print(f"Overall Quality Score: {score.item():.2f} / 5.0")
+ ```
+
+ ## Batch Processing
+
+ You can score multiple images in a single call:
+
+ ```python
+ from PIL import Image
+
+ # Assumes `model` has been loaded as in Quick Start
+ images = [
+     Image.open("doc1.jpg").convert("RGB"),
+     Image.open("doc2.jpg").convert("RGB"),
+     Image.open("doc3.jpg").convert("RGB"),
+ ]
+
+ scores = model.score(images)
+ for i, score in enumerate(scores):
+     print(f"Document {i+1}: {score.item():.2f} / 5.0")
+ ```
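Passing a very long list to `model.score` in one call may exceed available VRAM. A minimal sketch of one way around that is to score in fixed-size mini-batches. The helper names `chunked` and `score_in_batches` and the default `batch_size=4` are illustrative assumptions, not part of the model's API; the only thing taken from this README is that `model.score` accepts a list of images and returns one score per image.

```python
from itertools import islice


def chunked(items, size):
    """Yield successive lists of at most `size` items from any iterable."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def score_in_batches(score_fn, images, batch_size=4):
    """Call `score_fn` on each mini-batch and collect plain float scores.

    `score_fn` is any callable that takes a list of images and returns
    one score per image (assumed here to be `model.score`).
    """
    results = []
    for batch in chunked(images, batch_size):
        for score in score_fn(batch):
            # float() accepts Python numbers and single-element tensors alike
            results.append(float(score))
    return results
```

With a loaded model this would be called as `score_in_batches(model.score, images, batch_size=4)`; a suitable batch size depends on the GPU and image sizes.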
+
+ ## Score Interpretation
+
+ | Score Range | Quality Level | Description |
+ |-------------|---------------|-------------|
+ | 4.5 - 5.0 | **Excellent** | Perfect quality, no visible defects |
+ | 3.5 - 4.5 | **Good** | Minor imperfections, highly readable |
+ | 2.5 - 3.5 | **Fair** | Noticeable issues but still usable |
+ | 1.5 - 2.5 | **Poor** | Significant quality problems |
+ | 1.0 - 1.5 | **Bad** | Severe degradation, hard to read |
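The bands in the table share their boundary values (4.5 appears in both the Excellent and Good rows), so any code that maps a score to a label has to pick a convention. The sketch below assumes a boundary value belongs to the higher band; that choice, and the function name `quality_label`, are assumptions for illustration rather than anything defined by the model.

```python
def quality_label(score):
    """Map a 1.0-5.0 score to a quality level from the table above."""
    if not 1.0 <= score <= 5.0:
        raise ValueError(f"score {score} is outside the 1.0-5.0 range")
    # Assumption: a score exactly on a boundary goes to the higher band
    if score >= 4.5:
        return "Excellent"
    if score >= 3.5:
        return "Good"
    if score >= 2.5:
        return "Fair"
    if score >= 1.5:
        return "Poor"
    return "Bad"
```

For example, `quality_label(model.score([image]).item())` would turn a raw score into one of the five labels.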
+
+ ## Model Architecture
+
+ - **Base Model**: mPLUG-Owl2 (LLaMA2-7B + ViT-L Vision Encoder)
+ - **Vision Encoder**: CLIP ViT-L/14 (1024 visual tokens via Visual Abstractor)
+ - **Language Model**: LLaMA2-7B
+ - **Training**: Full fine-tuning on document quality datasets
+ - **Input Resolution**: Images are resized to 448x448 (with aspect ratio preservation)
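The list above says inputs are resized to 448x448 with the aspect ratio preserved, but does not say how. One common approach is to scale the longer side to the target and pad the shorter side. The sketch below only computes that geometry; it is an assumption about what such a step could look like, not a description of the model's bundled preprocessing, and `letterbox_geometry` is a hypothetical helper name.

```python
def letterbox_geometry(width, height, target=448):
    """Return (new_width, new_height, pad_left, pad_top) for fitting an
    image into a target x target square without changing its aspect ratio.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    # Scale so the longer side matches the target exactly
    scale = target / max(width, height)
    new_w = max(1, round(width * scale))
    new_h = max(1, round(height * scale))
    # Center the resized image; the remaining border would be padding
    pad_left = (target - new_w) // 2
    pad_top = (target - new_h) // 2
    return new_w, new_h, pad_left, pad_top


print(letterbox_geometry(896, 448))  # (448, 224, 0, 112)
```

You do not need to run this yourself before calling `model.score`; the Technical Details table lists the input as RGB images of any resolution.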
+
+ ## Technical Details
+
+ | Property | Value |
+ |----------|-------|
+ | Model Size | ~16 GB (float16) |
+ | Parameters | ~7.2B |
+ | Input | RGB images (any resolution) |
+ | Output | Quality score (1.0 - 5.0) |
+ | Inference | ~2-3 seconds per image on A100 |
+
+ ## Hardware Requirements
+
+ | Setup | VRAM Required | Recommended |
+ |-------|---------------|-------------|
+ | Full precision (fp32) | ~32 GB | A100, H100 |
+ | Half precision (fp16) | ~16 GB | A100, A40, RTX 4090 |
+ | With CPU offload | ~8 GB GPU + RAM | RTX 3090, RTX 4080 |
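As a rough cross-check of the VRAM figures, the memory taken by the weights alone is the parameter count times the bytes per parameter. The sketch below uses the ~7.2B parameter count from Technical Details; the helper name `weight_memory_gb` is illustrative, and the result covers weights only. Activations, the CUDA context, and framework overhead come on top, which is consistent with the table quoting somewhat higher numbers.

```python
# Bytes used to store one parameter in each dtype
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_memory_gb(num_params, dtype="float16"):
    """Estimate memory for the weights alone, in decimal GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


# Parameter count taken from the Technical Details table (~7.2B)
print(f"fp16 weights: ~{weight_memory_gb(7.2e9, 'float16'):.1f} GB")
print(f"fp32 weights: ~{weight_memory_gb(7.2e9, 'float32'):.1f} GB")
```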
+
+ ### GPU Inference (Recommended)
+
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM
+
+ # Half precision, with layers placed automatically on the available devices
+ model = AutoModelForCausalLM.from_pretrained(
+     "mapo80/DeQA-Doc-Overall",
+     trust_remote_code=True,
+     torch_dtype=torch.float16,
+     device_map="auto",
+ )
+ ```
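+
+ ### CPU Offload (Lower VRAM)
+
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM
+
+ # device_map="auto" moves layers that do not fit on the GPU to CPU RAM;
+ # anything that still does not fit is written to offload_folder on disk
+ model = AutoModelForCausalLM.from_pretrained(
+     "mapo80/DeQA-Doc-Overall",
+     trust_remote_code=True,
+     torch_dtype=torch.float16,
+     device_map="auto",
+     offload_folder="/tmp/offload",
+ )
+ ```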
+
+ ## Installation
+
+ ```bash
+ pip install torch transformers accelerate pillow sentencepiece protobuf
+ ```
+
+ **Note**: Use `transformers>=4.36.0` for best compatibility.
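To check the version note programmatically before loading the model, a small comparison helper is enough. The sketch below is a simplification under stated assumptions: it compares only the first three numeric components and ignores pre-release suffixes such as `.dev0`, and the names `parse_version` and `meets_minimum` are hypothetical.

```python
import re


def parse_version(text):
    """Return the first three numeric components of a version string,
    e.g. "4.36.0.dev0" -> (4, 36, 0) and "4.36" -> (4, 36, 0).
    """
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no numeric version found in {text!r}")
    parts = [int(part) for part in match.group().split(".")]
    # Pad short versions with zeros so tuples compare component by component
    parts += [0] * (3 - len(parts))
    return tuple(parts[:3])


def meets_minimum(installed, minimum="4.36.0"):
    """True if `installed` is at least `minimum`."""
    return parse_version(installed) >= parse_version(minimum)
```

In an environment where the package is installed, the installed version string can be read with `importlib.metadata.version("transformers")` and passed to `meets_minimum`.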
+
+ ## Use Cases
+
+ - **Document Scanning QA**: Automatically flag low-quality scans for re-scanning
+ - **Archive Digitization**: Prioritize documents needing restoration
+ - **OCR Preprocessing**: Filter images likely to produce poor OCR results
+ - **Document Management**: Sort and categorize documents by quality
+ - **Quality Control**: Automated quality checks in document processing pipelines
+
+ ## Example: Quality-Based Filtering
+
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM
+ from PIL import Image
+ from pathlib import Path
+
+ model = AutoModelForCausalLM.from_pretrained(
+     "mapo80/DeQA-Doc-Overall",
+     trust_remote_code=True,
+     torch_dtype=torch.float16,
+     device_map="auto",
+ )
+
+ # Split documents into those at or above min_score and those below it
+ def filter_by_quality(image_paths, min_score=3.0):
+     good_docs = []
+     bad_docs = []
+
+     for path in image_paths:
+         img = Image.open(path).convert("RGB")
+         score = model.score([img]).item()
+
+         if score >= min_score:
+             good_docs.append((path, score))
+         else:
+             bad_docs.append((path, score))
+
+     return good_docs, bad_docs
+
+ # Usage
+ docs = list(Path("documents/").glob("*.jpg"))
+ good, bad = filter_by_quality(docs, min_score=3.5)
+
+ print(f"Good quality: {len(good)} documents")
+ print(f"Need review: {len(bad)} documents")
+ ```
+
+ ## Limitations
+
+ - Optimized for document images (forms, letters, reports, etc.)
+ - May not perform well on natural photos or artistic images
+ - Requires a GPU with sufficient VRAM for efficient inference
+ - Scores are subjective and reflect the distribution of the training data
+
+ ## Citation
+
+ If you use this model in your research, please cite:
+
+ ```bibtex
+ @misc{deqa-doc-2024,
+   title={DeQA-Doc: Document Image Quality Assessment},
+   author={mapo80},
+   year={2024},
+   publisher={HuggingFace},
+   url={https://huggingface.co/mapo80/DeQA-Doc-Overall}
+ }
+ ```
+
+ ## License
+
+ Apache 2.0
+
+ ## Related Models
+
+ - [DeQA-Doc-Color](https://huggingface.co/mapo80/DeQA-Doc-Color) - Color quality assessment
+ - [DeQA-Doc-Sharpness](https://huggingface.co/mapo80/DeQA-Doc-Sharpness) - Sharpness assessment