---
license: apache-2.0
language:
  - en
tags:
  - image-quality-assessment
  - document-quality
  - mplug-owl2
  - vision-language
  - document-analysis
  - sharpness
  - blur-detection
  - IQA
pipeline_tag: image-to-text
library_name: transformers
---

# DeQA-Doc-Sharpness: Document Image Sharpness Assessment

**DeQA-Doc-Sharpness** is a vision-language model specialized in assessing the **sharpness and clarity** of document images. It evaluates focus quality, blur levels, and text legibility in scanned or photographed documents.

## Model Family

This model is part of the **DeQA-Doc** family, which includes three specialized models:

| Model | Description | HuggingFace |
|-------|-------------|-------------|
| **DeQA-Doc-Overall** | Overall document quality | [mapo80/DeQA-Doc-Overall](https://huggingface.co/mapo80/DeQA-Doc-Overall) |
| **DeQA-Doc-Color** | Color quality assessment | [mapo80/DeQA-Doc-Color](https://huggingface.co/mapo80/DeQA-Doc-Color) |
| **DeQA-Doc-Sharpness** | Sharpness/clarity assessment (this model) | [mapo80/DeQA-Doc-Sharpness](https://huggingface.co/mapo80/DeQA-Doc-Sharpness) |

## Quick Start

```python
import torch
from transformers import AutoModelForCausalLM
from PIL import Image

# Load the model
model = AutoModelForCausalLM.from_pretrained(
    "mapo80/DeQA-Doc-Sharpness",
    trust_remote_code=True,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Score an image
image = Image.open("document.jpg").convert("RGB")
score = model.score([image])
print(f"Sharpness Score: {score.item():.2f} / 5.0")
```

## What Does Sharpness Quality Measure?

The sharpness score evaluates:

- **Focus Quality**: How well the document is in focus
- **Motion Blur**: Absence of blur from camera/scanner movement
- **Text Clarity**: Sharpness of text edges and characters
- **Detail Preservation**: Fine details are visible and crisp
- **Resolution Quality**: Adequate resolution for the content

## Score Interpretation

| Score Range | Quality Level | Typical Issues |
|-------------|---------------|----------------|
| 4.5 - 5.0 | **Excellent** | Perfectly sharp, crisp text |
| 3.5 - 4.5 | **Good** | Slight softness, still very readable |
| 2.5 - 3.5 | **Fair** | Noticeable blur, readable with effort |
| 1.5 - 2.5 | **Poor** | Significant blur, hard to read |
| 1.0 - 1.5 | **Bad** | Severe blur, text illegible |
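
The ranges in the table share their endpoints, so a small helper makes the boundary handling explicit. The sketch below treats each lower bound as inclusive, which matches the `>=` comparisons used in the examples in this card. The name `quality_level` is hypothetical and not part of the model's API.

```python
def quality_level(score: float) -> str:
    """Map a sharpness score to a quality label from the table above.

    Lower bounds are inclusive: a score of exactly 4.5 counts as "Excellent".
    """
    # (lower bound, label) pairs, checked from the best level downwards.
    thresholds = [
        (4.5, "Excellent"),
        (3.5, "Good"),
        (2.5, "Fair"),
        (1.5, "Poor"),
    ]
    for lower_bound, label in thresholds:
        if score >= lower_bound:
            return label
    # Anything below 1.5 falls into the lowest bucket.
    return "Bad"


if __name__ == "__main__":
    for value in (4.8, 3.5, 2.0, 1.2):
        print(f"{value:.1f} -> {quality_level(value)}")
```

With a loaded model, this could be called as `quality_level(model.score([image]).item())`.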

## Batch Processing

```python
# Reuses `model` and `Image` from the Quick Start snippet above.
images = [
    Image.open("doc1.jpg").convert("RGB"),
    Image.open("doc2.jpg").convert("RGB"),
    Image.open("doc3.jpg").convert("RGB"),
]

scores = model.score(images)
for i, score in enumerate(scores):
    print(f"Document {i+1} Sharpness: {score.item():.2f} / 5.0")
```
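
For larger collections, passing every image to a single `model.score` call may exhaust GPU memory. A minimal sketch of chunked scoring follows, assuming only that the scoring function accepts a list of images and returns one score per image, as in the example above. `score_in_chunks` and its `chunk_size` default are hypothetical names and values, not part of the model's API.

```python
from typing import Any, Callable, Iterable, List, Sequence


def score_in_chunks(
    score_fn: Callable[[List[Any]], Iterable[Any]],
    images: Sequence[Any],
    chunk_size: int = 4,  # assumed default; tune to the available VRAM
) -> List[float]:
    """Score images in fixed-size chunks and return one float per image."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")

    scores: List[float] = []
    for start in range(0, len(images), chunk_size):
        chunk = list(images[start:start + chunk_size])
        # float() accepts plain numbers as well as single-element tensors.
        scores.extend(float(s) for s in score_fn(chunk))
    return scores
```

With the model from the Quick Start, the call would be `score_in_chunks(model.score, images, chunk_size=4)`.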

## Use Cases

- **OCR Preprocessing**: Filter blurry images before OCR to improve accuracy
- **Document Capture QA**: Real-time feedback for mobile document scanning
- **Archive Quality Control**: Identify documents needing re-scanning
- **Blur Detection**: Automatic detection of out-of-focus captures
- **Scanner Maintenance**: Detect scanner focus issues

## Example: OCR Quality Gate

```python
import torch
from transformers import AutoModelForCausalLM
from PIL import Image

model = AutoModelForCausalLM.from_pretrained(
    "mapo80/DeQA-Doc-Sharpness",
    trust_remote_code=True,
    torch_dtype=torch.float16,
    device_map="auto",
)

def check_ocr_readiness(image_path, min_sharpness=3.5):
    """Check if document is sharp enough for reliable OCR."""
    img = Image.open(image_path).convert("RGB")
    score = model.score([img]).item()

    if score >= min_sharpness:
        return True, score, "Ready for OCR"
    elif score >= 2.5:
        return False, score, "May produce OCR errors - consider rescanning"
    else:
        return False, score, "Too blurry for OCR - rescan required"

ready, score, message = check_ocr_readiness("scan.jpg")
print(f"Sharpness: {score:.2f}/5.0 - {message}")

if ready:
    # Proceed with OCR
    pass
else:
    # Request rescan
    pass
```

## Example: Batch Quality Sorting

```python
import torch
from transformers import AutoModelForCausalLM
from PIL import Image
from pathlib import Path

model = AutoModelForCausalLM.from_pretrained(
    "mapo80/DeQA-Doc-Sharpness",
    trust_remote_code=True,
    torch_dtype=torch.float16,
    device_map="auto",
)

def sort_by_sharpness(image_folder):
    """Sort documents into quality buckets based on sharpness."""
    results = {"excellent": [], "good": [], "fair": [], "poor": [], "bad": []}

    for img_path in Path(image_folder).glob("*.jpg"):
        img = Image.open(img_path).convert("RGB")
        score = model.score([img]).item()

        if score >= 4.5:
            results["excellent"].append((img_path, score))
        elif score >= 3.5:
            results["good"].append((img_path, score))
        elif score >= 2.5:
            results["fair"].append((img_path, score))
        elif score >= 1.5:
            results["poor"].append((img_path, score))
        else:
            results["bad"].append((img_path, score))

    return results

# Usage
quality_report = sort_by_sharpness("scanned_docs/")
print(f"Excellent: {len(quality_report['excellent'])} documents")
print(f"Need rescan: {len(quality_report['poor']) + len(quality_report['bad'])} documents")
```

## Multi-Dimensional Quality Assessment

Combine with other DeQA-Doc models for comprehensive assessment:

```python
import torch
from transformers import AutoModelForCausalLM
from PIL import Image

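# Load all three models.
# Each one needs ~16 GB in fp16 (see Hardware Requirements), so holding all
# three at once takes roughly three times that across GPU and CPU memory.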
models = {
    "overall": AutoModelForCausalLM.from_pretrained(
        "mapo80/DeQA-Doc-Overall", trust_remote_code=True,
        torch_dtype=torch.float16, device_map="auto"
    ),
    "color": AutoModelForCausalLM.from_pretrained(
        "mapo80/DeQA-Doc-Color", trust_remote_code=True,
        torch_dtype=torch.float16, device_map="auto"
    ),
    "sharpness": AutoModelForCausalLM.from_pretrained(
        "mapo80/DeQA-Doc-Sharpness", trust_remote_code=True,
        torch_dtype=torch.float16, device_map="auto"
    ),
}

def full_quality_report(image_path):
    img = Image.open(image_path).convert("RGB")

    scores = {}
    for name, model in models.items():
        scores[name] = model.score([img]).item()

    return scores

report = full_quality_report("document.jpg")
print(f"Overall:   {report['overall']:.2f}/5.0")
print(f"Color:     {report['color']:.2f}/5.0")
print(f"Sharpness: {report['sharpness']:.2f}/5.0")
```

## Model Architecture

- **Base Model**: mPLUG-Owl2 (LLaMA2-7B + ViT-L Vision Encoder)
- **Vision Encoder**: CLIP ViT-L/14 (1024 visual tokens via Visual Abstractor)
- **Language Model**: LLaMA2-7B
- **Training**: Full fine-tuning on document sharpness quality datasets
- **Input Resolution**: Images are resized to 448x448 (with aspect ratio preservation)
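
This card states only that images are resized to 448x448 with the aspect ratio preserved; the exact preprocessing lives in the model's remote code and is not reproduced here. As one plausible reading, the sketch below computes the geometry of a letterbox resize (scale the longer side to 448, then center the result on a square canvas). The function `letterbox_geometry` and the centering choice are assumptions for illustration.

```python
from typing import Tuple


def letterbox_geometry(width: int, height: int, target: int = 448) -> Tuple[int, int, int, int]:
    """Return (new_width, new_height, pad_left, pad_top) for a letterbox resize.

    The longer side is scaled to `target`; the shorter side keeps the aspect
    ratio and is centered on a target x target canvas (assumed behavior).
    """
    if width < 1 or height < 1:
        raise ValueError("width and height must be positive")

    longer_side = max(width, height)
    new_width = max(1, round(width * target / longer_side))
    new_height = max(1, round(height * target / longer_side))

    # Split the leftover space evenly; any odd pixel goes to the right/bottom.
    pad_left = (target - new_width) // 2
    pad_top = (target - new_height) // 2
    return new_width, new_height, pad_left, pad_top


if __name__ == "__main__":
    # An A4 page scanned at 300 dpi is 2480 x 3508 pixels.
    print(letterbox_geometry(2480, 3508))
```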

## Technical Details

| Property | Value |
|----------|-------|
| Model Size | ~16 GB (float16) |
| Parameters | ~7.2B |
| Input | RGB images (any resolution) |
| Output | Sharpness quality score (1.0 - 5.0) |
| Inference | ~2-3 seconds per image on A100 |

## Hardware Requirements

| Setup | VRAM Required | Recommended |
|-------|---------------|-------------|
| Full precision (fp32) | ~32 GB | A100, H100 |
| Half precision (fp16) | ~16 GB | A100, A40, RTX 4090 |
| With CPU offload | ~8 GB GPU + RAM | RTX 3090, RTX 4080 |
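
For the CPU offload row, one option is to cap GPU usage through the `max_memory` argument that `from_pretrained` accepts together with `device_map="auto"`. The sketch below only builds those keyword arguments. The helper name `offload_load_kwargs` is hypothetical, the 8 GiB GPU budget comes from the table, and the 32 GiB CPU budget is a placeholder assumption. Whether the model's custom `score` method behaves correctly with offloaded weights has not been verified here.

```python
from typing import Any, Dict


def offload_load_kwargs(
    gpu_budget_gib: int = 8,    # GPU budget from the table row above
    cpu_budget_gib: int = 32,   # assumed placeholder; set to your free RAM
    gpu_index: int = 0,
) -> Dict[str, Any]:
    """Build from_pretrained keyword arguments that cap per-device memory."""
    if gpu_budget_gib < 1 or cpu_budget_gib < 1:
        raise ValueError("memory budgets must be at least 1 GiB")

    return {
        "device_map": "auto",
        "max_memory": {
            gpu_index: f"{gpu_budget_gib}GiB",
            "cpu": f"{cpu_budget_gib}GiB",
        },
    }


# Intended use (not executed here, since it downloads the model):
#
#   model = AutoModelForCausalLM.from_pretrained(
#       "mapo80/DeQA-Doc-Sharpness",
#       trust_remote_code=True,
#       torch_dtype=torch.float16,
#       **offload_load_kwargs(),
#   )
```

Layers that do not fit in the GPU budget are placed in CPU memory, so inference will likely be slower than the fully on-GPU setups in the table.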

## Installation

```bash
pip install torch transformers accelerate pillow sentencepiece protobuf
```

**Note**: Use `transformers>=4.36.0` for best compatibility.

## Comparison with Traditional Methods

| Method | Pros | Cons |
|--------|------|------|
| **Laplacian Variance** | Fast, simple | Only measures edge intensity |
| **FFT-based** | Frequency analysis | Sensitive to image content |
| **Gradient-based** | Good for text | Requires tuning |
| **DeQA-Doc-Sharpness** | Content-aware, trained on documents | Requires GPU |

DeQA-Doc-Sharpness understands document context and can differentiate between intentionally smooth backgrounds and unintentional blur.
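
For reference, the Laplacian variance baseline from the table can be sketched in a few lines. This is the classical method, not part of DeQA-Doc, and is shown in dependency-free Python for clarity; the function name `laplacian_variance` and the list-of-rows input format are assumptions. In practice the same computation is usually done with NumPy or OpenCV.

```python
from statistics import pvariance
from typing import List, Sequence


def laplacian_variance(gray: Sequence[Sequence[float]]) -> float:
    """Variance of the 4-neighbour Laplacian over a grayscale image.

    `gray` is a list of rows of pixel intensities. Higher values mean
    stronger edges, which is commonly read as a sharper image.
    """
    height = len(gray)
    width = len(gray[0]) if height else 0
    if height < 3 or width < 3:
        raise ValueError("image must be at least 3x3 pixels")

    responses: List[float] = []
    # Skip the one-pixel border, where the kernel would leave the image.
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            responses.append(
                gray[y - 1][x] + gray[y + 1][x]
                + gray[y][x - 1] + gray[y][x + 1]
                - 4 * gray[y][x]
            )
    return pvariance(responses)


if __name__ == "__main__":
    hard_edge = [[0, 0, 0, 255, 255, 255] for _ in range(4)]
    soft_ramp = [[0, 51, 102, 153, 204, 255] for _ in range(4)]
    print(laplacian_variance(hard_edge), laplacian_variance(soft_ramp))
```

Because the measure responds only to local intensity changes, a page with large blank areas scores low even when it is perfectly in focus, which is the limitation the table notes for this method.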

## Limitations

- Optimized for document images (text, forms, letters)
- May not generalize well to natural photos
- Requires GPU with sufficient VRAM for efficient inference
- Sharpness assessment is relative to training data distribution

## Credits & Attribution

This model is based on the **DeQA-Doc** project by Junjie Gao et al., which won the **Championship** in the VQualA 2025 DIQA (Document Image Quality Assessment) Challenge.

**Original Repository**: [https://github.com/Junjie-Gao19/DeQA-Doc](https://github.com/Junjie-Gao19/DeQA-Doc)

All credit for the research, training methodology, and model architecture goes to the original authors.

## Citation

If you use this model in your research, please cite the original paper:

```bibtex
@inproceedings{deqadoc,
  title={{DeQA-Doc}: Adapting {DeQA-Score} to Document Image Quality Assessment},
  author={Gao, Junjie and Liu, Runze and Peng, Yingzhe and Yang, Shujian and Zhang, Jin and Yang, Kai and You, Zhiyuan},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision Workshop},
  year={2025},
}
```

**ArXiv**: [https://arxiv.org/abs/2507.12796](https://arxiv.org/abs/2507.12796)

## License

Apache 2.0

## Related Models

- [DeQA-Doc-Overall](https://huggingface.co/mapo80/DeQA-Doc-Overall) - Overall quality assessment
- [DeQA-Doc-Color](https://huggingface.co/mapo80/DeQA-Doc-Color) - Color quality assessment