---
license: apache-2.0
language:
  - en
tags:
  - image-quality-assessment
  - document-quality
  - mplug-owl2
  - vision-language
  - document-analysis
  - sharpness
  - blur-detection
  - IQA
pipeline_tag: image-to-text
library_name: transformers
---

# DeQA-Doc-Sharpness: Document Image Sharpness Assessment

**DeQA-Doc-Sharpness** is a vision-language model specialized in assessing the **sharpness and clarity** of document images. It evaluates focus quality, blur levels, and text legibility in scanned or photographed documents.

## Model Family

This model is part of the **DeQA-Doc** family, which includes three specialized models:

| Model | Description | HuggingFace |
|-------|-------------|-------------|
| **DeQA-Doc-Overall** | Overall document quality | [mapo80/DeQA-Doc-Overall](https://huggingface.co/mapo80/DeQA-Doc-Overall) |
| **DeQA-Doc-Color** | Color quality assessment | [mapo80/DeQA-Doc-Color](https://huggingface.co/mapo80/DeQA-Doc-Color) |
| **DeQA-Doc-Sharpness** | Sharpness/clarity assessment (this model) | [mapo80/DeQA-Doc-Sharpness](https://huggingface.co/mapo80/DeQA-Doc-Sharpness) |

## Quick Start

```python
import torch
from transformers import AutoModelForCausalLM
from PIL import Image

# Load the model
model = AutoModelForCausalLM.from_pretrained(
    "mapo80/DeQA-Doc-Sharpness",
    trust_remote_code=True,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Score an image
image = Image.open("document.jpg").convert("RGB")
score = model.score([image])
print(f"Sharpness Score: {score.item():.2f} / 5.0")
```

## What Does Sharpness Quality Measure?

The sharpness score evaluates:

- **Focus Quality**: How well the document is in focus
- **Motion Blur**: Absence of blur from camera/scanner movement
- **Text Clarity**: Sharpness of text edges and characters
- **Detail Preservation**: Fine details are visible and crisp
- **Resolution Quality**: Adequate resolution for the content

## Score Interpretation

| Score Range | Quality Level | Typical Issues |
|-------------|---------------|----------------|
| 4.5 - 5.0 | **Excellent** | Perfectly sharp, crisp text |
| 3.5 - 4.5 | **Good** | Slight softness, still very readable |
| 2.5 - 3.5 | **Fair** | Noticeable blur, readable with effort |
| 1.5 - 2.5 | **Poor** | Significant blur, hard to read |
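| 1.0 - 1.5 | **Bad** | Severe blur, text illegible |

Scores on a boundary fall into the higher band (a score of exactly 4.5 counts as Excellent), matching the `>=` thresholds used in the examples below.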

## Batch Processing

```python
images = [
    Image.open("doc1.jpg").convert("RGB"),
    Image.open("doc2.jpg").convert("RGB"),
    Image.open("doc3.jpg").convert("RGB"),
]

scores = model.score(images)
for i, score in enumerate(scores, start=1):
    print(f"Document {i} Sharpness: {score.item():.2f} / 5.0")
```
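
For large collections, passing every image to a single `model.score` call may exceed GPU memory. The following is a minimal sketch of scoring in fixed-size chunks. The names `score_in_chunks`, `score_fn`, and `batch_size` are hypothetical and not part of the model's API; the sketch only assumes that `score_fn` accepts a list and returns one score per input, as `model.score` does in the examples above.

```python
from typing import Callable, Iterable, List, Sequence


def score_in_chunks(
    items: Sequence,
    score_fn: Callable[[list], Iterable],
    batch_size: int = 8,
) -> List[float]:
    """Score items in fixed-size chunks and return plain Python floats."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    results: List[float] = []
    for start in range(0, len(items), batch_size):
        chunk = list(items[start:start + batch_size])
        # Assumption: score_fn returns one score per input, in input order.
        results.extend(float(s) for s in score_fn(chunk))
    return results
```

With the model loaded as in Quick Start, this could be called as `score_in_chunks(images, model.score, batch_size=4)`. A suitable chunk size depends on available VRAM.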

## Use Cases

- **OCR Preprocessing**: Filter blurry images before OCR to improve accuracy
- **Document Capture QA**: Real-time feedback for mobile document scanning
- **Archive Quality Control**: Identify documents needing re-scanning
- **Blur Detection**: Automatic detection of out-of-focus captures
- **Scanner Maintenance**: Detect scanner focus issues

## Example: OCR Quality Gate

```python
import torch
from transformers import AutoModelForCausalLM
from PIL import Image

model = AutoModelForCausalLM.from_pretrained(
    "mapo80/DeQA-Doc-Sharpness",
    trust_remote_code=True,
    torch_dtype=torch.float16,
    device_map="auto",
)

def check_ocr_readiness(image_path, min_sharpness=3.5):
    """Check if document is sharp enough for reliable OCR."""
    img = Image.open(image_path).convert("RGB")
    score = model.score([img]).item()

    if score >= min_sharpness:
        return True, score, "Ready for OCR"
    elif score >= 2.5:
        return False, score, "May produce OCR errors - consider rescanning"
    else:
        return False, score, "Too blurry for OCR - rescan required"

ready, score, message = check_ocr_readiness("scan.jpg")
print(f"Sharpness: {score:.2f}/5.0 - {message}")

if ready:
    # Proceed with OCR
    pass
else:
    # Request rescan
    pass
```

## Example: Batch Quality Sorting

```python
import torch
from transformers import AutoModelForCausalLM
from PIL import Image
from pathlib import Path

model = AutoModelForCausalLM.from_pretrained(
    "mapo80/DeQA-Doc-Sharpness",
    trust_remote_code=True,
    torch_dtype=torch.float16,
    device_map="auto",
)

def sort_by_sharpness(image_folder):
    """Sort documents into quality buckets based on sharpness."""
    results = {"excellent": [], "good": [], "fair": [], "poor": [], "bad": []}

    for img_path in Path(image_folder).glob("*.jpg"):
        img = Image.open(img_path).convert("RGB")
        score = model.score([img]).item()

        if score >= 4.5:
            results["excellent"].append((img_path, score))
        elif score >= 3.5:
            results["good"].append((img_path, score))
        elif score >= 2.5:
            results["fair"].append((img_path, score))
        elif score >= 1.5:
            results["poor"].append((img_path, score))
        else:
            results["bad"].append((img_path, score))

    return results

# Usage
quality_report = sort_by_sharpness("scanned_docs/")
print(f"Excellent: {len(quality_report['excellent'])} documents")
print(f"Need rescan: {len(quality_report['poor']) + len(quality_report['bad'])} documents")
```

## Multi-Dimensional Quality Assessment

Combine with other DeQA-Doc models for comprehensive assessment:

```python
import torch
from transformers import AutoModelForCausalLM
from PIL import Image

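# Load all three models. Each needs ~16 GB in fp16 (see Hardware Requirements),
# so device_map="auto" may offload part of them to CPU on smaller GPUs.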
models = {
    "overall": AutoModelForCausalLM.from_pretrained(
        "mapo80/DeQA-Doc-Overall", trust_remote_code=True,
        torch_dtype=torch.float16, device_map="auto"
    ),
    "color": AutoModelForCausalLM.from_pretrained(
        "mapo80/DeQA-Doc-Color", trust_remote_code=True,
        torch_dtype=torch.float16, device_map="auto"
    ),
    "sharpness": AutoModelForCausalLM.from_pretrained(
        "mapo80/DeQA-Doc-Sharpness", trust_remote_code=True,
        torch_dtype=torch.float16, device_map="auto"
    ),
}

def full_quality_report(image_path):
    img = Image.open(image_path).convert("RGB")

    scores = {}
    for name, model in models.items():
        scores[name] = model.score([img]).item()

    return scores

report = full_quality_report("document.jpg")
print(f"Overall:   {report['overall']:.2f}/5.0")
print(f"Color:     {report['color']:.2f}/5.0")
print(f"Sharpness: {report['sharpness']:.2f}/5.0")
```

## Model Architecture

- **Base Model**: mPLUG-Owl2 (LLaMA2-7B + ViT-L Vision Encoder)
- **Vision Encoder**: CLIP ViT-L/14 (1024 patch tokens at 448x448, compressed by the Visual Abstractor)
- **Language Model**: LLaMA2-7B
- **Training**: Full fine-tuning on document sharpness quality datasets
- **Input Resolution**: Images are resized to 448x448 (with aspect ratio preservation)
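
The resolution and token figures above can be checked with simple arithmetic. The sketch below assumes that "aspect ratio preservation" means scaling the longer side to 448 and padding the remainder; the model's actual preprocessing may differ. The helper names `fit_within_square` and `patch_count` are hypothetical, and the patch size of 14 comes from the ViT-L/14 name.

```python
def fit_within_square(width: int, height: int, target: int = 448):
    """Return (new_w, new_h, pad_x, pad_y) for an aspect-preserving fit.

    Assumption: the longer side is scaled to `target` and the shorter
    side is centred with equal padding on both edges.
    """
    scale = target / max(width, height)
    new_w = max(1, round(width * scale))
    new_h = max(1, round(height * scale))
    pad_x = (target - new_w) // 2
    pad_y = (target - new_h) // 2
    return new_w, new_h, pad_x, pad_y


def patch_count(side: int = 448, patch: int = 14) -> int:
    """Number of non-overlapping square patches in a side x side image."""
    return (side // patch) ** 2


print(fit_within_square(896, 448))  # landscape input, padded top and bottom
print(patch_count())                # (448 / 14) ** 2
```

Under these assumptions, a 448x448 input yields 32 x 32 = 1024 patches, which matches the token count listed for the vision encoder.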

## Technical Details

| Property | Value |
|----------|-------|
| Model Size | ~16 GB (float16) |
| Parameters | ~7.2B |
| Input | RGB images (any resolution) |
| Output | Sharpness quality score (1.0 - 5.0) |
| Inference | ~2-3 seconds per image on A100 |

## Hardware Requirements

| Setup | VRAM Required | Recommended |
|-------|---------------|-------------|
| Full precision (fp32) | ~32 GB | A100, H100 |
| Half precision (fp16) | ~16 GB | A100, A40, RTX 4090 |
| With CPU offload | ~8 GB GPU + RAM | RTX 3090, RTX 4080 |
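
As a rough sanity check, the VRAM figures can be compared with a weights-only estimate derived from the ~7.2B parameter count in Technical Details. This is a back-of-the-envelope sketch, not a measurement: it ignores activations, the CUDA context, and any other runtime overhead, and `weight_memory_gb` is a hypothetical helper.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Weights-only footprint in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in ("fp32", "fp16"):
    print(f"{dtype}: ~{weight_memory_gb(7.2e9, dtype):.1f} GB for weights alone")
```

This gives about 28.8 GB for fp32 and 14.4 GB for fp16. The table's figures sit somewhat above these values, which presumably leaves headroom for activations and runtime overhead.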

## Installation

```bash
pip install torch transformers accelerate pillow sentencepiece protobuf
```

**Note**: Use `transformers>=4.36.0` for best compatibility.

## Comparison with Traditional Methods

| Method | Pros | Cons |
|--------|------|------|
| **Laplacian Variance** | Fast, simple | Only measures edge intensity |
| **FFT-based** | Frequency analysis | Sensitive to image content |
| **Gradient-based** | Good for text | Requires tuning |
| **DeQA-Doc-Sharpness** | Content-aware, trained on documents | Requires GPU |

Because DeQA-Doc-Sharpness is trained on document images, it can use document context to distinguish intentionally smooth backgrounds from unintentional blur.
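
To make the comparison concrete, here is a minimal pure-Python sketch of the Laplacian variance baseline from the table. It is written for illustration only and is not part of this model; for real images a vectorized library would be the practical choice. The function name `laplacian_variance` and the toy images are assumptions.

```python
from statistics import pvariance


def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian over interior pixels.

    `gray` is a 2D list of grayscale intensities with rows of equal length.
    """
    height, width = len(gray), len(gray[0])
    if height < 3 or width < 3:
        raise ValueError("image must be at least 3x3")

    responses = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Sum of the four neighbours minus four times the centre pixel.
            responses.append(
                gray[y - 1][x] + gray[y + 1][x]
                + gray[y][x - 1] + gray[y][x + 1]
                - 4 * gray[y][x]
            )
    return pvariance(responses)


# A hard vertical edge versus the same edge smeared into a linear ramp.
sharp = [[0, 0, 0, 255, 255, 255] for _ in range(6)]
blurry = [[0, 51, 102, 153, 204, 255] for _ in range(6)]
# A uniform region, such as a blank part of a page.
flat = [[128] * 6 for _ in range(6)]

print(laplacian_variance(sharp) > laplacian_variance(blurry))
print(laplacian_variance(flat) == laplacian_variance(blurry))
```

The hard edge scores higher than the ramp, as expected. The uniform region and the smeared edge both score zero, which illustrates the "only measures edge intensity" limitation: the metric cannot tell a blank but in-focus area from a blurred one.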

## Limitations

- Optimized for document images (text, forms, letters)
- May not generalize well to natural photos
- Requires GPU with sufficient VRAM for efficient inference
- Sharpness assessment is relative to training data distribution
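
Because scores are relative to the training data, a fixed cutoff such as 3.5 may not suit every document collection. One possible approach is to calibrate the cutoff on a small labelled sample of your own scans. The sketch below is an assumption about how that could be done, not a procedure from the original authors; `pick_threshold` and the sample data are hypothetical.

```python
def pick_threshold(samples, candidates=None):
    """Return the cutoff that best separates usable from unusable scans.

    `samples` is a list of (score, is_usable) pairs from your own documents,
    where `is_usable` records whether downstream processing succeeded.
    """
    if not samples:
        raise ValueError("need at least one labelled sample")
    if candidates is None:
        # Candidate cutoffs 1.0, 1.1, ..., 5.0 across the score range.
        candidates = [round(1.0 + 0.1 * i, 1) for i in range(41)]

    def accuracy(threshold):
        correct = sum((score >= threshold) == usable for score, usable in samples)
        return correct / len(samples)

    # max() keeps the first best candidate, so ties resolve to the lowest cutoff.
    return max(candidates, key=accuracy)


# Illustrative, made-up labels: (sharpness score, was the scan usable?)
labelled = [(4.2, True), (3.9, True), (3.6, True), (3.1, False), (2.4, False)]
print(pick_threshold(labelled))
```

The returned value could then be passed as `min_sharpness` to `check_ocr_readiness` from the OCR Quality Gate example.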

## Credits & Attribution

This model is based on the **DeQA-Doc** project by Junjie Gao et al., which won the **Championship** in the VQualA 2025 DIQA (Document Image Quality Assessment) Challenge.

**Original Repository**: [https://github.com/Junjie-Gao19/DeQA-Doc](https://github.com/Junjie-Gao19/DeQA-Doc)

All credit for the research, training methodology, and model architecture goes to the original authors.

## Citation

If you use this model in your research, please cite the original paper:

```bibtex
@inproceedings{deqadoc,
  title={{DeQA-Doc}: Adapting {DeQA-Score} to Document Image Quality Assessment},
  author={Gao, Junjie and Liu, Runze and Peng, Yingzhe and Yang, Shujian and Zhang, Jin and Yang, Kai and You, Zhiyuan},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision Workshop},
  year={2025},
}
```

**ArXiv**: [https://arxiv.org/abs/2507.12796](https://arxiv.org/abs/2507.12796)

## License

Apache 2.0

## Related Models

- [DeQA-Doc-Overall](https://huggingface.co/mapo80/DeQA-Doc-Overall) - Overall quality assessment
- [DeQA-Doc-Color](https://huggingface.co/mapo80/DeQA-Doc-Color) - Color quality assessment