---
license: apache-2.0
language:
- en
tags:
- image-quality-assessment
- document-quality
- mplug-owl2
- vision-language
- document-analysis
- sharpness
- blur-detection
- IQA
pipeline_tag: image-to-text
library_name: transformers
---

# DeQA-Doc-Sharpness: Document Image Sharpness Assessment

**DeQA-Doc-Sharpness** is a vision-language model specialized in assessing the **sharpness and clarity** of document images. It evaluates focus quality, blur levels, and text legibility in scanned or photographed documents.

## Model Family

This model is part of the **DeQA-Doc** family, which includes three specialized models:

| Model | Description | HuggingFace |
|-------|-------------|-------------|
| **DeQA-Doc-Overall** | Overall document quality | [mapo80/DeQA-Doc-Overall](https://huggingface.co/mapo80/DeQA-Doc-Overall) |
| **DeQA-Doc-Color** | Color quality assessment | [mapo80/DeQA-Doc-Color](https://huggingface.co/mapo80/DeQA-Doc-Color) |
| **DeQA-Doc-Sharpness** | Sharpness/clarity assessment (this model) | [mapo80/DeQA-Doc-Sharpness](https://huggingface.co/mapo80/DeQA-Doc-Sharpness) |

## Quick Start

```python
import torch
from transformers import AutoModelForCausalLM
from PIL import Image

# Load the model
model = AutoModelForCausalLM.from_pretrained(
    "mapo80/DeQA-Doc-Sharpness",
    trust_remote_code=True,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Score an image
image = Image.open("document.jpg").convert("RGB")
score = model.score([image])
print(f"Sharpness Score: {score.item():.2f} / 5.0")
```

## What Does Sharpness Quality Measure?

The sharpness score evaluates:

- **Focus Quality**: How well the document is in focus
- **Motion Blur**: Absence of blur from camera/scanner movement
- **Text Clarity**: Sharpness of text edges and characters
- **Detail Preservation**: Fine details are visible and crisp
- **Resolution Quality**: Adequate resolution for the content

## Score Interpretation

| Score Range | Quality Level | Typical Issues |
|-------------|---------------|----------------|
| 4.5 - 5.0 | **Excellent** | Perfectly sharp, crisp text |
| 3.5 - 4.5 | **Good** | Slight softness, still very readable |
| 2.5 - 3.5 | **Fair** | Noticeable blur, readable with effort |
| 1.5 - 2.5 | **Poor** | Significant blur, hard to read |
| 1.0 - 1.5 | **Bad** | Severe blur, text illegible |

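For downstream pipelines, the bands above collapse into a small lookup helper. This is a minimal sketch of the table; `sharpness_label` is a hypothetical name, not part of the model's API.

```python
def sharpness_label(score: float) -> str:
    """Map a 1.0-5.0 sharpness score to the quality bands in the table above."""
    if score >= 4.5:
        return "excellent"
    elif score >= 3.5:
        return "good"
    elif score >= 2.5:
        return "fair"
    elif score >= 1.5:
        return "poor"
    return "bad"

print(sharpness_label(4.7))  # excellent
print(sharpness_label(2.1))  # poor
```
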
## Batch Processing

```python
images = [
    Image.open("doc1.jpg").convert("RGB"),
    Image.open("doc2.jpg").convert("RGB"),
    Image.open("doc3.jpg").convert("RGB"),
]

scores = model.score(images)
for i, score in enumerate(scores):
    print(f"Document {i+1} Sharpness: {score.item():.2f} / 5.0")
```

## Use Cases

- **OCR Preprocessing**: Filter blurry images before OCR to improve accuracy
- **Document Capture QA**: Real-time feedback for mobile document scanning
- **Archive Quality Control**: Identify documents needing re-scanning
- **Blur Detection**: Automatic detection of out-of-focus captures
- **Scanner Maintenance**: Detect scanner focus issues

## Example: OCR Quality Gate

```python
import torch
from transformers import AutoModelForCausalLM
from PIL import Image

model = AutoModelForCausalLM.from_pretrained(
    "mapo80/DeQA-Doc-Sharpness",
    trust_remote_code=True,
    torch_dtype=torch.float16,
    device_map="auto",
)

def check_ocr_readiness(image_path, min_sharpness=3.5):
    """Check if document is sharp enough for reliable OCR."""
    img = Image.open(image_path).convert("RGB")
    score = model.score([img]).item()

    if score >= min_sharpness:
        return True, score, "Ready for OCR"
    elif score >= 2.5:
        return False, score, "May produce OCR errors - consider rescanning"
    else:
        return False, score, "Too blurry for OCR - rescan required"

ready, score, message = check_ocr_readiness("scan.jpg")
print(f"Sharpness: {score:.2f}/5.0 - {message}")

if ready:
    # Proceed with OCR
    pass
else:
    # Request rescan
    pass
```

## Example: Batch Quality Sorting

```python
import torch
from transformers import AutoModelForCausalLM
from PIL import Image
from pathlib import Path

model = AutoModelForCausalLM.from_pretrained(
    "mapo80/DeQA-Doc-Sharpness",
    trust_remote_code=True,
    torch_dtype=torch.float16,
    device_map="auto",
)

def sort_by_sharpness(image_folder):
    """Sort documents into quality buckets based on sharpness."""
    results = {"excellent": [], "good": [], "fair": [], "poor": [], "bad": []}

    for img_path in Path(image_folder).glob("*.jpg"):
        img = Image.open(img_path).convert("RGB")
        score = model.score([img]).item()

        if score >= 4.5:
            results["excellent"].append((img_path, score))
        elif score >= 3.5:
            results["good"].append((img_path, score))
        elif score >= 2.5:
            results["fair"].append((img_path, score))
        elif score >= 1.5:
            results["poor"].append((img_path, score))
        else:
            results["bad"].append((img_path, score))

    return results

# Usage
quality_report = sort_by_sharpness("scanned_docs/")
print(f"Excellent: {len(quality_report['excellent'])} documents")
print(f"Need rescan: {len(quality_report['poor']) + len(quality_report['bad'])} documents")
```

## Multi-Dimensional Quality Assessment

Combine with other DeQA-Doc models for comprehensive assessment:

```python
import torch
from transformers import AutoModelForCausalLM
from PIL import Image

# Load all three models
models = {
    "overall": AutoModelForCausalLM.from_pretrained(
        "mapo80/DeQA-Doc-Overall", trust_remote_code=True,
        torch_dtype=torch.float16, device_map="auto"
    ),
    "color": AutoModelForCausalLM.from_pretrained(
        "mapo80/DeQA-Doc-Color", trust_remote_code=True,
        torch_dtype=torch.float16, device_map="auto"
    ),
    "sharpness": AutoModelForCausalLM.from_pretrained(
        "mapo80/DeQA-Doc-Sharpness", trust_remote_code=True,
        torch_dtype=torch.float16, device_map="auto"
    ),
}

def full_quality_report(image_path):
    img = Image.open(image_path).convert("RGB")

    scores = {}
    for name, model in models.items():
        scores[name] = model.score([img]).item()

    return scores

report = full_quality_report("document.jpg")
print(f"Overall: {report['overall']:.2f}/5.0")
print(f"Color: {report['color']:.2f}/5.0")
print(f"Sharpness: {report['sharpness']:.2f}/5.0")
```

## Model Architecture

- **Base Model**: mPLUG-Owl2 (LLaMA2-7B + ViT-L vision encoder)
- **Vision Encoder**: CLIP ViT-L/14 (1024 visual tokens via Visual Abstractor)
- **Language Model**: LLaMA2-7B
- **Training**: Full fine-tuning on document sharpness quality datasets
- **Input Resolution**: Images are resized to 448x448 (with aspect ratio preservation)

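The 448x448 resize with aspect-ratio preservation can be pictured as a letterbox transform. The sketch below is illustrative only: `letterbox_448` is a hypothetical helper, and the model's actual preprocessing ships with its remote code and may differ in padding color and interpolation.

```python
from PIL import Image

def letterbox_448(img, size=448, fill=(127, 127, 127)):
    """Scale the longer side to `size`, then pad to a square canvas,
    preserving the document's aspect ratio."""
    scale = size / max(img.size)
    new_w, new_h = round(img.width * scale), round(img.height * scale)
    resized = img.resize((new_w, new_h), Image.BICUBIC)
    # Center the resized document on a neutral square background.
    canvas = Image.new("RGB", (size, size), fill)
    canvas.paste(resized, ((size - new_w) // 2, (size - new_h) // 2))
    return canvas

out = letterbox_448(Image.new("RGB", (1000, 500)))
print(out.size)  # (448, 448)
```
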
## Technical Details

| Property | Value |
|----------|-------|
| Model Size | ~16 GB (float16) |
| Parameters | ~7.2B |
| Input | RGB images (any resolution) |
| Output | Sharpness quality score (1.0 - 5.0) |
| Inference | ~2-3 seconds per image on A100 |

## Hardware Requirements

| Setup | VRAM Required | Recommended |
|-------|---------------|-------------|
| Full precision (fp32) | ~32 GB | A100, H100 |
| Half precision (fp16) | ~16 GB | A100, A40, RTX 4090 |
| With CPU offload | ~8 GB GPU + RAM | RTX 3090, RTX 4080 |

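As a rough guide, the rows of this table might translate into `from_pretrained` loading arguments like the following. `pick_load_kwargs` is a hypothetical helper and the thresholds and memory budgets are illustrative, not part of the model's API; `max_memory` is the standard accelerate/transformers mechanism for capping GPU usage and offloading the remainder to CPU RAM.

```python
def pick_load_kwargs(gpu_vram_gb):
    """Suggest from_pretrained() kwargs for a given GPU VRAM budget
    (illustrative heuristics following the hardware table above)."""
    if gpu_vram_gb >= 32:
        return {"torch_dtype": "float32", "device_map": "auto"}
    if gpu_vram_gb >= 16:
        return {"torch_dtype": "float16", "device_map": "auto"}
    # Below ~16 GB: cap GPU usage and let accelerate offload the rest to CPU.
    return {
        "torch_dtype": "float16",
        "device_map": "auto",
        "max_memory": {0: f"{int(gpu_vram_gb)}GiB", "cpu": "32GiB"},
    }

print(pick_load_kwargs(8))
```

The returned dict would be splatted into the call, e.g. `AutoModelForCausalLM.from_pretrained("mapo80/DeQA-Doc-Sharpness", trust_remote_code=True, **pick_load_kwargs(8))`.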
## Installation

```bash
pip install torch transformers accelerate pillow sentencepiece protobuf
```

**Note**: Use `transformers>=4.36.0` for best compatibility.

## Comparison with Traditional Methods

| Method | Pros | Cons |
|--------|------|------|
| **Laplacian Variance** | Fast, simple | Only measures edge intensity |
| **FFT-based** | Frequency analysis | Sensitive to image content |
| **Gradient-based** | Good for text | Requires tuning |
| **DeQA-Doc-Sharpness** | Content-aware, trained on documents | Requires GPU |

DeQA-Doc-Sharpness understands document context and can differentiate between intentionally smooth backgrounds and unintentional blur.

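For a concrete sense of the Laplacian-variance baseline in the table, here is a minimal NumPy sketch (the kernel is the standard 4-neighbor Laplacian; the test images are illustrative):

```python
import numpy as np

def laplacian_variance(gray):
    """Classic no-reference sharpness proxy: variance of the Laplacian.
    Higher values mean stronger edges, i.e. a sharper image."""
    k = np.array([[0,  1, 0],
                  [1, -4, 1],
                  [0,  1, 0]], dtype=np.float64)
    h, w = gray.shape
    # Valid-mode 3x3 convolution written as a sum of shifted slices.
    out = np.zeros((h - 2, w - 2))
    for i in range(3):
        for j in range(3):
            out += k[i, j] * gray[i:i + h - 2, j:j + w - 2]
    return float(out.var())

# A sharp checkerboard scores far higher than a flat gray image.
sharp = np.indices((64, 64)).sum(axis=0) % 2 * 255.0
flat = np.full((64, 64), 128.0)
print(laplacian_variance(sharp) > laplacian_variance(flat))  # True
```

A pixel statistic like this cannot tell an intentionally smooth background from out-of-focus text, which is the gap the model-based approach addresses.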
## Limitations

- Optimized for document images (text, forms, letters)
- May not generalize well to natural photos
- Requires GPU with sufficient VRAM for efficient inference
- Sharpness assessment is relative to the training data distribution

## Credits & Attribution

This model is based on the **DeQA-Doc** project by Junjie Gao et al., which won the **Championship** in the VQualA 2025 DIQA (Document Image Quality Assessment) Challenge.

**Original Repository**: [https://github.com/Junjie-Gao19/DeQA-Doc](https://github.com/Junjie-Gao19/DeQA-Doc)

All credit for the research, training methodology, and model architecture goes to the original authors.

## Citation

If you use this model in your research, please cite the original paper:

```bibtex
@inproceedings{deqadoc,
  title={{DeQA-Doc}: Adapting {DeQA-Score} to Document Image Quality Assessment},
  author={Gao, Junjie and Liu, Runze and Peng, Yingzhe and Yang, Shujian and Zhang, Jin and Yang, Kai and You, Zhiyuan},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision Workshop},
  year={2025},
}
```

**ArXiv**: [https://arxiv.org/abs/2507.12796](https://arxiv.org/abs/2507.12796)

## License

Apache 2.0

## Related Models

- [DeQA-Doc-Overall](https://huggingface.co/mapo80/DeQA-Doc-Overall) - Overall quality assessment
- [DeQA-Doc-Color](https://huggingface.co/mapo80/DeQA-Doc-Color) - Color quality assessment
|