	---
	license: apache-2.0
	language:
	- en
	tags:
	- image-quality-assessment
	- document-quality
	- mplug-owl2
	- vision-language
	- document-analysis
	- IQA
	pipeline_tag: image-to-text
	library_name: transformers
	---

	# DeQA-Doc-Overall: Document Image Quality Assessment

	DeQA-Doc-Overall is a vision-language model for assessing the overall quality of document images. It provides a quality score from 1 (bad) to 5 (excellent) that reflects the general visual quality of scanned or photographed documents.

	## Model Family

	This model is part of the DeQA-Doc family, which includes three specialized models:

	| Model | Description | HuggingFace |
	|-------|-------------|-------------|
	| DeQA-Doc-Overall | Overall document quality (this model) | [mapo80/DeQA-Doc-Overall](https://huggingface.co/mapo80/DeQA-Doc-Overall) |
	| DeQA-Doc-Color | Color quality assessment | [mapo80/DeQA-Doc-Color](https://huggingface.co/mapo80/DeQA-Doc-Color) |
	| DeQA-Doc-Sharpness | Sharpness/clarity assessment | [mapo80/DeQA-Doc-Sharpness](https://huggingface.co/mapo80/DeQA-Doc-Sharpness) |

	## Quick Start

	```python
	import torch
	from transformers import AutoModelForCausalLM
	from PIL import Image

	# Load the model
	model = AutoModelForCausalLM.from_pretrained(
	    "mapo80/DeQA-Doc-Overall",
	    trust_remote_code=True,
	    torch_dtype=torch.float16,
	    device_map="auto",
	)

	# Score an image
	image = Image.open("document.jpg").convert("RGB")
	score = model.score([image])
	print(f"Overall Quality Score: {score.item():.2f} / 5.0")
	```

	## Batch Processing

	You can score multiple images at once:

	```python
	# Reuses `model` and `Image` from the Quick Start example above
	images = [
	    Image.open("doc1.jpg").convert("RGB"),
	    Image.open("doc2.jpg").convert("RGB"),
	    Image.open("doc3.jpg").convert("RGB"),
	]

	scores = model.score(images)
	for i, score in enumerate(scores):
	    print(f"Document {i+1}: {score.item():.2f} / 5.0")
	```
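For large folders, scoring every image in a single call may exhaust GPU memory. The sketch below splits the work into small batches. It assumes only what the examples above show: that `model.score` accepts a list of images and returns one value per image. The helper names `chunked` and `score_in_chunks`, and the default batch size, are illustrative and not part of the model's API.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def score_in_chunks(score_fn: Callable, images: Sequence, batch_size: int = 4) -> List[float]:
    """Call `score_fn` on small batches and collect the results as floats."""
    results: List[float] = []
    for batch in chunked(images, batch_size):
        # float() works for 0-dim tensors as well as ordinary numbers.
        results.extend(float(s) for s in score_fn(list(batch)))
    return results
```

With the model loaded as above, a call would look like `score_in_chunks(model.score, images, batch_size=4)`. A suitable batch size depends on available VRAM and is best found by experiment.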

	## Score Interpretation

	| Score Range | Quality Level | Description |
	|-------------|---------------|-------------|
	| 4.5 - 5.0 | Excellent | Perfect quality, no visible defects |
	| 3.5 - 4.5 | Good | Minor imperfections, highly readable |
	| 2.5 - 3.5 | Fair | Noticeable issues but still usable |
	| 1.5 - 2.5 | Poor | Significant quality problems |
	| 1.0 - 1.5 | Bad | Severe degradation, hard to read |
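The table can be turned into a small lookup, sketched below. The ranges in the table share their endpoints (4.5 appears in both Excellent and Good), so this sketch assumes a boundary score belongs to the higher level. That tie-breaking rule and the function name `quality_level` are assumptions, not something the model defines.

```python
def quality_level(score: float) -> str:
    """Map a 1.0-5.0 score to the quality level from the table.

    Assumption: a score exactly on a boundary (e.g. 4.5) takes the higher level.
    """
    if not 1.0 <= score <= 5.0:
        raise ValueError(f"score {score} is outside the 1.0-5.0 range")
    if score >= 4.5:
        return "Excellent"
    if score >= 3.5:
        return "Good"
    if score >= 2.5:
        return "Fair"
    if score >= 1.5:
        return "Poor"
    return "Bad"
```

For a tensor returned by `model.score`, pass `score.item()` to the function.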

	## Model Architecture

	- Base Model: mPLUG-Owl2 (LLaMA2-7B + ViT-L Vision Encoder)
	- Vision Encoder: CLIP ViT-L/14 (1024 visual tokens via Visual Abstractor)
	- Language Model: LLaMA2-7B
	- Training: Full fine-tuning on document quality datasets
	- Input Resolution: Images are resized to 448x448 (with aspect ratio preservation)
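The list above does not say how the aspect ratio is preserved; the actual preprocessing lives in the model's remote code. One common interpretation is letterboxing: scale the longer side to 448 pixels and pad the shorter side to fill the square. The sketch below computes only that geometry. The function name and the centered padding are assumptions made for illustration.

```python
from typing import Tuple


def letterbox_geometry(width: int, height: int, target: int = 448) -> Tuple[int, int, int, int]:
    """Return (new_width, new_height, pad_left, pad_top) for a square canvas.

    Assumption: the longer side is scaled to `target` and the image is centered.
    """
    if width < 1 or height < 1:
        raise ValueError("width and height must be positive")
    longest = max(width, height)
    new_w = max(1, round(width * target / longest))
    new_h = max(1, round(height * target / longest))
    pad_left = (target - new_w) // 2
    pad_top = (target - new_h) // 2
    return new_w, new_h, pad_left, pad_top
```

For example, an A4 page scanned at 300 dpi (2480x3508 pixels) would shrink to roughly 317x448 under this scheme, which illustrates how much fine detail is lost before the model sees the page.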

	## Technical Details

	| Property | Value |
	|----------|-------|
	| Model Size | ~16 GB (float16) |
	| Parameters | ~7.2B |
	| Input | RGB images (any resolution) |
	| Output | Quality score (1.0 - 5.0) |
	| Inference | ~2-3 seconds per image on A100 |

	## Hardware Requirements

	| Setup | VRAM Required | Recommended |
	|-------|---------------|-------------|
	| Full precision (fp32) | ~32 GB | A100, H100 |
	| Half precision (fp16) | ~16 GB | A100, A40, RTX 4090 |
	| With CPU offload | ~8 GB GPU + RAM | RTX 3090, RTX 4080 |
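As a rough cross-check of the table, the memory taken by the weights alone is the parameter count times the bytes per parameter. The sketch below does that arithmetic. It is an estimate for weights only, so it ignores activations and framework overhead, and the helper name is illustrative.

```python
# Bytes used per parameter for common dtypes.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_memory_gb(num_params: float, dtype: str = "float16") -> float:
    """Estimate weight storage in GB (10**9 bytes); excludes activations."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


fp16_gb = weight_memory_gb(7.2e9, "float16")  # about 14.4
fp32_gb = weight_memory_gb(7.2e9, "float32")  # about 28.8
```

With the ~7.2B parameters listed under Technical Details, this gives about 14.4 GB in fp16 and 28.8 GB in fp32. The table's figures are somewhat higher, which would be consistent with headroom for activations and runtime overhead, although the README does not break this down.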

	### GPU Inference (Recommended)

	```python
	model = AutoModelForCausalLM.from_pretrained(
	    "mapo80/DeQA-Doc-Overall",
	    trust_remote_code=True,
	    torch_dtype=torch.float16,
	    device_map="auto",
	)
	```

	### CPU Offload (Lower VRAM)

	```python
	model = AutoModelForCausalLM.from_pretrained(
	    "mapo80/DeQA-Doc-Overall",
	    trust_remote_code=True,
	    torch_dtype=torch.float16,
	    device_map="auto",
	    offload_folder="/tmp/offload",
	)
	```
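With `device_map="auto"`, `from_pretrained` also accepts a `max_memory` dictionary that caps how much each device may hold, which can make offloading more predictable. The helper below only builds the keyword arguments. Its name, the single-GPU assumption (device `0`), and the example limits are illustrative, and whether this model's remote code behaves well under a given cap has not been verified here.

```python
from typing import Any, Dict


def offload_kwargs(gpu_gib: int, cpu_gib: int, offload_folder: str = "/tmp/offload") -> Dict[str, Any]:
    """Build loading kwargs that cap GPU 0 and CPU memory (single-GPU assumption)."""
    if gpu_gib < 1 or cpu_gib < 1:
        raise ValueError("memory limits must be at least 1 GiB")
    return {
        "device_map": "auto",
        "max_memory": {0: f"{gpu_gib}GiB", "cpu": f"{cpu_gib}GiB"},
        "offload_folder": offload_folder,
    }


# Hypothetical usage, with torch and AutoModelForCausalLM imported as above:
# model = AutoModelForCausalLM.from_pretrained(
#     "mapo80/DeQA-Doc-Overall",
#     trust_remote_code=True,
#     torch_dtype=torch.float16,
#     **offload_kwargs(gpu_gib=8, cpu_gib=32),
# )
```

Layers that do not fit within the GPU limit are placed on the CPU, and anything beyond the CPU limit is written to the offload folder, so inference will be slower than the fully on-GPU setup.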

	## Installation

	```bash
	pip install torch transformers accelerate pillow sentencepiece protobuf
	```

	Note: Use `transformers>=4.36.0` for best compatibility.
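To confirm the installed version before loading the model, a check along these lines may help. It is a minimal sketch using only the standard library. The parser is deliberately simple: it reads the leading numeric components and ignores pre-release suffixes, so `4.36.0rc1` is treated the same as `4.36.0`. The helper names are illustrative.

```python
from importlib import metadata
from typing import Optional, Tuple


def release_tuple(version: str) -> Tuple[int, ...]:
    """Turn '4.36.0' or '4.40.0.dev0' into a tuple of its leading integers."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    # Pad so that '4.36' compares equal to '4.36.0'.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = "4.36.0") -> bool:
    """Return True if `installed` is at least `minimum`."""
    return release_tuple(installed) >= release_tuple(minimum)


def installed_version(package: str = "transformers") -> Optional[str]:
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
```

A typical use is to call `installed_version()` and, if it returns a string, pass it to `meets_minimum` before attempting to load the model.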

	## Use Cases

	- Document Scanning QA: Automatically flag low-quality scans for re-scanning
	- Archive Digitization: Prioritize documents needing restoration
	- OCR Preprocessing: Filter images likely to produce poor OCR results
	- Document Management: Sort and categorize documents by quality
	- Quality Control: Automated quality checks in document processing pipelines

	## Example: Quality-Based Filtering

	```python
	import torch
	from transformers import AutoModelForCausalLM
	from PIL import Image
	from pathlib import Path

	model = AutoModelForCausalLM.from_pretrained(
	    "mapo80/DeQA-Doc-Overall",
	    trust_remote_code=True,
	    torch_dtype=torch.float16,
	    device_map="auto",
	)

	# Filter documents by quality
	def filter_by_quality(image_paths, min_score=3.0):
	    good_docs = []
	    bad_docs = []

	    for path in image_paths:
	        img = Image.open(path).convert("RGB")
	        score = model.score([img]).item()

	        if score >= min_score:
	            good_docs.append((path, score))
	        else:
	            bad_docs.append((path, score))

	    return good_docs, bad_docs

	# Usage
	docs = list(Path("documents/").glob("*.jpg"))
	good, bad = filter_by_quality(docs, min_score=3.5)

	print(f"Good quality: {len(good)} documents")
	print(f"Need review: {len(bad)} documents")
	```
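The `(path, score)` pairs returned by `filter_by_quality` can be written to a CSV file so that the lowest-scoring documents are reviewed first. The sketch below uses only the standard library. The function name `write_report` and the column names are illustrative choices.

```python
import csv
from typing import Iterable, Tuple


def write_report(rows: Iterable[Tuple[object, float]], handle) -> int:
    """Write (path, score) pairs to `handle`, lowest score first; return the row count."""
    ordered = sorted(rows, key=lambda row: row[1])
    writer = csv.writer(handle)
    writer.writerow(["path", "score"])
    for path, score in ordered:
        writer.writerow([str(path), f"{score:.2f}"])
    return len(ordered)


# Hypothetical usage with the `bad` list from the example above:
# with open("needs_review.csv", "w", newline="") as f:
#     write_report(bad, f)
```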

	## Limitations

	- Optimized for document images (forms, letters, reports, etc.)
	- May not perform well on natural photos or artistic images
	- Requires GPU with sufficient VRAM for efficient inference
	- Score is subjective and based on training data distribution

	## Credits & Attribution

	This model is based on the DeQA-Doc project by Junjie Gao et al., which won the Championship in the VQualA 2025 DIQA (Document Image Quality Assessment) Challenge.

	Original Repository: [https://github.com/Junjie-Gao19/DeQA-Doc](https://github.com/Junjie-Gao19/DeQA-Doc)

	All credit for the research, training methodology, and model architecture goes to the original authors.

	## Citation

	If you use this model in your research, please cite the original paper:

	```bibtex
	@inproceedings{deqadoc,
	  title={{DeQA-Doc}: Adapting {DeQA-Score} to Document Image Quality Assessment},
	  author={Gao, Junjie and Liu, Runze and Peng, Yingzhe and Yang, Shujian and Zhang, Jin and Yang, Kai and You, Zhiyuan},
	  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision Workshop},
	  year={2025},
	}
	```

	ArXiv: [https://arxiv.org/abs/2507.12796](https://arxiv.org/abs/2507.12796)

	## License

	Apache 2.0

	## Related Models

	- [DeQA-Doc-Color](https://huggingface.co/mapo80/DeQA-Doc-Color) - Color quality assessment
	- [DeQA-Doc-Sharpness](https://huggingface.co/mapo80/DeQA-Doc-Sharpness) - Sharpness assessment