---
license: apache-2.0
language:
- en
tags:
- image-quality-assessment
- document-quality
- mplug-owl2
- vision-language
- document-analysis
- sharpness
- blur-detection
- IQA
pipeline_tag: image-to-text
library_name: transformers
---

# DeQA-Doc-Sharpness: Document Image Sharpness Assessment

**DeQA-Doc-Sharpness** is a vision-language model specialized in assessing the **sharpness and clarity** of document images. It evaluates focus quality, blur levels, and text legibility in scanned or photographed documents.

## Model Family

This model is part of the **DeQA-Doc** family, which includes three specialized models:

| Model | Description | HuggingFace |
|-------|-------------|-------------|
| **DeQA-Doc-Overall** | Overall document quality | [mapo80/DeQA-Doc-Overall](https://huggingface.co/mapo80/DeQA-Doc-Overall) |
| **DeQA-Doc-Color** | Color quality assessment | [mapo80/DeQA-Doc-Color](https://huggingface.co/mapo80/DeQA-Doc-Color) |
| **DeQA-Doc-Sharpness** | Sharpness/clarity assessment (this model) | [mapo80/DeQA-Doc-Sharpness](https://huggingface.co/mapo80/DeQA-Doc-Sharpness) |

## Quick Start

```python
import torch
from transformers import AutoModelForCausalLM
from PIL import Image

# Load the model
model = AutoModelForCausalLM.from_pretrained(
    "mapo80/DeQA-Doc-Sharpness",
    trust_remote_code=True,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Score an image
image = Image.open("document.jpg").convert("RGB")
score = model.score([image])
print(f"Sharpness Score: {score.item():.2f} / 5.0")
```

## What Does Sharpness Quality Measure?
The sharpness score evaluates:

- **Focus Quality**: How well the document is in focus
- **Motion Blur**: Absence of blur from camera/scanner movement
- **Text Clarity**: Sharpness of text edges and characters
- **Detail Preservation**: Fine details are visible and crisp
- **Resolution Quality**: Adequate resolution for the content

## Score Interpretation

| Score Range | Quality Level | Typical Issues |
|-------------|---------------|----------------|
| 4.5 - 5.0 | **Excellent** | Perfectly sharp, crisp text |
| 3.5 - 4.5 | **Good** | Slight softness, still very readable |
| 2.5 - 3.5 | **Fair** | Noticeable blur, readable with effort |
| 1.5 - 2.5 | **Poor** | Significant blur, hard to read |
| 1.0 - 1.5 | **Bad** | Severe blur, text illegible |

## Batch Processing

```python
images = [
    Image.open("doc1.jpg").convert("RGB"),
    Image.open("doc2.jpg").convert("RGB"),
    Image.open("doc3.jpg").convert("RGB"),
]

scores = model.score(images)
for i, score in enumerate(scores):
    print(f"Document {i+1} Sharpness: {score.item():.2f} / 5.0")
```

## Use Cases

- **OCR Preprocessing**: Filter blurry images before OCR to improve accuracy
- **Document Capture QA**: Real-time feedback for mobile document scanning
- **Archive Quality Control**: Identify documents needing re-scanning
- **Blur Detection**: Automatic detection of out-of-focus captures
- **Scanner Maintenance**: Detect scanner focus issues

## Example: OCR Quality Gate

```python
import torch
from transformers import AutoModelForCausalLM
from PIL import Image

model = AutoModelForCausalLM.from_pretrained(
    "mapo80/DeQA-Doc-Sharpness",
    trust_remote_code=True,
    torch_dtype=torch.float16,
    device_map="auto",
)

def check_ocr_readiness(image_path, min_sharpness=3.5):
    """Check if document is sharp enough for reliable OCR."""
    img = Image.open(image_path).convert("RGB")
    score = model.score([img]).item()

    if score >= min_sharpness:
        return True, score, "Ready for OCR"
    elif score >= 2.5:
        return False, score, "May produce OCR errors - consider rescanning"
    else:
        return False, score, "Too blurry for OCR - rescan required"

ready, score, message = check_ocr_readiness("scan.jpg")
print(f"Sharpness: {score:.2f}/5.0 - {message}")

if ready:
    # Proceed with OCR
    pass
else:
    # Request rescan
    pass
```

## Example: Batch Quality Sorting

```python
import torch
from transformers import AutoModelForCausalLM
from PIL import Image
from pathlib import Path

model = AutoModelForCausalLM.from_pretrained(
    "mapo80/DeQA-Doc-Sharpness",
    trust_remote_code=True,
    torch_dtype=torch.float16,
    device_map="auto",
)

def sort_by_sharpness(image_folder):
    """Sort documents into quality buckets based on sharpness."""
    results = {"excellent": [], "good": [], "fair": [], "poor": [], "bad": []}

    for img_path in Path(image_folder).glob("*.jpg"):
        img = Image.open(img_path).convert("RGB")
        score = model.score([img]).item()

        if score >= 4.5:
            results["excellent"].append((img_path, score))
        elif score >= 3.5:
            results["good"].append((img_path, score))
        elif score >= 2.5:
            results["fair"].append((img_path, score))
        elif score >= 1.5:
            results["poor"].append((img_path, score))
        else:
            results["bad"].append((img_path, score))

    return results

# Usage
quality_report = sort_by_sharpness("scanned_docs/")
print(f"Excellent: {len(quality_report['excellent'])} documents")
print(f"Need rescan: {len(quality_report['poor']) + len(quality_report['bad'])} documents")
```

## Multi-Dimensional Quality Assessment

Combine with other DeQA-Doc models for comprehensive assessment:

```python
import torch
from transformers import AutoModelForCausalLM
from PIL import Image

# Load all three models
models = {
    "overall": AutoModelForCausalLM.from_pretrained(
        "mapo80/DeQA-Doc-Overall",
        trust_remote_code=True,
        torch_dtype=torch.float16,
        device_map="auto"
    ),
    "color": AutoModelForCausalLM.from_pretrained(
        "mapo80/DeQA-Doc-Color",
        trust_remote_code=True,
        torch_dtype=torch.float16,
        device_map="auto"
    ),
    "sharpness": AutoModelForCausalLM.from_pretrained(
        "mapo80/DeQA-Doc-Sharpness",
        trust_remote_code=True,
        torch_dtype=torch.float16,
        device_map="auto"
    ),
}

def full_quality_report(image_path):
    img = Image.open(image_path).convert("RGB")
    scores = {}
    for name, model in models.items():
        scores[name] = model.score([img]).item()
    return scores

report = full_quality_report("document.jpg")
print(f"Overall: {report['overall']:.2f}/5.0")
print(f"Color: {report['color']:.2f}/5.0")
print(f"Sharpness: {report['sharpness']:.2f}/5.0")
```

## Model Architecture

- **Base Model**: mPLUG-Owl2 (LLaMA2-7B + ViT-L Vision Encoder)
- **Vision Encoder**: CLIP ViT-L/14 (1024 visual tokens via Visual Abstractor)
- **Language Model**: LLaMA2-7B
- **Training**: Full fine-tuning on document sharpness quality datasets
- **Input Resolution**: Images are resized to 448x448 (with aspect ratio preservation)

## Technical Details

| Property | Value |
|----------|-------|
| Model Size | ~16 GB (float16) |
| Parameters | ~7.2B |
| Input | RGB images (any resolution) |
| Output | Sharpness quality score (1.0 - 5.0) |
| Inference | ~2-3 seconds per image on A100 |

## Hardware Requirements

| Setup | VRAM Required | Recommended |
|-------|---------------|-------------|
| Full precision (fp32) | ~32 GB | A100, H100 |
| Half precision (fp16) | ~16 GB | A100, A40, RTX 4090 |
| With CPU offload | ~8 GB GPU + RAM | RTX 3090, RTX 4080 |

## Installation

```bash
pip install torch transformers accelerate pillow sentencepiece protobuf
```

**Note**: Use `transformers>=4.36.0` for best compatibility.

## Comparison with Traditional Methods

| Method | Pros | Cons |
|--------|------|------|
| **Laplacian Variance** | Fast, simple | Only measures edge intensity |
| **FFT-based** | Frequency analysis | Sensitive to image content |
| **Gradient-based** | Good for text | Requires tuning |
| **DeQA-Doc-Sharpness** | Content-aware, trained on documents | Requires GPU |

DeQA-Doc-Sharpness understands document context and can differentiate between intentionally smooth backgrounds and unintentional blur.
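
To make the comparison concrete, the Laplacian-variance baseline from the table can be sketched in a few lines. This is a minimal pure-Python illustration, not part of DeQA-Doc: the function name `laplacian_variance` and the nested-list grayscale input are assumptions chosen for this example, and a NumPy or OpenCV implementation would be much faster in practice.

```python
from statistics import pvariance


def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian over the interior pixels.

    `gray` is a 2D list of grayscale intensities (a hypothetical input
    format chosen for this sketch). Higher values mean stronger edges.
    """
    height, width = len(gray), len(gray[0])
    if height < 3 or width < 3:
        raise ValueError("image must be at least 3x3 pixels")

    responses = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Discrete Laplacian: sum of the 4 neighbours minus 4x the centre.
            lap = (gray[y - 1][x] + gray[y + 1][x]
                   + gray[y][x - 1] + gray[y][x + 1]
                   - 4 * gray[y][x])
            responses.append(lap)
    return pvariance(responses)


# A hard vertical edge versus the same transition spread into a gradual ramp.
sharp = [[0, 0, 0, 255, 255, 255] for _ in range(6)]
soft = [[0, 51, 102, 153, 204, 255] for _ in range(6)]

print(laplacian_variance(sharp) > laplacian_variance(soft))  # prints True
```

A blank but perfectly focused page would also score 0 with this measure, which illustrates the "only measures edge intensity" limitation listed in the table.
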

## Limitations

- Optimized for document images (text, forms, letters)
- May not generalize well to natural photos
- Requires GPU with sufficient VRAM for efficient inference
- Sharpness assessment is relative to training data distribution

## Credits & Attribution

This model is based on the **DeQA-Doc** project by Junjie Gao et al., which won the **Championship** in the VQualA 2025 DIQA (Document Image Quality Assessment) Challenge.

**Original Repository**: [https://github.com/Junjie-Gao19/DeQA-Doc](https://github.com/Junjie-Gao19/DeQA-Doc)

All credit for the research, training methodology, and model architecture goes to the original authors.

## Citation

If you use this model in your research, please cite the original paper:

```bibtex
@inproceedings{deqadoc,
  title={{DeQA-Doc}: Adapting {DeQA-Score} to Document Image Quality Assessment},
  author={Gao, Junjie and Liu, Runze and Peng, Yingzhe and Yang, Shujian and Zhang, Jin and Yang, Kai and You, Zhiyuan},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision Workshop},
  year={2025},
}
```

**ArXiv**: [https://arxiv.org/abs/2507.12796](https://arxiv.org/abs/2507.12796)

## License

Apache 2.0

## Related Models

- [DeQA-Doc-Overall](https://huggingface.co/mapo80/DeQA-Doc-Overall) - Overall quality assessment
- [DeQA-Doc-Color](https://huggingface.co/mapo80/DeQA-Doc-Color) - Color quality assessment