---
title: Document Readability Scorer
emoji: 📄
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 6.13.0
app_file: app.py
pinned: false
---
# 📄 Document Readability Scorer
Pre-screen documents before running expensive OCR or LLM inference. Upload a document image to get a **readability score** (0–1) along with a per-signal breakdown.
## Signals
| Signal | What it measures | Method |
|--------|-----------------|--------|
| **Sharpness** | Is the text sharp/blurry? | Laplacian variance + FFT high-freq energy |
| **Contrast** | Is text distinguishable from background? | RMS + Michelson contrast |
| **Noise** | How clean is the image? | Immerkær (1996) noise estimation |
| **Text Presence** | Is there text on the page? | MSER regions + Sobel edge density |
| **Brightness** | Is exposure appropriate? | Mean brightness + saturation analysis |
| **Entropy** | Is there information content? | Shannon entropy |
| **Learned IQA** | ML-based quality score | CLIP-IQA via pyiqa |
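
To give a feel for what some of the signals in the table measure, here is a minimal NumPy sketch of Laplacian variance, RMS and Michelson contrast, and Shannon entropy on a grayscale image. The function names, the 4-neighbour Laplacian kernel, and the lack of normalization are assumptions made for illustration; they are not taken from this project's code, which may compute or scale these values differently.

```python
import numpy as np


def laplacian_variance(gray: np.ndarray) -> float:
    """Variance of the 4-neighbour Laplacian; higher values suggest sharper edges."""
    g = gray.astype(np.float64)
    # Discrete Laplacian on interior pixels: up + down + left + right - 4 * center.
    lap = (
        g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
        - 4.0 * g[1:-1, 1:-1]
    )
    return float(lap.var())


def rms_contrast(gray: np.ndarray) -> float:
    """Standard deviation of pixel intensities scaled to [0, 1]."""
    return float((gray.astype(np.float64) / 255.0).std())


def michelson_contrast(gray: np.ndarray) -> float:
    """(max - min) / (max + min) over the whole image; 0.0 for an all-black image."""
    g = gray.astype(np.float64)
    lo, hi = g.min(), g.max()
    return float((hi - lo) / (hi + lo)) if (hi + lo) > 0 else 0.0


def shannon_entropy(gray: np.ndarray) -> float:
    """Entropy in bits of the 256-bin intensity histogram (expects a uint8 image)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    p = hist[hist > 0] / hist.sum()
    return float(-(p * np.log2(p)).sum())


# Synthetic inputs: a uniform gray page and a high-contrast checkerboard.
flat = np.full((8, 8), 128, dtype=np.uint8)
checker = ((np.indices((8, 8)).sum(axis=0) % 2) * 255).astype(np.uint8)

for name, img in (("flat", flat), ("checker", checker)):
    print(name, laplacian_variance(img), rms_contrast(img),
          michelson_contrast(img), shannon_entropy(img))
```

A blank or uniformly gray page scores zero on sharpness, RMS contrast, and entropy, which is the kind of input a pre-screen is meant to reject before OCR.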
## Calibration
Adjust the weight sliders to match your pipeline's sensitivity. Export the config JSON and use it in your Python code.
## Python Integration
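```python
from document_readability import DocumentReadabilityScorer, ScorerConfig

scorer = DocumentReadabilityScorer()
document = "document.png"
result = scorer.score(document)

if result.ocr_recommended:
    # run_ocr_pipeline is a placeholder for your own OCR/LLM entry point
    run_ocr_pipeline(document)
else:
    # log_rejected is a placeholder for your own logging of rejected documents
    log_rejected(result.signals)
```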