---
title: Document Readability Scorer
emoji: 📄
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 6.13.0
app_file: app.py
pinned: false
---
# 📄 Document Readability Scorer

Pre-screen documents before running expensive OCR/LLM inference. Upload a document image to get a **readability score** (0–1) together with a breakdown of the individual signals behind it.

## Signals

| Signal | What it measures | Method |
|--------|------------------|--------|
| **Sharpness** | Is the text sharp or blurry? | Laplacian variance + FFT high-frequency energy |
| **Contrast** | Is the text distinguishable from the background? | RMS + Michelson contrast |
| **Noise** | How clean is the image? | Immerkær (1996) noise estimation |
| **Text Presence** | Is there text on the page? | MSER regions + Sobel edge density |
| **Brightness** | Is the exposure appropriate? | Mean brightness + saturation analysis |
| **Entropy** | Does the image carry information? | Shannon entropy |
| **Learned IQA** | ML-based quality score | CLIP-IQA via pyiqa |
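
The classical signals in the table can be approximated with plain NumPy. This is an illustrative sketch, not the scorer's actual implementation: it assumes a 2-D grayscale array with values in 0–255, returns raw measurements rather than normalized 0–1 scores, omits the MSER half of text presence and the CLIP-IQA model, and reads "saturation" as the fraction of near-black or near-white pixels (an assumption). All function names and thresholds (`cutoff`, `threshold`, `lo`, `hi`) are hypothetical.

```python
import numpy as np


def laplacian_variance(gray: np.ndarray) -> float:
    """Variance of the 4-neighbour Laplacian on interior pixels; low values suggest blur."""
    g = gray.astype(np.float64)
    lap = g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:] - 4.0 * g[1:-1, 1:-1]
    return float(lap.var())


def high_freq_energy_ratio(gray: np.ndarray, cutoff: float = 0.25) -> float:
    """Share of FFT power outside a centred low-frequency disc (hypothetical cutoff)."""
    g = gray.astype(np.float64)
    power = np.abs(np.fft.fftshift(np.fft.fft2(g - g.mean()))) ** 2
    h, w = g.shape
    yy, xx = np.ogrid[:h, :w]
    radius = np.hypot(yy - h // 2, xx - w // 2)  # distance from the zero-frequency bin
    total = power.sum()
    if total == 0:
        return 0.0
    return float(power[radius > cutoff * min(h, w) / 2].sum() / total)


def contrast(gray: np.ndarray) -> tuple[float, float]:
    """RMS contrast (std of [0, 1] intensities) and Michelson contrast (max-min)/(max+min)."""
    g = gray.astype(np.float64) / 255.0
    lo, hi = g.min(), g.max()
    michelson = (hi - lo) / (hi + lo) if hi + lo > 0 else 0.0
    return float(g.std()), float(michelson)


def immerkaer_sigma(gray: np.ndarray) -> float:
    """Immerkær (1996) estimate of the noise standard deviation."""
    g = gray.astype(np.float64)
    h, w = g.shape
    # Response to the mask [[1,-2,1],[-2,4,-2],[1,-2,1]] on interior pixels
    r = (g[:-2, :-2] - 2 * g[:-2, 1:-1] + g[:-2, 2:]
         - 2 * g[1:-1, :-2] + 4 * g[1:-1, 1:-1] - 2 * g[1:-1, 2:]
         + g[2:, :-2] - 2 * g[2:, 1:-1] + g[2:, 2:])
    return float(np.sqrt(np.pi / 2) * np.abs(r).sum() / (6.0 * (w - 2) * (h - 2)))


def sobel_edge_density(gray: np.ndarray, threshold: float = 100.0) -> float:
    """Fraction of interior pixels whose Sobel gradient magnitude exceeds threshold."""
    g = gray.astype(np.float64)
    gx = (g[:-2, 2:] + 2 * g[1:-1, 2:] + g[2:, 2:]) - (g[:-2, :-2] + 2 * g[1:-1, :-2] + g[2:, :-2])
    gy = (g[2:, :-2] + 2 * g[2:, 1:-1] + g[2:, 2:]) - (g[:-2, :-2] + 2 * g[:-2, 1:-1] + g[:-2, 2:])
    return float(np.mean(np.hypot(gx, gy) > threshold))


def brightness(gray: np.ndarray, lo: float = 5, hi: float = 250) -> tuple[float, float]:
    """Mean brightness in [0, 1] and fraction of near-black or near-white (clipped) pixels."""
    g = gray.astype(np.float64)
    clipped = np.mean((g <= lo) | (g >= hi))
    return float(g.mean() / 255.0), float(clipped)


def shannon_entropy(gray: np.ndarray) -> float:
    """Shannon entropy in bits of the 256-bin intensity histogram."""
    counts, _ = np.histogram(gray, bins=256, range=(0, 256))
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum())


def raw_signals(gray: np.ndarray) -> dict[str, float]:
    """Collect all raw measurements for one grayscale page."""
    rms, michelson = contrast(gray)
    mean_b, clipped = brightness(gray)
    return {
        "laplacian_var": laplacian_variance(gray),
        "hf_energy": high_freq_energy_ratio(gray),
        "rms_contrast": rms,
        "michelson": michelson,
        "noise_sigma": immerkaer_sigma(gray),
        "edge_density": sobel_edge_density(gray),
        "mean_brightness": mean_b,
        "clipped_fraction": clipped,
        "entropy_bits": shannon_entropy(gray),
    }
```

Mapping each raw value onto 0–1 is deliberately left out: useful ranges (for example, what Laplacian variance counts as "sharp") depend on your scan resolution and source material, which is what the calibration step is for.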

## Calibration

Adjust the weight sliders to match your pipeline's sensitivity. Export the config JSON and use it in your Python code.
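
One plausible way weights could turn per-signal scores into a single 0–1 readability score is a normalized weighted mean compared against a threshold. The sketch below assumes every signal has already been mapped to 0–1. `WeightConfig`, `combine`, the equal default weights, the 0.5 threshold, and the JSON layout are all hypothetical; they are not the package's `ScorerConfig` or its export format.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class WeightConfig:
    """Hypothetical stand-in for an exported calibration config (not ScorerConfig)."""
    weights: dict[str, float] = field(default_factory=lambda: {
        "sharpness": 1.0, "contrast": 1.0, "noise": 1.0, "text_presence": 1.0,
        "brightness": 1.0, "entropy": 1.0, "learned_iqa": 1.0,
    })
    threshold: float = 0.5

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)

    @classmethod
    def from_json(cls, text: str) -> "WeightConfig":
        data = json.loads(text)
        return cls(weights=data["weights"], threshold=data["threshold"])


def combine(signals: dict[str, float], config: WeightConfig) -> tuple[float, bool]:
    """Weighted mean of 0-1 signal scores; zero-weight or missing signals are skipped."""
    used = {name: w for name, w in config.weights.items() if name in signals and w > 0}
    total = sum(used.values())
    if total == 0:
        return 0.0, False
    score = sum(signals[name] * w for name, w in used.items()) / total
    return score, score >= config.threshold
```

Dividing by the sum of the active weights keeps the result in 0–1 however the sliders are set, and a weight of 0 simply drops that signal from the decision.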

## Python Integration

```python
from document_readability import DocumentReadabilityScorer, ScorerConfig

scorer = DocumentReadabilityScorer()

document = "document.png"
result = scorer.score(document)

# run_ocr_pipeline and log_rejected are placeholders for your own pipeline steps.
if result.ocr_recommended:
    run_ocr_pipeline(document)
else:
    log_rejected(result.signals)
```
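
`log_rejected` is left for you to implement. A minimal sketch, assuming `result.signals` behaves like a dict mapping signal names to 0–1 scores (not confirmed here), logs the signals that fell below a hypothetical `floor`, weakest first. `weakest_signals` and `floor` are illustrative names, not part of the package.

```python
import logging

logger = logging.getLogger("readability")


def weakest_signals(signals: dict[str, float], floor: float = 0.5) -> list[tuple[str, float]]:
    """Return (name, score) pairs below `floor`, sorted from weakest to strongest."""
    low = [(name, score) for name, score in signals.items() if score < floor]
    return sorted(low, key=lambda item: item[1])


def log_rejected(signals: dict[str, float], floor: float = 0.5) -> None:
    """Log a rejected document together with the signals that pulled its score down."""
    reasons = ", ".join(f"{name}={score:.2f}" for name, score in weakest_signals(signals, floor))
    logger.warning("Rejected before OCR; weak signals: %s", reasons or "none below floor")
```

Logging the weakest signals rather than only the overall score makes it easier to tell a blurry capture from an underexposed or empty page when reviewing rejections.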