---
title: Document Readability Scorer
emoji: 📄
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 6.13.0
app_file: app.py
pinned: false
---

# 📄 Document Readability Scorer

Pre-screen documents before expensive OCR/LLM inference. Upload a document image and get a **readability score** (0–1) with a detailed signal breakdown.

## Signals

| Signal | What it measures | Method |
|--------|-----------------|--------|
| **Sharpness** | Is the text sharp or blurry? | Laplacian variance + FFT high-frequency energy |
| **Contrast** | Is text distinguishable from the background? | RMS + Michelson contrast |
| **Noise** | How clean is the image? | Immerkær (1996) noise estimation |
| **Text Presence** | Is there text on the page? | MSER regions + Sobel edge density |
| **Brightness** | Is exposure appropriate? | Mean brightness + saturation analysis |
| **Entropy** | Is there information content? | Shannon entropy |
| **Learned IQA** | ML-based quality score | CLIP-IQA via pyiqa |

## Calibration

Adjust the weight sliders to match your pipeline's sensitivity. Export the config JSON and use it in your Python code.

## Python Integration

```python
from document_readability import DocumentReadabilityScorer, ScorerConfig

scorer = DocumentReadabilityScorer()
document = "document.png"
result = scorer.score(document)

if result.ocr_recommended:
    run_ocr_pipeline(document)  # placeholder for your own OCR/LLM step
else:
    log_rejected(result.signals)  # placeholder for your own logging
```
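
To give a rough feel for what the calibration weights do, the sketch below combines per-signal scores into a single 0–1 value with a weighted average. The JSON layout, the signal key names, the `threshold` field, and the `combine_signals` helper are all assumptions made for illustration. They are not taken from the exported config or from the `document_readability` API, which may combine signals differently.

```python
import json

# Hypothetical exported config: the real JSON schema may differ.
CONFIG_JSON = """
{
  "weights": {
    "sharpness": 0.30,
    "contrast": 0.20,
    "noise": 0.15,
    "text_presence": 0.20,
    "brightness": 0.10,
    "entropy": 0.05
  },
  "threshold": 0.5
}
"""


def combine_signals(signals, weights):
    """Weighted average of per-signal scores, each assumed to lie in [0, 1]."""
    # Only use signals that are present and have a positive weight.
    used = {name: w for name, w in weights.items() if name in signals and w > 0}
    total = sum(used.values())
    if total == 0:
        raise ValueError("no signal has a positive weight")
    score = sum(signals[name] * w for name, w in used.items()) / total
    # Clamp to guard against floating-point drift outside [0, 1].
    return min(1.0, max(0.0, score))


config = json.loads(CONFIG_JSON)

# Made-up signal values for a sharp, clean scan and a blurry, noisy photo.
good_scan = {"sharpness": 0.9, "contrast": 0.8, "noise": 0.9,
             "text_presence": 0.85, "brightness": 0.7, "entropy": 0.6}
blurry_photo = {"sharpness": 0.1, "contrast": 0.4, "noise": 0.3,
                "text_presence": 0.2, "brightness": 0.5, "entropy": 0.5}

for name, signals in [("good_scan", good_scan), ("blurry_photo", blurry_photo)]:
    score = combine_signals(signals, config["weights"])
    print(name, round(score, 3), score >= config["threshold"])
```

Because the sum is divided by the total weight, the sliders only set relative importance: doubling every weight leaves the score unchanged, while raising a single weight makes that signal pull the result more strongly toward its own value.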