
metadata

title: Document Readability Scorer
emoji: 📄
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 6.13.0
app_file: app.py
pinned: false

📄 Document Readability Scorer

Pre-screen document images before running expensive OCR or LLM inference. Upload a document image to get a readability score between 0 and 1, along with a breakdown of the signals behind it.

Signals

| Signal | What it measures | Method |
|---|---|---|
| Sharpness | Is the text sharp/blurry? | Laplacian variance + FFT high-freq energy |
| Contrast | Is text distinguishable from background? | RMS + Michelson contrast |
| Noise | How clean is the image? | Immerkær (1996) noise estimation |
| Text Presence | Is there text on the page? | MSER regions + Sobel edge density |
| Brightness | Is exposure appropriate? | Mean brightness + saturation analysis |
| Entropy | Is there information content? | Shannon entropy |
| Learned IQA | ML-based quality score | CLIP-IQA via pyiqa |
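
To make a few of the classical signals above concrete, here is a minimal pure-Python sketch of Laplacian variance, RMS and Michelson contrast, and Shannon entropy on a small grayscale image. It is an illustration only: the README does not show the Space's own implementation, which presumably relies on an image library. The function names, the 4-neighbour Laplacian kernel, and the nested-list image format are all assumptions made for this example.

```python
import math
from collections import Counter
from statistics import pvariance

Image = list[list[int]]  # grayscale rows, pixel values 0..255 (assumed format)


def laplacian_variance(img: Image) -> float:
    """Variance of the 4-neighbour Laplacian over interior pixels (higher = sharper)."""
    responses = [
        img[y - 1][x] + img[y + 1][x] + img[y][x - 1] + img[y][x + 1] - 4 * img[y][x]
        for y in range(1, len(img) - 1)
        for x in range(1, len(img[0]) - 1)
    ]
    return pvariance(responses)


def rms_contrast(img: Image) -> float:
    """Standard deviation of intensities normalised to [0, 1]."""
    pixels = [p / 255 for row in img for p in row]
    return math.sqrt(pvariance(pixels))


def michelson_contrast(img: Image) -> float:
    """(max - min) / (max + min); defined as 0.0 for an all-black image."""
    pixels = [p for row in img for p in row]
    lo, hi = min(pixels), max(pixels)
    return (hi - lo) / (hi + lo) if hi + lo else 0.0


def shannon_entropy(img: Image) -> float:
    """Entropy in bits of the intensity histogram."""
    pixels = [p for row in img for p in row]
    total = len(pixels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(pixels).values())


# Two toy 4x4 images: a uniform grey patch and a black/white checkerboard.
flat = [[128] * 4 for _ in range(4)]
checker = [[255 if (x + y) % 2 else 0 for x in range(4)] for y in range(4)]

for name, img in (("flat", flat), ("checker", checker)):
    print(name, laplacian_variance(img), rms_contrast(img),
          michelson_contrast(img), shannon_entropy(img))
```

The uniform patch scores zero on every signal, while the checkerboard scores high on sharpness and contrast, which is the behaviour a pre-screening step relies on to separate blank or washed-out pages from pages with crisp text.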

Calibration

Adjust the weight sliders so the score reflects what your pipeline is sensitive to, then export the config JSON and use it in your Python code.
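
As a rough illustration of what weight calibration amounts to, the sketch below combines per-signal scores into one value with a weighted mean and compares it to a cutoff. The JSON layout, the `weights` and `threshold` keys, the `combine` function, and the signal values are all hypothetical; the actual schema of the exported config is not described in this README.

```python
import json

# Hypothetical export format; the real schema of the Space's config JSON may differ.
exported = (
    '{"weights": {"sharpness": 0.3, "contrast": 0.2, "noise": 0.2, '
    '"text_presence": 0.3}, "threshold": 0.5}'
)


def combine(signals: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted mean of per-signal scores in [0, 1]; signals without a weight count as 0."""
    total = sum(weights.get(name, 0.0) for name in signals)
    if total == 0:
        return 0.0
    weighted = sum(score * weights.get(name, 0.0) for name, score in signals.items())
    return weighted / total


config = json.loads(exported)

# Made-up per-signal scores for one document image.
signals = {"sharpness": 0.8, "contrast": 0.6, "noise": 0.9, "text_presence": 0.4}

score = combine(signals, config["weights"])
ocr_recommended = score >= config["threshold"]
print(f"score={score:.2f} ocr_recommended={ocr_recommended}")
```

Raising the weight of a signal makes the overall score track that signal more closely, so a pipeline that tolerates noise but not blur would shift weight from noise to sharpness.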

Python Integration

from document_readability import DocumentReadabilityScorer, ScorerConfig

document_path = "document.png"

scorer = DocumentReadabilityScorer()
result = scorer.score(document_path)

# run_ocr_pipeline and log_rejected stand in for your own pipeline functions.
if result.ocr_recommended:
    run_ocr_pipeline(document_path)
else:
    log_rejected(result.signals)