---
title: Document Readability Scorer
emoji: 📄
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 6.13.0
app_file: app.py
pinned: false
---

# 📄 Document Readability Scorer

Pre-screen documents before running expensive OCR or LLM inference. Upload a document image to get a readability score between 0 and 1, along with a breakdown of the individual signals behind it.

## Signals

| Signal | What it measures | Method |
|--------|------------------|--------|
| Sharpness | Is the text sharp/blurry? | Laplacian variance + FFT high-freq energy |
| Contrast | Is text distinguishable from background? | RMS + Michelson contrast |
| Noise | How clean is the image? | Immerkær (1996) noise estimation |
| Text Presence | Is there text on the page? | MSER regions + Sobel edge density |
| Brightness | Is exposure appropriate? | Mean brightness + saturation analysis |
| Entropy | Is there information content? | Shannon entropy |
| Learned IQA | ML-based quality score | CLIP-IQA via pyiqa |

## Calibration

Adjust the weight sliders to match your pipeline's sensitivity, then export the configuration as JSON and use it in your Python code.
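
One way the weights could turn per-signal scores into a single 0–1 score is a weighted mean followed by a threshold. This is a hypothetical sketch, not the Space's scoring rule or its export format: the JSON layout, the key names, the `threshold` field, and the helper functions are assumptions, and it presumes each signal has already been mapped to [0, 1] with higher meaning more readable (so a raw noise estimate would need to be inverted first).

```python
import json

# Hypothetical config layout; the Space's exported JSON may use different keys.
EXAMPLE_CONFIG = """
{
  "weights": {"sharpness": 0.3, "contrast": 0.2, "noise": 0.15,
              "text_presence": 0.2, "brightness": 0.1, "entropy": 0.05},
  "threshold": 0.5
}
"""


def combined_score(signals, weights):
    """Weighted mean of per-signal scores (each assumed in [0, 1]).

    Only signals that are present and have a positive weight contribute;
    the result is renormalized by the sum of those weights.
    """
    used = {name: w for name, w in weights.items() if name in signals and w > 0}
    total = sum(used.values())
    if total == 0:
        raise ValueError("no positive weights match the supplied signals")
    return sum(w * signals[name] for name, w in used.items()) / total


def ocr_recommended(signals, config):
    """Gate OCR/LLM inference on the combined score meeting the threshold."""
    return combined_score(signals, config["weights"]) >= config["threshold"]


config = json.loads(EXAMPLE_CONFIG)
```

Renormalizing over the signals actually present lets a pipeline drop one (for example, skipping the learned IQA score when pyiqa is unavailable) without hand-rescaling the remaining weights.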

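## Python Integration

```python
from document_readability import DocumentReadabilityScorer, ScorerConfig

# Placeholders for your own pipeline steps (not part of document_readability).
def run_ocr_pipeline(path): ...
def log_rejected(signals): ...

document = "document.png"
scorer = DocumentReadabilityScorer()
result = scorer.score(document)

if result.ocr_recommended:
    run_ocr_pipeline(document)  # readable enough: send it on to OCR/LLM
else:
    log_rejected(result.signals)  # skip inference, keep the signal breakdown
```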