Spaces:

superlinked
/

document-ocr

Running

Initial: document-ocr demo for HF Spaces

ffe59ba 7 days ago

746 Bytes

	# Bundled sample documents

	Six synthetic, public-domain images covering the document shapes most
	real-world OCR pipelines hit:

	- `receipt.png`: printed grocery receipt, line items + totals
	- `invoice.png`: vendor invoice, multi-column form layout
	- `business-card.png`: tight contact card, mixed text sizes
	- `table.png`: dense numerical table with totals row
	- `handwritten.png`: jittered text that simulates informal handwriting
	- `multi-column.png`: two-column newspaper-style layout where reading order
	matters

	`index.json` carries metadata for each: the GLiNER labels we ask for, plus a
	short description shown in the UI.

	Regenerate with `python scripts/generate_samples.py`. Pillow is the only
	dep; no real customer data is involved.