Bundled sample documents
Six synthetic, public-domain images covering the document shapes most real-world OCR pipelines hit:
- receipt.png: printed grocery receipt, line items + totals
- invoice.png: vendor invoice, multi-column form layout
- business-card.png: tight contact card, mixed text sizes
- table.png: dense numerical table with totals row
- handwritten.png: jittered text that simulates informal handwriting
- multi-column.png: two-column newspaper-style layout where reading order matters
index.json carries metadata for each: the GLiNER labels we ask for, plus a
short description shown in the UI.
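
As a rough illustration of how that metadata could be consumed, here is a minimal sketch of a loader. The layout it assumes (a JSON object keyed by filename, with "labels" and "description" fields) is a guess for illustration only, and `SampleEntry` and `load_sample_index` are hypothetical names; check the real index.json and adjust the keys.

```python
import json
from dataclasses import dataclass
from pathlib import Path
from typing import List


@dataclass
class SampleEntry:
    filename: str      # e.g. "receipt.png"
    labels: List[str]  # GLiNER labels requested for this sample
    description: str   # short text shown in the UI


def load_sample_index(path):
    """Parse an index.json shaped like {filename: {"labels": [...], "description": "..."}}.

    The key names are assumptions; the actual file may use different ones.
    """
    raw = json.loads(Path(path).read_text(encoding="utf-8"))
    return [
        SampleEntry(
            filename=name,
            labels=list(meta.get("labels", [])),
            description=meta.get("description", ""),
        )
        for name, meta in raw.items()
    ]
```

Missing fields fall back to an empty label list or empty description rather than raising, so a partially filled index would still load.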
Regenerate with python scripts/generate_samples.py. Pillow is the only
dependency; no real customer data is involved.
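
To give a feel for how a synthetic sample can be drawn with Pillow alone, here is a minimal sketch that renders a receipt-like image. It is not the contents of scripts/generate_samples.py; the function name `make_receipt`, its parameters, and the layout are all made up for illustration.

```python
from PIL import Image, ImageDraw, ImageFont


def make_receipt(items, path=None, width=320, line_height=18, margin=12):
    """Render (name, price) pairs plus a TOTAL row onto a white canvas."""
    rows = [f"{name:<20}{price:>8.2f}" for name, price in items]
    total = sum(price for _, price in items)
    rows.append(f"{'TOTAL':<20}{total:>8.2f}")

    # Canvas height grows with the number of text rows.
    height = 2 * margin + line_height * len(rows)
    img = Image.new("RGB", (width, height), "white")
    draw = ImageDraw.Draw(img)
    font = ImageFont.load_default()  # Pillow's built-in font, no font files needed

    for i, row in enumerate(rows):
        draw.text((margin, margin + i * line_height), row, fill="black", font=font)

    if path is not None:
        img.save(path)  # format is inferred from the file extension
    return img
```

Calling `make_receipt([("Milk", 1.99), ("Bread", 2.50)], "receipt.png")` would write a three-row image. The default font may not be monospaced, so the padded columns are only approximately aligned.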