Spaces:
Running
Running
| # Bundled sample documents | |
| Six synthetic, public-domain images covering the document shapes most | |
| real-world OCR pipelines hit: | |
| - `receipt.png`: printed grocery receipt, line items + totals | |
| - `invoice.png`: vendor invoice, multi-column form layout | |
| - `business-card.png`: tight contact card, mixed text sizes | |
| - `table.png`: dense numerical table with totals row | |
| - `handwritten.png`: jittered text that simulates informal handwriting | |
| - `multi-column.png`: two-column newspaper-style layout where reading order | |
| matters | |
| `index.json` carries metadata for each: the GLiNER labels we ask for, plus a | |
| short description shown in the UI. | |
| Regenerate with `python scripts/generate_samples.py`. Pillow is the only | |
| dep; no real customer data is involved. | |