# Bundled sample documents

Six synthetic, public-domain images covering the document layouts that
real-world OCR pipelines encounter most often:

- `receipt.png`: printed grocery receipt, line items + totals
- `invoice.png`: vendor invoice, multi-column form layout
- `business-card.png`: tight contact card, mixed text sizes
- `table.png`: dense numerical table with totals row
- `handwritten.png`: jittered text that simulates informal handwriting
- `multi-column.png`: two-column newspaper-style layout where reading order
  matters
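
The jitter behind `handwritten.png` can be pictured as a small random offset applied to each character before it is drawn. The sketch below is only an illustration under that assumption; `jittered_positions` and its parameters are hypothetical names and are not taken from `scripts/generate_samples.py`.

```python
import random

def jittered_positions(text, x0=10, y0=10, advance=12, max_dx=1, max_dy=3, seed=0):
    """Return one (char, x, y) tuple per character, each nudged off the baseline.

    Hypothetical helper: the real generator may jitter differently.
    """
    rng = random.Random(seed)  # fixed seed keeps regenerated output reproducible
    positions = []
    for i, ch in enumerate(text):
        x = x0 + i * advance + rng.randint(-max_dx, max_dx)  # small horizontal wobble
        y = y0 + rng.randint(-max_dy, max_dy)                # larger vertical wobble
        positions.append((ch, x, y))
    return positions

layout = jittered_positions("Dear Sam")
```

Each tuple could then be drawn with Pillow's `ImageDraw.Draw(...).text((x, y), ch)`. Seeding the generator means a rerun produces the same image.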

`index.json` carries metadata for each image: the GLiNER labels requested for
it, plus a short description shown in the UI.
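
As a rough illustration of consuming that metadata, the sketch below parses an index and reports any bundled sample that lacks an entry. The schema is an assumption: the key names `file`, `labels`, and `description`, the placeholder label values, and the helper names are all hypothetical, and the real `index.json` may be shaped differently.

```python
import json

# File names taken from the list of bundled samples.
SAMPLE_FILES = [
    "receipt.png", "invoice.png", "business-card.png",
    "table.png", "handwritten.png", "multi-column.png",
]

# Hypothetical stand-in for index.json; keys and label values are placeholders.
EXAMPLE_INDEX = json.dumps([
    {"file": name, "labels": ["placeholder-label"], "description": f"Sample {name}"}
    for name in SAMPLE_FILES
])

def load_index(text):
    """Parse index text into a mapping from file name to its metadata entry."""
    entries = json.loads(text)
    return {entry["file"]: entry for entry in entries}

def missing_samples(index):
    """List bundled sample files that have no entry in the index."""
    return [name for name in SAMPLE_FILES if name not in index]

index = load_index(EXAMPLE_INDEX)
print(missing_samples(index))
```

In practice the text would come from reading `index.json` on disk rather than from an inline string. A check like this is cheap to run after regenerating the samples.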

Regenerate with `python scripts/generate_samples.py`. Pillow is the only
dependency; no real customer data is involved.