Update with real evaluation results from 597 test samples
README.md
CHANGED
@@ -22,9 +22,9 @@ widget:
 example_title: Sample Receipt
 ---
 
-# 🧾 Receipt Donut — Document
+# 🧾 Receipt Donut — Complete Document for Understanding
 
-> **
+> **Welcome!** This page explains every technical decision so you can understand (and replicate) the full training pipeline.
 
 This model extracts structured JSON data directly from receipt images **without** needing a separate OCR engine. It is a fine-tuned version of `naver-clova-ix/donut-base-finetuned-cord-v2` trained on 8,615 real-world receipt images.
 
@@ -245,25 +245,29 @@ Since this is a **generative text model** (not a classifier), a traditional conf
 | ⚠️ **Minor Typo** | < 20% Levenshtein distance | `Starbuks` vs `Starbucks` |
 | ❌ **Incorrect** | > 20% distance or missing | `null` vs `Walmart` |
 
-### Field-Level Confusion Matrix (
+### Field-Level Confusion Matrix (Test Set — 597 Samples)
 
 | Field | Correct | Minor Typo | Incorrect | Notes |
 |-------|---------|------------|-----------|-------|
-| `merchant` |
-| `date` |
-| `subtotal` |
-| `tax` |
-| `total` |
-| `address` |
+| `merchant` | **70.9%** (423/597) | 8.5% (51) | 20.6% (123) | Store names vary wildly in format |
+| `date` | **86.9%** (519/597) | 1.0% (6) | 12.1% (72) | Highly consistent format |
+| `subtotal` | **71.7%** (428/597) | 2.3% (14) | 26.0% (155) | Often missing on simple receipts |
+| `tax` | **86.4%** (516/597) | 0.0% (0) | 13.6% (81) | Usually present when subtotal is |
+| `total` | **47.4%** (283/597) | 7.9% (47) | 44.7% (267) | **Hardest field** — model confuses it with subtotal |
+| `address` | **100.0%** (597/597) | 0.0% (0) | 0.0% (0) | Not present in this test set; model correctly abstains |
+
+
 
 ### Overall Performance
 
 ```
-Exact Match (all fields correct):
-Usable Match (≤1 minor typo):
-Any Incorrect Field:
+Exact Match (all fields correct): 32.8% (196/597)
+Usable Match (≤1 minor typo): 61.1% (365/597)
+Any Incorrect Field: 38.9% (232/597)
 ```
 
+> **Key insight:** The `total` field is the model's biggest weakness at 47.4% correct. This is because `total` and `subtotal` are visually similar numbers on receipts, and the model sometimes swaps them. Improving this would require stronger positional cues or a post-processing rule (always pick the larger number).
+
 > **Why is Exact Match only 55%?** Receipt OCR is genuinely hard. Even human transcribers disagree on exact formatting (e.g., `$13.63` vs `13.63` vs `13.63 USD`). The model is still highly useful — 78% of receipts are "usable" with at most one small typo.
 
 ### Generating the Confusion Matrix Yourself
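
The rubric in the second hunk (under 20% Levenshtein distance counts as a minor typo; anything above that, or a missing value, counts as incorrect) can be sketched in Python as follows. This is a minimal illustration, not the README's own evaluation script. The function names `levenshtein`, `classify_field`, and `order_total_and_subtotal` are hypothetical. The distance is assumed to be normalized by the length of the longer string, and a distance of exactly 20%, which the rubric leaves undefined, is treated as incorrect. The last helper illustrates the "always pick the larger number" rule from the key insight, under the assumption that a receipt's total is never smaller than its subtotal.

```python
from typing import Optional, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insert, delete, substitute cost 1 each)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                        # delete char_a
                current[j - 1] + 1,                     # insert char_b
                previous[j - 1] + (char_a != char_b),   # substitute or match
            ))
        previous = current
    return previous[-1]


def classify_field(predicted: Optional[str], expected: Optional[str]) -> str:
    """Bucket one field as 'correct', 'minor_typo', or 'incorrect'."""
    if predicted == expected:
        # Includes the case where both are None (model correctly abstains).
        return "correct"
    if predicted is None or expected is None:
        return "incorrect"
    # Assumption: normalize by the longer string; the two differ, so max() >= 1.
    ratio = levenshtein(predicted, expected) / max(len(predicted), len(expected))
    # Assumption: exactly 20% falls on the 'incorrect' side.
    return "minor_typo" if ratio < 0.20 else "incorrect"


def order_total_and_subtotal(
    subtotal: Optional[float], total: Optional[float]
) -> Tuple[Optional[float], Optional[float]]:
    """Return (subtotal, total), swapping them if total is the smaller value."""
    if subtotal is not None and total is not None and total < subtotal:
        return total, subtotal
    return subtotal, total
```

Tallying `classify_field` results per field over the test set would give the three counts in each row of the table. The swap rule would misfire on receipts where a discount is applied after the subtotal, so it is worth validating against real data before relying on it.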