Awarebeyond committed on
Commit d1b593c · verified · 1 Parent(s): 35be6de

Update with real evaluation results from 597 test samples

Files changed (1)
  1. README.md +16 -12
README.md CHANGED
@@ -22,9 +22,9 @@ widget:
  example_title: Sample Receipt
  ---
 
- # 🧾 Receipt Donut — Document Understanding for Students
 
- > **Built by a student, for students.** This page explains every technical decision so you can understand (and replicate) the full training pipeline.
 
  This model extracts structured JSON data directly from receipt images **without** needing a separate OCR engine. It is a fine-tuned version of `naver-clova-ix/donut-base-finetuned-cord-v2` trained on 8,615 real-world receipt images.
 
@@ -245,25 +245,29 @@ Since this is a **generative text model** (not a classifier), a traditional conf
  | ⚠️ **Minor Typo** | < 20% Levenshtein distance | `Starbuks` vs `Starbucks` |
  | ❌ **Incorrect** | > 20% distance or missing | `null` vs `Walmart` |
 
- ### Field-Level Confusion Matrix (Validation Set)
 
  | Field | Correct | Minor Typo | Incorrect | Notes |
  |-------|---------|------------|-----------|-------|
- | `merchant` | ~82% | ~10% | ~8% | Handwritten signs are hardest |
- | `date` | ~89% | ~5% | ~6% | Very consistent format |
- | `subtotal` | ~85% | ~8% | ~7% | Currency symbols sometimes dropped |
- | `tax` | ~78% | ~12% | ~10% | Often missing on simple receipts |
- | `total` | ~91% | ~5% | ~4% | Usually the largest, most visible number |
- | `address` | ~65% | ~15% | ~20% | Multi-line text is hardest |
 
  ### Overall Performance
 
  ```
- Exact Match (all fields correct): ~55%
- Usable Match (≤1 minor typo): ~78%
- Any Incorrect Field: ~22%
  ```
 
 
 
  > **Why is Exact Match only 55%?** Receipt OCR is genuinely hard. Even human transcribers disagree on exact formatting (e.g., `$13.63` vs `13.63` vs `13.63 USD`). The model is still highly useful — 78% of receipts are "usable" with at most one small typo.
 
  ### Generating the Confusion Matrix Yourself
 
  example_title: Sample Receipt
  ---
 
+ # 🧾 Receipt Donut — Complete Document for Understanding
 
+ > **Welcome!** This page explains every technical decision so you can understand (and replicate) the full training pipeline.
 
  This model extracts structured JSON data directly from receipt images **without** needing a separate OCR engine. It is a fine-tuned version of `naver-clova-ix/donut-base-finetuned-cord-v2` trained on 8,615 real-world receipt images.
 
 
  | ⚠️ **Minor Typo** | < 20% Levenshtein distance | `Starbuks` vs `Starbucks` |
  | ❌ **Incorrect** | > 20% distance or missing | `null` vs `Walmart` |
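
The thresholds in the table above could be applied per field roughly as follows. This is a minimal sketch, not the evaluation script used for this commit. It assumes that "Correct" means an exact string match (that row is outside this hunk), that the distance is normalized by the longer string's length, and that exactly 20% counts as incorrect. The function names `levenshtein` and `classify_field` are hypothetical.

```python
from typing import Optional


def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def classify_field(predicted: Optional[str], expected: Optional[str]) -> str:
    """Label one predicted field as 'correct', 'minor_typo', or 'incorrect'."""
    if predicted == expected:
        # Covers exact matches and the case where both values are missing.
        return "correct"
    if predicted is None or expected is None:
        return "incorrect"  # a value is missing on one side only
    # Assumption: normalize by the longer string's length.
    ratio = levenshtein(predicted, expected) / max(len(predicted), len(expected))
    # Assumption: a ratio of exactly 20% is treated as incorrect.
    return "minor_typo" if ratio < 0.20 else "incorrect"
```

With these assumptions, `classify_field("Starbuks", "Starbucks")` falls in the minor-typo bucket (one edit over nine characters) and `classify_field(None, "Walmart")` is incorrect, matching the two examples in the table.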
 
+ ### Field-Level Confusion Matrix (Test Set — 597 Samples)
 
  | Field | Correct | Minor Typo | Incorrect | Notes |
  |-------|---------|------------|-----------|-------|
+ | `merchant` | **70.9%** (423/597) | 8.5% (51) | 20.6% (123) | Store names vary wildly in format |
+ | `date` | **86.9%** (519/597) | 1.0% (6) | 12.1% (72) | Highly consistent format |
+ | `subtotal` | **71.7%** (428/597) | 2.3% (14) | 26.0% (155) | Often missing on simple receipts |
+ | `tax` | **86.4%** (516/597) | 0.0% (0) | 13.6% (81) | Usually present when subtotal is |
+ | `total` | **47.4%** (283/597) | 7.9% (47) | 44.7% (267) | **Hardest field**: model confuses it with subtotal |
+ | `address` | **100.0%** (597/597) | 0.0% (0) | 0.0% (0) | Not present in this test set; model correctly abstains |
+
+ ![Field Confusion Matrix](hub_assets/field_confusion_matrix.png)
 
  ### Overall Performance
 
  ```
+ Exact Match (all fields correct): 32.8% (196/597)
+ Usable Match (≤1 minor typo): 61.1% (365/597)
+ Any Incorrect Field: 38.9% (232/597)
  ```
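
The three receipt-level figures above can be derived from per-field labels. The sketch below shows one way to do that. It is not the script behind the reported numbers, and it assumes that "usable" means no incorrect field and at most one minor typo. The label strings (`correct`, `minor_typo`, `incorrect`) and the function name `summarize` are hypothetical.

```python
from typing import Dict, List


def summarize(receipts: List[Dict[str, str]]) -> Dict[str, float]:
    """Aggregate per-field labels into receipt-level percentages.

    Each receipt maps a field name to 'correct', 'minor_typo', or 'incorrect'.
    """
    if not receipts:
        raise ValueError("need at least one receipt")
    exact = usable = any_incorrect = 0
    for labels in receipts:
        values = list(labels.values())
        incorrect = values.count("incorrect")
        typos = values.count("minor_typo")
        if incorrect == 0 and typos == 0:
            exact += 1  # every field matched exactly
        # Assumption: usable = no incorrect field and at most one minor typo.
        if incorrect == 0 and typos <= 1:
            usable += 1
        if incorrect > 0:
            any_incorrect += 1
    total = len(receipts)
    return {
        "exact_match": round(100 * exact / total, 1),
        "usable_match": round(100 * usable / total, 1),
        "any_incorrect": round(100 * any_incorrect / total, 1),
    }


sample = [
    {"merchant": "correct", "total": "correct"},
    {"merchant": "minor_typo", "total": "correct"},
    {"merchant": "correct", "total": "incorrect"},
    {"merchant": "minor_typo", "total": "minor_typo"},
]
print(summarize(sample))
```

Under this definition a receipt with two minor typos and no incorrect field is neither usable nor counted under "any incorrect", so the last two percentages need not sum to 100%.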
 
+ > **Key insight:** The `total` field is the model's biggest weakness at 47.4% correct. This is likely because `total` and `subtotal` are visually similar numbers on receipts, and the model sometimes swaps them. Improving this would likely require stronger positional cues or a post-processing rule (for example, treating the larger of the two amounts as `total`).
+
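
The post-processing rule mentioned in the key insight could look something like the sketch below. It is an illustration only and is not part of this commit. It assumes the model output is a flat dict of strings with `subtotal` and `total` keys, that commas are thousands separators, and that a receipt's total is never smaller than its subtotal (a discount applied after the subtotal would break that). The helper names `parse_amount` and `fix_total` are hypothetical.

```python
import re
from decimal import Decimal
from typing import Dict, Optional


def parse_amount(text: Optional[str]) -> Optional[Decimal]:
    """Parse strings such as '$13.63' or '13.63 USD' into a Decimal."""
    if text is None:
        return None
    # Assumption: commas are thousands separators, not decimal marks.
    match = re.search(r"\d+(?:\.\d+)?", text.replace(",", ""))
    return Decimal(match.group()) if match else None


def fix_total(fields: Dict[str, Optional[str]]) -> Dict[str, Optional[str]]:
    """Return a copy of fields with subtotal and total swapped if reversed."""
    result = dict(fields)
    subtotal = parse_amount(result.get("subtotal"))
    total = parse_amount(result.get("total"))
    # Only act when both amounts parse and are clearly in the wrong order.
    if subtotal is not None and total is not None and subtotal > total:
        result["subtotal"], result["total"] = result["total"], result["subtotal"]
    return result
```

The rule leaves the prediction untouched when either amount is missing or unparseable, so it can only correct swaps, not recover a `total` the model failed to read.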
  > **Why is Exact Match only 55%?** Receipt OCR is genuinely hard. Even human transcribers disagree on exact formatting (e.g., `$13.63` vs `13.63` vs `13.63 USD`). The model is still highly useful — 78% of receipts are "usable" with at most one small typo.
 
  ### Generating the Confusion Matrix Yourself