Awarebeyond/receipt-donut

@@ -22,9 +22,9 @@ widget:
     example_title: Sample Receipt
 ---
-# 🧾 Receipt Donut — Document Understanding for Students
-> **Built by a student, for students.** This page explains every technical decision so you can understand (and replicate) the full training pipeline.
 This model extracts structured JSON data directly from receipt images **without** needing a separate OCR engine. It is a fine-tuned version of `naver-clova-ix/donut-base-finetuned-cord-v2` trained on 8,615 real-world receipt images.
@@ -245,25 +245,29 @@ Since this is a **generative text model** (not a classifier), a traditional conf
 | ⚠️ **Minor Typo** | < 20% Levenshtein distance | `Starbuks` vs `Starbucks` |
 | ❌ **Incorrect** | > 20% distance or missing | `null` vs `Walmart` |
-### Field-Level Confusion Matrix (Validation Set)
 | Field | Correct | Minor Typo | Incorrect | Notes |
 |-------|---------|------------|-----------|-------|
-| `merchant` | ~82% | ~10% | ~8% | Handwritten signs are hardest |
-| `date` | ~89% | ~5% | ~6% | Very consistent format |
-| `subtotal` | ~85% | ~8% | ~7% | Currency symbols sometimes dropped |
-| `tax` | ~78% | ~12% | ~10% | Often missing on simple receipts |
-| `total` | ~91% | ~5% | ~4% | Usually the largest, most visible number |
-| `address` | ~65% | ~15% | ~20% | Multi-line text is hardest |
 ### Overall Performance
 ```
-Exact Match (all fields correct): ~55%
-Usable Match (≤1 minor typo):     ~78%
-Any Incorrect Field:              ~22%
 ```
 > **Why is Exact Match only 55%?** Receipt OCR is genuinely hard. Even human transcribers disagree on exact formatting (e.g., `$13.63` vs `13.63` vs `13.63 USD`). The model is still highly useful — 78% of receipts are "usable" with at most one small typo.
 ### Generating the Confusion Matrix Yourself

     example_title: Sample Receipt
 ---
+# 🧾 Receipt Donut: A Complete Guide to Document Understanding
+> **Welcome!** This page explains every technical decision so you can understand (and replicate) the full training pipeline.
 This model extracts structured JSON data directly from receipt images **without** needing a separate OCR engine. It is a fine-tuned version of `naver-clova-ix/donut-base-finetuned-cord-v2` trained on 8,615 real-world receipt images.
 | ⚠️ **Minor Typo** | < 20% Levenshtein distance | `Starbuks` vs `Starbucks` |
 | ❌ **Incorrect** | > 20% distance or missing | `null` vs `Walmart` |
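The bucketing rule in the table above can be sketched in a few lines of Python. This is a minimal illustration, not the model card's actual evaluation script: the function names are hypothetical, the distance is normalized by the longer string's length (an assumption), a ratio of exactly 20% is treated as incorrect (the table leaves that case open), and two missing values count as correct.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance where insert, delete and substitute each cost 1."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def classify_field(predicted, expected, typo_threshold=0.20):
    """Bucket one field as 'correct', 'minor_typo' or 'incorrect'."""
    if predicted is None and expected is None:
        return "correct"    # assumption: abstaining on an absent field is correct
    if predicted is None or expected is None:
        return "incorrect"  # value missing on one side only
    if predicted == expected:
        return "correct"
    # Assumption: normalize by the longer string so the ratio stays in [0, 1].
    ratio = levenshtein(predicted, expected) / max(len(predicted), len(expected))
    # Assumption: a ratio of exactly 20% falls into the 'incorrect' bucket.
    return "minor_typo" if ratio < typo_threshold else "incorrect"


print(classify_field("Starbuks", "Starbucks"))  # 1 edit over 9 characters
print(classify_field(None, "Walmart"))          # missing prediction
```

With this normalization, `Starbuks` vs `Starbucks` is one edit over nine characters (about 11%), which lands in the minor-typo bucket, matching the example in the table.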
+### Field-Level Confusion Matrix (Test Set — 597 Samples)
 | Field | Correct | Minor Typo | Incorrect | Notes |
 |-------|---------|------------|-----------|-------|
+| `merchant` | **70.9%** (423/597) | 8.5% (51) | 20.6% (123) | Store names vary wildly in format |
+| `date` | **86.9%** (519/597) | 1.0% (6) | 12.1% (72) | Highly consistent format |
+| `subtotal` | **71.7%** (428/597) | 2.3% (14) | 26.0% (155) | Often missing on simple receipts |
+| `tax` | **86.4%** (516/597) | 0.0% (0) | 13.6% (81) | Usually present when subtotal is |
+| `total` | **47.4%** (283/597) | 7.9% (47) | 44.7% (267) | **Hardest field** — model confuses it with subtotal |
+| `address` | **100.0%** (597/597) | 0.0% (0) | 0.0% (0) | Not present in this test set; model correctly abstains |
+![Field Confusion Matrix](hub_assets/field_confusion_matrix.png)
 ### Overall Performance
 ```
+Exact Match (all fields correct): 32.8% (196/597)
+Usable Match (≤1 minor typo):     61.1% (365/597)
+Any Incorrect Field:              38.9% (232/597)
 ```
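For readers who want to reproduce the three summary numbers from per-field labels, the aggregation can be sketched as follows. The function name and the input shape (one dict of field labels per receipt) are hypothetical, and "usable" is read here as "no incorrect field and at most one minor typo", which is an interpretation of the label rather than a confirmed definition.

```python
def summarize_labels(receipts):
    """Aggregate per-field labels into receipt-level counts and percentages.

    receipts: list of dicts mapping a field name to one of
    'correct', 'minor_typo' or 'incorrect' (hypothetical input shape).
    """
    n = len(receipts)
    exact = usable = any_incorrect = 0
    for labels in receipts:
        values = list(labels.values())
        typos = values.count("minor_typo")
        wrong = values.count("incorrect")
        if wrong > 0:
            any_incorrect += 1
        if wrong == 0 and typos == 0:
            exact += 1
        # Assumption: usable means no incorrect field and at most one typo.
        if wrong == 0 and typos <= 1:
            usable += 1

    def pct(count):
        return round(100 * count / n, 1) if n else 0.0

    return {
        "exact_match": (exact, pct(exact)),
        "usable_match": (usable, pct(usable)),
        "any_incorrect": (any_incorrect, pct(any_incorrect)),
    }


# The reported percentages follow directly from the reported counts.
for count in (196, 365, 232):
    print(count, round(100 * count / 597, 1))
```

The final loop only checks the arithmetic of the reported figures (196, 365 and 232 out of 597); it does not rerun the evaluation.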
+> **Key insight:** The `total` field is the model's biggest weakness, at 47.4% correct (283/597). `total` and `subtotal` are visually similar numbers on a receipt, and the model sometimes swaps them. Improving this would likely require stronger positional cues or a post-processing rule (for example, treating the larger of the two amounts as the total).
 > **Why is Exact Match only 32.8%?** Receipt OCR is genuinely hard. Even human transcribers disagree on exact formatting (e.g., `$13.63` vs `13.63` vs `13.63 USD`). The model is still useful: 61.1% of receipts are "usable", with at most one small typo.
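The post-processing rule mentioned in the key insight (prefer the larger amount as the total) might look like the sketch below. It is a heuristic under stated assumptions, not part of the released model: the helper names are hypothetical, the output is assumed to be a flat dict with `total` and `subtotal` keys, and amounts are assumed to use a dot as the decimal separator.

```python
from decimal import Decimal, InvalidOperation


def parse_amount(text):
    """Parse '$13.63', '13.63' or '13.63 USD' into a Decimal, else None."""
    if text is None:
        return None
    # Assumption: '.' is the decimal separator; every other symbol is dropped.
    cleaned = "".join(ch for ch in str(text) if ch.isdigit() or ch == ".")
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None


def fix_total_subtotal(fields):
    """Return a copy of fields, swapping total and subtotal if subtotal > total."""
    fixed = dict(fields)
    total = parse_amount(fixed.get("total"))
    subtotal = parse_amount(fixed.get("subtotal"))
    if total is not None and subtotal is not None and subtotal > total:
        fixed["total"], fixed["subtotal"] = fixed["subtotal"], fixed["total"]
    return fixed


prediction = {"merchant": "Walmart", "subtotal": "$13.63", "total": "12.50"}
print(fix_total_subtotal(prediction))
```

The rule only fires when both amounts parse, so receipts without a subtotal pass through unchanged. It can be wrong on receipts where a discount makes the total smaller than the subtotal, so it is worth measuring on the test set before adopting it.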
 ### Generating the Confusion Matrix Yourself