Upload README.md with huggingface_hub
README.md
CHANGED
|
@@ -8,19 +8,32 @@ tags:
|
|
| 8 |
- receipt-extraction
|
| 9 |
pipeline_tag: image-to-text
|
| 10 |
widget:
|
| 11 |
-
- src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/receipt.jpg
|
| 12 |
-
|
| 13 |
---
|
| 14 |
|
| 15 |
# Receipt Donut (Fine-tuned Document UI)
|
| 16 |
|
| 17 |
This model extracts structured JSON data directly from receipt images without a separate OCR engine. It is fine-tuned from the `naver-clova-ix/donut-base-finetuned-cord-v2` checkpoint.
|
| 18 |
|
| 19 |
## Model Details
|
| 20 |
- **Architecture:** Donut (Document Understanding Transformer)
|
| 21 |
- **Task:** Image-to-JSON extraction
|
| 22 |
- **Extracted Fields:** `merchant`, `date`, `subtotal`, `tax`, `total`, `address`
|
| 23 |
- **Training Data:** 8,615 heavily augmented receipt images sourced from 8 diverse public datasets (CORD, WildReceipts, SROIE variants, etc.)
|
| 24 |
|
| 25 |
## Try it out!
|
| 26 |
Use the **Hosted Inference API** widget on the right.
|
|
|
|
| 8 |
- receipt-extraction
|
| 9 |
pipeline_tag: image-to-text
|
| 10 |
widget:
|
| 11 |
+
- src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/receipt.jpg
|
| 12 |
+
example_title: Sample Receipt
|
| 13 |
---
|
| 14 |
|
| 15 |
# Receipt Donut (Fine-tuned Document UI)
|
| 16 |
|
| 17 |
This model extracts structured JSON data directly from receipt images without a separate OCR engine. It is fine-tuned from the `naver-clova-ix/donut-base-finetuned-cord-v2` checkpoint.
|
| 18 |
|
| 19 |
+
## Training Performance
|
| 20 |
+
The model was trained for 11 epochs on an NVIDIA L4 GPU and converged best at epoch 9.
|
| 21 |
+
|
| 22 |
+

|
| 23 |
+
|
| 24 |
+
## Sample Extraction Results
|
| 25 |
+
The examples below show the model's extraction on validation-set receipts, each pairing the original image with the model output.
|
| 26 |
+
|
| 27 |
+

|
| 28 |
+

|
| 29 |
+

|
| 30 |
+
|
| 31 |
## Model Details
|
| 32 |
- **Architecture:** Donut (Document Understanding Transformer)
|
| 33 |
- **Task:** Image-to-JSON extraction
|
| 34 |
- **Extracted Fields:** `merchant`, `date`, `subtotal`, `tax`, `total`, `address`
|
| 35 |
- **Training Data:** 8,615 heavily augmented receipt images sourced from 8 diverse public datasets (CORD, WildReceipts, SROIE variants, etc.)
|
| 36 |
+
- **License:** MIT
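
Because the extracted fields include `subtotal`, `tax`, and `total`, a simple arithmetic check can flag suspicious extractions. The sketch below is illustrative and not part of this repository: the helper names `parse_amount` and `totals_consistent` are hypothetical, and it assumes amounts arrive as strings that use a decimal point (for example `"$1,234.50"`).

```python
import re
from decimal import Decimal, InvalidOperation


def parse_amount(text):
    """Return a Decimal parsed from a string such as '$1,234.50', or None if unparsable."""
    # Assumption: '.' is the decimal separator; currency symbols and commas are dropped.
    cleaned = re.sub(r"[^0-9.\-]", "", text or "")
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None


def totals_consistent(fields, tolerance=Decimal("0.01")):
    """Check that subtotal + tax matches total within a small tolerance."""
    amounts = [parse_amount(fields.get(key)) for key in ("subtotal", "tax", "total")]
    if any(amount is None for amount in amounts):
        return False  # a missing or unparsable amount cannot be verified
    subtotal, tax, total = amounts
    return abs(subtotal + tax - total) <= tolerance


if __name__ == "__main__":
    sample = {"merchant": "Cafe", "subtotal": "10.00", "tax": "0.80", "total": "$10.80"}
    print(totals_consistent(sample))
```

A failed check does not necessarily mean the model misread the receipt; tips, discounts, or service charges can also break the identity, so it is best treated as a prompt for manual review.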
|
| 37 |
|
| 38 |
## Try it out!
|
| 39 |
Use the **Hosted Inference API** widget on the right.
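
For local inference, the following is a minimal sketch based on the standard Donut usage pattern in `transformers`, not a confirmed recipe for this checkpoint. `MODEL_ID` is a placeholder for this repository's ID, and `TASK_PROMPT` is assumed to be inherited from the `cord-v2` base checkpoint; this fine-tune may use a different start token. It requires `transformers`, `torch`, and `Pillow`.

```python
import re

MODEL_ID = "your-username/receipt-donut"  # hypothetical: replace with this repo's ID
TASK_PROMPT = "<s_cord-v2>"  # assumption: start token of the cord-v2 base checkpoint


def clean_sequence(sequence, eos_token, pad_token):
    """Strip EOS/PAD tokens and the leading task-prompt token from decoded output."""
    sequence = sequence.replace(eos_token, "").replace(pad_token, "")
    return re.sub(r"<.*?>", "", sequence, count=1).strip()


def extract_receipt(image_path):
    """Run the model on one receipt image and return the parsed fields as a dict."""
    # Heavy dependencies are imported lazily so the helper above stays lightweight.
    import torch
    from PIL import Image
    from transformers import DonutProcessor, VisionEncoderDecoderModel

    processor = DonutProcessor.from_pretrained(MODEL_ID)
    model = VisionEncoderDecoderModel.from_pretrained(MODEL_ID)
    model.eval()

    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(image, return_tensors="pt").pixel_values
    decoder_input_ids = processor.tokenizer(
        TASK_PROMPT, add_special_tokens=False, return_tensors="pt"
    ).input_ids

    with torch.no_grad():
        outputs = model.generate(
            pixel_values,
            decoder_input_ids=decoder_input_ids,
            max_length=model.decoder.config.max_position_embeddings,
            pad_token_id=processor.tokenizer.pad_token_id,
            eos_token_id=processor.tokenizer.eos_token_id,
            use_cache=True,
            bad_words_ids=[[processor.tokenizer.unk_token_id]],
            return_dict_in_generate=True,
        )

    sequence = processor.batch_decode(outputs.sequences)[0]
    sequence = clean_sequence(
        sequence, processor.tokenizer.eos_token, processor.tokenizer.pad_token
    )
    return processor.token2json(sequence)
```

Calling `extract_receipt("receipt.jpg")` should return a dictionary whose keys depend on the tokens the model was trained to emit; check them against the extracted fields listed under Model Details.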
|