---
language: en
license: apache-2.0
tags:
- receipt
- document-ai
- layoutlmv3
- token-classification
- ner
- information-extraction
datasets:
- custom-receipt-dataset
metrics:
- accuracy
widget:
- text: "STORE_NAME Date: 2024-01-01 Total: 25.99"
---

# LayoutLMv3 Receipt Parser

A fine-tuned LayoutLMv3 model that extracts structured information from receipt images, reaching **89.34% validation accuracy**.

## Model Details

- **Model**: `albertosei/layoutlmv3-receipt-parser`
- **Architecture**: LayoutLMv3-base
- **Task**: Token Classification (Named Entity Recognition)
- **Languages**: English
- **Training Data**: 1,426 receipt samples
- **Validation**: 100 samples
- **License**: Apache 2.0

## Performance Metrics

| Metric | Value |
|--------|-------|
| **Final Validation Accuracy** | **89.34%** |
| Training Loss (Epoch 1) | 0.6824 |
| Training Loss (Epoch 2) | 0.3278 |
| Validation Accuracy (Epoch 1) | 83.49% |
| Number of Entity Labels | 51 |

## Entity Labels

The model recognizes 25 entity types in BIO format (a `B-` and an `I-` tag per type plus `O`, consistent with the 51 labels reported above):

### Vendor Information
- `vendor_name` - Store/business name
- `vendor_address` - Physical address
- `vendor_phone_number` - Contact number

### Date & Time
- `date` - Transaction date
- `time` - Transaction time

### Receipt Details
- `receipt_id` - Receipt number/identifier
- `currency` - Currency type

### Financial Amounts
- `total_amount` - Final total
- `subtotal_amount` - Subtotal before tax
- `tax_amount` - Tax amount
- `service_charge_amount` - Service fees
- `discount_amount` - Discounts applied
- `tip_amount` - Tip/gratuity

### Payment Information
- `cash_paid_amount` - Cash payment
- `change_amount` - Change returned
- `credit_card_amount` - Credit card payment
- `e_money_amount` - Electronic payment
- `payment_method` - Payment type

### Line Items
- `line_item_name` - Product/service name
- `line_item_quantity` - Quantity purchased
- `line_item_unit_price` - Price per unit
- `line_item_total_price` - Line item total
- `line_item_discount_amount` - Item-level discount
- `line_item_vat_status` - VAT information

### Other
- `other` - Miscellaneous information

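
To make the BIO scheme concrete, here is a minimal sketch of how the 25 entity types could expand into 51 labels and how word-level tags could be merged back into fields. The exact label strings (for example `B-vendor_name`), their order, and the helper names `build_bio_labels` and `group_bio_tags` are assumptions for illustration; the authoritative mapping is whatever `model.config.id2label` contains.

```python
# The 25 entity types listed in this model card.
ENTITY_TYPES = [
    "vendor_name", "vendor_address", "vendor_phone_number",
    "date", "time",
    "receipt_id", "currency",
    "total_amount", "subtotal_amount", "tax_amount",
    "service_charge_amount", "discount_amount", "tip_amount",
    "cash_paid_amount", "change_amount", "credit_card_amount",
    "e_money_amount", "payment_method",
    "line_item_name", "line_item_quantity", "line_item_unit_price",
    "line_item_total_price", "line_item_discount_amount",
    "line_item_vat_status",
    "other",
]


def build_bio_labels(entity_types):
    """Hypothetical helper: one "O" label plus B-/I- labels per entity type."""
    labels = ["O"]
    for name in entity_types:
        labels.append(f"B-{name}")
        labels.append(f"I-{name}")
    return labels


def group_bio_tags(words, tags):
    """Hypothetical helper: merge word-level BIO tags into (type, text) spans."""
    spans = []
    current_type, current_words = None, []

    def flush():
        # Emit the span collected so far, if any.
        if current_type is not None:
            spans.append((current_type, " ".join(current_words)))

    for word, tag in zip(words, tags):
        prefix, _, entity = tag.partition("-")
        if prefix == "B" or (prefix == "I" and entity != current_type):
            # A new entity starts (a stray I- tag is treated like B-).
            flush()
            current_type, current_words = entity, [word]
        elif prefix == "I":
            # Continuation of the current entity.
            current_words.append(word)
        else:
            # "O": outside any entity.
            flush()
            current_type, current_words = None, []
    flush()
    return spans


labels = build_bio_labels(ENTITY_TYPES)
words = ["STORE", "NAME", "Date:", "2024-01-01", "Total:", "25.99"]
tags = ["B-vendor_name", "I-vendor_name", "O", "B-date", "O", "B-total_amount"]
fields = group_bio_tags(words, tags)
```

The tags in this example are hand-written, not model output. With real predictions, token-level labels would first need to be reduced to one label per word before grouping.
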
## Usage

```python
from transformers import AutoProcessor, AutoModelForTokenClassification
from PIL import Image
import torch

# Load model and processor (apply_ocr=False: words and boxes are supplied by the caller)
processor = AutoProcessor.from_pretrained("albertosei/layoutlmv3-receipt-parser", apply_ocr=False)
model = AutoModelForTokenClassification.from_pretrained("albertosei/layoutlmv3-receipt-parser")

# Prepare inputs (requires external OCR for text and bounding boxes)
# Boxes are [x0, y0, x1, y1], normalized to the 0-1000 range LayoutLMv3 expects
image = Image.open("receipt.jpg").convert("RGB")
words = ["STORE", "NAME", "Date:", "2024-01-01", "Total:", "25.99"]  # From OCR
boxes = [[0, 0, 100, 20], [100, 0, 200, 20], [0, 20, 50, 40],
         [50, 20, 150, 40], [0, 40, 50, 60], [50, 40, 150, 60]]  # From OCR

# Process and predict
encoding = processor(image, words, boxes=boxes, return_tensors="pt")
with torch.no_grad():
    outputs = model(**encoding)
predictions = outputs.logits.argmax(-1).squeeze().tolist()

# Convert to labels (one per token, including special and subword tokens)
predicted_labels = [model.config.id2label[pred] for pred in predictions]