mkdigitalgmbh committed · Commit f522508 · verified · 1 Parent(s): 710f14c

Add model card

  1. README.md +202 -0
README.md ADDED
@@ -0,0 +1,202 @@
+ ---
+ language: en
+ license: apache-2.0
+ tags:
+ - document-ai
+ - layoutlmv3
+ - token-classification
+ - receipt-extraction
+ - invoice-extraction
+ - base-model
+ datasets:
+ - custom
+ metrics:
+ - f1
+ - precision
+ - recall
+ ---
+
+ # layoutlmv3-receipt-invoice
+
+ LayoutLMv3 model initialized for receipt and invoice field extraction.
+
+ ## Model Status
+
+ ⚠️ **This is an initialized base model** that has not yet been fine-tuned on custom data.
+
+ - **Base Model**: `microsoft/layoutlmv3-base`
+ - **Status**: Ready for fine-tuning; it can be deployed, but predictions are not meaningful until the classification head has been trained
+ - **Custom Labels**: Configured for receipt/invoice field extraction
+
+ ## Intended Use
+
+ This model is configured to extract nine fields from receipts and invoices. They are listed below as IOB labels: `O` plus a `B-`/`I-` pair per field, 19 labels in total.
+
+ ### Supported Fields
+
+ [
+ "O",
+ "B-MerchantName",
+ "I-MerchantName",
+ "B-MerchantAddress",
+ "I-MerchantAddress",
+ "B-TransactionDate",
+ "I-TransactionDate",
+ "B-Currency",
+ "I-Currency",
+ "B-Total",
+ "I-Total",
+ "B-TotalTax",
+ "I-TotalTax",
+ "B-InvoiceNumber",
+ "I-InvoiceNumber",
+ "B-Subtotal",
+ "I-Subtotal",
+ "B-LineItems",
+ "I-LineItems"
+ ]
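A minimal sketch of how the label list above could be turned into the `id2label` / `label2id` maps that a token-classification config carries. It assumes label ids follow the order of the list, which the card does not state; the authoritative mapping is whatever is stored in the repository's `config.json`. The names `FIELDS` and `build_label_maps` are illustrative only.

```python
# Field names taken from the label list in the model card.
FIELDS = [
    "MerchantName", "MerchantAddress", "TransactionDate", "Currency",
    "Total", "TotalTax", "InvoiceNumber", "Subtotal", "LineItems",
]


def build_label_maps(fields):
    """Build the IOB label list plus id<->label maps.

    Assumption: ids are assigned in list order, "O" first.
    """
    labels = ["O"]
    for field in fields:
        labels += [f"B-{field}", f"I-{field}"]
    id2label = dict(enumerate(labels))
    label2id = {label: idx for idx, label in id2label.items()}
    return labels, id2label, label2id


labels, id2label, label2id = build_label_maps(FIELDS)
print(len(labels), id2label[1], label2id["I-LineItems"])
```

Under that ordering assumption, these maps could be passed as `id2label=` and `label2id=` when loading a token-classification model so that predictions print as label names rather than integers.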
+
+ ## Training Status
+
+ This repository contains:
+ - ✅ Base LayoutLMv3 architecture
+ - ✅ Custom label configuration for receipts/invoices
+ - ⏳ **Not yet fine-tuned**: the weights are the pre-trained weights from `microsoft/layoutlmv3-base`
+
+ ### Training the Model
+
+ To fine-tune this model on your custom data:
+
+ ```bash
+ # On a RunPod GPU pod or a local machine with a GPU
+ python main.py --mode train --push-to-hub --version v1.0
+ ```
+
+ This will:
+ 1. Train on your labeled receipt/invoice data
+ 2. Update this repository with the fine-tuned weights
+ 3. Tag the trained version (e.g., v1.0, v1.1)
+
+ ## Usage
+
+ ### Local Inference
+
+ ```python
+ import torch
+ from PIL import Image
+ from transformers import LayoutLMv3ForTokenClassification, LayoutLMv3Processor
+
+ REPO_ID = "mkdigitalgmbh/runpo-LayoutLM3-Invoice-Receipt"
+
+ # Load model and processor; apply_ocr=False means you supply words and boxes yourself
+ model = LayoutLMv3ForTokenClassification.from_pretrained(REPO_ID)
+ processor = LayoutLMv3Processor.from_pretrained(REPO_ID, apply_ocr=False)
+ model.eval()
+
+ # Prepare inputs (you need OCR results: words and pixel bounding boxes)
+ image = Image.open("receipt.jpg").convert("RGB")
+ words = ["STORE", "NAME", "Total:", "$10.99"]
+ boxes = [[10, 10, 100, 30], [110, 10, 200, 30], [10, 50, 80, 70], [90, 50, 150, 70]]
+
+ # Normalize boxes from pixel coordinates to the 0-1000 range
+ width, height = image.size
+ normalized_boxes = [[int(1000 * x0 / width), int(1000 * y0 / height),
+                      int(1000 * x1 / width), int(1000 * y1 / height)]
+                     for x0, y0, x1, y1 in boxes]
+
+ encoding = processor(image, words, boxes=normalized_boxes, return_tensors="pt")
+ with torch.no_grad():  # inference only, no gradients needed
+     outputs = model(**encoding)
+
+ # One predicted label id per token (sub-word), not per input word
+ predictions = outputs.logits.argmax(-1).squeeze(0).tolist()
+ predicted_labels = [model.config.id2label[i] for i in predictions]
+ ```
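The snippet above yields one prediction per sub-word token, while the OCR input is a list of words. One common convention, sketched here as an assumption rather than as this repository's method, is to keep the label of the first sub-token of each word. With a fast tokenizer, `encoding.word_ids()` returns the word index per token (`None` for special tokens); the helper name `word_level_labels` and the sample data are hypothetical.

```python
from typing import List, Optional


def word_level_labels(word_ids: List[Optional[int]],
                      token_labels: List[str]) -> List[str]:
    """Collapse token-level labels to one label per word.

    Keeps the label of the first sub-token of each word and skips
    special tokens, whose word id is None.
    """
    labels: List[str] = []
    previous: Optional[int] = None
    for word_id, label in zip(word_ids, token_labels):
        if word_id is None:
            continue  # special token such as <s>, </s> or padding
        if word_id != previous:
            labels.append(label)  # first sub-token of a new word
        previous = word_id
    return labels


# Hand-written sample: four words, the last one split into three sub-tokens.
sample_word_ids = [None, 0, 1, 2, 3, 3, 3, None]
sample_token_labels = ["O", "B-MerchantName", "I-MerchantName", "O",
                       "B-Total", "I-Total", "I-Total", "O"]
sample_word_labels = word_level_labels(sample_word_ids, sample_token_labels)
print(sample_word_labels)
```

In real use the two arguments would be `encoding.word_ids()` and the list of predicted label names for the same sequence.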
+
+ ### RunPod Serverless Deployment
+
+ This model is designed for deployment on RunPod Serverless:
+
+ 1. **Build and push Docker image:**
+    ```bash
+    cd deployment/runpod/LayoutLMv3
+    python deploy.py --action deploy
+    ```
+
+ 2. **Create RunPod endpoint:**
+    - Docker Image: `registry.hf.space/your-username/layoutlmv3-inference:latest`
+    - Environment Variables:
+      - `HF_REPO_ID=mkdigitalgmbh/runpo-LayoutLM3-Invoice-Receipt`
+      - `HF_TOKEN=<your-token>`
+      - `MODEL_VERSION=main` (or specific version tag after training)
+
+ ## Model Architecture
+
+ - **Base**: microsoft/layoutlmv3-base
+ - **Task**: Token Classification
+ - **Input**: Image + Words + Bounding Boxes
+ - **Output**: Field labels (IOB tagging scheme)
+ - **Number of Labels**: 19
+
+ ## Label Schema
+
+ The model uses IOB (Inside-Outside-Beginning) tagging:
+
+ - **O**: Outside any field
+ - **B-FieldName**: Beginning of a field
+ - **I-FieldName**: Inside/continuation of a field
+
+ ### Example
+
+ ```
+ Text:      ["Total:", "$", "10", ".", "99"]
+ Labels:    ["O", "B-Total", "I-Total", "I-Total", "I-Total"]
+ Extracted: Total: "$ 10 . 99"
+ ```
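A minimal sketch of decoding IOB labels into field values, assuming word-level labels are already available. The function name `decode_iob` and the sample words are hypothetical, and in this sketch the keyword `Total:` is tagged `O` so only the amount is extracted. A stray `I-` tag with no open span of the same field is treated as the start of a new span, which is one possible policy, not necessarily the one used by this project.

```python
from typing import Dict, List


def decode_iob(words: List[str], labels: List[str]) -> Dict[str, List[str]]:
    """Group words into field values according to their IOB labels.

    Returns a dict mapping each field name to the list of extracted
    values (a field can occur more than once, e.g. line items).
    """
    fields: Dict[str, List[List[str]]] = {}
    current = None  # (field_name, words_of_open_span) or None
    for word, label in zip(words, labels):
        prefix, _, name = label.partition("-")
        starts_span = prefix == "B" or (
            prefix == "I" and (current is None or current[0] != name)
        )
        if starts_span:
            current = (name, [word])
            fields.setdefault(name, []).append(current[1])
        elif prefix == "I":
            current[1].append(word)  # continue the open span
        else:
            current = None  # "O" closes any open span
    return {name: [" ".join(span) for span in spans]
            for name, spans in fields.items()}


sample_words = ["ACME", "Store", "Total:", "$", "10", ".", "99"]
sample_labels = ["B-MerchantName", "I-MerchantName", "O",
                 "B-Total", "I-Total", "I-Total", "I-Total"]
extracted = decode_iob(sample_words, sample_labels)
print(extracted)
```

Joining with spaces reproduces the tokenized form (`$ 10 . 99`); reconstructing the original spacing would need the bounding boxes or character offsets.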
+
+ ## Version History
+
+ | Version | Date | Description | Status |
+ |---------|------|-------------|--------|
+ | main | 2025-11-13 | Initialized with base model + custom labels | Base (not trained) |
+
+ After training, versions will be tagged (v1.0, v1.1, etc.).
+
+ ## Training Configuration
+
+ When training is performed, the following configuration will be used:
+
+ ```python
+ {
+     "model_name": "microsoft/layoutlmv3-base",
+     "learning_rate": 5e-05,
+     "batch_size": 4,
+     "num_epochs": 20,
+     "warmup_steps": 500,
+     "max_length": 512,
+     "validation_split": 0.2,
+     "random_seed": 42,
+     "gradient_accumulation_steps": 2,
+     "eval_steps": 100,
+     "save_steps": 500,
+     "logging_steps": 50
+ }
+ ```
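To see what these numbers imply, the sketch below derives the effective batch size and an approximate step count from the configuration. It assumes a single GPU, that steps are counted as optimizer updates, and that the validation split is taken off the top of the dataset; the dataset size of 1000 documents is purely hypothetical, and `schedule_summary` is not part of this project's code.

```python
import math

# Relevant values copied from the training configuration.
config = {
    "batch_size": 4,
    "gradient_accumulation_steps": 2,
    "num_epochs": 20,
    "warmup_steps": 500,
    "validation_split": 0.2,
}


def schedule_summary(num_examples: int, cfg: dict) -> dict:
    """Approximate the training schedule for a dataset of num_examples."""
    val_examples = int(num_examples * cfg["validation_split"])
    train_examples = num_examples - val_examples
    # Samples contributing to each optimizer update.
    effective_batch = cfg["batch_size"] * cfg["gradient_accumulation_steps"]
    steps_per_epoch = math.ceil(train_examples / effective_batch)
    total_steps = steps_per_epoch * cfg["num_epochs"]
    return {
        "train_examples": train_examples,
        "effective_batch": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_fraction": cfg["warmup_steps"] / total_steps,
    }


summary = schedule_summary(1000, config)  # hypothetical dataset size
print(summary)
```

Under these assumptions the effective batch size is 8, and for small datasets the fixed 500 warmup steps can take up a large share of training, so it may be worth checking the warmup fraction for the actual dataset size.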
+
+ ## Citation
+
+ ```bibtex
+ @misc{layoutlmv3-receipt-invoice,
+   author = {MK Digital GmbH},
+   title = {LayoutLMv3 Receipt/Invoice Field Extraction},
+   year = {2025},
+   publisher = {Hugging Face},
+   howpublished = {\url{https://huggingface.co/mkdigitalgmbh/runpo-LayoutLM3-Invoice-Receipt}}
+ }
+
+ @article{huang2022layoutlmv3,
+   title={LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking},
+   author={Huang, Yupan and Lv, Tengchao and Cui, Lei and Lu, Yutong and Wei, Furu},
+   journal={arXiv preprint arXiv:2204.08387},
+   year={2022}
+ }
+ ```
+
+ ## License
+
+ Apache 2.0
+
+ ## Contact
+
+ For questions or issues, please open an issue in the repository.