+---
+language: en
+license: apache-2.0
+tags:
+  - document-ai
+  - layoutlmv3
+  - token-classification
+  - receipt-extraction
+  - invoice-extraction
+  - base-model
+datasets:
+  - custom
+metrics:
+  - f1
+  - precision
+  - recall
+---
+# layoutlmv3-receipt-invoice
+LayoutLMv3 model initialized for receipt and invoice field extraction.
+## Model Status
+⚠️ **This is an initialized base model** - not yet fine-tuned on custom data.
+- **Base Model**: `microsoft/layoutlmv3-base`
+- **Status**: Ready for fine-tuning. It can be deployed as-is, but predictions are not meaningful until the classification head has been trained
+- **Custom Labels**: Configured for receipt/invoice field extraction
+## Intended Use
+This model is configured to extract the following fields from receipts and invoices:
+### Supported Fields
+[
+  "O",
+  "B-MerchantName",
+  "I-MerchantName",
+  "B-MerchantAddress",
+  "I-MerchantAddress",
+  "B-TransactionDate",
+  "I-TransactionDate",
+  "B-Currency",
+  "I-Currency",
+  "B-Total",
+  "I-Total",
+  "B-TotalTax",
+  "I-TotalTax",
+  "B-InvoiceNumber",
+  "I-InvoiceNumber",
+  "B-Subtotal",
+  "I-Subtotal",
+  "B-LineItems",
+  "I-LineItems"
+]
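The label list above maps directly onto the `id2label` / `label2id` dictionaries that token-classification configs use. The following is a minimal sketch, assuming label ids follow the order shown in the list (the repository's actual `config.json` may differ); `FIELDS` and `build_label_maps` are illustrative names, not part of this repository.

```python
# Field names taken from the Supported Fields list; id order is assumed to match it.
FIELDS = [
    "MerchantName", "MerchantAddress", "TransactionDate", "Currency",
    "Total", "TotalTax", "InvoiceNumber", "Subtotal", "LineItems",
]


def build_label_maps(fields):
    """Return (labels, id2label, label2id) for an IOB scheme: "O" first, then B-/I- pairs."""
    labels = ["O"]
    for field in fields:
        labels += [f"B-{field}", f"I-{field}"]
    id2label = dict(enumerate(labels))
    label2id = {label: idx for idx, label in id2label.items()}
    return labels, id2label, label2id


LABELS, ID2LABEL, LABEL2ID = build_label_maps(FIELDS)
print(len(LABELS), LABEL2ID["B-Total"])
```

When initializing from `microsoft/layoutlmv3-base`, mappings like these can be passed to `from_pretrained` as `num_labels=len(LABELS), id2label=ID2LABEL, label2id=LABEL2ID`.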
+## Training Status
+This repository contains:
+- ✅ Base LayoutLMv3 architecture
+- ✅ Custom label configuration for receipts/invoices
+- ⏳ **Not yet fine-tuned** - using pre-trained weights from `microsoft/layoutlmv3-base`
+### Training the Model
+To fine-tune this model on your custom data:
+```bash
+# On a RunPod GPU pod or a local machine with a GPU
+python main.py --mode train --push-to-hub --version v1.0
+```
+This will:
+1. Train on your labeled receipt/invoice data
+2. Update this repository with fine-tuned weights
+3. Tag the trained version (e.g., v1.0 or v1.1)
+## Usage
+### Local Inference
+```python
+import torch
+from PIL import Image
+from transformers import LayoutLMv3ForTokenClassification, LayoutLMv3Processor
+repo_id = "mkdigitalgmbh/runpo-LayoutLM3-Invoice-Receipt"
+# Load model and processor; apply_ocr=False means you supply words and boxes yourself
+model = LayoutLMv3ForTokenClassification.from_pretrained(repo_id)
+processor = LayoutLMv3Processor.from_pretrained(repo_id, apply_ocr=False)
+model.eval()
+# Prepare inputs (OCR results: words and pixel-coordinate boxes as [x0, y0, x1, y1])
+image = Image.open("receipt.jpg").convert("RGB")
+words = ["STORE", "NAME", "Total:", "$10.99"]
+boxes = [[10, 10, 100, 30], [110, 10, 200, 30], [10, 50, 80, 70], [90, 50, 150, 70]]
+# Normalize boxes to the 0-1000 range that LayoutLMv3 expects
+width, height = image.size
+normalized_boxes = [
+    [int(1000 * x0 / width), int(1000 * y0 / height),
+     int(1000 * x1 / width), int(1000 * y1 / height)]
+    for x0, y0, x1, y1 in boxes
+]
+# truncation=True keeps long documents within the model's maximum sequence length
+encoding = processor(image, words, boxes=normalized_boxes, truncation=True, return_tensors="pt")
+# Inference only, so skip gradient tracking
+with torch.no_grad():
+    outputs = model(**encoding)
+# One predicted label id per token (sub-word pieces and special tokens included)
+predictions = outputs.logits.argmax(-1)
+```
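The predictions above are per token, so a word split into several sub-word pieces gets several predictions, and special tokens get one too. A common convention is to keep the prediction of each word's first sub-token. The sketch below assumes the processor uses a fast tokenizer, so that `encoding.word_ids(0)` is available; `word_level_labels` is a hypothetical helper, and the toy values stand in for real model output.

```python
def word_level_labels(pred_ids, word_ids, id2label):
    """Keep the prediction of the first sub-token of each word.

    pred_ids: predicted label ids, one per token.
    word_ids: token-to-word indices; None marks special or padding tokens.
    id2label: mapping from label id to label string.
    """
    labels = {}
    for pred, word_idx in zip(pred_ids, word_ids):
        if word_idx is None or word_idx in labels:
            continue  # skip special tokens and non-first sub-tokens
        labels[word_idx] = id2label[pred]
    # Return labels ordered by word index
    return [labels[i] for i in sorted(labels)]


# Toy example with hand-written values (not real model output)
id2label = {0: "O", 1: "B-Total", 2: "I-Total"}
word_ids = [None, 0, 1, 1, 2, None]  # special, word 0, word 1 (two pieces), word 2, special
pred_ids = [0, 0, 1, 2, 2, 0]
print(word_level_labels(pred_ids, word_ids, id2label))
```

With the objects from the inference snippet, the call would look like `word_level_labels(predictions[0].tolist(), encoding.word_ids(0), model.config.id2label)`.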
+### RunPod Serverless Deployment
+This model is designed for deployment on RunPod Serverless:
+1. **Build and push Docker image:**
+   ```bash
+   cd deployment/runpod/LayoutLMv3
+   python deploy.py --action deploy
+   ```
+2. **Create RunPod endpoint:**
+   - Docker Image: `registry.hf.space/your-username/layoutlmv3-inference:latest`
+   - Environment Variables:
+     - `HF_REPO_ID=mkdigitalgmbh/runpo-LayoutLM3-Invoice-Receipt`
+     - `HF_TOKEN=<your-token>`
+     - `MODEL_VERSION=main` (or specific version tag after training)
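Inside the container, the inference code would typically read these variables once at startup. The following is only a sketch of that step, not the repository's actual handler: `load_settings` and the returned dictionary keys are hypothetical names.

```python
import os


def load_settings(env=os.environ):
    """Read the endpoint settings listed above from environment variables."""
    repo_id = env.get("HF_REPO_ID")
    if not repo_id:
        raise RuntimeError("HF_REPO_ID must be set")
    return {
        "repo_id": repo_id,
        "token": env.get("HF_TOKEN"),  # may be None if the repository is public
        "revision": env.get("MODEL_VERSION", "main"),  # branch or version tag
    }


# Example with an explicit mapping instead of the real environment
settings = load_settings({"HF_REPO_ID": "mkdigitalgmbh/runpo-LayoutLM3-Invoice-Receipt"})
print(settings["revision"])
```

In recent `transformers` versions, these values can then be passed to `from_pretrained` through its `token` and `revision` arguments.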
+## Model Architecture
+- **Base**: microsoft/layoutlmv3-base
+- **Task**: Token Classification
+- **Input**: Image + Words + Bounding Boxes
+- **Output**: Field labels (IOB tagging scheme)
+- **Number of Labels**: 19
+## Label Schema
+The model uses IOB (Inside-Outside-Beginning) tagging:
+- **O**: Outside any field
+- **B-FieldName**: Beginning of a field
+- **I-FieldName**: Inside/continuation of a field
+### Example
+```
+Text:        ["Total:", "$", "10", ".", "99"]
+Labels:      ["O", "B-Total", "I-Total", "I-Total", "I-Total"]
+Extracted:   Total: "$ 10 . 99"
+```
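Turning word-level IOB tags into field values means merging each `B-` tag with the `I-` tags of the same field that follow it. The sketch below is one simple way to do this; `group_iob` is an illustrative helper, and treating a stray `I-` tag as the start of a new field is an assumption, not something this repository specifies.

```python
def group_iob(words, labels):
    """Merge IOB-tagged words into (field, text) pairs, in reading order."""
    entities = []
    current = None  # [field, [words]] for the field being built, or None
    for word, label in zip(words, labels):
        prefix, _, field = label.partition("-")
        if prefix == "I" and current is not None and current[0] == field:
            current[1].append(word)  # continue the open field
        elif prefix in ("B", "I"):
            current = [field, [word]]  # start a new field (stray "I-" handled like "B-")
            entities.append(current)
        else:
            current = None  # "O" closes any open field
    return [(field, " ".join(parts)) for field, parts in entities]


words = ["STORE", "NAME", "Total:", "$", "10", ".", "99"]
labels = ["B-MerchantName", "I-MerchantName", "O",
          "B-Total", "I-Total", "I-Total", "I-Total"]
print(group_iob(words, labels))
```

Joining with spaces mirrors the extracted value shown in the example; rejoining tokens without spaces, where the OCR split a single number, is left to post-processing.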
+## Version History
+| Version | Date | Description | Status |
+|---------|------|-------------|--------|
+| main | 2025-11-13 | Initialized with base model + custom labels | Base (not trained) |
+After training, versions will be tagged (e.g., v1.0, v1.1).
+## Training Configuration
+Training will use the following configuration:
+```python
+{
+  "model_name": "microsoft/layoutlmv3-base",
+  "learning_rate": 5e-05,
+  "batch_size": 4,
+  "num_epochs": 20,
+  "warmup_steps": 500,
+  "max_length": 512,
+  "validation_split": 0.2,
+  "random_seed": 42,
+  "gradient_accumulation_steps": 2,
+  "eval_steps": 100,
+  "save_steps": 500,
+  "logging_steps": 50
+}
+```
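Some of these settings interact: `batch_size` and `gradient_accumulation_steps` together set the effective batch size, and `warmup_steps` only makes sense relative to the total number of optimizer steps. The sketch below estimates both, assuming a single device and a hypothetical dataset of 1,000 labeled documents (the real dataset size is not stated here); `schedule_summary` is an illustrative helper, and the step count is an approximation.

```python
import math

# Subset of the configuration above that affects the step count
config = {
    "batch_size": 4,
    "gradient_accumulation_steps": 2,
    "num_epochs": 20,
    "warmup_steps": 500,
    "validation_split": 0.2,
}


def schedule_summary(num_examples, cfg):
    """Approximate optimizer-step counts for a single-device run."""
    train_examples = round(num_examples * (1 - cfg["validation_split"]))
    effective_batch = cfg["batch_size"] * cfg["gradient_accumulation_steps"]
    steps_per_epoch = math.ceil(train_examples / effective_batch)
    total_steps = steps_per_epoch * cfg["num_epochs"]
    return {
        "train_examples": train_examples,
        "effective_batch_size": effective_batch,
        "total_steps": total_steps,
        "warmup_fraction": cfg["warmup_steps"] / total_steps,
    }


summary = schedule_summary(1000, config)  # 1000 is a hypothetical dataset size
print(summary)
```

With a much smaller dataset, 500 warmup steps could cover most or all of training, so `warmup_steps` may need to be reduced accordingly.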
+## Citation
+```bibtex
+@misc{layoutlmv3-receipt-invoice,
+  author = {MK Digital GmbH},
+  title = {LayoutLMv3 Receipt/Invoice Field Extraction},
+  year = {2025},
+  publisher = {Hugging Face},
+  howpublished = {\url{https://huggingface.co/mkdigitalgmbh/runpo-LayoutLM3-Invoice-Receipt}}
+}
+@article{huang2022layoutlmv3,
+  title={LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking},
+  author={Huang, Yupan and Lv, Tengchao and Cui, Lei and Lu, Yutong and Wei, Furu},
+  journal={arXiv preprint arXiv:2204.08387},
+  year={2022}
+}
+```
+## License
+Apache 2.0
+## Contact
+For questions or issues, please open an issue in the repository.