Invoice LayoutLMv3 Multi-Domain Field Extraction
Five fine-tuned LayoutLMv3-base models for extracting structured fields from invoice/document images. Each domain has its own token-classification head with domain-specific BIO labels.
Trained on custom synthetic invoice data.
Domains
| Domain | Fields | Description |
|---|---|---|
| general | 13 scalar + line items | Standard business invoices |
| receipt | 7 scalar + line items | POS / thermal receipts |
| medical | 16 scalar + procedures | Hospital bills |
| insurance | 22 scalar + items | Insurance EOB / claims |
| logistics | 22 scalar + charges | Freight / shipping invoices |
Quick Start
The easiest way to use this model is via the included inference_example.py:
pip install transformers torch easyocr huggingface_hub Pillow
# Download inference_example.py from this repo, then:
python inference_example.py invoice.png # auto-detect domain
python inference_example.py invoice.png --domain general # force domain
The script handles everything: OCR, subword-to-word alignment, BIO span merging, and label-prefix stripping. First run downloads ~2.5 GB of model weights.
Manual Usage
from huggingface_hub import snapshot_download
from transformers import AutoModelForTokenClassification, LayoutLMv3Processor
import json, torch
# Download all domains
snapshot_download("rhlprj/invoice-layoutlmv3-multidomain", local_dir="models/")
# Load one domain
domain = "general"
model = AutoModelForTokenClassification.from_pretrained(f"models/{domain}")
processor = LayoutLMv3Processor.from_pretrained(f"models/{domain}", apply_ocr=False)
with open(f"models/{domain}/label_maps.json") as f:
    label_maps = json.load(f)
id2label = {int(k): v for k, v in label_maps["id2label"].items()}
# Encode (supply your own OCR words + bboxes normalised to 0-1000)
encoding = processor(
    images=pil_image,
    text=ocr_words,          # List[str]
    boxes=boxes_0_1000,      # List[List[int]], each [x0, y0, x1, y1] in 0-1000
    truncation=True,
    padding="max_length",
    max_length=512,
    return_tensors="pt",
)
# Run model
with torch.no_grad():
    outputs = model(**{k: v.to(model.device) for k, v in encoding.items()})
token_logits = outputs.logits[0].cpu()
# CRITICAL: map subword predictions back to word level using word_ids()
# Do NOT use preds[1:len(words)+1] — that assumes 1 token per word and WILL break.
word_ids = encoding.word_ids(0)
first_subword = {}
for tok_idx, w_id in enumerate(word_ids):
    if w_id is not None and w_id not in first_subword:
        first_subword[w_id] = tok_idx
for w_idx in range(len(ocr_words)):
    tok_idx = first_subword.get(w_idx)
    if tok_idx is not None:
        label = id2label[int(token_logits[tok_idx].argmax())]
        print(f" {ocr_words[w_idx]:30s} -> {label}")
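The processor call above expects boxes already normalised to 0-1000. A minimal sketch of that step follows, assuming your OCR engine returns each word as a polygon of pixel-coordinate corner points (EasyOCR's readtext output has this shape). The helper name polygon_to_norm_box is hypothetical and is not taken from inference_example.py.

```python
from typing import List, Sequence


def polygon_to_norm_box(
    polygon: Sequence[Sequence[float]], width: int, height: int
) -> List[int]:
    """Convert pixel-space corner points into [x0, y0, x1, y1] on a 0-1000 scale.

    Hypothetical helper: `polygon` is a sequence of (x, y) points,
    `width` and `height` are the image size in pixels.
    """
    xs = [point[0] for point in polygon]
    ys = [point[1] for point in polygon]

    def scale(value: float, size: int) -> int:
        # Clamp so boxes that spill past the image edge stay within 0-1000.
        return max(0, min(1000, int(1000 * value / size)))

    return [
        scale(min(xs), width),
        scale(min(ys), height),
        scale(max(xs), width),
        scale(max(ys), height),
    ]


# Example: one word box on a 1000 x 500 pixel image.
corners = [[100, 50], [300, 50], [300, 90], [100, 90]]
print(polygon_to_norm_box(corners, width=1000, height=500))
```

Taking the min and max over all corners yields an axis-aligned box even when the OCR polygon is slightly rotated.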
Important: Subword Alignment
LayoutLMv3 uses a RoBERTa tokenizer that splits words into subword tokens.
For example, INV-2025-00782 becomes 6+ subword tokens. The model predicts
one BIO label per subword, so you must use encoding.word_ids(0) to map
predictions back to word level. Taking predictions[1:len(words)+1] is
incorrect and will produce garbage labels.
See inference_example.py for the complete, tested implementation.
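To see why slicing fails, here is a toy illustration with hand-written values. The word_ids list, the six-way split of the invoice number, and the invoice_number label name are all made up for the example; real values come from encoding.word_ids(0) and label_maps.json.

```python
from typing import List, Optional


def word_level_labels(
    word_ids: List[Optional[int]], token_labels: List[str]
) -> List[str]:
    """Keep the label of the first subword of each word, in word order."""
    first_subword = {}
    for tok_idx, w_id in enumerate(word_ids):
        if w_id is not None and w_id not in first_subword:
            first_subword[w_id] = tok_idx
    return [token_labels[first_subword[w_id]] for w_id in sorted(first_subword)]


words = ["Invoice", "INV-2025-00782", "Due"]
# Hypothetical tokenisation: special token, 1 subword, 6 subwords, 1 subword, special token.
word_ids = [None, 0, 1, 1, 1, 1, 1, 1, 2, None]
token_labels = (
    ["O", "O", "B-invoice_number"] + ["I-invoice_number"] * 5 + ["O", "O"]
)

aligned = word_level_labels(word_ids, token_labels)
naive = token_labels[1 : len(words) + 1]

print("aligned:", aligned)  # one label per word
print("naive:  ", naive)    # third entry is a subword of word 1, not word 2
```

In this toy case the naive slice assigns "Due" the label of the second subword of the invoice number, and every later word would be shifted in the same way.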
Training
- Base model: microsoft/layoutlmv3-base (133M params)
- Method: LoRA (rank=16, alpha=32, target=query+value), 0.44% trainable params
- Data: Synthetic invoices with auto-aligned BIO labels via EasyOCR + rapidfuzz
- Hardware: NVIDIA RTX 2000 Ada (8 GB VRAM)
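The 0.44% figure can be sanity-checked with a rough count. This is a back-of-the-envelope sketch, not the training script: it assumes 12 encoder layers with hidden size 768 (the usual base-size configuration), LoRA applied only to the query and value projections in every layer, and it ignores the token-classification head, which may also be trained.

```python
# Assumed architecture constants for the base model (not stated in this card).
HIDDEN_SIZE = 768
NUM_LAYERS = 12

# Values taken from the training summary above.
RANK = 16
ADAPTED_MATRICES_PER_LAYER = 2  # query + value
BASE_PARAMS = 133_000_000       # "133M params", rounded

# Each adapted d_out x d_in weight gains two low-rank factors:
# A (rank x d_in) and B (d_out x rank).
params_per_matrix = RANK * (HIDDEN_SIZE + HIDDEN_SIZE)
lora_params = params_per_matrix * ADAPTED_MATRICES_PER_LAYER * NUM_LAYERS
trainable_pct = 100 * lora_params / (BASE_PARAMS + lora_params)

print(f"{lora_params:,} LoRA parameters, {trainable_pct:.2f}% of the total")
# prints: 589,824 LoRA parameters, 0.44% of the total
```

Under these assumptions the estimate agrees with the reported share of trainable parameters.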
Label Maps
Each domain folder contains label_maps.json with the full BIO label set.
Labels follow the format: O, B-<field>, I-<field>.
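Word-level BIO labels still need to be merged into field values. The sketch below shows one way to do the span merging and prefix stripping; it is not the code from inference_example.py, and the field names in the example are invented. It treats an I- tag without a matching preceding B- tag as the start of a new span, which is a design choice rather than something this card specifies.

```python
from typing import List, Optional, Tuple


def merge_bio_spans(words: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Merge word-level BIO labels into (field, text) pairs, dropping B-/I- prefixes."""
    spans: List[Tuple[str, str]] = []
    current_field: Optional[str] = None
    current_words: List[str] = []

    def flush() -> None:
        # Emit the span collected so far, if any.
        if current_field is not None:
            spans.append((current_field, " ".join(current_words)))

    for word, label in zip(words, labels):
        prefix, _, field = label.partition("-")  # "B-total" -> ("B", "-", "total")
        if prefix == "B" or (prefix == "I" and field != current_field):
            flush()
            current_field, current_words = field, [word]
        elif prefix == "I":
            current_words.append(word)
        else:  # "O" closes any open span
            flush()
            current_field, current_words = None, []
    flush()
    return spans


# Example with hypothetical field names.
words = ["Invoice", "No", "INV-2025-00782", "Acme", "Corp", "Ltd"]
labels = ["O", "O", "B-invoice_number", "B-vendor_name", "I-vendor_name", "I-vendor_name"]
print(merge_bio_spans(words, labels))
```

Returning a list of pairs rather than a dict keeps repeated fields, such as several line items, from overwriting each other.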