LayoutLMv3 SROIE Token Classification
This model is a fine-tuned version of LayoutLMv3 for invoice token classification using the SROIE dataset.
Task
Token classification for document understanding:
- Invoice field extraction
- Key information detection (company name, date, address, total)
Dataset
- SROIE (Scanned Receipts OCR and Information Extraction)
Model
- Base: LayoutLMv3
- Fine-tuned on SROIE for invoice understanding
Evaluation Results
- Accuracy: 0.99
- F1 Score: 0.96
- Precision: 0.95
- Recall: 0.96
- Note: The model was evaluated on the SROIE test set.
Inference Example
from transformers import LayoutLMv3Processor, LayoutLMv3ForTokenClassification
from PIL import Image
import torch
import pytesseract  # any other OCR library can be used instead
# load the model and processor
processor = LayoutLMv3Processor.from_pretrained("devashish-pisal/layoutlmv3-sroie-token-classification")
model = LayoutLMv3ForTokenClassification.from_pretrained("devashish-pisal/layoutlmv3-sroie-token-classification")
# load image to perform inference
IMAGE_PATH = "path/to/the/image.jpg"
img = Image.open(IMAGE_PATH).convert("RGB")
width, height = img.size
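# perform OCR
# note: the OCR step can be skipped if the processor is loaded with "apply_ocr=True";
# in that case, pass only the image to the processor (no words or boxes)
ocr_data = pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT)
# find_words_and_bboxes is a user-defined helper (not part of transformers or pytesseract):
# it extracts the words and their bounding boxes, which LayoutLMv3 expects normalized to the 0-1000 range
words, boxes = find_words_and_bboxes(ocr_data)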
# prepare input for the model
encoding = processor(
    img,
    words,
    boxes=boxes,
    return_tensors="pt",
    truncation=True,
    padding="max_length",
    max_length=512,
)
# perform inference
with torch.no_grad():
    outputs = model(**encoding)
predictions = torch.argmax(outputs.logits, dim=-1)[0].cpu().numpy()
# decode predictions
tokens = processor.tokenizer.convert_ids_to_tokens(
    encoding["input_ids"][0].cpu().numpy()
)
# print result
id2label = model.config.id2label
print("\nToken predictions:\n")
for token, pred in zip(tokens, predictions):
    print(f"{token:15} -> {id2label[pred]}")
# additional processing is required to convert tokens into words and sentences
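The example above calls `find_words_and_bboxes` without defining it. The following is a minimal sketch of what such a helper might look like, not the author's implementation. It assumes the dictionary returned by `pytesseract.image_to_data` with its `text`, `left`, `top`, `width`, and `height` lists, and it scales each box to the 0-1000 range that LayoutLMv3 expects. The optional `image_size` parameter and the fallback to the largest OCR extent are assumptions made for illustration.

```python
def find_words_and_bboxes(ocr_data, image_size=None):
    """Hypothetical helper: map a pytesseract DICT output to (words, boxes).

    Each box is [x0, y0, x1, y1], scaled to the 0-1000 range.
    """
    n = len(ocr_data["text"])
    if image_size is None:
        # Assumption: fall back to the largest extent seen in the OCR output
        # (the page-level row normally spans the whole image).
        width = max(ocr_data["left"][i] + ocr_data["width"][i] for i in range(n))
        height = max(ocr_data["top"][i] + ocr_data["height"][i] for i in range(n))
    else:
        width, height = image_size

    words, boxes = [], []
    for i in range(n):
        text = ocr_data["text"][i].strip()
        if not text:
            continue  # skip rows without text (page, block, and line entries)
        x0, y0 = ocr_data["left"][i], ocr_data["top"][i]
        x1, y1 = x0 + ocr_data["width"][i], y0 + ocr_data["height"][i]
        box = [
            int(1000 * x0 / width),
            int(1000 * y0 / height),
            int(1000 * x1 / width),
            int(1000 * y1 / height),
        ]
        words.append(text)
        # clamp to the valid range in case OCR boxes exceed the image bounds
        boxes.append([min(max(v, 0), 1000) for v in box])
    return words, boxes
```

When the image size is already known, it could be passed explicitly, for example `find_words_and_bboxes(ocr_data, image_size=(width, height))`.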
BIO (NER) Tagging Scheme
| Tag | Meaning | Description |
|---|---|---|
| B-COMPANY | Beginning of Company | First token of a company name |
| I-COMPANY | Inside Company | Subsequent token of a company name |
| B-DATE | Beginning of Date | First token of a date expression |
| I-DATE | Inside Date | Subsequent token of a date |
| B-ADDRESS | Beginning of Address | First token of an address |
| I-ADDRESS | Inside Address | Subsequent token of an address |
| B-TOTAL | Beginning of Total | First token of a total amount |
| I-TOTAL | Inside Total | Subsequent token of a total amount |
| O | Outside | Token is not part of any entity |
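To turn BIO tags into usable fields, the token-level predictions first need to be reduced to one label per word and then merged into spans. The sketch below shows one way this could be done; it is an illustration under stated assumptions, not the post-processing used by the author. Both function names are hypothetical. The `word_ids` list is assumed to map each token to its word index (with `None` for special and padding tokens), as returned by `encoding.word_ids(batch_index=0)` when a fast tokenizer is in use, and each word is assumed to take the label of its first sub-token.

```python
def word_level_labels(word_ids, token_labels):
    """Keep the label predicted for the first sub-token of each word."""
    labels, seen = [], set()
    for word_id, label in zip(word_ids, token_labels):
        if word_id is None or word_id in seen:
            continue  # special/padding token, or a later sub-token of a word
        seen.add(word_id)
        labels.append(label)
    return labels


def group_bio_entities(words, labels):
    """Merge word-level BIO labels into (entity_type, text) spans."""
    entities = []
    current_type, current_words = None, []
    for word, label in zip(words, labels):
        prefix, _, entity_type = label.partition("-")
        # a B- tag, or an I- tag of a different type, starts a new span
        starts_new = prefix == "B" or (prefix == "I" and entity_type != current_type)
        if label == "O" or starts_new:
            if current_words:
                entities.append((current_type, " ".join(current_words)))
            current_type, current_words = None, []
        if label != "O":
            current_type = entity_type
            current_words.append(word)
    if current_words:
        entities.append((current_type, " ".join(current_words)))
    return entities


# illustrative input, not real model output
words = ["ACME", "BHD", "12.50"]
word_ids = [None, 0, 0, 1, 2, 2, None]
token_labels = ["O", "B-COMPANY", "I-COMPANY", "I-COMPANY", "B-TOTAL", "I-TOTAL", "O"]
fields = group_bio_entities(words, word_level_labels(word_ids, token_labels))
print(fields)
```

A stray `I-` tag with no preceding `B-` tag is treated here as the start of a new span, which is a lenient design choice; a stricter decoder could discard such tags instead.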
Use Cases
- Invoice processing automation
- Document AI pipelines
- Financial document parsing
Related Work
- The Ai-Invoice-Automation project is built on top of this model.
- Model finetuning source code can be found here.
Support
- If you find this model useful, please consider supporting it by giving this model repository a 💖.
- Thank you!
Model tree for devashish-pisal/layoutlmv3-sroie-token-classification
Base model
microsoft/layoutlmv3-base