LayoutLMv3 SROIE Token Classification

This model is a version of LayoutLMv3 fine-tuned on the SROIE dataset for token classification on invoices and receipts.

Task

Token classification for document understanding:

Invoice field extraction
Key information detection (company name, date, address, total)

Dataset

SROIE (Scanned Receipts OCR and Information Extraction)

Model

Base: LayoutLMv3
Fine-tuned on SROIE for invoice understanding

Evaluation Result

Accuracy: 0.99
F1 Score: 0.96
Precision: 0.95
Recall: 0.96
Note: The model was evaluated on the SROIE test set.

Inference Example

from transformers import LayoutLMv3Processor, LayoutLMv3ForTokenClassification
from PIL import Image
import torch
import pytesseract  # any other OCR library can be used instead
 

# load model & image processor
processor = LayoutLMv3Processor.from_pretrained("devashish-pisal/layoutlmv3-sroie-token-classification")
model = LayoutLMv3ForTokenClassification.from_pretrained("devashish-pisal/layoutlmv3-sroie-token-classification")

# load image to perform inference
IMAGE_PATH = "path/to/the/image.jpg"
img = Image.open(IMAGE_PATH).convert("RGB")
width, height = img.size

# perform OCR
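# note: the OCR step can be skipped if the processor is loaded with apply_ocr=True;
# in that case, pass only the image to the processor (no words or boxes)
ocr_data = pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT)
# find_words_and_bboxes is a user-defined helper (not part of pytesseract or transformers);
# it must return the OCR words and their bounding boxes normalized to the 0-1000 range
words, boxes = find_words_and_bboxes(ocr_data)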

# prepare input for the model
encoding = processor(
    img,
    words,
    boxes=boxes,
    return_tensors="pt",
    truncation=True,
    padding="max_length",
    max_length=512,
) 

# perform inference
with torch.no_grad():
    outputs = model(**encoding)
predictions = torch.argmax(outputs.logits, dim=-1)[0].cpu().numpy()

# decode predictions
tokens = processor.tokenizer.convert_ids_to_tokens(
    encoding["input_ids"][0].cpu().numpy()
)

# print result
id2label = model.config.id2label
print("\nToken predictions:\n")
for token, pred in zip(tokens, predictions):
    print(f"{token:15} -> {id2label[pred]}")

# note: predictions are per sub-word token; additional processing is required to merge them back into words and field values
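
The inference example calls `find_words_and_bboxes` without defining it. A minimal sketch of such a helper is shown below. It is an assumption about what the helper does, not the author's implementation: the name `words_and_normalized_boxes` and its extra `width` and `height` parameters are hypothetical. It reads the `text`, `left`, `top`, `width`, and `height` lists that `pytesseract.image_to_data` returns with `Output.DICT`, skips empty entries, and scales each box to the 0-1000 range that LayoutLMv3 expects.

```python
def words_and_normalized_boxes(ocr_data, width, height):
    """Collect non-empty OCR words and their boxes scaled to the 0-1000 range.

    Hypothetical helper: `ocr_data` is the dictionary returned by
    pytesseract.image_to_data(..., output_type=pytesseract.Output.DICT);
    `width` and `height` are the image size in pixels.
    """
    words, boxes = [], []
    for text, left, top, box_w, box_h in zip(
        ocr_data["text"],
        ocr_data["left"],
        ocr_data["top"],
        ocr_data["width"],
        ocr_data["height"],
    ):
        text = text.strip()
        if not text:
            # Tesseract emits empty strings for block/line-level rows; skip them
            continue
        # convert (left, top, width, height) in pixels to (x0, y0, x1, y1) in 0-1000
        raw = (
            1000 * left / width,
            1000 * top / height,
            1000 * (left + box_w) / width,
            1000 * (top + box_h) / height,
        )
        # clamp to the valid range in case OCR boxes spill past the image border
        box = [min(max(int(v), 0), 1000) for v in raw]
        words.append(text)
        boxes.append(box)
    return words, boxes
```

With the variables from the example above, this could be called as `words, boxes = words_and_normalized_boxes(ocr_data, width, height)`.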

BIO (NER) Tagging Scheme

Tag	Meaning	Description
B-COMPANY	Beginning of Company	First token of a company name
I-COMPANY	Inside Company	Subsequent token of a company name
B-DATE	Beginning of Date	First token of a date expression
I-DATE	Inside Date	Subsequent token of a date
B-ADDRESS	Beginning of Address	First token of an address
I-ADDRESS	Inside Address	Subsequent token of an address
B-TOTAL	Beginning of Total	First token of a total amount
I-TOTAL	Inside Total	Subsequent token of a total amount
O	Outside	Token is not part of any entity
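
The tags above are predicted per sub-word token, so turning them into field values takes two steps: pick one label per word, then merge consecutive B-/I- words into spans. The sketch below shows one way to do this. The function names are hypothetical, the "first sub-token wins" rule is a common convention rather than something this model card specifies, and the `id2label` mapping in the sample is made up for illustration (the real one comes from `model.config.id2label`).

```python
def word_level_labels(word_ids, predictions, id2label):
    """Map each word index to the label predicted for its first sub-token."""
    labels = {}
    for word_id, pred in zip(word_ids, predictions):
        if word_id is None:
            # special tokens and padding have no word index
            continue
        if word_id not in labels:
            labels[word_id] = id2label[int(pred)]
    return labels


def group_bio_entities(words, labels):
    """Merge consecutive B-/I- tagged words into (field, text) spans."""
    entities = []
    current_field, current_words = None, []

    def flush():
        if current_words:
            entities.append((current_field, " ".join(current_words)))

    for word, label in zip(words, labels):
        prefix, _, field = label.partition("-")
        if prefix == "B" or (prefix == "I" and field != current_field):
            # a B- tag, or an I- tag without a matching open span, starts a new span
            flush()
            current_field, current_words = field, [word]
        elif prefix == "I":
            current_words.append(word)
        else:
            # "O" closes any open span
            flush()
            current_field, current_words = None, []
    flush()
    return entities


# Illustrative data only; the label ids below are an assumed mapping.
sample_id2label = {0: "B-COMPANY", 1: "I-COMPANY", 6: "B-TOTAL", 7: "I-TOTAL", 8: "O"}
sample_words = ["ACME", "TRADING", "TOTAL", "9.00", "MYR"]
sample_word_ids = [None, 0, 0, 1, 2, 3, 3, 4, None, None]
sample_predictions = [8, 0, 1, 1, 8, 6, 7, 7, 8, 8]

by_word = word_level_labels(sample_word_ids, sample_predictions, sample_id2label)
# words lost to truncation get "O"
sample_labels = [by_word.get(i, "O") for i in range(len(sample_words))]
print(group_bio_entities(sample_words, sample_labels))
```

In the inference example, the word indices would likely come from `encoding.word_ids(batch_index=0)`, which is available when the processor wraps a fast tokenizer, and `predictions` would be the array computed there.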

Use Cases

Invoice processing automation
Document AI pipelines
Financial document parsing

Related Work

The Ai-Invoice-Automation project is built on top of this model.
The fine-tuning source code can be found here.

Model Details

Format: Safetensors
Model size: 0.1B params
Tensor type: F32
Base model: microsoft/layoutlmv3-base
