devashish-pisal/layoutlmv3-sroie-token-classification

@@ -6,6 +6,8 @@ language:
 metrics:
 - f1
 - accuracy
 base_model:
 - microsoft/layoutlmv3-base
 pipeline_tag: token-classification
@@ -17,53 +19,131 @@ tags:
 - invoice
 - sroie
 - transformers
 ---
 # LayoutLMv3 SROIE Token Classification
 This model is a fine-tuned version of LayoutLMv3 for **invoice token classification** using the SROIE dataset.
 ## Task
 Token classification for document understanding:
 - Invoice field extraction
-- Key information detection (company, date, total, etc.)
 ## Dataset
 - [SROIE](https://www.kaggle.com/datasets/urbikn/sroie-datasetv2?select=SROIE2019) (Scanned Receipts OCR and Information Extraction)
 ## Model
 - Base: LayoutLMv3
 - Fine-tuned on SROIE for invoice understanding
-## Use Cases
-- Invoice processing automation
-- Document AI pipelines
-- Financial document parsing
 ## Inference Example
 ```python
 from transformers import LayoutLMv3Processor, LayoutLMv3ForTokenClassification
 processor = LayoutLMv3Processor.from_pretrained("devashish-pisal/layoutlmv3-sroie-token-classification")
 model = LayoutLMv3ForTokenClassification.from_pretrained("devashish-pisal/layoutlmv3-sroie-token-classification")
 ```
-# Evaluation Result
-- Accuracy: 0.99
-- F1 Score: 0.96
-- Precision: 0.95
-- Recall: 0.96
 ## Related Work
 - [Ai-Invoice-Automation Project](https://github.com/Devashish-Pisal/ai-document-automation) is built on top of this model.

 metrics:
 - f1
 - accuracy
+- precision
+- recall
 base_model:
 - microsoft/layoutlmv3-base
 pipeline_tag: token-classification
 - invoice
 - sroie
 - transformers
+- BIO-tagging
+- NER
+- named-entity-recognition
+- multimodal
 ---
+---
 # LayoutLMv3 SROIE Token Classification
 This model is a fine-tuned version of LayoutLMv3 for **invoice token classification** using the SROIE dataset.
+---
 ## Task
 Token classification for document understanding:
 - Invoice field extraction
+- Key information detection (company name, date, address, total)
+---
 ## Dataset
 - [SROIE](https://www.kaggle.com/datasets/urbikn/sroie-datasetv2?select=SROIE2019) (Scanned Receipts OCR and Information Extraction)
+---
 ## Model
 - Base: LayoutLMv3
 - Fine-tuned on SROIE for invoice understanding
+---
+## Evaluation Results
+- Accuracy: 0.99
+- F1 Score: 0.96
+- Precision: 0.95
+- Recall: 0.96
+- Note: These metrics were measured on the SROIE test set.
+---
 ## Inference Example
 ```python
 from transformers import LayoutLMv3Processor, LayoutLMv3ForTokenClassification
+from PIL import Image
+import torch
+import pytesseract  # any other OCR library can be used instead
+# load model & image processor
 processor = LayoutLMv3Processor.from_pretrained("devashish-pisal/layoutlmv3-sroie-token-classification")
 model = LayoutLMv3ForTokenClassification.from_pretrained("devashish-pisal/layoutlmv3-sroie-token-classification")
+# load image to perform inference
+IMAGE_PATH = "path/to/the/image.jpg"
+img = Image.open(IMAGE_PATH).convert("RGB")
+width, height = img.size
+# perform OCR
+# note: the OCR step can be skipped if the processor is loaded with "apply_ocr=True"
+ocr_data = pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT)
+# find_words_and_bboxes is a user-defined helper (not shown here) that extracts the words and their bounding boxes from the OCR dictionary
+# LayoutLMv3 expects each box as [x0, y0, x1, y1] scaled to the 0-1000 range
+words, boxes = find_words_and_bboxes(ocr_data)
+# prepare input for the model
+encoding = processor(
+    img,
+    words,
+    boxes=boxes,
+    return_tensors="pt",
+    truncation=True,
+    padding="max_length",
+    max_length=512,
+)
+# perform inference
+with torch.no_grad():
+    outputs = model(**encoding)
+predictions = torch.argmax(outputs.logits, dim=-1)[0].cpu().numpy()
+# decode predictions
+tokens = processor.tokenizer.convert_ids_to_tokens(
+    encoding["input_ids"][0].cpu().numpy()
+)
+# print result
+id2label = model.config.id2label
+print("\nToken predictions:\n")
+for token, pred in zip(tokens, predictions):
+    print(f"{token:15} -> {id2label[int(pred)]}")
+# note: predictions are per sub-word token; further processing is needed to merge them into words and fields
 ```
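The example above calls `find_words_and_bboxes` without defining it. A minimal sketch of such a helper is shown below, assuming `ocr_data` is the dictionary returned by `pytesseract.image_to_data(..., output_type=pytesseract.Output.DICT)` and that boxes should be scaled to the 0-1000 range that LayoutLMv3 expects. The extra `width` and `height` parameters are an assumption of this sketch (the original call passes only `ocr_data`), so the call would become `find_words_and_bboxes(ocr_data, width, height)`. The author's actual helper may differ.

```python
def find_words_and_bboxes(ocr_data, width, height):
    """Hypothetical helper: collect non-empty OCR words and their boxes.

    ocr_data: dict with parallel lists under "text", "left", "top",
              "width" and "height" (pixel coordinates).
    width, height: size of the source image in pixels.
    Returns (words, boxes) with boxes as [x0, y0, x1, y1] scaled to 0-1000.
    """
    def scale(value, size):
        # Scale a pixel coordinate to 0-1000 and clamp it to that range.
        return max(0, min(1000, int(1000 * value / size)))

    words, boxes = [], []
    for text, left, top, w, h in zip(
        ocr_data["text"],
        ocr_data["left"],
        ocr_data["top"],
        ocr_data["width"],
        ocr_data["height"],
    ):
        word = text.strip()
        if not word:
            continue  # skip the empty entries OCR emits for layout blocks
        words.append(word)
        boxes.append([
            scale(left, width),
            scale(top, height),
            scale(left + w, width),
            scale(top + h, height),
        ])
    return words, boxes
```

Empty strings are dropped because OCR output typically contains entries for pages, blocks and lines that carry no text.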
+---
+## BIO (NER) Tagging Scheme
+| Tag | Meaning | Description |
+|-----|--------|------------|
+| B-COMPANY | Beginning of Company | First token of a company name |
+| I-COMPANY | Inside Company | Subsequent token of a company name |
+| B-DATE | Beginning of Date | First token of a date expression |
+| I-DATE | Inside Date | Subsequent token of a date |
+| B-ADDRESS | Beginning of Address | First token of an address |
+| I-ADDRESS | Inside Address | Subsequent token of an address |
+| B-TOTAL | Beginning of Total | First token of a total amount |
+| I-TOTAL | Inside Total | Subsequent token of a total amount |
+| O | Outside | Token is not part of any entity |
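
The model predicts these tags per sub-word token, so turning predictions into fields takes two steps: pick one tag per word, then merge consecutive `B-`/`I-` tags into entities. The sketch below is one possible approach, not the author's post-processing. It assumes `word_ids` comes from `encoding.word_ids(batch_index=0)` (available when the processor wraps a fast tokenizer) and keeps the tag of each word's first token. Both function names are hypothetical.

```python
def word_level_tags(word_ids, token_tags):
    """Keep the tag of the first sub-word token of every word.

    word_ids: one entry per token; None marks special or padding tokens.
    token_tags: one tag string per token, e.g. id2label[int(pred)].
    """
    tags = {}
    for word_id, tag in zip(word_ids, token_tags):
        if word_id is not None and word_id not in tags:
            tags[word_id] = tag
    return [tags[i] for i in sorted(tags)]


def bio_to_entities(words, tags):
    """Merge word-level BIO tags into (label, text) pairs."""
    entities = []
    label, span = None, []

    def flush():
        # Store the entity collected so far, if any.
        if span:
            entities.append((label, " ".join(span)))

    for word, tag in zip(words, tags):
        prefix, _, name = tag.partition("-")
        if prefix == "I" and name == label:
            span.append(word)  # continue the current entity
        elif prefix in ("B", "I"):
            # "B-" starts a new entity; a stray "I-" is treated the same way
            flush()
            label, span = name, [word]
        else:
            flush()  # "O" closes any open entity
            label, span = None, []
    flush()
    return entities


# Toy input: "ACME" is split into two sub-word tokens (word id 0).
word_ids = [None, 0, 0, 1, 2, 3, None]
token_tags = ["O", "B-COMPANY", "I-COMPANY", "I-COMPANY", "O", "B-TOTAL", "O"]
words = ["ACME", "BHD", "TOTAL", "12.50"]

tags = word_level_tags(word_ids, token_tags)
entities = bio_to_entities(words, tags)
print(entities)
```

Treating a stray `I-` tag as the start of a new entity is a lenient choice; a stricter variant could discard such tokens instead.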
+---
+## Use Cases
+- Invoice processing automation
+- Document AI pipelines
+- Financial document parsing
+---
 ## Related Work
 - [Ai-Invoice-Automation Project](https://github.com/Devashish-Pisal/ai-document-automation) is built on top of this model.
+- The fine-tuning source code is available [here](https://github.com/Devashish-Pisal/ai-document-automation/tree/main/src/model_finetuning).
+---
+## Support
+- If you find this model useful, please consider supporting it by giving this model repository a 💖.
+- Thank you!
+---