Commit d3257be by devashish-pisal · 1 parent: 451a698

extend inference code and overall readme

  1. README.md +91 -11
README.md CHANGED
@@ -6,6 +6,8 @@ language:
6
  metrics:
7
  - f1
8
  - accuracy
 
 
9
  base_model:
10
  - microsoft/layoutlmv3-base
11
  pipeline_tag: token-classification
@@ -17,53 +19,131 @@ tags:
17
  - invoice
18
  - sroie
19
  - transformers
 
 
 
 
20
  ---
21
 
 
22
 
23
  # LayoutLMv3 SROIE Token Classification
24
 
25
  This model is a fine-tuned version of LayoutLMv3 for **invoice token classification** using the SROIE dataset.
26
 
 
27
 
28
  ## Task
29
  Token classification for document understanding:
30
  - Invoice field extraction
31
- - Key information detection (company, date, total, etc.)
32
 
 
33
 
34
  ## Dataset
35
  - [SROIE](https://www.kaggle.com/datasets/urbikn/sroie-datasetv2?select=SROIE2019) (Scanned Receipts OCR and Information Extraction)
36
 
 
37
 
38
  ## Model
39
  - Base: LayoutLMv3
40
  - Fine-tuned on SROIE for invoice understanding
41
 
 
42
 
43
- ## Use Cases
44
- - Invoice processing automation
45
- - Document AI pipelines
46
- - Financial document parsing
 
 
47
 
 
48
 
49
  ## Inference Example
50
 
51
  ```python
52
  from transformers import LayoutLMv3Processor, LayoutLMv3ForTokenClassification
 
 
 
 
53
 
 
54
  processor = LayoutLMv3Processor.from_pretrained("devashish-pisal/layoutlmv3-sroie-token-classification")
55
  model = LayoutLMv3ForTokenClassification.from_pretrained("devashish-pisal/layoutlmv3-sroie-token-classification")
56
  ```
57
 
 
58
 
59
- # Evaluation Result
60
- - Accuracy: 0.99
61
- - F1 Score: 0.96
62
- - Precision: 0.95
63
- - Recall: 0.96
64
 
 
65
 
66
  ## Related Work
67
  - [Ai-Invoice-Automation Project](https://github.com/Devashish-Pisal/ai-document-automation) is built on top of this model.
 
 
 
68
 
69
-
 
 
 
 
6
  metrics:
7
  - f1
8
  - accuracy
9
+ - precision
10
+ - recall
11
  base_model:
12
  - microsoft/layoutlmv3-base
13
  pipeline_tag: token-classification
 
19
  - invoice
20
  - sroie
21
  - transformers
22
+ - BIO-tagging
23
+ - NER
24
+ - named-entity-recognition
25
+ - multimodal
26
  ---
27
 
28
+ ---
29
 
30
  # LayoutLMv3 SROIE Token Classification
31
 
32
  This model is a fine-tuned version of LayoutLMv3 for **invoice token classification** using the SROIE dataset.
33
 
34
+ ---
35
 
36
  ## Task
37
  Token classification for document understanding:
38
  - Invoice field extraction
39
+ - Key information detection (company name, date, address, total)
40
 
41
+ ---
42
 
43
  ## Dataset
44
  - [SROIE](https://www.kaggle.com/datasets/urbikn/sroie-datasetv2?select=SROIE2019) (Scanned Receipts OCR and Information Extraction)
45
 
46
+ ---
47
 
48
  ## Model
49
  - Base: LayoutLMv3
50
  - Fine-tuned on SROIE for invoice understanding
51
 
52
+ ---
53
 
54
+ ## Evaluation Results
+ - Accuracy: 0.99
+ - F1 Score: 0.96
+ - Precision: 0.95
+ - Recall: 0.96
+ - Note: The model was evaluated on the SROIE test set.
60
 
61
+ ---
62
 
63
  ## Inference Example
64
 
65
  ```python
  from transformers import LayoutLMv3Processor, LayoutLMv3ForTokenClassification
+ from PIL import Image
+ import torch
+ import pytesseract  # any other OCR library can be used instead
+
 
+ # load the model and its processor
  processor = LayoutLMv3Processor.from_pretrained("devashish-pisal/layoutlmv3-sroie-token-classification")
  model = LayoutLMv3ForTokenClassification.from_pretrained("devashish-pisal/layoutlmv3-sroie-token-classification")
+
+ # load the image to run inference on
+ IMAGE_PATH = "path/to/the/image.jpg"
+ img = Image.open(IMAGE_PATH).convert("RGB")
+ width, height = img.size
+
+ # perform OCR
+ # note: this step can be skipped if the processor is loaded with apply_ocr=True;
+ # in that case, do not pass words or boxes to the processor
+ ocr_data = pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT)
+ # find_words_and_bboxes is a user-defined helper (not shown here) that maps each OCR word
+ # to its bounding box; LayoutLMv3 expects boxes as [x0, y0, x1, y1] scaled to 0-1000
+ words, boxes = find_words_and_bboxes(ocr_data)
+
+ # prepare input for the model
+ encoding = processor(
+     img,
+     words,
+     boxes=boxes,
+     return_tensors="pt",
+     truncation=True,
+     padding="max_length",
+     max_length=512,
+ )
+
+ # perform inference
+ with torch.no_grad():
+     outputs = model(**encoding)
+     predictions = torch.argmax(outputs.logits, dim=-1)[0].tolist()
+
+ # decode predictions
+ tokens = processor.tokenizer.convert_ids_to_tokens(
+     encoding["input_ids"][0].tolist()
+ )
+
+ # print result
+ id2label = model.config.id2label
+ print("\nToken predictions:\n")
+ for token, pred in zip(tokens, predictions):
+     print(f"{token:15} -> {id2label[pred]}")
+
+ # tokens are sub-word pieces; additional processing is required to merge them into words and sentences
  ```
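
The example above calls `find_words_and_bboxes` without defining it. A minimal sketch of such a helper might look like the following; it is hypothetical and not the author's implementation. It assumes the dictionary layout returned by `pytesseract.image_to_data(..., output_type=pytesseract.Output.DICT)` (parallel lists under `text`, `left`, `top`, `width`, `height`) and rescales every box to the 0-1000 coordinate range that LayoutLMv3 expects. Unlike the one-argument call in the example, this version takes the image size explicitly, so it would be called as `find_words_and_bboxes(ocr_data, width, height)`.

```python
def find_words_and_bboxes(ocr_data, width, height):
    """Hypothetical helper: pair OCR words with boxes scaled to 0-1000."""

    def scale(value, size):
        # Rescale a pixel coordinate to 0-1000 and clamp it to that range.
        return max(0, min(1000, int(1000 * value / size)))

    words, boxes = [], []
    rows = zip(
        ocr_data["text"],
        ocr_data["left"],
        ocr_data["top"],
        ocr_data["width"],
        ocr_data["height"],
    )
    for text, left, top, box_w, box_h in rows:
        text = text.strip()
        if not text:
            continue  # skip layout rows (page, block, line) and empty words
        words.append(text)
        boxes.append([
            scale(left, width),            # x0
            scale(top, height),            # y0
            scale(left + box_w, width),    # x1
            scale(top + box_h, height),    # y1
        ])
    return words, boxes


# Hand-written sample in the same shape as pytesseract's DICT output.
sample = {
    "text": ["", "ACME", " ", "TOTAL"],
    "left": [0, 100, 0, 50],
    "top": [0, 20, 0, 400],
    "width": [500, 200, 0, 100],
    "height": [800, 40, 0, 80],
}
sample_words, sample_boxes = find_words_and_bboxes(sample, width=500, height=800)
print(sample_words)
print(sample_boxes)
```

Clamping keeps boxes that spill slightly past the image edge inside the valid range, which avoids out-of-range position indices at inference time.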
115
 
116
+ ---
117
 
118
+ ## BIO (NER) Tagging Scheme
+ | Tag | Meaning | Description |
+ |-----|---------|-------------|
+ | B-COMPANY | Beginning of Company | First token of a company name |
+ | I-COMPANY | Inside Company | Subsequent token of a company name |
+ | B-DATE | Beginning of Date | First token of a date expression |
+ | I-DATE | Inside Date | Subsequent token of a date expression |
+ | B-ADDRESS | Beginning of Address | First token of an address |
+ | I-ADDRESS | Inside Address | Subsequent token of an address |
+ | B-TOTAL | Beginning of Total | First token of a total amount |
+ | I-TOTAL | Inside Total | Subsequent token of a total amount |
+ | O | Outside | Token is not part of any entity |
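
To turn tags from the scheme above into field values, consecutive `B-`/`I-` tags with the same label can be merged into one span. The sketch below is one possible post-processing step, not part of this model's code. `group_bio_entities` is a hypothetical name, and the function assumes the token-level predictions have already been reduced to one tag per OCR word (for example, by keeping the prediction of each word's first sub-token).

```python
def group_bio_entities(words, tags):
    """Merge BIO-tagged words into (label, text) entity spans."""
    entities = []
    current_label, current_words = None, []

    def flush():
        # Store the span collected so far, if there is one.
        if current_words:
            entities.append((current_label, " ".join(current_words)))

    for word, tag in zip(words, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "B" or (prefix == "I" and label != current_label):
            # A "B-" tag starts a new span; a stray "I-" tag whose label does
            # not match the open span is treated the same way (lenient choice).
            flush()
            current_label, current_words = label, [word]
        elif prefix == "I":
            current_words.append(word)
        else:
            # "O" closes any open span.
            flush()
            current_label, current_words = None, []
    flush()
    return entities


# Made-up receipt words and tags, for illustration only.
example_words = ["ACME", "SDN", "BHD", "Date:", "01/02/2019", "TOTAL", "9.50"]
example_tags = ["B-COMPANY", "I-COMPANY", "I-COMPANY", "O", "B-DATE", "O", "B-TOTAL"]
for label, text in group_bio_entities(example_words, example_tags):
    print(f"{label:8} {text}")
```

Returning a list of pairs rather than a dictionary keeps repeated fields (for instance, two detected totals) visible, so the caller can decide which one to trust.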
130
+
131
+ ---
132
+
133
+ ## Use Cases
134
+ - Invoice processing automation
135
+ - Document AI pipelines
136
+ - Financial document parsing
137
 
138
+ ---
139
 
140
  ## Related Work
141
  - [Ai-Invoice-Automation Project](https://github.com/Devashish-Pisal/ai-document-automation) is built on top of this model.
142
+ - The fine-tuning source code is available [here](https://github.com/Devashish-Pisal/ai-document-automation/tree/main/src/model_finetuning).
143
+
144
+ ---
145
 
146
+ ## Support
147
+ - If you find this model useful, please support me by giving this model repository a 💖.
148
+ - Thank you!
149
+ ---