---
title: KDDA Global Model - Invoices
emoji: 🐨
---
# Configuration

- `title` (_string_): Display title for the Space.
- `emoji` (_string_): Space emoji (only emoji characters are allowed).
- `colorFrom` (_string_): Start color for the thumbnail gradient (red, yellow, green, blue, indigo, purple, pink, gray).
- `colorTo` (_string_): End color for the thumbnail gradient (red, yellow, green, blue, indigo, purple, pink, gray).
- `sdk` (_string_): Can be either `gradio` or `streamlit`.
- `app_file` (_string_): Path to your main application file (which contains either `gradio` or `streamlit` Python code), relative to the root of the repository.
- `pinned` (_boolean_): Whether the Space stays on top of your list.
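
To make the allowed values concrete, the sketch below checks a Space's front-matter fields held in a Python dict. The helper name `validate_space_config`, the decision to treat every field except `title` and `emoji` as optional, and the values filled in for fields this Space's front matter omits are all assumptions for illustration. The check encodes only the options listed in this section, not the complete Hugging Face Spaces specification.

```python
# Allowed values as listed in the Configuration section.
ALLOWED_COLORS = {"red", "yellow", "green", "blue", "indigo", "purple", "pink", "gray"}
ALLOWED_SDKS = {"gradio", "streamlit"}


def validate_space_config(config: dict) -> list[str]:
    """Return the problems found in a Space front-matter dict (hypothetical helper)."""
    problems = []
    # title and emoji are treated as required strings in this sketch.
    if not isinstance(config.get("title"), str):
        problems.append("`title` must be a string")
    if not isinstance(config.get("emoji"), str):
        problems.append("`emoji` must be a string")
    # The remaining fields are only checked when present (an assumption).
    for key in ("colorFrom", "colorTo"):
        if key in config and config[key] not in ALLOWED_COLORS:
            problems.append(f"`{key}` must be one of {sorted(ALLOWED_COLORS)}")
    if "sdk" in config and config["sdk"] not in ALLOWED_SDKS:
        problems.append("`sdk` must be `gradio` or `streamlit`")
    if "app_file" in config and not isinstance(config["app_file"], str):
        problems.append("`app_file` must be a path relative to the repository root")
    if "pinned" in config and not isinstance(config["pinned"], bool):
        problems.append("`pinned` must be a boolean")
    return problems


# title and emoji come from this Space's front matter; the other values are hypothetical.
space_config = {
    "title": "KDDA Global Model - Invoices",
    "emoji": "🐨",
    "colorFrom": "blue",   # hypothetical
    "colorTo": "indigo",   # hypothetical
    "sdk": "gradio",       # hypothetical
    "app_file": "app.py",  # hypothetical
    "pinned": False,       # hypothetical
}
print(validate_space_config(space_config))  # → []
```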
# Custom LayoutLM Model for Invoice Processing
This repository hosts a custom implementation of the [LayoutLM](https://huggingface.co/microsoft/layoutlm-base-uncased) model, specifically fine-tuned for extracting key information from invoices. The model is designed to identify and extract various fields such as amounts, dates, and names from invoice documents.
## Model Overview
This model is based on the LayoutLMv2 architecture and has been fine-tuned on a custom dataset of invoices. It is capable of performing token classification to extract the following entities:
- **Amount Including Tax**
- **Due Date**
- **Reference Number**
- **Customer Name**
- **Vendor Name**
- **Issue Date**
- **Amount**
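
Labels follow the BIO tagging scheme: a `B-` prefix marks the first token of an entity, an `I-` prefix marks each following token of the same entity, and `O` marks tokens that belong to no entity. With the seven entity types above, this gives 15 labels in total.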
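
Because the model predicts one label per token, a post-processing step is needed to turn BIO tags back into field values. The sketch below shows one common way to do this. The function name `group_bio_entities`, the space-joined output, and the sample invoice words are hypothetical. It also assumes word-level inputs, i.e. subword predictions have already been mapped back to whole words.

```python
def group_bio_entities(tokens: list[str], labels: list[str]) -> list[tuple[str, str]]:
    """Merge word-level BIO labels into (entity_type, text) pairs (hypothetical helper)."""
    entities = []
    current_type, current_words = None, []

    for token, label in zip(tokens, labels):
        if label.startswith("B-"):
            # A B- tag always opens a new span; close any span that is still open.
            if current_type is not None:
                entities.append((current_type, " ".join(current_words)))
            current_type, current_words = label[2:], [token]
        elif label.startswith("I-") and label[2:] == current_type:
            # An I- tag of the same type continues the open span.
            current_words.append(token)
        else:
            # "O", or an I- tag that does not continue the open span, closes it.
            # A stray I- tag with no matching B- is dropped in this sketch.
            if current_type is not None:
                entities.append((current_type, " ".join(current_words)))
            current_type, current_words = None, []

    # Flush a span that runs to the end of the sequence.
    if current_type is not None:
        entities.append((current_type, " ".join(current_words)))
    return entities


# Made-up, word-level invoice snippet using this model's label names.
tokens = ["Invoice", "date", ":", "12/03/2023", "Bill", "to", ":", "Acme", "Corp",
          "Total", ":", "1,250.00"]
labels = ["O", "O", "O", "B-Issue Date", "O", "O", "O", "B-Customer Name",
          "I-Customer Name", "O", "O", "B-Amount Including tax"]
print(group_bio_entities(tokens, labels))
# → [('Issue Date', '12/03/2023'), ('Customer Name', 'Acme Corp'), ('Amount Including tax', '1,250.00')]
```

Dropping an `I-` tag that has no preceding `B-` is a strict choice; a more lenient decoder could instead start a new span at that token.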
## Label Mapping
The model has been trained with the following `label2id` and `id2label` mappings:
### `label2id` Mapping

```python
label2id = {
    'I-Customer Name': 0,
    'B-Issue Date': 1,
    'I-Issue Date': 2,
    'I-Due Date': 3,
    'I-Amount': 4,
    'B-Due Date': 5,
    'O': 6,
    'B-Amount Including tax': 7,
    'B-Customer Name': 8,
    'B-Amount': 9,
    'I-Amount Including tax': 10,
    'B-Vendor Name': 11,
    'I-Vendor Name': 12,
    'I-Reference Number': 13,
    'B-Reference Number': 14,
}
```

### `id2label` Mapping

```python
# Inverse of label2id: maps each class index back to its label.
id2label = {
    0: 'I-Customer Name',
    1: 'B-Issue Date',
    2: 'I-Issue Date',
    3: 'I-Due Date',
    4: 'I-Amount',
    5: 'B-Due Date',
    6: 'O',
    7: 'B-Amount Including tax',
    8: 'B-Customer Name',
    9: 'B-Amount',
    10: 'I-Amount Including tax',
    11: 'B-Vendor Name',
    12: 'I-Vendor Name',
    13: 'I-Reference Number',
    14: 'B-Reference Number',
}
```
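
Since the two dictionaries are exact inverses, `id2label` can be derived from `label2id` instead of being maintained by hand, which rules out the two drifting apart. The sketch below does that and then decodes per-token class scores (for example, a model's logits converted to plain Python lists) into label strings by taking the highest-scoring class. The helper name `decode_predictions` and the sample scores are hypothetical. In Hugging Face `transformers`, these dictionaries are typically passed as the `id2label` and `label2id` fields of the model configuration so that predictions carry readable label names.

```python
label2id = {
    'I-Customer Name': 0, 'B-Issue Date': 1, 'I-Issue Date': 2, 'I-Due Date': 3,
    'I-Amount': 4, 'B-Due Date': 5, 'O': 6, 'B-Amount Including tax': 7,
    'B-Customer Name': 8, 'B-Amount': 9, 'I-Amount Including tax': 10,
    'B-Vendor Name': 11, 'I-Vendor Name': 12, 'I-Reference Number': 13,
    'B-Reference Number': 14,
}

# Derive the inverse mapping instead of writing it out by hand.
id2label = {idx: label for label, idx in label2id.items()}
assert len(id2label) == len(label2id)                   # no two labels share an id
assert sorted(id2label) == list(range(len(label2id)))  # ids cover 0..14 with no gaps


def decode_predictions(scores: list[list[float]]) -> list[str]:
    """Map each token's class scores to its highest-scoring label (hypothetical helper)."""
    return [id2label[max(range(len(row)), key=row.__getitem__)] for row in scores]


# Two made-up tokens: the first scores highest on id 6 ('O'), the second on id 5 ('B-Due Date').
scores = [
    [0.0] * 6 + [5.0] + [0.0] * 8,
    [0.0] * 5 + [4.0] + [0.0] * 9,
]
print(decode_predictions(scores))  # → ['O', 'B-Due Date']
```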
## Citation
```bibtex
@article{Xu2020LayoutLMv2,
  title={LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding},
  author={Yang Xu and Yiheng Xu and Tengchao Lv and Lei Cui and Furu Wei and Guoxin Wang and Yijuan Lu and Dinei Florencio and Cha Zhang and Wanxiang Che and Min Zhang and Lidong Zhou},
  journal={ArXiv},
  year={2020},
  volume={abs/2012.14740}
}
```