	---
	title: KDDA Global Model - Invoices
	emoji: 🐨
	---

	# Configuration

	`title`: _string_
	Display title for the Space

	`emoji`: _string_
	Space emoji (only emoji characters are allowed)

	`colorFrom`: _string_
	Start color of the thumbnail gradient (red, yellow, green, blue, indigo, purple, pink, gray)

	`colorTo`: _string_
	End color of the thumbnail gradient (red, yellow, green, blue, indigo, purple, pink, gray)

	`sdk`: _string_
	Can be either `gradio` or `streamlit`

	`app_file`: _string_
	Path to your main application file (which contains either `gradio` or `streamlit` Python code).
	Path is relative to the root of the repository.

	`pinned`: _boolean_
	Whether the Space stays on top of your list.

	# Custom LayoutLM Model for Invoice Processing

	This repository hosts a custom [LayoutLM](https://huggingface.co/microsoft/layoutlm-base-uncased) model fine-tuned to extract key information from invoices, such as amounts, dates, and names.

	## Model Overview

	This model is based on the LayoutLMv2 architecture and has been fine-tuned on a custom dataset of invoices. It performs token classification to extract the following entities:

	- Amount Including Tax
	- Due Date
	- Reference Number
	- Customer Name
	- Vendor Name
	- Issue Date
	- Amount

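	The model tags every token with a label from a custom set that follows the BIO scheme: `B-` marks the first token of an entity, `I-` marks a token that continues it, and `O` marks a token outside any entity. With seven entity types this gives 15 labels in total.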
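
To illustrate how per-token labels turn into extracted fields, here is a minimal sketch that groups BIO-tagged tokens into `(entity, text)` pairs. The function name `bio_to_spans` and the sample tokens are hypothetical, and joining tokens with a single space is an assumption; the repository's actual post-processing is not documented here.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_type, text) pairs."""
    spans: List[Tuple[str, str]] = []
    current_type: Optional[str] = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Close the currently open entity, if any.
        nonlocal current_type, current_tokens
        if current_type is not None:
            spans.append((current_type, " ".join(current_tokens)))
        current_type, current_tokens = None, []

    for token, tag in zip(tokens, tags):
        # Split only on the first hyphen: "B-Due Date" -> ("B", "Due Date").
        prefix, _, entity = tag.partition("-")
        if tag == "O":
            flush()
        elif prefix == "B" or entity != current_type:
            # A B- tag, or an I- tag that does not continue the open
            # entity, starts a new span.
            flush()
            current_type, current_tokens = entity, [token]
        else:
            current_tokens.append(token)
    flush()
    return spans


# Hypothetical example input, not taken from the training data.
tokens = ["Invoice", "from", "Acme", "Corp", "due", "2024-01-31", "total", "120.00"]
tags = ["O", "O", "B-Vendor Name", "I-Vendor Name", "O", "B-Due Date", "O",
        "B-Amount Including tax"]
spans = bio_to_spans(tokens, tags)
print(spans)
```

Treating a stray `I-` tag as the start of a new entity is a lenient choice; a stricter decoder could discard such tokens instead.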

	## Label Mapping

	The model has been trained with the following `label2id` and `id2label` mappings:

	### `label2id` and `id2label` Mappings

	```python
	label2id = {
	'I-Customer Name': 0,
	'B-Issue Date': 1,
	'I-Issue Date': 2,
	'I-Due Date': 3,
	'I-Amount': 4,
	'B-Due Date': 5,
	'O': 6,
	'B-Amount Including tax': 7,
	'B-Customer Name': 8,
	'B-Amount': 9,
	'I-Amount Including tax': 10,
	'B-Vendor Name': 11,
	'I-Vendor Name': 12,
	'I-Reference Number': 13,
	'B-Reference Number': 14
	}
	id2label = {
	0: 'I-Customer Name',
	1: 'B-Issue Date',
	2: 'I-Issue Date',
	3: 'I-Due Date',
	4: 'I-Amount',
	5: 'B-Due Date',
	6: 'O',
	7: 'B-Amount Including tax',
	8: 'B-Customer Name',
	9: 'B-Amount',
	10: 'I-Amount Including tax',
	11: 'B-Vendor Name',
	12: 'I-Vendor Name',
	13: 'I-Reference Number',
	14: 'B-Reference Number'
	}
	```
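
Because the two mappings must stay exact inverses of each other, it can help to derive one from the other and check the result. The sketch below rebuilds `id2label` from the `label2id` values listed above and decodes a list of predicted class ids; the helper name `decode` and the sample ids are hypothetical.

```python
label2id = {
    "I-Customer Name": 0, "B-Issue Date": 1, "I-Issue Date": 2,
    "I-Due Date": 3, "I-Amount": 4, "B-Due Date": 5, "O": 6,
    "B-Amount Including tax": 7, "B-Customer Name": 8, "B-Amount": 9,
    "I-Amount Including tax": 10, "B-Vendor Name": 11, "I-Vendor Name": 12,
    "I-Reference Number": 13, "B-Reference Number": 14,
}

# Derive the inverse mapping instead of maintaining it by hand.
id2label = {idx: label for label, idx in label2id.items()}

# Sanity checks: ids are unique and contiguous from 0.
assert len(id2label) == len(label2id)
assert sorted(id2label) == list(range(len(label2id)))

# Entity types are the label names with the B-/I- prefix removed.
entity_types = sorted({label.split("-", 1)[1] for label in label2id if label != "O"})


def decode(ids):
    """Map predicted class ids back to label strings."""
    return [id2label[i] for i in ids]


# Hypothetical predicted ids for a four-token sequence.
decoded = decode([6, 11, 12, 6])
print(entity_types)
print(decoded)
```

If the model is loaded with the Transformers library, dictionaries like these are typically supplied as the `id2label` and `label2id` configuration values so that predictions are reported with readable label names.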


	## Citation
	@article{Xu2020LayoutLMv2,
	title={LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding},
	author={Yang Xu and Yiheng Xu and Tengchao Lv and Lei Cui and Furu Wei and Guoxin Wang and Yijuan Lu and Dinei Florencio and Cha Zhang and Wanxiang Che and Min Zhang and Lidong Zhou},
	journal={ArXiv},
	year={2020},
	volume={abs/2012.14740}
	}