metadata
pipeline_tag: document-question-answering
language: en
license: mit
tags:
- layoutlm
- pdf
LayoutLM for Visual Question Answering
This is a fine-tuned version of the multi-modal LayoutLM model for the task of question answering on documents. It has been fine-tuned using both the SQuAD2.0 and DocVQA datasets.
Getting started with the model
To run these examples, you must have PIL, pytesseract, and PyTorch installed in addition to transformers.
from transformers import pipeline
nlp = pipeline(
"document-question-answering",
model="impira/layoutlm-document-qa",
)
nlp(
"https://templates.invoicehome.com/invoice-template-us-neat-750px.png",
"What is the invoice number?"
)
# {'score': 0.9943977, 'answer': 'us-001', 'start': 15, 'end': 15}
nlp(
"https://miro.medium.com/max/787/1*iECQRIiOGTmEFLdWkVIH2g.jpeg",
"What is the purchase amount?"
)
# {'score': 0.9912159, 'answer': '$1,000,000,000', 'start': 97, 'end': 97}
nlp(
"https://www.accountingcoach.com/wp-content/uploads/2013/10/income-statement-example@2x.png",
"What are the 2020 net sales?"
)
# {'score': 0.59147286, 'answer': '$ 3,750', 'start': 19, 'end': 20}
NOTE: This model and pipeline was recently landed in transformers via PR #18407 and PR #18414, so you'll need to use a recent version of transformers, for example:
pip install git+https://github.com/huggingface/transformers.git@2ef774211733f0acf8d3415f9284c49ef219e991