| <meta charset="utf-8" /><meta name="hf:doc:metadata" content='{"title":"Document Question Answering","local":"document-question-answering","sections":[{"title":"Load the data","local":"load-the-data","sections":[],"depth":2},{"title":"Preprocess the data","local":"preprocess-the-data","sections":[{"title":"Preprocessing document images","local":"preprocessing-document-images","sections":[],"depth":3},{"title":"Preprocessing text data","local":"preprocessing-text-data","sections":[],"depth":3}],"depth":2},{"title":"Evaluation","local":"evaluation","sections":[],"depth":2},{"title":"Train","local":"train","sections":[],"depth":2},{"title":"Inference","local":"inference","sections":[],"depth":2}],"depth":1}'> | |
| <h1 class="relative group"><a id="document-question-answering" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#document-question-answering"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Document Question Answering</span></h1> <p data-svelte-h="svelte-1c1m6de">Document Question Answering, also referred to as Document Visual Question Answering, is a task that involves providing | |
| answers to questions posed about document images. The input to models supporting this task is typically a combination of an image and | |
| a question, and the output is an answer expressed in natural language. These models utilize multiple modalities, including | |
| text, the positions of words (bounding boxes), and the image itself.</p> <p data-svelte-h="svelte-ku8orh">This guide illustrates how to:</p> <ul data-svelte-h="svelte-1g8eree"><li>Fine-tune <a href="../model_doc/layoutlmv2">LayoutLMv2</a> on the <a href="https://huggingface.co/datasets/nielsr/docvqa_1200_examples_donut" rel="nofollow">DocVQA dataset</a>.</li> <li>Use your fine-tuned model for inference.</li></ul> <div class="course-tip bg-gradient-to-br dark:bg-gradient-to-r before:border-green-500 dark:before:border-green-800 from-green-50 dark:from-gray-900 to-white dark:to-gray-950 border border-green-50 text-green-700 dark:text-gray-400"><p data-svelte-h="svelte-1nrnqa3">To see all architectures and checkpoints compatible with this task, we recommend checking the <a href="https://huggingface.co/tasks/image-to-text" rel="nofollow">task-page</a></p></div> <p data-svelte-h="svelte-1svbrv5">LayoutLMv2 solves the document question-answering task by adding a question-answering head on top of the final hidden | |
| states of the tokens, to predict the positions of the start and end tokens of the | |
| answer. In other words, the problem is treated as extractive question answering: given the context, extract which piece | |
| of information answers the question. The context comes from the output of an OCR engine, in this case Google’s Tesseract.</p> <p data-svelte-h="svelte-17fjxql">Before you begin, make sure you have all the necessary libraries installed. LayoutLMv2 depends on detectron2, torchvision and tesseract.</p> <div class="code-block relative"> <pre class=""><!-- HTML_TAG_START -->pip install -q transformers datasets<!-- HTML_TAG_END --></pre></div> <div class="code-block relative"> <pre class=""><!-- HTML_TAG_START -->pip install <span class="hljs-string">'git+https://github.com/facebookresearch/detectron2.git'</span> | |
| pip install torchvision<!-- HTML_TAG_END --></pre></div> <div class="code-block relative"> <pre class=""><!-- HTML_TAG_START -->sudo apt install tesseract-ocr | |
| pip install -q pytesseract<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-hsz112">Once you have installed all of the dependencies, restart your runtime.</p> <p data-svelte-h="svelte-1yqpblu">We encourage you to share your model with the community. Log in to your Hugging Face account to upload it to the 🤗 Hub. | |
| When prompted, enter your token to log in:</p> <div class="code-block relative"> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span><span class="hljs-keyword">from</span> huggingface_hub <span class="hljs-keyword">import</span> notebook_login | |
| <span class="hljs-meta">>>> </span>notebook_login()<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-1us2g34">Let’s define some global variables.</p> <div class="code-block relative"> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span>model_checkpoint = <span class="hljs-string">"microsoft/layoutlmv2-base-uncased"</span> | |
| <span class="hljs-meta">>>> </span>batch_size = <span class="hljs-number">4</span><!-- HTML_TAG_END --></pre></div> <h2 class="relative group"><a id="load-the-data" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#load-the-data"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Load the data</span></h2> <p data-svelte-h="svelte-xkaeyi">In this guide we use a small sample of preprocessed DocVQA that you can find on 🤗 Hub. If you’d like to use the full | |
| DocVQA dataset, you can register and download it from the <a href="https://rrc.cvc.uab.es/?ch=17" rel="nofollow">DocVQA homepage</a>. If you do so, to | |
| proceed with this guide, check out <a href="https://huggingface.co/docs/datasets/loading#local-and-remote-files" rel="nofollow">how to load files into a 🤗 dataset</a>.</p> <div class="code-block relative"> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span><span class="hljs-keyword">from</span> datasets <span class="hljs-keyword">import</span> load_dataset | |
| <span class="hljs-meta">>>> </span>dataset = load_dataset(<span class="hljs-string">"nielsr/docvqa_1200_examples"</span>) | |
| <span class="hljs-meta">>>> </span>dataset | |
| DatasetDict({ | |
| train: Dataset({ | |
| features: [<span class="hljs-string">'id'</span>, <span class="hljs-string">'image'</span>, <span class="hljs-string">'query'</span>, <span class="hljs-string">'answers'</span>, <span class="hljs-string">'words'</span>, <span class="hljs-string">'bounding_boxes'</span>, <span class="hljs-string">'answer'</span>], | |
| num_rows: <span class="hljs-number">1000</span> | |
| }) | |
| test: Dataset({ | |
| features: [<span class="hljs-string">'id'</span>, <span class="hljs-string">'image'</span>, <span class="hljs-string">'query'</span>, <span class="hljs-string">'answers'</span>, <span class="hljs-string">'words'</span>, <span class="hljs-string">'bounding_boxes'</span>, <span class="hljs-string">'answer'</span>], | |
| num_rows: <span class="hljs-number">200</span> | |
| }) | |
| })<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-18ggx10">As you can see, the dataset is already split into train and test sets. Take a look at the features to familiarize | |
| yourself with the structure of an example.</p> <div class="code-block relative"> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span>dataset[<span class="hljs-string">"train"</span>].features<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-1fi388d">Here’s what the individual fields represent:</p> <ul data-svelte-h="svelte-12b5dxa"><li><code>id</code>: the example’s id</li> <li><code>image</code>: a PIL.Image.Image object containing the document image</li> <li><code>query</code>: the question, a natural-language string provided in several languages</li> <li><code>answers</code>: a list of correct answers provided by human annotators</li> <li><code>words</code> and <code>bounding_boxes</code>: the results of OCR, which we will not use here</li> <li><code>answer</code>: an answer matched by a different model, which we will not use here</li></ul> <p data-svelte-h="svelte-1h0f0qo">Let’s keep only the English questions and drop the <code>answer</code> feature, which appears to contain predictions by another model. | |
| We’ll also take the first of the answers from the set provided by the annotators. Alternatively, you can sample one at random.</p> <div class="code-block relative"> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span>updated_dataset = dataset.<span class="hljs-built_in">map</span>(<span class="hljs-keyword">lambda</span> example: {<span class="hljs-string">"question"</span>: example[<span class="hljs-string">"query"</span>][<span class="hljs-string">"en"</span>]}, remove_columns=[<span class="hljs-string">"query"</span>]) | |
| <span class="hljs-meta">>>> </span>updated_dataset = updated_dataset.<span class="hljs-built_in">map</span>( | |
| <span class="hljs-meta">... </span> <span class="hljs-keyword">lambda</span> example: {<span class="hljs-string">"answer"</span>: example[<span class="hljs-string">"answers"</span>][<span class="hljs-number">0</span>]}, remove_columns=[<span class="hljs-string">"answer"</span>, <span class="hljs-string">"answers"</span>] | |
| <span class="hljs-meta">... </span>)<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-x5p0j2">Note that the LayoutLMv2 checkpoint that we use in this guide has been trained with <code>max_position_embeddings = 512</code> (you can | |
| find this information in the <a href="https://huggingface.co/microsoft/layoutlmv2-base-uncased/blob/main/config.json#L18" rel="nofollow">checkpoint’s <code>config.json</code> file</a>). | |
| We could truncate the examples, but then an answer located at the end of a long document might be cut off. To avoid that, | |
| here we’ll remove the few examples whose tokenized input is likely to end up longer than 512 tokens. | |
| If most of the documents in your dataset are long, you can implement a sliding window strategy; check out <a href="https://github.com/huggingface/notebooks/blob/main/examples/question_answering.ipynb" rel="nofollow">this notebook</a> for details.</p> <div class="code-block relative"> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span>updated_dataset = updated_dataset.<span class="hljs-built_in">filter</span>(<span class="hljs-keyword">lambda</span> x: <span class="hljs-built_in">len</span>(x[<span class="hljs-string">"words"</span>]) + <span class="hljs-built_in">len</span>(x[<span class="hljs-string">"question"</span>].split()) < <span class="hljs-number">512</span>)<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-ydtan0">At this point let’s also remove the OCR features from this dataset. 
These were produced by OCR for fine-tuning a different | |
| model. They would still require some processing if we wanted to use them, as they do not match the input requirements | |
| of the model we use in this guide. Instead, we can use the <a href="/docs/transformers/main/en/model_doc/layoutlmv2#transformers.LayoutLMv2Processor">LayoutLMv2Processor</a> on the original data for both OCR and | |
| tokenization. This way we’ll get inputs that match the model’s expected format. If you want to process images manually, | |
| check out the <a href="../model_doc/layoutlmv2"><code>LayoutLMv2</code> model documentation</a> to learn what input format the model expects.</p> <div class="code-block relative"> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span>updated_dataset = updated_dataset.remove_columns(<span class="hljs-string">"words"</span>) | |
| <span class="hljs-meta">>>> </span>updated_dataset = updated_dataset.remove_columns(<span class="hljs-string">"bounding_boxes"</span>)<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-1vy80t">Finally, the data exploration won’t be complete if we don’t peek at an image example.</p> <div class="code-block relative"> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span>updated_dataset[<span class="hljs-string">"train"</span>][<span class="hljs-number">11</span>][<span class="hljs-string">"image"</span>]<!-- HTML_TAG_END --></pre></div> <div class="flex justify-center" data-svelte-h="svelte-q63tj1"><img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/docvqa_example.jpg" alt="DocVQA Image Example"></div> <h2 class="relative group"><a id="preprocess-the-data" 
class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#preprocess-the-data"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Preprocess the data</span></h2> <p data-svelte-h="svelte-1hlj9yo">The Document Question Answering task is a multimodal task, and you need to make sure that the inputs from each modality | |
| are preprocessed according to the model’s expectations. Let’s start by loading the <a href="/docs/transformers/main/en/model_doc/layoutlmv2#transformers.LayoutLMv2Processor">LayoutLMv2Processor</a>, which internally combines an image processor that can handle image data and a tokenizer that can encode text data.</p> <div class="code-block relative"> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span><span class="hljs-keyword">from</span> transformers <span class="hljs-keyword">import</span> AutoProcessor | |
| <span class="hljs-meta">>>> </span>processor = AutoProcessor.from_pretrained(model_checkpoint)<!-- HTML_TAG_END --></pre></div> <h3 class="relative group"><a id="preprocessing-document-images" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#preprocessing-document-images"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Preprocessing document images</span></h3> <p data-svelte-h="svelte-1u7369n">First, let’s prepare the document images for the model with the help of the <code>image_processor</code> from the processor. | |
| By default, the image processor resizes the images to 224x224, makes sure they have the correct order of color channels, | |
| and applies OCR with Tesseract to get words and normalized bounding boxes. In this tutorial, all of these defaults are exactly what we need. | |
| Write a function that applies the default image processing to a batch of images and returns the results of OCR.</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span>image_processor = processor.image_processor | |
| <span class="hljs-meta">>>> </span><span class="hljs-keyword">def</span> <span class="hljs-title function_">get_ocr_words_and_boxes</span>(<span class="hljs-params">examples</span>): | |
| <span class="hljs-meta">... </span> images = [image.convert(<span class="hljs-string">"RGB"</span>) <span class="hljs-keyword">for</span> image <span class="hljs-keyword">in</span> examples[<span class="hljs-string">"image"</span>]] | |
| <span class="hljs-meta">... </span> encoded_inputs = image_processor(images) | |
| <span class="hljs-meta">... </span> examples[<span class="hljs-string">"image"</span>] = encoded_inputs.pixel_values | |
| <span class="hljs-meta">... </span> examples[<span class="hljs-string">"words"</span>] = encoded_inputs.words | |
| <span class="hljs-meta">... </span> examples[<span class="hljs-string">"boxes"</span>] = encoded_inputs.boxes | |
| <span class="hljs-meta">... </span> <span class="hljs-keyword">return</span> examples<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-1um3786">To apply this preprocessing to the entire dataset in a fast way, use <a href="https://huggingface.co/docs/datasets/main/en/package_reference/main_classes#datasets.Dataset.map" rel="nofollow">map</a>.</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span>dataset_with_ocr = updated_dataset.<span class="hljs-built_in">map</span>(get_ocr_words_and_boxes, batched=<span class="hljs-literal">True</span>, batch_size=<span class="hljs-number">2</span>)<!-- HTML_TAG_END --></pre></div> <h3 class="relative group"><a id="preprocessing-text-data" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 
with-hover:group-hover:opacity-100 with-hover:right-full" href="#preprocessing-text-data"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Preprocessing text data</span></h3> <p data-svelte-h="svelte-dfarfe">Once we have applied OCR to the images, we need to encode the text part of the dataset to prepare it for the model. | |
| This involves converting the words and boxes that we got in the previous step to token-level <code>input_ids</code>, <code>attention_mask</code>, | |
| <code>token_type_ids</code> and <code>bbox</code>. For preprocessing text, we’ll need the <code>tokenizer</code> from the processor.</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span>tokenizer = processor.tokenizer<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-12sojfo">On top of the preprocessing mentioned above, we also need to add the labels for the model. For <code>xxxForQuestionAnswering</code> models | |
| in 🤗 Transformers, the labels consist of the <code>start_positions</code> and <code>end_positions</code>, indicating which token is at the | |
| start and which token is at the end of the answer.</p> <p data-svelte-h="svelte-1kkerbo">Let’s start with that. Define a helper function that can find a sublist (the answer split into words) in a larger list (the words list).</p> <p data-svelte-h="svelte-1wppb4o">This function will take two lists as input, <code>words_list</code> and <code>answer_list</code>. It will then iterate over the <code>words_list</code> and check | |
| if the current word in <code>words_list</code> (<code>words_list[i]</code>) is equal to the first word of <code>answer_list</code> (<code>answer_list[0]</code>) and if | |
| the sublist of <code>words_list</code> starting at the current word and having the same length as <code>answer_list</code> is equal to <code>answer_list</code>. | |
| If this condition is true, a match has been found, and the function will record the match, its starting index (<code>idx</code>), | |
| and its ending index (<code>idx + len(answer_list) - 1</code>). If more than one match is found, the function will return only the first one. | |
| If no match is found, the function returns (<code>None</code>, 0, and 0).</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span><span class="hljs-keyword">def</span> <span class="hljs-title function_">subfinder</span>(<span class="hljs-params">words_list, answer_list</span>): | |
| <span class="hljs-meta">... </span> matches = [] | |
| <span class="hljs-meta">... </span> start_indices = [] | |
| <span class="hljs-meta">... </span> end_indices = [] | |
| <span class="hljs-meta">... </span> <span class="hljs-keyword">for</span> idx, i <span class="hljs-keyword">in</span> <span class="hljs-built_in">enumerate</span>(<span class="hljs-built_in">range</span>(<span class="hljs-built_in">len</span>(words_list))): | |
| <span class="hljs-meta">... </span> <span class="hljs-keyword">if</span> words_list[i] == answer_list[<span class="hljs-number">0</span>] <span class="hljs-keyword">and</span> words_list[i : i + <span class="hljs-built_in">len</span>(answer_list)] == answer_list: | |
| <span class="hljs-meta">... </span> matches.append(answer_list) | |
| <span class="hljs-meta">... </span> start_indices.append(idx) | |
| <span class="hljs-meta">... </span> end_indices.append(idx + <span class="hljs-built_in">len</span>(answer_list) - <span class="hljs-number">1</span>) | |
| <span class="hljs-meta">... </span> <span class="hljs-keyword">if</span> matches: | |
| <span class="hljs-meta">... </span> <span class="hljs-keyword">return</span> matches[<span class="hljs-number">0</span>], start_indices[<span class="hljs-number">0</span>], end_indices[<span class="hljs-number">0</span>] | |
| <span class="hljs-meta">... </span> <span class="hljs-keyword">else</span>: | |
| <span class="hljs-meta">... </span> <span class="hljs-keyword">return</span> <span class="hljs-literal">None</span>, <span class="hljs-number">0</span>, <span class="hljs-number">0</span><!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-19pibjd">To illustrate how this function finds the position of the answer, let’s use it on an example:</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span>example = dataset_with_ocr[<span class="hljs-string">"train"</span>][<span class="hljs-number">1</span>] | |
| <span class="hljs-meta">>>> </span>words = [word.lower() <span class="hljs-keyword">for</span> word <span class="hljs-keyword">in</span> example[<span class="hljs-string">"words"</span>]] | |
| <span class="hljs-meta">>>> </span><span class="hljs-keyword">match</span>, word_idx_start, word_idx_end = subfinder(words, example[<span class="hljs-string">"answer"</span>].lower().split()) | |
| <span class="hljs-meta">>>> </span><span class="hljs-built_in">print</span>(<span class="hljs-string">"Question: "</span>, example[<span class="hljs-string">"question"</span>]) | |
| <span class="hljs-meta">>>> </span><span class="hljs-built_in">print</span>(<span class="hljs-string">"Words:"</span>, words) | |
| <span class="hljs-meta">>>> </span><span class="hljs-built_in">print</span>(<span class="hljs-string">"Answer: "</span>, example[<span class="hljs-string">"answer"</span>]) | |
| <span class="hljs-meta">>>> </span><span class="hljs-built_in">print</span>(<span class="hljs-string">"start_index"</span>, word_idx_start) | |
| <span class="hljs-meta">>>> </span><span class="hljs-built_in">print</span>(<span class="hljs-string">"end_index"</span>, word_idx_end) | |
| Question: Who <span class="hljs-keyword">is</span> <span class="hljs-keyword">in</span> cc <span class="hljs-keyword">in</span> this letter? | |
| Words: [<span class="hljs-string">'wie'</span>, <span class="hljs-string">'baw'</span>, <span class="hljs-string">'brown'</span>, <span class="hljs-string">'&'</span>, <span class="hljs-string">'williamson'</span>, <span class="hljs-string">'tobacco'</span>, <span class="hljs-string">'corporation'</span>, <span class="hljs-string">'research'</span>, <span class="hljs-string">'&'</span>, <span class="hljs-string">'development'</span>, <span class="hljs-string">'internal'</span>, <span class="hljs-string">'correspondence'</span>, <span class="hljs-string">'to:'</span>, <span class="hljs-string">'r.'</span>, <span class="hljs-string">'h.'</span>, <span class="hljs-string">'honeycutt'</span>, <span class="hljs-string">'ce:'</span>, <span class="hljs-string">'t.f.'</span>, <span class="hljs-string">'riehl'</span>, <span class="hljs-string">'from:'</span>, <span class="hljs-string">'.'</span>, <span class="hljs-string">'c.j.'</span>, <span class="hljs-string">'cook'</span>, <span class="hljs-string">'date:'</span>, <span class="hljs-string">'may'</span>, <span class="hljs-string">'8,'</span>, <span class="hljs-string">'1995'</span>, <span class="hljs-string">'subject:'</span>, <span class="hljs-string">'review'</span>, <span class="hljs-string">'of'</span>, <span class="hljs-string">'existing'</span>, <span class="hljs-string">'brainstorming'</span>, <span class="hljs-string">'ideas/483'</span>, <span class="hljs-string">'the'</span>, <span class="hljs-string">'major'</span>, <span class="hljs-string">'function'</span>, <span class="hljs-string">'of'</span>, <span class="hljs-string">'the'</span>, <span class="hljs-string">'product'</span>, <span class="hljs-string">'innovation'</span>, <span class="hljs-string">'graup'</span>, <span class="hljs-string">'is'</span>, <span class="hljs-string">'to'</span>, <span class="hljs-string">'develop'</span>, <span class="hljs-string">'marketable'</span>, <span class="hljs-string">'nove!'</span>, <span 
class="hljs-string">'products'</span>, <span class="hljs-string">'that'</span>, <span class="hljs-string">'would'</span>, <span class="hljs-string">'be'</span>, <span class="hljs-string">'profitable'</span>, <span class="hljs-string">'to'</span>, <span class="hljs-string">'manufacture'</span>, <span class="hljs-string">'and'</span>, <span class="hljs-string">'sell.'</span>, <span class="hljs-string">'novel'</span>, <span class="hljs-string">'is'</span>, <span class="hljs-string">'defined'</span>, <span class="hljs-string">'as:'</span>, <span class="hljs-string">'of'</span>, <span class="hljs-string">'a'</span>, <span class="hljs-string">'new'</span>, <span class="hljs-string">'kind,'</span>, <span class="hljs-string">'or'</span>, <span class="hljs-string">'different'</span>, <span class="hljs-string">'from'</span>, <span class="hljs-string">'anything'</span>, <span class="hljs-string">'seen'</span>, <span class="hljs-string">'or'</span>, <span class="hljs-string">'known'</span>, <span class="hljs-string">'before.'</span>, <span class="hljs-string">'innovation'</span>, <span class="hljs-string">'is'</span>, <span class="hljs-string">'defined'</span>, <span class="hljs-string">'as:'</span>, <span class="hljs-string">'something'</span>, <span class="hljs-string">'new'</span>, <span class="hljs-string">'or'</span>, <span class="hljs-string">'different'</span>, <span class="hljs-string">'introduced;'</span>, <span class="hljs-string">'act'</span>, <span class="hljs-string">'of'</span>, <span class="hljs-string">'innovating;'</span>, <span class="hljs-string">'introduction'</span>, <span class="hljs-string">'of'</span>, <span class="hljs-string">'new'</span>, <span class="hljs-string">'things'</span>, <span class="hljs-string">'or'</span>, <span class="hljs-string">'methods.'</span>, <span class="hljs-string">'the'</span>, <span class="hljs-string">'products'</span>, <span class="hljs-string">'may'</span>, <span class="hljs-string">'incorporate'</span>, <span 
class="hljs-string">'the'</span>, <span class="hljs-string">'latest'</span>, <span class="hljs-string">'technologies,'</span>, <span class="hljs-string">'materials'</span>, <span class="hljs-string">'and'</span>, <span class="hljs-string">'know-how'</span>, <span class="hljs-string">'available'</span>, <span class="hljs-string">'to'</span>, <span class="hljs-string">'give'</span>, <span class="hljs-string">'then'</span>, <span class="hljs-string">'a'</span>, <span class="hljs-string">'unique'</span>, <span class="hljs-string">'taste'</span>, <span class="hljs-string">'or'</span>, <span class="hljs-string">'look.'</span>, <span class="hljs-string">'the'</span>, <span class="hljs-string">'first'</span>, <span class="hljs-string">'task'</span>, <span class="hljs-string">'of'</span>, <span class="hljs-string">'the'</span>, <span class="hljs-string">'product'</span>, <span class="hljs-string">'innovation'</span>, <span class="hljs-string">'group'</span>, <span class="hljs-string">'was'</span>, <span class="hljs-string">'to'</span>, <span class="hljs-string">'assemble,'</span>, <span class="hljs-string">'review'</span>, <span class="hljs-string">'and'</span>, <span class="hljs-string">'categorize'</span>, <span class="hljs-string">'a'</span>, <span class="hljs-string">'list'</span>, <span class="hljs-string">'of'</span>, <span class="hljs-string">'existing'</span>, <span class="hljs-string">'brainstorming'</span>, <span class="hljs-string">'ideas.'</span>, <span class="hljs-string">'ideas'</span>, <span class="hljs-string">'were'</span>, <span class="hljs-string">'grouped'</span>, <span class="hljs-string">'into'</span>, <span class="hljs-string">'two'</span>, <span class="hljs-string">'major'</span>, <span class="hljs-string">'categories'</span>, <span class="hljs-string">'labeled'</span>, <span class="hljs-string">'appearance'</span>, <span class="hljs-string">'and'</span>, <span class="hljs-string">'taste/aroma.'</span>, <span class="hljs-string">'these'</span>, <span 
class="hljs-string">'categories'</span>, <span class="hljs-string">'are'</span>, <span class="hljs-string">'used'</span>, <span class="hljs-string">'for'</span>, <span class="hljs-string">'novel'</span>, <span class="hljs-string">'products'</span>, <span class="hljs-string">'that'</span>, <span class="hljs-string">'may'</span>, <span class="hljs-string">'differ'</span>, <span class="hljs-string">'from'</span>, <span class="hljs-string">'a'</span>, <span class="hljs-string">'visual'</span>, <span class="hljs-string">'and/or'</span>, <span class="hljs-string">'taste/aroma'</span>, <span class="hljs-string">'point'</span>, <span class="hljs-string">'of'</span>, <span class="hljs-string">'view'</span>, <span class="hljs-string">'compared'</span>, <span class="hljs-string">'to'</span>, <span class="hljs-string">'canventional'</span>, <span class="hljs-string">'cigarettes.'</span>, <span class="hljs-string">'other'</span>, <span class="hljs-string">'categories'</span>, <span class="hljs-string">'include'</span>, <span class="hljs-string">'a'</span>, <span class="hljs-string">'combination'</span>, <span class="hljs-string">'of'</span>, <span class="hljs-string">'the'</span>, <span class="hljs-string">'above,'</span>, <span class="hljs-string">'filters,'</span>, <span class="hljs-string">'packaging'</span>, <span class="hljs-string">'and'</span>, <span class="hljs-string">'brand'</span>, <span class="hljs-string">'extensions.'</span>, <span class="hljs-string">'appearance'</span>, <span class="hljs-string">'this'</span>, <span class="hljs-string">'category'</span>, <span class="hljs-string">'is'</span>, <span class="hljs-string">'used'</span>, <span class="hljs-string">'for'</span>, <span class="hljs-string">'novel'</span>, <span class="hljs-string">'cigarette'</span>, <span class="hljs-string">'constructions'</span>, <span class="hljs-string">'that'</span>, <span class="hljs-string">'yield'</span>, <span class="hljs-string">'visually'</span>, <span 
class="hljs-string">'different'</span>, <span class="hljs-string">'products'</span>, <span class="hljs-string">'with'</span>, <span class="hljs-string">'minimal'</span>, <span class="hljs-string">'changes'</span>, <span class="hljs-string">'in'</span>, <span class="hljs-string">'smoke'</span>, <span class="hljs-string">'chemistry'</span>, <span class="hljs-string">'two'</span>, <span class="hljs-string">'cigarettes'</span>, <span class="hljs-string">'in'</span>, <span class="hljs-string">'cne.'</span>, <span class="hljs-string">'emulti-plug'</span>, <span class="hljs-string">'te'</span>, <span class="hljs-string">'build'</span>, <span class="hljs-string">'yaur'</span>, <span class="hljs-string">'awn'</span>, <span class="hljs-string">'cigarette.'</span>, <span class="hljs-string">'eswitchable'</span>, <span class="hljs-string">'menthol'</span>, <span class="hljs-string">'or'</span>, <span class="hljs-string">'non'</span>, <span class="hljs-string">'menthol'</span>, <span class="hljs-string">'cigarette.'</span>, <span class="hljs-string">'*cigarettes'</span>, <span class="hljs-string">'with'</span>, <span class="hljs-string">'interspaced'</span>, <span class="hljs-string">'perforations'</span>, <span class="hljs-string">'to'</span>, <span class="hljs-string">'enable'</span>, <span class="hljs-string">'smoker'</span>, <span class="hljs-string">'to'</span>, <span class="hljs-string">'separate'</span>, <span class="hljs-string">'unburned'</span>, <span class="hljs-string">'section'</span>, <span class="hljs-string">'for'</span>, <span class="hljs-string">'future'</span>, <span class="hljs-string">'smoking.'</span>, <span class="hljs-string">'«short'</span>, <span class="hljs-string">'cigarette,'</span>, <span class="hljs-string">'tobacco'</span>, <span class="hljs-string">'section'</span>, <span class="hljs-string">'30'</span>, <span class="hljs-string">'mm.'</span>, <span class="hljs-string">'«extremely'</span>, <span class="hljs-string">'fast'</span>, <span 
class="hljs-string">'buming'</span>, <span class="hljs-string">'cigarette.'</span>, <span class="hljs-string">'«novel'</span>, <span class="hljs-string">'cigarette'</span>, <span class="hljs-string">'constructions'</span>, <span class="hljs-string">'that'</span>, <span class="hljs-string">'permit'</span>, <span class="hljs-string">'a'</span>, <span class="hljs-string">'significant'</span>, <span class="hljs-string">'reduction'</span>, <span class="hljs-string">'iretobacco'</span>, <span class="hljs-string">'weight'</span>, <span class="hljs-string">'while'</span>, <span class="hljs-string">'maintaining'</span>, <span class="hljs-string">'smoking'</span>, <span class="hljs-string">'mechanics'</span>, <span class="hljs-string">'and'</span>, <span class="hljs-string">'visual'</span>, <span class="hljs-string">'characteristics.'</span>, <span class="hljs-string">'higher'</span>, <span class="hljs-string">'basis'</span>, <span class="hljs-string">'weight'</span>, <span class="hljs-string">'paper:'</span>, <span class="hljs-string">'potential'</span>, <span class="hljs-string">'reduction'</span>, <span class="hljs-string">'in'</span>, <span class="hljs-string">'tobacco'</span>, <span class="hljs-string">'weight.'</span>, <span class="hljs-string">'«more'</span>, <span class="hljs-string">'rigid'</span>, <span class="hljs-string">'tobacco'</span>, <span class="hljs-string">'column;'</span>, <span class="hljs-string">'stiffing'</span>, <span class="hljs-string">'agent'</span>, <span class="hljs-string">'for'</span>, <span class="hljs-string">'tobacco;'</span>, <span class="hljs-string">'e.g.'</span>, <span class="hljs-string">'starch'</span>, <span class="hljs-string">'*colored'</span>, <span class="hljs-string">'tow'</span>, <span class="hljs-string">'and'</span>, <span class="hljs-string">'cigarette'</span>, <span class="hljs-string">'papers;'</span>, <span class="hljs-string">'seasonal'</span>, <span class="hljs-string">'promotions,'</span>, <span 
class="hljs-string">'e.g.'</span>, <span class="hljs-string">'pastel'</span>, <span class="hljs-string">'colored'</span>, <span class="hljs-string">'cigarettes'</span>, <span class="hljs-string">'for'</span>, <span class="hljs-string">'easter'</span>, <span class="hljs-string">'or'</span>, <span class="hljs-string">'in'</span>, <span class="hljs-string">'an'</span>, <span class="hljs-string">'ebony'</span>, <span class="hljs-string">'and'</span>, <span class="hljs-string">'ivory'</span>, <span class="hljs-string">'brand'</span>, <span class="hljs-string">'containing'</span>, <span class="hljs-string">'a'</span>, <span class="hljs-string">'mixture'</span>, <span class="hljs-string">'of'</span>, <span class="hljs-string">'all'</span>, <span class="hljs-string">'black'</span>, <span class="hljs-string">'(black'</span>, <span class="hljs-string">'paper'</span>, <span class="hljs-string">'and'</span>, <span class="hljs-string">'tow)'</span>, <span class="hljs-string">'and'</span>, <span class="hljs-string">'ail'</span>, <span class="hljs-string">'white'</span>, <span class="hljs-string">'cigarettes.'</span>, <span class="hljs-string">'499150498'</span>] | |
| Answer: T.F. Riehl | |
| start_index <span class="hljs-number">17</span> | |
| end_index <span class="hljs-number">18</span><!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-19lp6r8">Once examples are encoded, however, they will look like this:</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span>encoding = tokenizer(example[<span class="hljs-string">"question"</span>], example[<span class="hljs-string">"words"</span>], example[<span class="hljs-string">"boxes"</span>]) | |
| <span class="hljs-meta">>>> </span>tokenizer.decode(encoding[<span class="hljs-string">"input_ids"</span>]) | |
| [CLS] who <span class="hljs-keyword">is</span> <span class="hljs-keyword">in</span> cc <span class="hljs-keyword">in</span> this letter? [SEP] wie baw brown & williamson tobacco corporation research & development ...<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-1tk94l">We’ll need to find the position of the answer in the encoded input.</p> <ul data-svelte-h="svelte-zfehno"><li><code>token_type_ids</code> tells us which tokens are part of the question, and which ones are part of the document’s words.</li> <li><code>tokenizer.cls_token_id</code> will help find the special token at the beginning of the input.</li> <li><code>word_ids</code> will help match the answer found in the original <code>words</code> to the same answer in the full encoded input and determine | |
| the start/end position of the answer in the encoded input.</li></ul> <p data-svelte-h="svelte-701rvg">With that in mind, let’s create a function to encode a batch of examples in the dataset:</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span><span class="hljs-keyword">def</span> <span class="hljs-title function_">encode_dataset</span>(<span class="hljs-params">examples, max_length=<span class="hljs-number">512</span></span>): | |
| <span class="hljs-meta">... </span>    questions = examples[<span class="hljs-string">"question"</span>] | |
| <span class="hljs-meta">... </span>    words = examples[<span class="hljs-string">"words"</span>] | |
| <span class="hljs-meta">... </span>    boxes = examples[<span class="hljs-string">"boxes"</span>] | |
| <span class="hljs-meta">... </span>    answers = examples[<span class="hljs-string">"answer"</span>] | |
| <span class="hljs-meta">... </span>    <span class="hljs-comment"># encode the batch of examples and initialize the start_positions and end_positions</span> | |
| <span class="hljs-meta">... </span>    encoding = tokenizer(questions, words, boxes, max_length=max_length, padding=<span class="hljs-string">"max_length"</span>, truncation=<span class="hljs-literal">True</span>) | |
| <span class="hljs-meta">... </span>    start_positions = [] | |
| <span class="hljs-meta">... </span>    end_positions = [] | |
| <span class="hljs-meta">... </span>    <span class="hljs-comment"># loop through the examples in the batch</span> | |
| <span class="hljs-meta">... </span>    <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> <span class="hljs-built_in">range</span>(<span class="hljs-built_in">len</span>(questions)): | |
| <span class="hljs-meta">... </span>        cls_index = encoding[<span class="hljs-string">"input_ids"</span>][i].index(tokenizer.cls_token_id) | |
| <span class="hljs-meta">... </span>        <span class="hljs-comment"># find the position of the answer in example's words</span> | |
| <span class="hljs-meta">... </span>        words_example = [word.lower() <span class="hljs-keyword">for</span> word <span class="hljs-keyword">in</span> words[i]] | |
| <span class="hljs-meta">... </span>        answer = answers[i] | |
| <span class="hljs-meta">... </span>        <span class="hljs-keyword">match</span>, word_idx_start, word_idx_end = subfinder(words_example, answer.lower().split()) | |
| <span class="hljs-meta">... </span>        <span class="hljs-keyword">if</span> <span class="hljs-keyword">match</span>: | |
| <span class="hljs-meta">... </span>            <span class="hljs-comment"># if match is found, use `token_type_ids` to find where words start in the encoding</span> | |
| <span class="hljs-meta">... </span>            token_type_ids = encoding[<span class="hljs-string">"token_type_ids"</span>][i] | |
| <span class="hljs-meta">... </span>            token_start_index = <span class="hljs-number">0</span> | |
| <span class="hljs-meta">... </span>            <span class="hljs-keyword">while</span> token_type_ids[token_start_index] != <span class="hljs-number">1</span>: | |
| <span class="hljs-meta">... </span>                token_start_index += <span class="hljs-number">1</span> | |
| <span class="hljs-meta">... </span>            token_end_index = <span class="hljs-built_in">len</span>(encoding[<span class="hljs-string">"input_ids"</span>][i]) - <span class="hljs-number">1</span> | |
| <span class="hljs-meta">... </span>            <span class="hljs-keyword">while</span> token_type_ids[token_end_index] != <span class="hljs-number">1</span>: | |
| <span class="hljs-meta">... </span>                token_end_index -= <span class="hljs-number">1</span> | |
| <span class="hljs-meta">... </span>            word_ids = encoding.word_ids(i)[token_start_index : token_end_index + <span class="hljs-number">1</span>] | |
| <span class="hljs-meta">... </span>            start_position = cls_index | |
| <span class="hljs-meta">... </span>            end_position = cls_index | |
| <span class="hljs-meta">... </span>            <span class="hljs-comment"># loop over word_ids and increase `token_start_index` until it matches the answer position in words</span> | |
| <span class="hljs-meta">... </span>            <span class="hljs-comment"># once it matches, save the `token_start_index` as the `start_position` of the answer in the encoding</span> | |
| <span class="hljs-meta">... </span>            <span class="hljs-keyword">for</span> <span class="hljs-built_in">id</span> <span class="hljs-keyword">in</span> word_ids: | |
| <span class="hljs-meta">... </span>                <span class="hljs-keyword">if</span> <span class="hljs-built_in">id</span> == word_idx_start: | |
| <span class="hljs-meta">... </span>                    start_position = token_start_index | |
| <span class="hljs-meta">... </span>                <span class="hljs-keyword">else</span>: | |
| <span class="hljs-meta">... </span>                    token_start_index += <span class="hljs-number">1</span> | |
| <span class="hljs-meta">... </span>            <span class="hljs-comment"># similarly loop over `word_ids` starting from the end to find the `end_position` of the answer</span> | |
| <span class="hljs-meta">... </span>            <span class="hljs-keyword">for</span> <span class="hljs-built_in">id</span> <span class="hljs-keyword">in</span> word_ids[::-<span class="hljs-number">1</span>]: | |
| <span class="hljs-meta">... </span>                <span class="hljs-keyword">if</span> <span class="hljs-built_in">id</span> == word_idx_end: | |
| <span class="hljs-meta">... </span>                    end_position = token_end_index | |
| <span class="hljs-meta">... </span>                <span class="hljs-keyword">else</span>: | |
| <span class="hljs-meta">... </span>                    token_end_index -= <span class="hljs-number">1</span> | |
| <span class="hljs-meta">... </span>            start_positions.append(start_position) | |
| <span class="hljs-meta">... </span>            end_positions.append(end_position) | |
| <span class="hljs-meta">... </span>        <span class="hljs-keyword">else</span>: | |
| <span class="hljs-meta">... </span>            start_positions.append(cls_index) | |
| <span class="hljs-meta">... </span>            end_positions.append(cls_index) | |
| <span class="hljs-meta">... </span>    encoding[<span class="hljs-string">"image"</span>] = examples[<span class="hljs-string">"image"</span>] | |
| <span class="hljs-meta">... </span>    encoding[<span class="hljs-string">"start_positions"</span>] = start_positions | |
| <span class="hljs-meta">... </span>    encoding[<span class="hljs-string">"end_positions"</span>] = end_positions | |
| <span class="hljs-meta">... </span>    <span class="hljs-keyword">return</span> encoding<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-1ori799">Now that we have this preprocessing function, we can encode the entire dataset:</p> <div class="code-block relative"><pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span>encoded_train_dataset = dataset_with_ocr[<span class="hljs-string">"train"</span>].<span class="hljs-built_in">map</span>( | |
| <span class="hljs-meta">... </span>    encode_dataset, batched=<span class="hljs-literal">True</span>, batch_size=<span class="hljs-number">2</span>, remove_columns=dataset_with_ocr[<span class="hljs-string">"train"</span>].column_names | |
| <span class="hljs-meta">... </span>) | |
| <span class="hljs-meta">>>> </span>encoded_test_dataset = dataset_with_ocr[<span class="hljs-string">"test"</span>].<span class="hljs-built_in">map</span>( | |
| <span class="hljs-meta">... </span>    encode_dataset, batched=<span class="hljs-literal">True</span>, batch_size=<span class="hljs-number">2</span>, remove_columns=dataset_with_ocr[<span class="hljs-string">"test"</span>].column_names | |
| <span class="hljs-meta">... </span>)<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-upxsp">Let’s check what the features of the encoded dataset look like:</p> <div class="code-block relative"><pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span>encoded_train_dataset.features | |
| {<span class="hljs-string">'image'</span>: <span class="hljs-type">Sequence</span>(feature=<span class="hljs-type">Sequence</span>(feature=<span class="hljs-type">Sequence</span>(feature=Value(dtype=<span class="hljs-string">'uint8'</span>, <span class="hljs-built_in">id</span>=<span class="hljs-literal">None</span>), length=-<span class="hljs-number">1</span>, <span class="hljs-built_in">id</span>=<span class="hljs-literal">None</span>), length=-<span class="hljs-number">1</span>, <span class="hljs-built_in">id</span>=<span class="hljs-literal">None</span>), length=-<span class="hljs-number">1</span>, <span class="hljs-built_in">id</span>=<span class="hljs-literal">None</span>), | |
| <span class="hljs-string">'input_ids'</span>: <span class="hljs-type">Sequence</span>(feature=Value(dtype=<span class="hljs-string">'int32'</span>, <span class="hljs-built_in">id</span>=<span class="hljs-literal">None</span>), length=-<span class="hljs-number">1</span>, <span class="hljs-built_in">id</span>=<span class="hljs-literal">None</span>), | |
| <span class="hljs-string">'token_type_ids'</span>: <span class="hljs-type">Sequence</span>(feature=Value(dtype=<span class="hljs-string">'int8'</span>, <span class="hljs-built_in">id</span>=<span class="hljs-literal">None</span>), length=-<span class="hljs-number">1</span>, <span class="hljs-built_in">id</span>=<span class="hljs-literal">None</span>), | |
| <span class="hljs-string">'attention_mask'</span>: <span class="hljs-type">Sequence</span>(feature=Value(dtype=<span class="hljs-string">'int8'</span>, <span class="hljs-built_in">id</span>=<span class="hljs-literal">None</span>), length=-<span class="hljs-number">1</span>, <span class="hljs-built_in">id</span>=<span class="hljs-literal">None</span>), | |
| <span class="hljs-string">'bbox'</span>: <span class="hljs-type">Sequence</span>(feature=<span class="hljs-type">Sequence</span>(feature=Value(dtype=<span class="hljs-string">'int64'</span>, <span class="hljs-built_in">id</span>=<span class="hljs-literal">None</span>), length=-<span class="hljs-number">1</span>, <span class="hljs-built_in">id</span>=<span class="hljs-literal">None</span>), length=-<span class="hljs-number">1</span>, <span class="hljs-built_in">id</span>=<span class="hljs-literal">None</span>), | |
| <span class="hljs-string">'start_positions'</span>: Value(dtype=<span class="hljs-string">'int64'</span>, <span class="hljs-built_in">id</span>=<span class="hljs-literal">None</span>), | |
| <span class="hljs-string">'end_positions'</span>: Value(dtype=<span class="hljs-string">'int64'</span>, <span class="hljs-built_in">id</span>=<span class="hljs-literal">None</span>)}<!-- HTML_TAG_END --></pre></div> <h2 class="relative group"><a id="evaluation" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#evaluation"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Evaluation</span></h2> <p data-svelte-h="svelte-1ro13gx">Evaluation for document question answering requires a significant amount of postprocessing. To avoid taking up too much | |
| of your time, this guide skips the evaluation step. The <a href="/docs/transformers/main/en/main_classes/trainer#transformers.Trainer">Trainer</a> still calculates the evaluation loss during training so | |
| you’re not completely in the dark about your model’s performance. Extractive question answering is typically evaluated with the F1 and exact match metrics. | |
| If you’d like to implement it yourself, check out the <a href="https://huggingface.co/course/chapter7/7?fw=pt#postprocessing" rel="nofollow">Question Answering chapter</a> | |
| of the Hugging Face course for inspiration.</p> <h2 class="relative group"><a id="train" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#train"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Train</span></h2> <p data-svelte-h="svelte-10f6ay">Congratulations! You’ve successfully navigated the toughest part of this guide and now you are ready to train your own model. | |
| Training involves the following steps:</p> <ul data-svelte-h="svelte-pmt07p"><li>Load the model with <a href="/docs/transformers/main/en/model_doc/auto#transformers.AutoModelForDocumentQuestionAnswering">AutoModelForDocumentQuestionAnswering</a> using the same checkpoint as in the preprocessing.</li> <li>Define your training hyperparameters in <a href="/docs/transformers/main/en/main_classes/trainer#transformers.TrainingArguments">TrainingArguments</a>.</li> <li>Define a function to batch examples together; here the <a href="/docs/transformers/main/en/main_classes/data_collator#transformers.DefaultDataCollator">DefaultDataCollator</a> will do just fine.</li> <li>Pass the training arguments to <a href="/docs/transformers/main/en/main_classes/trainer#transformers.Trainer">Trainer</a> along with the model, dataset, and data collator.</li> <li>Call <a href="/docs/transformers/main/en/main_classes/trainer#transformers.Trainer.train">train()</a> to finetune your model.</li></ul> <div class="code-block relative"><pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span><span class="hljs-keyword">from</span> transformers <span class="hljs-keyword">import</span> AutoModelForDocumentQuestionAnswering | |
| <span class="hljs-meta">>>> </span>model = AutoModelForDocumentQuestionAnswering.from_pretrained(model_checkpoint)<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-b9l6b1">In the <a href="/docs/transformers/main/en/main_classes/trainer#transformers.TrainingArguments">TrainingArguments</a> use <code>output_dir</code> to specify where to save your model, and configure hyperparameters as you see fit. | |
| If you wish to share your model with the community, set <code>push_to_hub</code> to <code>True</code> (you must be signed in to Hugging Face to upload your model). | |
| In this case the <code>output_dir</code> will also be the name of the repo where your model checkpoint will be pushed.</p> <div class="code-block relative"><pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span><span class="hljs-keyword">from</span> transformers <span class="hljs-keyword">import</span> TrainingArguments | |
| <span class="hljs-meta">>>> </span><span class="hljs-comment"># REPLACE THIS WITH YOUR REPO ID</span> | |
| <span class="hljs-meta">>>> </span>repo_id = <span class="hljs-string">"MariaK/layoutlmv2-base-uncased_finetuned_docvqa"</span> | |
| <span class="hljs-meta">>>> </span>training_args = TrainingArguments( | |
| <span class="hljs-meta">... </span>    output_dir=repo_id, | |
| <span class="hljs-meta">... </span>    per_device_train_batch_size=<span class="hljs-number">4</span>, | |
| <span class="hljs-meta">... </span>    num_train_epochs=<span class="hljs-number">20</span>, | |
| <span class="hljs-meta">... </span>    save_steps=<span class="hljs-number">200</span>, | |
| <span class="hljs-meta">... </span>    logging_steps=<span class="hljs-number">50</span>, | |
| <span class="hljs-meta">... </span>    eval_strategy=<span class="hljs-string">"steps"</span>, | |
| <span class="hljs-meta">... </span>    learning_rate=<span class="hljs-number">5e-5</span>, | |
| <span class="hljs-meta">... </span>    save_total_limit=<span class="hljs-number">2</span>, | |
| <span class="hljs-meta">... </span>    remove_unused_columns=<span class="hljs-literal">False</span>, | |
| <span class="hljs-meta">... </span>    push_to_hub=<span class="hljs-literal">True</span>, | |
| <span class="hljs-meta">... </span>)<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-1gq6t8w">Define a simple data collator to batch examples together.</p> <div class="code-block relative"><pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span><span class="hljs-keyword">from</span> transformers <span class="hljs-keyword">import</span> DefaultDataCollator | |
| <span class="hljs-meta">>>> </span>data_collator = DefaultDataCollator()<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-15i5vaz">Finally, bring everything together, and call <a href="/docs/transformers/main/en/main_classes/trainer#transformers.Trainer.train">train()</a>:</p> <div class="code-block relative"><pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span><span class="hljs-keyword">from</span> transformers <span class="hljs-keyword">import</span> Trainer | |
| <span class="hljs-meta">>>> </span>trainer = Trainer( | |
| <span class="hljs-meta">... </span>    model=model, | |
| <span class="hljs-meta">... </span>    args=training_args, | |
| <span class="hljs-meta">... </span>    data_collator=data_collator, | |
| <span class="hljs-meta">... </span>    train_dataset=encoded_train_dataset, | |
| <span class="hljs-meta">... </span>    eval_dataset=encoded_test_dataset, | |
| <span class="hljs-meta">... </span>    tokenizer=processor, | |
| <span class="hljs-meta">... </span>) | |
| <span class="hljs-meta">>>> </span>trainer.train()<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-gilssp">To add the final model to 🤗 Hub, create a model card and call <code>push_to_hub</code>:</p> <div class="code-block relative"><pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span>trainer.create_model_card() | |
| <span class="hljs-meta">>>> </span>trainer.push_to_hub()<!-- HTML_TAG_END --></pre></div> <h2 class="relative group"><a id="inference" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#inference"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Inference</span></h2> <p data-svelte-h="svelte-1pszhza">Now that you have finetuned a LayoutLMv2 model, and uploaded it to the 🤗 Hub, you can use it for inference. The simplest | |
| way to try out your finetuned model for inference is to use it in a <a href="/docs/transformers/main/en/main_classes/pipelines#transformers.Pipeline">Pipeline</a>.</p> <p data-svelte-h="svelte-1wtngfz">Let’s take an example:</p> <div class="code-block relative"><pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span>example = dataset[<span class="hljs-string">"test"</span>][<span class="hljs-number">2</span>] | |
| <span class="hljs-meta">>>> </span>question = example[<span class="hljs-string">"query"</span>][<span class="hljs-string">"en"</span>] | |
| <span class="hljs-meta">>>> </span>image = example[<span class="hljs-string">"image"</span>] | |
| <span class="hljs-meta">>>> </span><span class="hljs-built_in">print</span>(question) | |
| <span class="hljs-meta">>>> </span><span class="hljs-built_in">print</span>(example[<span class="hljs-string">"answers"</span>]) | |
| <span class="hljs-string">'Who is ‘presiding’ TRRF GENERAL SESSION (PART 1)?'</span> | |
| [<span class="hljs-string">'TRRF Vice President'</span>, <span class="hljs-string">'lee a. waller'</span>]<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-mcchgg">Next, instantiate a pipeline for | |
| document question answering with your model, and pass the image + question combination to it.</p> <div class="code-block relative"><pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span><span class="hljs-keyword">from</span> transformers <span class="hljs-keyword">import</span> pipeline | |
| <span class="hljs-meta">>>> </span>qa_pipeline = pipeline(<span class="hljs-string">"document-question-answering"</span>, model=<span class="hljs-string">"MariaK/layoutlmv2-base-uncased_finetuned_docvqa"</span>) | |
| <span class="hljs-meta">>>> </span>qa_pipeline(image, question) | |
| [{<span class="hljs-string">'score'</span>: <span class="hljs-number">0.9949808120727539</span>, | |
| <span class="hljs-string">'answer'</span>: <span class="hljs-string">'Lee A. Waller'</span>, | |
| <span class="hljs-string">'start'</span>: <span class="hljs-number">55</span>, | |
| <span class="hljs-string">'end'</span>: <span class="hljs-number">57</span>}]<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-o6117l">You can also manually replicate the results of the pipeline if you’d like:</p> <ol data-svelte-h="svelte-19rdijs"><li>Take an image and a question, prepare them for the model using the processor from your model.</li> <li>Forward the result of preprocessing through the model.</li> <li>The model returns <code>start_logits</code> and <code>end_logits</code>, which indicate which token is at the start of the answer and | |
| which token is at the end of the answer. Both have shape (batch_size, sequence_length).</li> <li>Take an argmax on the last dimension of both the <code>start_logits</code> and <code>end_logits</code> to get the predicted <code>start_idx</code> and <code>end_idx</code>.</li> <li>Decode the answer with the tokenizer.</li></ol> <div class="code-block relative"><pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span><span class="hljs-keyword">import</span> torch | |
| <span class="hljs-meta">>>> </span><span class="hljs-keyword">from</span> transformers <span class="hljs-keyword">import</span> AutoProcessor | |
| <span class="hljs-meta">>>> </span><span class="hljs-keyword">from</span> transformers <span class="hljs-keyword">import</span> AutoModelForDocumentQuestionAnswering | |
| <span class="hljs-meta">>>> </span>processor = AutoProcessor.from_pretrained(<span class="hljs-string">"MariaK/layoutlmv2-base-uncased_finetuned_docvqa"</span>) | |
| <span class="hljs-meta">>>> </span>model = AutoModelForDocumentQuestionAnswering.from_pretrained(<span class="hljs-string">"MariaK/layoutlmv2-base-uncased_finetuned_docvqa"</span>) | |
| <span class="hljs-meta">>>> </span><span class="hljs-keyword">with</span> torch.no_grad(): | |
| <span class="hljs-meta">... </span> encoding = processor(image.convert(<span class="hljs-string">"RGB"</span>), question, return_tensors=<span class="hljs-string">"pt"</span>) | |
| <span class="hljs-meta">... </span> outputs = model(**encoding) | |
| <span class="hljs-meta">... </span> start_logits = outputs.start_logits | |
| <span class="hljs-meta">... </span> end_logits = outputs.end_logits | |
| <span class="hljs-meta">... </span> predicted_start_idx = start_logits.argmax(-<span class="hljs-number">1</span>).item() | |
| <span class="hljs-meta">... </span> predicted_end_idx = end_logits.argmax(-<span class="hljs-number">1</span>).item() | |
| <span class="hljs-meta">>>> </span>processor.tokenizer.decode(encoding.input_ids.squeeze()[predicted_start_idx : predicted_end_idx + <span class="hljs-number">1</span>]) | |
| <span class="hljs-string">'lee a. waller'</span><!-- HTML_TAG_END --></pre></div> <p></p> | |
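<p>Taking the argmax of the start and end logits independently can occasionally produce an end index that comes before the start index. A common post-processing step is to search for the highest-scoring pair with <code>start &lt;= end</code>. The following is a minimal sketch of that idea using plain Python lists and made-up logits. The helper name <code>best_span</code> and the <code>max_answer_len</code> parameter are hypothetical and are not part of the Transformers API, and this is not necessarily how the pipeline selects its answer.</p>

```python
def best_span(start_logits, end_logits, max_answer_len=15):
    """Return (start_idx, end_idx) maximizing start + end score with start <= end.

    Hypothetical helper for illustration; not part of the Transformers API.
    """
    best = (0, 0)
    best_score = float("-inf")
    for s, s_score in enumerate(start_logits):
        # Only consider end positions at or after the start, within the length cap.
        for e in range(s, min(s + max_answer_len, len(end_logits))):
            score = s_score + end_logits[e]
            if score > best_score:
                best_score = score
                best = (s, e)
    return best


# Toy logits for a 6-token sequence (made-up numbers, not model output).
start_logits = [0.1, 0.2, 3.0, 0.5, 0.1, 0.0]
end_logits = [0.0, 2.5, 0.3, 0.2, 2.0, 0.1]

# Independent argmax: the end index lands before the start index here.
naive = (start_logits.index(max(start_logits)), end_logits.index(max(end_logits)))

# Constrained search: the end index is never before the start index.
span = best_span(start_logits, end_logits)
print(naive, span)
```

<p>With the tensors from the example above, you could pass <code>start_logits[0].tolist()</code> and <code>end_logits[0].tolist()</code> to a helper like this, then decode <code>input_ids[start : end + 1]</code> as before.</p>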
| <script> | |
| { | |
| __sveltekit_1xexzbk = { | |
| assets: "/docs/transformers/main/en", | |
| base: "/docs/transformers/main/en", | |
| env: {} | |
| }; | |
| const element = document.currentScript.parentElement; | |
| const data = [null,null]; | |
| Promise.all([ | |
| import("/docs/transformers/main/en/_app/immutable/entry/start.2135b7e6.js"), | |
| import("/docs/transformers/main/en/_app/immutable/entry/app.24372c84.js") | |
| ]).then(([kit, app]) => { | |
| kit.start(app, element, { | |
| node_ids: [0, 401], | |
| data, | |
| form: null, | |
| error: null | |
| }); | |
| }); | |
| } | |
| </script> | |