Buckets:
| <meta charset="utf-8" /><meta name="hf:doc:metadata" content="{"title":"Document Question Answering","local":"document-question-answering","sections":[{"title":"Load the data","local":"load-the-data","sections":[],"depth":2},{"title":"Preprocess the data","local":"preprocess-the-data","sections":[{"title":"Preprocessing document images","local":"preprocessing-document-images","sections":[],"depth":3},{"title":"Preprocessing text data","local":"preprocessing-text-data","sections":[],"depth":3}],"depth":2},{"title":"Evaluation","local":"evaluation","sections":[],"depth":2},{"title":"Train","local":"train","sections":[],"depth":2},{"title":"Inference","local":"inference","sections":[],"depth":2}],"depth":1}"> | |
| <link href="/docs/transformers/main/ja/_app/immutable/assets/0.e3b0c442.css" rel="modulepreload"> | |
| <link rel="modulepreload" href="/docs/transformers/main/ja/_app/immutable/entry/start.1486e459.js"> | |
| <link rel="modulepreload" href="/docs/transformers/main/ja/_app/immutable/chunks/scheduler.9bc65507.js"> | |
| <link rel="modulepreload" href="/docs/transformers/main/ja/_app/immutable/chunks/singletons.eee55cbf.js"> | |
| <link rel="modulepreload" href="/docs/transformers/main/ja/_app/immutable/chunks/index.3b203c72.js"> | |
| <link rel="modulepreload" href="/docs/transformers/main/ja/_app/immutable/chunks/paths.59da1547.js"> | |
| <link rel="modulepreload" href="/docs/transformers/main/ja/_app/immutable/entry/app.d9ae818f.js"> | |
| <link rel="modulepreload" href="/docs/transformers/main/ja/_app/immutable/chunks/index.707bf1b6.js"> | |
| <link rel="modulepreload" href="/docs/transformers/main/ja/_app/immutable/nodes/0.c06aa070.js"> | |
| <link rel="modulepreload" href="/docs/transformers/main/ja/_app/immutable/chunks/each.e59479a4.js"> | |
| <link rel="modulepreload" href="/docs/transformers/main/ja/_app/immutable/nodes/137.d8e38c96.js"> | |
| <link rel="modulepreload" href="/docs/transformers/main/ja/_app/immutable/chunks/Tip.c2ecdbf4.js"> | |
| <link rel="modulepreload" href="/docs/transformers/main/ja/_app/immutable/chunks/CodeBlock.54a9f38d.js"> | |
| <link rel="modulepreload" href="/docs/transformers/main/ja/_app/immutable/chunks/DocNotebookDropdown.41f65cb5.js"> | |
| <link rel="modulepreload" href="/docs/transformers/main/ja/_app/immutable/chunks/globals.7f7f1b26.js"> | |
| <link rel="modulepreload" href="/docs/transformers/main/ja/_app/immutable/chunks/EditOnGithub.922df6ba.js"><!-- HEAD_svelte-u9bgzb_START --><meta name="hf:doc:metadata" content="{"title":"Document Question Answering","local":"document-question-answering","sections":[{"title":"Load the data","local":"load-the-data","sections":[],"depth":2},{"title":"Preprocess the data","local":"preprocess-the-data","sections":[{"title":"Preprocessing document images","local":"preprocessing-document-images","sections":[],"depth":3},{"title":"Preprocessing text data","local":"preprocessing-text-data","sections":[],"depth":3}],"depth":2},{"title":"Evaluation","local":"evaluation","sections":[],"depth":2},{"title":"Train","local":"train","sections":[],"depth":2},{"title":"Inference","local":"inference","sections":[],"depth":2}],"depth":1}"><!-- HEAD_svelte-u9bgzb_END --> <p></p> <h1 class="relative group"><a id="document-question-answering" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#document-question-answering"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Document Question Answering</span></h1> <div class="flex space-x-1 absolute z-10 right-0 top-0"> <div class="relative colab-dropdown "> <button class=" " type="button"> <img alt="Open In Colab" class="!m-0" src="https://colab.research.google.com/assets/colab-badge.svg"> </button> </div> <div class="relative colab-dropdown "> <button class=" " type="button"> <img alt="Open In Studio Lab" class="!m-0" src="https://studiolab.sagemaker.aws/studiolab.svg"> </button> </div></div> <p data-svelte-h="svelte-9sm3vo">文書による質問応答は、文書による視覚的な質問応答とも呼ばれ、以下を提供するタスクです。 | |
| ドキュメント画像に関する質問への回答。このタスクをサポートするモデルへの入力は通常、画像と画像の組み合わせです。 | |
| 質問があり、出力は自然言語で表現された回答です。これらのモデルは、以下を含む複数のモダリティを利用します。 | |
| テキスト、単語の位置 (境界ボックス)、および画像自体。</p> <p data-svelte-h="svelte-w5jzhi">このガイドでは、次の方法を説明します。</p> <ul data-svelte-h="svelte-15ewf3"><li><a href="https://huggingface.co/datasets/nielsr/docvqa_1200_examples_donut" rel="nofollow">DocVQA データセット</a> の <a href="../model_doc/layoutlmv2">LayoutLMv2</a> を微調整します。</li> <li>微調整されたモデルを推論に使用します。</li></ul> <div class="course-tip bg-gradient-to-br dark:bg-gradient-to-r before:border-green-500 dark:before:border-green-800 from-green-50 dark:from-gray-900 to-white dark:to-gray-950 border border-green-50 text-green-700 dark:text-gray-400"><p data-svelte-h="svelte-1awcuot">このタスクと互換性のあるすべてのアーキテクチャとチェックポイントを確認するには、<a href="https://huggingface.co/tasks/image-to-text" rel="nofollow">タスクページ</a> を確認することをお勧めします。</p></div> <p data-svelte-h="svelte-1sjno3o">LayoutLMv2 は、最後の非表示のヘッダーの上に質問応答ヘッドを追加することで、ドキュメントの質問応答タスクを解決します。 | |
| トークンの状態を調べて、トークンの開始トークンと終了トークンの位置を予測します。 | |
| 答え。言い換えれば、問題は抽出的質問応答として扱われます。つまり、コンテキストを考慮して、どの部分を抽出するかということです。 | |
| の情報が質問に答えます。コンテキストは OCR エンジンの出力から取得されます。ここでは Google の Tesseract です。</p> <p data-svelte-h="svelte-ff3iti">始める前に、必要なライブラリがすべてインストールされていることを確認してください。 LayoutLMv2 は detectron2、torchvision、tesseract に依存します。</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START -->pip install -q transformers datasets<!-- HTML_TAG_END --></pre></div> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START -->pip install <span class="hljs-string">'git+https://github.com/facebookresearch/detectron2.git'</span> | |
| pip install torchvision<!-- HTML_TAG_END --></pre></div> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START -->sudo apt install tesseract-ocr | |
| pip install -q pytesseract<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-1cl2ydk">すべての依存関係をインストールしたら、ランタイムを再起動します。</p> <p data-svelte-h="svelte-1hem303">モデルをコミュニティと共有することをお勧めします。 Hugging Face アカウントにログインして、🤗 ハブにアップロードします。 | |
| プロンプトが表示されたら、トークンを入力してログインします。</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span><span class="hljs-keyword">from</span> huggingface_hub <span class="hljs-keyword">import</span> notebook_login | |
| <span class="hljs-meta">>>> </span>notebook_login()<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-wa8qzt">いくつかのグローバル変数を定義しましょう。</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span>model_checkpoint = <span class="hljs-string">"microsoft/layoutlmv2-base-uncased"</span> | |
| <span class="hljs-meta">>>> </span>batch_size = <span class="hljs-number">4</span><!-- HTML_TAG_END --></pre></div> <h2 class="relative group"><a id="load-the-data" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#load-the-data"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Load the data</span></h2> <p data-svelte-h="svelte-lcmakh">このガイドでは、🤗 Hub にある前処理された DocVQA の小さなサンプルを使用します。フルに使いたい場合は、 | |
| DocVQA データセットは、<a href="https://rrc.cvc.uab.es/?ch=17" rel="nofollow">DocVQA ホームページ</a> で登録してダウンロードできます。そうすれば、 | |
| このガイドを進めて、<a href="https://huggingface.co/docs/datasets/loading#local-and-remote-files" rel="nofollow">🤗 データセットにファイルをロードする方法</a> を確認してください。</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span><span class="hljs-keyword">from</span> datasets <span class="hljs-keyword">import</span> load_dataset | |
| <span class="hljs-meta">>>> </span>dataset = load_dataset(<span class="hljs-string">"nielsr/docvqa_1200_examples"</span>) | |
| <span class="hljs-meta">>>> </span>dataset | |
| DatasetDict({ | |
| train: Dataset({ | |
| features: [<span class="hljs-string">'id'</span>, <span class="hljs-string">'image'</span>, <span class="hljs-string">'query'</span>, <span class="hljs-string">'answers'</span>, <span class="hljs-string">'words'</span>, <span class="hljs-string">'bounding_boxes'</span>, <span class="hljs-string">'answer'</span>], | |
| num_rows: <span class="hljs-number">1000</span> | |
| }) | |
| test: Dataset({ | |
| features: [<span class="hljs-string">'id'</span>, <span class="hljs-string">'image'</span>, <span class="hljs-string">'query'</span>, <span class="hljs-string">'answers'</span>, <span class="hljs-string">'words'</span>, <span class="hljs-string">'bounding_boxes'</span>, <span class="hljs-string">'answer'</span>], | |
| num_rows: <span class="hljs-number">200</span> | |
| }) | |
| })<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-zebss4">ご覧のとおり、データセットはすでにトレーニング セットとテスト セットに分割されています。理解するためにランダムな例を見てみましょう | |
| 機能を備えた自分自身。</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span>dataset[<span class="hljs-string">"train"</span>].features<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-s4dlp7">個々のフィールドが表す内容は次のとおりです。</p> <ul data-svelte-h="svelte-juyyji"><li><code>id</code>: サンプルのID</li> <li><code>image</code>: ドキュメント画像を含む PIL.Image.Image オブジェクト</li> <li><code>query</code>: 質問文字列 - いくつかの言語での自然言語による質問</li> <li><code>answers</code>: ヒューマン アノテーターによって提供された正解のリスト</li> <li><code>words</code> と <code>bounding_boxes</code>: OCR の結果。ここでは使用しません。</li> <li><code>answer</code>: 別のモデルと一致する答え。ここでは使用しません。</li></ul> <p data-svelte-h="svelte-16q3if0">英語の質問だけを残し、別のモデルによる予測が含まれていると思われる<code>answer</code>機能を削除しましょう。 | |
| また、アノテーターによって提供されたセットから最初の回答を取得します。あるいは、ランダムにサンプリングすることもできます。</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span>updated_dataset = dataset.<span class="hljs-built_in">map</span>(<span class="hljs-keyword">lambda</span> example: {<span class="hljs-string">"question"</span>: example[<span class="hljs-string">"query"</span>][<span class="hljs-string">"en"</span>]}, remove_columns=[<span class="hljs-string">"query"</span>]) | |
| <span class="hljs-meta">>>> </span>updated_dataset = updated_dataset.<span class="hljs-built_in">map</span>( | |
| <span class="hljs-meta">... </span> <span class="hljs-keyword">lambda</span> example: {<span class="hljs-string">"answer"</span>: example[<span class="hljs-string">"answers"</span>][<span class="hljs-number">0</span>]}, remove_columns=[<span class="hljs-string">"answer"</span>, <span class="hljs-string">"answers"</span>] | |
| <span class="hljs-meta">... </span>)<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-1d7hjyv">このガイドで使用する LayoutLMv2 チェックポイントは、<code>max_position_embeddings = 512</code> でトレーニングされていることに注意してください ( | |
| この情報は、<a href="https://huggingface.co/microsoft/layoutlmv2-base-uncased/blob/main/config.json#L18" rel="nofollow">チェックポイントの <code>config.json</code> ファイル</a>) で見つけてください。 | |
| 例を省略することもできますが、答えが大きな文書の最後にあり、結局省略されてしまうという状況を避けるために、 | |
| ここでは、埋め込みが 512 を超える可能性があるいくつかの例を削除します。 | |
| データセット内のほとんどのドキュメントが長い場合は、スライディング ウィンドウ戦略を実装できます。詳細については、<a href="https://github.com/huggingface/notebooks/blob/main/examples/question_answering.ipynb" rel="nofollow">このノートブック</a> を確認してください。 。</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span>updated_dataset = updated_dataset.<span class="hljs-built_in">filter</span>(<span class="hljs-keyword">lambda</span> x: <span class="hljs-built_in">len</span>(x[<span class="hljs-string">"words"</span>]) + <span class="hljs-built_in">len</span>(x[<span class="hljs-string">"question"</span>].split()) < <span class="hljs-number">512</span>)<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-rz8h2y">この時点で、このデータセットから OCR 機能も削除しましょう。これらは、異なるデータを微調整するための OCR の結果です。 | |
| モデル。これらは入力要件と一致しないため、使用したい場合はさらに処理が必要になります。 | |
| このガイドで使用するモデルの。代わりに、OCR と OCR の両方の元のデータに対して <code>LayoutLMv2Processor</code> を使用できます。 | |
| トークン化。このようにして、モデルの予想される入力と一致する入力を取得します。画像を手動で加工したい場合は、 | |
| モデルがどのような入力形式を想定しているかを知るには、<a href="../model_doc/layoutlmv2"><code>LayoutLMv2</code> モデルのドキュメント</a> を確認してください。</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span>updated_dataset = updated_dataset.remove_columns(<span class="hljs-string">"words"</span>) | |
| <span class="hljs-meta">>>> </span>updated_dataset = updated_dataset.remove_columns(<span class="hljs-string">"bounding_boxes"</span>)<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-bwfbhc">最後に、画像サンプルを確認しないとデータ探索は完了しません。</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span>updated_dataset[<span class="hljs-string">"train"</span>][<span class="hljs-number">11</span>][<span class="hljs-string">"image"</span>]<!-- HTML_TAG_END --></pre></div> <div class="flex justify-center" data-svelte-h="svelte-q63tj1"><img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/docvqa_example.jpg" alt="DocVQA Image Example"></div> <h2 class="relative group"><a id="preprocess-the-data" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#preprocess-the-data"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Preprocess the data</span></h2> <p data-svelte-h="svelte-3lfv7v">文書の質問に答えるタスクはマルチモーダル タスクであるため、各モダリティからの入力が確実に行われるようにする必要があります。 | |
| モデルの期待に従って前処理されます。まず、<code>LayoutLMv2Processor</code> をロードします。これは、画像データを処理できる画像プロセッサとテキスト データをエンコードできるトークナイザーを内部で組み合わせています。</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span><span class="hljs-keyword">from</span> transformers <span class="hljs-keyword">import</span> AutoProcessor | |
| <span class="hljs-meta">>>> </span>processor = AutoProcessor.from_pretrained(model_checkpoint)<!-- HTML_TAG_END --></pre></div> <h3 class="relative group"><a id="preprocessing-document-images" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#preprocessing-document-images"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Preprocessing document images</span></h3> <p data-svelte-h="svelte-q3eyh4">まず、プロセッサからの <code>image_processor</code> を利用して、モデルのドキュメント画像を準備しましょう。 | |
| デフォルトでは、画像プロセッサは画像のサイズを 224x224 に変更し、カラー チャネルの順序が正しいことを確認します。 | |
| tesseract を使用して OCR を適用し、単語と正規化された境界ボックスを取得します。このチュートリアルでは、これらのデフォルトはすべて、まさに必要なものです。 | |
| デフォルトの画像処理を画像のバッチに適用し、OCR の結果を返す関数を作成します。</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span>image_processor = processor.image_processor | |
| <span class="hljs-meta">>>> </span><span class="hljs-keyword">def</span> <span class="hljs-title function_">get_ocr_words_and_boxes</span>(<span class="hljs-params">examples</span>): | |
| <span class="hljs-meta">... </span> images = [image.convert(<span class="hljs-string">"RGB"</span>) <span class="hljs-keyword">for</span> image <span class="hljs-keyword">in</span> examples[<span class="hljs-string">"image"</span>]] | |
| <span class="hljs-meta">... </span> encoded_inputs = image_processor(images) | |
| <span class="hljs-meta">... </span> examples[<span class="hljs-string">"image"</span>] = encoded_inputs.pixel_values | |
| <span class="hljs-meta">... </span> examples[<span class="hljs-string">"words"</span>] = encoded_inputs.words | |
| <span class="hljs-meta">... </span> examples[<span class="hljs-string">"boxes"</span>] = encoded_inputs.boxes | |
| <span class="hljs-meta">... </span> <span class="hljs-keyword">return</span> examples<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-1kcttro">この前処理をデータセット全体に高速に適用するには、<code>map</code> を使用します。</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span>dataset_with_ocr = updated_dataset.<span class="hljs-built_in">map</span>(get_ocr_words_and_boxes, batched=<span class="hljs-literal">True</span>, batch_size=<span class="hljs-number">2</span>)<!-- HTML_TAG_END --></pre></div> <h3 class="relative group"><a id="preprocessing-text-data" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#preprocessing-text-data"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Preprocessing text data</span></h3> <p data-svelte-h="svelte-1pxju63">画像に OCR を適用したら、データセットのテキスト部分をエンコードしてモデル用に準備する必要があります。 | |
| これには、前のステップで取得した単語とボックスをトークンレベルの <code>input_ids</code>、<code>attention_mask</code>、 | |
| <code>token_type_ids</code>と<code>bbox</code>。テキストを前処理するには、プロセッサからの<code>Tokenizer</code>が必要になります。</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span>tokenizer = processor.tokenizer<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-1peqsp4">前述の前処理に加えて、モデルのラベルを追加する必要もあります。 <code>xxxForQuestionAnswering</code> モデルの場合 | |
| 🤗 Transformers では、ラベルは <code>start_positions</code> と <code>end_positions</code> で構成され、どのトークンがその位置にあるかを示します。 | |
| 開始点と、どのトークンが回答の最後にあるか。</p> <p data-svelte-h="svelte-11bgtd7">それから始めましょう。より大きなリスト (単語リスト) 内のサブリスト (単語に分割された回答) を検索できるヘルパー関数を定義します。</p> <p data-svelte-h="svelte-15kxwes">この関数は、<code>words_list</code> と <code>answer_list</code> という 2 つのリストを入力として受け取ります。次に、<code>words_list</code>を反復処理してチェックします。 | |
| <code>words_list</code> (words_list[i]) 内の現在の単語が、answer_list (answer_list[0]) の最初の単語と等しいかどうか、および | |
| 現在の単語から始まり、<code>answer_list</code> と同じ長さの <code>words_list</code> のサブリストは、<code>to answer_list</code> と等しくなります。 | |
| この条件が true の場合、一致が見つかったことを意味し、関数は一致とその開始インデックス (idx) を記録します。 | |
| とその終了インデックス (idx + len(answer_list) - 1)。複数の一致が見つかった場合、関数は最初のもののみを返します。 | |
| 一致するものが見つからない場合、関数は (<code>None</code>、0、および 0) を返します。</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span><span class="hljs-keyword">def</span> <span class="hljs-title function_">subfinder</span>(<span class="hljs-params">words_list, answer_list</span>): | |
| <span class="hljs-meta">... </span> matches = [] | |
| <span class="hljs-meta">... </span> start_indices = [] | |
| <span class="hljs-meta">... </span> end_indices = [] | |
| <span class="hljs-meta">... </span> <span class="hljs-keyword">for</span> idx, i <span class="hljs-keyword">in</span> <span class="hljs-built_in">enumerate</span>(<span class="hljs-built_in">range</span>(<span class="hljs-built_in">len</span>(words_list))): | |
| <span class="hljs-meta">... </span> <span class="hljs-keyword">if</span> words_list[i] == answer_list[<span class="hljs-number">0</span>] <span class="hljs-keyword">and</span> words_list[i : i + <span class="hljs-built_in">len</span>(answer_list)] == answer_list: | |
| <span class="hljs-meta">... </span> matches.append(answer_list) | |
| <span class="hljs-meta">... </span> start_indices.append(idx) | |
| <span class="hljs-meta">... </span> end_indices.append(idx + <span class="hljs-built_in">len</span>(answer_list) - <span class="hljs-number">1</span>) | |
| <span class="hljs-meta">... </span> <span class="hljs-keyword">if</span> matches: | |
| <span class="hljs-meta">... </span> <span class="hljs-keyword">return</span> matches[<span class="hljs-number">0</span>], start_indices[<span class="hljs-number">0</span>], end_indices[<span class="hljs-number">0</span>] | |
| <span class="hljs-meta">... </span> <span class="hljs-keyword">else</span>: | |
| <span class="hljs-meta">... </span> <span class="hljs-keyword">return</span> <span class="hljs-literal">None</span>, <span class="hljs-number">0</span>, <span class="hljs-number">0</span><!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-1vrcdhg">この関数が答えの位置を見つける方法を説明するために、例で使用してみましょう。</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span>example = dataset_with_ocr[<span class="hljs-string">"train"</span>][<span class="hljs-number">1</span>] | |
| <span class="hljs-meta">>>> </span>words = [word.lower() <span class="hljs-keyword">for</span> word <span class="hljs-keyword">in</span> example[<span class="hljs-string">"words"</span>]] | |
| <span class="hljs-meta">>>> </span><span class="hljs-keyword">match</span>, word_idx_start, word_idx_end = subfinder(words, example[<span class="hljs-string">"answer"</span>].lower().split()) | |
| <span class="hljs-meta">>>> </span><span class="hljs-built_in">print</span>(<span class="hljs-string">"Question: "</span>, example[<span class="hljs-string">"question"</span>]) | |
| <span class="hljs-meta">>>> </span><span class="hljs-built_in">print</span>(<span class="hljs-string">"Words:"</span>, words) | |
| <span class="hljs-meta">>>> </span><span class="hljs-built_in">print</span>(<span class="hljs-string">"Answer: "</span>, example[<span class="hljs-string">"answer"</span>]) | |
| <span class="hljs-meta">>>> </span><span class="hljs-built_in">print</span>(<span class="hljs-string">"start_index"</span>, word_idx_start) | |
| <span class="hljs-meta">>>> </span><span class="hljs-built_in">print</span>(<span class="hljs-string">"end_index"</span>, word_idx_end) | |
| Question: Who <span class="hljs-keyword">is</span> <span class="hljs-keyword">in</span> cc <span class="hljs-keyword">in</span> this letter? | |
| Words: [<span class="hljs-string">'wie'</span>, <span class="hljs-string">'baw'</span>, <span class="hljs-string">'brown'</span>, <span class="hljs-string">'&'</span>, <span class="hljs-string">'williamson'</span>, <span class="hljs-string">'tobacco'</span>, <span class="hljs-string">'corporation'</span>, <span class="hljs-string">'research'</span>, <span class="hljs-string">'&'</span>, <span class="hljs-string">'development'</span>, <span class="hljs-string">'internal'</span>, <span class="hljs-string">'correspondence'</span>, <span class="hljs-string">'to:'</span>, <span class="hljs-string">'r.'</span>, <span class="hljs-string">'h.'</span>, <span class="hljs-string">'honeycutt'</span>, <span class="hljs-string">'ce:'</span>, <span class="hljs-string">'t.f.'</span>, <span class="hljs-string">'riehl'</span>, <span class="hljs-string">'from:'</span>, <span class="hljs-string">'.'</span>, <span class="hljs-string">'c.j.'</span>, <span class="hljs-string">'cook'</span>, <span class="hljs-string">'date:'</span>, <span class="hljs-string">'may'</span>, <span class="hljs-string">'8,'</span>, <span class="hljs-string">'1995'</span>, <span class="hljs-string">'subject:'</span>, <span class="hljs-string">'review'</span>, <span class="hljs-string">'of'</span>, <span class="hljs-string">'existing'</span>, <span class="hljs-string">'brainstorming'</span>, <span class="hljs-string">'ideas/483'</span>, <span class="hljs-string">'the'</span>, <span class="hljs-string">'major'</span>, <span class="hljs-string">'function'</span>, <span class="hljs-string">'of'</span>, <span class="hljs-string">'the'</span>, <span class="hljs-string">'product'</span>, <span class="hljs-string">'innovation'</span>, <span class="hljs-string">'graup'</span>, <span class="hljs-string">'is'</span>, <span class="hljs-string">'to'</span>, <span class="hljs-string">'develop'</span>, <span class="hljs-string">'marketable'</span>, <span class="hljs-string">'nove!'</span>, <span class="hljs-string">'products'</span>, <span class="hljs-string">'that'</span>, <span class="hljs-string">'would'</span>, <span class="hljs-string">'be'</span>, <span class="hljs-string">'profitable'</span>, <span class="hljs-string">'to'</span>, <span class="hljs-string">'manufacture'</span>, <span class="hljs-string">'and'</span>, <span class="hljs-string">'sell.'</span>, <span class="hljs-string">'novel'</span>, <span class="hljs-string">'is'</span>, <span class="hljs-string">'defined'</span>, <span class="hljs-string">'as:'</span>, <span class="hljs-string">'of'</span>, <span class="hljs-string">'a'</span>, <span class="hljs-string">'new'</span>, <span class="hljs-string">'kind,'</span>, <span class="hljs-string">'or'</span>, <span class="hljs-string">'different'</span>, <span class="hljs-string">'from'</span>, <span class="hljs-string">'anything'</span>, <span class="hljs-string">'seen'</span>, <span class="hljs-string">'or'</span>, <span class="hljs-string">'known'</span>, <span class="hljs-string">'before.'</span>, <span class="hljs-string">'innovation'</span>, <span class="hljs-string">'is'</span>, <span class="hljs-string">'defined'</span>, <span class="hljs-string">'as:'</span>, <span class="hljs-string">'something'</span>, <span class="hljs-string">'new'</span>, <span class="hljs-string">'or'</span>, <span class="hljs-string">'different'</span>, <span class="hljs-string">'introduced;'</span>, <span class="hljs-string">'act'</span>, <span class="hljs-string">'of'</span>, <span class="hljs-string">'innovating;'</span>, <span class="hljs-string">'introduction'</span>, <span class="hljs-string">'of'</span>, <span class="hljs-string">'new'</span>, <span class="hljs-string">'things'</span>, <span class="hljs-string">'or'</span>, <span class="hljs-string">'methods.'</span>, <span class="hljs-string">'the'</span>, <span class="hljs-string">'products'</span>, <span class="hljs-string">'may'</span>, <span class="hljs-string">'incorporate'</span>, <span class="hljs-string">'the'</span>, <span class="hljs-string">'latest'</span>, <span class="hljs-string">'technologies,'</span>, <span class="hljs-string">'materials'</span>, <span class="hljs-string">'and'</span>, <span class="hljs-string">'know-how'</span>, <span class="hljs-string">'available'</span>, <span class="hljs-string">'to'</span>, <span class="hljs-string">'give'</span>, <span class="hljs-string">'then'</span>, <span class="hljs-string">'a'</span>, <span class="hljs-string">'unique'</span>, <span class="hljs-string">'taste'</span>, <span class="hljs-string">'or'</span>, <span class="hljs-string">'look.'</span>, <span class="hljs-string">'the'</span>, <span class="hljs-string">'first'</span>, <span class="hljs-string">'task'</span>, <span class="hljs-string">'of'</span>, <span class="hljs-string">'the'</span>, <span class="hljs-string">'product'</span>, <span class="hljs-string">'innovation'</span>, <span class="hljs-string">'group'</span>, <span class="hljs-string">'was'</span>, <span class="hljs-string">'to'</span>, <span class="hljs-string">'assemble,'</span>, <span class="hljs-string">'review'</span>, <span class="hljs-string">'and'</span>, <span class="hljs-string">'categorize'</span>, <span class="hljs-string">'a'</span>, <span class="hljs-string">'list'</span>, <span class="hljs-string">'of'</span>, <span class="hljs-string">'existing'</span>, <span class="hljs-string">'brainstorming'</span>, <span class="hljs-string">'ideas.'</span>, <span class="hljs-string">'ideas'</span>, <span class="hljs-string">'were'</span>, <span class="hljs-string">'grouped'</span>, <span class="hljs-string">'into'</span>, <span class="hljs-string">'two'</span>, <span class="hljs-string">'major'</span>, <span class="hljs-string">'categories'</span>, <span class="hljs-string">'labeled'</span>, <span class="hljs-string">'appearance'</span>, <span class="hljs-string">'and'</span>, <span class="hljs-string">'taste/aroma.'</span>, <span class="hljs-string">'these'</span>, <span class="hljs-string">'categories'</span>, <span class="hljs-string">'are'</span>, <span class="hljs-string">'used'</span>, <span class="hljs-string">'for'</span>, <span class="hljs-string">'novel'</span>, <span class="hljs-string">'products'</span>, <span class="hljs-string">'that'</span>, <span class="hljs-string">'may'</span>, <span class="hljs-string">'differ'</span>, <span class="hljs-string">'from'</span>, <span class="hljs-string">'a'</span>, <span class="hljs-string">'visual'</span>, <span class="hljs-string">'and/or'</span>, <span class="hljs-string">'taste/aroma'</span>, <span class="hljs-string">'point'</span>, <span class="hljs-string">'of'</span>, <span class="hljs-string">'view'</span>, <span class="hljs-string">'compared'</span>, <span class="hljs-string">'to'</span>, <span class="hljs-string">'canventional'</span>, <span class="hljs-string">'cigarettes.'</span>, <span class="hljs-string">'other'</span>, <span class="hljs-string">'categories'</span>, <span class="hljs-string">'include'</span>, <span class="hljs-string">'a'</span>, <span class="hljs-string">'combination'</span>, <span class="hljs-string">'of'</span>, <span class="hljs-string">'the'</span>, <span class="hljs-string">'above,'</span>, <span class="hljs-string">'filters,'</span>, <span class="hljs-string">'packaging'</span>, <span class="hljs-string">'and'</span>, <span class="hljs-string">'brand'</span>, <span class="hljs-string">'extensions.'</span>, <span class="hljs-string">'appearance'</span>, <span class="hljs-string">'this'</span>, <span class="hljs-string">'category'</span>, <span class="hljs-string">'is'</span>, <span class="hljs-string">'used'</span>, <span class="hljs-string">'for'</span>, <span class="hljs-string">'novel'</span>, <span class="hljs-string">'cigarette'</span>, <span class="hljs-string">'constructions'</span>, <span class="hljs-string">'that'</span>, <span class="hljs-string">'yield'</span>, <span class="hljs-string">'visually'</span>, <span class="hljs-string">'different'</span>, <span class="hljs-string">'products'</span>, <span class="hljs-string">'with'</span>, <span class="hljs-string">'minimal'</span>, <span class="hljs-string">'changes'</span>, <span class="hljs-string">'in'</span>, <span class="hljs-string">'smoke'</span>, <span class="hljs-string">'chemistry'</span>, <span class="hljs-string">'two'</span>, <span class="hljs-string">'cigarettes'</span>, <span class="hljs-string">'in'</span>, <span class="hljs-string">'cne.'</span>, <span class="hljs-string">'emulti-plug'</span>, <span class="hljs-string">'te'</span>, <span class="hljs-string">'build'</span>, <span class="hljs-string">'yaur'</span>, <span class="hljs-string">'awn'</span>, <span class="hljs-string">'cigarette.'</span>, <span class="hljs-string">'eswitchable'</span>, <span class="hljs-string">'menthol'</span>, <span class="hljs-string">'or'</span>, <span class="hljs-string">'non'</span>, <span class="hljs-string">'menthol'</span>, <span class="hljs-string">'cigarette.'</span>, <span class="hljs-string">'*cigarettes'</span>, <span class="hljs-string">'with'</span>, <span class="hljs-string">'interspaced'</span>, <span class="hljs-string">'perforations'</span>, <span class="hljs-string">'to'</span>, <span class="hljs-string">'enable'</span>, <span class="hljs-string">'smoker'</span>, <span class="hljs-string">'to'</span>, <span class="hljs-string">'separate'</span>, <span class="hljs-string">'unburned'</span>, <span class="hljs-string">'section'</span>, <span class="hljs-string">'for'</span>, <span class="hljs-string">'future'</span>, <span class="hljs-string">'smoking.'</span>, <span class="hljs-string">'«short'</span>, <span class="hljs-string">'cigarette,'</span>, <span class="hljs-string">'tobacco'</span>, <span class="hljs-string">'section'</span>, <span class="hljs-string">'30'</span>, <span class="hljs-string">'mm.'</span>, <span class="hljs-string">'«extremely'</span>, <span class="hljs-string">'fast'</span>, <span class="hljs-string">'buming'</span>, <span class="hljs-string">'cigarette.'</span>, <span class="hljs-string">'«novel'</span>, <span class="hljs-string">'cigarette'</span>, <span class="hljs-string">'constructions'</span>, <span class="hljs-string">'that'</span>, <span class="hljs-string">'permit'</span>, <span class="hljs-string">'a'</span>, <span class="hljs-string">'significant'</span>, <span class="hljs-string">'reduction'</span>, <span class="hljs-string">'iretobacco'</span>, <span class="hljs-string">'weight'</span>, <span class="hljs-string">'while'</span>, <span class="hljs-string">'maintaining'</span>, <span class="hljs-string">'smoking'</span>, <span class="hljs-string">'mechanics'</span>, <span class="hljs-string">'and'</span>, <span class="hljs-string">'visual'</span>, <span class="hljs-string">'characteristics.'</span>, <span class="hljs-string">'higher'</span>, <span class="hljs-string">'basis'</span>, <span class="hljs-string">'weight'</span>, <span class="hljs-string">'paper:'</span>, <span class="hljs-string">'potential'</span>, <span class="hljs-string">'reduction'</span>, <span class="hljs-string">'in'</span>, <span class="hljs-string">'tobacco'</span>, <span class="hljs-string">'weight.'</span>, <span class="hljs-string">'«more'</span>, <span class="hljs-string">'rigid'</span>, <span class="hljs-string">'tobacco'</span>, <span class="hljs-string">'column;'</span>, <span class="hljs-string">'stiffing'</span>, <span class="hljs-string">'agent'</span>, <span class="hljs-string">'for'</span>, <span class="hljs-string">'tobacco;'</span>, <span class="hljs-string">'e.g.'</span>, <span class="hljs-string">'starch'</span>, <span class="hljs-string">'*colored'</span>, <span class="hljs-string">'tow'</span>, <span class="hljs-string">'and'</span>, <span class="hljs-string">'cigarette'</span>, <span class="hljs-string">'papers;'</span>, <span class="hljs-string">'seasonal'</span>, <span class="hljs-string">'promotions,'</span>, <span class="hljs-string">'e.g.'</span>, <span class="hljs-string">'pastel'</span>, <span class="hljs-string">'colored'</span>, <span class="hljs-string">'cigarettes'</span>, <span class="hljs-string">'for'</span>, <span class="hljs-string">'easter'</span>, <span class="hljs-string">'or'</span>, <span class="hljs-string">'in'</span>, <span class="hljs-string">'an'</span>, <span class="hljs-string">'ebony'</span>, <span class="hljs-string">'and'</span>, <span class="hljs-string">'ivory'</span>, <span class="hljs-string">'brand'</span>, <span class="hljs-string">'containing'</span>, <span class="hljs-string">'a'</span>, <span class="hljs-string">'mixture'</span>, <span class="hljs-string">'of'</span>, <span class="hljs-string">'all'</span>, <span class="hljs-string">'black'</span>, <span class="hljs-string">'(black'</span>, <span class="hljs-string">'paper'</span>, <span class="hljs-string">'and'</span>, <span class="hljs-string">'tow)'</span>, <span class="hljs-string">'and'</span>, <span class="hljs-string">'ail'</span>, <span class="hljs-string">'white'</span>, <span class="hljs-string">'cigarettes.'</span>, <span class="hljs-string">'499150498'</span>] | |
| Answer: T.F. Riehl | |
| start_index <span class="hljs-number">17</span> | |
| end_index <span class="hljs-number">18</span><!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-4vnazu">ただし、サンプルがエンコードされると、次のようになります。</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span>encoding = tokenizer(example[<span class="hljs-string">"question"</span>], example[<span class="hljs-string">"words"</span>], example[<span class="hljs-string">"boxes"</span>]) | |
| <span class="hljs-meta">>>> </span>tokenizer.decode(encoding[<span class="hljs-string">"input_ids"</span>]) | |
| [CLS] who <span class="hljs-keyword">is</span> <span class="hljs-keyword">in</span> cc <span class="hljs-keyword">in</span> this letter? [SEP] wie baw brown & williamson tobacco corporation research & development ...<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-wbu1a8">エンコードされた入力内で答えの位置を見つける必要があります。</p> <ul data-svelte-h="svelte-2utbgq"><li><code>token_type_ids</code> は、どのトークンが質問の一部であり、どのトークンが文書の単語の一部であるかを示します。</li> <li><code>tokenizer.cls_token_id</code> は、入力の先頭で特別なトークンを見つけるのに役立ちます。</li> <li><code>word_ids</code> は、元の <code>words</code> で見つかった回答を、完全にエンコードされた入力内の同じ回答と照合して判断するのに役立ちます。 | |
| エンコードされた入力内の応答の開始/終了位置。</li></ul> <p data-svelte-h="svelte-1qkn7ne">これを念頭に置いて、データセット内のサンプルのバッチをエンコードする関数を作成しましょう。</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span><span class="hljs-keyword">def</span> <span class="hljs-title function_">encode_dataset</span>(<span class="hljs-params">examples, max_length=<span class="hljs-number">512</span></span>): | |
| <span class="hljs-meta">... </span> questions = examples[<span class="hljs-string">"question"</span>] | |
| <span class="hljs-meta">... </span> words = examples[<span class="hljs-string">"words"</span>] | |
| <span class="hljs-meta">... </span> boxes = examples[<span class="hljs-string">"boxes"</span>] | |
| <span class="hljs-meta">... </span> answers = examples[<span class="hljs-string">"answer"</span>] | |
| <span class="hljs-meta">... </span> <span class="hljs-comment"># encode the batch of examples and initialize the start_positions and end_positions</span> | |
| <span class="hljs-meta">... </span> encoding = tokenizer(questions, words, boxes, max_length=max_length, padding=<span class="hljs-string">"max_length"</span>, truncation=<span class="hljs-literal">True</span>) | |
| <span class="hljs-meta">... </span> start_positions = [] | |
| <span class="hljs-meta">... </span> end_positions = [] | |
| <span class="hljs-meta">... </span> <span class="hljs-comment"># loop through the examples in the batch</span> | |
| <span class="hljs-meta">... </span> <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> <span class="hljs-built_in">range</span>(<span class="hljs-built_in">len</span>(questions)): | |
| <span class="hljs-meta">... </span> cls_index = encoding[<span class="hljs-string">"input_ids"</span>][i].index(tokenizer.cls_token_id) | |
| <span class="hljs-meta">... </span> <span class="hljs-comment"># find the position of the answer in example's words</span> | |
| <span class="hljs-meta">... </span> words_example = [word.lower() <span class="hljs-keyword">for</span> word <span class="hljs-keyword">in</span> words[i]] | |
| <span class="hljs-meta">... </span> answer = answers[i] | |
| <span class="hljs-meta">... </span> <span class="hljs-keyword">match</span>, word_idx_start, word_idx_end = subfinder(words_example, answer.lower().split()) | |
| <span class="hljs-meta">... </span> <span class="hljs-keyword">if</span> <span class="hljs-keyword">match</span>: | |
| <span class="hljs-meta">... </span> <span class="hljs-comment"># if match is found, use `token_type_ids` to find where words start in the encoding</span> | |
| <span class="hljs-meta">... </span> token_type_ids = encoding[<span class="hljs-string">"token_type_ids"</span>][i] | |
| <span class="hljs-meta">... </span> token_start_index = <span class="hljs-number">0</span> | |
| <span class="hljs-meta">... </span> <span class="hljs-keyword">while</span> token_type_ids[token_start_index] != <span class="hljs-number">1</span>: | |
| <span class="hljs-meta">... </span> token_start_index += <span class="hljs-number">1</span> | |
| <span class="hljs-meta">... </span> token_end_index = <span class="hljs-built_in">len</span>(encoding[<span class="hljs-string">"input_ids"</span>][i]) - <span class="hljs-number">1</span> | |
| <span class="hljs-meta">... </span> <span class="hljs-keyword">while</span> token_type_ids[token_end_index] != <span class="hljs-number">1</span>: | |
| <span class="hljs-meta">... </span> token_end_index -= <span class="hljs-number">1</span> | |
| <span class="hljs-meta">... </span> word_ids = encoding.word_ids(i)[token_start_index : token_end_index + <span class="hljs-number">1</span>] | |
| <span class="hljs-meta">... </span> start_position = cls_index | |
| <span class="hljs-meta">... </span> end_position = cls_index | |
| <span class="hljs-meta">... </span> <span class="hljs-comment"># loop over word_ids and increase `token_start_index` until it matches the answer position in words</span> | |
| <span class="hljs-meta">... </span> <span class="hljs-comment"># once it matches, save the `token_start_index` as the `start_position` of the answer in the encoding</span> | |
| <span class="hljs-meta">... </span> <span class="hljs-keyword">for</span> <span class="hljs-built_in">id</span> <span class="hljs-keyword">in</span> word_ids: | |
| <span class="hljs-meta">... </span> <span class="hljs-keyword">if</span> <span class="hljs-built_in">id</span> == word_idx_start: | |
| <span class="hljs-meta">... </span> start_position = token_start_index | |
| <span class="hljs-meta">... </span> <span class="hljs-keyword">else</span>: | |
| <span class="hljs-meta">... </span> token_start_index += <span class="hljs-number">1</span> | |
| <span class="hljs-meta">... </span> <span class="hljs-comment"># similarly loop over `word_ids` starting from the end to find the `end_position` of the answer</span> | |
| <span class="hljs-meta">... </span> <span class="hljs-keyword">for</span> <span class="hljs-built_in">id</span> <span class="hljs-keyword">in</span> word_ids[::-<span class="hljs-number">1</span>]: | |
| <span class="hljs-meta">... </span> <span class="hljs-keyword">if</span> <span class="hljs-built_in">id</span> == word_idx_end: | |
| <span class="hljs-meta">... </span> end_position = token_end_index | |
| <span class="hljs-meta">... </span> <span class="hljs-keyword">else</span>: | |
| <span class="hljs-meta">... </span> token_end_index -= <span class="hljs-number">1</span> | |
| <span class="hljs-meta">... </span> start_positions.append(start_position) | |
| <span class="hljs-meta">... </span> end_positions.append(end_position) | |
| <span class="hljs-meta">... </span> <span class="hljs-keyword">else</span>: | |
| <span class="hljs-meta">... </span> start_positions.append(cls_index) | |
| <span class="hljs-meta">... </span> end_positions.append(cls_index) | |
| <span class="hljs-meta">... </span> encoding[<span class="hljs-string">"image"</span>] = examples[<span class="hljs-string">"image"</span>] | |
| <span class="hljs-meta">... </span> encoding[<span class="hljs-string">"start_positions"</span>] = start_positions | |
| <span class="hljs-meta">... </span> encoding[<span class="hljs-string">"end_positions"</span>] = end_positions | |
| <span class="hljs-meta">... </span> <span class="hljs-keyword">return</span> encoding<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-1bet0pt">この前処理関数が完成したので、データセット全体をエンコードできます。</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span>encoded_train_dataset = dataset_with_ocr[<span class="hljs-string">"train"</span>].<span class="hljs-built_in">map</span>( | |
| <span class="hljs-meta">... </span> encode_dataset, batched=<span class="hljs-literal">True</span>, batch_size=<span class="hljs-number">2</span>, remove_columns=dataset_with_ocr[<span class="hljs-string">"train"</span>].column_names | |
| <span class="hljs-meta">... </span>) | |
| <span class="hljs-meta">>>> </span>encoded_test_dataset = dataset_with_ocr[<span class="hljs-string">"test"</span>].<span class="hljs-built_in">map</span>( | |
| <span class="hljs-meta">... </span> encode_dataset, batched=<span class="hljs-literal">True</span>, batch_size=<span class="hljs-number">2</span>, remove_columns=dataset_with_ocr[<span class="hljs-string">"test"</span>].column_names | |
| <span class="hljs-meta">... </span>)<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-15kv72x">エンコードされたデータセットの特徴がどのようなものかを確認してみましょう。</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span>encoded_train_dataset.features | |
| {<span class="hljs-string">'image'</span>: <span class="hljs-type">Sequence</span>(feature=<span class="hljs-type">Sequence</span>(feature=<span class="hljs-type">Sequence</span>(feature=Value(dtype=<span class="hljs-string">'uint8'</span>, <span class="hljs-built_in">id</span>=<span class="hljs-literal">None</span>), length=-<span class="hljs-number">1</span>, <span class="hljs-built_in">id</span>=<span class="hljs-literal">None</span>), length=-<span class="hljs-number">1</span>, <span class="hljs-built_in">id</span>=<span class="hljs-literal">None</span>), length=-<span class="hljs-number">1</span>, <span class="hljs-built_in">id</span>=<span class="hljs-literal">None</span>), | |
| <span class="hljs-string">'input_ids'</span>: <span class="hljs-type">Sequence</span>(feature=Value(dtype=<span class="hljs-string">'int32'</span>, <span class="hljs-built_in">id</span>=<span class="hljs-literal">None</span>), length=-<span class="hljs-number">1</span>, <span class="hljs-built_in">id</span>=<span class="hljs-literal">None</span>), | |
| <span class="hljs-string">'token_type_ids'</span>: <span class="hljs-type">Sequence</span>(feature=Value(dtype=<span class="hljs-string">'int8'</span>, <span class="hljs-built_in">id</span>=<span class="hljs-literal">None</span>), length=-<span class="hljs-number">1</span>, <span class="hljs-built_in">id</span>=<span class="hljs-literal">None</span>), | |
| <span class="hljs-string">'attention_mask'</span>: <span class="hljs-type">Sequence</span>(feature=Value(dtype=<span class="hljs-string">'int8'</span>, <span class="hljs-built_in">id</span>=<span class="hljs-literal">None</span>), length=-<span class="hljs-number">1</span>, <span class="hljs-built_in">id</span>=<span class="hljs-literal">None</span>), | |
| <span class="hljs-string">'bbox'</span>: <span class="hljs-type">Sequence</span>(feature=<span class="hljs-type">Sequence</span>(feature=Value(dtype=<span class="hljs-string">'int64'</span>, <span class="hljs-built_in">id</span>=<span class="hljs-literal">None</span>), length=-<span class="hljs-number">1</span>, <span class="hljs-built_in">id</span>=<span class="hljs-literal">None</span>), length=-<span class="hljs-number">1</span>, <span class="hljs-built_in">id</span>=<span class="hljs-literal">None</span>), | |
| <span class="hljs-string">'start_positions'</span>: Value(dtype=<span class="hljs-string">'int64'</span>, <span class="hljs-built_in">id</span>=<span class="hljs-literal">None</span>), | |
| <span class="hljs-string">'end_positions'</span>: Value(dtype=<span class="hljs-string">'int64'</span>, <span class="hljs-built_in">id</span>=<span class="hljs-literal">None</span>)}<!-- HTML_TAG_END --></pre></div> <h2 class="relative group"><a id="evaluation" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#evaluation"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Evaluation</span></h2> <p data-svelte-h="svelte-9loxdb">文書の質問回答の評価には、大量の後処理が必要です。過剰摂取を避けるために | |
| 現時点では、このガイドでは評価ステップを省略しています。 <a href="/docs/transformers/main/ja/main_classes/trainer#transformers.Trainer">Trainer</a> はトレーニング中に評価損失を計算するため、 | |
| モデルのパフォーマンスについてまったくわからないわけではありません。抽出的質問応答は通常、F1/完全一致を使用して評価されます。 | |
| 自分で実装したい場合は、<a href="https://huggingface.co/course/chapter7/7?fw=pt#postprocessing" rel="nofollow">質問応答の章</a> を確認してください。 | |
| インスピレーションを得るためにハグフェイスコースの。</p> <h2 class="relative group"><a id="train" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#train"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Train</span></h2> <p data-svelte-h="svelte-1a0t5k3">おめでとう!このガイドの最も難しい部分を無事にナビゲートできたので、独自のモデルをトレーニングする準備が整いました。 | |
| トレーニングには次の手順が含まれます。</p> <ul data-svelte-h="svelte-1jcbhx0"><li>前処理と同じチェックポイントを使用して、<a href="/docs/transformers/main/ja/model_doc/auto#transformers.AutoModelForDocumentQuestionAnswering">AutoModelForDocumentQuestionAnswering</a> でモデルを読み込みます。</li> <li><a href="/docs/transformers/main/ja/main_classes/trainer#transformers.TrainingArguments">TrainingArguments</a> でトレーニング ハイパーパラメータを定義します。</li> <li>サンプルをバッチ処理する関数を定義します。ここでは <code>DefaultDataCollator</code> が適切に機能します。</li> <li>モデル、データセット、データ照合器とともにトレーニング引数を <a href="/docs/transformers/main/ja/main_classes/trainer#transformers.Trainer">Trainer</a> に渡します。</li> <li><a href="/docs/transformers/main/ja/main_classes/trainer#transformers.Trainer.train">train()</a> を呼び出してモデルを微調整します。</li></ul> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span><span class="hljs-keyword">from</span> transformers <span class="hljs-keyword">import</span> AutoModelForDocumentQuestionAnswering | |
| <span class="hljs-meta">>>> </span>model = AutoModelForDocumentQuestionAnswering.from_pretrained(model_checkpoint)<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-zfxfli"><a href="/docs/transformers/main/ja/main_classes/trainer#transformers.TrainingArguments">TrainingArguments</a> で、<code>output_dir</code> を使用してモデルの保存場所を指定し、必要に応じてハイパーパラメーターを構成します。 | |
| モデルをコミュニティと共有したい場合は、<code>push_to_hub</code>を<code>True</code>に設定します (モデルをアップロードするには、Hugging Face にサインインする必要があります)。 | |
| この場合、<code>output_dir</code>はモデルのチェックポイントがプッシュされるリポジトリの名前にもなります。</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span><span class="hljs-keyword">from</span> transformers <span class="hljs-keyword">import</span> TrainingArguments | |
| <span class="hljs-meta">>>> </span><span class="hljs-comment"># REPLACE THIS WITH YOUR REPO ID</span> | |
| <span class="hljs-meta">>>> </span>repo_id = <span class="hljs-string">"MariaK/layoutlmv2-base-uncased_finetuned_docvqa"</span> | |
| <span class="hljs-meta">>>> </span>training_args = TrainingArguments( | |
| <span class="hljs-meta">... </span> output_dir=repo_id, | |
| <span class="hljs-meta">... </span> per_device_train_batch_size=<span class="hljs-number">4</span>, | |
| <span class="hljs-meta">... </span> num_train_epochs=<span class="hljs-number">20</span>, | |
| <span class="hljs-meta">... </span> save_steps=<span class="hljs-number">200</span>, | |
| <span class="hljs-meta">... </span> logging_steps=<span class="hljs-number">50</span>, | |
| <span class="hljs-meta">... </span> eval_strategy=<span class="hljs-string">"steps"</span>, | |
| <span class="hljs-meta">... </span> learning_rate=<span class="hljs-number">5e-5</span>, | |
| <span class="hljs-meta">... </span> save_total_limit=<span class="hljs-number">2</span>, | |
| <span class="hljs-meta">... </span> remove_unused_columns=<span class="hljs-literal">False</span>, | |
| <span class="hljs-meta">... </span> push_to_hub=<span class="hljs-literal">True</span>, | |
| <span class="hljs-meta">... </span>)<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-1wu7whv">サンプルをまとめてバッチ処理するための単純なデータ照合器を定義します。</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span><span class="hljs-keyword">from</span> transformers <span class="hljs-keyword">import</span> DefaultDataCollator | |
| <span class="hljs-meta">>>> </span>data_collator = DefaultDataCollator()<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-432r7p">最後に、すべてをまとめて、<a href="/docs/transformers/main/ja/main_classes/trainer#transformers.Trainer.train">train()</a> を呼び出します。</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span><span class="hljs-keyword">from</span> transformers <span class="hljs-keyword">import</span> Trainer | |
| <span class="hljs-meta">>>> </span>trainer = Trainer( | |
| <span class="hljs-meta">... </span> model=model, | |
| <span class="hljs-meta">... </span> args=training_args, | |
| <span class="hljs-meta">... </span> data_collator=data_collator, | |
| <span class="hljs-meta">... </span> train_dataset=encoded_train_dataset, | |
| <span class="hljs-meta">... </span> eval_dataset=encoded_test_dataset, | |
| <span class="hljs-meta">... </span> tokenizer=processor, | |
| <span class="hljs-meta">... </span>) | |
| <span class="hljs-meta">>>> </span>trainer.train()<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-j0b3yb">最終モデルを 🤗 Hub に追加するには、モデル カードを作成し、<code>push_to_hub</code> を呼び出します。</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span>trainer.create_model_card() | |
| <span class="hljs-meta">>>> </span>trainer.push_to_hub()<!-- HTML_TAG_END --></pre></div> <h2 class="relative group"><a id="inference" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#inference"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Inference</span></h2> <p data-svelte-h="svelte-1d1gv8h">LayoutLMv2 モデルを微調整し、🤗 ハブにアップロードしたので、それを推論に使用できます。もっとも単純な | |
| 推論用に微調整されたモデルを試す方法は、それを <a href="/docs/transformers/main/ja/main_classes/pipelines#transformers.Pipeline">Pipeline</a> で使用することです。</p> <p data-svelte-h="svelte-yqqhpz">例を挙げてみましょう:</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span>example = dataset[<span class="hljs-string">"test"</span>][<span class="hljs-number">2</span>] | |
| <span class="hljs-meta">>>> </span>question = example[<span class="hljs-string">"query"</span>][<span class="hljs-string">"en"</span>] | |
| <span class="hljs-meta">>>> </span>image = example[<span class="hljs-string">"image"</span>] | |
| <span class="hljs-meta">>>> </span><span class="hljs-built_in">print</span>(question) | |
| <span class="hljs-meta">>>> </span><span class="hljs-built_in">print</span>(example[<span class="hljs-string">"answers"</span>]) | |
| <span class="hljs-string">'Who is ‘presiding’ TRRF GENERAL SESSION (PART 1)?'</span> | |
| [<span class="hljs-string">'TRRF Vice President'</span>, <span class="hljs-string">'lee a. waller'</span>]<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-1vyzmgs">次に、パイプラインをインスタンス化します。 | |
| モデルを使用して質問への回答を文書化し、画像と質問の組み合わせをモデルに渡します。</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span><span class="hljs-keyword">from</span> transformers <span class="hljs-keyword">import</span> pipeline | |
| <span class="hljs-meta">>>> </span>qa_pipeline = pipeline(<span class="hljs-string">"document-question-answering"</span>, model=<span class="hljs-string">"MariaK/layoutlmv2-base-uncased_finetuned_docvqa"</span>) | |
| <span class="hljs-meta">>>> </span>qa_pipeline(image, question) | |
| [{<span class="hljs-string">'score'</span>: <span class="hljs-number">0.9949808120727539</span>, | |
| <span class="hljs-string">'answer'</span>: <span class="hljs-string">'Lee A. Waller'</span>, | |
| <span class="hljs-string">'start'</span>: <span class="hljs-number">55</span>, | |
| <span class="hljs-string">'end'</span>: <span class="hljs-number">57</span>}]<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-19p6rx9">必要に応じて、パイプラインの結果を手動で複製することもできます。</p> <ol data-svelte-h="svelte-157yub6"><li>画像と質問を取得し、モデルのプロセッサを使用してモデル用に準備します。</li> <li>モデルを通じて結果または前処理を転送します。</li> <li>モデルは<code>start_logits</code>と<code>end_logits</code>を返します。これらは、どのトークンが応答の先頭にあるのかを示し、 | |
| どのトークンが回答の最後にありますか。どちらも形状 (batch_size、sequence_length) を持ちます。</li> <li><code>start_logits</code> と <code>end_logits</code> の両方の最後の次元で argmax を取得し、予測される <code>start_idx</code> と <code>end_idx</code> を取得します。</li> <li>トークナイザーを使用して回答をデコードします。</li></ol> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span><span class="hljs-keyword">import</span> torch | |
| <span class="hljs-meta">>>> </span><span class="hljs-keyword">from</span> transformers <span class="hljs-keyword">import</span> AutoProcessor | |
| <span class="hljs-meta">>>> </span><span class="hljs-keyword">from</span> transformers <span class="hljs-keyword">import</span> AutoModelForDocumentQuestionAnswering | |
| <span class="hljs-meta">>>> </span>processor = AutoProcessor.from_pretrained(<span class="hljs-string">"MariaK/layoutlmv2-base-uncased_finetuned_docvqa"</span>) | |
| <span class="hljs-meta">>>> </span>model = AutoModelForDocumentQuestionAnswering.from_pretrained(<span class="hljs-string">"MariaK/layoutlmv2-base-uncased_finetuned_docvqa"</span>) | |
| <span class="hljs-meta">>>> </span><span class="hljs-keyword">with</span> torch.no_grad(): | |
| <span class="hljs-meta">... </span> encoding = processor(image.convert(<span class="hljs-string">"RGB"</span>), question, return_tensors=<span class="hljs-string">"pt"</span>) | |
| <span class="hljs-meta">... </span> outputs = model(**encoding) | |
| <span class="hljs-meta">... </span> start_logits = outputs.start_logits | |
| <span class="hljs-meta">... </span> end_logits = outputs.end_logits | |
| <span class="hljs-meta">... </span> predicted_start_idx = start_logits.argmax(-<span class="hljs-number">1</span>).item() | |
| <span class="hljs-meta">... </span> predicted_end_idx = end_logits.argmax(-<span class="hljs-number">1</span>).item() | |
| <span class="hljs-meta">>>> </span>processor.tokenizer.decode(encoding.input_ids.squeeze()[predicted_start_idx : predicted_end_idx + <span class="hljs-number">1</span>]) | |
| <span class="hljs-string">'lee a. waller'</span><!-- HTML_TAG_END --></pre></div> <a class="!text-gray-400 !no-underline text-sm flex items-center not-prose mt-4" href="https://github.com/huggingface/transformers/blob/main/docs/source/ja/tasks/document_question_answering.md" target="_blank"><span data-svelte-h="svelte-1kd6by1"><</span> <span data-svelte-h="svelte-x0xyl0">></span> <span data-svelte-h="svelte-1dajgef"><span class="underline ml-1.5">Update</span> on GitHub</span></a> <p></p> | |
| <script> | |
| { | |
| __sveltekit_jement = { | |
| assets: "/docs/transformers/main/ja", | |
| base: "/docs/transformers/main/ja", | |
| env: {} | |
| }; | |
| const element = document.currentScript.parentElement; | |
| const data = [null,null]; | |
| Promise.all([ | |
| import("/docs/transformers/main/ja/_app/immutable/entry/start.1486e459.js"), | |
| import("/docs/transformers/main/ja/_app/immutable/entry/app.d9ae818f.js") | |
| ]).then(([kit, app]) => { | |
| kit.start(app, element, { | |
| node_ids: [0, 137], | |
| data, | |
| form: null, | |
| error: null | |
| }); | |
| }); | |
| } | |
| </script> | |
Xet Storage Details
- Size:
- 110 kB
- Xet hash:
- 44bad1773f96434250afc52d4852f59ddf81b58674a9afdd7f02c3d918193137
·
Xet efficiently stores files, intelligently splitting them into unique chunks and accelerating uploads and downloads. More info.