| <meta charset="utf-8" /><meta name="hf:doc:metadata" content="{"title":"အားလုံးကို ပေါင်းစပ်ခြင်း","local":"putting-it-all-together","sections":[{"title":"Special Tokens များ","local":"special-tokens","sections":[],"depth":2},{"title":"အနှစ်ချုပ်: Tokenizer မှ Model ဆီသို့","local":"wrapping-up-from-tokenizer-to-model","sections":[],"depth":2},{"title":"ဝေါဟာရ ရှင်းလင်းချက် (Glossary)","local":"ဝဟရ-ရငလငခက-glossary","sections":[],"depth":2}],"depth":1}"> | |
| <p></p> <h1 class="relative group"><a id="putting-it-all-together" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#putting-it-all-together"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Putting it all together</span></h1> <p data-svelte-h="svelte-1iey9c7">In the last few sections, we have been trying our best to do most of the work by hand. We explored how tokenizers work and looked at tokenization, conversion to input IDs, padding, truncation, and attention masks.</p> <p data-svelte-h="svelte-aajhfx">However, as we saw in section 2, the 🤗 Transformers API can handle all of this for us with a high-level function that we will dive into here. When you call your <code>tokenizer</code> directly on the sentence, you get back inputs that are ready to pass through your model:</p> <div class="code-block relative "><pre class=""><!-- HTML_TAG_START --><span class="hljs-keyword">from</span> transformers <span class="hljs-keyword">import</span> AutoTokenizer | |
| checkpoint = <span class="hljs-string">"distilbert-base-uncased-finetuned-sst-2-english"</span> | |
| tokenizer = AutoTokenizer.from_pretrained(checkpoint) | |
| sequence = <span class="hljs-string">"I've been waiting for a HuggingFace course my whole life."</span> | |
| model_inputs = tokenizer(sequence)<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-1fo6gda">Here, the <code>model_inputs</code> variable contains everything a model needs to operate well. For DistilBERT, that includes the input IDs as well as the attention mask. For other models that accept additional inputs, the <code>tokenizer</code> object will output those as well.</p> <p data-svelte-h="svelte-12bzjhn">As we will see in the examples below, this method is very powerful. First, it can tokenize a single sequence:</p> <div class="code-block relative "><pre class=""><!-- HTML_TAG_START -->sequence = <span class="hljs-string">"I've been waiting for a HuggingFace course my whole life."</span> | |
| model_inputs = tokenizer(sequence)<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-nfnt39">It also handles multiple sequences at a time, with no change in the API:</p> <div class="code-block relative "><pre class=""><!-- HTML_TAG_START -->sequences = [<span class="hljs-string">"I've been waiting for a HuggingFace course my whole life."</span>, <span class="hljs-string">"So have I!"</span>] | |
| model_inputs = tokenizer(sequences)<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-4gs3gv">It can pad according to several objectives:</p> <div class="code-block relative "><pre class=""><!-- HTML_TAG_START --><span class="hljs-comment"># Will pad the sequences up to the length of the longest sequence</span> | |
| model_inputs = tokenizer(sequences, padding=<span class="hljs-string">"longest"</span>) | |
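# Added illustration (not part of the original lesson): a pure-Python sketch of what
# padding="longest" does conceptually. It is not the library's implementation.
# pad_to_longest and pad_id=0 are hypothetical names and values chosen for this example;
# a real tokenizer uses its own pad token ID.
def pad_to_longest(batch, pad_id=0):
    longest = max(len(ids) for ids in batch)
    padded, masks = [], []
    for ids in batch:
        extra = longest - len(ids)
        padded.append(ids + [pad_id] * extra)  # fill the short rows with the pad ID
        masks.append([1] * len(ids) + [0] * extra)  # 1 = real token, 0 = padding
    return {"input_ids": padded, "attention_mask": masks}

padding_demo = pad_to_longest([[101, 2061, 2031, 102], [101, 102]])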
| <span class="hljs-comment"># Will pad the sequences up to the model max length (512 for BERT or DistilBERT)</span> | |
| model_inputs = tokenizer(sequences, padding=<span class="hljs-string">"max_length"</span>) | |
| <span class="hljs-comment"># Will pad the sequences up to the specified max length</span> | |
| model_inputs = tokenizer(sequences, padding=<span class="hljs-string">"max_length"</span>, max_length=<span class="hljs-number">8</span>)<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-1c12gps">It can also truncate sequences:</p> <div class="code-block relative "><pre class=""><!-- HTML_TAG_START -->sequences = [<span class="hljs-string">"I've been waiting for a HuggingFace course my whole life."</span>, <span class="hljs-string">"So have I!"</span>] | |
| <span class="hljs-comment"># Will truncate the sequences that are longer than the model max length (512 for BERT or DistilBERT)</span> | |
| model_inputs = tokenizer(sequences, truncation=<span class="hljs-literal">True</span>) | |
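# Added illustration (not part of the original lesson): a pure-Python sketch of truncation.
# It assumes the special tokens count toward the length limit and that max_length is at
# least 2. truncate_with_special_tokens is a hypothetical helper, not a library function;
# 101 and 102 are the [CLS] and [SEP] IDs that appear in the Special tokens section.
def truncate_with_special_tokens(content_ids, max_length, cls_id=101, sep_id=102):
    kept = content_ids[: max_length - 2]  # leave room for [CLS] and [SEP]
    return [cls_id] + kept + [sep_id]

truncation_demo = truncate_with_special_tokens(list(range(1000, 1014)), max_length=8)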
| <span class="hljs-comment"># Will truncate the sequences that are longer than the specified max length</span> | |
| model_inputs = tokenizer(sequences, max_length=<span class="hljs-number">8</span>, truncation=<span class="hljs-literal">True</span>)<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-bb0wxs">The <code>tokenizer</code> object can handle the conversion to specific framework tensors, which can then be sent directly to the model. For example, in the following code sample we ask the tokenizer to return tensors from different frameworks: <code>"pt"</code> returns PyTorch tensors and <code>"np"</code> returns NumPy arrays.</p> <div class="code-block relative "><pre class=""><!-- HTML_TAG_START -->sequences = [<span class="hljs-string">"I've been waiting for a HuggingFace course my whole life."</span>, <span class="hljs-string">"So have I!"</span>] | |
| <span class="hljs-comment"># Returns PyTorch tensors</span> | |
| model_inputs = tokenizer(sequences, padding=<span class="hljs-literal">True</span>, return_tensors=<span class="hljs-string">"pt"</span>) | |
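# Added illustration (not part of the original lesson): tensors must be rectangular,
# which is why padding=True is passed together with return_tensors in these calls.
# is_rectangular is a hypothetical helper and the ID lists are made up for the example.
def is_rectangular(batch):
    return len({len(row) for row in batch}) == 1

ragged_demo = [[101, 2061, 2031, 102], [101, 102]]
padded_rows_demo = [row + [0] * (4 - len(row)) for row in ragged_demo]
ragged_is_rectangular = is_rectangular(ragged_demo)  # rows of length 4 and 2
padded_is_rectangular = is_rectangular(padded_rows_demo)  # both rows have length 4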
| <span class="hljs-comment"># Returns NumPy arrays</span> | |
| model_inputs = tokenizer(sequences, padding=<span class="hljs-literal">True</span>, return_tensors=<span class="hljs-string">"np"</span>)<!-- HTML_TAG_END --></pre></div> <h2 class="relative group"><a id="special-tokens" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#special-tokens"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Special tokens</span></h2> <p data-svelte-h="svelte-1q4gzee">If we take a look at the input IDs returned by the tokenizer, we will see they are slightly different from what we had earlier:</p> <div class="code-block relative "><pre class=""><!-- HTML_TAG_START -->sequence = <span class="hljs-string">"I've been waiting for a HuggingFace course my whole life."</span> | |
| model_inputs = tokenizer(sequence) | |
| <span class="hljs-built_in">print</span>(model_inputs[<span class="hljs-string">"input_ids"</span>]) | |
| tokens = tokenizer.tokenize(sequence) | |
| ids = tokenizer.convert_tokens_to_ids(tokens) | |
| <span class="hljs-built_in">print</span>(ids)<!-- HTML_TAG_END --></pre></div> <div class="code-block relative "><pre class=""><!-- HTML_TAG_START -->[<span class="hljs-number">101</span>, <span class="hljs-number">1045</span>, <span class="hljs-number">1005</span>, <span class="hljs-number">2310</span>, <span class="hljs-number">2042</span>, <span class="hljs-number">3403</span>, <span class="hljs-number">2005</span>, <span class="hljs-number">1037</span>, <span class="hljs-number">17662</span>, <span class="hljs-number">12172</span>, <span class="hljs-number">2607</span>, <span class="hljs-number">2026</span>, <span class="hljs-number">2878</span>, <span class="hljs-number">2166</span>, <span class="hljs-number">1012</span>, <span class="hljs-number">102</span>] | |
| [<span class="hljs-number">1045</span>, <span class="hljs-number">1005</span>, <span class="hljs-number">2310</span>, <span class="hljs-number">2042</span>, <span class="hljs-number">3403</span>, <span class="hljs-number">2005</span>, <span class="hljs-number">1037</span>, <span class="hljs-number">17662</span>, <span class="hljs-number">12172</span>, <span class="hljs-number">2607</span>, <span class="hljs-number">2026</span>, <span class="hljs-number">2878</span>, <span class="hljs-number">2166</span>, <span class="hljs-number">1012</span>]<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-1u24mhs">One token ID was added at the beginning, and one at the end. Let's decode the two sequences of IDs above to see what this is about:</p> <div class="code-block relative "><pre class=""><!-- HTML_TAG_START --><span class="hljs-built_in">print</span>(tokenizer.decode(model_inputs[<span class="hljs-string">"input_ids"</span>])) | |
| <span class="hljs-built_in">print</span>(tokenizer.decode(ids))<!-- HTML_TAG_END --></pre></div> <div class="code-block relative "><pre class=""><!-- HTML_TAG_START --><span class="hljs-string">"[CLS] i've been waiting for a huggingface course my whole life. [SEP]"</span> | |
| <span class="hljs-string">"i've been waiting for a huggingface course my whole life."</span><!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-1m9onyd">The tokenizer added the special word <code>[CLS]</code> at the beginning and the special word <code>[SEP]</code> at the end. This is because the model was pretrained with those tokens, so to get the same results at inference time we need to add them as well. Note that some models do not add special words, or add different ones; models may also add these special words only at the beginning, or only at the end. In any case, the tokenizer knows which special tokens are expected and will handle them for you.</p> <h2 class="relative group"><a id="wrapping-up-from-tokenizer-to-model" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#wrapping-up-from-tokenizer-to-model"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Wrapping up: From tokenizer to model</span></h2> <p data-svelte-h="svelte-1y1s1tg">Now that we have seen all the individual steps the <code>tokenizer</code> object uses when applied to texts, let's see one final time how it can handle multiple sequences (padding!), very long sequences (truncation!), and multiple types of tensors with its main API:</p> <div class="code-block relative "><pre class=""><!-- HTML_TAG_START --><span class="hljs-keyword">import</span> torch | |
| <span class="hljs-keyword">from</span> transformers <span class="hljs-keyword">import</span> AutoTokenizer, AutoModelForSequenceClassification | |
| checkpoint = <span class="hljs-string">"distilbert-base-uncased-finetuned-sst-2-english"</span> | |
| tokenizer = AutoTokenizer.from_pretrained(checkpoint) | |
| model = AutoModelForSequenceClassification.from_pretrained(checkpoint) | |
| sequences = [<span class="hljs-string">"I've been waiting for a HuggingFace course my whole life."</span>, <span class="hljs-string">"So have I!"</span>] | |
| tokens = tokenizer(sequences, padding=<span class="hljs-literal">True</span>, truncation=<span class="hljs-literal">True</span>, return_tensors=<span class="hljs-string">"pt"</span>) | |
| output = model(**tokens)<!-- HTML_TAG_END --></pre></div> <h2 class="relative group"><a id="ဝဟရ-ရငလငခက-glossary" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#ဝဟရ-ရငလငခက-glossary"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Glossary</span></h2> <ul data-svelte-h="svelte-1luf0mo"><li><strong>Tokenizer</strong>: A tool or process that splits text (or other data) into tokens so that AI models can process it.</li> <li><strong>Tokenization</strong>: The process of splitting text into tokens.</li> <li><strong>Input IDs</strong>: The unique numerical IDs that the tokenizer assigns to each token.</li> <li><strong>Padding</strong>: Filling input sequences of different lengths with a designated value so that they all have the same length.</li> <li><strong>Truncation</strong>: Cutting off input sequences that exceed the length limit.</li> <li><strong>Attention Mask</strong>: A binary mask that tells the model which tokens to attend to and which (padding) tokens to ignore.</li> <li><strong>🤗 Transformers API</strong>: The functions, classes, and methods that programmers call to use the Hugging Face Transformers library.</li> <li><strong><code>AutoTokenizer</code> Class</strong>: A class in the Hugging Face Transformers library that automatically loads the matching tokenizer from a model name.</li> <li><strong><code>from_pretrained()</code> Method</strong>: The method used to load a pretrained model or tokenizer.</li> <li><strong><code>distilbert-base-uncased-finetuned-sst-2-english</code></strong>: The name of the DistilBERT model used as the default checkpoint of the <code>sentiment-analysis</code> pipeline. <code>base</code> indicates the model size, and <code>uncased</code> indicates that it was trained without distinguishing upper and lower case. <code>finetuned-sst-2-english</code> means that it was fine-tuned on the SST-2 dataset for English.</li> <li><strong><code>model_inputs</code> Variable</strong>: The variable that holds all the model inputs produced by the tokenizer.</li> <li><strong>PyTorch Tensors</strong>: The multi-dimensional arrays that represent data in the PyTorch deep learning framework.</li> <li><strong>NumPy Arrays</strong>: The multi-dimensional arrays provided by NumPy, a Python library for numerical computing.</li> <li><strong><code>padding="longest"</code></strong>: Pads up to the longest sequence in the batch.</li> <li><strong><code>padding="max_length"</code></strong>: Pads up to the length given by <code>max_length</code>, or up to the model's maximum length if none is given.</li> <li><strong><code>max_length</code></strong>: The length limit used for padding or truncation.</li> <li><strong><code>truncation=True</code></strong>: Truncates sequences to the specified length, or to the model's maximum length if none is given.</li> <li><strong><code>return_tensors="pt"</code></strong>: Instructs the tokenizer to return PyTorch tensors.</li> <li><strong><code>return_tensors="np"</code></strong>: Instructs the tokenizer to return NumPy arrays.</li> <li><strong>Special Tokens</strong>: Special tokens that Transformer models use to mark sentence boundaries or carry other information (for example <code>[CLS]</code>, <code>[SEP]</code>, <code>[PAD]</code>).</li> <li><strong><code>[CLS]</code></strong>: The special token used in BERT models for classification tasks (it appears at the start of the sequence).</li> <li><strong><code>[SEP]</code></strong>: The special token used in BERT models to separate sentences.</li> <li><strong><code>tokenizer.decode()</code> Method</strong>: The method that converts token IDs back into text.</li> <li><strong>Inference</strong>: The process of using a trained Artificial Intelligence (AI) model to produce predictions or outputs from input data.</li> <li><strong><code>AutoModelForSequenceClassification</code> Class</strong>: A class in the Hugging Face Transformers library that automatically loads a pretrained model for sequence classification.</li> <li><strong><code>model(**tokens)</code></strong>: Unpacks the dictionary produced by the tokenizer and passes its entries to the model as keyword arguments.</li></ul> <p></p> | |
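<p>To make the glossary terms concrete, here is a minimal sketch of what special tokens, truncation, padding, and the attention mask do to a batch of token IDs. It is a toy illustration in plain Python, not the 🤗 Transformers implementation: the function name <code>encode_batch</code> is hypothetical, and the IDs <code>0</code>, <code>101</code>, and <code>102</code> are assumed here to stand in for <code>[PAD]</code>, <code>[CLS]</code>, and <code>[SEP]</code>.</p>

```python
from typing import Dict, List

# Assumed toy IDs standing in for [PAD], [CLS] and [SEP];
# a real tokenizer reads these from the checkpoint's vocabulary.
PAD_ID, CLS_ID, SEP_ID = 0, 101, 102


def encode_batch(batch: List[List[int]], max_length: int) -> Dict[str, List[List[int]]]:
    """Hypothetical helper: truncate, add special tokens, pad to the longest sequence."""
    encoded = []
    for ids in batch:
        # Truncation: keep at most max_length - 2 tokens so that
        # [CLS] and [SEP] still fit inside the limit.
        ids = ids[: max_length - 2]
        # Special tokens: [CLS] at the start, [SEP] at the end.
        encoded.append([CLS_ID] + ids + [SEP_ID])

    # Padding to the longest sequence in the batch (like padding="longest").
    longest = max(len(ids) for ids in encoded)
    input_ids, attention_mask = [], []
    for ids in encoded:
        pad = longest - len(ids)
        input_ids.append(ids + [PAD_ID] * pad)
        # 1 marks a real token, 0 marks a padding token to ignore.
        attention_mask.append([1] * len(ids) + [0] * pad)

    return {"input_ids": input_ids, "attention_mask": attention_mask}


batch = encode_batch([[7, 8, 9, 10, 11], [7, 8]], max_length=6)
print(batch["input_ids"])
print(batch["attention_mask"])
```

<p>The first sequence loses its last token to truncation, and the second receives two padding IDs whose attention-mask entries are <code>0</code>. The returned dictionary has the same shape of keys that the real tokenizer passes to the model through <code>model(**tokens)</code>.</p>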