<meta charset="utf-8" /><meta name="hf:doc:metadata" content="{&quot;title&quot;:&quot;객체 탐지&quot;,&quot;local&quot;:&quot;object-detection&quot;,&quot;sections&quot;:[{&quot;title&quot;:&quot;CPPE-5 데이터 세트 가져오기&quot;,&quot;local&quot;:&quot;load-the-CPPE-5-dataset&quot;,&quot;sections&quot;:[],&quot;depth&quot;:2},{&quot;title&quot;:&quot;데이터 전처리하기&quot;,&quot;local&quot;:&quot;preprocess-the-data&quot;,&quot;sections&quot;:[],&quot;depth&quot;:2},{&quot;title&quot;:&quot;DETR 모델 학습시키기&quot;,&quot;local&quot;:&quot;training-the-DETR-model&quot;,&quot;sections&quot;:[],&quot;depth&quot;:2},{&quot;title&quot;:&quot;평가하기&quot;,&quot;local&quot;:&quot;evaluate&quot;,&quot;sections&quot;:[],&quot;depth&quot;:2},{&quot;title&quot;:&quot;추론하기&quot;,&quot;local&quot;:&quot;inference&quot;,&quot;sections&quot;:[],&quot;depth&quot;:2}],&quot;depth&quot;:1}">
<link href="/docs/transformers/main/ko/_app/immutable/assets/0.e3b0c442.css" rel="modulepreload">
<link rel="modulepreload" href="/docs/transformers/main/ko/_app/immutable/entry/start.9aa88961.js">
<link rel="modulepreload" href="/docs/transformers/main/ko/_app/immutable/chunks/scheduler.9bc65507.js">
<link rel="modulepreload" href="/docs/transformers/main/ko/_app/immutable/chunks/singletons.9eec45c3.js">
<link rel="modulepreload" href="/docs/transformers/main/ko/_app/immutable/chunks/index.3b203c72.js">
<link rel="modulepreload" href="/docs/transformers/main/ko/_app/immutable/chunks/paths.566078f7.js">
<link rel="modulepreload" href="/docs/transformers/main/ko/_app/immutable/entry/app.84fb67c3.js">
<link rel="modulepreload" href="/docs/transformers/main/ko/_app/immutable/chunks/index.707bf1b6.js">
<link rel="modulepreload" href="/docs/transformers/main/ko/_app/immutable/nodes/0.1c99376b.js">
<link rel="modulepreload" href="/docs/transformers/main/ko/_app/immutable/chunks/each.e59479a4.js">
<link rel="modulepreload" href="/docs/transformers/main/ko/_app/immutable/nodes/75.bce6ff14.js">
<link rel="modulepreload" href="/docs/transformers/main/ko/_app/immutable/chunks/Tip.c2ecdbf4.js">
<link rel="modulepreload" href="/docs/transformers/main/ko/_app/immutable/chunks/CodeBlock.54a9f38d.js">
<link rel="modulepreload" href="/docs/transformers/main/ko/_app/immutable/chunks/DocNotebookDropdown.41f65cb5.js">
<link rel="modulepreload" href="/docs/transformers/main/ko/_app/immutable/chunks/globals.7f7f1b26.js">
<link rel="modulepreload" href="/docs/transformers/main/ko/_app/immutable/chunks/EditOnGithub.922df6ba.js"><!-- HEAD_svelte-u9bgzb_START --><meta name="hf:doc:metadata" content="{&quot;title&quot;:&quot;객체 탐지&quot;,&quot;local&quot;:&quot;object-detection&quot;,&quot;sections&quot;:[{&quot;title&quot;:&quot;CPPE-5 데이터 세트 가져오기&quot;,&quot;local&quot;:&quot;load-the-CPPE-5-dataset&quot;,&quot;sections&quot;:[],&quot;depth&quot;:2},{&quot;title&quot;:&quot;데이터 전처리하기&quot;,&quot;local&quot;:&quot;preprocess-the-data&quot;,&quot;sections&quot;:[],&quot;depth&quot;:2},{&quot;title&quot;:&quot;DETR 모델 학습시키기&quot;,&quot;local&quot;:&quot;training-the-DETR-model&quot;,&quot;sections&quot;:[],&quot;depth&quot;:2},{&quot;title&quot;:&quot;평가하기&quot;,&quot;local&quot;:&quot;evaluate&quot;,&quot;sections&quot;:[],&quot;depth&quot;:2},{&quot;title&quot;:&quot;추론하기&quot;,&quot;local&quot;:&quot;inference&quot;,&quot;sections&quot;:[],&quot;depth&quot;:2}],&quot;depth&quot;:1}"><!-- HEAD_svelte-u9bgzb_END --> <p></p> <h1 class="relative group"><a id="object-detection" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#object-detection"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" 
fill="currentColor"></path></svg></span></a> <span>Object detection</span></h1> <div class="flex space-x-1 absolute z-10 right-0 top-0"> <div class="relative colab-dropdown "> <button class=" " type="button"> <img alt="Open In Colab" class="!m-0" src="https://colab.research.google.com/assets/colab-badge.svg"> </button> </div> <div class="relative colab-dropdown "> <button class=" " type="button"> <img alt="Open In Studio Lab" class="!m-0" src="https://studiolab.sagemaker.aws/studiolab.svg"> </button> </div></div> <p data-svelte-h="svelte-2l5zzj">Object detection is the computer vision task of detecting instances (such as people, buildings, or cars) in an image. An object detection model receives an image as input and outputs the coordinates of the detected bounding boxes together with their associated labels.
An image can contain multiple objects, each with its own bounding box and label (for example, an image with a car and a building).
Each object can also appear in a different part of the image (for example, an image can contain several cars).
This task is commonly used in autonomous driving to detect things such as pedestrians, road signs, and traffic lights.
Other applications include counting objects in images and image search.</p> <p data-svelte-h="svelte-65ekbi">In this guide, you will learn how to:</p> <ol data-svelte-h="svelte-x4xxmg"><li>Fine-tune <a href="https://huggingface.co/docs/transformers/model_doc/detr" rel="nofollow">DETR</a>, a model that combines a convolutional backbone (a convolutional network that extracts features from the input data) with an encoder-decoder Transformer, on the <a href="https://huggingface.co/datasets/cppe-5" rel="nofollow">CPPE-5</a> dataset.</li> <li>Use your fine-tuned model for inference.</li></ol> <div class="course-tip bg-gradient-to-br dark:bg-gradient-to-r before:border-green-500 dark:before:border-green-800 from-green-50 dark:from-gray-900 to-white dark:to-gray-950 border border-green-50 text-green-700 dark:text-gray-400"><p data-svelte-h="svelte-uwsr4s">To see all architectures and checkpoints compatible with this task, we recommend checking the <a href="https://huggingface.co/tasks/object-detection" rel="nofollow">task page</a>.</p></div> <p data-svelte-h="svelte-18iigii">Before you begin, make sure you have all the necessary libraries installed:</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: 
transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START -->pip install -q datasets transformers evaluate timm albumentations<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-yq3w55">You will use 🤗 Datasets to load a dataset from the Hugging Face Hub, 🤗 Transformers to train your model, and <code>albumentations</code> to augment the data.
<code>timm</code> is currently required to load the convolutional backbone for the DETR model.</p> <p data-svelte-h="svelte-1ylwhj7">We recommend logging in to your Hugging Face account so that you can upload and share your model with the community. When prompted, enter your token to log in:</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-keyword">from</span> huggingface_hub <span class="hljs-keyword">import</span> notebook_login
<span class="hljs-meta">&gt;&gt;&gt; </span>notebook_login()<!-- HTML_TAG_END --></pre></div> <h2 class="relative group"><a id="load-the-CPPE-5-dataset" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#load-the-CPPE-5-dataset"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Load the CPPE-5 dataset</span></h2> <p data-svelte-h="svelte-1q8mu6i">The <a href="https://huggingface.co/datasets/cppe-5" rel="nofollow">CPPE-5</a> dataset contains images with annotations that identify medical personal protective equipment (PPE) in the context of the COVID-19 pandemic.</p> <p data-svelte-h="svelte-1t1h8qm">Load the dataset:</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" 
transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-keyword">from</span> datasets <span class="hljs-keyword">import</span> load_dataset
<span class="hljs-meta">&gt;&gt;&gt; </span>cppe5 = load_dataset(<span class="hljs-string">&quot;cppe-5&quot;</span>)
<span class="hljs-meta">&gt;&gt;&gt; </span>cppe5
DatasetDict({
    train: Dataset({
        features: [<span class="hljs-string">&#x27;image_id&#x27;</span>, <span class="hljs-string">&#x27;image&#x27;</span>, <span class="hljs-string">&#x27;width&#x27;</span>, <span class="hljs-string">&#x27;height&#x27;</span>, <span class="hljs-string">&#x27;objects&#x27;</span>],
        num_rows: <span class="hljs-number">1000</span>
    })
    test: Dataset({
        features: [<span class="hljs-string">&#x27;image_id&#x27;</span>, <span class="hljs-string">&#x27;image&#x27;</span>, <span class="hljs-string">&#x27;width&#x27;</span>, <span class="hljs-string">&#x27;height&#x27;</span>, <span class="hljs-string">&#x27;objects&#x27;</span>],
        num_rows: <span class="hljs-number">29</span>
    })
})<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-ho0tdr">This dataset has a training set with 1,000 images and a test set with 29 images.</p> <p data-svelte-h="svelte-13s1puu">To get familiar with the data, explore what the examples look like.</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">&gt;&gt;&gt; </span>cppe5[<span class="hljs-string">&quot;train&quot;</span>][<span class="hljs-number">0</span>]
{<span class="hljs-string">&#x27;image_id&#x27;</span>: <span class="hljs-number">15</span>,
 <span class="hljs-string">&#x27;image&#x27;</span>: &lt;PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=943x663 at <span class="hljs-number">0x7F9EC9E77C10</span>&gt;,
 <span class="hljs-string">&#x27;width&#x27;</span>: <span class="hljs-number">943</span>,
 <span class="hljs-string">&#x27;height&#x27;</span>: <span class="hljs-number">663</span>,
 <span class="hljs-string">&#x27;objects&#x27;</span>: {<span class="hljs-string">&#x27;id&#x27;</span>: [<span class="hljs-number">114</span>, <span class="hljs-number">115</span>, <span class="hljs-number">116</span>, <span class="hljs-number">117</span>],
  <span class="hljs-string">&#x27;area&#x27;</span>: [<span class="hljs-number">3796</span>, <span class="hljs-number">1596</span>, <span class="hljs-number">152768</span>, <span class="hljs-number">81002</span>],
  <span class="hljs-string">&#x27;bbox&#x27;</span>: [[<span class="hljs-number">302.0</span>, <span class="hljs-number">109.0</span>, <span class="hljs-number">73.0</span>, <span class="hljs-number">52.0</span>],
   [<span class="hljs-number">810.0</span>, <span class="hljs-number">100.0</span>, <span class="hljs-number">57.0</span>, <span class="hljs-number">28.0</span>],
   [<span class="hljs-number">160.0</span>, <span class="hljs-number">31.0</span>, <span class="hljs-number">248.0</span>, <span class="hljs-number">616.0</span>],
   [<span class="hljs-number">741.0</span>, <span class="hljs-number">68.0</span>, <span class="hljs-number">202.0</span>, <span class="hljs-number">401.0</span>]],
  <span class="hljs-string">&#x27;category&#x27;</span>: [<span class="hljs-number">4</span>, <span class="hljs-number">4</span>, <span class="hljs-number">0</span>, <span class="hljs-number">0</span>]}}<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-jqs3v3">The examples in the dataset have the following fields:</p> <ul data-svelte-h="svelte-1plsauu"><li><code>image_id</code>: the example image id</li> <li><code>image</code>: a <code>PIL.Image.Image</code> object containing the image</li> <li><code>width</code>: the width of the image</li> <li><code>height</code>: the height of the image</li> <li><code>objects</code>: a dictionary containing bounding box metadata for the objects in the image:<ul><li><code>id</code>: the annotation id</li> <li><code>area</code>: the area of the bounding box</li> <li><code>bbox</code>: the bounding box of the object (in the <a href="https://albumentations.ai/docs/getting_started/bounding_boxes_augmentation/#coco" rel="nofollow">COCO format</a>)</li> <li><code>category</code>: the category of the object, with possible values including <code>Coverall (0)</code>, <code>Face_Shield (1)</code>, <code>Gloves (2)</code>, <code>Goggles (3)</code>, and <code>Mask (4)</code></li></ul></li></ul> <p data-svelte-h="svelte-1sqjmpr">You may notice that the <code>bbox</code> field follows the COCO format, which is the format the DETR model expects.
However, the grouping of the fields inside <code>objects</code> differs from the annotation format DETR requires, so you need to apply some preprocessing before using this data for training.</p> <p data-svelte-h="svelte-m6nrtv">To understand the data better, visualize an example from the dataset.</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np
<span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-keyword">import</span> os
<span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-keyword">from</span> PIL <span class="hljs-keyword">import</span> Image, ImageDraw
<span class="hljs-meta">&gt;&gt;&gt; </span>image = cppe5[<span class="hljs-string">&quot;train&quot;</span>][<span class="hljs-number">0</span>][<span class="hljs-string">&quot;image&quot;</span>]
<span class="hljs-meta">&gt;&gt;&gt; </span>annotations = cppe5[<span class="hljs-string">&quot;train&quot;</span>][<span class="hljs-number">0</span>][<span class="hljs-string">&quot;objects&quot;</span>]
<span class="hljs-meta">&gt;&gt;&gt; </span>draw = ImageDraw.Draw(image)
<span class="hljs-meta">&gt;&gt;&gt; </span>categories = cppe5[<span class="hljs-string">&quot;train&quot;</span>].features[<span class="hljs-string">&quot;objects&quot;</span>].feature[<span class="hljs-string">&quot;category&quot;</span>].names
<span class="hljs-meta">&gt;&gt;&gt; </span>id2label = {index: x <span class="hljs-keyword">for</span> index, x <span class="hljs-keyword">in</span> <span class="hljs-built_in">enumerate</span>(categories, start=<span class="hljs-number">0</span>)}
<span class="hljs-meta">&gt;&gt;&gt; </span>label2id = {v: k <span class="hljs-keyword">for</span> k, v <span class="hljs-keyword">in</span> id2label.items()}
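# Sketch added for illustration (not part of the original guide): a COCO box is
# [x_min, y_min, width, height], so its corners and area follow directly.
# coco_box_to_corners and coco_box_area are hypothetical helper names.
def coco_box_to_corners(box):
    """Return (x_min, y_min, x_max, y_max) for a COCO-format box."""
    x, y, w, h = box
    return (x, y, x + w, y + h)


def coco_box_area(box):
    """Return the area of a COCO-format box as width * height."""
    _, _, w, h = box
    return w * h


# First annotation of the example shown earlier: bbox [302.0, 109.0, 73.0, 52.0], area 3796.
sample_box = [302.0, 109.0, 73.0, 52.0]
sample_corners = coco_box_to_corners(sample_box)  # (302.0, 109.0, 375.0, 161.0)
sample_area = coco_box_area(sample_box)  # 3796.0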
<span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> <span class="hljs-built_in">range</span>(<span class="hljs-built_in">len</span>(annotations[<span class="hljs-string">&quot;id&quot;</span>])):
<span class="hljs-meta">... </span>    box = annotations[<span class="hljs-string">&quot;bbox&quot;</span>][i]
<span class="hljs-meta">... </span>    class_idx = annotations[<span class="hljs-string">&quot;category&quot;</span>][i]
<span class="hljs-meta">... </span>    x, y, w, h = <span class="hljs-built_in">tuple</span>(box)
<span class="hljs-meta">... </span>    draw.rectangle((x, y, x + w, y + h), outline=<span class="hljs-string">&quot;red&quot;</span>, width=<span class="hljs-number">1</span>)
<span class="hljs-meta">... </span>    draw.text((x, y), id2label[class_idx], fill=<span class="hljs-string">&quot;white&quot;</span>)
<span class="hljs-meta">&gt;&gt;&gt; </span>image<!-- HTML_TAG_END --></pre></div> <div class="flex justify-center" data-svelte-h="svelte-1mkaz8h"><img src="https://i.imgur.com/TdaqPJO.png" alt="CPPE-5 Image Example"></div> <p data-svelte-h="svelte-1ghb0cw">To visualize the bounding boxes with their associated labels, get the labels from the dataset's metadata, specifically the <code>category</code> field.
You also need to create dictionaries that map a label id to a label class (<code>id2label</code>) and the other way around (<code>label2id</code>).
You can use these mappings later when setting up the model. They also make the model reusable by others once you share it on the Hugging Face Hub.</p> <p data-svelte-h="svelte-1su17da">As a final step in getting familiar with the data, explore it for potential issues.
One common problem in object detection datasets is bounding boxes that extend beyond the edge of the image.
Such "runaway" bounding boxes can raise errors during training and should be addressed at this stage.
A few examples in this dataset have this issue. To keep things simple, this guide removes those images from the data.</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">&gt;&gt;&gt; </span>remove_idx = [<span class="hljs-number">590</span>, <span class="hljs-number">821</span>, <span class="hljs-number">822</span>, <span class="hljs-number">875</span>, <span class="hljs-number">876</span>, <span class="hljs-number">878</span>, <span class="hljs-number">879</span>]
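# Sketch added for illustration (not part of the original guide): one way indices
# like the ones in remove_idx could be found. box_is_inside and
# find_runaway_examples are hypothetical helpers, and the check assumes COCO
# boxes given in absolute pixel coordinates. It is not claimed to reproduce
# the exact list above.
def box_is_inside(box, width, height):
    """Return True if a COCO box [x, y, w, h] lies fully inside the image."""
    x, y, w, h = box
    corners = (x, y, x + w, y + h)
    # Clip the corners to the image; a box that stays unchanged is inside.
    clipped = (max(x, 0), max(y, 0), min(x + w, width), min(y + h, height))
    return corners == clipped


def find_runaway_examples(examples):
    """Return indices of examples with at least one box beyond the image edge."""
    return [
        idx
        for idx, example in enumerate(examples)
        if not all(
            box_is_inside(box, example["width"], example["height"])
            for box in example["objects"]["bbox"]
        )
    ]


# Two synthetic examples: the second has a box that extends past the right edge.
toy_examples = [
    {"width": 100, "height": 80, "objects": {"bbox": [[10.0, 10.0, 20.0, 20.0]]}},
    {"width": 100, "height": 80, "objects": {"bbox": [[90.0, 10.0, 20.0, 20.0]]}},
]
runaway_idx = find_runaway_examples(toy_examples)  # [1]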
<span class="hljs-meta">&gt;&gt;&gt; </span>keep = [i <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> <span class="hljs-built_in">range</span>(<span class="hljs-built_in">len</span>(cppe5[<span class="hljs-string">&quot;train&quot;</span>])) <span class="hljs-keyword">if</span> i <span class="hljs-keyword">not</span> <span class="hljs-keyword">in</span> remove_idx]
<span class="hljs-meta">&gt;&gt;&gt; </span>cppe5[<span class="hljs-string">&quot;train&quot;</span>] = cppe5[<span class="hljs-string">&quot;train&quot;</span>].select(keep)<!-- HTML_TAG_END --></pre></div> <h2 class="relative group"><a id="preprocess-the-data" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#preprocess-the-data"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Preprocess the data</span></h2> <p data-svelte-h="svelte-7onj24">To fine-tune a model, you must preprocess the data you plan to use so that it matches precisely the preprocessing used for the pretrained model.
<code>AutoImageProcessor</code> takes care of processing the image data to create the <code>pixel_values</code>, <code>pixel_mask</code>, and <code>labels</code> that a DETR model can train with.
The image processor has some attributes that you do not need to worry about:</p> <ul data-svelte-h="svelte-9xz2l6"><li><code>image_mean = [0.485, 0.456, 0.406]</code></li> <li><code>image_std = [0.229, 0.224, 0.225]</code></li></ul> <p data-svelte-h="svelte-v6xjz5">These are the mean and standard deviation used to normalize images during model pretraining.
These values are crucial to replicate when running inference or fine-tuning a pretrained image model.</p> <p data-svelte-h="svelte-1re8t5o">Instantiate the image processor from the same checkpoint as the pretrained model.</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-keyword">from</span> transformers <span class="hljs-keyword">import</span> AutoImageProcessor
<span class="hljs-meta">&gt;&gt;&gt; </span>checkpoint = <span class="hljs-string">&quot;facebook/detr-resnet-50&quot;</span>
<span class="hljs-meta">&gt;&gt;&gt; </span>image_processor = AutoImageProcessor.from_pretrained(checkpoint)<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-1g6pp4q">Before passing the images to the <code>image_processor</code>, apply two preprocessing transformations to the dataset:</p> <ul data-svelte-h="svelte-xcuztx"><li>Augmenting the images</li> <li>Reformatting the annotations to meet DETR expectations</li></ul> <p data-svelte-h="svelte-10gzg1j">First, to make sure the model does not overfit on the training data, you can apply transformations with any data augmentation library. This guide uses the <a href="https://albumentations.ai/docs/" rel="nofollow">Albumentations</a> library.
This library ensures that transformations are applied to the image and that the bounding boxes are updated accordingly.
The 🤗 Datasets library documentation has a <a href="https://huggingface.co/docs/datasets/object_detection" rel="nofollow">detailed guide on how to augment images for object detection</a>,
and it uses exactly the same dataset as this example. Apply the same approach here: resize each image to (480, 480), flip it horizontally, and randomly adjust its brightness and contrast:</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-keyword">import</span> albumentations
<span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np
<span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-keyword">import</span> torch
<span class="hljs-meta">&gt;&gt;&gt; </span>transform = albumentations.Compose(
<span class="hljs-meta">... </span>    [
<span class="hljs-meta">... </span>        albumentations.Resize(<span class="hljs-number">480</span>, <span class="hljs-number">480</span>),
<span class="hljs-meta">... </span>        albumentations.HorizontalFlip(p=<span class="hljs-number">1.0</span>),
<span class="hljs-meta">... </span>        albumentations.RandomBrightnessContrast(p=<span class="hljs-number">1.0</span>),
<span class="hljs-meta">... </span>    ],
<span class="hljs-meta">... </span>    bbox_params=albumentations.BboxParams(<span class="hljs-built_in">format</span>=<span class="hljs-string">&quot;coco&quot;</span>, label_fields=[<span class="hljs-string">&quot;category&quot;</span>]),
<span class="hljs-meta">... </span>)<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-1pbo3mb">The image processor expects the annotations to be in the following format: <code>{&#39;image_id&#39;: int, &#39;annotations&#39;: List[Dict]}</code>, where each dictionary is a COCO object annotation. Add a function that reformats the annotations for a single example:</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-keyword">def</span> <span class="hljs-title function_">formatted_anns</span>(<span class="hljs-params">image_id, category, area, bbox</span>):
<span class="hljs-meta">... </span>    annotations = []
<span class="hljs-meta">... </span>    <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> <span class="hljs-built_in">range</span>(<span class="hljs-number">0</span>, <span class="hljs-built_in">len</span>(category)):
<span class="hljs-meta">... </span>        new_ann = {
<span class="hljs-meta">... </span>            <span class="hljs-string">&quot;image_id&quot;</span>: image_id,
<span class="hljs-meta">... </span>            <span class="hljs-string">&quot;category_id&quot;</span>: category[i],
<span class="hljs-meta">... </span>            <span class="hljs-string">&quot;iscrowd&quot;</span>: <span class="hljs-number">0</span>,
<span class="hljs-meta">... </span>            <span class="hljs-string">&quot;area&quot;</span>: area[i],
<span class="hljs-meta">... </span>            <span class="hljs-string">&quot;bbox&quot;</span>: <span class="hljs-built_in">list</span>(bbox[i]),
<span class="hljs-meta">... </span>        }
<span class="hljs-meta">... </span>        annotations.append(new_ann)
<span class="hljs-meta">... </span>    <span class="hljs-keyword">return</span> annotations<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-1lyazxr">Now you can combine the image and annotation transformations and use them on a batch of examples:</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-comment"># transforming a batch</span>
<span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-keyword">def</span> <span class="hljs-title function_">transform_aug_ann</span>(<span class="hljs-params">examples</span>):
<span class="hljs-meta">... </span>    image_ids = examples[<span class="hljs-string">&quot;image_id&quot;</span>]
<span class="hljs-meta">... </span>    images, bboxes, area, categories = [], [], [], []
<span class="hljs-meta">... </span>    <span class="hljs-keyword">for</span> image, objects <span class="hljs-keyword">in</span> <span class="hljs-built_in">zip</span>(examples[<span class="hljs-string">&quot;image&quot;</span>], examples[<span class="hljs-string">&quot;objects&quot;</span>]):
<span class="hljs-meta">... </span>        image = np.array(image.convert(<span class="hljs-string">&quot;RGB&quot;</span>))[:, :, ::-<span class="hljs-number">1</span>]
<span class="hljs-meta">... </span>        out = transform(image=image, bboxes=objects[<span class="hljs-string">&quot;bbox&quot;</span>], category=objects[<span class="hljs-string">&quot;category&quot;</span>])
<span class="hljs-meta">... </span>        area.append(objects[<span class="hljs-string">&quot;area&quot;</span>])
<span class="hljs-meta">... </span>        images.append(out[<span class="hljs-string">&quot;image&quot;</span>])
<span class="hljs-meta">... </span>        bboxes.append(out[<span class="hljs-string">&quot;bboxes&quot;</span>])
<span class="hljs-meta">... </span>        categories.append(out[<span class="hljs-string">&quot;category&quot;</span>])
<span class="hljs-meta">... </span>    targets = [
<span class="hljs-meta">... </span>        {<span class="hljs-string">&quot;image_id&quot;</span>: id_, <span class="hljs-string">&quot;annotations&quot;</span>: formatted_anns(id_, cat_, ar_, box_)}
<span class="hljs-meta">... </span>        <span class="hljs-keyword">for</span> id_, cat_, ar_, box_ <span class="hljs-keyword">in</span> <span class="hljs-built_in">zip</span>(image_ids, categories, area, bboxes)
<span class="hljs-meta">... </span>    ]
<span class="hljs-meta">... </span>    <span class="hljs-keyword">return</span> image_processor(images=images, annotations=targets, return_tensors=<span class="hljs-string">&quot;pt&quot;</span>)<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-1c1xyu9">Apply the preprocessing function created in the previous step to the entire dataset using the 🤗 Datasets <code>with_transform</code> method.
This method applies the preprocessing function on the fly, each time an element of the dataset is loaded.</p> <p data-svelte-h="svelte-17mwlu6">At this point, you can fetch an example from the preprocessed dataset and check what it looks like after the transformations.
You should see a <code>pixel_values</code> tensor, a <code>pixel_mask</code> tensor, and <code>labels</code>.</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">&gt;&gt;&gt; </span>cppe5[<span class="hljs-string">&quot;train&quot;</span>] = cppe5[<span class="hljs-string">&quot;train&quot;</span>].with_transform(transform_aug_ann)
<span class="hljs-meta">&gt;&gt;&gt; </span>cppe5[<span class="hljs-string">&quot;train&quot;</span>][<span class="hljs-number">15</span>]
{<span class="hljs-string">&#x27;pixel_values&#x27;</span>: tensor([[[ <span class="hljs-number">0.9132</span>, <span class="hljs-number">0.9132</span>, <span class="hljs-number">0.9132</span>, ..., -<span class="hljs-number">1.9809</span>, -<span class="hljs-number">1.9809</span>, -<span class="hljs-number">1.9809</span>],
[ <span class="hljs-number">0.9132</span>, <span class="hljs-number">0.9132</span>, <span class="hljs-number">0.9132</span>, ..., -<span class="hljs-number">1.9809</span>, -<span class="hljs-number">1.9809</span>, -<span class="hljs-number">1.9809</span>],
[ <span class="hljs-number">0.9132</span>, <span class="hljs-number">0.9132</span>, <span class="hljs-number">0.9132</span>, ..., -<span class="hljs-number">1.9638</span>, -<span class="hljs-number">1.9638</span>, -<span class="hljs-number">1.9638</span>],
...,
[-<span class="hljs-number">1.5699</span>, -<span class="hljs-number">1.5699</span>, -<span class="hljs-number">1.5699</span>, ..., -<span class="hljs-number">1.9980</span>, -<span class="hljs-number">1.9980</span>, -<span class="hljs-number">1.9980</span>],
[-<span class="hljs-number">1.5528</span>, -<span class="hljs-number">1.5528</span>, -<span class="hljs-number">1.5528</span>, ..., -<span class="hljs-number">1.9980</span>, -<span class="hljs-number">1.9809</span>, -<span class="hljs-number">1.9809</span>],
[-<span class="hljs-number">1.5528</span>, -<span class="hljs-number">1.5528</span>, -<span class="hljs-number">1.5528</span>, ..., -<span class="hljs-number">1.9980</span>, -<span class="hljs-number">1.9809</span>, -<span class="hljs-number">1.9809</span>]],
[[ <span class="hljs-number">1.3081</span>, <span class="hljs-number">1.3081</span>, <span class="hljs-number">1.3081</span>, ..., -<span class="hljs-number">1.8431</span>, -<span class="hljs-number">1.8431</span>, -<span class="hljs-number">1.8431</span>],
[ <span class="hljs-number">1.3081</span>, <span class="hljs-number">1.3081</span>, <span class="hljs-number">1.3081</span>, ..., -<span class="hljs-number">1.8431</span>, -<span class="hljs-number">1.8431</span>, -<span class="hljs-number">1.8431</span>],
[ <span class="hljs-number">1.3081</span>, <span class="hljs-number">1.3081</span>, <span class="hljs-number">1.3081</span>, ..., -<span class="hljs-number">1.8256</span>, -<span class="hljs-number">1.8256</span>, -<span class="hljs-number">1.8256</span>],
...,
[-<span class="hljs-number">1.3179</span>, -<span class="hljs-number">1.3179</span>, -<span class="hljs-number">1.3179</span>, ..., -<span class="hljs-number">1.8606</span>, -<span class="hljs-number">1.8606</span>, -<span class="hljs-number">1.8606</span>],
[-<span class="hljs-number">1.3004</span>, -<span class="hljs-number">1.3004</span>, -<span class="hljs-number">1.3004</span>, ..., -<span class="hljs-number">1.8606</span>, -<span class="hljs-number">1.8431</span>, -<span class="hljs-number">1.8431</span>],
[-<span class="hljs-number">1.3004</span>, -<span class="hljs-number">1.3004</span>, -<span class="hljs-number">1.3004</span>, ..., -<span class="hljs-number">1.8606</span>, -<span class="hljs-number">1.8431</span>, -<span class="hljs-number">1.8431</span>]],
[[ <span class="hljs-number">1.4200</span>, <span class="hljs-number">1.4200</span>, <span class="hljs-number">1.4200</span>, ..., -<span class="hljs-number">1.6476</span>, -<span class="hljs-number">1.6476</span>, -<span class="hljs-number">1.6476</span>],
[ <span class="hljs-number">1.4200</span>, <span class="hljs-number">1.4200</span>, <span class="hljs-number">1.4200</span>, ..., -<span class="hljs-number">1.6476</span>, -<span class="hljs-number">1.6476</span>, -<span class="hljs-number">1.6476</span>],
[ <span class="hljs-number">1.4200</span>, <span class="hljs-number">1.4200</span>, <span class="hljs-number">1.4200</span>, ..., -<span class="hljs-number">1.6302</span>, -<span class="hljs-number">1.6302</span>, -<span class="hljs-number">1.6302</span>],
...,
[-<span class="hljs-number">1.0201</span>, -<span class="hljs-number">1.0201</span>, -<span class="hljs-number">1.0201</span>, ..., -<span class="hljs-number">1.5604</span>, -<span class="hljs-number">1.5604</span>, -<span class="hljs-number">1.5604</span>],
[-<span class="hljs-number">1.0027</span>, -<span class="hljs-number">1.0027</span>, -<span class="hljs-number">1.0027</span>, ..., -<span class="hljs-number">1.5604</span>, -<span class="hljs-number">1.5430</span>, -<span class="hljs-number">1.5430</span>],
[-<span class="hljs-number">1.0027</span>, -<span class="hljs-number">1.0027</span>, -<span class="hljs-number">1.0027</span>, ..., -<span class="hljs-number">1.5604</span>, -<span class="hljs-number">1.5430</span>, -<span class="hljs-number">1.5430</span>]]]),
<span class="hljs-string">&#x27;pixel_mask&#x27;</span>: tensor([[<span class="hljs-number">1</span>, <span class="hljs-number">1</span>, <span class="hljs-number">1</span>, ..., <span class="hljs-number">1</span>, <span class="hljs-number">1</span>, <span class="hljs-number">1</span>],
[<span class="hljs-number">1</span>, <span class="hljs-number">1</span>, <span class="hljs-number">1</span>, ..., <span class="hljs-number">1</span>, <span class="hljs-number">1</span>, <span class="hljs-number">1</span>],
[<span class="hljs-number">1</span>, <span class="hljs-number">1</span>, <span class="hljs-number">1</span>, ..., <span class="hljs-number">1</span>, <span class="hljs-number">1</span>, <span class="hljs-number">1</span>],
...,
[<span class="hljs-number">1</span>, <span class="hljs-number">1</span>, <span class="hljs-number">1</span>, ..., <span class="hljs-number">1</span>, <span class="hljs-number">1</span>, <span class="hljs-number">1</span>],
[<span class="hljs-number">1</span>, <span class="hljs-number">1</span>, <span class="hljs-number">1</span>, ..., <span class="hljs-number">1</span>, <span class="hljs-number">1</span>, <span class="hljs-number">1</span>],
[<span class="hljs-number">1</span>, <span class="hljs-number">1</span>, <span class="hljs-number">1</span>, ..., <span class="hljs-number">1</span>, <span class="hljs-number">1</span>, <span class="hljs-number">1</span>]]),
<span class="hljs-string">&#x27;labels&#x27;</span>: {<span class="hljs-string">&#x27;size&#x27;</span>: tensor([<span class="hljs-number">800</span>, <span class="hljs-number">800</span>]), <span class="hljs-string">&#x27;image_id&#x27;</span>: tensor([<span class="hljs-number">756</span>]), <span class="hljs-string">&#x27;class_labels&#x27;</span>: tensor([<span class="hljs-number">4</span>]), <span class="hljs-string">&#x27;boxes&#x27;</span>: tensor([[<span class="hljs-number">0.7340</span>, <span class="hljs-number">0.6986</span>, <span class="hljs-number">0.3414</span>, <span class="hljs-number">0.5944</span>]]), <span class="hljs-string">&#x27;area&#x27;</span>: tensor([<span class="hljs-number">519544.4375</span>]), <span class="hljs-string">&#x27;iscrowd&#x27;</span>: tensor([<span class="hljs-number">0</span>]), <span class="hljs-string">&#x27;orig_size&#x27;</span>: tensor([<span class="hljs-number">480</span>, <span class="hljs-number">480</span>])}}<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-ilcaq9">You have successfully augmented the individual images and prepared their annotations.
However, preprocessing is not finished yet. As the final step, create a custom <code>collate_fn</code> that batches images together.
Pad the images (which are now <code>pixel_values</code>) to the size of the largest image in the batch, and create a corresponding new <code>pixel_mask</code> that marks which pixels are real (1) and which are padding (0).</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-keyword">def</span> <span class="hljs-title function_">collate_fn</span>(<span class="hljs-params">batch</span>):
<span class="hljs-meta">... </span>    pixel_values = [item[<span class="hljs-string">&quot;pixel_values&quot;</span>] <span class="hljs-keyword">for</span> item <span class="hljs-keyword">in</span> batch]
<span class="hljs-meta">... </span>    encoding = image_processor.pad(pixel_values, return_tensors=<span class="hljs-string">&quot;pt&quot;</span>)
<span class="hljs-meta">... </span>    labels = [item[<span class="hljs-string">&quot;labels&quot;</span>] <span class="hljs-keyword">for</span> item <span class="hljs-keyword">in</span> batch]
<span class="hljs-meta">... </span>    batch = {}
<span class="hljs-meta">... </span>    batch[<span class="hljs-string">&quot;pixel_values&quot;</span>] = encoding[<span class="hljs-string">&quot;pixel_values&quot;</span>]
<span class="hljs-meta">... </span>    batch[<span class="hljs-string">&quot;pixel_mask&quot;</span>] = encoding[<span class="hljs-string">&quot;pixel_mask&quot;</span>]
<span class="hljs-meta">... </span>    batch[<span class="hljs-string">&quot;labels&quot;</span>] = labels
<span class="hljs-meta">... </span>    <span class="hljs-keyword">return</span> batch<!-- HTML_TAG_END --></pre></div> <h2 class="relative group"><a id="training-the-DETR-model" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#training-the-DETR-model"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Training the DETR model</span></h2> <p data-svelte-h="svelte-zq2fh5">You did most of the work in the previous sections, so you are now ready to train the model!
The images in this dataset are still quite large even after resizing, so fine-tuning this model requires at least one GPU.</p> <p data-svelte-h="svelte-1nfps6">Training involves the following steps:</p> <ol data-svelte-h="svelte-156ljxz"><li>Load the model with <code>AutoModelForObjectDetection</code>, using the same checkpoint as in the preprocessing.</li> <li>Define the training hyperparameters in <code>TrainingArguments</code>.</li> <li>Pass the training arguments to <code>Trainer</code> along with the model, dataset, image processor, and data collator.</li> <li>Call <code>train()</code> to fine-tune the model.</li></ol> <p data-svelte-h="svelte-vuemo6">When loading the model from the same checkpoint that you used for the preprocessing, remember to pass the <code>label2id</code> and <code>id2label</code> mappings that you created from the dataset's metadata.
Additionally, specify <code>ignore_mismatched_sizes=True</code> to replace the existing classification head (the last layer the model uses for classification) with a new one.</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-keyword">from</span> transformers <span class="hljs-keyword">import</span> AutoModelForObjectDetection
<span class="hljs-meta">&gt;&gt;&gt; </span>model = AutoModelForObjectDetection.from_pretrained(
<span class="hljs-meta">... </span>    checkpoint,
<span class="hljs-meta">... </span>    id2label=id2label,
<span class="hljs-meta">... </span>    label2id=label2id,
<span class="hljs-meta">... </span>    ignore_mismatched_sizes=<span class="hljs-literal">True</span>,
<span class="hljs-meta">... </span>)<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-y81cw9">In <code>TrainingArguments</code>, use <code>output_dir</code> to specify where to save the model, then configure the hyperparameters as needed.
Take care not to remove unused columns: if <code>remove_unused_columns</code> is <code>True</code>, the image column is dropped.
Without the image column, <code>pixel_values</code> cannot be created, so <code>remove_unused_columns</code> must be set to <code>False</code>.
To share the model by uploading it to the Hub, set <code>push_to_hub</code> to <code>True</code> (you must be signed in to Hugging Face to upload the model).</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-keyword">from</span> transformers <span class="hljs-keyword">import</span> TrainingArguments
<span class="hljs-meta">&gt;&gt;&gt; </span>training_args = TrainingArguments(
<span class="hljs-meta">... </span>    output_dir=<span class="hljs-string">&quot;detr-resnet-50_finetuned_cppe5&quot;</span>,
<span class="hljs-meta">... </span>    per_device_train_batch_size=<span class="hljs-number">8</span>,
<span class="hljs-meta">... </span>    num_train_epochs=<span class="hljs-number">10</span>,
<span class="hljs-meta">... </span>    fp16=<span class="hljs-literal">True</span>,
<span class="hljs-meta">... </span>    save_steps=<span class="hljs-number">200</span>,
<span class="hljs-meta">... </span>    logging_steps=<span class="hljs-number">50</span>,
<span class="hljs-meta">... </span>    learning_rate=<span class="hljs-number">1e-5</span>,
<span class="hljs-meta">... </span>    weight_decay=<span class="hljs-number">1e-4</span>,
<span class="hljs-meta">... </span>    save_total_limit=<span class="hljs-number">2</span>,
<span class="hljs-meta">... </span>    remove_unused_columns=<span class="hljs-literal">False</span>,
<span class="hljs-meta">... </span>    push_to_hub=<span class="hljs-literal">True</span>,
<span class="hljs-meta">... </span>)<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-5c405v">Finally, bring together <code>model</code>, <code>training_args</code>, <code>collate_fn</code>, <code>image_processor</code>, and the dataset (<code>cppe5</code>), then call <code>train()</code>.</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-keyword">from</span> transformers <span class="hljs-keyword">import</span> Trainer
<span class="hljs-meta">&gt;&gt;&gt; </span>trainer = Trainer(
<span class="hljs-meta">... </span>    model=model,
<span class="hljs-meta">... </span>    args=training_args,
<span class="hljs-meta">... </span>    data_collator=collate_fn,
<span class="hljs-meta">... </span>    train_dataset=cppe5[<span class="hljs-string">&quot;train&quot;</span>],
<span class="hljs-meta">... </span>    tokenizer=image_processor,
<span class="hljs-meta">... </span>)
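<span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-comment"># Optional sanity check (an addition, not part of the original guide): collate two training</span>
<span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-comment"># examples with the collate_fn defined earlier and confirm that images and labels are batched.</span>
<span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-comment"># Assumes cppe5[&quot;train&quot;] already has transform_aug_ann attached via with_transform.</span>
<span class="hljs-meta">&gt;&gt;&gt; </span>sample_batch = collate_fn([cppe5[<span class="hljs-string">&quot;train&quot;</span>][<span class="hljs-number">0</span>], cppe5[<span class="hljs-string">&quot;train&quot;</span>][<span class="hljs-number">1</span>]])
<span class="hljs-meta">&gt;&gt;&gt; </span>sample_batch[<span class="hljs-string">&quot;pixel_values&quot;</span>].shape[<span class="hljs-number">0</span>], <span class="hljs-built_in">len</span>(sample_batch[<span class="hljs-string">&quot;labels&quot;</span>])
(<span class="hljs-number">2</span>, <span class="hljs-number">2</span>)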
<span class="hljs-meta">&gt;&gt;&gt; </span>trainer.train()<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-12nwk92">If you set <code>push_to_hub</code> to <code>True</code> in <code>training_args</code>, the training checkpoints are uploaded to the Hugging Face Hub.
Once training is complete, call the <code>push_to_hub()</code> method to upload the final model to the Hugging Face Hub as well.</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">&gt;&gt;&gt; </span>trainer.push_to_hub()<!-- HTML_TAG_END --></pre></div> <h2 class="relative group"><a id="evaluate" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#evaluate"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Evaluate</span></h2> <p data-svelte-h="svelte-16ugmlc">Object detection models are commonly evaluated with a set of <a href="https://cocodataset.org/#detection-eval">COCO-style metrics</a>.
You could use one of the existing metric implementations, but here the metrics provided by <code>torchvision</code> are used to evaluate the final model that you pushed to the Hugging Face Hub.</p> <p data-svelte-h="svelte-67pxix">To use the <code>torchvision</code> evaluator, you need to prepare a ground-truth COCO dataset.
The API that builds a COCO dataset requires the data to be stored in a specific format, so you first need to save the images and annotations to disk.
Just as when you prepared the data for training, the annotations from <code>cppe5[&quot;test&quot;]</code> need to be formatted. The images, however, should stay as they are.</p> <p data-svelte-h="svelte-czrynv">The evaluation step takes a bit of work, but it can be split into three major steps.
First, prepare the <code>cppe5[&quot;test&quot;]</code> set: format the annotations and save the data to disk.</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-keyword">import</span> json
<span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-comment"># format annotations the same as for training, no need for data augmentation</span>
<span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-keyword">def</span> <span class="hljs-title function_">val_formatted_anns</span>(<span class="hljs-params">image_id, objects</span>):
<span class="hljs-meta">... </span>    annotations = []
<span class="hljs-meta">... </span>    <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> <span class="hljs-built_in">range</span>(<span class="hljs-number">0</span>, <span class="hljs-built_in">len</span>(objects[<span class="hljs-string">&quot;id&quot;</span>])):
<span class="hljs-meta">... </span>        new_ann = {
<span class="hljs-meta">... </span>            <span class="hljs-string">&quot;id&quot;</span>: objects[<span class="hljs-string">&quot;id&quot;</span>][i],
<span class="hljs-meta">... </span>            <span class="hljs-string">&quot;category_id&quot;</span>: objects[<span class="hljs-string">&quot;category&quot;</span>][i],
<span class="hljs-meta">... </span>            <span class="hljs-string">&quot;iscrowd&quot;</span>: <span class="hljs-number">0</span>,
<span class="hljs-meta">... </span>            <span class="hljs-string">&quot;image_id&quot;</span>: image_id,
<span class="hljs-meta">... </span>            <span class="hljs-string">&quot;area&quot;</span>: objects[<span class="hljs-string">&quot;area&quot;</span>][i],
<span class="hljs-meta">... </span>            <span class="hljs-string">&quot;bbox&quot;</span>: objects[<span class="hljs-string">&quot;bbox&quot;</span>][i],
<span class="hljs-meta">... </span>        }
<span class="hljs-meta">... </span>        annotations.append(new_ann)
<span class="hljs-meta">... </span>    <span class="hljs-keyword">return</span> annotations
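<span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-comment"># Illustrative check (an addition, not part of the original guide): the toy values below are</span>
<span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-comment"># made up and only show the COCO-style dictionary val_formatted_anns builds for each object.</span>
<span class="hljs-meta">&gt;&gt;&gt; </span>toy_objects = {<span class="hljs-string">&quot;id&quot;</span>: [<span class="hljs-number">10</span>], <span class="hljs-string">&quot;category&quot;</span>: [<span class="hljs-number">4</span>], <span class="hljs-string">&quot;area&quot;</span>: [<span class="hljs-number">200.0</span>], <span class="hljs-string">&quot;bbox&quot;</span>: [[<span class="hljs-number">1.0</span>, <span class="hljs-number">2.0</span>, <span class="hljs-number">10.0</span>, <span class="hljs-number">20.0</span>]]}
<span class="hljs-meta">&gt;&gt;&gt; </span>val_formatted_anns(<span class="hljs-number">7</span>, toy_objects)
[{<span class="hljs-string">&#x27;id&#x27;</span>: <span class="hljs-number">10</span>, <span class="hljs-string">&#x27;category_id&#x27;</span>: <span class="hljs-number">4</span>, <span class="hljs-string">&#x27;iscrowd&#x27;</span>: <span class="hljs-number">0</span>, <span class="hljs-string">&#x27;image_id&#x27;</span>: <span class="hljs-number">7</span>, <span class="hljs-string">&#x27;area&#x27;</span>: <span class="hljs-number">200.0</span>, <span class="hljs-string">&#x27;bbox&#x27;</span>: [<span class="hljs-number">1.0</span>, <span class="hljs-number">2.0</span>, <span class="hljs-number">10.0</span>, <span class="hljs-number">20.0</span>]}]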
<span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-comment"># Save images and annotations into the files torchvision.datasets.CocoDetection expects</span>
<span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-keyword">def</span> <span class="hljs-title function_">save_cppe5_annotation_file_images</span>(<span class="hljs-params">cppe5</span>):
<span class="hljs-meta">... </span>    output_json = {}
<span class="hljs-meta">... </span>    path_output_cppe5 = <span class="hljs-string">f&quot;<span class="hljs-subst">{os.getcwd()}</span>/cppe5/&quot;</span>
<span class="hljs-meta">... </span>    <span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> os.path.exists(path_output_cppe5):
<span class="hljs-meta">... </span>        os.makedirs(path_output_cppe5)
<span class="hljs-meta">... </span>    path_anno = os.path.join(path_output_cppe5, <span class="hljs-string">&quot;cppe5_ann.json&quot;</span>)
<span class="hljs-meta">... </span>    categories_json = [{<span class="hljs-string">&quot;supercategory&quot;</span>: <span class="hljs-string">&quot;none&quot;</span>, <span class="hljs-string">&quot;id&quot;</span>: <span class="hljs-built_in">id</span>, <span class="hljs-string">&quot;name&quot;</span>: id2label[<span class="hljs-built_in">id</span>]} <span class="hljs-keyword">for</span> <span class="hljs-built_in">id</span> <span class="hljs-keyword">in</span> id2label]
<span class="hljs-meta">... </span>    output_json[<span class="hljs-string">&quot;images&quot;</span>] = []
<span class="hljs-meta">... </span>    output_json[<span class="hljs-string">&quot;annotations&quot;</span>] = []
<span class="hljs-meta">... </span>    <span class="hljs-keyword">for</span> example <span class="hljs-keyword">in</span> cppe5:
<span class="hljs-meta">... </span>        ann = val_formatted_anns(example[<span class="hljs-string">&quot;image_id&quot;</span>], example[<span class="hljs-string">&quot;objects&quot;</span>])
<span class="hljs-meta">... </span>        output_json[<span class="hljs-string">&quot;images&quot;</span>].append(
<span class="hljs-meta">... </span>            {
<span class="hljs-meta">... </span>                <span class="hljs-string">&quot;id&quot;</span>: example[<span class="hljs-string">&quot;image_id&quot;</span>],
<span class="hljs-meta">... </span>                <span class="hljs-string">&quot;width&quot;</span>: example[<span class="hljs-string">&quot;image&quot;</span>].width,
<span class="hljs-meta">... </span>                <span class="hljs-string">&quot;height&quot;</span>: example[<span class="hljs-string">&quot;image&quot;</span>].height,
<span class="hljs-meta">... </span>                <span class="hljs-string">&quot;file_name&quot;</span>: <span class="hljs-string">f&quot;<span class="hljs-subst">{example[<span class="hljs-string">&#x27;image_id&#x27;</span>]}</span>.png&quot;</span>,
<span class="hljs-meta">... </span>            }
<span class="hljs-meta">... </span>        )
<span class="hljs-meta">... </span>        output_json[<span class="hljs-string">&quot;annotations&quot;</span>].extend(ann)
<span class="hljs-meta">... </span>    output_json[<span class="hljs-string">&quot;categories&quot;</span>] = categories_json
<span class="hljs-meta">... </span>    <span class="hljs-keyword">with</span> <span class="hljs-built_in">open</span>(path_anno, <span class="hljs-string">&quot;w&quot;</span>) <span class="hljs-keyword">as</span> file:
<span class="hljs-meta">... </span>        json.dump(output_json, file, ensure_ascii=<span class="hljs-literal">False</span>, indent=<span class="hljs-number">4</span>)
<span class="hljs-meta">... </span>    <span class="hljs-keyword">for</span> im, img_id <span class="hljs-keyword">in</span> <span class="hljs-built_in">zip</span>(cppe5[<span class="hljs-string">&quot;image&quot;</span>], cppe5[<span class="hljs-string">&quot;image_id&quot;</span>]):
<span class="hljs-meta">... </span>        path_img = os.path.join(path_output_cppe5, <span class="hljs-string">f&quot;<span class="hljs-subst">{img_id}</span>.png&quot;</span>)
<span class="hljs-meta">... </span>        im.save(path_img)
<span class="hljs-meta">... </span>    <span class="hljs-keyword">return</span> path_output_cppe5, path_anno<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-1qg2kqc">Next, prepare an instance of a <code>CocoDetection</code> class that can be used with <code>cocoevaluator</code>.</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-keyword">import</span> torchvision
<span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-keyword">class</span> <span class="hljs-title class_">CocoDetection</span>(torchvision.datasets.CocoDetection):
<span class="hljs-meta">... </span>    <span class="hljs-keyword">def</span> <span class="hljs-title function_">__init__</span>(<span class="hljs-params">self, img_folder, image_processor, ann_file</span>):
<span class="hljs-meta">... </span>        <span class="hljs-built_in">super</span>().__init__(img_folder, ann_file)
<span class="hljs-meta">... </span>        self.image_processor = image_processor
<span class="hljs-meta">... </span>    <span class="hljs-keyword">def</span> <span class="hljs-title function_">__getitem__</span>(<span class="hljs-params">self, idx</span>):
<span class="hljs-meta">... </span>        <span class="hljs-comment"># read in PIL image and target in COCO format</span>
<span class="hljs-meta">... </span>        img, target = <span class="hljs-built_in">super</span>(CocoDetection, self).__getitem__(idx)
<span class="hljs-meta">... </span>        <span class="hljs-comment"># preprocess image and target (converting target to DETR format,</span>
<span class="hljs-meta">... </span>        <span class="hljs-comment"># resizing + normalization of both image and target)</span>
<span class="hljs-meta">... </span>        image_id = self.ids[idx]
<span class="hljs-meta">... </span>        target = {<span class="hljs-string">&quot;image_id&quot;</span>: image_id, <span class="hljs-string">&quot;annotations&quot;</span>: target}
<span class="hljs-meta">... </span>        encoding = self.image_processor(images=img, annotations=target, return_tensors=<span class="hljs-string">&quot;pt&quot;</span>)
<span class="hljs-meta">... </span>        pixel_values = encoding[<span class="hljs-string">&quot;pixel_values&quot;</span>].squeeze() <span class="hljs-comment"># remove batch dimension</span>
<span class="hljs-meta">... </span>        target = encoding[<span class="hljs-string">&quot;labels&quot;</span>][<span class="hljs-number">0</span>] <span class="hljs-comment"># remove batch dimension</span>
<span class="hljs-meta">... </span>        <span class="hljs-keyword">return</span> {<span class="hljs-string">&quot;pixel_values&quot;</span>: pixel_values, <span class="hljs-string">&quot;labels&quot;</span>: target}
<span class="hljs-meta">&gt;&gt;&gt; </span>im_processor = AutoImageProcessor.from_pretrained(<span class="hljs-string">&quot;devonho/detr-resnet-50_finetuned_cppe5&quot;</span>)
<span class="hljs-meta">&gt;&gt;&gt; </span>path_output_cppe5, path_anno = save_cppe5_annotation_file_images(cppe5[<span class="hljs-string">&quot;test&quot;</span>])
<span class="hljs-meta">&gt;&gt;&gt; </span>test_ds_coco_format = CocoDetection(path_output_cppe5, im_processor, path_anno)<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-v8f9p3">Finally, load the metrics and run the evaluation.</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-keyword">import</span> evaluate
<span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-keyword">from</span> tqdm <span class="hljs-keyword">import</span> tqdm
<span class="hljs-meta">&gt;&gt;&gt; </span>model = AutoModelForObjectDetection.from_pretrained(<span class="hljs-string">&quot;devonho/detr-resnet-50_finetuned_cppe5&quot;</span>)
<span class="hljs-meta">&gt;&gt;&gt; </span>module = evaluate.load(<span class="hljs-string">&quot;ybelkada/cocoevaluate&quot;</span>, coco=test_ds_coco_format.coco)
<span class="hljs-meta">&gt;&gt;&gt; </span>val_dataloader = torch.utils.data.DataLoader(
<span class="hljs-meta">... </span>    test_ds_coco_format, batch_size=<span class="hljs-number">8</span>, shuffle=<span class="hljs-literal">False</span>, num_workers=<span class="hljs-number">4</span>, collate_fn=collate_fn
<span class="hljs-meta">... </span>)
<span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-keyword">with</span> torch.no_grad():
<span class="hljs-meta">... </span>    <span class="hljs-keyword">for</span> idx, batch <span class="hljs-keyword">in</span> <span class="hljs-built_in">enumerate</span>(tqdm(val_dataloader)):
<span class="hljs-meta">... </span>        pixel_values = batch[<span class="hljs-string">&quot;pixel_values&quot;</span>]
<span class="hljs-meta">... </span>        pixel_mask = batch[<span class="hljs-string">&quot;pixel_mask&quot;</span>]
<span class="hljs-meta">... </span>        labels = [
<span class="hljs-meta">... </span>            {k: v <span class="hljs-keyword">for</span> k, v <span class="hljs-keyword">in</span> t.items()} <span class="hljs-keyword">for</span> t <span class="hljs-keyword">in</span> batch[<span class="hljs-string">&quot;labels&quot;</span>]
<span class="hljs-meta">... </span>        ] <span class="hljs-comment"># these are in DETR format, resized + normalized</span>
<span class="hljs-meta">... </span>        <span class="hljs-comment"># forward pass</span>
<span class="hljs-meta">... </span>        outputs = model(pixel_values=pixel_values, pixel_mask=pixel_mask)
<span class="hljs-meta">... </span>        orig_target_sizes = torch.stack([target[<span class="hljs-string">&quot;orig_size&quot;</span>] <span class="hljs-keyword">for</span> target <span class="hljs-keyword">in</span> labels], dim=<span class="hljs-number">0</span>)
<span class="hljs-meta">... </span>        results = im_processor.post_process(outputs, orig_target_sizes) <span class="hljs-comment"># convert outputs of model to Pascal VOC format (xmin, ymin, xmax, ymax)</span>
<span class="hljs-meta">... </span>        module.add(prediction=results, reference=labels)
<span class="hljs-meta">... </span>        <span class="hljs-keyword">del</span> batch
<span class="hljs-meta">&gt;&gt;&gt; </span>results = module.compute()
<span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-built_in">print</span>(results)
Accumulating evaluation results...
DONE (t=<span class="hljs-number">0.08</span>s).
IoU metric: bbox
Average Precision (AP) @[ IoU=<span class="hljs-number">0.50</span>:<span class="hljs-number">0.95</span> | area= <span class="hljs-built_in">all</span> | maxDets=<span class="hljs-number">100</span> ] = <span class="hljs-number">0.352</span>
Average Precision (AP) @[ IoU=<span class="hljs-number">0.50</span> | area= <span class="hljs-built_in">all</span> | maxDets=<span class="hljs-number">100</span> ] = <span class="hljs-number">0.681</span>
Average Precision (AP) @[ IoU=<span class="hljs-number">0.75</span> | area= <span class="hljs-built_in">all</span> | maxDets=<span class="hljs-number">100</span> ] = <span class="hljs-number">0.292</span>
Average Precision (AP) @[ IoU=<span class="hljs-number">0.50</span>:<span class="hljs-number">0.95</span> | area= small | maxDets=<span class="hljs-number">100</span> ] = <span class="hljs-number">0.168</span>
Average Precision (AP) @[ IoU=<span class="hljs-number">0.50</span>:<span class="hljs-number">0.95</span> | area=medium | maxDets=<span class="hljs-number">100</span> ] = <span class="hljs-number">0.208</span>
Average Precision (AP) @[ IoU=<span class="hljs-number">0.50</span>:<span class="hljs-number">0.95</span> | area= large | maxDets=<span class="hljs-number">100</span> ] = <span class="hljs-number">0.429</span>
Average Recall (AR) @[ IoU=<span class="hljs-number">0.50</span>:<span class="hljs-number">0.95</span> | area= <span class="hljs-built_in">all</span> | maxDets= <span class="hljs-number">1</span> ] = <span class="hljs-number">0.274</span>
Average Recall (AR) @[ IoU=<span class="hljs-number">0.50</span>:<span class="hljs-number">0.95</span> | area= <span class="hljs-built_in">all</span> | maxDets= <span class="hljs-number">10</span> ] = <span class="hljs-number">0.484</span>
Average Recall (AR) @[ IoU=<span class="hljs-number">0.50</span>:<span class="hljs-number">0.95</span> | area= <span class="hljs-built_in">all</span> | maxDets=<span class="hljs-number">100</span> ] = <span class="hljs-number">0.501</span>
Average Recall (AR) @[ IoU=<span class="hljs-number">0.50</span>:<span class="hljs-number">0.95</span> | area= small | maxDets=<span class="hljs-number">100</span> ] = <span class="hljs-number">0.191</span>
Average Recall (AR) @[ IoU=<span class="hljs-number">0.50</span>:<span class="hljs-number">0.95</span> | area=medium | maxDets=<span class="hljs-number">100</span> ] = <span class="hljs-number">0.323</span>
Average Recall (AR) @[ IoU=<span class="hljs-number">0.50</span>:<span class="hljs-number">0.95</span> | area= large | maxDets=<span class="hljs-number">100</span> ] = <span class="hljs-number">0.590</span><!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-hm0n81">These results can be further improved by adjusting the hyperparameters in <code>TrainingArguments</code>. Give it a go!</p> <h2 class="relative group"><a id="inference" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#inference"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Inference</span></h2> <p data-svelte-h="svelte-1ltlv9s">Now that you have fine-tuned and evaluated a DETR model and uploaded it to the Hugging Face Hub, you can use it for inference.</p> <p data-svelte-h="svelte-1l3j011">The simplest way to try out your fine-tuned model for inference is to use it in a <code>pipeline()</code>.
Instantiate a pipeline for object detection with your model, and pass an image to it:</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-keyword">from</span> transformers <span class="hljs-keyword">import</span> pipeline
<span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-keyword">import</span> requests
<span class="hljs-meta">&gt;&gt;&gt; </span>url = <span class="hljs-string">&quot;https://i.imgur.com/2lnWoly.jpg&quot;</span>
<span class="hljs-meta">&gt;&gt;&gt; </span>image = Image.<span class="hljs-built_in">open</span>(requests.get(url, stream=<span class="hljs-literal">True</span>).raw)
<span class="hljs-meta">&gt;&gt;&gt; </span>obj_detector = pipeline(<span class="hljs-string">&quot;object-detection&quot;</span>, model=<span class="hljs-string">&quot;devonho/detr-resnet-50_finetuned_cppe5&quot;</span>)
<span class="hljs-meta">&gt;&gt;&gt; </span>obj_detector(image)<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-pg1zl2">You can also manually replicate the results of the <code>pipeline</code> if you'd like:</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">&gt;&gt;&gt; </span>image_processor = AutoImageProcessor.from_pretrained(<span class="hljs-string">&quot;devonho/detr-resnet-50_finetuned_cppe5&quot;</span>)
<span class="hljs-meta">&gt;&gt;&gt; </span>model = AutoModelForObjectDetection.from_pretrained(<span class="hljs-string">&quot;devonho/detr-resnet-50_finetuned_cppe5&quot;</span>)
<span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-keyword">with</span> torch.no_grad():
<span class="hljs-meta">... </span>    inputs = image_processor(images=image, return_tensors=<span class="hljs-string">&quot;pt&quot;</span>)
<span class="hljs-meta">... </span>    outputs = model(**inputs)
<span class="hljs-meta">... </span>    target_sizes = torch.tensor([image.size[::-<span class="hljs-number">1</span>]])
<span class="hljs-meta">... </span>    results = image_processor.post_process_object_detection(outputs, threshold=<span class="hljs-number">0.5</span>, target_sizes=target_sizes)[<span class="hljs-number">0</span>]
<span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-keyword">for</span> score, label, box <span class="hljs-keyword">in</span> <span class="hljs-built_in">zip</span>(results[<span class="hljs-string">&quot;scores&quot;</span>], results[<span class="hljs-string">&quot;labels&quot;</span>], results[<span class="hljs-string">&quot;boxes&quot;</span>]):
<span class="hljs-meta">... </span>    box = [<span class="hljs-built_in">round</span>(i, <span class="hljs-number">2</span>) <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> box.tolist()]
<span class="hljs-meta">... </span>    <span class="hljs-built_in">print</span>(
<span class="hljs-meta">... </span>        <span class="hljs-string">f&quot;Detected <span class="hljs-subst">{model.config.id2label[label.item()]}</span> with confidence &quot;</span>
<span class="hljs-meta">... </span>        <span class="hljs-string">f&quot;<span class="hljs-subst">{<span class="hljs-built_in">round</span>(score.item(), <span class="hljs-number">3</span>)}</span> at location <span class="hljs-subst">{box}</span>&quot;</span>
<span class="hljs-meta">... </span>    )
Detected Coverall <span class="hljs-keyword">with</span> confidence <span class="hljs-number">0.566</span> at location [<span class="hljs-number">1215.32</span>, <span class="hljs-number">147.38</span>, <span class="hljs-number">4401.81</span>, <span class="hljs-number">3227.08</span>]
Detected Mask <span class="hljs-keyword">with</span> confidence <span class="hljs-number">0.584</span> at location [<span class="hljs-number">2449.06</span>, <span class="hljs-number">823.19</span>, <span class="hljs-number">3256.43</span>, <span class="hljs-number">1413.9</span>]<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-ha2nzy">Let's visualize the results:</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">&gt;&gt;&gt; </span>draw = ImageDraw.Draw(image)
<span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-keyword">for</span> score, label, box <span class="hljs-keyword">in</span> <span class="hljs-built_in">zip</span>(results[<span class="hljs-string">&quot;scores&quot;</span>], results[<span class="hljs-string">&quot;labels&quot;</span>], results[<span class="hljs-string">&quot;boxes&quot;</span>]):
<span class="hljs-meta">... </span>    box = [<span class="hljs-built_in">round</span>(i, <span class="hljs-number">2</span>) <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> box.tolist()]
<span class="hljs-meta">... </span>    x, y, x2, y2 = <span class="hljs-built_in">tuple</span>(box)
<span class="hljs-meta">... </span>    draw.rectangle((x, y, x2, y2), outline=<span class="hljs-string">&quot;red&quot;</span>, width=<span class="hljs-number">1</span>)
<span class="hljs-meta">... </span>    draw.text((x, y), model.config.id2label[label.item()], fill=<span class="hljs-string">&quot;white&quot;</span>)
<span class="hljs-meta">&gt;&gt;&gt; </span>image<!-- HTML_TAG_END --></pre></div> <div class="flex justify-center" data-svelte-h="svelte-16oi5q2"><img src="https://i.imgur.com/4QZnf9A.png" alt="Object detection result on a new image"></div> <p></p>
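<p>The boxes printed and drawn above are in Pascal VOC format (xmin, ymin, xmax, ymax), while the annotation file written for <code>CocoDetection</code> is assumed here to store COCO-style boxes (x_min, y_min, width, height). As a minimal sketch of how the two layouts relate, the helpers below convert between them. The names <code>coco_to_voc</code> and <code>voc_to_coco</code> are hypothetical and are not part of Transformers or torchvision.</p>

```python
def coco_to_voc(box):
    """Convert a COCO-style box [x_min, y_min, width, height]
    to a Pascal VOC-style box [x_min, y_min, x_max, y_max].

    Hypothetical helper for illustration only.
    """
    x_min, y_min, width, height = box
    return [x_min, y_min, x_min + width, y_min + height]


def voc_to_coco(box):
    """Convert a Pascal VOC-style box [x_min, y_min, x_max, y_max]
    back to a COCO-style box [x_min, y_min, width, height].

    Hypothetical helper for illustration only.
    """
    x_min, y_min, x_max, y_max = box
    return [x_min, y_min, x_max - x_min, y_max - y_min]


if __name__ == "__main__":
    # Round trip on an integer box so the arithmetic is exact.
    coco_box = [10, 20, 30, 40]
    voc_box = coco_to_voc(coco_box)
    print(voc_box)
    print(voc_to_coco(voc_box))
```

<p>A box produced by <code>post_process_object_detection</code> could be passed through <code>voc_to_coco</code> before being compared with a ground-truth annotation, assuming both refer to the same image size.</p>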
<script>
{
__sveltekit_1hrx8 = {
assets: "/docs/transformers/main/ko",
base: "/docs/transformers/main/ko",
env: {}
};
const element = document.currentScript.parentElement;
const data = [null,null];
Promise.all([
import("/docs/transformers/main/ko/_app/immutable/entry/start.9aa88961.js"),
import("/docs/transformers/main/ko/_app/immutable/entry/app.84fb67c3.js")
]).then(([kit, app]) => {
kit.start(app, element, {
node_ids: [0, 75],
data,
form: null,
error: null
});
});
}
</script>
