Buckets:

hf-doc-build/doc / transformers /v4.30.2 /ko /tasks /zero_shot_object_detection.html
rtrm's picture
download
raw
50 kB
<meta charset="utf-8" /><meta http-equiv="content-security-policy" content=""><meta name="hf:doc:metadata" content="{&quot;local&quot;:&quot;zeroshot-object-detection&quot;,&quot;sections&quot;:[{&quot;local&quot;:&quot;zeroshot-object-detection-pipeline&quot;,&quot;title&quot;:&quot;제로샷(zero-shot) 객체 탐지 파이프라인&quot;},{&quot;local&quot;:&quot;textprompted-zeroshot-object-detection-by-hand&quot;,&quot;title&quot;:&quot;텍스트 프롬프트 기반 객체 탐지&quot;},{&quot;local&quot;:&quot;batch-processing&quot;,&quot;title&quot;:&quot;일괄 처리&quot;},{&quot;local&quot;:&quot;imageguided-object-detection&quot;,&quot;title&quot;:&quot;이미지 가이드 객체 탐지&quot;}],&quot;title&quot;:&quot;제로샷(zero-shot) 객체 탐지&quot;}" data-svelte="svelte-1phssyn">
<link rel="modulepreload" href="/docs/transformers/v4.30.2/ko/_app/assets/pages/__layout.svelte-hf-doc-builder.css">
<link rel="modulepreload" href="/docs/transformers/v4.30.2/ko/_app/start-hf-doc-builder.js">
<link rel="modulepreload" href="/docs/transformers/v4.30.2/ko/_app/chunks/vendor-hf-doc-builder.js">
<link rel="modulepreload" href="/docs/transformers/v4.30.2/ko/_app/chunks/paths-hf-doc-builder.js">
<link rel="modulepreload" href="/docs/transformers/v4.30.2/ko/_app/pages/__layout.svelte-hf-doc-builder.js">
<link rel="modulepreload" href="/docs/transformers/v4.30.2/ko/_app/pages/tasks/zero_shot_object_detection.mdx-hf-doc-builder.js">
<link rel="modulepreload" href="/docs/transformers/v4.30.2/ko/_app/chunks/IconCopyLink-hf-doc-builder.js">
<link rel="modulepreload" href="/docs/transformers/v4.30.2/ko/_app/chunks/CodeBlock-hf-doc-builder.js">
<link rel="modulepreload" href="/docs/transformers/v4.30.2/ko/_app/chunks/DocNotebookDropdown-hf-doc-builder.js">
<h1 class="relative group"><a id="zeroshot-object-detection" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#zeroshot-object-detection"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a>
<span>제로샷(zero-shot) 객체 탐지
</span></h1>
<div class="flex space-x-1 absolute z-10 right-0 top-0">
<div class="relative colab-dropdown ">
<button class=" " type="button">
<img alt="Open In Colab" class="!m-0" src="https://colab.research.google.com/assets/colab-badge.svg">
</button>
</div>
<div class="relative colab-dropdown ">
<button class=" " type="button">
<img alt="Open In Studio Lab" class="!m-0" src="https://studiolab.sagemaker.aws/studiolab.svg">
</button>
</div></div>
<p>일반적으로 <a href="object_detection">객체 탐지</a>에 사용되는 모델을 학습하기 위해서는 레이블이 지정된 이미지 데이터 세트가 필요합니다.
그리고 학습 데이터에 존재하는 클래스(레이블)만 탐지할 수 있다는 한계점이 있습니다.</p>
<p>다른 방식을 사용하는 <a href="../model_doc/owlvit">OWL-ViT</a> 모델로 제로샷 객체 탐지가 가능합니다.
OWL-ViT는 개방형 어휘(open-vocabulary) 객체 탐지기입니다.
즉, 레이블이 지정된 데이터 세트에 미세 조정하지 않고 자유 텍스트 쿼리를 기반으로 이미지에서 객체를 탐지할 수 있습니다.</p>
<p>OWL-ViT 모델은 멀티 모달 표현을 활용해 개방형 어휘 탐지(open-vocabulary detection)를 수행합니다.
<a href="../model_doc/clip">CLIP</a> 모델에 경량화(lightweight)된 객체 분류와 지역화(localization) 헤드를 결합합니다.
개방형 어휘 탐지는 CLIP의 텍스트 인코더로 free-text 쿼리를 임베딩하고, 객체 분류와 지역화 헤드의 입력으로 사용합니다.
이미지와 해당 텍스트 설명을 연결하면 ViT가 이미지 패치(image patches)를 입력으로 처리합니다.
OWL-ViT 모델의 저자들은 CLIP 모델을 처음부터 학습(scratch learning)한 후에, bipartite matching loss를 사용하여 표준 객체 인식 데이터셋으로 OWL-ViT 모델을 미세 조정했습니다.</p>
<p>이 접근 방식을 사용하면 모델은 레이블이 지정된 데이터 세트에 대한 사전 학습 없이도 텍스트 설명을 기반으로 객체를 탐지할 수 있습니다.</p>
<p>이번 가이드에서는 OWL-ViT 모델의 사용법을 다룰 것입니다:</p>
<ul><li>텍스트 프롬프트 기반 객체 탐지</li>
<li>일괄 객체 탐지</li>
<li>이미지 가이드 객체 탐지</li></ul>
<p>시작하기 전에 필요한 라이브러리가 모두 설치되어 있는지 확인하세요:</p>
<div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg>
<div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div>
Copied</div></button></div>
<pre><!-- HTML_TAG_START -->pip install -q transformers<!-- HTML_TAG_END --></pre></div>
<h2 class="relative group"><a id="zeroshot-object-detection-pipeline" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#zeroshot-object-detection-pipeline"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a>
<span>제로샷(zero-shot) 객체 탐지 파이프라인
</span></h2>
<p><code>pipeline()</code>을 활용하면 가장 간단하게 OWL-ViT 모델을 추론해볼 수 있습니다.
<a href="https://huggingface.co/models?pipeline_tag=zero-shot-image-classification&sort=downloads" rel="nofollow">Hugging Face Hub에 업로드된 체크포인트</a>에서 제로샷(zero-shot) 객체 탐지용 파이프라인을 인스턴스화합니다:</p>
<div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg>
<div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div>
Copied</div></button></div>
<pre><!-- HTML_TAG_START --><span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-keyword">from</span> transformers <span class="hljs-keyword">import</span> pipeline
<span class="hljs-meta">&gt;&gt;&gt; </span>checkpoint = <span class="hljs-string">&quot;google/owlvit-base-patch32&quot;</span>
<span class="hljs-meta">&gt;&gt;&gt; </span>detector = pipeline(model=checkpoint, task=<span class="hljs-string">&quot;zero-shot-object-detection&quot;</span>)<!-- HTML_TAG_END --></pre></div>
<p>다음으로, 객체를 탐지하고 싶은 이미지를 선택하세요.
여기서는 <a href="https://www.nasa.gov/multimedia/imagegallery/index.html" rel="nofollow">NASA</a> Great Images 데이터 세트의 일부인 우주비행사 에일린 콜린스(Eileen Collins) 사진을 사용하겠습니다.</p>
<div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg>
<div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div>
Copied</div></button></div>
<pre><!-- HTML_TAG_START --><span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-keyword">import</span> skimage
<span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np
<span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-keyword">from</span> PIL <span class="hljs-keyword">import</span> Image
<span class="hljs-meta">&gt;&gt;&gt; </span>image = skimage.data.astronaut()
<span class="hljs-meta">&gt;&gt;&gt; </span>image = Image.fromarray(np.uint8(image)).convert(<span class="hljs-string">&quot;RGB&quot;</span>)
<span class="hljs-meta">&gt;&gt;&gt; </span>image<!-- HTML_TAG_END --></pre></div>
<div class="flex justify-center"><img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/zero-sh-obj-detection_1.png" alt="Astronaut Eileen Collins"></div>
<p>이미지와 해당 이미지의 후보 레이블을 파이프라인으로 전달합니다.
여기서는 이미지를 직접 전달하지만, 컴퓨터에 저장된 이미지의 경로나 url로 전달할 수도 있습니다.
candidate_labels는 이 예시처럼 간단한 단어일 수도 있고 좀 더 설명적인 단어일 수도 있습니다.
또한, 이미지를 검색(query)하려는 모든 항목에 대한 텍스트 설명도 전달합니다.</p>
<div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg>
<div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div>
Copied</div></button></div>
<pre><!-- HTML_TAG_START --><span class="hljs-meta">&gt;&gt;&gt; </span>predictions = detector(
<span class="hljs-meta">... </span> image,
<span class="hljs-meta">... </span> candidate_labels=[<span class="hljs-string">&quot;human face&quot;</span>, <span class="hljs-string">&quot;rocket&quot;</span>, <span class="hljs-string">&quot;nasa badge&quot;</span>, <span class="hljs-string">&quot;star-spangled banner&quot;</span>],
<span class="hljs-meta">... </span>)
<span class="hljs-meta">&gt;&gt;&gt; </span>predictions
[{<span class="hljs-string">&#x27;score&#x27;</span>: <span class="hljs-number">0.3571370542049408</span>,
<span class="hljs-string">&#x27;label&#x27;</span>: <span class="hljs-string">&#x27;human face&#x27;</span>,
<span class="hljs-string">&#x27;box&#x27;</span>: {<span class="hljs-string">&#x27;xmin&#x27;</span>: <span class="hljs-number">180</span>, <span class="hljs-string">&#x27;ymin&#x27;</span>: <span class="hljs-number">71</span>, <span class="hljs-string">&#x27;xmax&#x27;</span>: <span class="hljs-number">271</span>, <span class="hljs-string">&#x27;ymax&#x27;</span>: <span class="hljs-number">178</span>}},
{<span class="hljs-string">&#x27;score&#x27;</span>: <span class="hljs-number">0.28099656105041504</span>,
<span class="hljs-string">&#x27;label&#x27;</span>: <span class="hljs-string">&#x27;nasa badge&#x27;</span>,
<span class="hljs-string">&#x27;box&#x27;</span>: {<span class="hljs-string">&#x27;xmin&#x27;</span>: <span class="hljs-number">129</span>, <span class="hljs-string">&#x27;ymin&#x27;</span>: <span class="hljs-number">348</span>, <span class="hljs-string">&#x27;xmax&#x27;</span>: <span class="hljs-number">206</span>, <span class="hljs-string">&#x27;ymax&#x27;</span>: <span class="hljs-number">427</span>}},
{<span class="hljs-string">&#x27;score&#x27;</span>: <span class="hljs-number">0.2110239565372467</span>,
<span class="hljs-string">&#x27;label&#x27;</span>: <span class="hljs-string">&#x27;rocket&#x27;</span>,
<span class="hljs-string">&#x27;box&#x27;</span>: {<span class="hljs-string">&#x27;xmin&#x27;</span>: <span class="hljs-number">350</span>, <span class="hljs-string">&#x27;ymin&#x27;</span>: -<span class="hljs-number">1</span>, <span class="hljs-string">&#x27;xmax&#x27;</span>: <span class="hljs-number">468</span>, <span class="hljs-string">&#x27;ymax&#x27;</span>: <span class="hljs-number">288</span>}},
{<span class="hljs-string">&#x27;score&#x27;</span>: <span class="hljs-number">0.13790413737297058</span>,
<span class="hljs-string">&#x27;label&#x27;</span>: <span class="hljs-string">&#x27;star-spangled banner&#x27;</span>,
<span class="hljs-string">&#x27;box&#x27;</span>: {<span class="hljs-string">&#x27;xmin&#x27;</span>: <span class="hljs-number">1</span>, <span class="hljs-string">&#x27;ymin&#x27;</span>: <span class="hljs-number">1</span>, <span class="hljs-string">&#x27;xmax&#x27;</span>: <span class="hljs-number">105</span>, <span class="hljs-string">&#x27;ymax&#x27;</span>: <span class="hljs-number">509</span>}},
{<span class="hljs-string">&#x27;score&#x27;</span>: <span class="hljs-number">0.11950037628412247</span>,
<span class="hljs-string">&#x27;label&#x27;</span>: <span class="hljs-string">&#x27;nasa badge&#x27;</span>,
<span class="hljs-string">&#x27;box&#x27;</span>: {<span class="hljs-string">&#x27;xmin&#x27;</span>: <span class="hljs-number">277</span>, <span class="hljs-string">&#x27;ymin&#x27;</span>: <span class="hljs-number">338</span>, <span class="hljs-string">&#x27;xmax&#x27;</span>: <span class="hljs-number">327</span>, <span class="hljs-string">&#x27;ymax&#x27;</span>: <span class="hljs-number">380</span>}},
{<span class="hljs-string">&#x27;score&#x27;</span>: <span class="hljs-number">0.10649408400058746</span>,
<span class="hljs-string">&#x27;label&#x27;</span>: <span class="hljs-string">&#x27;rocket&#x27;</span>,
<span class="hljs-string">&#x27;box&#x27;</span>: {<span class="hljs-string">&#x27;xmin&#x27;</span>: <span class="hljs-number">358</span>, <span class="hljs-string">&#x27;ymin&#x27;</span>: <span class="hljs-number">64</span>, <span class="hljs-string">&#x27;xmax&#x27;</span>: <span class="hljs-number">424</span>, <span class="hljs-string">&#x27;ymax&#x27;</span>: <span class="hljs-number">280</span>}}]<!-- HTML_TAG_END --></pre></div>
<p>이제 예측값을 시각화해봅시다:</p>
<div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg>
<div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div>
Copied</div></button></div>
<pre><!-- HTML_TAG_START --><span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-keyword">from</span> PIL <span class="hljs-keyword">import</span> ImageDraw
<span class="hljs-meta">&gt;&gt;&gt; </span>draw = ImageDraw.Draw(image)
<span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-keyword">for</span> prediction <span class="hljs-keyword">in</span> predictions:
<span class="hljs-meta">... </span> box = prediction[<span class="hljs-string">&quot;box&quot;</span>]
<span class="hljs-meta">... </span> label = prediction[<span class="hljs-string">&quot;label&quot;</span>]
<span class="hljs-meta">... </span> score = prediction[<span class="hljs-string">&quot;score&quot;</span>]
<span class="hljs-meta">... </span> xmin, ymin, xmax, ymax = box.values()
<span class="hljs-meta">... </span> draw.rectangle((xmin, ymin, xmax, ymax), outline=<span class="hljs-string">&quot;red&quot;</span>, width=<span class="hljs-number">1</span>)
<span class="hljs-meta">... </span> draw.text((xmin, ymin), <span class="hljs-string">f&quot;<span class="hljs-subst">{label}</span>: <span class="hljs-subst">{<span class="hljs-built_in">round</span>(score,<span class="hljs-number">2</span>)}</span>&quot;</span>, fill=<span class="hljs-string">&quot;white&quot;</span>)
<span class="hljs-meta">&gt;&gt;&gt; </span>image<!-- HTML_TAG_END --></pre></div>
<div class="flex justify-center"><img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/zero-sh-obj-detection_2.png" alt="Visualized predictions on NASA image"></div>
<h2 class="relative group"><a id="textprompted-zeroshot-object-detection-by-hand" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#textprompted-zeroshot-object-detection-by-hand"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a>
<span>텍스트 프롬프트 기반 객체 탐지
</span></h2>
<p>제로샷 객체 탐지 파이프라인 사용법에 대해 살펴보았으니, 이제 동일한 결과를 복제해보겠습니다.</p>
<p><a href="https://huggingface.co/models?other=owlvit" rel="nofollow">Hugging Face Hub에 업로드된 체크포인트</a>에서 관련 모델과 프로세서를 가져오는 것으로 시작합니다.
여기서는 이전과 동일한 체크포인트를 사용하겠습니다:</p>
<div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg>
<div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div>
Copied</div></button></div>
<pre><!-- HTML_TAG_START --><span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-keyword">from</span> transformers <span class="hljs-keyword">import</span> AutoProcessor, AutoModelForZeroShotObjectDetection
<span class="hljs-meta">&gt;&gt;&gt; </span>model = AutoModelForZeroShotObjectDetection.from_pretrained(checkpoint)
<span class="hljs-meta">&gt;&gt;&gt; </span>processor = AutoProcessor.from_pretrained(checkpoint)<!-- HTML_TAG_END --></pre></div>
<p>다른 이미지를 사용해 보겠습니다:</p>
<div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg>
<div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div>
Copied</div></button></div>
<pre><!-- HTML_TAG_START --><span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-keyword">import</span> requests
<span class="hljs-meta">&gt;&gt;&gt; </span>url = <span class="hljs-string">&quot;https://unsplash.com/photos/oj0zeY2Ltk4/download?ixid=MnwxMjA3fDB8MXxzZWFyY2h8MTR8fHBpY25pY3xlbnwwfHx8fDE2Nzc0OTE1NDk&amp;force=true&amp;w=640&quot;</span>
<span class="hljs-meta">&gt;&gt;&gt; </span>im = Image.<span class="hljs-built_in">open</span>(requests.get(url, stream=<span class="hljs-literal">True</span>).raw)
<span class="hljs-meta">&gt;&gt;&gt; </span>im<!-- HTML_TAG_END --></pre></div>
<div class="flex justify-center"><img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/zero-sh-obj-detection_3.png" alt="Beach photo"></div>
<p>프로세서를 사용해 모델의 입력을 준비합니다.
프로세서는 모델의 입력으로 사용하기 위해 이미지 크기를 변환하고 정규화하는 이미지 프로세서와 텍스트 입력을 처리하는 <code>CLIPTokenizer</code>로 구성됩니다.</p>
<div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg>
<div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div>
Copied</div></button></div>
<pre><!-- HTML_TAG_START --><span class="hljs-meta">&gt;&gt;&gt; </span>text_queries = [<span class="hljs-string">&quot;hat&quot;</span>, <span class="hljs-string">&quot;book&quot;</span>, <span class="hljs-string">&quot;sunglasses&quot;</span>, <span class="hljs-string">&quot;camera&quot;</span>]
<span class="hljs-meta">&gt;&gt;&gt; </span>inputs = processor(text=text_queries, images=im, return_tensors=<span class="hljs-string">&quot;pt&quot;</span>)<!-- HTML_TAG_END --></pre></div>
<p>모델에 입력을 전달하고 결과를 후처리 및 시각화합니다.
이미지 프로세서가 모델에 이미지를 입력하기 전에 이미지 크기를 조정했기 때문에, <code>post_process_object_detection()</code> 메소드를 사용해
예측값의 바운딩 박스(bounding box)가 원본 이미지의 좌표와 상대적으로 동일한지 확인해야 합니다.</p>
<div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg>
<div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div>
Copied</div></button></div>
<pre><!-- HTML_TAG_START --><span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-keyword">import</span> torch
<span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-keyword">with</span> torch.no_grad():
<span class="hljs-meta">... </span> outputs = model(**inputs)
<span class="hljs-meta">... </span> target_sizes = torch.tensor([im.size[::-<span class="hljs-number">1</span>]])
<span class="hljs-meta">... </span> results = processor.post_process_object_detection(outputs, threshold=<span class="hljs-number">0.1</span>, target_sizes=target_sizes)[<span class="hljs-number">0</span>]
<span class="hljs-meta">&gt;&gt;&gt; </span>draw = ImageDraw.Draw(im)
<span class="hljs-meta">&gt;&gt;&gt; </span>scores = results[<span class="hljs-string">&quot;scores&quot;</span>].tolist()
<span class="hljs-meta">&gt;&gt;&gt; </span>labels = results[<span class="hljs-string">&quot;labels&quot;</span>].tolist()
<span class="hljs-meta">&gt;&gt;&gt; </span>boxes = results[<span class="hljs-string">&quot;boxes&quot;</span>].tolist()
<span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-keyword">for</span> box, score, label <span class="hljs-keyword">in</span> <span class="hljs-built_in">zip</span>(boxes, scores, labels):
<span class="hljs-meta">... </span> xmin, ymin, xmax, ymax = box
<span class="hljs-meta">... </span> draw.rectangle((xmin, ymin, xmax, ymax), outline=<span class="hljs-string">&quot;red&quot;</span>, width=<span class="hljs-number">1</span>)
<span class="hljs-meta">... </span> draw.text((xmin, ymin), <span class="hljs-string">f&quot;<span class="hljs-subst">{text_queries[label]}</span>: <span class="hljs-subst">{<span class="hljs-built_in">round</span>(score,<span class="hljs-number">2</span>)}</span>&quot;</span>, fill=<span class="hljs-string">&quot;white&quot;</span>)
<span class="hljs-meta">&gt;&gt;&gt; </span>im<!-- HTML_TAG_END --></pre></div>
<div class="flex justify-center"><img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/zero-sh-obj-detection_4.png" alt="Beach photo with detected objects"></div>
<h2 class="relative group"><a id="batch-processing" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#batch-processing"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a>
<span>일괄 처리
</span></h2>
<p>여러 이미지와 텍스트 쿼리를 전달하여 여러 이미지에서 서로 다른(또는 동일한) 객체를 검색할 수 있습니다.
일괄 처리를 위해서 텍스트 쿼리는 이중 리스트로, 이미지는 PIL 이미지, PyTorch 텐서, 또는 NumPy 배열로 이루어진 리스트로 프로세서에 전달해야 합니다.</p>
<div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg>
<div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div>
Copied</div></button></div>
<pre><!-- HTML_TAG_START --><span class="hljs-meta">&gt;&gt;&gt; </span>images = [image, im]
<span class="hljs-meta">&gt;&gt;&gt; </span>text_queries = [
<span class="hljs-meta">... </span> [<span class="hljs-string">&quot;human face&quot;</span>, <span class="hljs-string">&quot;rocket&quot;</span>, <span class="hljs-string">&quot;nasa badge&quot;</span>, <span class="hljs-string">&quot;star-spangled banner&quot;</span>],
<span class="hljs-meta">... </span> [<span class="hljs-string">&quot;hat&quot;</span>, <span class="hljs-string">&quot;book&quot;</span>, <span class="hljs-string">&quot;sunglasses&quot;</span>, <span class="hljs-string">&quot;camera&quot;</span>],
<span class="hljs-meta">... </span>]
<span class="hljs-meta">&gt;&gt;&gt; </span>inputs = processor(text=text_queries, images=images, return_tensors=<span class="hljs-string">&quot;pt&quot;</span>)<!-- HTML_TAG_END --></pre></div>
<p>이전에는 후처리를 위해 단일 이미지의 크기를 텐서로 전달했지만, 튜플을 전달할 수 있고, 여러 이미지를 처리하는 경우에는 튜플로 이루어진 리스트를 전달할 수도 있습니다.
아래 두 예제에 대한 예측을 생성하고, 두 번째 이미지(<code>image_idx = 1</code>)를 시각화해 보겠습니다.</p>
<div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg>
<div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div>
Copied</div></button></div>
<pre><!-- HTML_TAG_START --><span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-keyword">with</span> torch.no_grad():
<span class="hljs-meta">... </span> outputs = model(**inputs)
<span class="hljs-meta">... </span> target_sizes = [x.size[::-<span class="hljs-number">1</span>] <span class="hljs-keyword">for</span> x <span class="hljs-keyword">in</span> images]
<span class="hljs-meta">... </span> results = processor.post_process_object_detection(outputs, threshold=<span class="hljs-number">0.1</span>, target_sizes=target_sizes)
<span class="hljs-meta">&gt;&gt;&gt; </span>image_idx = <span class="hljs-number">1</span>
<span class="hljs-meta">&gt;&gt;&gt; </span>draw = ImageDraw.Draw(images[image_idx])
<span class="hljs-meta">&gt;&gt;&gt; </span>scores = results[image_idx][<span class="hljs-string">&quot;scores&quot;</span>].tolist()
<span class="hljs-meta">&gt;&gt;&gt; </span>labels = results[image_idx][<span class="hljs-string">&quot;labels&quot;</span>].tolist()
<span class="hljs-meta">&gt;&gt;&gt; </span>boxes = results[image_idx][<span class="hljs-string">&quot;boxes&quot;</span>].tolist()
<span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-keyword">for</span> box, score, label <span class="hljs-keyword">in</span> <span class="hljs-built_in">zip</span>(boxes, scores, labels):
<span class="hljs-meta">... </span> xmin, ymin, xmax, ymax = box
<span class="hljs-meta">... </span> draw.rectangle((xmin, ymin, xmax, ymax), outline=<span class="hljs-string">&quot;red&quot;</span>, width=<span class="hljs-number">1</span>)
<span class="hljs-meta">... </span> draw.text((xmin, ymin), <span class="hljs-string">f&quot;<span class="hljs-subst">{text_queries[image_idx][label]}</span>: <span class="hljs-subst">{<span class="hljs-built_in">round</span>(score,<span class="hljs-number">2</span>)}</span>&quot;</span>, fill=<span class="hljs-string">&quot;white&quot;</span>)
<span class="hljs-meta">&gt;&gt;&gt; </span>images[image_idx]<!-- HTML_TAG_END --></pre></div>
<div class="flex justify-center"><img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/zero-sh-obj-detection_4.png" alt="Beach photo with detected objects"></div>
<h2 class="relative group"><a id="imageguided-object-detection" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#imageguided-object-detection"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a>
<span>이미지 가이드 객체 탐지
</span></h2>
<p>텍스트 쿼리를 이용한 제로샷 객체 탐지 외에도 OWL-ViT 모델은 이미지 가이드 객체 탐지 기능을 제공합니다.
이미지를 쿼리로 사용해 대상 이미지에서 유사한 객체를 찾을 수 있다는 의미입니다.
텍스트 쿼리와 달리 하나의 예제 이미지에서만 가능합니다.</p>
<p>소파에 고양이 두 마리가 있는 이미지를 대상 이미지(target image)로, 고양이 한 마리가 있는 이미지를 쿼리로 사용해보겠습니다:</p>
<div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg>
<div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div>
Copied</div></button></div>
<pre><!-- HTML_TAG_START --><span class="hljs-meta">&gt;&gt;&gt; </span>url = <span class="hljs-string">&quot;http://images.cocodataset.org/val2017/000000039769.jpg&quot;</span>
<span class="hljs-meta">&gt;&gt;&gt; </span>image_target = Image.<span class="hljs-built_in">open</span>(requests.get(url, stream=<span class="hljs-literal">True</span>).raw)
<span class="hljs-meta">&gt;&gt;&gt; </span>query_url = <span class="hljs-string">&quot;http://images.cocodataset.org/val2017/000000524280.jpg&quot;</span>
<span class="hljs-meta">&gt;&gt;&gt; </span>query_image = Image.<span class="hljs-built_in">open</span>(requests.get(query_url, stream=<span class="hljs-literal">True</span>).raw)<!-- HTML_TAG_END --></pre></div>
<p>다음 이미지를 살펴보겠습니다:</p>
<div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg>
<div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div>
Copied</div></button></div>
<pre><!-- HTML_TAG_START --><span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-keyword">import</span> matplotlib.pyplot <span class="hljs-keyword">as</span> plt
<span class="hljs-meta">&gt;&gt;&gt; </span>fig, ax = plt.subplots(<span class="hljs-number">1</span>, <span class="hljs-number">2</span>)
<span class="hljs-meta">&gt;&gt;&gt; </span>ax[<span class="hljs-number">0</span>].imshow(image_target)
<span class="hljs-meta">&gt;&gt;&gt; </span>ax[<span class="hljs-number">1</span>].imshow(query_image)<!-- HTML_TAG_END --></pre></div>
<div class="flex justify-center"><img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/zero-sh-obj-detection_5.png" alt="Cats"></div>
<p>전처리 단계에서 텍스트 쿼리 대신에 <code>query_images</code>를 사용합니다:</p>
<div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg>
<div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div>
Copied</div></button></div>
<pre><!-- HTML_TAG_START --><span class="hljs-meta">&gt;&gt;&gt; </span>inputs = processor(images=image_target, query_images=query_image, return_tensors=<span class="hljs-string">&quot;pt&quot;</span>)<!-- HTML_TAG_END --></pre></div>
<p>예측의 경우, 모델에 입력을 전달하는 대신 <code>image_guided_detection()</code>에 전달합니다.
레이블이 없다는 점을 제외하면 이전과 동일합니다.
이전과 동일하게 이미지를 시각화합니다.</p>
<div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg>
<div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div>
Copied</div></button></div>
<pre><!-- HTML_TAG_START --><span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-keyword">with</span> torch.no_grad():
<span class="hljs-meta">... </span> outputs = model.image_guided_detection(**inputs)
<span class="hljs-meta">... </span> target_sizes = torch.tensor([image_target.size[::-<span class="hljs-number">1</span>]])
<span class="hljs-meta">... </span> results = processor.post_process_image_guided_detection(outputs=outputs, target_sizes=target_sizes)[<span class="hljs-number">0</span>]
<span class="hljs-meta">&gt;&gt;&gt; </span>draw = ImageDraw.Draw(image_target)
<span class="hljs-meta">&gt;&gt;&gt; </span>scores = results[<span class="hljs-string">&quot;scores&quot;</span>].tolist()
<span class="hljs-meta">&gt;&gt;&gt; </span>boxes = results[<span class="hljs-string">&quot;boxes&quot;</span>].tolist()
<span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-keyword">for</span> box, score, label <span class="hljs-keyword">in</span> <span class="hljs-built_in">zip</span>(boxes, scores, labels):
<span class="hljs-meta">... </span> xmin, ymin, xmax, ymax = box
<span class="hljs-meta">... </span> draw.rectangle((xmin, ymin, xmax, ymax), outline=<span class="hljs-string">&quot;white&quot;</span>, width=<span class="hljs-number">4</span>)
<span class="hljs-meta">&gt;&gt;&gt; </span>image_target<!-- HTML_TAG_END --></pre></div>
<div class="flex justify-center"><img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/zero-sh-obj-detection_6.png" alt="Cats with bounding boxes"></div>
<p>OWL-ViT 모델을 추론하고 싶다면 아래 데모를 확인하세요:</p>
<iframe src="https://adirik-owl-vit.hf.space" frameborder="0" width="850" height="450"></iframe>
<script type="module" data-hydrate="n6c9o4">
import { start } from "/docs/transformers/v4.30.2/ko/_app/start-hf-doc-builder.js";
start({
target: document.querySelector('[data-hydrate="n6c9o4"]').parentNode,
paths: {"base":"/docs/transformers/v4.30.2/ko","assets":"/docs/transformers/v4.30.2/ko"},
session: {},
route: false,
spa: false,
trailing_slash: "never",
hydrate: {
status: 200,
error: null,
nodes: [
import("/docs/transformers/v4.30.2/ko/_app/pages/__layout.svelte-hf-doc-builder.js"),
import("/docs/transformers/v4.30.2/ko/_app/pages/tasks/zero_shot_object_detection.mdx-hf-doc-builder.js")
],
params: {}
}
});
</script>

Xet Storage Details

Size:
50 kB
·
Xet hash:
080eb52d83156ff27dc4c7ffc9ea7bc9996f1c5b2889c9b1bc4996d868471e2f

Xet efficiently stores files, intelligently splitting them into unique chunks and accelerating uploads and downloads. More info.