Buckets:

hf-doc-build/doc-dev / transformers /main /ko /tasks /zero_shot_object_detection.html
rtrm's picture
download
raw
53 kB
<meta charset="utf-8" /><meta name="hf:doc:metadata" content="{&quot;title&quot;:&quot;제로샷(zero-shot) 객체 탐지&quot;,&quot;local&quot;:&quot;zeroshot-object-detection&quot;,&quot;sections&quot;:[{&quot;title&quot;:&quot;제로샷(zero-shot) 객체 탐지 파이프라인&quot;,&quot;local&quot;:&quot;zeroshot-object-detection-pipeline&quot;,&quot;sections&quot;:[],&quot;depth&quot;:2},{&quot;title&quot;:&quot;텍스트 프롬프트 기반 객체 탐지&quot;,&quot;local&quot;:&quot;textprompted-zeroshot-object-detection-by-hand&quot;,&quot;sections&quot;:[],&quot;depth&quot;:2},{&quot;title&quot;:&quot;일괄 처리&quot;,&quot;local&quot;:&quot;batch-processing&quot;,&quot;sections&quot;:[],&quot;depth&quot;:2},{&quot;title&quot;:&quot;이미지 가이드 객체 탐지&quot;,&quot;local&quot;:&quot;imageguided-object-detection&quot;,&quot;sections&quot;:[],&quot;depth&quot;:2}],&quot;depth&quot;:1}">
<link href="/docs/transformers/main/ko/_app/immutable/assets/0.e3b0c442.css" rel="modulepreload">
<link rel="modulepreload" href="/docs/transformers/main/ko/_app/immutable/entry/start.9aa88961.js">
<link rel="modulepreload" href="/docs/transformers/main/ko/_app/immutable/chunks/scheduler.9bc65507.js">
<link rel="modulepreload" href="/docs/transformers/main/ko/_app/immutable/chunks/singletons.9eec45c3.js">
<link rel="modulepreload" href="/docs/transformers/main/ko/_app/immutable/chunks/index.3b203c72.js">
<link rel="modulepreload" href="/docs/transformers/main/ko/_app/immutable/chunks/paths.566078f7.js">
<link rel="modulepreload" href="/docs/transformers/main/ko/_app/immutable/entry/app.84fb67c3.js">
<link rel="modulepreload" href="/docs/transformers/main/ko/_app/immutable/chunks/index.707bf1b6.js">
<link rel="modulepreload" href="/docs/transformers/main/ko/_app/immutable/nodes/0.1c99376b.js">
<link rel="modulepreload" href="/docs/transformers/main/ko/_app/immutable/chunks/each.e59479a4.js">
<link rel="modulepreload" href="/docs/transformers/main/ko/_app/immutable/nodes/86.e3f144f4.js">
<link rel="modulepreload" href="/docs/transformers/main/ko/_app/immutable/chunks/CodeBlock.54a9f38d.js">
<link rel="modulepreload" href="/docs/transformers/main/ko/_app/immutable/chunks/DocNotebookDropdown.41f65cb5.js">
<link rel="modulepreload" href="/docs/transformers/main/ko/_app/immutable/chunks/globals.7f7f1b26.js">
<link rel="modulepreload" href="/docs/transformers/main/ko/_app/immutable/chunks/EditOnGithub.922df6ba.js"><!-- HEAD_svelte-u9bgzb_START --><meta name="hf:doc:metadata" content="{&quot;title&quot;:&quot;제로샷(zero-shot) 객체 탐지&quot;,&quot;local&quot;:&quot;zeroshot-object-detection&quot;,&quot;sections&quot;:[{&quot;title&quot;:&quot;제로샷(zero-shot) 객체 탐지 파이프라인&quot;,&quot;local&quot;:&quot;zeroshot-object-detection-pipeline&quot;,&quot;sections&quot;:[],&quot;depth&quot;:2},{&quot;title&quot;:&quot;텍스트 프롬프트 기반 객체 탐지&quot;,&quot;local&quot;:&quot;textprompted-zeroshot-object-detection-by-hand&quot;,&quot;sections&quot;:[],&quot;depth&quot;:2},{&quot;title&quot;:&quot;일괄 처리&quot;,&quot;local&quot;:&quot;batch-processing&quot;,&quot;sections&quot;:[],&quot;depth&quot;:2},{&quot;title&quot;:&quot;이미지 가이드 객체 탐지&quot;,&quot;local&quot;:&quot;imageguided-object-detection&quot;,&quot;sections&quot;:[],&quot;depth&quot;:2}],&quot;depth&quot;:1}"><!-- HEAD_svelte-u9bgzb_END --> <p></p> <h1 class="relative group"><a id="zeroshot-object-detection" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#zeroshot-object-detection"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>제로샷(zero-shot) 객체 탐지</span></h1> <div class="flex space-x-1 absolute z-10 right-0 top-0"> <div class="relative colab-dropdown "> <button class=" " type="button"> <img alt="Open In Colab" class="!m-0" src="https://colab.research.google.com/assets/colab-badge.svg"> </button> </div> <div class="relative colab-dropdown "> <button class=" " type="button"> <img alt="Open In Studio Lab" class="!m-0" src="https://studiolab.sagemaker.aws/studiolab.svg"> </button> </div></div> <p data-svelte-h="svelte-6eje4v">일반적으로 <a href="object_detection">객체 탐지</a>에 사용되는 모델을 학습하기 위해서는 레이블이 지정된 이미지 데이터 세트가 필요합니다.
그리고 학습 데이터에 존재하는 클래스(레이블)만 탐지할 수 있다는 한계점이 있습니다.</p> <p data-svelte-h="svelte-17o36b3">다른 방식을 사용하는 <a href="../model_doc/owlvit">OWL-ViT</a> 모델로 제로샷 객체 탐지가 가능합니다.
OWL-ViT는 개방형 어휘(open-vocabulary) 객체 탐지기입니다.
즉, 레이블이 지정된 데이터 세트에 미세 조정하지 않고 자유 텍스트 쿼리를 기반으로 이미지에서 객체를 탐지할 수 있습니다.</p> <p data-svelte-h="svelte-1148tws">OWL-ViT 모델은 멀티 모달 표현을 활용해 개방형 어휘 탐지(open-vocabulary detection)를 수행합니다.
<a href="../model_doc/clip">CLIP</a> 모델에 경량화(lightweight)된 객체 분류와 지역화(localization) 헤드를 결합합니다.
개방형 어휘 탐지는 CLIP의 텍스트 인코더로 free-text 쿼리를 임베딩하고, 객체 분류와 지역화 헤드의 입력으로 사용합니다.
이미지와 해당 텍스트 설명을 연결하면 ViT가 이미지 패치(image patches)를 입력으로 처리합니다.
OWL-ViT 모델의 저자들은 CLIP 모델을 처음부터 학습(scratch learning)한 후에, bipartite matching loss를 사용하여 표준 객체 인식 데이터셋으로 OWL-ViT 모델을 미세 조정했습니다.</p> <p data-svelte-h="svelte-1au2n2f">이 접근 방식을 사용하면 모델은 레이블이 지정된 데이터 세트에 대한 사전 학습 없이도 텍스트 설명을 기반으로 객체를 탐지할 수 있습니다.</p> <p data-svelte-h="svelte-114e4n4">이번 가이드에서는 OWL-ViT 모델의 사용법을 다룰 것입니다:</p> <ul data-svelte-h="svelte-m6pgk5"><li>텍스트 프롬프트 기반 객체 탐지</li> <li>일괄 객체 탐지</li> <li>이미지 가이드 객체 탐지</li></ul> <p data-svelte-h="svelte-1k0z9pm">시작하기 전에 필요한 라이브러리가 모두 설치되어 있는지 확인하세요:</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START -->pip install -q transformers<!-- HTML_TAG_END --></pre></div> <h2 class="relative group"><a id="zeroshot-object-detection-pipeline" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#zeroshot-object-detection-pipeline"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>제로샷(zero-shot) 객체 탐지 파이프라인</span></h2> <p data-svelte-h="svelte-1nqv12v"><code>pipeline()</code>을 활용하면 가장 간단하게 OWL-ViT 모델을 추론해볼 수 있습니다.
<a href="https://huggingface.co/models?pipeline_tag=zero-shot-image-classification&sort=downloads" rel="nofollow">Hugging Face Hub에 업로드된 체크포인트</a>에서 제로샷(zero-shot) 객체 탐지용 파이프라인을 인스턴스화합니다:</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-keyword">from</span> transformers <span class="hljs-keyword">import</span> pipeline
<span class="hljs-meta">&gt;&gt;&gt; </span>checkpoint = <span class="hljs-string">&quot;google/owlvit-base-patch32&quot;</span>
<span class="hljs-meta">&gt;&gt;&gt; </span>detector = pipeline(model=checkpoint, task=<span class="hljs-string">&quot;zero-shot-object-detection&quot;</span>)<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-1p9c2je">다음으로, 객체를 탐지하고 싶은 이미지를 선택하세요.
여기서는 <a href="https://www.nasa.gov/multimedia/imagegallery/index.html" rel="nofollow">NASA</a> Great Images 데이터 세트의 일부인 우주비행사 에일린 콜린스(Eileen Collins) 사진을 사용하겠습니다.</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-keyword">import</span> skimage
<span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np
<span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-keyword">from</span> PIL <span class="hljs-keyword">import</span> Image
<span class="hljs-meta">&gt;&gt;&gt; </span>image = skimage.data.astronaut()
<span class="hljs-meta">&gt;&gt;&gt; </span>image = Image.fromarray(np.uint8(image)).convert(<span class="hljs-string">&quot;RGB&quot;</span>)
<span class="hljs-meta">&gt;&gt;&gt; </span>image<!-- HTML_TAG_END --></pre></div> <div class="flex justify-center" data-svelte-h="svelte-17qmfee"><img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/zero-sh-obj-detection_1.png" alt="Astronaut Eileen Collins"></div> <p data-svelte-h="svelte-1argm47">이미지와 해당 이미지의 후보 레이블을 파이프라인으로 전달합니다.
여기서는 이미지를 직접 전달하지만, 컴퓨터에 저장된 이미지의 경로나 url로 전달할 수도 있습니다.
candidate_labels는 이 예시처럼 간단한 단어일 수도 있고 좀 더 설명적인 단어일 수도 있습니다.
또한, 이미지를 검색(query)하려는 모든 항목에 대한 텍스트 설명도 전달합니다.</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">&gt;&gt;&gt; </span>predictions = detector(
<span class="hljs-meta">... </span> image,
<span class="hljs-meta">... </span> candidate_labels=[<span class="hljs-string">&quot;human face&quot;</span>, <span class="hljs-string">&quot;rocket&quot;</span>, <span class="hljs-string">&quot;nasa badge&quot;</span>, <span class="hljs-string">&quot;star-spangled banner&quot;</span>],
<span class="hljs-meta">... </span>)
<span class="hljs-meta">&gt;&gt;&gt; </span>predictions
[{<span class="hljs-string">&#x27;score&#x27;</span>: <span class="hljs-number">0.3571370542049408</span>,
<span class="hljs-string">&#x27;label&#x27;</span>: <span class="hljs-string">&#x27;human face&#x27;</span>,
<span class="hljs-string">&#x27;box&#x27;</span>: {<span class="hljs-string">&#x27;xmin&#x27;</span>: <span class="hljs-number">180</span>, <span class="hljs-string">&#x27;ymin&#x27;</span>: <span class="hljs-number">71</span>, <span class="hljs-string">&#x27;xmax&#x27;</span>: <span class="hljs-number">271</span>, <span class="hljs-string">&#x27;ymax&#x27;</span>: <span class="hljs-number">178</span>}},
{<span class="hljs-string">&#x27;score&#x27;</span>: <span class="hljs-number">0.28099656105041504</span>,
<span class="hljs-string">&#x27;label&#x27;</span>: <span class="hljs-string">&#x27;nasa badge&#x27;</span>,
<span class="hljs-string">&#x27;box&#x27;</span>: {<span class="hljs-string">&#x27;xmin&#x27;</span>: <span class="hljs-number">129</span>, <span class="hljs-string">&#x27;ymin&#x27;</span>: <span class="hljs-number">348</span>, <span class="hljs-string">&#x27;xmax&#x27;</span>: <span class="hljs-number">206</span>, <span class="hljs-string">&#x27;ymax&#x27;</span>: <span class="hljs-number">427</span>}},
{<span class="hljs-string">&#x27;score&#x27;</span>: <span class="hljs-number">0.2110239565372467</span>,
<span class="hljs-string">&#x27;label&#x27;</span>: <span class="hljs-string">&#x27;rocket&#x27;</span>,
<span class="hljs-string">&#x27;box&#x27;</span>: {<span class="hljs-string">&#x27;xmin&#x27;</span>: <span class="hljs-number">350</span>, <span class="hljs-string">&#x27;ymin&#x27;</span>: -<span class="hljs-number">1</span>, <span class="hljs-string">&#x27;xmax&#x27;</span>: <span class="hljs-number">468</span>, <span class="hljs-string">&#x27;ymax&#x27;</span>: <span class="hljs-number">288</span>}},
{<span class="hljs-string">&#x27;score&#x27;</span>: <span class="hljs-number">0.13790413737297058</span>,
<span class="hljs-string">&#x27;label&#x27;</span>: <span class="hljs-string">&#x27;star-spangled banner&#x27;</span>,
<span class="hljs-string">&#x27;box&#x27;</span>: {<span class="hljs-string">&#x27;xmin&#x27;</span>: <span class="hljs-number">1</span>, <span class="hljs-string">&#x27;ymin&#x27;</span>: <span class="hljs-number">1</span>, <span class="hljs-string">&#x27;xmax&#x27;</span>: <span class="hljs-number">105</span>, <span class="hljs-string">&#x27;ymax&#x27;</span>: <span class="hljs-number">509</span>}},
{<span class="hljs-string">&#x27;score&#x27;</span>: <span class="hljs-number">0.11950037628412247</span>,
<span class="hljs-string">&#x27;label&#x27;</span>: <span class="hljs-string">&#x27;nasa badge&#x27;</span>,
<span class="hljs-string">&#x27;box&#x27;</span>: {<span class="hljs-string">&#x27;xmin&#x27;</span>: <span class="hljs-number">277</span>, <span class="hljs-string">&#x27;ymin&#x27;</span>: <span class="hljs-number">338</span>, <span class="hljs-string">&#x27;xmax&#x27;</span>: <span class="hljs-number">327</span>, <span class="hljs-string">&#x27;ymax&#x27;</span>: <span class="hljs-number">380</span>}},
{<span class="hljs-string">&#x27;score&#x27;</span>: <span class="hljs-number">0.10649408400058746</span>,
<span class="hljs-string">&#x27;label&#x27;</span>: <span class="hljs-string">&#x27;rocket&#x27;</span>,
<span class="hljs-string">&#x27;box&#x27;</span>: {<span class="hljs-string">&#x27;xmin&#x27;</span>: <span class="hljs-number">358</span>, <span class="hljs-string">&#x27;ymin&#x27;</span>: <span class="hljs-number">64</span>, <span class="hljs-string">&#x27;xmax&#x27;</span>: <span class="hljs-number">424</span>, <span class="hljs-string">&#x27;ymax&#x27;</span>: <span class="hljs-number">280</span>}}]<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-1uqzhy9">이제 예측값을 시각화해봅시다:</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-keyword">from</span> PIL <span class="hljs-keyword">import</span> ImageDraw
<span class="hljs-meta">&gt;&gt;&gt; </span>draw = ImageDraw.Draw(image)
<span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-keyword">for</span> prediction <span class="hljs-keyword">in</span> predictions:
<span class="hljs-meta">... </span> box = prediction[<span class="hljs-string">&quot;box&quot;</span>]
<span class="hljs-meta">... </span> label = prediction[<span class="hljs-string">&quot;label&quot;</span>]
<span class="hljs-meta">... </span> score = prediction[<span class="hljs-string">&quot;score&quot;</span>]
<span class="hljs-meta">... </span> xmin, ymin, xmax, ymax = box.values()
<span class="hljs-meta">... </span> draw.rectangle((xmin, ymin, xmax, ymax), outline=<span class="hljs-string">&quot;red&quot;</span>, width=<span class="hljs-number">1</span>)
<span class="hljs-meta">... </span> draw.text((xmin, ymin), <span class="hljs-string">f&quot;<span class="hljs-subst">{label}</span>: <span class="hljs-subst">{<span class="hljs-built_in">round</span>(score,<span class="hljs-number">2</span>)}</span>&quot;</span>, fill=<span class="hljs-string">&quot;white&quot;</span>)
<span class="hljs-meta">&gt;&gt;&gt; </span>image<!-- HTML_TAG_END --></pre></div> <div class="flex justify-center" data-svelte-h="svelte-1fwpqdn"><img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/zero-sh-obj-detection_2.png" alt="Visualized predictions on NASA image"></div> <h2 class="relative group"><a id="textprompted-zeroshot-object-detection-by-hand" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#textprompted-zeroshot-object-detection-by-hand"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>텍스트 프롬프트 기반 객체 탐지</span></h2> <p data-svelte-h="svelte-1ipgrv9">제로샷 객체 탐지 파이프라인 사용법에 대해 살펴보았으니, 이제 동일한 결과를 복제해보겠습니다.</p> <p data-svelte-h="svelte-qzn480"><a href="https://huggingface.co/models?other=owlvit" rel="nofollow">Hugging Face Hub에 업로드된 체크포인트</a>에서 관련 모델과 프로세서를 가져오는 것으로 시작합니다.
여기서는 이전과 동일한 체크포인트를 사용하겠습니다:</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-keyword">from</span> transformers <span class="hljs-keyword">import</span> AutoProcessor, AutoModelForZeroShotObjectDetection
<span class="hljs-meta">&gt;&gt;&gt; </span>model = AutoModelForZeroShotObjectDetection.from_pretrained(checkpoint)
<span class="hljs-meta">&gt;&gt;&gt; </span>processor = AutoProcessor.from_pretrained(checkpoint)<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-1ist5bc">다른 이미지를 사용해 보겠습니다:</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-keyword">import</span> requests
<span class="hljs-meta">&gt;&gt;&gt; </span>url = <span class="hljs-string">&quot;https://unsplash.com/photos/oj0zeY2Ltk4/download?ixid=MnwxMjA3fDB8MXxzZWFyY2h8MTR8fHBpY25pY3xlbnwwfHx8fDE2Nzc0OTE1NDk&amp;force=true&amp;w=640&quot;</span>
<span class="hljs-meta">&gt;&gt;&gt; </span>im = Image.<span class="hljs-built_in">open</span>(requests.get(url, stream=<span class="hljs-literal">True</span>).raw)
<span class="hljs-meta">&gt;&gt;&gt; </span>im<!-- HTML_TAG_END --></pre></div> <div class="flex justify-center" data-svelte-h="svelte-owux8y"><img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/zero-sh-obj-detection_3.png" alt="Beach photo"></div> <p data-svelte-h="svelte-18gx5db">프로세서를 사용해 모델의 입력을 준비합니다.
프로세서는 모델의 입력으로 사용하기 위해 이미지 크기를 변환하고 정규화하는 이미지 프로세서와 텍스트 입력을 처리하는 <code>CLIPTokenizer</code>로 구성됩니다.</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">&gt;&gt;&gt; </span>text_queries = [<span class="hljs-string">&quot;hat&quot;</span>, <span class="hljs-string">&quot;book&quot;</span>, <span class="hljs-string">&quot;sunglasses&quot;</span>, <span class="hljs-string">&quot;camera&quot;</span>]
<span class="hljs-meta">&gt;&gt;&gt; </span>inputs = processor(text=text_queries, images=im, return_tensors=<span class="hljs-string">&quot;pt&quot;</span>)<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-5kpzn3">모델에 입력을 전달하고 결과를 후처리 및 시각화합니다.
이미지 프로세서가 모델에 이미지를 입력하기 전에 이미지 크기를 조정했기 때문에, <code>post_process_object_detection()</code> 메소드를 사용해
예측값의 바운딩 박스(bounding box)가 원본 이미지의 좌표와 상대적으로 동일한지 확인해야 합니다.</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-keyword">import</span> torch
<span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-keyword">with</span> torch.no_grad():
<span class="hljs-meta">... </span> outputs = model(**inputs)
<span class="hljs-meta">... </span> target_sizes = torch.tensor([im.size[::-<span class="hljs-number">1</span>]])
<span class="hljs-meta">... </span> results = processor.post_process_object_detection(outputs, threshold=<span class="hljs-number">0.1</span>, target_sizes=target_sizes)[<span class="hljs-number">0</span>]
<span class="hljs-meta">&gt;&gt;&gt; </span>draw = ImageDraw.Draw(im)
<span class="hljs-meta">&gt;&gt;&gt; </span>scores = results[<span class="hljs-string">&quot;scores&quot;</span>].tolist()
<span class="hljs-meta">&gt;&gt;&gt; </span>labels = results[<span class="hljs-string">&quot;labels&quot;</span>].tolist()
<span class="hljs-meta">&gt;&gt;&gt; </span>boxes = results[<span class="hljs-string">&quot;boxes&quot;</span>].tolist()
<span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-keyword">for</span> box, score, label <span class="hljs-keyword">in</span> <span class="hljs-built_in">zip</span>(boxes, scores, labels):
<span class="hljs-meta">... </span> xmin, ymin, xmax, ymax = box
<span class="hljs-meta">... </span> draw.rectangle((xmin, ymin, xmax, ymax), outline=<span class="hljs-string">&quot;red&quot;</span>, width=<span class="hljs-number">1</span>)
<span class="hljs-meta">... </span> draw.text((xmin, ymin), <span class="hljs-string">f&quot;<span class="hljs-subst">{text_queries[label]}</span>: <span class="hljs-subst">{<span class="hljs-built_in">round</span>(score,<span class="hljs-number">2</span>)}</span>&quot;</span>, fill=<span class="hljs-string">&quot;white&quot;</span>)
<span class="hljs-meta">&gt;&gt;&gt; </span>im<!-- HTML_TAG_END --></pre></div> <div class="flex justify-center" data-svelte-h="svelte-1m863ar"><img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/zero-sh-obj-detection_4.png" alt="Beach photo with detected objects"></div> <h2 class="relative group"><a id="batch-processing" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#batch-processing"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>일괄 처리</span></h2> <p data-svelte-h="svelte-xr561s">여러 이미지와 텍스트 쿼리를 전달하여 여러 이미지에서 서로 다른(또는 동일한) 객체를 검색할 수 있습니다.
일괄 처리를 위해서 텍스트 쿼리는 이중 리스트로, 이미지는 PIL 이미지, PyTorch 텐서, 또는 NumPy 배열로 이루어진 리스트로 프로세서에 전달해야 합니다.</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">&gt;&gt;&gt; </span>images = [image, im]
<span class="hljs-meta">&gt;&gt;&gt; </span>text_queries = [
<span class="hljs-meta">... </span> [<span class="hljs-string">&quot;human face&quot;</span>, <span class="hljs-string">&quot;rocket&quot;</span>, <span class="hljs-string">&quot;nasa badge&quot;</span>, <span class="hljs-string">&quot;star-spangled banner&quot;</span>],
<span class="hljs-meta">... </span> [<span class="hljs-string">&quot;hat&quot;</span>, <span class="hljs-string">&quot;book&quot;</span>, <span class="hljs-string">&quot;sunglasses&quot;</span>, <span class="hljs-string">&quot;camera&quot;</span>],
<span class="hljs-meta">... </span>]
<span class="hljs-meta">&gt;&gt;&gt; </span>inputs = processor(text=text_queries, images=images, return_tensors=<span class="hljs-string">&quot;pt&quot;</span>)<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-1e7zpei">이전에는 후처리를 위해 단일 이미지의 크기를 텐서로 전달했지만, 튜플을 전달할 수 있고, 여러 이미지를 처리하는 경우에는 튜플로 이루어진 리스트를 전달할 수도 있습니다.
아래 두 예제에 대한 예측을 생성하고, 두 번째 이미지(<code>image_idx = 1</code>)를 시각화해 보겠습니다.</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-keyword">with</span> torch.no_grad():
<span class="hljs-meta">... </span> outputs = model(**inputs)
<span class="hljs-meta">... </span> target_sizes = [x.size[::-<span class="hljs-number">1</span>] <span class="hljs-keyword">for</span> x <span class="hljs-keyword">in</span> images]
<span class="hljs-meta">... </span> results = processor.post_process_object_detection(outputs, threshold=<span class="hljs-number">0.1</span>, target_sizes=target_sizes)
<span class="hljs-meta">&gt;&gt;&gt; </span>image_idx = <span class="hljs-number">1</span>
<span class="hljs-meta">&gt;&gt;&gt; </span>draw = ImageDraw.Draw(images[image_idx])
<span class="hljs-meta">&gt;&gt;&gt; </span>scores = results[image_idx][<span class="hljs-string">&quot;scores&quot;</span>].tolist()
<span class="hljs-meta">&gt;&gt;&gt; </span>labels = results[image_idx][<span class="hljs-string">&quot;labels&quot;</span>].tolist()
<span class="hljs-meta">&gt;&gt;&gt; </span>boxes = results[image_idx][<span class="hljs-string">&quot;boxes&quot;</span>].tolist()
<span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-keyword">for</span> box, score, label <span class="hljs-keyword">in</span> <span class="hljs-built_in">zip</span>(boxes, scores, labels):
<span class="hljs-meta">... </span> xmin, ymin, xmax, ymax = box
<span class="hljs-meta">... </span> draw.rectangle((xmin, ymin, xmax, ymax), outline=<span class="hljs-string">&quot;red&quot;</span>, width=<span class="hljs-number">1</span>)
<span class="hljs-meta">... </span> draw.text((xmin, ymin), <span class="hljs-string">f&quot;<span class="hljs-subst">{text_queries[image_idx][label]}</span>: <span class="hljs-subst">{<span class="hljs-built_in">round</span>(score,<span class="hljs-number">2</span>)}</span>&quot;</span>, fill=<span class="hljs-string">&quot;white&quot;</span>)
<span class="hljs-meta">&gt;&gt;&gt; </span>images[image_idx]<!-- HTML_TAG_END --></pre></div> <div class="flex justify-center" data-svelte-h="svelte-1m863ar"><img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/zero-sh-obj-detection_4.png" alt="Beach photo with detected objects"></div> <h2 class="relative group"><a id="imageguided-object-detection" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#imageguided-object-detection"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>이미지 가이드 객체 탐지</span></h2> <p data-svelte-h="svelte-1oyzclh">텍스트 쿼리를 이용한 제로샷 객체 탐지 외에도 OWL-ViT 모델은 이미지 가이드 객체 탐지 기능을 제공합니다.
이미지를 쿼리로 사용해 대상 이미지에서 유사한 객체를 찾을 수 있다는 의미입니다.
텍스트 쿼리와 달리 하나의 예제 이미지에서만 가능합니다.</p> <p data-svelte-h="svelte-3m9urc">소파에 고양이 두 마리가 있는 이미지를 대상 이미지(target image)로, 고양이 한 마리가 있는 이미지를 쿼리로 사용해보겠습니다:</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">&gt;&gt;&gt; </span>url = <span class="hljs-string">&quot;http://images.cocodataset.org/val2017/000000039769.jpg&quot;</span>
<span class="hljs-meta">&gt;&gt;&gt; </span>image_target = Image.<span class="hljs-built_in">open</span>(requests.get(url, stream=<span class="hljs-literal">True</span>).raw)
<span class="hljs-meta">&gt;&gt;&gt; </span>query_url = <span class="hljs-string">&quot;http://images.cocodataset.org/val2017/000000524280.jpg&quot;</span>
<span class="hljs-meta">&gt;&gt;&gt; </span>query_image = Image.<span class="hljs-built_in">open</span>(requests.get(query_url, stream=<span class="hljs-literal">True</span>).raw)<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-17azu4t">다음 이미지를 살펴보겠습니다:</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-keyword">import</span> matplotlib.pyplot <span class="hljs-keyword">as</span> plt
<span class="hljs-meta">&gt;&gt;&gt; </span>fig, ax = plt.subplots(<span class="hljs-number">1</span>, <span class="hljs-number">2</span>)
<span class="hljs-meta">&gt;&gt;&gt; </span>ax[<span class="hljs-number">0</span>].imshow(image_target)
<span class="hljs-meta">&gt;&gt;&gt; </span>ax[<span class="hljs-number">1</span>].imshow(query_image)<!-- HTML_TAG_END --></pre></div> <div class="flex justify-center" data-svelte-h="svelte-y78yu"><img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/zero-sh-obj-detection_5.png" alt="Cats"></div> <p data-svelte-h="svelte-1lh7bnx">전처리 단계에서 텍스트 쿼리 대신에 <code>query_images</code>를 사용합니다:</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">&gt;&gt;&gt; </span>inputs = processor(images=image_target, query_images=query_image, return_tensors=<span class="hljs-string">&quot;pt&quot;</span>)<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-c93o6k">예측의 경우, 모델에 입력을 전달하는 대신 <code>image_guided_detection()</code>에 전달합니다.
레이블이 없다는 점을 제외하면 이전과 동일합니다.
이전과 동일하게 이미지를 시각화합니다.</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-keyword">with</span> torch.no_grad():
<span class="hljs-meta">... </span> outputs = model.image_guided_detection(**inputs)
<span class="hljs-meta">... </span> target_sizes = torch.tensor([image_target.size[::-<span class="hljs-number">1</span>]])
<span class="hljs-meta">... </span> results = processor.post_process_image_guided_detection(outputs=outputs, target_sizes=target_sizes)[<span class="hljs-number">0</span>]
<span class="hljs-meta">&gt;&gt;&gt; </span>draw = ImageDraw.Draw(image_target)
<span class="hljs-meta">&gt;&gt;&gt; </span>scores = results[<span class="hljs-string">&quot;scores&quot;</span>].tolist()
<span class="hljs-meta">&gt;&gt;&gt; </span>boxes = results[<span class="hljs-string">&quot;boxes&quot;</span>].tolist()
<span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-keyword">for</span> box, score, label <span class="hljs-keyword">in</span> <span class="hljs-built_in">zip</span>(boxes, scores, labels):
<span class="hljs-meta">... </span> xmin, ymin, xmax, ymax = box
<span class="hljs-meta">... </span> draw.rectangle((xmin, ymin, xmax, ymax), outline=<span class="hljs-string">&quot;white&quot;</span>, width=<span class="hljs-number">4</span>)
<span class="hljs-meta">&gt;&gt;&gt; </span>image_target<!-- HTML_TAG_END --></pre></div> <div class="flex justify-center" data-svelte-h="svelte-1f4dev0"><img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/zero-sh-obj-detection_6.png" alt="Cats with bounding boxes"></div> <p data-svelte-h="svelte-16bo4q7">OWL-ViT 모델을 추론하고 싶다면 아래 데모를 확인하세요:</p> <iframe src="https://adirik-owl-vit.hf.space" frameborder="0" width="850" height="450"></iframe> <a class="!text-gray-400 !no-underline text-sm flex items-center not-prose mt-4" href="https://github.com/huggingface/transformers/blob/main/docs/source/ko/tasks/zero_shot_object_detection.md" target="_blank"><span data-svelte-h="svelte-1kd6by1">&lt;</span> <span data-svelte-h="svelte-x0xyl0">&gt;</span> <span data-svelte-h="svelte-1dajgef"><span class="underline ml-1.5">Update</span> on GitHub</span></a> <p></p>
<script>
{
__sveltekit_1hrx8 = {
assets: "/docs/transformers/main/ko",
base: "/docs/transformers/main/ko",
env: {}
};
const element = document.currentScript.parentElement;
const data = [null,null];
Promise.all([
import("/docs/transformers/main/ko/_app/immutable/entry/start.9aa88961.js"),
import("/docs/transformers/main/ko/_app/immutable/entry/app.84fb67c3.js")
]).then(([kit, app]) => {
kit.start(app, element, {
node_ids: [0, 86],
data,
form: null,
error: null
});
});
}
</script>

Xet Storage Details

Size:
53 kB
·
Xet hash:
3e1dbc1a2b02298856856d4ca7c639e1961aad156fd7daa19088675ff7ceabbe

Xet efficiently stores files, intelligently splitting them into unique chunks and accelerating uploads and downloads. More info.