Buckets:

hf-doc-build/doc-dev / transformers /main /ja /tasks /zero_shot_object_detection.html
rtrm's picture
download
raw
53.4 kB
<meta charset="utf-8" /><meta name="hf:doc:metadata" content="{&quot;title&quot;:&quot;Zero-shot object detection&quot;,&quot;local&quot;:&quot;zero-shot-object-detection&quot;,&quot;sections&quot;:[{&quot;title&quot;:&quot;Zero-shot object detection pipeline&quot;,&quot;local&quot;:&quot;zero-shot-object-detection-pipeline&quot;,&quot;sections&quot;:[],&quot;depth&quot;:2},{&quot;title&quot;:&quot;Text-prompted zero-shot object detection by hand&quot;,&quot;local&quot;:&quot;text-prompted-zero-shot-object-detection-by-hand&quot;,&quot;sections&quot;:[],&quot;depth&quot;:2},{&quot;title&quot;:&quot;Batch processing&quot;,&quot;local&quot;:&quot;batch-processing&quot;,&quot;sections&quot;:[],&quot;depth&quot;:2},{&quot;title&quot;:&quot;Image-guided object detection&quot;,&quot;local&quot;:&quot;image-guided-object-detection&quot;,&quot;sections&quot;:[],&quot;depth&quot;:2}],&quot;depth&quot;:1}">
<link href="/docs/transformers/main/ja/_app/immutable/assets/0.e3b0c442.css" rel="modulepreload">
<link rel="modulepreload" href="/docs/transformers/main/ja/_app/immutable/entry/start.1486e459.js">
<link rel="modulepreload" href="/docs/transformers/main/ja/_app/immutable/chunks/scheduler.9bc65507.js">
<link rel="modulepreload" href="/docs/transformers/main/ja/_app/immutable/chunks/singletons.eee55cbf.js">
<link rel="modulepreload" href="/docs/transformers/main/ja/_app/immutable/chunks/index.3b203c72.js">
<link rel="modulepreload" href="/docs/transformers/main/ja/_app/immutable/chunks/paths.59da1547.js">
<link rel="modulepreload" href="/docs/transformers/main/ja/_app/immutable/entry/app.d9ae818f.js">
<link rel="modulepreload" href="/docs/transformers/main/ja/_app/immutable/chunks/index.707bf1b6.js">
<link rel="modulepreload" href="/docs/transformers/main/ja/_app/immutable/nodes/0.c06aa070.js">
<link rel="modulepreload" href="/docs/transformers/main/ja/_app/immutable/chunks/each.e59479a4.js">
<link rel="modulepreload" href="/docs/transformers/main/ja/_app/immutable/nodes/159.8ff646d4.js">
<link rel="modulepreload" href="/docs/transformers/main/ja/_app/immutable/chunks/CodeBlock.54a9f38d.js">
<link rel="modulepreload" href="/docs/transformers/main/ja/_app/immutable/chunks/DocNotebookDropdown.41f65cb5.js">
<link rel="modulepreload" href="/docs/transformers/main/ja/_app/immutable/chunks/globals.7f7f1b26.js">
<link rel="modulepreload" href="/docs/transformers/main/ja/_app/immutable/chunks/EditOnGithub.922df6ba.js"><!-- HEAD_svelte-u9bgzb_START --><meta name="hf:doc:metadata" content="{&quot;title&quot;:&quot;Zero-shot object detection&quot;,&quot;local&quot;:&quot;zero-shot-object-detection&quot;,&quot;sections&quot;:[{&quot;title&quot;:&quot;Zero-shot object detection pipeline&quot;,&quot;local&quot;:&quot;zero-shot-object-detection-pipeline&quot;,&quot;sections&quot;:[],&quot;depth&quot;:2},{&quot;title&quot;:&quot;Text-prompted zero-shot object detection by hand&quot;,&quot;local&quot;:&quot;text-prompted-zero-shot-object-detection-by-hand&quot;,&quot;sections&quot;:[],&quot;depth&quot;:2},{&quot;title&quot;:&quot;Batch processing&quot;,&quot;local&quot;:&quot;batch-processing&quot;,&quot;sections&quot;:[],&quot;depth&quot;:2},{&quot;title&quot;:&quot;Image-guided object detection&quot;,&quot;local&quot;:&quot;image-guided-object-detection&quot;,&quot;sections&quot;:[],&quot;depth&quot;:2}],&quot;depth&quot;:1}"><!-- HEAD_svelte-u9bgzb_END --> <p></p> <h1 class="relative group"><a id="zero-shot-object-detection" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#zero-shot-object-detection"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Zero-shot object detection</span></h1> <div class="flex space-x-1 absolute z-10 right-0 top-0"> <div class="relative colab-dropdown "> <button class=" " type="button"> <img alt="Open In Colab" class="!m-0" src="https://colab.research.google.com/assets/colab-badge.svg"> </button> </div> <div class="relative colab-dropdown "> <button class=" " type="button"> <img alt="Open In Studio Lab" class="!m-0" src="https://studiolab.sagemaker.aws/studiolab.svg"> </button> </div></div> <p data-svelte-h="svelte-1jmn15x">従来、<a href="object_detection">オブジェクト検出</a> に使用されるモデルには、トレーニング用のラベル付き画像データセットが必要でした。
トレーニング データからのクラスのセットの検出に限定されます。</p> <p data-svelte-h="svelte-316ad1">ゼロショットオブジェクト検出は、別のアプローチを使用する <a href="../model_doc/owlvit">OWL-ViT</a> モデルによってサポートされています。 OWL-ViT
オープン語彙オブジェクト検出器です。これは、フリーテキストクエリに基づいて画像内のオブジェクトを検出できることを意味します。
ラベル付きデータセットでモデルを微調整する必要性。</p> <p data-svelte-h="svelte-13iuwwr">OWL-ViTは、マルチモーダル表現を利用してオープン語彙の検出を実行します。 <a href="../model_doc/clip">CLIP</a> とを組み合わせます。
軽量のオブジェクト分類および位置特定ヘッド。オープン語彙の検出は、CLIP のテキスト エンコーダーを使用してフリーテキスト クエリを埋め込み、それらをオブジェクト分類およびローカリゼーション ヘッドへの入力として使用することによって実現されます。
画像とそれに対応するテキストの説明を関連付け、ViT は画像パッチを入力として処理します。作家たち
のOWL-ViTは、まずCLIPをゼロからトレーニングし、次に標準の物体検出データセットを使用してOWL-ViTをエンドツーエンドで微調整しました。
二部マッチング損失。</p> <p data-svelte-h="svelte-1o1n265">このアプローチを使用すると、モデルはラベル付きデータセットで事前にトレーニングしなくても、テキストの説明に基づいてオブジェクトを検出できます。</p> <p data-svelte-h="svelte-nsc34w">このガイドでは、OWL-ViT の使用方法を学習します。</p> <ul data-svelte-h="svelte-v788np"><li>テキストプロンプトに基づいてオブジェクトを検出します</li> <li>バッチオブジェクト検出用</li> <li>画像誘導物体検出用</li></ul> <p data-svelte-h="svelte-1lya3k8">始める前に、必要なライブラリがすべてインストールされていることを確認してください。</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START -->pip install -q transformers<!-- HTML_TAG_END --></pre></div> <h2 class="relative group"><a id="zero-shot-object-detection-pipeline" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#zero-shot-object-detection-pipeline"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Zero-shot object detection pipeline</span></h2> <p data-svelte-h="svelte-507qa0">OWL-ViTによる推論を試す最も簡単な方法は、OWL-ViTを<a href="/docs/transformers/main/ja/main_classes/pipelines#transformers.pipeline">pipeline()</a>で使用することです。パイプラインをインスタンス化する
<a href="https://huggingface.co/models?other=owlvit" rel="nofollow">Hugging Face Hub のチェックポイント</a> からのゼロショット オブジェクト検出の場合:</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-keyword">from</span> transformers <span class="hljs-keyword">import</span> pipeline
<span class="hljs-meta">&gt;&gt;&gt; </span>checkpoint = <span class="hljs-string">&quot;google/owlvit-base-patch32&quot;</span>
<span class="hljs-meta">&gt;&gt;&gt; </span>detector = pipeline(model=checkpoint, task=<span class="hljs-string">&quot;zero-shot-object-detection&quot;</span>)<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-t24qb7">次に、物体を検出したい画像を選択します。ここでは、宇宙飛行士アイリーン・コリンズの画像を使用します。
<a href="https://www.nasa.gov/multimedia/imagegallery/index.html" rel="nofollow">NASA</a> Great Images データセットの一部。</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-keyword">import</span> skimage
<span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np
<span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-keyword">from</span> PIL <span class="hljs-keyword">import</span> Image
<span class="hljs-meta">&gt;&gt;&gt; </span>image = skimage.data.astronaut()
<span class="hljs-meta">&gt;&gt;&gt; </span>image = Image.fromarray(np.uint8(image)).convert(<span class="hljs-string">&quot;RGB&quot;</span>)
<span class="hljs-meta">&gt;&gt;&gt; </span>image<!-- HTML_TAG_END --></pre></div> <div class="flex justify-center" data-svelte-h="svelte-17qmfee"><img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/zero-sh-obj-detection_1.png" alt="Astronaut Eileen Collins"></div> <p data-svelte-h="svelte-1pcmuvb">検索する画像と候補オブジェクトのラベルをパイプラインに渡します。
ここでは画像を直接渡します。他の適切なオプションには、画像へのローカル パスまたは画像 URL が含まれます。また、画像をクエリするすべてのアイテムのテキスト説明も渡します。</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">&gt;&gt;&gt; </span>predictions = detector(
<span class="hljs-meta">... </span> image,
<span class="hljs-meta">... </span> candidate_labels=[<span class="hljs-string">&quot;human face&quot;</span>, <span class="hljs-string">&quot;rocket&quot;</span>, <span class="hljs-string">&quot;nasa badge&quot;</span>, <span class="hljs-string">&quot;star-spangled banner&quot;</span>],
<span class="hljs-meta">... </span>)
<span class="hljs-meta">&gt;&gt;&gt; </span>predictions
[{<span class="hljs-string">&#x27;score&#x27;</span>: <span class="hljs-number">0.3571370542049408</span>,
<span class="hljs-string">&#x27;label&#x27;</span>: <span class="hljs-string">&#x27;human face&#x27;</span>,
<span class="hljs-string">&#x27;box&#x27;</span>: {<span class="hljs-string">&#x27;xmin&#x27;</span>: <span class="hljs-number">180</span>, <span class="hljs-string">&#x27;ymin&#x27;</span>: <span class="hljs-number">71</span>, <span class="hljs-string">&#x27;xmax&#x27;</span>: <span class="hljs-number">271</span>, <span class="hljs-string">&#x27;ymax&#x27;</span>: <span class="hljs-number">178</span>}},
{<span class="hljs-string">&#x27;score&#x27;</span>: <span class="hljs-number">0.28099656105041504</span>,
<span class="hljs-string">&#x27;label&#x27;</span>: <span class="hljs-string">&#x27;nasa badge&#x27;</span>,
<span class="hljs-string">&#x27;box&#x27;</span>: {<span class="hljs-string">&#x27;xmin&#x27;</span>: <span class="hljs-number">129</span>, <span class="hljs-string">&#x27;ymin&#x27;</span>: <span class="hljs-number">348</span>, <span class="hljs-string">&#x27;xmax&#x27;</span>: <span class="hljs-number">206</span>, <span class="hljs-string">&#x27;ymax&#x27;</span>: <span class="hljs-number">427</span>}},
{<span class="hljs-string">&#x27;score&#x27;</span>: <span class="hljs-number">0.2110239565372467</span>,
<span class="hljs-string">&#x27;label&#x27;</span>: <span class="hljs-string">&#x27;rocket&#x27;</span>,
<span class="hljs-string">&#x27;box&#x27;</span>: {<span class="hljs-string">&#x27;xmin&#x27;</span>: <span class="hljs-number">350</span>, <span class="hljs-string">&#x27;ymin&#x27;</span>: -<span class="hljs-number">1</span>, <span class="hljs-string">&#x27;xmax&#x27;</span>: <span class="hljs-number">468</span>, <span class="hljs-string">&#x27;ymax&#x27;</span>: <span class="hljs-number">288</span>}},
{<span class="hljs-string">&#x27;score&#x27;</span>: <span class="hljs-number">0.13790413737297058</span>,
<span class="hljs-string">&#x27;label&#x27;</span>: <span class="hljs-string">&#x27;star-spangled banner&#x27;</span>,
<span class="hljs-string">&#x27;box&#x27;</span>: {<span class="hljs-string">&#x27;xmin&#x27;</span>: <span class="hljs-number">1</span>, <span class="hljs-string">&#x27;ymin&#x27;</span>: <span class="hljs-number">1</span>, <span class="hljs-string">&#x27;xmax&#x27;</span>: <span class="hljs-number">105</span>, <span class="hljs-string">&#x27;ymax&#x27;</span>: <span class="hljs-number">509</span>}},
{<span class="hljs-string">&#x27;score&#x27;</span>: <span class="hljs-number">0.11950037628412247</span>,
<span class="hljs-string">&#x27;label&#x27;</span>: <span class="hljs-string">&#x27;nasa badge&#x27;</span>,
<span class="hljs-string">&#x27;box&#x27;</span>: {<span class="hljs-string">&#x27;xmin&#x27;</span>: <span class="hljs-number">277</span>, <span class="hljs-string">&#x27;ymin&#x27;</span>: <span class="hljs-number">338</span>, <span class="hljs-string">&#x27;xmax&#x27;</span>: <span class="hljs-number">327</span>, <span class="hljs-string">&#x27;ymax&#x27;</span>: <span class="hljs-number">380</span>}},
{<span class="hljs-string">&#x27;score&#x27;</span>: <span class="hljs-number">0.10649408400058746</span>,
<span class="hljs-string">&#x27;label&#x27;</span>: <span class="hljs-string">&#x27;rocket&#x27;</span>,
<span class="hljs-string">&#x27;box&#x27;</span>: {<span class="hljs-string">&#x27;xmin&#x27;</span>: <span class="hljs-number">358</span>, <span class="hljs-string">&#x27;ymin&#x27;</span>: <span class="hljs-number">64</span>, <span class="hljs-string">&#x27;xmax&#x27;</span>: <span class="hljs-number">424</span>, <span class="hljs-string">&#x27;ymax&#x27;</span>: <span class="hljs-number">280</span>}}]<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-3md4si">予測を視覚化してみましょう。</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-keyword">from</span> PIL <span class="hljs-keyword">import</span> ImageDraw
<span class="hljs-meta">&gt;&gt;&gt; </span>draw = ImageDraw.Draw(image)
<span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-keyword">for</span> prediction <span class="hljs-keyword">in</span> predictions:
<span class="hljs-meta">... </span> box = prediction[<span class="hljs-string">&quot;box&quot;</span>]
<span class="hljs-meta">... </span> label = prediction[<span class="hljs-string">&quot;label&quot;</span>]
<span class="hljs-meta">... </span> score = prediction[<span class="hljs-string">&quot;score&quot;</span>]
<span class="hljs-meta">... </span> xmin, ymin, xmax, ymax = box.values()
<span class="hljs-meta">... </span> draw.rectangle((xmin, ymin, xmax, ymax), outline=<span class="hljs-string">&quot;red&quot;</span>, width=<span class="hljs-number">1</span>)
<span class="hljs-meta">... </span> draw.text((xmin, ymin), <span class="hljs-string">f&quot;<span class="hljs-subst">{label}</span>: <span class="hljs-subst">{<span class="hljs-built_in">round</span>(score,<span class="hljs-number">2</span>)}</span>&quot;</span>, fill=<span class="hljs-string">&quot;white&quot;</span>)
<span class="hljs-meta">&gt;&gt;&gt; </span>image<!-- HTML_TAG_END --></pre></div> <div class="flex justify-center" data-svelte-h="svelte-1fwpqdn"><img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/zero-sh-obj-detection_2.png" alt="Visualized predictions on NASA image"></div> <h2 class="relative group"><a id="text-prompted-zero-shot-object-detection-by-hand" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#text-prompted-zero-shot-object-detection-by-hand"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Text-prompted zero-shot object detection by hand</span></h2> <p data-svelte-h="svelte-fghste">ゼロショット物体検出パイプラインの使用方法を確認したので、同じことを再現してみましょう。
手動で結果を取得します。</p> <p data-svelte-h="svelte-1faeuph">まず、<a href="https://huggingface.co/models?other=owlvit" rel="nofollow">Hugging Face Hub のチェックポイント</a> からモデルと関連プロセッサをロードします。
ここでは、前と同じチェックポイントを使用します。</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-keyword">from</span> transformers <span class="hljs-keyword">import</span> AutoProcessor, AutoModelForZeroShotObjectDetection
<span class="hljs-meta">&gt;&gt;&gt; </span>model = AutoModelForZeroShotObjectDetection.from_pretrained(checkpoint)
<span class="hljs-meta">&gt;&gt;&gt; </span>processor = AutoProcessor.from_pretrained(checkpoint)<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-vbbgku">気分を変えて、別の画像を撮ってみましょう。</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-keyword">import</span> requests
<span class="hljs-meta">&gt;&gt;&gt; </span>url = <span class="hljs-string">&quot;https://unsplash.com/photos/oj0zeY2Ltk4/download?ixid=MnwxMjA3fDB8MXxzZWFyY2h8MTR8fHBpY25pY3xlbnwwfHx8fDE2Nzc0OTE1NDk&amp;force=true&amp;w=640&quot;</span>
<span class="hljs-meta">&gt;&gt;&gt; </span>im = Image.<span class="hljs-built_in">open</span>(requests.get(url, stream=<span class="hljs-literal">True</span>).raw)
<span class="hljs-meta">&gt;&gt;&gt; </span>im<!-- HTML_TAG_END --></pre></div> <div class="flex justify-center" data-svelte-h="svelte-owux8y"><img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/zero-sh-obj-detection_3.png" alt="Beach photo"></div> <p data-svelte-h="svelte-wuah9x">プロセッサを使用してモデルの入力を準備します。プロセッサーは、
サイズ変更と正規化によるモデルの画像と、テキスト入力を処理する <a href="/docs/transformers/main/ja/model_doc/clip#transformers.CLIPTokenizer">CLIPTokenizer</a> です。</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">&gt;&gt;&gt; </span>text_queries = [<span class="hljs-string">&quot;hat&quot;</span>, <span class="hljs-string">&quot;book&quot;</span>, <span class="hljs-string">&quot;sunglasses&quot;</span>, <span class="hljs-string">&quot;camera&quot;</span>]
<span class="hljs-meta">&gt;&gt;&gt; </span>inputs = processor(text=text_queries, images=im, return_tensors=<span class="hljs-string">&quot;pt&quot;</span>)<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-v25fhy">入力をモデルに渡し、後処理し、結果を視覚化します。以前は画像プロセッサによって画像のサイズが変更されていたため、
それらをモデルにフィードするには、<code>post_process_object_detection()</code> メソッドを使用して、予測された境界を確認する必要があります。
ボックスは元の画像を基準とした正しい座標を持ちます。</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-keyword">import</span> torch
<span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-keyword">with</span> torch.no_grad():
<span class="hljs-meta">... </span> outputs = model(**inputs)
<span class="hljs-meta">... </span> target_sizes = torch.tensor([im.size[::-<span class="hljs-number">1</span>]])
<span class="hljs-meta">... </span> results = processor.post_process_object_detection(outputs, threshold=<span class="hljs-number">0.1</span>, target_sizes=target_sizes)[<span class="hljs-number">0</span>]
<span class="hljs-meta">&gt;&gt;&gt; </span>draw = ImageDraw.Draw(im)
<span class="hljs-meta">&gt;&gt;&gt; </span>scores = results[<span class="hljs-string">&quot;scores&quot;</span>].tolist()
<span class="hljs-meta">&gt;&gt;&gt; </span>labels = results[<span class="hljs-string">&quot;labels&quot;</span>].tolist()
<span class="hljs-meta">&gt;&gt;&gt; </span>boxes = results[<span class="hljs-string">&quot;boxes&quot;</span>].tolist()
<span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-keyword">for</span> box, score, label <span class="hljs-keyword">in</span> <span class="hljs-built_in">zip</span>(boxes, scores, labels):
<span class="hljs-meta">... </span> xmin, ymin, xmax, ymax = box
<span class="hljs-meta">... </span> draw.rectangle((xmin, ymin, xmax, ymax), outline=<span class="hljs-string">&quot;red&quot;</span>, width=<span class="hljs-number">1</span>)
<span class="hljs-meta">... </span> draw.text((xmin, ymin), <span class="hljs-string">f&quot;<span class="hljs-subst">{text_queries[label]}</span>: <span class="hljs-subst">{<span class="hljs-built_in">round</span>(score,<span class="hljs-number">2</span>)}</span>&quot;</span>, fill=<span class="hljs-string">&quot;white&quot;</span>)
<span class="hljs-meta">&gt;&gt;&gt; </span>im<!-- HTML_TAG_END --></pre></div> <div class="flex justify-center" data-svelte-h="svelte-1m863ar"><img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/zero-sh-obj-detection_4.png" alt="Beach photo with detected objects"></div> <h2 class="relative group"><a id="batch-processing" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#batch-processing"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Batch processing</span></h2> <p data-svelte-h="svelte-1d0cvqy">複数の画像セットとテキスト クエリを渡して、複数の画像内の異なる (または同じ) オブジェクトを検索できます。
宇宙飛行士の画像とビーチの画像を組み合わせてみましょう。
バッチ処理の場合、テキスト クエリをネストされたリストとしてプロセッサに渡し、画像を PIL イメージのリストとして渡す必要があります。
PyTorch テンソル、または NumPy 配列。</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">&gt;&gt;&gt; </span>images = [image, im]
<span class="hljs-meta">&gt;&gt;&gt; </span>text_queries = [
<span class="hljs-meta">... </span> [<span class="hljs-string">&quot;human face&quot;</span>, <span class="hljs-string">&quot;rocket&quot;</span>, <span class="hljs-string">&quot;nasa badge&quot;</span>, <span class="hljs-string">&quot;star-spangled banner&quot;</span>],
<span class="hljs-meta">... </span> [<span class="hljs-string">&quot;hat&quot;</span>, <span class="hljs-string">&quot;book&quot;</span>, <span class="hljs-string">&quot;sunglasses&quot;</span>, <span class="hljs-string">&quot;camera&quot;</span>],
<span class="hljs-meta">... </span>]
<span class="hljs-meta">&gt;&gt;&gt; </span>inputs = processor(text=text_queries, images=images, return_tensors=<span class="hljs-string">&quot;pt&quot;</span>)<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-7sl4ju">以前は後処理のために単一の画像のサイズをテンソルとして渡していましたが、タプルを渡すこともできます。
複数の画像のタプルのリスト。 2 つの例の予測を作成し、2 番目の例 (<code>image_idx = 1</code>) を視覚化しましょう。</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-keyword">with</span> torch.no_grad():
<span class="hljs-meta">... </span> outputs = model(**inputs)
<span class="hljs-meta">... </span> target_sizes = [x.size[::-<span class="hljs-number">1</span>] <span class="hljs-keyword">for</span> x <span class="hljs-keyword">in</span> images]
<span class="hljs-meta">... </span> results = processor.post_process_object_detection(outputs, threshold=<span class="hljs-number">0.1</span>, target_sizes=target_sizes)
<span class="hljs-meta">&gt;&gt;&gt; </span>image_idx = <span class="hljs-number">1</span>
<span class="hljs-meta">&gt;&gt;&gt; </span>draw = ImageDraw.Draw(images[image_idx])
<span class="hljs-meta">&gt;&gt;&gt; </span>scores = results[image_idx][<span class="hljs-string">&quot;scores&quot;</span>].tolist()
<span class="hljs-meta">&gt;&gt;&gt; </span>labels = results[image_idx][<span class="hljs-string">&quot;labels&quot;</span>].tolist()
<span class="hljs-meta">&gt;&gt;&gt; </span>boxes = results[image_idx][<span class="hljs-string">&quot;boxes&quot;</span>].tolist()
<span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-keyword">for</span> box, score, label <span class="hljs-keyword">in</span> <span class="hljs-built_in">zip</span>(boxes, scores, labels):
<span class="hljs-meta">... </span> xmin, ymin, xmax, ymax = box
<span class="hljs-meta">... </span> draw.rectangle((xmin, ymin, xmax, ymax), outline=<span class="hljs-string">&quot;red&quot;</span>, width=<span class="hljs-number">1</span>)
<span class="hljs-meta">... </span> draw.text((xmin, ymin), <span class="hljs-string">f&quot;<span class="hljs-subst">{text_queries[image_idx][label]}</span>: <span class="hljs-subst">{<span class="hljs-built_in">round</span>(score,<span class="hljs-number">2</span>)}</span>&quot;</span>, fill=<span class="hljs-string">&quot;white&quot;</span>)
<span class="hljs-meta">&gt;&gt;&gt; </span>images[image_idx]<!-- HTML_TAG_END --></pre></div> <div class="flex justify-center" data-svelte-h="svelte-1m863ar"><img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/zero-sh-obj-detection_4.png" alt="Beach photo with detected objects"></div> <h2 class="relative group"><a id="image-guided-object-detection" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#image-guided-object-detection"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Image-guided object detection</span></h2> <p data-svelte-h="svelte-cpc5ya">テキストクエリによるゼロショットオブジェクト検出に加えて、OWL-ViTは画像ガイドによるオブジェクト検出を提供します。これはつまり
画像クエリを使用して、ターゲット画像内の類似したオブジェクトを検索できます。
テキスト クエリとは異なり、使用できるサンプル画像は 1 つだけです。</p> <p data-svelte-h="svelte-16l357l">対象画像としてソファに2匹の猫がいる画像と、1匹の猫の画像を撮影しましょう
クエリとして:</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">&gt;&gt;&gt; </span>url = <span class="hljs-string">&quot;http://images.cocodataset.org/val2017/000000039769.jpg&quot;</span>
<span class="hljs-meta">&gt;&gt;&gt; </span>image_target = Image.<span class="hljs-built_in">open</span>(requests.get(url, stream=<span class="hljs-literal">True</span>).raw)
<span class="hljs-meta">&gt;&gt;&gt; </span>query_url = <span class="hljs-string">&quot;http://images.cocodataset.org/val2017/000000524280.jpg&quot;</span>
<span class="hljs-meta">&gt;&gt;&gt; </span>query_image = Image.<span class="hljs-built_in">open</span>(requests.get(query_url, stream=<span class="hljs-literal">True</span>).raw)<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-9k435k">画像を簡単に見てみましょう。</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-keyword">import</span> matplotlib.pyplot <span class="hljs-keyword">as</span> plt
<span class="hljs-meta">&gt;&gt;&gt; </span>fig, ax = plt.subplots(<span class="hljs-number">1</span>, <span class="hljs-number">2</span>)
<span class="hljs-meta">&gt;&gt;&gt; </span>ax[<span class="hljs-number">0</span>].imshow(image_target)
<span class="hljs-meta">&gt;&gt;&gt; </span>ax[<span class="hljs-number">1</span>].imshow(query_image)<!-- HTML_TAG_END --></pre></div> <div class="flex justify-center" data-svelte-h="svelte-y78yu"><img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/zero-sh-obj-detection_5.png" alt="Cats"></div> <p data-svelte-h="svelte-1g2qd19">前処理ステップでは、テキスト クエリの代わりに <code>query_images</code> を使用する必要があります。</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">&gt;&gt;&gt; </span>inputs = processor(images=image_target, query_images=query_image, return_tensors=<span class="hljs-string">&quot;pt&quot;</span>)<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-ccdvl">予測の場合、入力をモデルに渡す代わりに、<code>image_guided_detection()</code> に渡します。予測を描く
ラベルがないことを除いては以前と同様です。</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-keyword">with</span> torch.no_grad():
<span class="hljs-meta">... </span> outputs = model.image_guided_detection(**inputs)
<span class="hljs-meta">... </span> target_sizes = torch.tensor([image_target.size[::-<span class="hljs-number">1</span>]])
<span class="hljs-meta">... </span> results = processor.post_process_image_guided_detection(outputs=outputs, target_sizes=target_sizes)[<span class="hljs-number">0</span>]
<span class="hljs-meta">&gt;&gt;&gt; </span>draw = ImageDraw.Draw(image_target)
<span class="hljs-meta">&gt;&gt;&gt; </span>scores = results[<span class="hljs-string">&quot;scores&quot;</span>].tolist()
<span class="hljs-meta">&gt;&gt;&gt; </span>boxes = results[<span class="hljs-string">&quot;boxes&quot;</span>].tolist()
<span class="hljs-meta">&gt;&gt;&gt; </span><span class="hljs-keyword">for</span> box, score, label <span class="hljs-keyword">in</span> <span class="hljs-built_in">zip</span>(boxes, scores, labels):
<span class="hljs-meta">... </span> xmin, ymin, xmax, ymax = box
<span class="hljs-meta">... </span> draw.rectangle((xmin, ymin, xmax, ymax), outline=<span class="hljs-string">&quot;white&quot;</span>, width=<span class="hljs-number">4</span>)
<span class="hljs-meta">&gt;&gt;&gt; </span>image_target<!-- HTML_TAG_END --></pre></div> <div class="flex justify-center" data-svelte-h="svelte-1f4dev0"><img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/zero-sh-obj-detection_6.png" alt="Cats with bounding boxes"></div> <p data-svelte-h="svelte-1xcb9x7">OWL-ViTによる推論をインタラクティブに試したい場合は、このデモをチェックしてください。</p> <iframe src="https://adirik-owl-vit.hf.space" frameborder="0" width="850" height="450"></iframe> <a class="!text-gray-400 !no-underline text-sm flex items-center not-prose mt-4" href="https://github.com/huggingface/transformers/blob/main/docs/source/ja/tasks/zero_shot_object_detection.md" target="_blank"><span data-svelte-h="svelte-1kd6by1">&lt;</span> <span data-svelte-h="svelte-x0xyl0">&gt;</span> <span data-svelte-h="svelte-1dajgef"><span class="underline ml-1.5">Update</span> on GitHub</span></a> <p></p>
<script>
{
__sveltekit_jement = {
assets: "/docs/transformers/main/ja",
base: "/docs/transformers/main/ja",
env: {}
};
const element = document.currentScript.parentElement;
const data = [null,null];
Promise.all([
import("/docs/transformers/main/ja/_app/immutable/entry/start.1486e459.js"),
import("/docs/transformers/main/ja/_app/immutable/entry/app.d9ae818f.js")
]).then(([kit, app]) => {
kit.start(app, element, {
node_ids: [0, 159],
data,
form: null,
error: null
});
});
}
</script>

Xet Storage Details

Size:
53.4 kB
·
Xet hash:
8cb3e8f69d1387df50250f2d0cc0030fc0f15b7e6e6950aea9a16316e695afe9

Xet efficiently stores files, intelligently splitting them into unique chunks and accelerating uploads and downloads. More info.