<!--Copyright 2022 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->
# Object detection [[object-detection]]

[[open-in-colab]]

Object detection is the computer vision task of detecting instances (such as humans, buildings, or cars) in an image. Object detection models take an image as input and output the coordinates of the bounding boxes and the labels associated with the detected objects.
An image can contain multiple objects, each with its own bounding box and label (e.g. it can have a car and a building),
and each object can be present in different parts of the image (e.g. the image can have several cars).
This task is commonly used in autonomous driving to detect things like pedestrians, road signs, and traffic lights.
Other applications include counting objects in images, image search, and more.

In this guide, you will learn how to:

1. Fine-tune [DETR](https://huggingface.co/docs/transformers/model_doc/detr), a model that combines a convolutional backbone (a convolutional network that extracts features from the input data) with an encoder-decoder Transformer, on the [CPPE-5](https://huggingface.co/datasets/cppe-5) dataset.
2. Use your fine-tuned model for inference.
<Tip>

The task illustrated in this tutorial is supported by the following model architectures:

<!--This tip is automatically generated by `make fix-copies`, do not fill manually!-->

[Conditional DETR](../model_doc/conditional_detr), [Deformable DETR](../model_doc/deformable_detr), [DETA](../model_doc/deta), [DETR](../model_doc/detr), [Table Transformer](../model_doc/table-transformer), [YOLOS](../model_doc/yolos)

<!--End of the generated tip-->

</Tip>

Before you begin, make sure you have all the necessary libraries installed:
```bash
pip install -q datasets transformers evaluate timm albumentations
```
You'll use 🤗 Datasets to load a dataset from the Hugging Face Hub, 🤗 Transformers to train your model,
and `albumentations` to augment the data. `timm` is currently required to load a convolutional backbone for the DETR model.

We encourage you to log in to your Hugging Face account so you can upload and share your model with the community. When prompted, enter your token to log in:

```py
>>> from huggingface_hub import notebook_login

>>> notebook_login()
```
## Load the CPPE-5 dataset [[load-the-CPPE-5-dataset]]

The [CPPE-5](https://huggingface.co/datasets/cppe-5) dataset contains images with annotations identifying medical personal protective equipment (PPE) in the context of the COVID-19 pandemic.

Start by loading the dataset:
```py
>>> from datasets import load_dataset

>>> cppe5 = load_dataset("cppe-5")
>>> cppe5
DatasetDict({
    train: Dataset({
        features: ['image_id', 'image', 'width', 'height', 'objects'],
        num_rows: 1000
    })
    test: Dataset({
        features: ['image_id', 'image', 'width', 'height', 'objects'],
        num_rows: 29
    })
})
```
This dataset has 1,000 images in the training set and 29 images in the test set.

To get familiar with the data, explore what the examples look like.
```py
>>> cppe5["train"][0]
{'image_id': 15,
 'image': <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=943x663 at 0x7F9EC9E77C10>,
 'width': 943,
 'height': 663,
 'objects': {'id': [114, 115, 116, 117],
  'area': [3796, 1596, 152768, 81002],
  'bbox': [[302.0, 109.0, 73.0, 52.0],
   [810.0, 100.0, 57.0, 28.0],
   [160.0, 31.0, 248.0, 616.0],
   [741.0, 68.0, 202.0, 401.0]],
  'category': [4, 4, 0, 0]}}
```
The examples in the dataset have the following fields:

- `image_id`: the example image id
- `image`: a `PIL.Image.Image` object containing the image
- `width`: the width of the image
- `height`: the height of the image
- `objects`: a dictionary containing bounding box metadata for the objects in the image:
  - `id`: the annotation id
  - `area`: the area of the bounding box
  - `bbox`: the object's bounding box (in the [COCO format](https://albumentations.ai/docs/getting_started/bounding_boxes_augmentation/#coco))
  - `category`: the object's category, with possible values `Coverall (0)`, `Face_Shield (1)`, `Gloves (2)`, `Goggles (3)`, and `Mask (4)`
You may notice that the `bbox` field follows the COCO format, which is the format the DETR model expects.
However, the grouping of the fields inside `objects` differs from the annotation format DETR requires, so you'll need to apply some preprocessing before using this data for training.
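In the COCO format, a box is stored as `[x_min, y_min, width, height]` in absolute pixels. As a small illustration (using the values of the first box printed above), converting such a box to corner coordinates looks like this:

```py
>>> # COCO boxes are [x_min, y_min, width, height]; corners are (x_min, y_min, x_max, y_max)
>>> x_min, y_min, w, h = cppe5["train"][0]["objects"]["bbox"][0]
>>> (x_min, y_min, x_min + w, y_min + h)
(302.0, 109.0, 375.0, 161.0)
```

The visualization code below performs the same conversion when drawing the rectangles.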
To get a better sense of the data, visualize an example from the dataset.
```py
>>> import numpy as np
>>> import os
>>> from PIL import Image, ImageDraw

>>> image = cppe5["train"][0]["image"]
>>> annotations = cppe5["train"][0]["objects"]
>>> draw = ImageDraw.Draw(image)

>>> categories = cppe5["train"].features["objects"].feature["category"].names

>>> id2label = {index: x for index, x in enumerate(categories, start=0)}
>>> label2id = {v: k for k, v in id2label.items()}

>>> for i in range(len(annotations["id"])):
...     box = annotations["bbox"][i]
...     class_idx = annotations["category"][i]
...     x, y, w, h = tuple(box)
...     draw.rectangle((x, y, x + w, y + h), outline="red", width=1)
...     draw.text((x, y), id2label[class_idx], fill="white")

>>> image
```
<div class="flex justify-center">
    <img src="https://i.imgur.com/TdaqPJO.png" alt="CPPE-5 Image Example"/>
</div>
To visualize the bounding boxes with their associated labels, you can get the labels from the dataset's metadata, specifically the `category` field.
You'll also want to create dictionaries that map a label id to a label class (`id2label`) and the other way around (`label2id`).
You can use them later when setting up the model. Including these maps will make your model reusable by others if you share it on the Hugging Face Hub.

As a final step of getting familiar with the data, explore it for potential issues. One common problem with datasets for object detection is bounding boxes that stretch beyond the edge of the image.
Such "runaway" bounding boxes can raise errors during training and should be addressed at this stage.
There are a few examples with this issue in this dataset. To keep things simple in this guide, we remove these images from the data.
```py
>>> remove_idx = [590, 821, 822, 875, 876, 878, 879]
>>> keep = [i for i in range(len(cppe5["train"])) if i not in remove_idx]
>>> cppe5["train"] = cppe5["train"].select(keep)
```
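The indices above were identified by hand. As a sketch (not part of the original workflow), you could also flag such images programmatically by checking whether any COCO box extends past the image borders; the helper name below is hypothetical:

```py
>>> def has_runaway_box(example):
...     """Return True if any box extends beyond the image borders."""
...     for x, y, w, h in example["objects"]["bbox"]:
...         if x < 0 or y < 0 or x + w > example["width"] or y + h > example["height"]:
...             return True
...     return False

>>> # running this before the `select` above would surface a similar list of indices
>>> bad_indices = [i for i, example in enumerate(cppe5["train"]) if has_runaway_box(example)]
```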
## Preprocess the data [[preprocess-the-data]]

To fine-tune a model, you must preprocess the data you plan to use in a way that precisely matches the approach used for the pre-trained model.
[`AutoImageProcessor`] takes care of processing image data to create `pixel_values`, `pixel_mask`, and `labels` that a DETR model can train on.
The image processor has some attributes that you won't have to worry about:

- `image_mean = [0.485, 0.456, 0.406]`
- `image_std = [0.229, 0.224, 0.225]`

These are the mean and standard deviation used to normalize the images during model pre-training. They are crucial to replicate when doing inference or fine-tuning a pre-trained image model.

Instantiate the image processor from the same checkpoint as the model you want to fine-tune.
```py
>>> from transformers import AutoImageProcessor

>>> checkpoint = "facebook/detr-resnet-50"
>>> image_processor = AutoImageProcessor.from_pretrained(checkpoint)
```
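If you'd like to double-check the normalization statistics mentioned above, the processor exposes them as attributes (an optional sanity check, not required for training):

```py
>>> # the mean and std listed above are stored on the processor itself
>>> image_processor.image_mean, image_processor.image_std
([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
```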
Before passing the images to the `image_processor`, apply two preprocessing transformations to the dataset:

- Augmenting the images
- Reformatting the annotations to meet DETR's expectations

First, to make sure the model does not overfit on the training data, you can apply image augmentation with any data augmentation library. Here we use [Albumentations](https://albumentations.ai/docs/).
This library ensures that transformations affect the image and update the bounding boxes accordingly.
The 🤗 Datasets library documentation has a detailed [guide on how to augment images for object detection](https://huggingface.co/docs/datasets/object_detection),
and it uses the exact same dataset as an example. Apply the same approach here: resize each image to (480, 480), flip it horizontally, and brighten it:
```py
>>> import albumentations
>>> import numpy as np
>>> import torch

>>> transform = albumentations.Compose(
...     [
...         albumentations.Resize(480, 480),
...         albumentations.HorizontalFlip(p=1.0),
...         albumentations.RandomBrightnessContrast(p=1.0),
...     ],
...     bbox_params=albumentations.BboxParams(format="coco", label_fields=["category"]),
... )
```
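As a quick sanity check (a sketch, not part of the original guide), you can apply the transform to a single example and confirm that Albumentations returns the resized image together with updated boxes and labels:

```py
>>> example = cppe5["train"][0]
>>> out = transform(
...     image=np.array(example["image"].convert("RGB")),
...     bboxes=example["objects"]["bbox"],
...     category=example["objects"]["category"],
... )
>>> out["image"].shape  # the image is now 480x480
(480, 480, 3)
>>> len(out["bboxes"]) == len(example["objects"]["bbox"])  # boxes were updated, not dropped
True
```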
The `image_processor` expects the annotations to be in the following format: `{'image_id': int, 'annotations': List[Dict]}`, where each dictionary is a COCO object annotation. Let's add a function to reformat the annotations for a single example:
```py
>>> def formatted_anns(image_id, category, area, bbox):
...     annotations = []
...     for i in range(0, len(category)):
...         new_ann = {
...             "image_id": image_id,
...             "category_id": category[i],
...             "isCrowd": 0,
...             "area": area[i],
...             "bbox": list(bbox[i]),
...         }
...         annotations.append(new_ann)
...     return annotations
```
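Applied to the first training example, this helper produces a list of COCO-style dictionaries, one per object. For instance, using the values inspected earlier:

```py
>>> objects = cppe5["train"][0]["objects"]
>>> formatted_anns(cppe5["train"][0]["image_id"], objects["category"], objects["area"], objects["bbox"])[0]
{'image_id': 15, 'category_id': 4, 'isCrowd': 0, 'area': 3796, 'bbox': [302.0, 109.0, 73.0, 52.0]}
```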
Now you can combine the image and annotation transformations to use on a batch of examples:
```py
>>> # transforming a batch
>>> def transform_aug_ann(examples):
...     image_ids = examples["image_id"]
...     images, bboxes, area, categories = [], [], [], []
...     for image, objects in zip(examples["image"], examples["objects"]):
...         image = np.array(image.convert("RGB"))[:, :, ::-1]
...         out = transform(image=image, bboxes=objects["bbox"], category=objects["category"])
...         area.append(objects["area"])
...         images.append(out["image"])
...         bboxes.append(out["bboxes"])
...         categories.append(out["category"])
...     targets = [
...         {"image_id": id_, "annotations": formatted_anns(id_, cat_, ar_, box_)}
...         for id_, cat_, ar_, box_ in zip(image_ids, categories, area, bboxes)
...     ]
...     return image_processor(images=images, annotations=targets, return_tensors="pt")
```
Apply this preprocessing function to the entire dataset with 🤗 Datasets' [`~datasets.Dataset.with_transform`] method. This method applies the transformations on the fly, whenever an element of the dataset is loaded.

At this point, you can check what an example from the dataset looks like after the transformations. You should see a tensor with `pixel_values`, a tensor with `pixel_mask`, and `labels`.
```py
>>> cppe5["train"] = cppe5["train"].with_transform(transform_aug_ann)
>>> cppe5["train"][15]
{'pixel_values': tensor([[[ 0.9132,  0.9132,  0.9132,  ..., -1.9809, -1.9809, -1.9809],
          [ 0.9132,  0.9132,  0.9132,  ..., -1.9809, -1.9809, -1.9809],
          [ 0.9132,  0.9132,  0.9132,  ..., -1.9638, -1.9638, -1.9638],
          ...,
          [-1.5699, -1.5699, -1.5699,  ..., -1.9980, -1.9980, -1.9980],
          [-1.5528, -1.5528, -1.5528,  ..., -1.9980, -1.9809, -1.9809],
          [-1.5528, -1.5528, -1.5528,  ..., -1.9980, -1.9809, -1.9809]],

         [[ 1.3081,  1.3081,  1.3081,  ..., -1.8431, -1.8431, -1.8431],
          [ 1.3081,  1.3081,  1.3081,  ..., -1.8431, -1.8431, -1.8431],
          [ 1.3081,  1.3081,  1.3081,  ..., -1.8256, -1.8256, -1.8256],
          ...,
          [-1.3179, -1.3179, -1.3179,  ..., -1.8606, -1.8606, -1.8606],
          [-1.3004, -1.3004, -1.3004,  ..., -1.8606, -1.8431, -1.8431],
          [-1.3004, -1.3004, -1.3004,  ..., -1.8606, -1.8431, -1.8431]],

         [[ 1.4200,  1.4200,  1.4200,  ..., -1.6476, -1.6476, -1.6476],
          [ 1.4200,  1.4200,  1.4200,  ..., -1.6476, -1.6476, -1.6476],
          [ 1.4200,  1.4200,  1.4200,  ..., -1.6302, -1.6302, -1.6302],
          ...,
          [-1.0201, -1.0201, -1.0201,  ..., -1.5604, -1.5604, -1.5604],
          [-1.0027, -1.0027, -1.0027,  ..., -1.5604, -1.5430, -1.5430],
          [-1.0027, -1.0027, -1.0027,  ..., -1.5604, -1.5430, -1.5430]]]),
 'pixel_mask': tensor([[1, 1, 1,  ..., 1, 1, 1],
         [1, 1, 1,  ..., 1, 1, 1],
         [1, 1, 1,  ..., 1, 1, 1],
         ...,
         [1, 1, 1,  ..., 1, 1, 1],
         [1, 1, 1,  ..., 1, 1, 1],
         [1, 1, 1,  ..., 1, 1, 1]]),
 'labels': {'size': tensor([800, 800]), 'image_id': tensor([756]), 'class_labels': tensor([4]), 'boxes': tensor([[0.7340, 0.6986, 0.3414, 0.5944]]), 'area': tensor([519544.4375]), 'iscrowd': tensor([0]), 'orig_size': tensor([480, 480])}}
```
You have successfully augmented the individual images and prepared their annotations. However, preprocessing isn't complete yet. In the final step, create a custom `collate_fn` to batch images together.
Pad images (which are now `pixel_values`) to the largest image in a batch, and create a corresponding `pixel_mask` to indicate which pixels are real (1) and which are padding (0).
```py
>>> def collate_fn(batch):
...     pixel_values = [item["pixel_values"] for item in batch]
...     encoding = image_processor.pad(pixel_values, return_tensors="pt")
...     labels = [item["labels"] for item in batch]
...     batch = {}
...     batch["pixel_values"] = encoding["pixel_values"]
...     batch["pixel_mask"] = encoding["pixel_mask"]
...     batch["labels"] = labels
...     return batch
```
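To verify that batching works as expected (an optional check, assuming the `with_transform` step above has been applied), you can pass a couple of processed examples through `collate_fn` and inspect the shapes:

```py
>>> sample_batch = collate_fn([cppe5["train"][i] for i in range(2)])
>>> sample_batch["pixel_values"].shape  # batch of 2 images, 3 channels, resized by the processor
torch.Size([2, 3, 800, 800])
>>> sample_batch["pixel_mask"].shape
torch.Size([2, 800, 800])
>>> len(sample_batch["labels"])
2
```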
## Training the DETR model [[training-the-DETR-model]]

You have done most of the heavy lifting in the previous sections, so now you are ready to train your model!
The images in this dataset are still quite large, even after resizing, which means that fine-tuning this model will require at least one GPU.

Training involves the following steps:

1. Load the model with [`AutoModelForObjectDetection`], using the same checkpoint as in the preprocessing.
2. Define your training hyperparameters in [`TrainingArguments`].
3. Pass the training arguments to [`Trainer`], along with the model, dataset, image processor, and data collator.
4. Call [`~Trainer.train`] to fine-tune your model.

When loading the model from the same checkpoint that you used for preprocessing, remember to pass the `label2id` and `id2label` maps that you created earlier from the dataset's metadata.
Additionally, specify `ignore_mismatched_sizes=True` to replace the existing classification head (the last layer of the model used for classification) with a new one.
```py
>>> from transformers import AutoModelForObjectDetection

>>> model = AutoModelForObjectDetection.from_pretrained(
...     checkpoint,
...     id2label=id2label,
...     label2id=label2id,
...     ignore_mismatched_sizes=True,
... )
```
In [`TrainingArguments`], use `output_dir` to specify where to save your model, then configure hyperparameters as you see fit.
It is important not to remove unused columns: if `remove_unused_columns` is `True`, the image column is dropped, and without it you can't create `pixel_values`. For this reason, set `remove_unused_columns` to `False`.
If you wish to share your model by pushing it to the Hub, set `push_to_hub` to `True` (you must be signed in to Hugging Face to upload your model).
```py
>>> from transformers import TrainingArguments

>>> training_args = TrainingArguments(
...     output_dir="detr-resnet-50_finetuned_cppe5",
...     per_device_train_batch_size=8,
...     num_train_epochs=10,
...     fp16=True,
...     save_steps=200,
...     logging_steps=50,
...     learning_rate=1e-5,
...     weight_decay=1e-4,
...     save_total_limit=2,
...     remove_unused_columns=False,
...     push_to_hub=True,
... )
```
Finally, bring everything together (`model`, `training_args`, `collate_fn`, `image_processor`, and the dataset `cppe5`) and call [`~transformers.Trainer.train`]:
```py
>>> from transformers import Trainer

>>> trainer = Trainer(
...     model=model,
...     args=training_args,
...     data_collator=collate_fn,
...     train_dataset=cppe5["train"],
...     tokenizer=image_processor,
... )

>>> trainer.train()
```
If you have set `push_to_hub` to `True` in the `training_args`, the training checkpoints are pushed to the Hugging Face Hub. Upon training completion, push the final model to the Hub as well by calling the [`~transformers.Trainer.push_to_hub`] method.
```py
>>> trainer.push_to_hub()
```
## Evaluate [[evaluate]]

Object detection models are commonly evaluated with a set of <a href="https://cocodataset.org/#detection-eval">COCO-style metrics</a>.
You can use one of the existing metrics implementations, but here you'll use the one from `torchvision` to evaluate the final model that you pushed to the Hub.

To use the `torchvision` evaluator, you'll need to prepare a ground truth COCO dataset. The API to build a COCO dataset requires the data to be stored in a certain format, so you'll need to save images and annotations to disk first.
Just like when you prepared the data for training, the annotations from `cppe5["test"]` need to be formatted; the images, however, should stay as they are.

The evaluation step requires a bit of work, but it can be split into three major steps.
First, prepare the `cppe5["test"]` set: format the annotations and save the data to disk.
```py
>>> import json

>>> # format annotations the same as for training, no need for data augmentation
>>> def val_formatted_anns(image_id, objects):
...     annotations = []
...     for i in range(0, len(objects["id"])):
...         new_ann = {
...             "id": objects["id"][i],
...             "category_id": objects["category"][i],
...             "iscrowd": 0,
...             "image_id": image_id,
...             "area": objects["area"][i],
...             "bbox": objects["bbox"][i],
...         }
...         annotations.append(new_ann)
...     return annotations

>>> # Save images and annotations into the files torchvision.datasets.CocoDetection expects
>>> def save_cppe5_annotation_file_images(cppe5):
...     output_json = {}
...     path_output_cppe5 = f"{os.getcwd()}/cppe5/"
...     if not os.path.exists(path_output_cppe5):
...         os.makedirs(path_output_cppe5)
...     path_anno = os.path.join(path_output_cppe5, "cppe5_ann.json")
...     categories_json = [{"supercategory": "none", "id": id, "name": id2label[id]} for id in id2label]
...     output_json["images"] = []
...     output_json["annotations"] = []
...     for example in cppe5:
...         ann = val_formatted_anns(example["image_id"], example["objects"])
...         output_json["images"].append(
...             {
...                 "id": example["image_id"],
...                 "width": example["image"].width,
...                 "height": example["image"].height,
...                 "file_name": f"{example['image_id']}.png",
...             }
...         )
...         output_json["annotations"].extend(ann)
...     output_json["categories"] = categories_json
...     with open(path_anno, "w") as file:
...         json.dump(output_json, file, ensure_ascii=False, indent=4)
...     for im, img_id in zip(cppe5["image"], cppe5["image_id"]):
...         path_img = os.path.join(path_output_cppe5, f"{img_id}.png")
...         im.save(path_img)
...     return path_output_cppe5, path_anno
```
Next, prepare an instance of a `CocoDetection` class that can be used with `cocoevaluator`.
```py
>>> import torchvision

>>> class CocoDetection(torchvision.datasets.CocoDetection):
...     def __init__(self, img_folder, image_processor, ann_file):
...         super().__init__(img_folder, ann_file)
...         self.image_processor = image_processor
...
...     def __getitem__(self, idx):
...         # read in PIL image and target in COCO format
...         img, target = super(CocoDetection, self).__getitem__(idx)
...
...         # preprocess image and target: converting target to DETR format,
...         # resizing + normalization of both image and target
...         image_id = self.ids[idx]
...         target = {"image_id": image_id, "annotations": target}
...         encoding = self.image_processor(images=img, annotations=target, return_tensors="pt")
...         pixel_values = encoding["pixel_values"].squeeze()  # remove batch dimension
...         target = encoding["labels"][0]  # remove batch dimension
...
...         return {"pixel_values": pixel_values, "labels": target}

>>> im_processor = AutoImageProcessor.from_pretrained("devonho/detr-resnet-50_finetuned_cppe5")

>>> path_output_cppe5, path_anno = save_cppe5_annotation_file_images(cppe5["test"])
>>> test_ds_coco_format = CocoDetection(path_output_cppe5, im_processor, path_anno)
```
Finally, load the metrics and run the evaluation.
```py
>>> import evaluate
>>> from tqdm import tqdm

>>> model = AutoModelForObjectDetection.from_pretrained("devonho/detr-resnet-50_finetuned_cppe5")
>>> module = evaluate.load("ybelkada/cocoevaluate", coco=test_ds_coco_format.coco)
>>> val_dataloader = torch.utils.data.DataLoader(
...     test_ds_coco_format, batch_size=8, shuffle=False, num_workers=4, collate_fn=collate_fn
... )

>>> with torch.no_grad():
...     for idx, batch in enumerate(tqdm(val_dataloader)):
...         pixel_values = batch["pixel_values"]
...         pixel_mask = batch["pixel_mask"]
...         labels = [
...             {k: v for k, v in t.items()} for t in batch["labels"]
...         ]  # these are in DETR format, resized + normalized
...         # forward pass
...         outputs = model(pixel_values=pixel_values, pixel_mask=pixel_mask)
...         orig_target_sizes = torch.stack([target["orig_size"] for target in labels], dim=0)
...         results = im_processor.post_process(outputs, orig_target_sizes)  # convert outputs of model to Pascal VOC format (xmin, ymin, xmax, ymax)
...         module.add(prediction=results, reference=labels)
...         del batch

>>> results = module.compute()
>>> print(results)
Accumulating evaluation results...
DONE (t=0.08s).
IoU metric: bbox
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.352
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.681
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.292
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.168
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.208
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.429
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.274
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.484
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.501
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.191
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.323
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.590
```
These results can be further improved by adjusting the hyperparameters in [`~transformers.TrainingArguments`]. Give it a try!
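For example, you could experiment with training longer or with a different learning rate. The values below are only an illustrative starting point, not settings validated on this dataset:

```py
>>> training_args = TrainingArguments(
...     output_dir="detr-resnet-50_finetuned_cppe5",
...     per_device_train_batch_size=8,
...     num_train_epochs=30,  # train longer than the 10 epochs used above
...     learning_rate=5e-5,   # try a different learning rate
...     weight_decay=1e-4,
...     fp16=True,
...     save_total_limit=2,
...     remove_unused_columns=False,
...     push_to_hub=False,
... )
```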
## Inference [[inference]]

Now that you have fine-tuned a DETR model, evaluated it, and uploaded it to the Hugging Face Hub, you can use it for inference.
The simplest way to try out your fine-tuned model for inference is to use it in a [`pipeline`]. Instantiate an object detection pipeline with your model and pass an image to it:
```py
>>> from transformers import pipeline
>>> import requests

>>> url = "https://i.imgur.com/2lnWoly.jpg"
>>> image = Image.open(requests.get(url, stream=True).raw)

>>> obj_detector = pipeline("object-detection", model="devonho/detr-resnet-50_finetuned_cppe5")
>>> obj_detector(image)
```
You can also manually replicate the results of the pipeline if you'd like:
```py
>>> image_processor = AutoImageProcessor.from_pretrained("devonho/detr-resnet-50_finetuned_cppe5")
>>> model = AutoModelForObjectDetection.from_pretrained("devonho/detr-resnet-50_finetuned_cppe5")

>>> with torch.no_grad():
...     inputs = image_processor(images=image, return_tensors="pt")
...     outputs = model(**inputs)
...     target_sizes = torch.tensor([image.size[::-1]])
...     results = image_processor.post_process_object_detection(outputs, threshold=0.5, target_sizes=target_sizes)[0]

>>> for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
...     box = [round(i, 2) for i in box.tolist()]
...     print(
...         f"Detected {model.config.id2label[label.item()]} with confidence "
...         f"{round(score.item(), 3)} at location {box}"
...     )
Detected Coverall with confidence 0.566 at location [1215.32, 147.38, 4401.81, 3227.08]
Detected Mask with confidence 0.584 at location [2449.06, 823.19, 3256.43, 1413.9]
```
Let's plot the result:
```py
>>> draw = ImageDraw.Draw(image)

>>> for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
...     box = [round(i, 2) for i in box.tolist()]
...     x, y, x2, y2 = tuple(box)
...     draw.rectangle((x, y, x2, y2), outline="red", width=1)
...     draw.text((x, y), model.config.id2label[label.item()], fill="white")

>>> image
```
<div class="flex justify-center">
    <img src="https://i.imgur.com/4QZnf9A.png" alt="Object detection result on a new image"/>
</div>