<!--Copyright 2022 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
โš ๏ธ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# Object detection [[object-detection]]
[[open-in-colab]]
๊ฐ์ฒด ํƒ์ง€๋Š” ์ด๋ฏธ์ง€์—์„œ ์ธ์Šคํ„ด์Šค(์˜ˆ: ์‚ฌ๋žŒ, ๊ฑด๋ฌผ ๋˜๋Š” ์ž๋™์ฐจ)๋ฅผ ๊ฐ์ง€ํ•˜๋Š” ์ปดํ“จํ„ฐ ๋น„์ „ ์ž‘์—…์ž…๋‹ˆ๋‹ค. ๊ฐ์ฒด ํƒ์ง€ ๋ชจ๋ธ์€ ์ด๋ฏธ์ง€๋ฅผ ์ž…๋ ฅ์œผ๋กœ ๋ฐ›๊ณ  ํƒ์ง€๋œ ๋ฐ”์šด๋”ฉ ๋ฐ•์Šค์˜ ์ขŒํ‘œ์™€ ๊ด€๋ จ๋œ ๋ ˆ์ด๋ธ”์„ ์ถœ๋ ฅํ•ฉ๋‹ˆ๋‹ค.
ํ•˜๋‚˜์˜ ์ด๋ฏธ์ง€์—๋Š” ์—ฌ๋Ÿฌ ๊ฐ์ฒด๊ฐ€ ์žˆ์„ ์ˆ˜ ์žˆ์œผ๋ฉฐ ๊ฐ๊ฐ์€ ์ž์ฒด์ ์ธ ๋ฐ”์šด๋”ฉ ๋ฐ•์Šค์™€ ๋ ˆ์ด๋ธ”์„ ๊ฐ€์งˆ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค(์˜ˆ: ์ฐจ์™€ ๊ฑด๋ฌผ์ด ์žˆ๋Š” ์ด๋ฏธ์ง€).
๋˜ํ•œ ๊ฐ ๊ฐ์ฒด๋Š” ์ด๋ฏธ์ง€์˜ ๋‹ค๋ฅธ ๋ถ€๋ถ„์— ์กด์žฌํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค(์˜ˆ: ์ด๋ฏธ์ง€์— ์—ฌ๋Ÿฌ ๋Œ€์˜ ์ฐจ๊ฐ€ ์žˆ์„ ์ˆ˜ ์žˆ์Œ).
์ด ์ž‘์—…์€ ๋ณดํ–‰์ž, ๋„๋กœ ํ‘œ์ง€ํŒ, ์‹ ํ˜ธ๋“ฑ๊ณผ ๊ฐ™์€ ๊ฒƒ๋“ค์„ ๊ฐ์ง€ํ•˜๋Š” ์ž์œจ ์ฃผํ–‰์— ์ผ๋ฐ˜์ ์œผ๋กœ ์‚ฌ์šฉ๋ฉ๋‹ˆ๋‹ค.
๋‹ค๋ฅธ ์‘์šฉ ๋ถ„์•ผ๋กœ๋Š” ์ด๋ฏธ์ง€ ๋‚ด ๊ฐ์ฒด ์ˆ˜ ๊ณ„์‚ฐ ๋ฐ ์ด๋ฏธ์ง€ ๊ฒ€์ƒ‰ ๋“ฑ์ด ์žˆ์Šต๋‹ˆ๋‹ค.
In this guide, you will learn how to:

1. Finetune [DETR](https://huggingface.co/docs/transformers/model_doc/detr), a model that combines a convolutional backbone (a convolutional network that extracts features from the input data) with an encoder-decoder Transformer, on the [CPPE-5](https://huggingface.co/datasets/cppe-5) dataset.
2. Use your finetuned model for inference.
<Tip>
The task illustrated in this tutorial is supported by the following model architectures:
<!--This tip is automatically generated by `make fix-copies`, do not fill manually!-->
[Conditional DETR](../model_doc/conditional_detr), [Deformable DETR](../model_doc/deformable_detr), [DETA](../model_doc/deta), [DETR](../model_doc/detr), [Table Transformer](../model_doc/table-transformer), [YOLOS](../model_doc/yolos)
<!--End of the generated tip-->
</Tip>
Before you begin, make sure you have all the necessary libraries installed:
```bash
pip install -q datasets transformers evaluate timm albumentations
```
You'll use 🤗 Datasets to load a dataset from the Hugging Face Hub, 🤗 Transformers to train your model, and `albumentations` to augment the data.
`timm` is currently required to load the convolutional backbone for the DETR model.
We encourage you to log in to your Hugging Face account so you can upload and share your model with the community. When prompted, enter your token to log in:
```py
>>> from huggingface_hub import notebook_login
>>> notebook_login()
```
## Load the CPPE-5 dataset [[load-the-CPPE-5-dataset]]
The [CPPE-5](https://huggingface.co/datasets/cppe-5) dataset contains images with annotations identifying medical personal protective equipment (PPE) in the context of the COVID-19 pandemic.
Start by loading the dataset:
```py
>>> from datasets import load_dataset
>>> cppe5 = load_dataset("cppe-5")
>>> cppe5
DatasetDict({
train: Dataset({
features: ['image_id', 'image', 'width', 'height', 'objects'],
num_rows: 1000
})
test: Dataset({
features: ['image_id', 'image', 'width', 'height', 'objects'],
num_rows: 29
})
})
```
์ด ๋ฐ์ดํ„ฐ ์„ธํŠธ๋Š” ํ•™์Šต ์„ธํŠธ ์ด๋ฏธ์ง€ 1,000๊ฐœ์™€ ํ…Œ์ŠคํŠธ ์„ธํŠธ ์ด๋ฏธ์ง€ 29๊ฐœ๋ฅผ ๊ฐ–๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค.
๋ฐ์ดํ„ฐ์— ์ต์ˆ™ํ•ด์ง€๊ธฐ ์œ„ํ•ด, ์˜ˆ์‹œ๊ฐ€ ์–ด๋–ป๊ฒŒ ๊ตฌ์„ฑ๋˜์–ด ์žˆ๋Š”์ง€ ์‚ดํŽด๋ณด์„ธ์š”.
```py
>>> cppe5["train"][0]
{'image_id': 15,
'image': <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=943x663 at 0x7F9EC9E77C10>,
'width': 943,
'height': 663,
'objects': {'id': [114, 115, 116, 117],
'area': [3796, 1596, 152768, 81002],
'bbox': [[302.0, 109.0, 73.0, 52.0],
[810.0, 100.0, 57.0, 28.0],
[160.0, 31.0, 248.0, 616.0],
[741.0, 68.0, 202.0, 401.0]],
'category': [4, 4, 0, 0]}}
```
๋ฐ์ดํ„ฐ ์„ธํŠธ์— ์žˆ๋Š” ์˜ˆ์‹œ๋Š” ๋‹ค์Œ์˜ ์˜์—ญ์„ ๊ฐ€์ง€๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค:
- `image_id`: ์˜ˆ์‹œ ์ด๋ฏธ์ง€ id
- `image`: ์ด๋ฏธ์ง€๋ฅผ ํฌํ•จํ•˜๋Š” `PIL.Image.Image` ๊ฐ์ฒด
- `width`: ์ด๋ฏธ์ง€์˜ ๋„ˆ๋น„
- `height`: ์ด๋ฏธ์ง€์˜ ๋†’์ด
- `objects`: ์ด๋ฏธ์ง€ ์•ˆ์˜ ๊ฐ์ฒด๋“ค์˜ ๋ฐ”์šด๋”ฉ ๋ฐ•์Šค ๋ฉ”ํƒ€๋ฐ์ดํ„ฐ๋ฅผ ํฌํ•จํ•˜๋Š” ๋”•์…”๋„ˆ๋ฆฌ:
- `id`: ์–ด๋…ธํ…Œ์ด์…˜ id
- `area`: ๋ฐ”์šด๋”ฉ ๋ฐ•์Šค์˜ ๋ฉด์ 
- `bbox`: ๊ฐ์ฒด์˜ ๋ฐ”์šด๋”ฉ ๋ฐ•์Šค ([COCO ํฌ๋งท](https://albumentations.ai/docs/getting_started/bounding_boxes_augmentation/#coco)์œผ๋กœ)
- `category`: ๊ฐ์ฒด์˜ ์นดํ…Œ๊ณ ๋ฆฌ, ๊ฐ€๋Šฅํ•œ ๊ฐ’์œผ๋กœ๋Š” `Coverall (0)`, `Face_Shield (1)`, `Gloves (2)`, `Goggles (3)` ๋ฐ `Mask (4)` ๊ฐ€ ํฌํ•จ๋ฉ๋‹ˆ๋‹ค.
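The COCO `bbox` format stores each box as `[x_min, y_min, width, height]`. As a minimal sketch (not part of the original guide), here is how such a box maps to the corner coordinates that drawing APIs typically expect:

```python
# COCO boxes are [x_min, y_min, width, height]; drawing APIs such as
# PIL's ImageDraw.rectangle expect corners [x_min, y_min, x_max, y_max].
def coco_to_corners(bbox):
    x, y, w, h = bbox
    return [x, y, x + w, y + h]

# First box of the example above: [302.0, 109.0, 73.0, 52.0]
print(coco_to_corners([302.0, 109.0, 73.0, 52.0]))  # [302.0, 109.0, 375.0, 161.0]
```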
You may notice that the `bbox` field follows the COCO format, which is the format that the DETR model expects.
However, the grouping of the fields inside `objects` differs from the annotation format DETR requires, so you'll need to apply some preprocessing transformations before using this data for training.
To get an even better understanding of the data, visualize an example from the dataset.
```py
>>> import numpy as np
>>> import os
>>> from PIL import Image, ImageDraw
>>> image = cppe5["train"][0]["image"]
>>> annotations = cppe5["train"][0]["objects"]
>>> draw = ImageDraw.Draw(image)
>>> categories = cppe5["train"].features["objects"].feature["category"].names
>>> id2label = {index: x for index, x in enumerate(categories, start=0)}
>>> label2id = {v: k for k, v in id2label.items()}
>>> for i in range(len(annotations["id"])):
...     box = annotations["bbox"][i]
...     class_idx = annotations["category"][i]
... x, y, w, h = tuple(box)
... draw.rectangle((x, y, x + w, y + h), outline="red", width=1)
... draw.text((x, y), id2label[class_idx], fill="white")
>>> image
```
<div class="flex justify-center">
<img src="https://i.imgur.com/TdaqPJO.png" alt="CPPE-5 Image Example"/>
</div>
To visualize the bounding boxes with their associated labels, you can get the labels from the dataset's metadata, specifically the `category` field.
You'll also want to create the `id2label` dictionary that maps a label id to a label class, and the reverse mapping `label2id`.
You can use them later when setting up the model. Including these mappings will make your model reusable by others if you share it on the Hugging Face Hub.
๋ฐ์ดํ„ฐ๋ฅผ ๋” ์ž˜ ์ดํ•ดํ•˜๊ธฐ ์œ„ํ•œ ์ตœ์ข… ๋‹จ๊ณ„๋กœ, ์ž ์žฌ์ ์ธ ๋ฌธ์ œ๋ฅผ ์ฐพ์•„๋ณด์„ธ์š”.
๊ฐ์ฒด ๊ฐ์ง€๋ฅผ ์œ„ํ•œ ๋ฐ์ดํ„ฐ ์„ธํŠธ์—์„œ ์ž์ฃผ ๋ฐœ์ƒํ•˜๋Š” ๋ฌธ์ œ ์ค‘ ํ•˜๋‚˜๋Š” ๋ฐ”์šด๋”ฉ ๋ฐ•์Šค๊ฐ€ ์ด๋ฏธ์ง€์˜ ๊ฐ€์žฅ์ž๋ฆฌ๋ฅผ ๋„˜์–ด๊ฐ€๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค.
์ด๋Ÿฌํ•œ ๋ฐ”์šด๋”ฉ ๋ฐ•์Šค๋ฅผ "๋„˜์–ด๊ฐ€๋Š” ๊ฒƒ(run away)"์€ ํ›ˆ๋ จ ์ค‘์— ์˜ค๋ฅ˜๋ฅผ ๋ฐœ์ƒ์‹œํ‚ฌ ์ˆ˜ ์žˆ๊ธฐ์— ์ด ๋‹จ๊ณ„์—์„œ ์ฒ˜๋ฆฌํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.
์ด ๋ฐ์ดํ„ฐ ์„ธํŠธ์—๋„ ๊ฐ™์€ ๋ฌธ์ œ๊ฐ€ ์žˆ๋Š” ๋ช‡ ๊ฐ€์ง€ ์˜ˆ๊ฐ€ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด ๊ฐ€์ด๋“œ์—์„œ๋Š” ๊ฐ„๋‹จํ•˜๊ฒŒํ•˜๊ธฐ ์œ„ํ•ด ๋ฐ์ดํ„ฐ์—์„œ ์ด๋Ÿฌํ•œ ์ด๋ฏธ์ง€๋ฅผ ์ œ๊ฑฐํ•ฉ๋‹ˆ๋‹ค.
```py
>>> remove_idx = [590, 821, 822, 875, 876, 878, 879]
>>> keep = [i for i in range(len(cppe5["train"])) if i not in remove_idx]
>>> cppe5["train"] = cppe5["train"].select(keep)
```
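The indices above are this dataset's known problem cases. As a hedged sketch (not from the original guide), indices like these could also be located programmatically by checking whether any COCO box extends past the image bounds:

```python
# Hypothetical helper: flag an example whose COCO boxes [x, y, w, h]
# extend beyond the image's width or height.
def has_out_of_bounds_box(example):
    width, height = example["width"], example["height"]
    for x, y, w, h in example["objects"]["bbox"]:
        if x + w > width or y + h > height:
            return True
    return False

example = {
    "width": 100,
    "height": 100,
    "objects": {"bbox": [[10.0, 10.0, 95.0, 20.0]]},  # runs past x=100
}
print(has_out_of_bounds_box(example))  # True
```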
## Preprocess the data [[preprocess-the-data]]
๋ชจ๋ธ์„ ๋ฏธ์„ธ ์กฐ์ • ํ•˜๋ ค๋ฉด, ๋ฏธ๋ฆฌ ํ•™์Šต๋œ ๋ชจ๋ธ์—์„œ ์‚ฌ์šฉํ•œ ์ „์ฒ˜๋ฆฌ ๋ฐฉ์‹๊ณผ ์ •ํ™•ํ•˜๊ฒŒ ์ผ์น˜ํ•˜๋„๋ก ์‚ฌ์šฉํ•  ๋ฐ์ดํ„ฐ๋ฅผ ์ „์ฒ˜๋ฆฌํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.
[`AutoImageProcessor`]๋Š” ์ด๋ฏธ์ง€ ๋ฐ์ดํ„ฐ๋ฅผ ์ฒ˜๋ฆฌํ•˜์—ฌ DETR ๋ชจ๋ธ์ด ํ•™์Šต์— ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๋Š” `pixel_values`, `pixel_mask`, ๊ทธ๋ฆฌ๊ณ  `labels`๋ฅผ ์ƒ์„ฑํ•˜๋Š” ์ž‘์—…์„ ๋‹ด๋‹นํ•ฉ๋‹ˆ๋‹ค.
์ด ์ด๋ฏธ์ง€ ํ”„๋กœ์„ธ์„œ์—๋Š” ๊ฑฑ์ •ํ•˜์ง€ ์•Š์•„๋„ ๋˜๋Š” ๋ช‡ ๊ฐ€์ง€ ์†์„ฑ์ด ์žˆ์Šต๋‹ˆ๋‹ค:
- `image_mean = [0.485, 0.456, 0.406 ]`
- `image_std = [0.229, 0.224, 0.225]`
์ด ๊ฐ’๋“ค์€ ๋ชจ๋ธ ์‚ฌ์ „ ํ›ˆ๋ จ ์ค‘ ์ด๋ฏธ์ง€๋ฅผ ์ •๊ทœํ™”ํ•˜๋Š” ๋ฐ ์‚ฌ์šฉ๋˜๋Š” ํ‰๊ท ๊ณผ ํ‘œ์ค€ ํŽธ์ฐจ์ž…๋‹ˆ๋‹ค.
์ด ๊ฐ’๋“ค์€ ์ถ”๋ก  ๋˜๋Š” ์‚ฌ์ „ ํ›ˆ๋ จ๋œ ์ด๋ฏธ์ง€ ๋ชจ๋ธ์„ ์„ธ๋ฐ€ํ•˜๊ฒŒ ์กฐ์ •ํ•  ๋•Œ ๋ณต์ œํ•ด์•ผ ํ•˜๋Š” ์ค‘์š”ํ•œ ๊ฐ’์ž…๋‹ˆ๋‹ค.
Instantiate the image processor from the same checkpoint as the model you want to finetune.
```py
>>> from transformers import AutoImageProcessor
>>> checkpoint = "facebook/detr-resnet-50"
>>> image_processor = AutoImageProcessor.from_pretrained(checkpoint)
```
Before passing the images to the `image_processor`, apply two preprocessing transformations to the dataset:
- Augmenting images
- Reformatting annotations to meet DETR expectations
First, to make sure the model does not overfit on the training data, you can apply image augmentation with any data augmentation library. Here we use [Albumentations](https://albumentations.ai/docs/).
This library ensures that transformations affect the image and update the bounding boxes accordingly.
The 🤗 Datasets library documentation has a detailed [guide on how to augment images for object detection](https://huggingface.co/docs/datasets/object_detection),
and it uses the exact same dataset as an example. Apply the same approach here: resize each image to (480, 480), flip it horizontally, and brighten it:
```py
>>> import albumentations
>>> import numpy as np
>>> import torch
>>> transform = albumentations.Compose(
... [
... albumentations.Resize(480, 480),
... albumentations.HorizontalFlip(p=1.0),
... albumentations.RandomBrightnessContrast(p=1.0),
... ],
... bbox_params=albumentations.BboxParams(format="coco", label_fields=["category"]),
... )
```
The image processor expects the annotations to be in the following format: `{'image_id': int, 'annotations': List[Dict]}`, where each dictionary is a COCO object annotation. Let's add a function to reformat the annotations for a single example:
```py
>>> def formatted_anns(image_id, category, area, bbox):
... annotations = []
... for i in range(0, len(category)):
... new_ann = {
... "image_id": image_id,
... "category_id": category[i],
... "isCrowd": 0,
... "area": area[i],
... "bbox": list(bbox[i]),
... }
... annotations.append(new_ann)
... return annotations
```
Now you can combine the image and annotation transformations to use on a batch of examples:
```py
>>> # transforming a batch
>>> def transform_aug_ann(examples):
... image_ids = examples["image_id"]
... images, bboxes, area, categories = [], [], [], []
... for image, objects in zip(examples["image"], examples["objects"]):
... image = np.array(image.convert("RGB"))[:, :, ::-1]
... out = transform(image=image, bboxes=objects["bbox"], category=objects["category"])
... area.append(objects["area"])
... images.append(out["image"])
... bboxes.append(out["bboxes"])
... categories.append(out["category"])
... targets = [
... {"image_id": id_, "annotations": formatted_anns(id_, cat_, ar_, box_)}
... for id_, cat_, ar_, box_ in zip(image_ids, categories, area, bboxes)
... ]
... return image_processor(images=images, annotations=targets, return_tensors="pt")
```
Apply the preprocessing function you created in the previous step to the entire dataset using the 🤗 Datasets [`~datasets.Dataset.with_transform`] method.
This method applies the transformations on the fly whenever you load an element of the dataset.
At this point, you can check what an example from the dataset looks like after the transformations.
You should see a tensor with `pixel_values`, a tensor with `pixel_mask`, and `labels`.
```py
>>> cppe5["train"] = cppe5["train"].with_transform(transform_aug_ann)
>>> cppe5["train"][15]
{'pixel_values': tensor([[[ 0.9132, 0.9132, 0.9132, ..., -1.9809, -1.9809, -1.9809],
[ 0.9132, 0.9132, 0.9132, ..., -1.9809, -1.9809, -1.9809],
[ 0.9132, 0.9132, 0.9132, ..., -1.9638, -1.9638, -1.9638],
...,
[-1.5699, -1.5699, -1.5699, ..., -1.9980, -1.9980, -1.9980],
[-1.5528, -1.5528, -1.5528, ..., -1.9980, -1.9809, -1.9809],
[-1.5528, -1.5528, -1.5528, ..., -1.9980, -1.9809, -1.9809]],
[[ 1.3081, 1.3081, 1.3081, ..., -1.8431, -1.8431, -1.8431],
[ 1.3081, 1.3081, 1.3081, ..., -1.8431, -1.8431, -1.8431],
[ 1.3081, 1.3081, 1.3081, ..., -1.8256, -1.8256, -1.8256],
...,
[-1.3179, -1.3179, -1.3179, ..., -1.8606, -1.8606, -1.8606],
[-1.3004, -1.3004, -1.3004, ..., -1.8606, -1.8431, -1.8431],
[-1.3004, -1.3004, -1.3004, ..., -1.8606, -1.8431, -1.8431]],
[[ 1.4200, 1.4200, 1.4200, ..., -1.6476, -1.6476, -1.6476],
[ 1.4200, 1.4200, 1.4200, ..., -1.6476, -1.6476, -1.6476],
[ 1.4200, 1.4200, 1.4200, ..., -1.6302, -1.6302, -1.6302],
...,
[-1.0201, -1.0201, -1.0201, ..., -1.5604, -1.5604, -1.5604],
[-1.0027, -1.0027, -1.0027, ..., -1.5604, -1.5430, -1.5430],
[-1.0027, -1.0027, -1.0027, ..., -1.5604, -1.5430, -1.5430]]]),
'pixel_mask': tensor([[1, 1, 1, ..., 1, 1, 1],
[1, 1, 1, ..., 1, 1, 1],
[1, 1, 1, ..., 1, 1, 1],
...,
[1, 1, 1, ..., 1, 1, 1],
[1, 1, 1, ..., 1, 1, 1],
[1, 1, 1, ..., 1, 1, 1]]),
'labels': {'size': tensor([800, 800]), 'image_id': tensor([756]), 'class_labels': tensor([4]), 'boxes': tensor([[0.7340, 0.6986, 0.3414, 0.5944]]), 'area': tensor([519544.4375]), 'iscrowd': tensor([0]), 'orig_size': tensor([480, 480])}}
```
๊ฐ๊ฐ์˜ ์ด๋ฏธ์ง€๋ฅผ ์„ฑ๊ณต์ ์œผ๋กœ ์ฆ๊ฐ•ํ•˜๊ณ  ์ด๋ฏธ์ง€์˜ ์–ด๋…ธํ…Œ์ด์…˜์„ ์ค€๋น„ํ–ˆ์Šต๋‹ˆ๋‹ค.
๊ทธ๋Ÿฌ๋‚˜ ์ „์ฒ˜๋ฆฌ๋Š” ์•„์ง ๋๋‚˜์ง€ ์•Š์•˜์Šต๋‹ˆ๋‹ค. ๋งˆ์ง€๋ง‰ ๋‹จ๊ณ„๋กœ, ์ด๋ฏธ์ง€๋ฅผ ๋ฐฐ์น˜๋กœ ๋งŒ๋“ค ์‚ฌ์šฉ์ž ์ •์˜ `collate_fn`์„ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค.
ํ•ด๋‹น ๋ฐฐ์น˜์—์„œ ๊ฐ€์žฅ ํฐ ์ด๋ฏธ์ง€์— ์ด๋ฏธ์ง€(ํ˜„์žฌ `pixel_values` ์ธ)๋ฅผ ํŒจ๋“œํ•˜๊ณ , ์‹ค์ œ ํ”ฝ์…€(1)๊ณผ ํŒจ๋”ฉ(0)์„ ๋‚˜ํƒ€๋‚ด๊ธฐ ์œ„ํ•ด ๊ทธ์— ํ•ด๋‹นํ•˜๋Š” ์ƒˆ๋กœ์šด `pixel_mask`๋ฅผ ์ƒ์„ฑํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.
```py
>>> def collate_fn(batch):
... pixel_values = [item["pixel_values"] for item in batch]
... encoding = image_processor.pad(pixel_values, return_tensors="pt")
... labels = [item["labels"] for item in batch]
... batch = {}
... batch["pixel_values"] = encoding["pixel_values"]
... batch["pixel_mask"] = encoding["pixel_mask"]
... batch["labels"] = labels
... return batch
```
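To illustrate the padding-and-mask idea in isolation, here is a toy pure-Python sketch (the real work is done by `image_processor.pad` above; this is not how the library implements it):

```python
# Toy illustration: pad "rows" of different lengths to a common length
# and build a mask marking real values (1) vs. padding (0).
def pad_with_mask(row, target_len):
    pad = target_len - len(row)
    padded = row + [0.0] * pad
    mask = [1] * len(row) + [0] * pad
    return padded, mask

rows = [[0.5, 0.7, 0.9], [0.1, 0.2, 0.3, 0.4, 0.6]]
target = max(len(r) for r in rows)  # pad everything to the longest row
padded, mask = pad_with_mask(rows[0], target)
print(mask)  # [1, 1, 1, 0, 0]
```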
## Training the DETR model [[training-the-DETR-model]]
You have done most of the heavy lifting in the previous sections, so now you are ready to train your model!
The images in this dataset are still quite large even after resizing, which means that finetuning this model will require at least one GPU.
Training involves the following steps:
1. Load the model with [`AutoModelForObjectDetection`] using the same checkpoint as in the preprocessing.
2. Define your training hyperparameters in [`TrainingArguments`].
3. Pass the training arguments to [`Trainer`] along with the model, dataset, image processor, and data collator.
4. Call [`~Trainer.train`] to finetune your model.
When loading the model from the same checkpoint that you used for the preprocessing, remember to pass the `label2id` and `id2label` maps that you created earlier from the dataset's metadata.
Additionally, specify `ignore_mismatched_sizes=True` to replace the existing classification head (the final layer used for classification) with a new one.
```py
>>> from transformers import AutoModelForObjectDetection
>>> model = AutoModelForObjectDetection.from_pretrained(
... checkpoint,
... id2label=id2label,
... label2id=label2id,
... ignore_mismatched_sizes=True,
... )
```
In [`TrainingArguments`], use `output_dir` to specify where to save your model, then configure hyperparameters as you see fit.
Take care not to remove unused columns: if `remove_unused_columns` is `True`, the image column will be dropped.
Without the image column, you can't create `pixel_values`, so set `remove_unused_columns` to `False`.
If you wish to share your model by pushing it to the Hub, set `push_to_hub` to `True` (you must be signed in to Hugging Face to upload your model).
```py
>>> from transformers import TrainingArguments
>>> training_args = TrainingArguments(
... output_dir="detr-resnet-50_finetuned_cppe5",
... per_device_train_batch_size=8,
... num_train_epochs=10,
... fp16=True,
... save_steps=200,
... logging_steps=50,
... learning_rate=1e-5,
... weight_decay=1e-4,
... save_total_limit=2,
... remove_unused_columns=False,
... push_to_hub=True,
... )
```
Finally, bring everything together (`model`, `training_args`, `collate_fn`, `image_processor`, and the dataset `cppe5`), and call [`~transformers.Trainer.train`]:
```py
>>> from transformers import Trainer
>>> trainer = Trainer(
... model=model,
... args=training_args,
... data_collator=collate_fn,
... train_dataset=cppe5["train"],
... tokenizer=image_processor,
... )
>>> trainer.train()
```
If you have set `push_to_hub` to `True` in `training_args`, the training checkpoints are pushed to the Hugging Face Hub.
Upon training completion, push the final model to the Hugging Face Hub by calling the [`~transformers.Trainer.push_to_hub`] method.
```py
>>> trainer.push_to_hub()
```
## ํ‰๊ฐ€ํ•˜๊ธฐ [[evaluate]]
๊ฐ์ฒด ํƒ์ง€ ๋ชจ๋ธ์€ ์ผ๋ฐ˜์ ์œผ๋กœ ์ผ๋ จ์˜ <a href="https://cocodataset.org/#detection-eval">COCO-์Šคํƒ€์ผ ์ง€ํ‘œ</a>๋กœ ํ‰๊ฐ€๋ฉ๋‹ˆ๋‹ค.
๊ธฐ์กด์— ๊ตฌํ˜„๋œ ํ‰๊ฐ€ ์ง€ํ‘œ ์ค‘ ํ•˜๋‚˜๋ฅผ ์‚ฌ์šฉํ•  ์ˆ˜๋„ ์žˆ์ง€๋งŒ, ์—ฌ๊ธฐ์—์„œ๋Š” ํ—ˆ๊น…ํŽ˜์ด์Šค ํ—ˆ๋ธŒ์— ํ‘ธ์‹œํ•œ ์ตœ์ข… ๋ชจ๋ธ์„ ํ‰๊ฐ€ํ•˜๋Š” ๋ฐ `torchvision`์—์„œ ์ œ๊ณตํ•˜๋Š” ํ‰๊ฐ€ ์ง€ํ‘œ๋ฅผ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค.
`torchvision` ํ‰๊ฐ€์ž(evaluator)๋ฅผ ์‚ฌ์šฉํ•˜๋ ค๋ฉด ์‹ค์ธก๊ฐ’์ธ COCO ๋ฐ์ดํ„ฐ ์„ธํŠธ๋ฅผ ์ค€๋น„ํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.
COCO ๋ฐ์ดํ„ฐ ์„ธํŠธ๋ฅผ ๋นŒ๋“œํ•˜๋Š” API๋Š” ๋ฐ์ดํ„ฐ๋ฅผ ํŠน์ • ํ˜•์‹์œผ๋กœ ์ €์žฅํ•ด์•ผ ํ•˜๋ฏ€๋กœ, ๋จผ์ € ์ด๋ฏธ์ง€์™€ ์–ด๋…ธํ…Œ์ด์…˜์„ ๋””์Šคํฌ์— ์ €์žฅํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.
ํ•™์Šต์„ ์œ„ํ•ด ๋ฐ์ดํ„ฐ๋ฅผ ์ค€๋น„ํ•  ๋•Œ์™€ ๋งˆ์ฐฌ๊ฐ€์ง€๋กœ, cppe5["test"]์—์„œ์˜ ์–ด๋…ธํ…Œ์ด์…˜์€ ํฌ๋งท์„ ๋งž์ถฐ์•ผ ํ•ฉ๋‹ˆ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ ์ด๋ฏธ์ง€๋Š” ๊ทธ๋Œ€๋กœ ์œ ์ง€ํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.
ํ‰๊ฐ€ ๋‹จ๊ณ„๋Š” ์•ฝ๊ฐ„์˜ ์ž‘์—…์ด ํ•„์š”ํ•˜์ง€๋งŒ, ํฌ๊ฒŒ ์„ธ ๊ฐ€์ง€ ์ฃผ์š” ๋‹จ๊ณ„๋กœ ๋‚˜๋ˆŒ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
๋จผ์ €, `cppe5["test"]` ์„ธํŠธ๋ฅผ ์ค€๋น„ํ•ฉ๋‹ˆ๋‹ค: ์–ด๋…ธํ…Œ์ด์…˜์„ ํฌ๋งท์— ๋งž๊ฒŒ ๋งŒ๋“ค๊ณ  ๋ฐ์ดํ„ฐ๋ฅผ ๋””์Šคํฌ์— ์ €์žฅํ•ฉ๋‹ˆ๋‹ค.
```py
>>> import json
>>> # format annotations the same as for training, no need for data augmentation
>>> def val_formatted_anns(image_id, objects):
... annotations = []
... for i in range(0, len(objects["id"])):
... new_ann = {
... "id": objects["id"][i],
... "category_id": objects["category"][i],
... "iscrowd": 0,
... "image_id": image_id,
... "area": objects["area"][i],
... "bbox": objects["bbox"][i],
... }
... annotations.append(new_ann)
... return annotations
>>> # Save images and annotations into the files torchvision.datasets.CocoDetection expects
>>> def save_cppe5_annotation_file_images(cppe5):
... output_json = {}
... path_output_cppe5 = f"{os.getcwd()}/cppe5/"
... if not os.path.exists(path_output_cppe5):
... os.makedirs(path_output_cppe5)
... path_anno = os.path.join(path_output_cppe5, "cppe5_ann.json")
... categories_json = [{"supercategory": "none", "id": id, "name": id2label[id]} for id in id2label]
... output_json["images"] = []
... output_json["annotations"] = []
... for example in cppe5:
... ann = val_formatted_anns(example["image_id"], example["objects"])
... output_json["images"].append(
... {
... "id": example["image_id"],
... "width": example["image"].width,
... "height": example["image"].height,
... "file_name": f"{example['image_id']}.png",
... }
... )
... output_json["annotations"].extend(ann)
... output_json["categories"] = categories_json
... with open(path_anno, "w") as file:
... json.dump(output_json, file, ensure_ascii=False, indent=4)
... for im, img_id in zip(cppe5["image"], cppe5["image_id"]):
... path_img = os.path.join(path_output_cppe5, f"{img_id}.png")
... im.save(path_img)
... return path_output_cppe5, path_anno
```
๋‹ค์Œ์œผ๋กœ, `cocoevaluator`์™€ ํ•จ๊ป˜ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๋Š” `CocoDetection` ํด๋ž˜์Šค์˜ ์ธ์Šคํ„ด์Šค๋ฅผ ์ค€๋น„ํ•ฉ๋‹ˆ๋‹ค.
```py
>>> import torchvision
>>> class CocoDetection(torchvision.datasets.CocoDetection):
... def __init__(self, img_folder, image_processor, ann_file):
... super().__init__(img_folder, ann_file)
... self.image_processor = image_processor
... def __getitem__(self, idx):
... # read in PIL image and target in COCO format
... img, target = super(CocoDetection, self).__getitem__(idx)
... # preprocess image and target: converting target to DETR format,
...         # resizing + normalization of both image and target
... image_id = self.ids[idx]
... target = {"image_id": image_id, "annotations": target}
... encoding = self.image_processor(images=img, annotations=target, return_tensors="pt")
... pixel_values = encoding["pixel_values"].squeeze() # remove batch dimension
... target = encoding["labels"][0] # remove batch dimension
... return {"pixel_values": pixel_values, "labels": target}
>>> im_processor = AutoImageProcessor.from_pretrained("devonho/detr-resnet-50_finetuned_cppe5")
>>> path_output_cppe5, path_anno = save_cppe5_annotation_file_images(cppe5["test"])
>>> test_ds_coco_format = CocoDetection(path_output_cppe5, im_processor, path_anno)
```
Finally, load the metrics and run the evaluation.
```py
>>> import evaluate
>>> from tqdm import tqdm
>>> model = AutoModelForObjectDetection.from_pretrained("devonho/detr-resnet-50_finetuned_cppe5")
>>> module = evaluate.load("ybelkada/cocoevaluate", coco=test_ds_coco_format.coco)
>>> val_dataloader = torch.utils.data.DataLoader(
... test_ds_coco_format, batch_size=8, shuffle=False, num_workers=4, collate_fn=collate_fn
... )
>>> with torch.no_grad():
... for idx, batch in enumerate(tqdm(val_dataloader)):
... pixel_values = batch["pixel_values"]
... pixel_mask = batch["pixel_mask"]
... labels = [
... {k: v for k, v in t.items()} for t in batch["labels"]
... ] # these are in DETR format, resized + normalized
... # forward pass
... outputs = model(pixel_values=pixel_values, pixel_mask=pixel_mask)
... orig_target_sizes = torch.stack([target["orig_size"] for target in labels], dim=0)
... results = im_processor.post_process(outputs, orig_target_sizes) # convert outputs of model to Pascal VOC format (xmin, ymin, xmax, ymax)
... module.add(prediction=results, reference=labels)
... del batch
>>> results = module.compute()
>>> print(results)
Accumulating evaluation results...
DONE (t=0.08s).
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.352
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.681
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.292
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.168
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.208
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.429
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.274
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.484
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.501
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.191
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.323
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.590
```
These results can be further improved by adjusting the hyperparameters in [`~transformers.TrainingArguments`]. Give it a go!
## Inference [[inference]]
Now that you have finetuned a DETR model, evaluated it, and uploaded it to the Hugging Face Hub, you can use it for inference.
The simplest way to try out your finetuned model for inference is to use it in a [`pipeline`].
Instantiate a pipeline for object detection with your model, and pass an image to it:
```py
>>> from transformers import pipeline
>>> import requests
>>> url = "https://i.imgur.com/2lnWoly.jpg"
>>> image = Image.open(requests.get(url, stream=True).raw)
>>> obj_detector = pipeline("object-detection", model="devonho/detr-resnet-50_finetuned_cppe5")
>>> obj_detector(image)
```
You can also manually replicate the results of the `pipeline` if you'd like:
```py
>>> image_processor = AutoImageProcessor.from_pretrained("devonho/detr-resnet-50_finetuned_cppe5")
>>> model = AutoModelForObjectDetection.from_pretrained("devonho/detr-resnet-50_finetuned_cppe5")
>>> with torch.no_grad():
... inputs = image_processor(images=image, return_tensors="pt")
... outputs = model(**inputs)
... target_sizes = torch.tensor([image.size[::-1]])
... results = image_processor.post_process_object_detection(outputs, threshold=0.5, target_sizes=target_sizes)[0]
>>> for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
... box = [round(i, 2) for i in box.tolist()]
... print(
... f"Detected {model.config.id2label[label.item()]} with confidence "
... f"{round(score.item(), 3)} at location {box}"
... )
Detected Coverall with confidence 0.566 at location [1215.32, 147.38, 4401.81, 3227.08]
Detected Mask with confidence 0.584 at location [2449.06, 823.19, 3256.43, 1413.9]
```
Let's plot the result:
```py
>>> draw = ImageDraw.Draw(image)
>>> for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
... box = [round(i, 2) for i in box.tolist()]
... x, y, x2, y2 = tuple(box)
... draw.rectangle((x, y, x2, y2), outline="red", width=1)
... draw.text((x, y), model.config.id2label[label.item()], fill="white")
>>> image
```
<div class="flex justify-center">
<img src="https://i.imgur.com/4QZnf9A.png" alt="Object detection result on a new image"/>
</div>