
Image processors [[image-processors]]

An image processor converts images into pixel values, tensors that represent an image's colors and size. These pixel values are the input to a vision model. For a pretrained model to recognize new images correctly, the input images must have the same format as the data used during training. An image processor makes the format consistent with operations such as the following.

  • [~BaseImageProcessor.center_crop] to crop an image to the expected size
  • [~BaseImageProcessor.normalize] to normalize pixel values, or [~BaseImageProcessor.rescale] to rescale them
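As a rough illustration of what these operations do, the NumPy sketch below applies a center crop, a rescale, and a normalize step to a dummy image. The helper names, the 224x224 target size, and the 0.5 mean and standard deviation are assumptions chosen for the example; they are not the library's implementation or any particular model's values.

```python
import numpy as np

def center_crop(image, crop_h, crop_w):
    """Keep the central crop_h x crop_w window of an (H, W, C) array."""
    h, w = image.shape[:2]
    top = (h - crop_h) // 2
    left = (w - crop_w) // 2
    return image[top:top + crop_h, left:left + crop_w]

def rescale(image, scale=1 / 255):
    """Map integer pixel values to floats, e.g. [0, 255] -> [0, 1]."""
    return image.astype(np.float32) * scale

def normalize(image, mean, std):
    """Standardize each channel: (x - mean) / std."""
    mean = np.asarray(mean, dtype=np.float32)
    std = np.asarray(std, dtype=np.float32)
    return (image - mean) / std

# Dummy 256x320 RGB image with random 8-bit pixel values.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(256, 320, 3), dtype=np.uint8)

cropped = center_crop(image, 224, 224)
scaled = rescale(cropped)
pixel_values = normalize(scaled, mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])

print(pixel_values.shape)  # (224, 224, 3)
```

With a mean and standard deviation of 0.5, the normalized values end up roughly in the range [-1, 1].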

Use [~ImageProcessingMixin.from_pretrained] to load an image processor's configuration (image size, whether to normalize and resize, and so on) from a vision model on the Hugging Face Hub or in a local directory. The configuration of each pretrained model is stored in a preprocessor_config.json file.

from transformers import AutoImageProcessor

image_processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224")

Pass an image to the image processor to convert it into pixel values, and set return_tensors="pt" to get PyTorch tensors back. To see what an image looks like as a tensor, try printing the inputs.

from PIL import Image
import requests

url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/image_processor_example.png"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")
inputs = image_processor(image, return_tensors="pt")

This guide covers the image processor classes and how to preprocess images for vision models.

Image processor classes [[image-processor-classes]]

Image processors inherit from the [BaseImageProcessor] class, which provides the [~BaseImageProcessor.center_crop], [~BaseImageProcessor.normalize], and [~BaseImageProcessor.rescale] functions. There are two types of image processors.

  • [BaseImageProcessor] is a Python implementation.
  • [BaseImageProcessorFast] is a faster torchvision-backed version. It can be up to 33x faster when processing a batch of torch.Tensor inputs. [BaseImageProcessorFast] is not yet available for every vision model, so check the model's API documentation to see whether it is supported.

Each image processor also subclasses the [ImageProcessingMixin] class, which provides the [~ImageProcessingMixin.from_pretrained] and [~ImageProcessingMixin.save_pretrained] methods for loading and saving image processors.
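The mixin pattern described above can be pictured with a toy example. The sketch below is not the Transformers implementation: ToyConfigMixin and ToyImageProcessor are hypothetical classes that only show how a shared mixin could write a processor's settings to a JSON file and rebuild the processor from it.

```python
import json
import tempfile
from pathlib import Path

class ToyConfigMixin:
    """Hypothetical stand-in for a load/save mixin; not the Transformers class."""

    config_name = "preprocessor_config.json"

    def save_pretrained(self, directory):
        # Write the instance attributes to a JSON file in `directory`.
        path = Path(directory)
        path.mkdir(parents=True, exist_ok=True)
        (path / self.config_name).write_text(json.dumps(vars(self), indent=2))

    @classmethod
    def from_pretrained(cls, directory):
        # Rebuild an instance from the saved JSON file.
        config = json.loads((Path(directory) / cls.config_name).read_text())
        return cls(**config)

class ToyImageProcessor(ToyConfigMixin):
    def __init__(self, size=224, do_normalize=True, image_mean=(0.5, 0.5, 0.5)):
        self.size = size
        self.do_normalize = do_normalize
        self.image_mean = list(image_mean)

# Round trip: save to a temporary directory, then load it back.
with tempfile.TemporaryDirectory() as tmp:
    ToyImageProcessor(size=384).save_pretrained(tmp)
    reloaded = ToyImageProcessor.from_pretrained(tmp)

print(reloaded.size)  # 384
```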

There are two ways to load an image processor: with [AutoImageProcessor] or with a model-specific image processor.

The AutoClass API provides a convenient way to load an image processor without directly specifying which model it is associated with.

Use [~AutoImageProcessor.from_pretrained] to load an image processor. To use a fast processor, add use_fast=True.

from transformers import AutoImageProcessor

image_processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224", use_fast=True)
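One way to picture how an AutoClass can pick a processor without being told the model is a lookup from a type name stored in the configuration to a class. The sketch below is a toy version of that idea only; the registry, the class names, and the config dictionary are hypothetical and far simpler than what the library actually does.

```python
class ToyViTProcessor:
    def __init__(self, **config):
        self.config = config

class ToyDetrProcessor:
    def __init__(self, **config):
        self.config = config

# Hypothetical registry: type name found in the config -> processor class.
REGISTRY = {
    "vit": ToyViTProcessor,
    "detr": ToyDetrProcessor,
}

def auto_from_config(config):
    """Pick the processor class named by the config and instantiate it."""
    processor_cls = REGISTRY[config["model_type"]]
    return processor_cls(**config)

processor = auto_from_config({"model_type": "detr", "size": 800})
print(type(processor).__name__)  # ToyDetrProcessor
```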

Each image processor is tailored to a specific vision model. Its configuration file therefore contains information such as the image size the model expects and whether to normalize and resize.

The image processor can be loaded directly from the model-specific class. Check the model's API documentation to see whether a faster version is supported.

from transformers import ViTImageProcessor

image_processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224")

To load a fast image processor, use the fast implementation class.

from transformers import ViTImageProcessorFast

image_processor = ViTImageProcessorFast.from_pretrained("google/vit-base-patch16-224")

Fast image processors [[fast-image-processors]]

[BaseImageProcessorFast] is based on torchvision and is significantly faster, especially when processing on a GPU. It has the same design as [BaseImageProcessor], so if a model supports it, it can be used as a drop-in replacement without further changes. Install torchvision and set the use_fast parameter to True.

from transformers import AutoImageProcessor

processor = AutoImageProcessor.from_pretrained("facebook/detr-resnet-50", use_fast=True)

Use the device parameter to choose which device processing runs on. By default, if the inputs are tensors, processing runs on the same device as the tensors; otherwise it runs on the CPU. The example below sets up a fast processor to run on a GPU.

from torchvision.io import read_image
from transformers import DetrImageProcessorFast

images = read_image("image.jpg")
processor = DetrImageProcessorFast.from_pretrained("facebook/detr-resnet-50")
images_processed = processor(images, return_tensors="pt", device="cuda")
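The device rule described above can be written down as a small decision function. This is only a sketch of the rule as stated in this guide, not the library's code: FakeTensor and resolve_device are hypothetical names, and the string input stands in for any non-tensor input such as a PIL image.

```python
class FakeTensor:
    """Hypothetical tensor-like object that only carries a device name."""

    def __init__(self, device):
        self.device = device

def resolve_device(inputs, device=None):
    """Return the device that processing would run on under the stated rule."""
    if device is not None:
        return device  # an explicit device argument takes priority
    input_device = getattr(inputs, "device", None)
    if input_device is not None:
        return input_device  # tensor input: stay on the tensor's device
    return "cpu"  # any other input falls back to the CPU

print(resolve_device(FakeTensor("cuda:0")))              # cuda:0
print(resolve_device("stand-in for a PIL image"))        # cpu
print(resolve_device(FakeTensor("cpu"), device="cuda"))  # cuda
```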

Benchmarks

The benchmarks were measured on an AWS EC2 g5.2xlarge instance with an NVIDIA A10G Tensor Core GPU.

Preprocess [[preprocess]]

Vision models in Transformers expect their input as PyTorch tensors of pixel values. An image processor converts images into these pixel-value tensors, with shape (batch size, number of channels, height, width). In the process, it resizes each image to the size the model requires, and normalizes or rescales the pixel values to match what the model expects.
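To make the (batch size, number of channels, height, width) layout concrete, the NumPy sketch below stacks two dummy images into that shape. The 224x224 size and the use of NumPy instead of PyTorch are assumptions made to keep the example small.

```python
import numpy as np

# Two dummy RGB images in the (height, width, channels) layout image libraries use.
images = [np.zeros((224, 224, 3), dtype=np.float32) for _ in range(2)]

# Move channels first for each image, then stack along a new batch axis.
pixel_values = np.stack([img.transpose(2, 0, 1) for img in images])

print(pixel_values.shape)  # (2, 3, 224, 224)
```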

Image preprocessing is not the same as image augmentation. Image augmentation deliberately alters images (brightness, color, rotation, and so on) to create more training data or to prevent overfitting. Image preprocessing, by contrast, only aims to match the input format that a pretrained model expects.

Typically, to improve model performance, images are augmented first and then preprocessed before being passed to the model. You can use a library such as Albumentations or Kornia for augmentation, and then an image processor for the preprocessing step.
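The difference between the two steps, and the order they run in, can be sketched in NumPy. The augment and preprocess functions below are hypothetical stand-ins (a random flip and brightness change, then a fixed rescale and normalize), not the transforms or the image processor used later in this guide.

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(image, rng):
    """Random changes: a coin-flip horizontal flip and a random brightness factor."""
    if rng.random() < 0.5:
        image = image[:, ::-1]
    factor = rng.uniform(0.5, 1.5)
    return np.clip(image.astype(np.float32) * factor, 0, 255)

def preprocess(image, mean=0.5, std=0.5):
    """Deterministic changes: rescale to [0, 1], then normalize."""
    return (image / 255.0 - mean) / std

image = rng.integers(0, 256, size=(224, 224, 3), dtype=np.uint8)

# Augment first, preprocess second. Each call may give a different result.
out_a = preprocess(augment(image, rng))
out_b = preprocess(augment(image, rng))

# Preprocessing alone gives the same result every time.
plain = image.astype(np.float32)
same = np.array_equal(preprocess(plain), preprocess(plain))
print(out_a.shape, same)
```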

This guide uses the torchvision transforms module for augmentation.

Start by loading a small sample of the food101 dataset.

from datasets import load_dataset

dataset = load_dataset("ethz/food101", split="train[:100]")

The Compose API in the transforms module chains several transforms together. Here, it combines RandomResizedCrop, which randomly crops and resizes an image, with ColorJitter, which randomly changes its colors.

The size to crop to can be retrieved from the image processor. Some models expect an exact height and width, while others only require the shortest_edge value.

from torchvision.transforms import RandomResizedCrop, ColorJitter, Compose

size = (
    image_processor.size["shortest_edge"]
    if "shortest_edge" in image_processor.size
    else (image_processor.size["height"], image_processor.size["width"])
)
_transforms = Compose([RandomResizedCrop(size), ColorJitter(brightness=0.5, hue=0.5)])

Apply the transforms to the images and convert them to RGB format. Then pass the augmented images to the image processor to return the pixel values.

The do_resize parameter is set to False here because RandomResizedCrop already resized the images in the augmentation step. If you skip augmentation, the image processor resizes the images automatically and normalizes them with the image_mean and image_std values stored in the preprocessor configuration file.

def transforms(examples):
    images = [_transforms(img.convert("RGB")) for img in examples["image"]]
    examples["pixel_values"] = image_processor(images, do_resize=False, return_tensors="pt")["pixel_values"]
    return examples

Use [~datasets.Dataset.set_transform] to apply the combined augmentation and preprocessing function to the entire dataset on the fly.

dataset.set_transform(transforms)

Now convert the processed pixel values back into an image to see how the image was augmented and preprocessed.

import numpy as np
import matplotlib.pyplot as plt

img = dataset[0]["pixel_values"]
plt.imshow(img.permute(1, 2, 0))

Figure: the image before and after augmentation and preprocessing.
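Normalized pixel values can fall outside the [0, 1] range that matplotlib expects for floating-point RGB data, so the plot may look clipped. A possible way to recover the original colors is to undo the normalization first. The sketch below assumes the values were normalized as (x - mean) / std with a mean and standard deviation of 0.5; denormalize is a hypothetical helper, not a library function.

```python
import numpy as np

def denormalize(pixel_values, mean, std):
    """Invert (x - mean) / std for a (channels, height, width) array."""
    mean = np.asarray(mean, dtype=np.float32).reshape(-1, 1, 1)
    std = np.asarray(std, dtype=np.float32).reshape(-1, 1, 1)
    image = pixel_values * std + mean
    # Move channels last and clip to the [0, 1] range expected for float images.
    return np.clip(image.transpose(1, 2, 0), 0.0, 1.0)

# Dummy normalized values in [-1, 1], as produced with mean = std = 0.5.
rng = np.random.default_rng(0)
pixel_values = rng.uniform(-1.0, 1.0, size=(3, 224, 224)).astype(np.float32)

image = denormalize(pixel_values, mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
print(image.shape)  # (224, 224, 3)
```

With a real image processor, the image_mean and image_std values from its configuration would take the place of the 0.5 values used here.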

Beyond preprocessing, image processors also provide post-processing methods for vision tasks such as object detection and segmentation. These methods convert a model's raw output into meaningful predictions like bounding boxes or segmentation maps.
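As one illustration of what such post-processing can involve, DETR-style detection models predict boxes as normalized (center x, center y, width, height) values, which have to be converted to absolute corner coordinates for a given image size. The NumPy sketch below shows that conversion; boxes_to_corners is a hypothetical name, and this is not the library's post-processing method.

```python
import numpy as np

def boxes_to_corners(boxes, image_height, image_width):
    """Convert normalized (cx, cy, w, h) boxes to absolute (x0, y0, x1, y1)."""
    cx, cy, w, h = boxes.T
    corners = np.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], axis=1)
    # Scale x coordinates by the width and y coordinates by the height.
    scale = np.array([image_width, image_height, image_width, image_height])
    return corners * scale

# One box centered in the image, covering half of each dimension.
boxes = np.array([[0.5, 0.5, 0.5, 0.5]])
print(boxes_to_corners(boxes, image_height=480, image_width=640))
# [[160. 120. 480. 360.]]
```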

Padding [[padding]]

Some models, such as DETR, apply scale augmentation during training, so the images in a batch can have different sizes. Images of different sizes cannot be batched together.

To solve this, pad the images with zeros (the special padding token 0) so that they all have the same size. Use the pad method to apply the padding, and define a custom collate function to batch the padded images together.

def collate_fn(batch):
    pixel_values = [item["pixel_values"] for item in batch]
    encoding = image_processor.pad(pixel_values, return_tensors="pt")
    labels = [item["labels"] for item in batch]
    batch = {}
    batch["pixel_values"] = encoding["pixel_values"]
    batch["pixel_mask"] = encoding["pixel_mask"]
    batch["labels"] = labels
    return batch
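To show what zero padding and the accompanying pixel mask look like, here is a minimal NumPy sketch. It assumes padding is added at the bottom and right of each image and that the mask uses 1 for real pixels and 0 for padding; pad_batch is a hypothetical helper, not the pad method itself.

```python
import numpy as np

def pad_batch(images):
    """Zero-pad (C, H, W) arrays to a common size and build a pixel mask."""
    max_h = max(img.shape[1] for img in images)
    max_w = max(img.shape[2] for img in images)
    channels = images[0].shape[0]
    pixel_values = np.zeros((len(images), channels, max_h, max_w), dtype=np.float32)
    pixel_mask = np.zeros((len(images), max_h, max_w), dtype=np.int64)
    for i, img in enumerate(images):
        _, h, w = img.shape
        pixel_values[i, :, :h, :w] = img  # real pixels go in the top-left corner
        pixel_mask[i, :h, :w] = 1         # 1 marks real pixels, 0 marks padding
    return pixel_values, pixel_mask

# Two images with different heights and widths.
images = [np.ones((3, 4, 6), dtype=np.float32), np.ones((3, 5, 3), dtype=np.float32)]
pixel_values, pixel_mask = pad_batch(images)

print(pixel_values.shape, pixel_mask.shape)  # (2, 3, 5, 6) (2, 5, 6)
```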