
Image feature extraction[[image-feature-extraction]]

[[open-in-colab]]

Image feature extraction is the task of extracting semantically meaningful features from a given image. It has many use cases, including image similarity and image retrieval. Moreover, most computer vision models can be used for image feature extraction: remove the task-specific head (image classification, object detection, and so on) and take the features. These features encode visual structure such as edges and corners, which makes them useful for higher-level tasks. Depending on how deep the model is, they may also contain information about the real world (for example, what a cat looks like). These outputs can therefore be used to train new classifiers on a specific dataset.

In this guide, you will:

  • Learn to build a simple image similarity system on top of the image-feature-extraction pipeline.
  • Accomplish the same task with bare model inference.

Image similarity using the image-feature-extraction pipeline[[image-similarity-using-image-feature-extraction-pipeline]]

We have two images of cats sitting on top of fish nets. One of them is generated.

from PIL import Image
import requests

img_urls = ["https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/cats.png", "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/cats.jpeg"]
image_real = Image.open(requests.get(img_urls[0], stream=True).raw).convert("RGB")
image_gen = Image.open(requests.get(img_urls[1], stream=True).raw).convert("RGB")

Let's see the pipeline in action. First, initialize the pipeline. If you don't pass a model, the pipeline is automatically initialized with google/vit-base-patch16-224. To calculate similarity, set pool to True.

import torch
from transformers import pipeline

DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
# pipeline() takes the checkpoint through the `model` argument
pipe = pipeline(task="image-feature-extraction", model="google/vit-base-patch16-224", device=DEVICE, pool=True)

To run inference with pipe, pass both images to it.

outputs = pipe([image_real, image_gen])

The output contains the pooled embeddings of the two images.

# Get the length of a single output
print(len(outputs[0][0]))
# Show the outputs
print(outputs)

# 768
# [[[-0.03909236937761307, 0.43381670117378235, -0.06913255900144577,

To get the similarity score, pass the embeddings to a similarity function.

from torch.nn.functional import cosine_similarity

similarity_score = cosine_similarity(torch.Tensor(outputs[0]),
                                     torch.Tensor(outputs[1]), dim=1)

print(similarity_score)

# tensor([0.6043])
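For intuition, cosine similarity is the dot product of two vectors divided by the product of their norms. The sketch below recomputes it in plain Python on small made-up vectors. The helper name cosine and the toy vectors are illustrative assumptions, not part of the pipeline output above.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length sequences of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for two embeddings (illustrative values only).
same_direction = cosine([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine([1.0, 0.0], [0.0, 1.0])
print(same_direction, orthogonal)
```

The score ranges from -1 to 1. Values close to 1 mean the two embeddings point in nearly the same direction, and values near 0 mean they are unrelated.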

If you want the last hidden states before pooling, don't pass any value for the pool parameter, since it is set to False by default. These hidden states are useful for training new classifiers or models on top of the model's features.

pipe = pipeline(task="image-feature-extraction", model="google/vit-base-patch16-224", device=DEVICE)
output = pipe(image_real)

Since the output is not pooled, we get the last hidden states, where the first dimension is the batch size and the last two are the embedding shape (sequence length and hidden size).

import numpy as np
# `output` is the unpooled result for the single image passed above
print(np.array(output).shape)
# (1, 197, 768)
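The 197 in this shape can be sanity-checked by hand. The sketch below assumes the model splits a 224×224 input into non-overlapping 16×16 patches and prepends a single [CLS] token, as the name vit-base-patch16-224 suggests. The helper vit_sequence_length is hypothetical and only does the arithmetic.

```python
def vit_sequence_length(image_size: int, patch_size: int, extra_tokens: int = 1) -> int:
    """Number of tokens a ViT-style model produces for a square image.

    Assumes non-overlapping square patches plus `extra_tokens` special
    tokens (one [CLS] token by default).
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size
    return patches_per_side ** 2 + extra_tokens

print(vit_sequence_length(224, 16))  # 14 * 14 patches + 1 [CLS] token = 197
```

Under the same assumption, a checkpoint trained on 384×384 inputs would produce a longer sequence, so the middle dimension depends on the checkpoint you load.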

Getting features and similarities using AutoModel[[getting-features-and-similarities-using-automodel]]

We can also use the AutoModel class of transformers to get the features. AutoModel loads any transformers model without a task-specific head, so its outputs can be used directly as features.

from transformers import AutoImageProcessor, AutoModel

processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224")
model = AutoModel.from_pretrained("google/vit-base-patch16-224").to(DEVICE)

Let's write a simple function for inference. We pass the inputs to the processor first, then pass its outputs to the model.

def infer(image):
  inputs = processor(image, return_tensors="pt").to(DEVICE)
  outputs = model(**inputs)
  return outputs.pooler_output
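pooler_output is a single vector per image rather than one vector per token. As a rough sketch of how such a vector can be derived from the last hidden states, the code below assumes the pooler takes the first ([CLS]) token, applies a dense layer, and then a tanh. The toy weights, the hidden size of 2, and the helper pool_first_token are made up for illustration and do not come from the real checkpoint.

```python
import math

def pool_first_token(hidden_states, weight, bias):
    """Pool a (tokens x hidden) matrix into one vector.

    Assumed recipe: take the first token, apply a dense layer
    (weight @ token + bias), then squash each value with tanh.
    """
    first_token = hidden_states[0]
    dense = [
        sum(w * x for w, x in zip(row, first_token)) + b
        for row, b in zip(weight, bias)
    ]
    return [math.tanh(v) for v in dense]

# Toy example: 3 tokens with hidden size 2 (the real model uses 197 x 768).
hidden_states = [[0.5, -1.0], [0.1, 0.2], [0.3, 0.4]]
weight = [[1.0, 0.0], [0.0, 1.0]]  # identity weights, illustrative only
bias = [0.0, 0.0]

pooled = pool_first_token(hidden_states, weight, bias)
print(len(pooled))  # one value per hidden dimension
```

Only the first token contributes to the pooled vector in this sketch, so the remaining tokens can change without affecting it.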

We can get the embeddings by passing the images directly to this function.

embed_real = infer(image_real)
embed_gen = infer(image_gen)

We can then compute the similarity again from these embeddings.

from torch.nn.functional import cosine_similarity

similarity_score = cosine_similarity(embed_real, embed_gen, dim=1)
print(similarity_score)

# tensor([0.6061], device='cuda:0', grad_fn=<SumBackward1>)
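The same similarity score extends from comparing two images to simple image retrieval: embed a gallery of images once, then rank them against a query embedding. The sketch below uses short hand-written vectors in place of real embeddings. The file names, the vectors, and the helper rank_by_similarity are placeholders for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length sequences of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def rank_by_similarity(query, gallery):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, cosine(query, emb)) for name, emb in gallery.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Placeholder embeddings; in practice these would come from infer() above.
gallery = {
    "cat_a.png": [0.9, 0.1, 0.0],
    "cat_b.png": [0.7, 0.3, 0.1],
    "dog.png": [0.0, 0.2, 0.9],
}
query = [1.0, 0.0, 0.0]

ranking = rank_by_similarity(query, gallery)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

For a large gallery, a vectorized or indexed search would be more practical than this linear scan, but the ranking logic stays the same.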