language:
  - ko
  - en
license: mit
metrics:
  - recall
base_model:
  - google/siglip2-base-patch16-224
tags:
  - zero-shot-image-classification

siglip2-base-patch16-224-ko

This is a SigLIP2 model with strengthened Korean understanding, obtained by training google/siglip2-base-patch16-224 with the approach from "Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation".

Training data: AIHub English-Korean parallel dataset

Evaluation data: MS-COCO Caption English-Korean dataset
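The distillation objective described in the cited paper can be sketched as follows. For each parallel sentence pair, a frozen teacher embeds the English sentence, and the student is trained so that its embeddings of both the English sentence and the Korean translation match the teacher's. This is a minimal, dependency-free sketch in plain Python rather than the actual training code; the model card does not state which encoder served as teacher or student, so the function names and toy vectors below are hypothetical.

```python
def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    assert len(a) == len(b)
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def distillation_loss(teacher_en, student_en, student_ko):
    """Loss for one parallel (English, Korean) sentence pair.

    teacher_en: frozen teacher embedding of the English sentence
    student_en: student embedding of the same English sentence
    student_ko: student embedding of the Korean translation
    """
    # Both student embeddings are pulled toward the teacher's English
    # embedding, so the Korean sentence ends up near its English counterpart.
    return mse(teacher_en, student_en) + mse(teacher_en, student_ko)


# Toy 3-dimensional embeddings (hypothetical values, for illustration only).
teacher_en = [1.0, 0.0, 2.0]
student_en = [1.0, 0.0, 2.0]  # already matches the teacher
student_ko = [0.0, 0.0, 0.0]  # not yet aligned with the teacher

loss = distillation_loss(teacher_en, student_en, student_ko)
print(loss)
```

Because the image encoder is untouched in this kind of setup, a text embedding that stays close to the teacher's English embedding should remain compatible with the existing image embeddings, which would be consistent with the English scores reported below staying roughly unchanged.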

How to use

import requests
import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

repo = "hyunlord/siglip2-base-patch16-224-ko"
model = AutoModel.from_pretrained(repo)
processor = AutoProcessor.from_pretrained(repo)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

texts = ["고양이 한 마리",  # one cat
         "고양이 두 마리",  # two cats
         "분홍색 소파에 드러누운 고양이 친구들",  # cat friends lying on a pink sofa
         "리모컨과 고양이 두마리",  # a remote control and two cats
         "리모컨 두 개와 고양이 두마리",  # two remote controls and two cats
         "분홍색 소파 위에 리모컨 두 개와 드러누운 고양이 두마리"]  # two remote controls and two cats lying on a pink sofa
inputs = processor(text=texts,
                   images=image,
                   padding="max_length",
                   max_length=64,
                   return_tensors="pt")
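# Inference only, so skip gradient tracking.
with torch.no_grad():
    outputs = model(**inputs)
# SigLIP scores each image-text pair independently, so apply a sigmoid rather than a softmax.
logits_per_image = outputs.logits_per_image
probs = torch.sigmoid(logits_per_image)
print(probs)
# tensor([[0.0038, 0.0429, 0.8294, 0.9787, 0.9816, 0.9990]])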
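Since each probability comes from an independent sigmoid, the six values do not form a distribution over captions. The short sketch below reuses the numbers printed above to show this, ranks the captions, and applies a cutoff. The 0.5 threshold is an assumption for illustration, not a value recommended by the model card.

```python
# Probabilities copied from the example output above, in caption order.
probs = [0.0038, 0.0429, 0.8294, 0.9787, 0.9816, 0.9990]

# Sigmoid scores are computed per image-text pair, so they need not sum to 1.
total = sum(probs)

# Rank caption indices from best to worst match.
ranking = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

# Hypothetical cutoff for treating a caption as a match.
threshold = 0.5
matches = [i for i, p in enumerate(probs) if p >= threshold]

print(round(total, 4))
print(ranking)
print(matches)
```

Here the most specific caption (index 5) ranks first, while the two captions that mention only cats without the sofa or remote controls fall below the cutoff.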

MS-COCO Caption Evaluation

| Model | Parameter Size | (En) I-T Recall@1 | (En) T-I Recall@1 | (Ko) I-T Recall@1 | (Ko) T-I Recall@1 |
|---|---|---|---|---|---|
| google/siglip2-base-patch16-224 | 375,187,970 | 65.20% | 48.29% | 45.68% | 25.44% |
| google/siglip2-so400m-patch14-384 | 1,136,008,498 | 67.74% | 52.04% | 52.36% | 31.59% |
| hyunlord/siglip2-base-patch16-224-ko | 375,187,970 | 65.54% | 47.99% | 57.24% | 36.55% |
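For readers unfamiliar with the metric, Recall@1 can be sketched as below, reading I-T as image-to-text retrieval and T-I as text-to-image retrieval. This is a simplified illustration and not the evaluation script behind the table: it assumes exactly one correct caption per image at the same index, whereas MS-COCO provides several captions per image, and the similarity values are made up.

```python
def recall_at_k(sim, k=1):
    """Fraction of queries whose ground-truth item appears in the top-k results.

    sim[i][j] is the similarity of query i to candidate j.
    Simplifying assumption: candidate i is the only correct match for query i.
    """
    hits = 0
    for i, row in enumerate(sim):
        top_k = sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]
        hits += i in top_k
    return hits / len(sim)


# Toy image-to-text similarity matrix: 3 images (rows) x 3 captions (columns).
sim = [
    [0.9, 0.1, 0.2],  # image 0 retrieves caption 0 (hit)
    [0.3, 0.2, 0.8],  # image 1 retrieves caption 2 (miss)
    [0.1, 0.4, 0.7],  # image 2 retrieves caption 2 (hit)
]

i2t = recall_at_k(sim, k=1)              # image-to-text: rows are queries
t2i = recall_at_k(list(zip(*sim)), k=1)  # text-to-image: transpose first
print(i2t, t2i)
```

The two directions generally differ, as they do in the table, because the best caption for an image and the best image for a caption are picked from different rows and columns of the same matrix.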