language:
  - ko
  - en
license: mit
metrics:
  - recall
base_model:
  - google/siglip2-base-patch16-224
tags:
  - zero-shot-image-classification

siglip2-base-patch16-224-ko

This is a SigLIP2 model with improved Korean understanding. It was obtained by training google/siglip2-base-patch16-224 with the approach described in Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation.

Training data: AI Hub English-Korean parallel dataset

Evaluation data: MS-COCO caption English-Korean dataset
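For orientation, the objective proposed in the cited knowledge-distillation paper can be written as follows. This is a sketch of the paper's general formulation, not a statement of this repository's exact training setup: which encoder plays the teacher $M$ and the student $\hat{M}$ here, and which tower was trained, are assumptions the model card does not confirm. Given a mini-batch $\mathcal{B}$ of parallel sentence pairs $(s_j, t_j)$, with $s_j$ the source-language (English) sentence and $t_j$ its translation (Korean), the student is trained to reproduce the teacher's source-sentence embedding for both inputs:

$$
\mathcal{L} = \frac{1}{|\mathcal{B}|} \sum_{j \in \mathcal{B}} \left[ \left( M(s_j) - \hat{M}(s_j) \right)^2 + \left( M(s_j) - \hat{M}(t_j) \right)^2 \right]
$$

The first term keeps the student close to the teacher on English; the second pulls the Korean translation onto the same point in embedding space, which is what lets the unchanged image encoder be matched against Korean text.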

How to use

import requests
import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

repo = "hyunlord/siglip2-base-patch16-224-ko"
model = AutoModel.from_pretrained(repo)
processor = AutoProcessor.from_pretrained(repo)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

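# Candidate captions in Korean (English glosses in the comments).
texts = ["고양이 한 마리",  # one cat
         "고양이 두 마리",  # two cats
         "분홍색 소파에 드러누운 고양이 친구들",  # cat friends lying on a pink sofa
         "리모컨과 고양이 두마리",  # a remote control and two cats
         "리모컨 두 개와 고양이 두마리",  # two remote controls and two cats
         "분홍색 소파 위에 리모컨 두 개와 드러누운 고양이 두마리"]  # two remote controls and two cats lying on a pink sofa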
inputs = processor(text=texts,
                   images=image,
                   padding="max_length",
                   max_length=64,
                   return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
logits_per_image = outputs.logits_per_image
probs = torch.sigmoid(logits_per_image)

>>> probs
tensor([[0.0038, 0.0429, 0.8294, 0.9787, 0.9816, 0.9990]])
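
Because the scores come from a sigmoid rather than a softmax, each caption gets an independent probability and the values need not sum to 1. The following minimal sketch shows one way to read such an output; it reuses the probabilities printed above as plain Python floats, and the `threshold` value is a hypothetical cut-off chosen for illustration, not something the model card prescribes.

```python
# Probabilities copied from the example output above, one per caption.
probs = [0.0038, 0.0429, 0.8294, 0.9787, 0.9816, 0.9990]
texts = ["고양이 한 마리",
         "고양이 두 마리",
         "분홍색 소파에 드러누운 고양이 친구들",
         "리모컨과 고양이 두마리",
         "리모컨 두 개와 고양이 두마리",
         "분홍색 소파 위에 리모컨 두 개와 드러누운 고양이 두마리"]

# Rank captions from most to least likely match.
ranked = sorted(zip(texts, probs), key=lambda pair: pair[1], reverse=True)
best_text, best_prob = ranked[0]

# Sigmoid scores are independent, so several captions can pass a cut-off.
threshold = 0.5  # hypothetical cut-off, an assumption for this sketch
matches = [text for text, prob in zip(texts, probs) if prob >= threshold]

print(best_text, best_prob)
print(len(matches), "captions at or above the threshold")
```

With `model` outputs in hand, the same ranking can be applied to `probs[0].tolist()`.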

MS-COCO Caption Evaluation

| Model | Parameters | Image-to-text Recall@1 (En) | Text-to-image Recall@1 (En) | Image-to-text Recall@1 (Ko) | Text-to-image Recall@1 (Ko) |
|---|---|---|---|---|---|
| google/siglip2-base-patch16-224 | 375,187,970 | 65.20% | 48.29% | 45.68% | 25.44% |
| google/siglip2-so400m-patch14-384 | 1,136,008,498 | 67.74% | 52.04% | 52.36% | 31.59% |
| hyunlord/siglip2-base-patch16-224-ko | 375,187,970 | 65.54% | 47.99% | 57.24% | 36.55% |
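
For readers unfamiliar with the metric, here is a minimal sketch of how Recall@1 is commonly computed for image-text retrieval. The evaluation script behind the numbers above is not included in this card, so the function name `recall_at_1`, its inputs, and the toy similarity matrix are all illustrative assumptions. Image-to-text (I-T) checks whether the top-scoring caption for each image is one of its own captions; text-to-image (T-I) checks whether the top-scoring image for each caption is the image it describes.

```python
def recall_at_1(similarity, text_to_image):
    """Return (image-to-text, text-to-image) Recall@1.

    similarity[i][j] is the score between image i and caption j.
    text_to_image[j] is the index of the image that caption j describes.
    """
    num_images = len(similarity)
    num_texts = len(text_to_image)

    # Image-to-text: the best caption must belong to the query image.
    i2t_hits = 0
    for i in range(num_images):
        best_text = max(range(num_texts), key=lambda j: similarity[i][j])
        if text_to_image[best_text] == i:
            i2t_hits += 1

    # Text-to-image: the best image must be the one the caption describes.
    t2i_hits = 0
    for j in range(num_texts):
        best_image = max(range(num_images), key=lambda i: similarity[i][j])
        if best_image == text_to_image[j]:
            t2i_hits += 1

    return i2t_hits / num_images, t2i_hits / num_texts


# Toy data (made up): 2 images, 2 captions per image.
toy_similarity = [
    [0.9, 0.2, 0.3, 0.1],
    [0.1, 0.6, 0.8, 0.4],
]
toy_text_to_image = [0, 0, 1, 1]

i2t, t2i = recall_at_1(toy_similarity, toy_text_to_image)
print(f"I-T Recall@1: {i2t:.2%}, T-I Recall@1: {t2i:.2%}")
```

Text-to-image retrieval is usually the harder direction on caption datasets because each caption has exactly one correct image, while each image has several acceptable captions.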