Model Card for tmp-pl-image-classification

이 저장소는 🤗 Transformers의 pipeline() 동작을 이해하고 연습하기 위한 학습용(pipeline practice) 모델 repo 입니다.
모델 가중치는 원본 모델 google/vit-base-patch16-224 을 그대로 사용하며, 추가적인 fine-tuning은 수행하지 않았습니다.

Model Details

Model Description

본 모델은 Vision Transformer(ViT) 기반 이미지 분류 모델을 pipeline("image-classification") 형태로
Hub에 업로드하고 다시 불러오는 전체 흐름을 실습하기 위해 구성됨.

Developed by: Google Research (원본모델)
Shared by [optional]: dsaint31
Model type: Image Classification (Vision Transformer)
Language(s) (NLP): 해당 없음 (이미지 입력)
License: Apache-2.0
Finetuned from model [optional]: google/vit-base-patch16-224 (가중치 변경 없음. fine-tuning 미수행)

Model Sources [optional]

Baee model Repository: https://huggingface.co/google/vit-base-patch16-224
Paper [optional]: Dosovitskiy et al., An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, arXiv:2010.11929
Demo [optional]: None

Uses

Direct Use

pipeline("image-classification", model=...) 사용법 실습
Hugging Face Hub에 pipeline 형태로 모델을 업로드 / 다운로드하는 흐름 이해
Vision 모델과 pipeline의 관계 학습

Downstream Use [optional]

본 repo 자체는 downstream task를 위한 fine-tuning을 목적으로 하지 않습니다.
학습 또는 성능 비교 목적이라면 원본 모델 repo를 직접 사용하는 것이 적절합니다.

Out-of-Scope Use

모델 성능 평가 또는 벤치마크
실제 서비스 환경에서의 모델 배포
특정 도메인(의료, 산업 영상 등)에 대한 신뢰성 있는 추론

Bias, Risks, and Limitations

본 모델은 ImageNet 기반 데이터로 학습된 일반 목적 이미지 분류 모델의 특성을 그대로 가집니다.
특정 객체, 문화적 맥락, 전문 도메인에 대한 분류 성능은 보장되지 않습니다.
본 repo는 연습용 pipeline 저장소이므로 모델의 사회적 영향이나 편향 분석을 목적으로 하지 않습니다.

Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

실제 사용 목적이 있는 경우, 원본 모델 카드(google/vit-base-patch16-224)의 제한 사항을 반드시 참고하십시오.
이 repo는 학습 및 실습 목적에 한해 사용하기를 권장합니다.

How to Get Started with the Model

Use the code below to get started with the model.

아래 예제는 Hugging Face pipeline을 이용해 본 모델을 로드하고 이미지를 분류하는 최소 예제입니다.

from transformers import pipeline
from PIL import Image
import requests

img_url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/cats.png"
image = Image.open(requests.get(img_url, stream=True).raw)

clf = pipeline(
    task="image-classification",
    model="dsaint31/tmp-pl-image-classification",
)

print(clf(image))

[More Information Needed]

Training Details

Training Data

본 repo에서는 추가 학습을 수행하지 않았습니다.
원본 모델은 ImageNet-21k로 사전학습(pretraining) 후 ImageNet-1k로 fine-tuning된 모델입니다.

Training Procedure

Preprocessing [optional]

원본 ViT 모델의 기본 이미지 전처리(Image Processor)를 그대로 사용합니다.

[More Information Needed]

Training Hyperparameters

Training regime: 해당 없음(학습 미수행) [More Information Needed]

Speeds, Sizes, Times [optional]

Evaluation

Testing Data, Factors & Metrics

Testing Data

본 repo에서는 별도의 평가를 수행하지 않았습니다.
성능 지표는 원본 모델 카드의 평가 결과를 참고하십시오.

[More Information Needed]

Factors

[More Information Needed]

Metrics

[More Information Needed]

Results

[More Information Needed]

Summary

Model Examination [optional]

[More Information Needed]

Environmental Impact

본 repo에서는 학습을 수행하지 않았으므로 추가적인 환경적 영향은 없습니다.

원본 모델 학습에 대한 환경 영향은 base model 문서를 참고하십시오.

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

Hardware Type: [More Information Needed]
Hours used: [More Information Needed]
Cloud Provider: [More Information Needed]
Compute Region: [More Information Needed]
Carbon Emitted: [More Information Needed]

Technical Specifications [optional]

Model Architecture and Objective

Vision Transformer (ViT-Base, patch size 16, input resolution 224x224)
Objective: Image classification

Compute Infrastructure

[More Information Needed]

Hardware

해당없음 (학습 미수행)

Software

Transformers
Pillow
PyTorch

[More Information Needed]

Citation [optional]

원본 모델 인용 시 아래 논문을 참고하십시오.

BibTeX:

@article{dosovitskiy2020image,
  title={An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale},
  author={Dosovitskiy, Alexey and others},
  journal={arXiv preprint arXiv:2010.11929},
  year={2020}
}

APA:

[More Information Needed]