---
library_name: transformers
tags:
- transformers
- pipeline
- vision
- image-classification
- vit
- imagenet-1k
license: apache-2.0
datasets:
- ILSVRC/imagenet-1k
base_model:
- google/vit-base-patch16-224
pipeline_tag: image-classification
---

# Model Card for tmp-pl-image-classification

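This repository is a **practice (pipeline practice) model repo** for understanding and exercising the behavior of `pipeline()` in 🤗 Transformers.  
The model weights are those of the original model **`google/vit-base-patch16-224`**, used unchanged; no additional fine-tuning was performed.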

---

## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->
This repo was set up to practice the full workflow of uploading a **Vision Transformer (ViT)** image classification model to the Hub as a `pipeline("image-classification")` and loading it back again.


- **Developed by:** Google Research (original model)
- **Shared by [optional]:** dsaint31
- **Model type:** Image Classification (Vision Transformer)
- **Language(s) (NLP):** Not applicable (image input)
- **License:** Apache-2.0
- **Finetuned from model [optional]:** google/vit-base-patch16-224 (weights unchanged; no fine-tuning performed)

### Model Sources [optional]

<!-- Provide the basic links for the model. -->

- **Base model repository:** [https://huggingface.co/google/vit-base-patch16-224](https://huggingface.co/google/vit-base-patch16-224)
- **Paper [optional]:** [Dosovitskiy et al., *An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale*, arXiv:2010.11929](https://arxiv.org/abs/2010.11929)
- **Demo [optional]:** None

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->


### Direct Use

<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

- Practicing the use of `pipeline("image-classification", model=...)`
- Understanding how a model is uploaded to and downloaded from the Hugging Face Hub in pipeline form
- Learning how vision models and pipelines relate to each other

### Downstream Use [optional]

<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->

- This repo itself is not intended for fine-tuning on downstream tasks.
- For training or performance comparison, it is more appropriate to use the **original model repo** directly.

### Out-of-Scope Use

<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->

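- Model performance evaluation or benchmarking
- Model deployment in a production service environment
- Reliable inference in specific domains (medical imaging, industrial imaging, etc.)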

## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

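- This model retains the characteristics of a general-purpose image classification model trained on ImageNet-based data.
- Classification performance on specific objects, cultural contexts, or specialized domains is not guaranteed.
- Because this repo is a **practice pipeline repository**, it does not aim to analyze the model's social impact or bias.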

### Recommendations

<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.

- If you have a real use case, be sure to consult the limitations in the original model card (`google/vit-base-patch16-224`).
- Use of this repo is recommended for learning and practice purposes only.

## How to Get Started with the Model

Use the code below to get started with the model.

The example below is a minimal example that loads this model with the Hugging Face `pipeline` and classifies an image.

```python
from transformers import pipeline
from PIL import Image
import requests

img_url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/cats.png"
image = Image.open(requests.get(img_url, stream=True).raw)

clf = pipeline(
    task="image-classification",
    model="dsaint31/tmp-pl-image-classification",
)

print(clf(image))
```
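
The pipeline returns a list of dictionaries, each with a `label` and a `score`, sorted from highest to lowest score. The sketch below is an approximation of that post-processing step in plain Python (softmax over the logits, then a top-k selection); it is not the library's actual implementation. The helper names, the three logits, and the three labels are hypothetical, since the real model outputs 1,000 ImageNet-1k classes.

```python
import math


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_predictions(logits, id2label, k=5):
    """Return the k highest-scoring labels as pipeline-style dicts."""
    scores = softmax(logits)
    ranked = sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)
    return [{"label": id2label[i], "score": s} for i, s in ranked[:k]]


# Hypothetical three-class example, for illustration only.
example_logits = [2.0, 1.0, 0.1]
example_labels = {0: "tabby cat", 1: "tiger cat", 2: "remote control"}
top2 = top_k_predictions(example_logits, example_labels, k=2)
print(top2)
```

With the real pipeline, the number of returned entries can usually be adjusted by passing `top_k` when calling the classifier.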


## Training Details

### Training Data

<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

* **No additional training was performed** in this repo.
* The original model was pretrained on ImageNet-21k and then fine-tuned on ImageNet-1k.


### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

#### Preprocessing [optional]

* The default image preprocessing (Image Processor) of the original ViT model is used as-is.
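
As described on the original model card, that preprocessing resizes images to 224x224 and normalizes each RGB channel with mean 0.5 and standard deviation 0.5. The sketch below shows only the per-pixel arithmetic, which maps 8-bit values from [0, 255] to roughly [-1, 1]; resizing is omitted, and `normalize_pixel` and the constant names are hypothetical helpers rather than library API.

```python
# Values taken from the original model card's preprocessing description.
RESCALE_FACTOR = 1 / 255
IMAGE_MEAN = (0.5, 0.5, 0.5)
IMAGE_STD = (0.5, 0.5, 0.5)


def normalize_pixel(rgb):
    """Rescale one 8-bit RGB pixel to [0, 1], then normalize per channel."""
    return tuple(
        (value * RESCALE_FACTOR - mean) / std
        for value, mean, std in zip(rgb, IMAGE_MEAN, IMAGE_STD)
    )


black = normalize_pixel((0, 0, 0))        # lowest possible input
white = normalize_pixel((255, 255, 255))  # highest possible input
print(black, white)
```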




#### Training Hyperparameters

- **Training regime:** Not applicable (no training performed) <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->


#### Speeds, Sizes, Times [optional]

<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->


## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->

### Testing Data, Factors & Metrics

#### Testing Data

<!-- This should link to a Dataset Card if possible. -->

* No separate evaluation was performed in this repo.
* For performance metrics, refer to the evaluation results in the original model card.


#### Factors

<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->

[More Information Needed]

#### Metrics

<!-- These are the evaluation metrics being used, ideally with a description of why. -->

[More Information Needed]

### Results

[More Information Needed]

#### Summary


## Model Examination [optional]

<!-- Relevant interpretability work for the model goes here -->

[More Information Needed]

## Environmental Impact

<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

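* No training was performed in this repo, so it has no additional environmental impact.
* For the environmental impact of training the original model, refer to the base model's documentation.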

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** [More Information Needed]
- **Hours used:** [More Information Needed]
- **Cloud Provider:** [More Information Needed]
- **Compute Region:** [More Information Needed]
- **Carbon Emitted:** [More Information Needed]

## Technical Specifications [optional]

### Model Architecture and Objective

* Vision Transformer (ViT-Base, patch size 16, input resolution 224x224)
* Objective: Image classification
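
The patch size and input resolution above determine how many tokens the transformer processes per image. The following sketch works through that arithmetic under the standard ViT layout (non-overlapping square patches plus one `[CLS]` token); the function names are hypothetical.

```python
def vit_sequence_length(image_size=224, patch_size=16):
    """Number of tokens per image: one per patch plus the [CLS] token."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size
    num_patches = patches_per_side ** 2
    return num_patches + 1  # +1 for the [CLS] token


def patch_vector_size(patch_size=16, channels=3):
    """Length of one flattened RGB patch before the linear projection."""
    return patch_size * patch_size * channels


seq_len = vit_sequence_length()  # 14 * 14 patches + 1 = 197
flat_size = patch_vector_size()  # 16 * 16 * 3 = 768
print(seq_len, flat_size)
```

For this configuration, each image becomes a sequence of 197 tokens, and each flattened patch holds 768 values before being projected to the model's hidden dimension.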


### Compute Infrastructure

[More Information Needed]

#### Hardware

* Not applicable (no training performed)

#### Software

* Transformers
* Pillow
* PyTorch
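
A quick way to confirm that these packages are available is to look up their import names without importing them. This is a small sketch using only the standard library; `check_environment` and `REQUIRED_MODULES` are hypothetical names, and no specific versions are assumed because none are stated here.

```python
from importlib.util import find_spec

# Display name -> top-level import name.
REQUIRED_MODULES = {
    "Transformers": "transformers",
    "Pillow": "PIL",
    "PyTorch": "torch",
}


def check_environment(modules=REQUIRED_MODULES):
    """Report which required packages can be found, without importing them."""
    return {name: find_spec(module) is not None for name, module in modules.items()}


status = check_environment()
for name, available in status.items():
    print(f"{name}: {'found' if available else 'missing'}")
```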


## Citation [optional]

<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

When citing the original model, please refer to the paper below.

**BibTeX:**

```bibtex
@article{dosovitskiy2020image,
  title={An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale},
  author={Dosovitskiy, Alexey and others},
  journal={arXiv preprint arXiv:2010.11929},
  year={2020}
}
```


**APA:**

[More Information Needed]

## Glossary [optional]

<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->

[More Information Needed]

## More Information [optional]

[More Information Needed]

## Model Card Authors [optional]

* dsaint31 (pipeline practice repository)


## Model Card Contact

* Hugging Face profile: [https://huggingface.co/dsaint31](https://huggingface.co/dsaint31)