---
library_name: transformers
tags:
- transformers
- pipeline
- vision
- image-classification
- vit
- imagenet-1k
license: apache-2.0
datasets:
- ILSVRC/imagenet-1k
base_model:
- google/vit-base-patch16-224
pipeline_tag: image-classification
---
|
|
|
|
|
# Model Card for tmp-pl-image-classification |
|
|
|
|
|
This repository is a **practice model repo** for understanding and exercising how the 🤗 Transformers `pipeline()` works.

The model weights are taken unchanged from the original **`google/vit-base-patch16-224`** model; no additional fine-tuning was performed.
|
|
|
|
|
--- |
|
|
|
|
|
## Model Details |
|
|
|
|
|
### Model Description |
|
|
|
|
|
<!-- Provide a longer summary of what this model is. --> |
|
|
This model packages a **Vision Transformer (ViT)**-based image-classification model so that the full flow of uploading it to the Hub and loading it back through `pipeline("image-classification")` can be practiced.
|
|
|
|
|
|
|
|
- **Developed by:** Google Research (original model)
|
|
- **Shared by [optional]:** dsaint31 |
|
|
- **Model type:** Image Classification (Vision Transformer) |
|
|
- **Language(s) (NLP):** Not applicable (image input)
|
|
- **License:** Apache-2.0 |
|
|
- **Finetuned from model [optional]:** google/vit-base-patch16-224 (weights unchanged; no fine-tuning performed)
|
|
|
|
|
### Model Sources [optional] |
|
|
|
|
|
<!-- Provide the basic links for the model. --> |
|
|
|
|
|
- **Base model repository:** [https://huggingface.co/google/vit-base-patch16-224](https://huggingface.co/google/vit-base-patch16-224)
|
|
- **Paper [optional]:** [Dosovitskiy et al., *An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale*, arXiv:2010.11929](https://arxiv.org/abs/2010.11929) |
|
|
- **Demo [optional]:** None |
|
|
|
|
|
## Uses |
|
|
|
|
|
<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. --> |
|
|
|
|
|
|
|
|
### Direct Use |
|
|
|
|
|
<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. --> |
|
|
|
|
|
- Practice using `pipeline("image-classification", model=...)`
|
|
- Understand the flow of uploading a model to, and downloading it from, the Hugging Face Hub in pipeline form (see the sketch after this list)
|
|
- Learn how vision models relate to the `pipeline()` API
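
The upload half of that workflow can be reproduced with a short script. Below is a minimal sketch, assuming a recent `transformers` release in which `Pipeline.push_to_hub` is available and that you have already authenticated with `huggingface-cli login`; the target repo id is a placeholder.

```python
from transformers import pipeline

# Load the base checkpoint as an image-classification pipeline (weights unchanged).
pipe = pipeline("image-classification", model="google/vit-base-patch16-224")

# Save the model, image processor, and pipeline metadata locally ...
pipe.save_pretrained("tmp-pl-image-classification")

# ... and push the same contents to a Hub repo.
# "your-username/tmp-pl-image-classification" is a placeholder repo id.
pipe.push_to_hub("your-username/tmp-pl-image-classification")
```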
|
|
|
|
|
### Downstream Use [optional] |
|
|
|
|
|
<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app --> |
|
|
|
|
|
- This repo itself is not intended for fine-tuning on downstream tasks.
|
|
- For training or performance comparisons, use the **original model repo** directly.
|
|
|
|
|
### Out-of-Scope Use |
|
|
|
|
|
<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. --> |
|
|
|
|
|
- Model performance evaluation or benchmarking
|
|
- Deploying the model in production environments
|
|
- Relying on its predictions in specialized domains (e.g., healthcare, industrial safety)
|
|
|
|
|
## Bias, Risks, and Limitations |
|
|
|
|
|
<!-- This section is meant to convey both technical and sociotechnical limitations. --> |
|
|
|
|
|
- This model retains the characteristics of a general-purpose image classifier trained on ImageNet data.
|
|
- Classification performance on specific objects, cultural contexts, or specialized domains is not guaranteed.
|
|
- Because this repo is a **practice pipeline repository**, it does not aim to analyze the model's societal impact or biases.
|
|
|
|
|
### Recommendations |
|
|
|
|
|
<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. --> |
|
|
|
|
|
Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.
|
|
|
|
|
- For any real-world use, be sure to consult the limitations described in the original model card (`google/vit-base-patch16-224`).
|
|
- Use of this repo is recommended for learning and practice purposes only.
|
|
|
|
|
## How to Get Started with the Model |
|
|
|
|
|
Use the code below to get started with the model. It is a minimal example that loads this model through the Hugging Face `pipeline` API and classifies an image.
|
|
|
|
|
```python
from transformers import pipeline
from PIL import Image
import requests

# Download a sample image.
img_url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/cats.png"
image = Image.open(requests.get(img_url, stream=True).raw)

# Load this repo as an image-classification pipeline.
clf = pipeline(
    task="image-classification",
    model="dsaint31/tmp-pl-image-classification",
)

# Print the predicted labels and their scores.
print(clf(image))
```
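
The pipeline returns a list of `{"label": ..., "score": ...}` dictionaries sorted by descending score. A local file path (or a PIL image) can be passed directly as well; the file name and `top_k` value below are illustrative.

```python
# Classify a local image and keep only the three highest-scoring labels.
# "my_cat.jpg" is a hypothetical local file.
print(clf("my_cat.jpg", top_k=3))
```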
|
|
|
|
|
|
|
|
|
|
## Training Details |
|
|
|
|
|
### Training Data |
|
|
|
|
|
<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. --> |
|
|
|
|
|
* **No additional training was performed** in this repo.
|
|
* The original model was pretrained on ImageNet-21k and then fine-tuned on ImageNet-1k.
|
|
|
|
|
|
|
|
### Training Procedure |
|
|
|
|
|
<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. --> |
|
|
|
|
|
#### Preprocessing [optional] |
|
|
|
|
|
* The default image preprocessing (image processor) of the original ViT model is used as-is (see the snippet below).
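
As a quick sanity check, the processor shipped with this repo can be loaded and inspected directly. A minimal sketch, assuming the repo stores the same processor configuration as the base model (224x224 input size and the base model's normalization constants):

```python
from transformers import AutoImageProcessor

# Load the image processor bundled with this repo and print its configuration.
processor = AutoImageProcessor.from_pretrained("dsaint31/tmp-pl-image-classification")
print(processor)
```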
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
#### Training Hyperparameters |
|
|
|
|
|
- **Training regime:** Not applicable (no training was performed) <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
|
|
|
|
|
|
|
|
|
|
|
#### Speeds, Sizes, Times [optional] |
|
|
|
|
|
<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. --> |
|
|
|
|
|
|
|
|
## Evaluation |
|
|
|
|
|
<!-- This section describes the evaluation protocols and provides the results. --> |
|
|
|
|
|
### Testing Data, Factors & Metrics |
|
|
|
|
|
#### Testing Data |
|
|
|
|
|
<!-- This should link to a Dataset Card if possible. --> |
|
|
|
|
|
* No separate evaluation was performed in this repo.
|
|
* For performance figures, refer to the evaluation results in the original model card.
|
|
|
|
|
[More Information Needed] |
|
|
|
|
|
#### Factors |
|
|
|
|
|
<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. --> |
|
|
|
|
|
[More Information Needed] |
|
|
|
|
|
#### Metrics |
|
|
|
|
|
<!-- These are the evaluation metrics being used, ideally with a description of why. --> |
|
|
|
|
|
[More Information Needed] |
|
|
|
|
|
### Results |
|
|
|
|
|
[More Information Needed] |
|
|
|
|
|
#### Summary |
|
|
|
|
|
|
|
|
|
|
|
## Model Examination [optional] |
|
|
|
|
|
<!-- Relevant interpretability work for the model goes here --> |
|
|
|
|
|
[More Information Needed] |
|
|
|
|
|
## Environmental Impact |
|
|
|
|
|
<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly --> |
|
|
|
|
|
No training was performed in this repo, so it adds no additional environmental impact.

* For the environmental impact of training the original model, refer to the base model documentation.
|
|
|
|
|
Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). |
|
|
|
|
|
- **Hardware Type:** [More Information Needed] |
|
|
- **Hours used:** [More Information Needed] |
|
|
- **Cloud Provider:** [More Information Needed] |
|
|
- **Compute Region:** [More Information Needed] |
|
|
- **Carbon Emitted:** [More Information Needed] |
|
|
|
|
|
## Technical Specifications [optional] |
|
|
|
|
|
### Model Architecture and Objective |
|
|
|
|
|
* Vision Transformer (ViT-Base, patch size 16, input resolution 224x224) |
|
|
* Objective: image classification over the 1,000 ImageNet-1k classes (see the snippet below for reading the stored configuration)
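
The architecture hyperparameters can be read off the configuration stored with the checkpoint. A minimal sketch, assuming the repo carries the standard ViT-Base configuration (hidden size 768, 12 layers, 12 attention heads, 1,000 output labels):

```python
from transformers import AutoConfig

# Print the main architecture hyperparameters of the checkpoint.
config = AutoConfig.from_pretrained("dsaint31/tmp-pl-image-classification")
print(config.hidden_size, config.num_hidden_layers, config.num_attention_heads)
print(config.patch_size, config.image_size, len(config.id2label))
```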
|
|
|
|
|
|
|
|
### Compute Infrastructure |
|
|
|
|
|
[More Information Needed] |
|
|
|
|
|
#### Hardware |
|
|
|
|
|
* Not applicable (no training was performed)
|
|
|
|
|
#### Software |
|
|
|
|
|
* Transformers |
|
|
* Pillow |
|
|
* PyTorch |
|
|
|
|
|
[More Information Needed] |
|
|
|
|
|
## Citation [optional] |
|
|
|
|
|
<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. --> |
|
|
|
|
|
When citing the original model, please refer to the paper below.
|
|
|
|
|
**BibTeX:** |
|
|
|
|
|
```bibtex
@article{dosovitskiy2020image,
  title={An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale},
  author={Dosovitskiy, Alexey and others},
  journal={arXiv preprint arXiv:2010.11929},
  year={2020}
}
```
|
|
|
|
|
|
|
|
**APA:** |
|
|
|
|
|
[More Information Needed] |
|
|
|
|
|
## Glossary [optional] |
|
|
|
|
|
<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. --> |
|
|
|
|
|
[More Information Needed] |
|
|
|
|
|
## More Information [optional] |
|
|
|
|
|
[More Information Needed] |
|
|
|
|
|
## Model Card Authors [optional] |
|
|
|
|
|
* dsaint31 (pipeline practice repository) |
|
|
|
|
|
|
|
|
## Model Card Contact |
|
|
|
|
|
* Hugging Face profile: [https://huggingface.co/dsaint31](https://huggingface.co/dsaint31) |