	# Big Cat Classification Comparison App

	This project compares two image classification approaches on big cat images:

	- A ViT model fine-tuned on a custom big cat dataset
	- A zero-shot CLIP model (`openai/clip-vit-base-patch32`)

	---

	## Dataset Used For Training

	The dataset consists of images of five big cat species:

	- cheetah
	- leopard
	- lion
	- puma
	- tiger

	The images are organized using the `imagefolder` structure, where each class has its own folder.

	The dataset was used to train a custom image classification model using transfer learning.
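As a rough illustration of the `imagefolder` convention, the sketch below walks a directory in which each subfolder name is treated as the class label. The helper `list_labeled_images` and the set of accepted file suffixes are assumptions made for this example; the project itself loads the data through the Hugging Face loader, not through this code.

```python
from pathlib import Path

# Assumed image file types; the real dataset may contain others.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def list_labeled_images(root):
    """Return sorted (relative_path, label) pairs.

    The label of each image is the name of the folder that contains it,
    which mirrors the one-folder-per-class layout described above.
    """
    root = Path(root)
    pairs = []
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for image_path in sorted(class_dir.iterdir()):
            if image_path.suffix.lower() in IMAGE_SUFFIXES:
                relative = image_path.relative_to(root).as_posix()
                pairs.append((relative, class_dir.name))
    return pairs
```

Calling `list_labeled_images` on a folder that contains `lion/Lion_003.jpg` would yield the pair `("lion/Lion_003.jpg", "lion")`.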

	---

	## Preprocessing

	The following preprocessing steps were applied:

	- Images were loaded using the Hugging Face `imagefolder` format
	- Images were converted to RGB
	- Images were resized automatically using the ViT image processor
	- Labels were mapped to numerical IDs for training
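The steps above can be sketched roughly as follows. This is a minimal illustration, not the training code: `build_label_maps` and `preprocess` are hypothetical helpers, each example is assumed to be a dict with a PIL image under `"image"` and a class name under `"label"`, and `processor` is assumed to behave like a Hugging Face image processor (callable with `images=` and returning a mapping that contains `"pixel_values"`).

```python
CLASS_NAMES = ["cheetah", "leopard", "lion", "puma", "tiger"]


def build_label_maps(class_names):
    """Map class names to integer IDs and back, in alphabetical order."""
    label2id = {name: idx for idx, name in enumerate(sorted(class_names))}
    id2label = {idx: name for name, idx in label2id.items()}
    return label2id, id2label


def preprocess(example, processor, label2id):
    """Convert one example to RGB, let the processor resize it, attach the label ID."""
    image = example["image"].convert("RGB")  # drops alpha and grayscale modes
    features = processor(images=image)       # resizing happens inside the processor
    return {
        "pixel_values": features["pixel_values"][0],
        "label": label2id[example["label"]],
    }


label2id, id2label = build_label_maps(CLASS_NAMES)
print(label2id["puma"])  # prints 3
```

Keeping both `label2id` and `id2label` is convenient because training needs the numerical IDs, while the app has to turn predicted IDs back into class names.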

	---

	## Model and Evaluation

	A Vision Transformer (ViT) model was fine-tuned on the custom dataset.

	The model was evaluated using example images and compared with CLIP.

	### Accuracy

	- Custom Model Accuracy: 1.00
	- CLIP Accuracy: 1.00
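Accuracy here means the fraction of images whose predicted class matches the true class. The sketch below reproduces the calculation with the five example images listed in this README; whether the reported values were computed on exactly these images is not stated, so the data should be read as illustrative.

```python
def accuracy(true_labels, predicted_labels):
    """Return the share of predictions that match the true labels."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("true_labels and predicted_labels must have the same length")
    if not true_labels:
        raise ValueError("at least one label is required")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


true_labels = ["cheetah", "leopard", "lion", "puma", "tiger"]
custom_predictions = ["cheetah", "leopard", "lion", "puma", "tiger"]
clip_predictions = ["cheetah", "leopard", "lion", "puma", "tiger"]

print(f"{accuracy(true_labels, custom_predictions):.2f}")  # prints 1.00
print(f"{accuracy(true_labels, clip_predictions):.2f}")    # prints 1.00
```

With only a handful of images, a single misclassification would change the result by a large step, so a score of 1.00 on a small sample says little about performance on new images.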

	---

	## Example Image Results

	| Image | True Class | Custom Model (score) | CLIP (score) |
	|---|---|---|---|
	| Cheetah_032.jpg | cheetah | cheetah (0.53) | cheetah (0.83) |
	| Leopard_001.jpg | leopard | leopard (0.51) | leopard (0.92) |
	| Lion_003.jpg | lion | lion (0.54) | lion (0.99) |
	| Puma_001.jpg | puma | puma (0.61) | puma (1.00) |
	| Tiger_001.jpg | tiger | tiger (0.70) | tiger (0.99) |
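To put the scores in the table side by side, the short sketch below averages the top-class score of each model over the five example images. The values are copied from the table; the variable names are chosen only for this example.

```python
from statistics import mean

# Top-class scores per example image, taken from the table above.
custom_scores = {
    "Cheetah_032.jpg": 0.53,
    "Leopard_001.jpg": 0.51,
    "Lion_003.jpg": 0.54,
    "Puma_001.jpg": 0.61,
    "Tiger_001.jpg": 0.70,
}
clip_scores = {
    "Cheetah_032.jpg": 0.83,
    "Leopard_001.jpg": 0.92,
    "Lion_003.jpg": 0.99,
    "Puma_001.jpg": 1.00,
    "Tiger_001.jpg": 0.99,
}

custom_mean = mean(list(custom_scores.values()))
clip_mean = mean(list(clip_scores.values()))

print(f"Custom ViT mean score: {custom_mean:.3f}")  # prints 0.578
print(f"CLIP mean score:       {clip_mean:.3f}")    # prints 0.946
```

The scores of the two models are not necessarily calibrated in the same way, so the averages describe how confident each model appears rather than how often it is right.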

	---

	## Comparison Summary

	Both the custom ViT model and CLIP achieved perfect accuracy (100%) on the test images.

	The custom model's confidence scores are noticeably lower than CLIP's (0.51 to 0.70 versus 0.83 to 1.00 on the example images), but it still predicts all five classes correctly.

	CLIP provides very high confidence predictions and performs strongly even without task-specific training.
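Zero-shot classification with CLIP generally works by scoring the image against one text prompt per class and converting the resulting image-text similarity logits into probabilities with a softmax. The sketch below shows only that last step in plain Python. The prompt template and the logit values are made up for illustration and are not taken from the app.

```python
import math


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    largest = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_scores(class_names, logits):
    """Pair each class name with its softmax probability."""
    return dict(zip(class_names, softmax(logits)))


class_names = ["cheetah", "leopard", "lion", "puma", "tiger"]
prompts = [f"a photo of a {name}" for name in class_names]  # assumed prompt template

# Hypothetical similarity logits for one image, one value per prompt.
hypothetical_logits = [18.0, 20.0, 25.0, 19.0, 17.5]

scores = zero_shot_scores(class_names, hypothetical_logits)
best_label = max(scores, key=scores.get)
print(best_label)  # prints lion
```

Because the softmax is exponential, a gap of a few units between the largest logit and the rest is enough to push the top probability close to 1.0, which is one plausible reason for the high scores seen here.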

	### Summary

	- Best task-specific model: Custom ViT model
	- Best open-source baseline: CLIP

	---

	## Links to Model and App

	- Hugging Face Model: https://huggingface.co/DKatheesrupan/aufgabe2
	- Hugging Face Space (App): https://huggingface.co/spaces/DKatheesrupan/Exercise2

	---

	## Application

	The application allows users to:

	- upload an image
	- test the custom model
	- compare predictions with CLIP
	- use example images directly

	This enables a direct comparison between the fine-tuned ViT model and the zero-shot CLIP model on the same image.
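The comparison logic behind such an app could look roughly like the sketch below. It is not the code of the Space: `compare_predictions` is a hypothetical helper, and both classifiers are assumed to be callables that take an image and return a dict mapping class names to scores.

```python
def compare_predictions(image, custom_classifier, clip_classifier):
    """Run both classifiers on the same image and report their top predictions."""
    results = {}
    classifiers = (
        ("custom_vit", custom_classifier),
        ("clip_zero_shot", clip_classifier),
    )
    for name, classifier in classifiers:
        scores = classifier(image)  # assumed format: {label: score}
        top_label = max(scores, key=scores.get)
        results[name] = {"label": top_label, "score": scores[top_label]}
    # Flag whether the two models agree on the predicted class.
    results["agree"] = (
        results["custom_vit"]["label"] == results["clip_zero_shot"]["label"]
    )
    return results
```

Passing the classifiers in as arguments keeps the comparison independent of how each model is loaded, so the same function would work for uploaded images and for the bundled examples.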