| # Big Cat Classification Comparison App | |
| This project compares two image classification approaches on big cat images: | |
| - Fine-tuned ViT model (custom trained) | |
| - Zero-shot CLIP model (`openai/clip-vit-base-patch32`) | |
| --- | |
| ## Dataset Used For Training | |
| The dataset consists of images of five big cat species: | |
| - cheetah | |
| - leopard | |
| - lion | |
| - puma | |
| - tiger | |
| The images are organized using the `imagefolder` structure, where each class has its own folder. | |
| The dataset was used to train a custom image classification model using transfer learning. | |
| --- | |
| ## Preprocessing | |
| The following preprocessing steps were applied: | |
| - Images were loaded using the Hugging Face `imagefolder` format | |
| - Images were converted to RGB | |
| - Images were resized automatically using the ViT image processor | |
| - Labels were mapped to numerical IDs for training | |
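The steps above could look roughly like the following. This is a minimal sketch, not the project's actual training code: the function names, the `data_dir` argument, and the base checkpoint `google/vit-base-patch16-224-in21k` are assumptions made for illustration. The heavy imports sit inside the function so the label-mapping helper can be used on its own.

```python
from typing import Dict, List, Tuple

# The five classes named in the dataset section.
CLASS_NAMES = ["cheetah", "leopard", "lion", "puma", "tiger"]


def build_label_maps(class_names: List[str]) -> Tuple[Dict[str, int], Dict[int, str]]:
    """Map class names to numerical IDs and back, in alphabetical order."""
    names = sorted(class_names)
    label2id = {name: idx for idx, name in enumerate(names)}
    id2label = {idx: name for name, idx in label2id.items()}
    return label2id, id2label


def load_and_preprocess(data_dir: str,
                        checkpoint: str = "google/vit-base-patch16-224-in21k"):
    """Load an imagefolder dataset and attach ViT preprocessing on the fly.

    `checkpoint` is an assumed base model; the README does not name one.
    """
    # Imported lazily: these libraries are only needed when data is loaded.
    from datasets import load_dataset
    from transformers import AutoImageProcessor

    # Each class folder under data_dir becomes one label.
    dataset = load_dataset("imagefolder", data_dir=data_dir)
    processor = AutoImageProcessor.from_pretrained(checkpoint)

    def transform(batch):
        # Convert to RGB, then let the processor resize and normalize.
        images = [image.convert("RGB") for image in batch["image"]]
        inputs = processor(images, return_tensors="pt")
        inputs["labels"] = batch["label"]
        return inputs

    return dataset.with_transform(transform)


label2id, id2label = build_label_maps(CLASS_NAMES)
```

The `label2id` and `id2label` dictionaries can be passed to the model configuration so that predictions are reported as class names rather than integer IDs.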
| --- | |
| ## Model and Evaluation | |
| A Vision Transformer (ViT) model was fine-tuned on the custom dataset. | |
| The model was evaluated using example images and compared with CLIP. | |
| ### Accuracy | |
| - **Custom Model Accuracy:** 1.00 | |
| - **CLIP Accuracy:** 1.00 | |
| --- | |
| ## Example Image Results | |
| | Image | True Class | Custom Model (score) | CLIP (score) | | |
| |---|---|---|---| | |
| | Cheetah_032.jpg | cheetah | cheetah (0.53) | cheetah (0.83) | | |
| | Leopard_001.jpg | leopard | leopard (0.51) | leopard (0.92) | | |
| | Lion_003.jpg | lion | lion (0.54) | lion (0.99) | | |
| | Puma_001.jpg | puma | puma (0.61) | puma (1.00) | | |
| | Tiger_001.jpg | tiger | tiger (0.70) | tiger (0.99) | | |
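As a small worked example, the accuracy and average confidence for the five rows above can be recomputed directly from the table. The data structure and helper name are illustrative assumptions; only the values come from the table.

```python
from statistics import mean

# (image, true class, (custom label, custom score), (CLIP label, CLIP score))
ROWS = [
    ("Cheetah_032.jpg", "cheetah", ("cheetah", 0.53), ("cheetah", 0.83)),
    ("Leopard_001.jpg", "leopard", ("leopard", 0.51), ("leopard", 0.92)),
    ("Lion_003.jpg", "lion", ("lion", 0.54), ("lion", 0.99)),
    ("Puma_001.jpg", "puma", ("puma", 0.61), ("puma", 1.00)),
    ("Tiger_001.jpg", "tiger", ("tiger", 0.70), ("tiger", 0.99)),
]


def accuracy(rows, model_index):
    """Fraction of rows where the predicted label equals the true class."""
    correct = sum(1 for row in rows if row[model_index][0] == row[1])
    return correct / len(rows)


custom_accuracy = accuracy(ROWS, 2)
clip_accuracy = accuracy(ROWS, 3)

# Average top-1 score per model across the five example images.
custom_mean_score = mean(row[2][1] for row in ROWS)
clip_mean_score = mean(row[3][1] for row in ROWS)

print(f"custom: accuracy={custom_accuracy:.2f}, mean score={custom_mean_score:.3f}")
print(f"CLIP:   accuracy={clip_accuracy:.2f}, mean score={clip_mean_score:.3f}")
```

On these five images both models are correct every time, while the average top-1 score is about 0.58 for the custom model and about 0.95 for CLIP.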
| --- | |
| ## Comparison Summary | |
| Both the custom ViT model and CLIP achieved perfect accuracy (1.00) on the test images. | |
| The custom model's confidence scores on the example images (0.51 to 0.70) are clearly lower than CLIP's (0.83 to 1.00), but it still predicts every class correctly. | |
| CLIP produces high-confidence predictions and performs strongly even without task-specific training. | |
| ### Summary | |
| - **Best task-specific model:** Custom ViT model | |
| - **Best open-source baseline:** CLIP | |
| --- | |
| ## Links to Model and App | |
| - Hugging Face Model: | |
| https://huggingface.co/DKatheesrupan/aufgabe2 | |
| - Hugging Face Space (App): | |
| https://huggingface.co/spaces/DKatheesrupan/Exercise2 | |
| --- | |
| ## Application | |
| The application allows users to: | |
| - upload an image | |
| - test the custom model | |
| - compare predictions with CLIP | |
| - use example images directly | |
| This enables a direct comparison between trained and zero-shot models. |
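The comparison the app performs could be sketched as below. This is not the Space's actual source code: it assumes the Hugging Face `transformers` pipelines are used, that the fine-tuned model at `DKatheesrupan/aufgabe2` loads with the `image-classification` pipeline, and that the five class names are passed to CLIP as candidate labels. The helper names are hypothetical.

```python
from typing import Dict, List

CLASS_NAMES = ["cheetah", "leopard", "lion", "puma", "tiger"]


def top_prediction(results: List[Dict]) -> Dict:
    """Return the highest-scoring {"label", "score"} entry from pipeline output."""
    return max(results, key=lambda result: result["score"])


def compare_models(image_path: str) -> Dict[str, Dict]:
    """Classify one image with the fine-tuned ViT and with zero-shot CLIP."""
    # Imported lazily: models are only downloaded when a comparison is run.
    from transformers import pipeline

    # Assumption: the model repository is compatible with this pipeline task.
    custom = pipeline("image-classification", model="DKatheesrupan/aufgabe2")
    clip = pipeline("zero-shot-image-classification",
                    model="openai/clip-vit-base-patch32")

    custom_results = custom(image_path)
    # CLIP has no fixed label set, so the candidate classes are supplied here.
    clip_results = clip(image_path, candidate_labels=CLASS_NAMES)

    return {
        "custom": top_prediction(custom_results),
        "clip": top_prediction(clip_results),
    }
```

Calling `compare_models("Tiger_001.jpg")` with a local image file would return the top label and score from each model, which is the same pairing shown in the example results table.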