Create readme.md
Updated temp readme
readme.md
ADDED
@@ -0,0 +1,67 @@
# Cat Breed Classification & Model Comparison

## Project Overview

This project presents a computer vision application that classifies images of different cat breeds.

The goal is to compare three approaches to image classification:
1. A fine-tuned Vision Transformer (ViT) model trained on a custom dataset
2. A zero-shot CLIP model (open-source)
3. An OpenAI vision model (closed-source)

The application is deployed as a Hugging Face Space and allows users to upload images or select example images.
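
As a rough illustration of the zero-shot approach (item 2 in the list above), the sketch below scores one image against a text prompt per breed. The checkpoint `openai/clip-vit-base-patch32`, the prompt template, and the helper names `build_prompts` and `classify_zero_shot` are assumptions made for this example; the README does not state which CLIP variant or prompts the Space uses.

```python
# Zero-shot breed classification with CLIP: a sketch, not the Space's actual code.
BREEDS = ["Sphynx", "Russian Blue", "Maine Coon", "Ragdoll",
          "Bengal", "Singapura", "Calico Cat"]

# Assumed checkpoint; the README does not name one.
CLIP_CHECKPOINT = "openai/clip-vit-base-patch32"


def build_prompts(breeds):
    """Turn class names into text prompts (the template is an assumption)."""
    prompts = []
    for breed in breeds:
        # Avoid "Calico Cat cat" for names that already end in "cat".
        name = breed if breed.lower().endswith("cat") else f"{breed} cat"
        prompts.append(f"a photo of a {name}")
    return prompts


def classify_zero_shot(image_path, breeds=BREEDS):
    """Return {breed: probability} for one image, without any fine-tuning."""
    # Imported lazily so build_prompts works without the heavy dependencies.
    import torch
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained(CLIP_CHECKPOINT)
    processor = CLIPProcessor.from_pretrained(CLIP_CHECKPOINT)

    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=build_prompts(breeds), images=image,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
    # logits_per_image has shape (1, num_prompts); softmax gives probabilities.
    probs = outputs.logits_per_image.softmax(dim=1)[0]
    return {breed: float(p) for breed, p in zip(breeds, probs)}
```

Because the probabilities come from a softmax over the seven prompts only, they always sum to 1, even for an image that shows none of the listed breeds.
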

## Dataset Description

### The dataset consists of images from seven cat breeds:
- Sphynx
- Russian Blue
- Maine Coon
- Ragdoll
- Bengal
- Singapura
- Calico Cat

### Dataset characteristics:
- Number of classes: 7
- Images per class: ~[fill in]
- Total images: ~[fill in]

### Data sources:
- Public datasets (Kaggle / Hugging Face)
- Manually collected images

### Split:
- Training: 80%
- Validation/Test: 20%
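
One way to produce an 80/20 split like the one listed above is sketched here using only the standard library. Splitting per class (stratifying), the fixed seed, and the function name `stratified_split` are assumptions; the README only gives the ratio.

```python
import random
from collections import defaultdict


def stratified_split(samples, train_fraction=0.8, seed=42):
    """Split (path, label) pairs so each class keeps roughly the same ratio."""
    by_label = defaultdict(list)
    for path, label in samples:
        by_label[label].append(path)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, held_out = [], []
    for label in sorted(by_label):
        paths = sorted(by_label[label])
        rng.shuffle(paths)
        cut = round(len(paths) * train_fraction)
        train += [(p, label) for p in paths[:cut]]
        held_out += [(p, label) for p in paths[cut:]]
    return train, held_out


# Toy data: 10 hypothetical file names for each of two breeds.
samples = [(f"{breed}/{i}.jpg", breed)
           for breed in ("Bengal", "Ragdoll") for i in range(10)]
train, held_out = stratified_split(samples)
print(len(train), len(held_out))  # 16 4
```

Splitting each class separately keeps a small class from ending up almost entirely in one partition, which matters when the per-class image counts are modest.
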

## Preprocessing Steps
- Resize images to 224 × 224
- Convert to RGB
- Remove corrupted images
- Normalize using model-specific values
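
The steps above could be sketched as follows. The helper names are hypothetical, and the mean/std constants are assumed to match the published preprocessor configs for `google/vit-base-patch16-224` and the OpenAI CLIP checkpoints; check them against the checkpoint actually used.

```python
# Per-channel normalization constants (assumed; verify against the
# preprocessor config of the checkpoint you load).
NORMALIZATION = {
    "vit": {"mean": (0.5, 0.5, 0.5), "std": (0.5, 0.5, 0.5)},
    "clip": {"mean": (0.48145466, 0.4578275, 0.40821073),
             "std": (0.26862954, 0.26130258, 0.27577711)},
}


def normalize_pixel(rgb, model="vit"):
    """Scale 0-255 channel values to 0-1, then apply (x - mean) / std."""
    stats = NORMALIZATION[model]
    return tuple((c / 255.0 - m) / s
                 for c, m, s in zip(rgb, stats["mean"], stats["std"]))


def is_readable(path):
    """Return False for files Pillow cannot parse (treated as corrupted)."""
    from PIL import Image  # lazy import: Pillow is only needed for file access
    try:
        with Image.open(path) as img:
            img.verify()
        return True
    except Exception:
        return False


def load_image(path, size=(224, 224)):
    """Open an image, force three RGB channels, and resize it."""
    from PIL import Image
    with Image.open(path) as img:
        return img.convert("RGB").resize(size)


print(normalize_pixel((255, 0, 255)))  # (1.0, -1.0, 1.0)
```

In practice the Hugging Face image processor that ships with each checkpoint applies the resize and normalization itself, so a hand-written version like this is mainly useful for understanding or debugging the pipeline.
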

## Data Augmentation
- Random horizontal flip
- Random rotation
- Optional brightness/contrast adjustments
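
A minimal sketch of the augmentations listed above, assuming torchvision is used. The strengths in `AUGMENTATION` (flip probability, rotation range, jitter amounts) and the function name are placeholders chosen for illustration; the README does not specify them.

```python
# Hypothetical augmentation strengths; the README does not specify them.
AUGMENTATION = {
    "flip_p": 0.5,
    "rotation_degrees": 15,
    "brightness": 0.2,
    "contrast": 0.2,
}


def build_train_transforms(cfg=AUGMENTATION, use_color_jitter=True):
    """Compose the listed augmentations with torchvision (for PIL images)."""
    from torchvision import transforms  # lazy import: only needed here

    steps = [
        transforms.RandomHorizontalFlip(p=cfg["flip_p"]),
        transforms.RandomRotation(degrees=cfg["rotation_degrees"]),
    ]
    if use_color_jitter:  # the brightness/contrast step is optional
        steps.append(transforms.ColorJitter(brightness=cfg["brightness"],
                                            contrast=cfg["contrast"]))
    return transforms.Compose(steps)
```

These transforms would normally be applied to the training split only, so that validation and test metrics are measured on unmodified images.
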

## Model and Training

### Fine-Tuned Model
- Base model: google/vit-base-patch16-224
- Approach: Transfer learning + fine-tuning
- Output classes: 7
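
Loading the base model with a new 7-class head might look like the sketch below, assuming the Hugging Face `transformers` library. The class order in `BREEDS` and the function name `load_model_for_finetuning` are assumptions; the actual label order depends on how the dataset was built.

```python
# Class order is an assumption; it must match the labels used in the dataset.
BREEDS = ["Sphynx", "Russian Blue", "Maine Coon", "Ragdoll",
          "Bengal", "Singapura", "Calico Cat"]

id2label = dict(enumerate(BREEDS))
label2id = {name: idx for idx, name in id2label.items()}


def load_model_for_finetuning(checkpoint="google/vit-base-patch16-224"):
    """Load the pretrained ViT and swap in a freshly initialized 7-class head."""
    from transformers import ViTForImageClassification  # lazy import

    return ViTForImageClassification.from_pretrained(
        checkpoint,
        num_labels=len(BREEDS),
        id2label=id2label,
        label2id=label2id,
        # The checkpoint ships with a 1000-class ImageNet head, so the
        # classifier weights cannot be reused and must be re-initialized.
        ignore_mismatched_sizes=True,
    )
```

Storing `id2label` in the model config means the fine-tuned model returns breed names rather than bare class indices when it is served from the Hub.
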

### Training settings:
- Epochs: [e.g. 5–10]
- Batch size: [e.g. 16]
- Learning rate: [e.g. 2e-5]
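
To show how these settings interact, the sketch below keeps them in a small dataclass and computes the number of optimizer steps in a run. The defaults mirror the README's example placeholders rather than confirmed values, the class name `TrainingSettings` is hypothetical, and the calculation assumes no gradient accumulation.

```python
import math
from dataclasses import dataclass


@dataclass
class TrainingSettings:
    # Defaults mirror the README's example placeholders, not confirmed values.
    epochs: int = 5
    batch_size: int = 16
    learning_rate: float = 2e-5

    def total_steps(self, num_train_images):
        """Optimizer steps over the whole run (a partial last batch counts)."""
        return math.ceil(num_train_images / self.batch_size) * self.epochs


settings = TrainingSettings()
# 1,000 training images is a made-up figure for illustration only.
print(settings.total_steps(1000))  # 315
```

The step count is useful when configuring a learning-rate schedule with warmup, since warmup is typically expressed as a number or fraction of total steps.
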

### Links
- Hugging Face Space: [ADD LINK]
- Hugging Face Model: [ADD LINK]