---
title: Computer Vision Classification Model Comparison
emoji: "\U0001F4CA"
colorFrom: purple
colorTo: gray
sdk: gradio
sdk_version: 6.11.0
app_file: app.py
pinned: false
short_description: 'Block 2 '
---

# CIFAR-10 Image Classification: Model Comparison

This app compares three image classification approaches on CIFAR-10 images:

- Fine-tuned ViT model ([`adisaljusi/cifar10-vit`](https://huggingface.co/adisaljusi/cifar10-vit))
- Zero-shot CLIP (`openai/clip-vit-large-patch14`)
- OpenAI vision model (`gpt-4.1-mini`)
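
For the zero-shot CLIP approach, per-label scores are commonly obtained by comparing the image embedding with one text prompt per class and applying a softmax over the resulting similarities. The following is a minimal sketch of that scoring step only. The prompt template, the helper names, and the logit values are illustrative assumptions and are not taken from this app.

```python
import math

CIFAR10_LABELS = ["airplane", "automobile", "bird", "cat", "deer",
                  "dog", "frog", "horse", "ship", "truck"]


def build_prompts(labels, template="a photo of a {}"):
    # Assumed prompt template; the app may word its prompts differently.
    return [template.format(label) for label in labels]


def softmax(logits):
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top1(labels, logits):
    # Return the most probable label together with its softmax score.
    probs = softmax(logits)
    best = max(range(len(labels)), key=probs.__getitem__)
    return labels[best], probs[best]


# Hypothetical image-text similarity logits, one per label (made up for illustration).
example_logits = [24.0, 18.5, 21.0, 17.0, 16.5, 17.2, 16.0, 16.8, 19.5, 18.0]

prompts = build_prompts(CIFAR10_LABELS)
label, score = top1(CIFAR10_LABELS, example_logits)
```

Because the softmax runs over all ten labels, the scores for one image sum to 1, so a top-1 score can be read as the share of probability mass assigned to that class.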

## Dataset Used For Training

- Hugging Face dataset loader: `load_dataset("uoft-cs/cifar10")`
- Dataset reference: https://huggingface.co/datasets/uoft-cs/cifar10
- Number of classes: `10` (airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck)
- Training subset: 8,000 images (from 50,000 total)
- Test subset: 2,000 images (from 10,000 total)
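
How the subsets were drawn is not stated here. One plausible way to pick reproducible subsets of these sizes is seeded sampling without replacement, sketched below. The seed value and the `sample_indices` helper are assumptions for illustration.

```python
import random

TRAIN_TOTAL, TEST_TOTAL = 50_000, 10_000
TRAIN_SUBSET, TEST_SUBSET = 8_000, 2_000


def sample_indices(total, k, seed=42):
    # Seeded sampling without replacement; the seed value is an assumption.
    return sorted(random.Random(seed).sample(range(total), k))


train_idx = sample_indices(TRAIN_TOTAL, TRAIN_SUBSET)
test_idx = sample_indices(TEST_TOTAL, TEST_SUBSET)

# Share of each official split that ends up in the subset.
train_fraction = TRAIN_SUBSET / TRAIN_TOTAL
test_fraction = TEST_SUBSET / TEST_TOTAL
```

With the `datasets` library, index lists like these could be passed to `Dataset.select`, for example `ds["train"].select(train_idx)`.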

## Preprocessing

- Resize from 32x32 to 224x224 (ViT input size)
- Normalize pixel values with mean=0.5, std=0.5 per channel
- Convert all images to RGB

These steps are applied with the `AutoImageProcessor` loaded from `google/vit-base-patch16-224`.
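
The arithmetic behind the normalization step can be sketched in plain Python. This assumes 8-bit pixel values are first rescaled to [0, 1] and then standardized, so that mean=0.5 and std=0.5 map the range to [-1, 1]. The helper names are hypothetical, and the real processor works on arrays rather than nested lists.

```python
def normalize_pixel(value, mean=0.5, std=0.5):
    # Rescale an 8-bit value to [0, 1], then standardize with mean and std.
    return (value / 255.0 - mean) / std


def normalize_image(pixels, mean=0.5, std=0.5):
    # pixels: nested list [height][width][3] of 8-bit RGB values.
    return [[[normalize_pixel(c, mean, std) for c in px] for px in row]
            for row in pixels]


def num_patches(image_size=224, patch_size=16):
    # A patch16 ViT splits the resized image into non-overlapping 16x16 patches.
    per_side = image_size // patch_size
    return per_side * per_side


darkest = normalize_pixel(0)      # lower end of the normalized range
brightest = normalize_pixel(255)  # upper end of the normalized range
patches = num_patches()           # patches per 224x224 image
```

Resizing to 224x224 matters because the patch grid is fixed by the base model: 224 / 16 gives 14 patches per side.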

## Trained Model

- Hugging Face model link: [https://huggingface.co/adisaljusi/cifar10-vit](https://huggingface.co/adisaljusi/cifar10-vit)
- Base model: [google/vit-base-patch16-224](https://huggingface.co/google/vit-base-patch16-224)
- Transfer learning: all layers frozen except the classification head (7,690 of 85.8M parameters trainable)
- Training config: 4 epochs, batch size 32, learning rate 2e-4, warmup ratio 0.1, weight decay 0.01, AdamW optimizer
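
The trainable parameter count and the warmup length follow from the numbers above. The sketch below reproduces them under stated assumptions: a single linear head on top of ViT-Base's 768-dimensional output, one device, no gradient accumulation, and warmup steps rounded to the nearest integer. Function names are hypothetical.

```python
import math

HIDDEN_SIZE = 768   # ViT-Base hidden dimension
NUM_CLASSES = 10    # CIFAR-10 classes


def linear_head_params(hidden_size, num_classes):
    # One weight per (feature, class) pair plus one bias per class.
    return hidden_size * num_classes + num_classes


def schedule_steps(num_examples, batch_size, epochs, warmup_ratio):
    # Assumes one device and no gradient accumulation.
    steps_per_epoch = math.ceil(num_examples / batch_size)
    total = steps_per_epoch * epochs
    return total, round(total * warmup_ratio)


head_params = linear_head_params(HIDDEN_SIZE, NUM_CLASSES)
total_steps, warmup_steps = schedule_steps(8_000, 32, 4, 0.1)

# Share of all parameters that receive gradient updates.
trainable_share = head_params / 85.8e6
```

Here `head_params` comes out to 7,690, matching the figure quoted above, which is roughly 0.009% of the full model.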

## Training Performance

| | Training Loss | Epoch | Validation Loss | Accuracy | |
| |--------------:|------:|----------------:|---------:| |
| | 0.2316 | 1 | 0.2161 | 94.95% | |
| | 0.1551 | 2 | 0.1516 | 95.65% | |
| | 0.1230 | 3 | 0.1390 | 95.80% | |
| | 0.1097 | 4 | 0.1363 | 95.95% | |

## Example Image Results

| | Image | True Class | ViT Top-1 (score) | CLIP Top-1 (score) | OpenAI LLM (label, confidence) | |
| |---|---|---|---|---| |
| | `airplane.jpg` | `airplane` | `airplane` (0.675) | `airplane` (0.900) | `bird` (0.75) | |
| | `automobile.jpg` | `automobile` | `automobile` (0.656) | `automobile` (0.952) | `automobile` (0.85) | |
| | `cat.jpg` | `cat` | `cat` (0.954) | `cat` (0.536) | `cat` (0.85) | |
| | `dog.jpg` | `dog` | `dog` (0.988) | `dog` (0.936) | `dog` (0.85) | |
| | `horse.jpg` | `horse` | `horse` (0.998) | `horse` (0.990) | `horse` (0.95) | |
| | `ship.jpg` | `ship` | `ship` (0.989) | `ship` (0.996) | `ship` (0.95) | |
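
The table can be summarized by counting how often each model's top-1 label matches the true class. The sketch below does this for the six rows shown. With so few examples, the counts only illustrate the comparison and are not a benchmark.

```python
MODELS = ("ViT", "CLIP", "OpenAI")

# (true class, ViT top-1, CLIP top-1, OpenAI label), copied from the table above.
rows = [
    ("airplane", "airplane", "airplane", "bird"),
    ("automobile", "automobile", "automobile", "automobile"),
    ("cat", "cat", "cat", "cat"),
    ("dog", "dog", "dog", "dog"),
    ("horse", "horse", "horse", "horse"),
    ("ship", "ship", "ship", "ship"),
]


def top1_correct(rows):
    # Count, per model, how many predictions equal the true class.
    counts = dict.fromkeys(MODELS, 0)
    for true_class, *predictions in rows:
        for model, predicted in zip(MODELS, predictions):
            counts[model] += int(predicted == true_class)
    return counts


counts = top1_correct(rows)
```

The only miss in these examples is the OpenAI model labeling `airplane.jpg` as `bird`.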

## Links

- Model: [https://huggingface.co/adisaljusi/cifar10-vit](https://huggingface.co/adisaljusi/cifar10-vit)
- App: [https://huggingface.co/spaces/adisaljusi/computer-vision-classification-model-comparison](https://huggingface.co/spaces/adisaljusi/computer-vision-classification-model-comparison)