---
title: Computer Vision Classification Model Comparison
emoji: "\U0001F4CA"
colorFrom: purple
colorTo: gray
sdk: gradio
sdk_version: 6.11.0
app_file: app.py
pinned: false
short_description: 'Block 2 '
---

# CIFAR-10 Image Classification — Model Comparison

This app compares three image classification approaches on CIFAR-10 images:

- Fine-tuned ViT model ([`adisaljusi/cifar10-vit`](https://huggingface.co/adisaljusi/cifar10-vit))
- Zero-shot CLIP (`openai/clip-vit-large-patch14`)
- OpenAI vision model (`gpt-4.1-mini`)

## Dataset Used For Training

- Hugging Face dataset loader: `load_dataset("uoft-cs/cifar10")`
- Dataset reference: https://huggingface.co/datasets/uoft-cs/cifar10
- Number of classes: `10` (airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck)
- Training subset: 8,000 images (from 50,000 total)
- Test subset: 2,000 images (from 10,000 total)
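
The README does not say how the 8,000 and 2,000 image subsets were drawn. As a minimal sketch, assuming a seeded random sample without replacement, the indices could be chosen like this (the helper name `subset_indices` and the seed value are hypothetical):

```python
import random


def subset_indices(total, subset_size, seed=42):
    """Return sorted, unique indices for a reproducible random subset.

    Assumption: the subset is a uniform sample without replacement.
    The actual selection used for training is not documented here.
    """
    if subset_size > total:
        raise ValueError("subset_size cannot exceed total")
    rng = random.Random(seed)  # local RNG so global state is untouched
    return sorted(rng.sample(range(total), subset_size))


# Sizes taken from the list above.
train_idx = subset_indices(50_000, 8_000)
test_idx = subset_indices(10_000, 2_000)

# With the `datasets` library installed, the indices could be applied as:
#   ds = load_dataset("uoft-cs/cifar10")
#   train_subset = ds["train"].select(train_idx)
#   test_subset = ds["test"].select(test_idx)
```

Keeping the index selection separate from the download makes the split easy to reproduce and to check without network access.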

## Preprocessing

- Resize from 32x32 to 224x224 (ViT input size)
- Normalize pixel values with mean=0.5, std=0.5 per channel
- Convert all images to RGB

Applied using `AutoImageProcessor` from `google/vit-base-patch16-224`.
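
To make the normalization step concrete, here is a small arithmetic sketch. It assumes the processor first rescales 8-bit values to [0, 1] and then applies the mean and std listed above, so the result lies in [-1, 1]. The function names are illustrative and not part of any library.

```python
def normalize_pixel(value, mean=0.5, std=0.5):
    """Map one 8-bit channel value (0-255) to the normalized range.

    Assumption: values are rescaled by 1/255 before mean/std are applied.
    """
    scaled = value / 255.0        # rescale to [0, 1]
    return (scaled - mean) / std  # shift and scale to [-1, 1]


def patch_grid(image_size=224, patch_size=16):
    """Return (patches per side, total patches) for a square input."""
    per_side = image_size // patch_size
    return per_side, per_side * per_side


print(normalize_pixel(0))    # darkest value
print(normalize_pixel(255))  # brightest value
print(patch_grid())          # patch layout for a 224x224 input, 16x16 patches
```

The upscaling from 32x32 to 224x224 adds no new image information. It only matches the input size, and therefore the patch grid, that the pretrained base model expects.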

## Trained Model

- Hugging Face model link: [https://huggingface.co/adisaljusi/cifar10-vit](https://huggingface.co/adisaljusi/cifar10-vit)
- Base model: [google/vit-base-patch16-224](https://huggingface.co/google/vit-base-patch16-224)
- Transfer learning: all layers frozen except the classification head (7,690 of 85.8M parameters trainable)
- Training config: 4 epochs, batch size 32, learning rate 2e-4, warmup ratio 0.1, weight decay 0.01, AdamW optimizer
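
The trainable-parameter count and the length of the schedule can be checked with a few lines of arithmetic. This sketch assumes the classification head is a single linear layer on top of the 768-dimensional ViT-Base hidden state, and that training ran on one device without gradient accumulation. The helper names are hypothetical.

```python
import math

HIDDEN_SIZE = 768  # hidden dimension of ViT-Base
NUM_CLASSES = 10   # CIFAR-10 classes


def linear_head_params(in_features, out_features):
    """Weights plus biases of one fully connected layer."""
    return in_features * out_features + out_features


def schedule_steps(num_examples, batch_size, epochs, warmup_ratio):
    """Return (steps per epoch, total steps, warmup steps).

    Assumption: one optimizer step per batch, last partial batch kept.
    """
    steps_per_epoch = math.ceil(num_examples / batch_size)
    total = steps_per_epoch * epochs
    warmup = round(total * warmup_ratio)
    return steps_per_epoch, total, warmup


head_params = linear_head_params(HIDDEN_SIZE, NUM_CLASSES)
steps = schedule_steps(8_000, 32, 4, 0.1)

print(head_params)  # matches the trainable count listed above
print(steps)        # (steps per epoch, total steps, warmup steps)
```

Under these assumptions the head has 768 × 10 weights plus 10 biases, and the learning rate warms up over the first tenth of the optimizer steps.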

## Training Performance

| Training Loss | Epoch | Validation Loss | Accuracy |
|--------------:|------:|----------------:|---------:|
| 0.2316 | 1 | 0.2161 | 94.95% |
| 0.1551 | 2 | 0.1516 | 95.65% |
| 0.1230 | 3 | 0.1390 | 95.80% |
| 0.1097 | 4 | 0.1363 | 95.95% |

## Example Image Results

| Image | True Class | ViT Top-1 (score) | CLIP Top-1 (score) | OpenAI LLM (label, confidence) |
|---|---|---|---|---|
| `airplane.jpg` | `airplane` | `airplane` (0.675) | `airplane` (0.900) | `bird` (0.75) |
| `automobile.jpg` | `automobile` | `automobile` (0.656) | `automobile` (0.952) | `automobile` (0.85) |
| `cat.jpg` | `cat` | `cat` (0.954) | `cat` (0.536) | `cat` (0.85) |
| `dog.jpg` | `dog` | `dog` (0.988) | `dog` (0.936) | `dog` (0.85) |
| `horse.jpg` | `horse` | `horse` (0.998) | `horse` (0.990) | `horse` (0.95) |
| `ship.jpg` | `ship` | `ship` (0.989) | `ship` (0.996) | `ship` (0.95) |
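
The table can be summarized by counting correct top-1 predictions per model. The sketch below copies the labels from the rows above. Six images are far too few to rank the models, so the tally only restates the table.

```python
# (true class, ViT top-1, CLIP top-1, OpenAI label), copied from the table.
RESULTS = [
    ("airplane", "airplane", "airplane", "bird"),
    ("automobile", "automobile", "automobile", "automobile"),
    ("cat", "cat", "cat", "cat"),
    ("dog", "dog", "dog", "dog"),
    ("horse", "horse", "horse", "horse"),
    ("ship", "ship", "ship", "ship"),
]
MODELS = ("ViT", "CLIP", "OpenAI")


def tally_correct(rows, models=MODELS):
    """Count how many rows each model labeled correctly."""
    counts = {name: 0 for name in models}
    for true_label, *predictions in rows:
        for name, predicted in zip(models, predictions):
            if predicted == true_label:
                counts[name] += 1
    return counts


correct = tally_correct(RESULTS)
for name in MODELS:
    print(f"{name}: {correct[name]}/{len(RESULTS)} correct")
```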

## Links

- Model: [https://huggingface.co/adisaljusi/cifar10-vit](https://huggingface.co/adisaljusi/cifar10-vit)
- App: [https://huggingface.co/spaces/adisaljusi/computer-vision-classification-model-comparison](https://huggingface.co/spaces/adisaljusi/computer-vision-classification-model-comparison)