---
title: Computer Vision Classification Model Comparison
emoji: "\U0001F4CA"
colorFrom: purple
colorTo: gray
sdk: gradio
sdk_version: 6.11.0
app_file: app.py
pinned: false
short_description: 'Block 2 '
---

# CIFAR-10 Image Classification: Model Comparison

This app compares 3 image classification approaches on CIFAR-10 images:

- Fine-tuned ViT model ([`adisaljusi/cifar10-vit`](https://huggingface.co/adisaljusi/cifar10-vit))
- Zero-shot CLIP (`openai/clip-vit-large-patch14`)
- OpenAI vision model (`gpt-4.1-mini`)

## Dataset Used For Training

- Hugging Face dataset loader: `load_dataset("uoft-cs/cifar10")`
- Dataset reference: https://huggingface.co/datasets/uoft-cs/cifar10
- Number of classes: `10` (airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck)
- Training subset: 8,000 images (from 50,000 total)
- Test subset: 2,000 images (from 10,000 total)

## Preprocessing

- Resize from 32x32 to 224x224 (ViT input size)
- Normalize pixel values with mean=0.5, std=0.5 per channel
- Convert all images to RGB

Applied using `AutoImageProcessor` from `google/vit-base-patch16-224`.

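
The normalization step listed under Preprocessing can be illustrated with a small sketch. It is not the app's code: it assumes the processor first rescales 8-bit values by 1/255 and then applies mean=0.5, std=0.5 per channel, and the helper names `normalize_pixel` and `normalize_image` are hypothetical. Resizing and RGB conversion are left out.

```python
def normalize_pixel(value, mean=0.5, std=0.5):
    """Map one 8-bit channel value (0..255) to its normalized value."""
    scaled = value / 255.0        # assumed rescale step: 0..255 -> 0..1
    return (scaled - mean) / std  # mean=0.5, std=0.5 maps 0..1 -> -1..1


def normalize_image(image):
    """Normalize an H x W x 3 nested list of 8-bit RGB values."""
    return [[[normalize_pixel(c) for c in pixel] for pixel in row] for row in image]


# A 1x1 "image" with a black, mid-gray, and white channel value.
tiny = [[[0, 128, 255]]]
normalized = normalize_image(tiny)
print(normalized)
```

With these settings a black channel value maps to -1.0 and a white one to 1.0, so every input the model sees lies in the range [-1, 1].
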
## Trained Model

- Hugging Face model link: [https://huggingface.co/adisaljusi/cifar10-vit](https://huggingface.co/adisaljusi/cifar10-vit)
- Base model: [google/vit-base-patch16-224](https://huggingface.co/google/vit-base-patch16-224)
- Transfer learning: all layers frozen except the classification head (7,690 of 85.8M parameters trainable)
- Training config: 4 epochs, batch size 32, learning rate 2e-4, warmup ratio 0.1, weight decay 0.01, AdamW optimizer

## Training Performance

| Training Loss | Epoch | Validation Loss | Accuracy |
|--------------:|------:|----------------:|---------:|
| 0.2316 | 1 | 0.2161 | 94.95% |
| 0.1551 | 2 | 0.1516 | 95.65% |
| 0.1230 | 3 | 0.1390 | 95.80% |
| 0.1097 | 4 | 0.1363 | 95.95% |

## Example Image Results

| Image | True Class | ViT Top-1 (score) | CLIP Top-1 (score) | OpenAI LLM (label, confidence) |
|---|---|---|---|---|
| `airplane.jpg` | `airplane` | `airplane` (0.675) | `airplane` (0.900) | `bird` (0.75) |
| `automobile.jpg` | `automobile` | `automobile` (0.656) | `automobile` (0.952) | `automobile` (0.85) |
| `cat.jpg` | `cat` | `cat` (0.954) | `cat` (0.536) | `cat` (0.85) |
| `dog.jpg` | `dog` | `dog` (0.988) | `dog` (0.936) | `dog` (0.85) |
| `horse.jpg` | `horse` | `horse` (0.998) | `horse` (0.990) | `horse` (0.95) |
| `ship.jpg` | `ship` | `ship` (0.989) | `ship` (0.996) | `ship` (0.95) |

## Links

- Model: [https://huggingface.co/adisaljusi/cifar10-vit](https://huggingface.co/adisaljusi/cifar10-vit)
- App: [https://huggingface.co/spaces/adisaljusi/computer-vision-classification-model-comparison](https://huggingface.co/spaces/adisaljusi/computer-vision-classification-model-comparison)
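
As a sanity check on the trainable-parameter count quoted under Trained Model, the figure can be reproduced by hand. The sketch assumes the classification head is a single linear layer on top of a 768-dimensional ViT-Base hidden state; the helper `linear_params` is hypothetical and not part of the app.

```python
HIDDEN_SIZE = 768   # assumption: hidden size of the ViT-Base backbone
NUM_CLASSES = 10    # CIFAR-10 classes


def linear_params(in_features, out_features):
    """Parameters of a dense layer: one weight per input/output pair plus one bias per output."""
    return in_features * out_features + out_features


head_params = linear_params(HIDDEN_SIZE, NUM_CLASSES)
print(head_params)  # 7690

# Share of the roughly 85.8M total parameters that is updated during training.
total_params = 85.8e6
print(f"{head_params / total_params:.4%}")
```

Under that assumption, 768 x 10 weights plus 10 biases gives the 7,690 trainable parameters, well under 0.01% of the model.
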