+# Car Brand Classification App
+This app compares 3 image classification approaches on car images:
+- Fine-tuned ViT model ([`nenzilea/car-classification`](https://huggingface.co/nenzilea/car-classification))
+- Zero-shot CLIP (`openai/clip-vit-large-patch14`)
+- OpenAI vision model (GPT-4o image classification)
+## Dataset Used For Training
+- Hugging Face dataset: `tanganke/stanford_cars`
+- The Stanford Cars dataset contains 196 fine-grained classes (car make/model/year combinations). We keep only the classes that belong to 9 brands and merge them into brand-level classes, which gives a cleaner, more visually meaningful classification task.
+- Number of classes: `9`
+- Classes: `BMW`, `Dodge`, `Ferrari`, `Ford`, `Jeep`, `Lamborghini`, `Porsche`, `Rolls-Royce`, `Toyota`
+### Preprocessing Steps
+1. **Brand extraction**: each of the 196 Stanford Cars class names (e.g. `"Ferrari 458 Italia Coupe 2012"`) is mapped to one of the 9 brands by substring matching.
+2. **Filtering**: images whose class does not belong to any of the 9 brands are removed from the dataset.
+3. **Label remapping**: the original integer labels (0–195) are remapped to brand indices (0–8).
+4. **Train/validation/test split**: the original training split is divided 80/10/10 (train/validation/test). `train_test_split(test_size=0.2, seed=42)` holds out 20% of the images, and that held-out portion is then halved into the validation and test sets.
+5. **Image preprocessing**: images are resized to 224×224 and pixel values are normalised to [-1, 1] using the `AutoImageProcessor` loaded from `google/vit-base-patch16-224`.
+6. **RGB conversion**: all images are converted to RGB to handle grayscale and RGBA edge cases.
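Steps 1 to 3 and the normalisation in step 5 can be sketched in plain Python. This is a minimal illustration, not the project's actual code: the function names, the case-insensitive matching rule, and the `(image, label)` pair layout are assumptions, and the mean/std of 0.5 is assumed because it is what maps [0, 1] onto the stated [-1, 1] range.

```python
# Brand list copied from the README; everything else here is hypothetical.
BRANDS = ["BMW", "Dodge", "Ferrari", "Ford", "Jeep",
          "Lamborghini", "Porsche", "Rolls-Royce", "Toyota"]
BRAND_TO_INDEX = {brand: index for index, brand in enumerate(BRANDS)}


def extract_brand(class_name):
    """Return the first brand whose name occurs in class_name, else None."""
    lowered = class_name.lower()
    for brand in BRANDS:
        if brand.lower() in lowered:
            return brand
    return None


def remap_examples(examples, class_names):
    """Drop examples outside the 9 brands and relabel the rest with brand indices.

    examples: iterable of (image, original_label) pairs (assumed layout)
    class_names: list mapping an original label to its Stanford Cars class name
    """
    remapped = []
    for image, label in examples:
        brand = extract_brand(class_names[label])
        if brand is not None:  # filtering step: skip classes outside the 9 brands
            remapped.append((image, BRAND_TO_INDEX[brand]))
    return remapped


def normalise_pixel(value, mean=0.5, std=0.5):
    """Map an 8-bit channel value in [0, 255] to [-1, 1]."""
    return (value / 255.0 - mean) / std
```

Plain substring matching is simple but can misfire if one brand name happens to appear inside an unrelated class name, so it is worth checking the resulting mapping by hand once.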
+## Trained Model
+- Hugging Face model link: [https://huggingface.co/nenzilea/car-classification](https://huggingface.co/nenzilea/car-classification)
+- Base model: `google/vit-base-patch16-224`
+- Only the final classification head was fine-tuned (all other layers frozen).
+- Trainable parameters: 6,921 (a 768×9 weight matrix plus 9 biases in the classification head) out of ~85.8M total.
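A head-only fine-tune of this kind could be set up roughly as follows. This is a sketch under assumptions rather than the training script used here: the helper names are hypothetical, and it relies on the classification head of `ViTForImageClassification` being registered under the `classifier` prefix.

```python
def freeze_all_but_head(model, head_prefix="classifier"):
    """Disable gradients for every parameter except those under head_prefix."""
    for name, param in model.named_parameters():
        param.requires_grad = name.startswith(head_prefix)
    return model


def count_parameters(model):
    """Return (trainable, total) parameter counts for the model."""
    trainable = 0
    total = 0
    for param in model.parameters():
        total += param.numel()
        if param.requires_grad:
            trainable += param.numel()
    return trainable, total


def load_head_only_vit(num_labels=9):
    """Load the base checkpoint with a fresh head and freeze the backbone."""
    # Imported lazily so the helpers above can be used without transformers.
    from transformers import ViTForImageClassification

    model = ViTForImageClassification.from_pretrained(
        "google/vit-base-patch16-224",
        num_labels=num_labels,
        ignore_mismatched_sizes=True,  # the checkpoint ships a 1000-class head
    )
    return freeze_all_but_head(model)
```

Calling `count_parameters(load_head_only_vit())` is a quick way to confirm that only the head remains trainable before starting a run.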
+## Training Performance
+| Training Loss | Epoch | Step | Validation Loss | Accuracy |
+|---:|---:|---:|---:|---:|
+| 1.368263 | 1.0 | 128 | 1.158491 | 0.5529 |
+| 1.093463 | 2.0 | 256 | 0.998960 | 0.6627 |
+| 1.005174 | 3.0 | 384 | 0.934044 | 0.6667 |
+| 0.946650 | 4.0 | 512 | 0.900367 | 0.6549 |
+| 0.886520 | 5.0 | 640 | 0.887630 | 0.6706 |
+## Hugging Face Space
+- App link: [https://huggingface.co/spaces/nenzilea/car-classification](https://huggingface.co/spaces/nenzilea/car-classification)
+## Example Image Results
+The table below reports the true class, the Top-3 predictions from ViT and CLIP, and the single label and confidence returned by GPT-4o.
+| Image | True Class | ViT Top-3 (score) | CLIP Top-3 (score) | OpenAI GPT-4o (label, confidence) |
+|---|---|---|---|---|
+| `Dodge.jpg` | `Dodge` | BMW (0.3564), Dodge (0.2218), Rolls-Royce (0.1807) | Dodge (0.9432), Jeep (0.0393), Lamborghini (0.0078) | Dodge (1.00) |
+| `Ferrari.jpg` | `Ferrari` | Ferrari (0.6007), Lamborghini (0.2946), Ford (0.0296) | Ferrari (0.9958), Lamborghini (0.0032), Ford (0.0004) | Ferrari (1.00) |
+| `BMW.jpg` | `BMW` | BMW (0.2737), Porsche (0.1800), Dodge (0.1630) | BMW (0.9969), Porsche (0.0014), Ferrari (0.0007) | BMW (1.00) |
+| `Porsche.jpg` | `Porsche` | BMW (0.5858), Dodge (0.2040), Toyota (0.0667) | Porsche (0.9887), Lamborghini (0.0047), Dodge (0.0022) | Porsche (1.00) |
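Top-3 scores like those in the table can be derived from a model's raw logits with a softmax followed by a sort. The sketch below assumes the reported scores are softmax probabilities over the 9 brands; the function names are illustrative.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(labels, logits, k=3):
    """Return the k (label, probability) pairs with the highest probability."""
    probs = softmax(logits)
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]
```

Because the probabilities sum to 1 across all classes, a flat Top-3 such as the ViT row for `BMW.jpg` indicates that the model spread its belief over many brands rather than committing to one.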
+## Model Comparison Summary
+| Model | Approach | Strengths | Weaknesses |
+|---|---|---|---|
+| **Custom ViT** | Supervised fine-tuning on 9 car brands | Cheap to train (only the head is updated); no external API needed | Only classifies the 9 trained brands; validation accuracy plateaus near 67%, and it missed 2 of the 4 example images |
+| **CLIP** | Zero-shot with brand name as text prompt | No training needed, flexible labels; top-1 correct on all 4 example images | May confuse visually similar brands |
+| **OpenAI GPT-4o** | LLM vision with natural language prompt | Strong reasoning, handles unusual angles | API cost, latency, black-box |
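The zero-shot CLIP approach in the table can be sketched with the `transformers` CLIP classes. The prompt template `"a photo of a {} car"` and the function names are assumptions for illustration; the app may phrase its prompts differently.

```python
BRANDS = ["BMW", "Dodge", "Ferrari", "Ford", "Jeep",
          "Lamborghini", "Porsche", "Rolls-Royce", "Toyota"]


def build_prompts(brands, template="a photo of a {} car"):
    """Turn each brand name into a text prompt (the template is hypothetical)."""
    return [template.format(brand) for brand in brands]


def clip_zero_shot(image, brands=BRANDS, checkpoint="openai/clip-vit-large-patch14"):
    """Score a PIL image against one prompt per brand, best match first."""
    # Imported lazily so build_prompts can be used without torch installed.
    import torch
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained(checkpoint)
    processor = CLIPProcessor.from_pretrained(checkpoint)
    inputs = processor(text=build_prompts(brands), images=image,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits_per_image  # shape (1, len(brands))
    probs = logits.softmax(dim=-1)[0].tolist()
    return sorted(zip(brands, probs), key=lambda pair: pair[1], reverse=True)
```

Since the label set is just a list of strings, adding a tenth brand to this approach only requires extending `BRANDS`, whereas the fine-tuned ViT would need retraining.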