nenzilea/car-classification

@@ -2,13 +2,13 @@
 This app compares 3 image classification approaches on car images:
-- Fine-tuned ViT model ([`nenzilea/car-classification`](https://huggingface.co/nenzilea/car-classification))
 - Zero-shot CLIP (`openai/clip-vit-large-patch14`)
 - OpenAI vision model (GPT-4o image classification)
 ## Dataset Used For Training
-- Hugging Face dataset: `tanganke/stanford_cars`
 - The Stanford Cars dataset contains 196 fine-grained classes (car make/model/year combinations). We group them into 9 brand-level classes for a cleaner, more visually meaningful classification task.
 - Number of classes: `9`
 - Classes: `BMW`, `Dodge`, `Ferrari`, `Ford`, `Jeep`, `Lamborghini`, `Porsche`, `Rolls-Royce`, `Toyota`
@@ -24,7 +24,7 @@ This app compares 3 image classification approaches on car images:
 ## Trained Model
-- Hugging Face model link: [https://huggingface.co/nenzilea/car-classification](https://huggingface.co/nenzilea/car-classification)
 - Base model: `google/vit-base-patch16-224`
 - Only the final classification head was fine-tuned (all other layers frozen).
 - Trainable parameters: ~4,614 out of ~85.8M total.
@@ -41,7 +41,7 @@ This app compares 3 image classification approaches on car images:
 ## Hugging Face Space
-- App link: [https://huggingface.co/spaces/nenzilea/car-classification](https://huggingface.co/spaces/nenzilea/car-classification)
 ## Example Image Results
@@ -51,5 +51,13 @@ The table below reports the true class and Top-3 predictions for ViT, CLIP, and
 |---|---|---|---|---|
 | `Dodge.jpg` | `Dodge` | BMW (0.3564), Dodge (0.2218), Rolls-Royce (0.1807) | Dodge (0.9432), Jeep (0.0393), Lamborghini (0.0078) | Dodge (1.00) |
 | `Ferrari.jpg` | `Ferrari` | Ferrari (0.6007), Lamborghini (0.2946), Ford (0.0296) | Ferrari (0.9958), Lamborghini (0.0032), Ford (0.0004) | Ferrari (1.00) |
-| `BMW.jpg` | `BMW` | BMW (0.2737), Porsche (0.1800), Dodge (0.1630) | BMW (0.9969), Porsche (0.0014), Ferrari (0.0007) | BMW (1.00) |
-| `Porsche.jpg` | `Porsche` | BMW (0.5858), Dodge (0.2040), Toyota (0.0667) | Porsche (0.9887), Lamborghini (0.0047), Dodge (0.0022) | Porsche (1.00) |

 This app compares 3 image classification approaches on car images:
+- Fine-tuned ViT model (`nenzilea/car-classification`)
 - Zero-shot CLIP (`openai/clip-vit-large-patch14`)
 - OpenAI vision model (GPT-4o image classification)
 ## Dataset Used For Training
+- Hugging Face dataset: https://huggingface.co/datasets/tanganke/stanford_cars
 - The Stanford Cars dataset contains 196 fine-grained classes (car make/model/year combinations). We group them into 9 brand-level classes for a cleaner, more visually meaningful classification task.
 - Number of classes: `9`
 - Classes: `BMW`, `Dodge`, `Ferrari`, `Ford`, `Jeep`, `Lamborghini`, `Porsche`, `Rolls-Royce`, `Toyota`
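
The grouping of fine-grained labels into brand-level classes can be sketched as a prefix match on the class name. This is an illustration only, not the code used for this model: the helpers `brand_of` and `relabel` are hypothetical, the sketch assumes each Stanford Cars class name starts with the make (for example `BMW M3 Coupe 2012`), and it assumes samples of other makes are dropped, which the README does not state.

```python
from typing import List, Optional, Tuple

# The nine brand-level classes listed in the README, in the same order.
BRANDS = ["BMW", "Dodge", "Ferrari", "Ford", "Jeep",
          "Lamborghini", "Porsche", "Rolls-Royce", "Toyota"]


def brand_of(class_name: str) -> Optional[str]:
    """Return the brand that prefixes class_name, or None (hypothetical helper)."""
    for brand in BRANDS:
        if class_name == brand or class_name.startswith(brand + " "):
            return brand
    return None


def relabel(class_names: List[str]) -> List[Tuple[int, int]]:
    """Map each kept sample index to a brand index.

    Assumption: samples whose make is not one of the nine brands are skipped.
    """
    kept = []
    for i, name in enumerate(class_names):
        brand = brand_of(name)
        if brand is not None:
            kept.append((i, BRANDS.index(brand)))
    return kept
```

For example, `relabel(["Audi TT RS Coupe 2012", "Jeep Wrangler SUV 2012"])` keeps only the second sample and assigns it the index of `Jeep`.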
 ## Trained Model
+- Hugging Face model link: https://huggingface.co/nenzilea/car-classification
 - Base model: `google/vit-base-patch16-224`
 - Only the final classification head was fine-tuned (all other layers frozen).
 - Trainable parameters: ~4,614 out of ~85.8M total.
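
The trainable parameter count can be sanity-checked by hand. The sketch below assumes the classification head is a single linear layer with bias on top of the ViT-Base hidden size of 768; the function name `linear_head_params` is hypothetical.

```python
def linear_head_params(hidden_size: int, num_labels: int) -> int:
    """Weights plus biases of a single linear classification layer."""
    return hidden_size * num_labels + num_labels


HIDDEN_SIZE = 768  # hidden size of ViT-Base

print(linear_head_params(HIDDEN_SIZE, 9))  # 6921
print(linear_head_params(HIDDEN_SIZE, 6))  # 4614
```

Under these assumptions a 9-class head has 6,921 parameters, while the ~4,614 quoted above matches a 6-output head, so the figure may be worth re-checking against the actual checkpoint.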
 ## Hugging Face Space
+- App link: https://huggingface.co/spaces/nenzilea/car-classification
 ## Example Image Results
 |---|---|---|---|---|
 | `Dodge.jpg` | `Dodge` | BMW (0.3564), Dodge (0.2218), Rolls-Royce (0.1807) | Dodge (0.9432), Jeep (0.0393), Lamborghini (0.0078) | Dodge (1.00) |
 | `Ferrari.jpg` | `Ferrari` | Ferrari (0.6007), Lamborghini (0.2946), Ford (0.0296) | Ferrari (0.9958), Lamborghini (0.0032), Ford (0.0004) | Ferrari (1.00) |
+| `BMW.jpg` | `BMW` | BMW (0.2737), Porsche (0.1800), Dodge (0.1630) | BMW (0.9969), Porsche (0.0014), Ferrari (0.0007) | BMW (0.95), Porsche (0.001), Dodge (0.001), Ferrari (0.001), Ford (0.001) |
+| `Porsche.jpg` | `Porsche` | BMW (0.5858), Dodge (0.2040), Toyota (0.0667) | Porsche (0.9887), Lamborghini (0.0047), Dodge (0.0022) | Porsche (1.00) |
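
Top-3 entries such as `Dodge (0.9432), Jeep (0.0393), Lamborghini (0.0078)` can be produced from per-class scores roughly as follows. This is a minimal sketch, assuming the model returns one logit per class; the function names `softmax`, `top_k`, and `format_top_k` are hypothetical and not taken from the app's code.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert logits to probabilities (subtracting the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(labels: Sequence[str], probs: Sequence[float], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k (label, probability) pairs with the highest probability."""
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


def format_top_k(pairs: Sequence[Tuple[str, float]]) -> str:
    """Render pairs in the 'Label (0.1234)' style used in the results table."""
    return ", ".join(f"{label} ({prob:.4f})" for label, prob in pairs)
```

Calling `format_top_k(top_k(labels, softmax(logits)))` yields one table cell per model and image.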
+## Model Comparison Summary
+| Model | Approach | Strengths | Weaknesses |
+|---|---|---|---|
+| **Custom ViT** | Supervised fine-tuning on 9 car brands | High accuracy on known brands | Only classifies the 9 trained brands |
+| **CLIP** | Zero-shot with brand name as text prompt | No training needed, flexible labels | Lower accuracy; may confuse visually similar brands |
+| **OpenAI GPT-4o** | LLM vision with natural language prompt | Strong reasoning, handles unusual angles | API cost, latency, black-box |
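
As a rough cross-check of the summary, the top-1 predictions from the example table can be tallied per model. The rows below are copied from that table; the helper `top1_hits` is hypothetical.

```python
# (true class, ViT top-1, CLIP top-1, OpenAI top-1), copied from the example table
ROWS = [
    ("Dodge", "BMW", "Dodge", "Dodge"),
    ("Ferrari", "Ferrari", "Ferrari", "Ferrari"),
    ("BMW", "BMW", "BMW", "BMW"),
    ("Porsche", "BMW", "Porsche", "Porsche"),
]
MODELS = ["ViT", "CLIP", "OpenAI"]


def top1_hits(rows):
    """Count, per model, how many top-1 predictions match the true class."""
    hits = {model: 0 for model in MODELS}
    for true_class, *predictions in rows:
        for model, predicted in zip(MODELS, predictions):
            hits[model] += int(predicted == true_class)
    return hits


print(top1_hits(ROWS))  # {'ViT': 2, 'CLIP': 4, 'OpenAI': 4}
```

Four images are far too few to rank the models; a comparison of accuracy would need a held-out evaluation set.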