nenzilea committed on
Commit
8f1415f
·
verified ·
1 Parent(s): e1afb80

Upload readme documentation

  1. readme_.md +63 -0
readme_.md ADDED
# Car Brand Classification App

This app compares 3 image classification approaches on car images:

- Fine-tuned ViT model ([`nenzilea/car-classification`](https://huggingface.co/nenzilea/car-classification))
- Zero-shot CLIP (`openai/clip-vit-large-patch14`)
- OpenAI vision model (GPT-4o image classification)

## Dataset Used For Training

- Hugging Face dataset: `tanganke/stanford_cars`
- The Stanford Cars dataset contains 196 fine-grained classes (car make/model/year combinations). Classes belonging to 9 selected brands are grouped into brand-level classes for a cleaner, more visually meaningful classification task; all other classes are dropped.
- Number of classes: `9`
- Classes: `BMW`, `Dodge`, `Ferrari`, `Ford`, `Jeep`, `Lamborghini`, `Porsche`, `Rolls-Royce`, `Toyota`

### Preprocessing Steps

1. **Brand extraction:** each of the 196 Stanford Cars class names (e.g. `"Ferrari 458 Italia Coupe 2012"`) is matched against the 9 brand names by substring matching.
2. **Filtering:** images whose class does not belong to any of the 9 brands are removed from the dataset.
3. **Label remapping:** the original integer labels (0–195) are remapped to brand indices (0–8).
4. **Train/validation/test split:** the original training split is divided 80/10/10 (train/validation/test). `train_test_split(test_size=0.2, seed=42)` holds out 20% of the data; the 10/10 figures imply that this hold-out is then halved into validation and test sets.
5. **Image preprocessing:** images are resized to 224×224 and pixel values are normalised to [-1, 1] using `AutoImageProcessor` from `google/vit-base-patch16-224`.
6. **RGB conversion:** all images are converted to RGB to handle grayscale or RGBA edge cases.

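The preprocessing steps can be sketched in plain Python. This is a minimal illustration, not the code used to train the model: the helper names (`brand_of`, `remap`, `split_80_10_10`, `normalise`) are hypothetical, the standard-library shuffle is a stand-in for `train_test_split` and will not reproduce the same partition, and the normalisation assumes the processor rescales pixels to [0, 1] and then applies a mean and standard deviation of 0.5.

```python
import random

# The 9 brand classes; the position in this list is the new brand index (0-8).
BRANDS = ["BMW", "Dodge", "Ferrari", "Ford", "Jeep",
          "Lamborghini", "Porsche", "Rolls-Royce", "Toyota"]


def brand_of(class_name):
    """Step 1: return the brand whose name occurs in class_name, else None."""
    lowered = class_name.lower()
    for brand in BRANDS:
        if brand.lower() in lowered:
            return brand
    return None  # class is outside the 9 brands


def remap(samples):
    """Steps 2-3: drop other brands and replace class names with brand indices."""
    kept = []
    for image_id, class_name in samples:
        brand = brand_of(class_name)
        if brand is not None:
            kept.append((image_id, BRANDS.index(brand)))
    return kept


def split_80_10_10(items, seed=42):
    """Step 4 (assumed procedure): hold out 20%, then halve the hold-out."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 8 // 10
    n_val = (len(shuffled) - n_train) // 2
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


def normalise(pixel):
    """Step 5: map an 8-bit channel value to [-1, 1] (mean 0.5, std 0.5)."""
    return (pixel / 255.0 - 0.5) / 0.5
```

For example, `brand_of("Ferrari 458 Italia Coupe 2012")` returns `"Ferrari"`, while a class name containing none of the 9 brands returns `None` and is filtered out by `remap`.
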
## Trained Model

- Hugging Face model link: [https://huggingface.co/nenzilea/car-classification](https://huggingface.co/nenzilea/car-classification)
- Base model: `google/vit-base-patch16-224`
- Only the final classification head was fine-tuned (all other layers frozen).
- Trainable parameters: ~4,614 out of ~85.8M total.

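As a sanity check on the trainable-parameter count, the size of a classification head can be computed directly. The sketch below assumes the head is a single linear layer on top of the 768-dimensional ViT-Base output; this is the usual layout, but the README does not confirm it, and `linear_head_params` is a hypothetical helper. Under that assumption a 9-class head has 6,921 parameters, whereas the reported ~4,614 equals the size of a 6-class head, so the reported figure may be worth re-checking against the actual model.

```python
def linear_head_params(hidden_size, num_classes):
    """Weights plus biases of a single linear classification layer."""
    return hidden_size * num_classes + num_classes


HIDDEN_SIZE = 768  # hidden size of ViT-Base (assumed to feed the head directly)

print(linear_head_params(HIDDEN_SIZE, 9))  # 9 brand classes -> 6921
print(linear_head_params(HIDDEN_SIZE, 6))  # 6 classes -> 4614
```
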
## Training Performance

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|---:|---:|---:|---:|---:|
| 1.368263 | 1.0 | 128 | 1.158491 | 0.5529 |
| 1.093463 | 2.0 | 256 | 0.998960 | 0.6627 |
| 1.005174 | 3.0 | 384 | 0.934044 | 0.6667 |
| 0.946650 | 4.0 | 512 | 0.900367 | 0.6549 |
| 0.886520 | 5.0 | 640 | 0.887630 | 0.6706 |

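One way to read the training table is to pick the best epoch by each metric and to track the gap between validation and training loss. The snippet below is only an illustrative sketch; the values are copied from the table and the variable names are hypothetical.

```python
# (epoch, training loss, validation loss, accuracy), copied from the table
history = [
    (1, 1.368263, 1.158491, 0.5529),
    (2, 1.093463, 0.998960, 0.6627),
    (3, 1.005174, 0.934044, 0.6667),
    (4, 0.946650, 0.900367, 0.6549),
    (5, 0.886520, 0.887630, 0.6706),
]

best_by_loss = min(history, key=lambda row: row[2])  # lowest validation loss
best_by_acc = max(history, key=lambda row: row[3])   # highest accuracy

# Validation loss minus training loss; a positive, growing gap would hint
# at overfitting if training continued.
gaps = [val - train for _, train, val, _ in history]

print(best_by_loss[0], best_by_acc[0])  # 5 5
```

Epoch 5 is the best on both metrics. The gap is negative for epochs 1 to 4 and turns slightly positive at epoch 5, where validation loss first exceeds training loss.
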
## Hugging Face Space

- App link: [https://huggingface.co/spaces/nenzilea/car-classification](https://huggingface.co/spaces/nenzilea/car-classification)

## Example Image Results

The table below reports the true class, the top-3 predictions for ViT and CLIP, and the label and confidence returned by GPT-4o.

| Image | True Class | ViT Top-3 (score) | CLIP Top-3 (score) | OpenAI GPT-4o (label, confidence) |
|---|---|---|---|---|
| `Dodge.jpg` | `Dodge` | BMW (0.3564), Dodge (0.2218), Rolls-Royce (0.1807) | Dodge (0.9432), Jeep (0.0393), Lamborghini (0.0078) | Dodge (1.00) |
| `Ferrari.jpg` | `Ferrari` | Ferrari (0.6007), Lamborghini (0.2946), Ford (0.0296) | Ferrari (0.9958), Lamborghini (0.0032), Ford (0.0004) | Ferrari (1.00) |
| `BMW.jpg` | `BMW` | BMW (0.2737), Porsche (0.1800), Dodge (0.1630) | BMW (0.9969), Porsche (0.0014), Ferrari (0.0007) | BMW (1.00) |
| `Porsche.jpg` | `Porsche` | BMW (0.5858), Dodge (0.2040), Toyota (0.0667) | Porsche (0.9887), Lamborghini (0.0047), Dodge (0.0022) | Porsche (1.00) |

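The example results can be summarised by counting how often each model's top-1 prediction matches the true class. This is a small sketch over the four rows of the table; `top1_hits` is a hypothetical helper, and four images are far too few to draw general conclusions.

```python
# Top-1 prediction per model, copied from the example results table
rows = [
    {"true": "Dodge",   "ViT": "BMW",     "CLIP": "Dodge",   "GPT-4o": "Dodge"},
    {"true": "Ferrari", "ViT": "Ferrari", "CLIP": "Ferrari", "GPT-4o": "Ferrari"},
    {"true": "BMW",     "ViT": "BMW",     "CLIP": "BMW",     "GPT-4o": "BMW"},
    {"true": "Porsche", "ViT": "BMW",     "CLIP": "Porsche", "GPT-4o": "Porsche"},
]


def top1_hits(model):
    """Number of example images whose top-1 prediction equals the true class."""
    return sum(row[model] == row["true"] for row in rows)


hits = {model: top1_hits(model) for model in ("ViT", "CLIP", "GPT-4o")}
print(hits)  # {'ViT': 2, 'CLIP': 4, 'GPT-4o': 4}
```
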
## Model Comparison Summary

| Model | Approach | Strengths | Weaknesses |
|---|---|---|---|
| **Custom ViT** | Supervised fine-tuning (classification head only) on 9 car brands | Tailored to the 9 target brands; reaches about 0.67 accuracy after 5 epochs | Only classifies the 9 trained brands; top-1 prediction wrong on 2 of the 4 example images |
| **CLIP** | Zero-shot with brand names as text prompts | No training needed, flexible labels; top-1 prediction correct on all 4 example images | May confuse visually similar brands |
| **OpenAI GPT-4o** | Vision-language model with natural-language prompt | Strong reasoning, handles unusual angles | API cost, latency, black box |
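
The zero-shot approach can be illustrated with a small scoring sketch: the image embedding is compared with one text embedding per brand by cosine similarity, the similarities are scaled, and a softmax turns them into scores. Everything here is an assumption for illustration: the 3-dimensional embeddings are made up (in the real pipeline they would come from the CLIP image and text encoders), the `logit_scale` of 100 is an assumed typical value, and the function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def softmax(logits):
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_scores(image_emb, text_embs, logit_scale=100.0):
    """Rank labels by softmax over scaled image-text cosine similarities."""
    labels = list(text_embs)
    logits = [logit_scale * cosine(image_emb, text_embs[label]) for label in labels]
    return sorted(zip(labels, softmax(logits)), key=lambda pair: pair[1], reverse=True)


# Hypothetical toy embeddings, one per brand prompt
text_embs = {
    "Ferrari": [0.9, 0.1, 0.0],
    "Lamborghini": [0.8, 0.3, 0.1],
    "Jeep": [0.0, 0.2, 0.9],
}
image_emb = [0.95, 0.05, 0.0]  # hypothetical embedding of a Ferrari photo

scores = zero_shot_scores(image_emb, text_embs)
print(scores[0][0])  # Ferrari
```

Because labels are supplied only as text, adding a brand means adding one more entry to `text_embs`, with no retraining.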