kikogazda committed on
Commit
26d0865
·
verified ·
1 Parent(s): f10ca14

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +129 -169
README.md CHANGED
@@ -1,259 +1,219 @@
  ---
  license: apache-2.0
  datasets:
  - tanganke/stanford_cars
  language:
  - en
  metrics:
  - accuracy
  base_model:
  - timm/efficientnetv2_rw_s.ra2_in1k
  pipeline_tag: image-classification
  ---
- # 🚗 TwinCar: Fine-Grained Car Classification on Stanford Cars 196 (EfficientNetV2 Edition)

- > **TwinCar** is a modern deep learning pipeline for car make/model/year classification, featuring a cutting-edge EfficientNetV2 backbone, advanced data augmentation (Mixup, CutMix), robust metric tracking, rich evaluation visuals, and deep model explainability (Grad-CAM++).
- Developed for the Brainster Data Science Academy, 2025.

  ---
- <pre>
- CarClassificationTeam3/
- ├── models/
- ├── notebook/
- │   └── Last_model.ipynb
- ├── reports/
- ├── twincar/
- │   ├── config.py
- │   ├── dataset.py
- │   ├── README.md
- │   └── modeling/
- │       ├── train.py
- │       ├── predict.py
- │       └── gradcampp.py
- ├── README.md
- ├── last_model.py
- ├── requirements.txt
- </pre>
  ---
- ## Table of Contents

  - [Overview](#overview)
- - [Project Structure](#project-structure)
  - [Dataset & Preprocessing](#dataset--preprocessing)
  - [Model Architecture](#model-architecture)
  - [Training Pipeline](#training-pipeline)
- - [Grad-CAM++ Explainability](#grad-cam-explainability)
  - [Visualizations](#visualizations)
  - [Metrics & Results](#metrics--results)
  - [Hugging Face & Demo](#hugging-face--demo)
  - [Usage & Inference](#usage--inference)

  ## Overview

- TwinCar tackles **fine-grained car recognition**: distinguishing between 196 car makes, models, and years, with minimal visual differences.
-
- **Key features:**
- - **EfficientNetV2** backbone (pretrained, SOTA accuracy and speed)
- - Advanced augmentations: Mixup, CutMix, strong color/blur transforms
- - Weighted random sampling for class balance
- - Complete metric logging (accuracy, F1, precision, recall, Top-3/Top-5, confusion matrix)
- - **Grad-CAM++** explainability, per-sample and grid
- - **Test-Time Augmentation** (TTA) for robust evaluation
- - Fully reproducible and scriptable end-to-end
  ---

  ## Dataset & Preprocessing

  - **Dataset:** [Stanford Cars 196](https://huggingface.co/datasets/tanganke/stanford_cars)
-   - 196 classes, 16,185 images (train/test)
-   - Each image labeled by make/model/year
-   - Full human-readable metadata (`cars_meta.mat`)
  - **Preprocessing:**
-   - Annotations extracted to CSV
-   - Stratified train/val split (10% validation)
-   - Outlier and missing-image checks
-   - **Advanced augmentations**: random resized crop, flip, rotation, color jitter, blur, Mixup/CutMix
-   - Per-channel normalization (ImageNet stats)
  ---

  ## Model Architecture

- - **Backbone:** `EfficientNetV2` (pretrained on ImageNet, all layers trainable)
- - **Classifier Head:**
-   - Linear(embedding_size → 512) → ReLU → Dropout(0.2) → Linear(512 → 196)
  - **Optimization:**
-   - AdamW optimizer (a robust default for deep networks)
-   - Cross-Entropy loss with optional label smoothing for regularization
- - **Callbacks:**
-   - Early stopping (patience = 7 epochs, on macro F1)
-   - ReduceLROnPlateau (automatic learning-rate schedule)
  - WeightedRandomSampler for class balance
- - Full support for GPU or CPU
 
- **Diagram:**
- Input Image → [Augmentation: Mixup/CutMix, crop, jitter, blur]
-   → EfficientNetV2
-   → Custom Classifier Head
-   → 196-class Softmax

  ---

  ## Training Pipeline

- - **Epochs:** Up to 25 (with early stopping)
- - **Batch Size:** 32 (weighted sampling for class balance, maintained even with Mixup/CutMix)
- - **Validation:** Macro/micro metrics, confusion matrices, Top-3/Top-5 accuracy
- - **Logging:** All key metrics saved (CSV), plus visual curves for:
-   - Accuracy & F1 per epoch
-   - Precision/Recall (macro/weighted)
-   - Loss curves
-   - Top-3/Top-5 accuracy curves
- - **Artifacts:** All reports, CSVs, and plots saved for reproducibility
-
- **Typical training logic:**
- - Train with strong augmentations & balanced sampling
- - Monitor macro F1 on the validation set; trigger early stopping if there is no improvement
- - Save the best model automatically

  ---

- ## Grad-CAM++ Explainability
-
- **What is Grad-CAM++?**
- Grad-CAM++ is an advanced visualization tool that highlights regions in an input image that are most influential for a model's prediction.
- - **Why use it?**
-   - Helps understand _why_ the model predicts a certain class (e.g., "is it focusing on the headlights or the logo?")
-   - Builds trust for deployment and debugging
- - **How it's used:**
-   - For each prediction, Grad-CAM++ generates a heatmap overlay showing which pixels most affected the result.

- **Example (Grid):**
- ![GradCAM Grid](reports/gradcam_grid.png)

- *Selected Grad-CAM++ overlays for validation samples:
- Green titles = correct prediction, Red titles = wrong prediction.*

  ---

- ## Visualizations
-
- Below are key visual outputs from the model training and evaluation.
- *All files are in the [`/reports`](./reports) directory.*
-
- <table>
-   <tr>
-     <td>
-       <img src="reports/metrics_acc_f1_beautiful.png" width="350"/><br>
-       <b>Accuracy & Macro F1</b>
-     </td>
-     <td>
-       <img src="reports/metrics_loss_beautiful.png" width="350"/><br>
-       <b>Loss Curve</b>
-     </td>
-   </tr>
-   <tr>
-     <td>
-       <img src="reports/metrics_precision_recall_beautiful.png" width="350"/><br>
-       <b>Precision & Recall</b>
-     </td>
-     <td>
-       <img src="reports/metrics_topk_beautiful.png" width="350"/><br>
-       <b>Top-3/Top-5 Accuracy</b>
-     </td>
-   </tr>
-   <tr>
-     <td colspan="2" align="center">
-       <img src="reports/top20_accuracy_beautiful.png" width="400"/><br>
-       <b>Top-20 Accurate Classes</b>
-     </td>
-   </tr>
- </table>
 
- ---

- ### 🔍 Advanced Evaluation
-
- <table>
-   <tr>
-     <td>
-       <img src="reports/confused_top20_beautiful.png" width="340"/><br>
-       <b>Top-20 Most Confused Classes</b>
-     </td>
-     <td>
-       <img src="reports/confusion_matrix_beautiful.png" width="340"/><br>
-       <b>Full Confusion Matrix</b>
-     </td>
-   </tr>
- </table>

  ---

- ## Metrics & Results
-
- | Metric                 | Value (example) |
- |------------------------|-----------------|
- | train_loss             | 0.97            |
- | train_acc              | 0.997           |
- | val_loss               | 1.40            |
- | val_acc                | 0.87            |
- | val_precision_macro    | 0.88            |
- | val_precision_weighted | 0.89            |
- | val_recall_macro       | 0.87            |
- | val_recall_weighted    | 0.87            |
- | val_f1_macro           | 0.87            |
- | val_f1_weighted        | 0.88            |
- | val_top3               | 0.95            |
- | val_top5               | 0.97            |

  ---

  ## 🤗 Hugging Face & Demo

- - **Model on Hugging Face:**
-   *()*
- - **Live Gradio Demo:**
-   *()*

  ---

  ## ⬇️ Download Resources

  - **Stanford Cars 196 Dataset:**
-   [Download from Hugging Face](https://huggingface.co/datasets/tanganke/stanford_cars)
  - **Trained Model Weights:**
-   (link if available)
-
- ## Usage & Inference

- ```python
  import torch
- import timm
  from torchvision import transforms
  from PIL import Image
- import scipy.io
- import os
-
- # Set dataset and model paths
- EXTRACTED_ROOT = "stanford_cars"  # Change if you extracted elsewhere
- META_PATH = os.path.join(EXTRACTED_ROOT, "car_devkit", "devkit", "cars_meta.mat")
- MODEL_PATH = "models/efficientnetv2_best_model.pth"
-
- # Load class names directly from the Stanford Cars dataset
- meta = scipy.io.loadmat(META_PATH)
- class_names = [x[0] for x in meta['class_names'][0]]
-
- # Model setup
- NUM_CLASSES = len(class_names)
- device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
- model = timm.create_model('efficientnetv2_rw_s', pretrained=False, num_classes=NUM_CLASSES)
- model.load_state_dict(torch.load(MODEL_PATH, map_location=device))
  model.eval()
- model.to(device)

- # Preprocessing (matches validation)
- imagenet_mean = [0.485, 0.456, 0.406]
- imagenet_std = [0.229, 0.224, 0.225]
  transform = transforms.Compose([
-     transforms.Resize(256),
-     transforms.CenterCrop(224),
      transforms.ToTensor(),
-     transforms.Normalize(mean=imagenet_mean, std=imagenet_std)
  ])
-
- # Load and preprocess image
  img = Image.open("your_image.jpg").convert("RGB")
- input_tensor = transform(img).unsqueeze(0).to(device)

  # Predict
  with torch.no_grad():
      output = model(input_tensor)
- pred_idx = output.argmax(1).item()

- print(f"Predicted class: {class_names[pred_idx]} (index: {pred_idx})")
- ```
  ---
  license: apache-2.0
+
  datasets:
  - tanganke/stanford_cars
+
  language:
  - en
+
  metrics:
  - accuracy
+
  base_model:
  - timm/efficientnetv2_rw_s.ra2_in1k
+
  pipeline_tag: image-classification
  ---
 

+ # 🚗 EfficientNetV2 Car Classifier: Fine-Grained Vehicle Recognition
+
+ > **EfficientNetV2 Car Classifier** delivers robust, fine-grained recognition for 196 car makes and models, powered by EfficientNetV2, state-of-the-art augmentations, rigorous metric tracking, and full visual explainability with Grad-CAM.
+ > Developed by kikogazda, 2025.

  ---
+
+ ## 📁 Project Structure
+
+ <pre>
+ Efficient_NetV2_Edition/
+ ├── efficientnetv2_best_model.pth   # Best model weights
+ ├── Last_model.ipynb                # Full training & evaluation pipeline
+ ├── class_mapping.json              # Class index to name mapping
+ ├── *.csv                           # Logs, splits, labels, and metrics
+ ├── *.png                           # Visualizations and Grad-CAM outputs
+ ├── README.md                       # Model card (this file)
+ └── ...                             # Additional scripts, reports, and assets
+ </pre>
+
  ---
+
+ ## 🚦 Table of Contents

  - [Overview](#overview)
  - [Dataset & Preprocessing](#dataset--preprocessing)
  - [Model Architecture](#model-architecture)
  - [Training Pipeline](#training-pipeline)
+ - [Explainability (Grad-CAM)](#explainability-grad-cam)
  - [Visualizations](#visualizations)
  - [Metrics & Results](#metrics--results)
  - [Hugging Face & Demo](#hugging-face--demo)
+ - [Download Resources](#download-resources)
  - [Usage & Inference](#usage--inference)
+ - [References](#references)

  ---

  ## Overview

+ **EfficientNetV2 Car Classifier** tackles the real-world challenge of distinguishing between 196 car makes and models, even when differences are nearly imperceptible.
+
+ **Highlights:**
+ - Modern EfficientNetV2 backbone with transfer learning
+ - Aggressive, real-world augmentation pipeline
+ - Class balancing for rare makes/models
+ - Extensive, scriptable metric tracking and reporting
+ - End-to-end explainability with Grad-CAM
+ - Fully reproducible, robust, and deployment-ready

  ---

  ## Dataset & Preprocessing

  - **Dataset:** [Stanford Cars 196](https://huggingface.co/datasets/tanganke/stanford_cars)
+   - 196 classes, 16,185 images (official train/test split)
+   - Detailed make/model/year for each image
  - **Preprocessing:**
+   - Annotation CSV export and class mapping JSON
+   - Stratified train/val/test split (maintains class distribution)
+   - Outlier cleaning and normalization
+   - Augmentations: random resized crop, flip, rotate, color jitter, blur
+   - ImageNet mean/std normalization

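The stratified split described above can be sketched in plain Python: group sample indices by label, then carve a fixed fraction of each class into the validation set so the class distribution is preserved. This is an illustrative helper, not the notebook's exact code; the function name, fraction, and toy labels are assumptions.

```python
import random
from collections import defaultdict

def stratified_split(labels, val_frac=0.1, seed=42):
    """Split sample indices per class so validation keeps the class distribution."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx, val_idx = [], []
    for label, idxs in by_class.items():
        rng.shuffle(idxs)
        n_val = max(1, round(len(idxs) * val_frac))  # at least 1 sample per class
        val_idx.extend(idxs[:n_val])
        train_idx.extend(idxs[n_val:])
    return train_idx, val_idx

# Toy example: 2 classes, 10 samples each (hypothetical class names)
labels = ["AM General Hummer"] * 10 + ["Acura RL"] * 10
train_idx, val_idx = stratified_split(labels, val_frac=0.1)
print(len(train_idx), len(val_idx))  # 18 2
```

With a 10% fraction, each class contributes exactly one validation sample here, so rare classes are never dropped from validation.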
  ---

  ## Model Architecture

+ - **Backbone:** EfficientNetV2 (pretrained)
+   - All but the last blocks frozen initially
+   - Custom classifier head for 196 classes (Linear → ReLU → Dropout → Linear)
  - **Optimization:**
+   - Adam optimizer
+   - Cross-Entropy loss (with label smoothing)
+   - Learning rate scheduling (ReduceLROnPlateau)
+   - Early stopping (macro F1 on validation)
  - WeightedRandomSampler for class balance

+ **Flow:**
+ Input → [Augmentations] → EfficientNetV2 Backbone → Custom Head → Softmax (196 classes)

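As a sanity check on the head shape, the parameter count of a Linear → ReLU → Dropout → Linear head is easy to work out by hand. The 1792-dimensional embedding for `efficientnetv2_rw_s` and the 512-unit hidden layer are assumptions; substitute the real sizes from your checkpoint.

```python
def linear_params(n_in, n_out):
    # A Linear layer has an (n_in x n_out) weight matrix plus an n_out bias vector.
    return n_in * n_out + n_out

feat_dim, hidden, n_classes = 1792, 512, 196  # feat_dim and hidden are assumed sizes
head_params = linear_params(feat_dim, hidden) + linear_params(hidden, n_classes)
print(head_params)  # 1018564
```

ReLU and Dropout add no parameters, so the head adds roughly one million weights on top of the frozen backbone.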
  ---

  ## Training Pipeline

+ - **Epochs:** Up to 25 (early stopping enabled)
+ - **Batch Size:** 32 (weighted sampling)
+ - **Validation:** Macro/micro metrics, confusion matrix, Top-3/Top-5 accuracy
+ - **Logging:** All metrics and losses to CSV, plus high-res visual plots:
+   - Accuracy/F1 per epoch
+   - Precision/Recall (macro, weighted)
+   - Loss curve
+   - Top-3/Top-5 accuracy
+ - **Artifacts:** All reports, CSVs, and visuals in repo for transparency

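Weighted sampling as described above is typically driven by inverse class-frequency weights: each sample is weighted by one over its class count, so rare classes are drawn about as often as common ones. A dependency-free sketch (the notebook may compute these differently; the toy labels are illustrative):

```python
from collections import Counter

def sample_weights(labels):
    """Inverse-frequency weight per sample, suitable for a weighted random sampler."""
    counts = Counter(labels)
    return [1.0 / counts[label] for label in labels]

labels = ["bmw_m3"] * 3 + ["acura_rl"]  # toy imbalanced label list
weights = sample_weights(labels)
print(weights)  # three 1/3 weights for the majority class, then 1.0 for the rare one
```

Each class's weights sum to the same total, which is exactly the balancing effect WeightedRandomSampler exploits.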
  ---

+ ## Explainability (Grad-CAM)

+ Grad-CAM overlays highlight the image regions most responsible for a model's predictions, letting you "see" what the network uses for its decisions.
+ - *Why?* Trust, transparency, debugging.
+ - *How?* For every prediction, a heatmap overlay shows the most influential pixels.
+
+ ![GradCAM Example](./gradcam_grid.png)
+ *Heatmaps visualize key decision regions for each sample.*

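Conceptually, Grad-CAM weights each activation channel by the average of its gradients, sums the weighted channels, and applies ReLU to keep only positively contributing regions. A dependency-free toy sketch on 2x2 "feature maps" (real use would run a library such as pytorch-grad-cam on the conv features; all values here are made up):

```python
def grad_cam(activations, gradients):
    """activations/gradients: [channel][row][col] lists; returns a ReLU-ed heatmap."""
    n_ch = len(activations)
    rows, cols = len(activations[0]), len(activations[0][0])
    # Channel weight = mean of that channel's gradients (global average pooling)
    weights = [sum(sum(r) for r in gradients[c]) / (rows * cols) for c in range(n_ch)]
    heatmap = [[0.0] * cols for _ in range(rows)]
    for c in range(n_ch):
        for i in range(rows):
            for j in range(cols):
                heatmap[i][j] += weights[c] * activations[c][i][j]
    return [[max(v, 0.0) for v in row] for row in heatmap]  # ReLU

acts = [[[1.0, 0.0], [0.0, 2.0]], [[0.0, 1.0], [1.0, 0.0]]]   # toy activations
grads = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]  # toy gradients
print(grad_cam(acts, grads))  # [[1.0, 0.0], [0.0, 2.0]]
```

The second channel gets a negative weight, so its contribution is suppressed by the final ReLU, which is why Grad-CAM maps highlight only evidence *for* the predicted class.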
  ---

+ ## 📊 Visualizations

+ Key assets (see repo for all):
+
+ | Visualization                             | Description                              |
+ |-------------------------------------------|------------------------------------------|
+ | `confusion_matrix_beautiful.png`          | Confusion matrix (validation set)        |
+ | `metrics_acc_f1_beautiful.png`            | Training & validation accuracy/F1 curves |
+ | `metrics_loss_beautiful.png`              | Loss curves                              |
+ | `metrics_precision_recall_beautiful.png`  | Precision/Recall by epoch                |
+ | `metrics_topk_beautiful.png`              | Top-3/Top-5 accuracy                     |
+ | `top20_accuracy_beautiful.png`            | Top-20 class accuracy                    |
+ | `gradcam_grid.png`                        | Grad-CAM visualizations grid             |

  ---

+ ## 📈 Metrics & Results
+
+ | Metric                 | Value |
+ |------------------------|-------|
+ | train_loss             | 0.97  |
+ | train_acc              | 0.997 |
+ | val_loss               | 1.40  |
+ | val_acc                | 0.87  |
+ | val_precision_macro    | 0.89  |
+ | val_precision_weighted | 0.89  |
+ | val_recall_macro       | 0.87  |
+ | val_recall_weighted    | 0.87  |
+ | val_f1_macro           | 0.87  |
+ | val_f1_weighted        | 0.88  |
+ | val_top3               | 0.95  |
+ | val_top5               | 0.97  |

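The macro vs weighted distinction in the table matters for imbalanced classes: macro averages per-class F1 equally, while weighted averages by class support. A small stdlib sketch with made-up per-class scores (the class names and numbers are illustrative, not from this model):

```python
def macro_f1(f1s):
    # Every class counts equally, regardless of size.
    return sum(f1s.values()) / len(f1s)

def weighted_f1(f1s, support):
    # Classes count in proportion to their number of validation samples.
    total = sum(support.values())
    return sum(f1s[c] * support[c] / total for c in f1s)

f1s = {"common_class": 0.9, "rare_class": 0.5}    # hypothetical per-class F1
support = {"common_class": 90, "rare_class": 10}  # hypothetical sample counts
print(macro_f1(f1s), weighted_f1(f1s, support))   # macro 0.7, weighted ~0.86
```

A gap between the two (here 0.7 vs 0.86) signals that rare classes are dragging performance down, which is invisible in the weighted number alone.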
  ---

  ## 🤗 Hugging Face & Demo

+ **Model on Hugging Face:**
+ [EfficientNetV2 Car Classifier on Hugging Face](https://huggingface.co/kikogazda/Efficient_NetV2_Edition)
+ **Live Gradio Demo:**
+ [Gradio Demo Space](https://kikogazda-efficientnetv2-demo.hf.space)
+ [![Model Demo Space](https://img.shields.io/badge/Gradio-Demo-green?logo=gradio)](https://kikogazda-efficientnetv2-demo.hf.space)
+ [![HuggingFace Model](https://img.shields.io/badge/HuggingFace-Model-yellow?logo=huggingface)](https://huggingface.co/kikogazda/Efficient_NetV2_Edition)

  ---

  ## ⬇️ Download Resources

  - **Stanford Cars 196 Dataset:**
+   [Download from Hugging Face](https://huggingface.co/datasets/tanganke/stanford_cars)
  - **Trained Model Weights:**
+   [Download from Hugging Face (efficientnetv2_best_model.pth)](https://huggingface.co/kikogazda/Efficient_NetV2_Edition/resolve/main/efficientnetv2_best_model.pth)
+ - **Class mapping/metadata:**
+   Included as `class_mapping.json` in this repo
+
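The `class_mapping.json` file maps stringified class indices to human-readable names, which is how the inference code turns an argmax index into a label. A minimal stdlib sketch (the two entries and the demo filename are illustrative, not the repo's full mapping):

```python
import json

# Toy mapping in the same shape as class_mapping.json: {"0": "name", ...}
mapping = {"0": "AM General Hummer SUV 2000", "1": "Acura RL Sedan 2012"}
with open("class_mapping_demo.json", "w") as f:
    json.dump(mapping, f)

# At inference time: load once, then look up by stringified index
with open("class_mapping_demo.json") as f:
    class_map = json.load(f)

pred = 1  # index returned by the model's argmax
print(class_map[str(pred)])  # Acura RL Sedan 2012
```

Note the `str(pred)` conversion: JSON object keys are always strings, so integer indices must be stringified before lookup.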
+ ---
+
+ ## 💻 Usage & Inference
+
+ ### 1. Install dependencies
+
+ ```bash
+ pip install -r requirements.txt
+ pip install torch torchvision timm pytorch-grad-cam gradio
+ ```
+
+ ### 2. Run inference
+
+ ```python
+ import json
+
+ import torch
+ import timm
+ from torchvision import transforms
+ from PIL import Image
+
+ # Load model (the weights were trained with the timm EfficientNetV2 backbone)
+ model = timm.create_model('efficientnetv2_rw_s', pretrained=False, num_classes=196)
+ model.load_state_dict(torch.load("efficientnetv2_best_model.pth", map_location="cpu"))
+ model.eval()
+
+ # Preprocess
+ transform = transforms.Compose([
+     transforms.Resize((224, 224)),
+     transforms.ToTensor(),
+     transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
+ ])
+ img = Image.open("your_image.jpg").convert("RGB")
+ input_tensor = transform(img).unsqueeze(0)
+
+ # Predict
+ with torch.no_grad():
+     output = model(input_tensor)
+ pred = output.argmax(1).item()
+
+ # Class name
+ with open("class_mapping.json") as f:
+     class_map = json.load(f)
+ print("Predicted class:", class_map[str(pred)])
+ ```
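To turn the raw logits from the inference snippet into ranked confidences (the basis of the Top-3/Top-5 numbers above), a plain softmax is enough. A dependency-free sketch with toy logits (the six scores are made up):

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def top_k(logits, k=5):
    """Indices of the k highest-probability classes, best first."""
    probs = softmax(logits)
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]

logits = [0.1, 2.0, -1.0, 3.5, 0.7, 1.2]  # toy scores for 6 classes
print(top_k(logits, k=3))  # [3, 1, 5]
```

A Top-3 or Top-5 "hit" simply means the true class index appears in this list, which is how the `val_top3`/`val_top5` figures are computed.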