---
license: apache-2.0
datasets:
- tanganke/stanford_cars
language:
- en
metrics:
- accuracy
base_model:
- timm/efficientnetv2_rw_s.ra2_in1k
pipeline_tag: image-classification
---

# 🚗 TwinCar: Fine-Grained Car Classification on Stanford Cars 196 (EfficientNetV2 Edition)

> **TwinCar** is a modern deep learning pipeline for car make/model/year classification, featuring an EfficientNetV2 backbone, advanced data augmentation (Mixup, CutMix), robust metric tracking, rich evaluation visuals, and deep model explainability (Grad-CAM++).
> Developed for the Brainster Data Science Academy, 2025.

---

## Project Structure

<pre>
CarClassificationTeam3/
├── models/
├── notebook/
│   └── Last_model.ipynb
├── reports/
├── twincar/
│   ├── config.py
│   ├── dataset.py
│   ├── README.md
│   └── modeling/
│       ├── train.py
│       ├── predict.py
│       └── gradcampp.py
├── README.md
├── last_model.py
├── requirements.txt
</pre>

---
## Table of Contents

- [Overview](#overview)
- [Project Structure](#project-structure)
- [Dataset & Preprocessing](#dataset--preprocessing)
- [Model Architecture](#model-architecture)
- [Training Pipeline](#training-pipeline)
- [Grad-CAM++ Explainability](#grad-cam-explainability)
- [Visualizations](#visualizations)
- [Metrics & Results](#metrics--results)
- [Hugging Face & Demo](#hugging-face--demo)
- [Usage & Inference](#usage--inference)

---

## Overview

TwinCar tackles **fine-grained car recognition**: distinguishing between 196 car makes, models, and years with minimal visual differences between classes.

**Key features:**
- **EfficientNetV2** backbone (pretrained, strong accuracy/speed trade-off)
- Advanced augmentations: Mixup, CutMix, strong color/blur transforms
- Weighted random sampling for class balance
- Complete metric logging (accuracy, F1, precision, recall, Top-3/Top-5, confusion matrix)
- **Grad-CAM++** explainability, per-sample and in grids
- **Test-Time Augmentation (TTA)** for robust evaluation
- Fully reproducible and scriptable end-to-end
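Mixup, one of the augmentations listed above, blends pairs of batch samples and their one-hot labels. A minimal NumPy sketch of the idea (illustrative only; the training code applies the same operation to PyTorch tensor batches):

```python
import numpy as np

def mixup(images, labels_onehot, alpha=0.2, rng=None):
    """Blend each sample with a randomly chosen partner from the same batch.

    images: (N, ...) float array; labels_onehot: (N, num_classes) array.
    Returns mixed images and correspondingly mixed soft labels.
    """
    rng = rng or np.random.default_rng(0)
    lam = rng.beta(alpha, alpha)            # mixing coefficient in (0, 1)
    perm = rng.permutation(len(images))     # partner index for each sample
    mixed_x = lam * images + (1 - lam) * images[perm]
    mixed_y = lam * labels_onehot + (1 - lam) * labels_onehot[perm]
    return mixed_x, mixed_y

# Toy batch: 4 "images" of shape (2, 2), 3 classes
x = np.arange(16, dtype=np.float64).reshape(4, 2, 2)
y = np.eye(3)[[0, 1, 2, 0]]
mx, my = mixup(x, y)
print(mx.shape, my.shape)  # (4, 2, 2) (4, 3)
```

The mixed labels are convex combinations, so each row of `my` still sums to 1 and can be fed to a soft-target cross-entropy loss.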

---

## Dataset & Preprocessing

- **Dataset:** [Stanford Cars 196](https://huggingface.co/datasets/tanganke/stanford_cars)
  - 196 classes, 16,185 images (train/test)
  - Each image labeled by make/model/year
  - Full human-readable metadata (`cars_meta.mat`)
- **Preprocessing:**
  - Annotations extracted to CSV
  - Stratified train/val split (10% validation)
  - Outlier and missing-image checks
  - **Advanced augmentations:** random resized crop, flip, rotation, color jitter, blur, Mixup/CutMix
  - Per-channel normalization (ImageNet stats)
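The stratified 10% split keeps each class's train/val proportions roughly equal. A minimal sketch in plain Python (`stratified_split` is an illustrative helper, not the project's actual code):

```python
import random
from collections import defaultdict

def stratified_split(labels, val_frac=0.1, seed=42):
    """Split sample indices into train/val, sending ~val_frac of each class to val."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx, val_idx = [], []
    for label, indices in by_class.items():
        rng.shuffle(indices)
        n_val = max(1, round(len(indices) * val_frac))
        val_idx.extend(indices[:n_val])
        train_idx.extend(indices[n_val:])
    return sorted(train_idx), sorted(val_idx)

# Toy example: 3 classes with 20 samples each -> 2 validation samples per class
labels = [c for c in range(3) for _ in range(20)]
train_idx, val_idx = stratified_split(labels, val_frac=0.1)
print(len(train_idx), len(val_idx))  # 54 6
```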

---

## Model Architecture

- **Backbone:** EfficientNetV2 (`efficientnetv2_rw_s`, pretrained on ImageNet-1k, all layers trainable)
- **Classifier head:**
  - Linear(embedding_size → 512) → ReLU → Dropout(0.2) → Linear(512 → 196)
- **Optimization:**
  - AdamW optimizer (decoupled weight decay)
  - Cross-entropy loss with optional label smoothing for regularization
- **Callbacks:**
  - Early stopping (patience = 7 epochs, monitored on macro F1)
  - ReduceLROnPlateau (automatic learning-rate schedule)
  - WeightedRandomSampler for class balance
- Full support for GPU or CPU

**Diagram:**

```
Input Image
  → [Augmentation: Mixup/CutMix, crop, jitter, blur]
  → EfficientNetV2 backbone
  → Custom classifier head
  → 196-class softmax
```
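The classifier head above can be sketched directly in PyTorch. The embedding size below is an assumption for illustration; in practice it is read from the timm backbone's `model.num_features`:

```python
import torch
import torch.nn as nn

# Assumed feature dimension of the backbone (read from model.num_features at runtime)
EMBEDDING_SIZE = 1792
NUM_CLASSES = 196

head = nn.Sequential(
    nn.Linear(EMBEDDING_SIZE, 512),  # project embeddings down to 512 dims
    nn.ReLU(),
    nn.Dropout(0.2),                 # regularization before the final layer
    nn.Linear(512, NUM_CLASSES),     # one logit per car class
)

# A batch of 8 backbone embeddings maps to 196 class logits
logits = head(torch.randn(8, EMBEDDING_SIZE))
print(logits.shape)  # torch.Size([8, 196])
```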

---

## Training Pipeline

- **Epochs:** up to 25 (with early stopping)
- **Batch size:** 32, drawn with a weighted sampler for class balance (also when Mixup/CutMix is applied)
- **Validation:** macro/micro metrics, confusion matrices, Top-3/Top-5 accuracy
- **Logging:** all key metrics saved to CSV, plus visual curves for:
  - Accuracy & F1 per epoch
  - Precision/recall (macro/weighted)
  - Loss curves
  - Top-3/Top-5 accuracy curves
- **Artifacts:** all reports, CSVs, and plots saved for reproducibility

**Typical training logic:**
- Train with strong augmentations and balanced sampling
- Monitor macro F1 on the validation set; trigger early stopping if it stops improving
- Save the best model automatically
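The early-stopping logic above (patience = 7 on macro F1) can be sketched as a small helper. This is an illustrative sketch, not the project's trainer:

```python
class EarlyStopping:
    """Stop training when the monitored score (macro F1 here) stops improving."""

    def __init__(self, patience=7, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("-inf")
        self.bad_epochs = 0

    def step(self, score):
        """Call once per epoch; returns True when training should stop."""
        if score > self.best + self.min_delta:
            self.best = score        # improvement: record it and reset the counter
            self.bad_epochs = 0
            return False
        self.bad_epochs += 1
        return self.bad_epochs >= self.patience

stopper = EarlyStopping(patience=7)
f1_history = [0.50, 0.62, 0.70, 0.71] + [0.70] * 7  # plateaus after epoch 4
stopped_at = next(i for i, f1 in enumerate(f1_history, 1) if stopper.step(f1))
print(stopped_at)  # 11
```

Training halts at epoch 11: seven epochs without improvement over the best macro F1 of 0.71.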

---

## Grad-CAM++ Explainability

**What is Grad-CAM++?**
Grad-CAM++ is a visualization technique that highlights the regions of an input image that were most influential for the model's prediction.
- **Why use it?**
  - Helps understand _why_ the model predicts a certain class (e.g., "is it focusing on the headlights or the logo?")
  - Builds trust for deployment and debugging
- **How it's used:**
  - For each prediction, Grad-CAM++ generates a heatmap overlay showing which pixels most affected the result.
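For illustration, the core of Grad-CAM++ can be sketched in plain PyTorch with a forward hook. This is a minimal sketch of the common first-order-gradient formulation; the project's actual implementation lives in `twincar/modeling/gradcampp.py`, and the tiny CNN below is a stand-in for EfficientNetV2:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def gradcam_pp(model, layer, x, class_idx):
    """Minimal Grad-CAM++ heatmap for a single image (illustrative sketch)."""
    acts = []
    hook = layer.register_forward_hook(lambda m, i, o: acts.append(o))
    score = model(x)[0, class_idx]                # forward pass records activations
    hook.remove()
    a = acts[0]                                   # (1, C, H, W) feature maps
    g = torch.autograd.grad(score, a)[0]          # d(score)/d(activations)
    # Grad-CAM++ pixel weights from powers of the first-order gradient
    g2, g3 = g ** 2, g ** 3
    denom = 2 * g2 + (a * g3).sum(dim=(2, 3), keepdim=True)
    alpha = g2 / torch.where(denom != 0, denom, torch.ones_like(denom))
    weights = (alpha * F.relu(g)).sum(dim=(2, 3), keepdim=True)
    cam = F.relu((weights * a).sum(dim=1))        # weighted channel sum -> (1, H, W)
    return cam / (cam.max() + 1e-8)               # normalize to [0, 1]

# Toy demo: a tiny CNN stands in for the real backbone
net = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 4))
heatmap = gradcam_pp(net, net[0], torch.randn(1, 3, 32, 32), class_idx=2)
print(heatmap.shape)  # torch.Size([1, 32, 32])
```

The returned heatmap is then upsampled to the input resolution and blended over the original image to produce the overlays shown below.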

**Example (grid):**
![Grad-CAM++ grid](reports/gradcam_grid.png)

*Selected Grad-CAM++ overlays for validation samples:
green titles = correct prediction, red titles = wrong prediction.*

---
## Visualizations

Below are key visual outputs from model training and evaluation.
*All files are in the [`/reports`](./reports) directory.*

<table>
<tr>
<td>
<img src="reports/metrics_acc_f1_beautiful.png" width="350"/><br>
<b>Accuracy & Macro F1</b>
</td>
<td>
<img src="reports/metrics_loss_beautiful.png" width="350"/><br>
<b>Loss Curve</b>
</td>
</tr>
<tr>
<td>
<img src="reports/metrics_precision_recall_beautiful.png" width="350"/><br>
<b>Precision & Recall</b>
</td>
<td>
<img src="reports/metrics_topk_beautiful.png" width="350"/><br>
<b>Top-3/Top-5 Accuracy</b>
</td>
</tr>
<tr>
<td colspan="2" align="center">
<img src="reports/top20_accuracy_beautiful.png" width="400"/><br>
<b>Top-20 Most Accurate Classes</b>
</td>
</tr>
</table>

---

### 🔍 Advanced Evaluation

<table>
<tr>
<td>
<img src="reports/confused_top20_beautiful.png" width="340"/><br>
<b>Top-20 Most Confused Classes</b>
</td>
<td>
<img src="reports/confusion_matrix_beautiful.png" width="340"/><br>
<b>Full Confusion Matrix</b>
</td>
</tr>
</table>

---

## Metrics & Results

| Metric                 | Value (example) |
|------------------------|-----------------|
| train_loss             | 0.97            |
| train_acc              | 0.997           |
| val_loss               | 1.40            |
| val_acc                | 0.87            |
| val_precision_macro    | 0.88            |
| val_precision_weighted | 0.89            |
| val_recall_macro       | 0.87            |
| val_recall_weighted    | 0.87            |
| val_f1_macro           | 0.87            |
| val_f1_weighted        | 0.88            |
| val_top3               | 0.95            |
| val_top5               | 0.97            |
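The `val_top3`/`val_top5` rows count a prediction as correct when the true class appears among the k highest-scoring classes. A minimal sketch in plain Python (`topk_accuracy` is an illustrative helper):

```python
def topk_accuracy(logit_rows, targets, k=3):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for logits, target in zip(logit_rows, targets):
        # Class indices sorted by descending score, truncated to the top k
        topk = sorted(range(len(logits)), key=lambda c: logits[c], reverse=True)[:k]
        hits += target in topk
    return hits / len(targets)

# Toy scores over 4 classes for 3 samples
scores = [[0.1, 0.5, 0.3, 0.1],   # true class 0: inside the top 3
          [0.7, 0.1, 0.1, 0.1],   # true class 0: top-1 hit
          [0.2, 0.3, 0.4, 0.1]]   # true class 3: missed even at k=3
print(topk_accuracy(scores, [0, 0, 3], k=3))  # 0.6666666666666666
```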

---

## 🤗 Hugging Face & Demo

- **Model on Hugging Face:**
  *(link here if uploaded)*
- **Live Gradio Demo:**
  *(link here if deployed)*

---

## ⬇️ Download Resources

- **Stanford Cars 196 Dataset:**
  [Download from Hugging Face](https://huggingface.co/datasets/tanganke/stanford_cars)
- **Trained Model Weights:**
  (link if available)

---

## Usage & Inference

```python
import torch
import timm
from torchvision import transforms
from PIL import Image
import scipy.io
import os

# Set dataset and model paths
EXTRACTED_ROOT = "stanford_cars"  # Change if you extracted elsewhere
META_PATH = os.path.join(EXTRACTED_ROOT, "car_devkit", "devkit", "cars_meta.mat")
MODEL_PATH = "models/efficientnetv2_best_model.pth"

# Load class names directly from the Stanford Cars devkit
meta = scipy.io.loadmat(META_PATH)
class_names = [x[0] for x in meta['class_names'][0]]

# Model setup
NUM_CLASSES = len(class_names)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = timm.create_model('efficientnetv2_rw_s', pretrained=False, num_classes=NUM_CLASSES)
model.load_state_dict(torch.load(MODEL_PATH, map_location=device))
model.eval()
model.to(device)

# Preprocessing (matches validation)
imagenet_mean = [0.485, 0.456, 0.406]
imagenet_std = [0.229, 0.224, 0.225]
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=imagenet_mean, std=imagenet_std)
])

# Load and preprocess an image
img = Image.open("your_image.jpg").convert("RGB")
input_tensor = transform(img).unsqueeze(0).to(device)

# Predict
with torch.no_grad():
    output = model(input_tensor)
    pred_idx = output.argmax(1).item()

print(f"Predicted class: {class_names[pred_idx]} (index: {pred_idx})")
```