---
license: apache-2.0
datasets:
- tanganke/stanford_cars
language:
- en
metrics:
- accuracy
base_model:
- timm/efficientnetv2_rw_s.ra2_in1k
pipeline_tag: image-classification
---

# 🚗 EfficientNetV2 Car Classifier: Fine-Grained Vehicle Recognition

> **EfficientNetV2 Car Classifier** delivers robust, fine-grained recognition for 196 car makes and models, powered by EfficientNetV2, state-of-the-art augmentations, rigorous metric tracking, and full visual explainability with Grad-CAM.
> Developed by kikogazda, 2025.

---

## 📁 Project Structure

<pre>
Efficient_NetV2_Edition/
├── efficientnetv2_best_model.pth    # Best model weights
├── Last_model.ipynb                 # Full training & evaluation pipeline
├── class_mapping.json               # Class index to name mapping
├── *.csv                            # Logs, splits, labels, and metrics
├── *.png                            # Visualizations and Grad-CAM outputs
├── README.md                        # Model card (this file)
└── ...                              # Additional scripts, reports, and assets
</pre>

---

## 📦 Table of Contents

- [Overview](#overview)
- [Project Structure](#project-structure)
- [Dataset & Preprocessing](#dataset--preprocessing)
- [Model Architecture](#model-architecture)
- [Training Pipeline](#training-pipeline)
- [Explainability (Grad-CAM)](#explainability-grad-cam)
- [Visualizations](#visualizations)
- [Metrics & Results](#metrics--results)
- [Hugging Face & Demo](#hugging-face--demo)
- [Download Resources](#download-resources)
- [Usage & Inference](#usage--inference)

---

## Overview

**EfficientNetV2 Car Classifier** tackles the real-world challenge of distinguishing between 196 car makes and models, even when the differences are nearly imperceptible.

**Highlights:**
- Modern EfficientNetV2 backbone with transfer learning
- Aggressive, real-world augmentation pipeline
- Class balancing for rare makes/models
- Extensive, scriptable metric tracking and reporting
- End-to-end explainability with Grad-CAM
- Fully reproducible, robust, and deployment-ready

---

## Dataset & Preprocessing

- **Dataset:** [Stanford Cars 196](https://huggingface.co/datasets/tanganke/stanford_cars)
  - 196 classes, 16,185 images (official train/test split)
  - Detailed make/model/year label for each image
- **Preprocessing:**
  - Annotation CSV export and class-mapping JSON
  - Stratified train/val/test split (preserves the class distribution)
  - Outlier cleaning and normalization
  - Augmentations: random resized crop, flip, rotation, color jitter, blur (sketched below)
  - ImageNet mean/std normalization
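
The card lists the augmentation families but not their settings. A minimal torchvision sketch of such a pipeline; the magnitudes below are illustrative assumptions, not the values used for training:

```python
from torchvision import transforms

# Illustrative training-time augmentations matching the list above
# (crop/rotation/jitter/blur magnitudes are assumptions, not the trained values)
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.7, 1.0)),   # random resized crop
    transforms.RandomHorizontalFlip(),                     # flip
    transforms.RandomRotation(15),                         # rotate
    transforms.ColorJitter(brightness=0.3, contrast=0.3,
                           saturation=0.3, hue=0.05),      # color jitter
    transforms.GaussianBlur(kernel_size=3),                # blur
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406],            # ImageNet mean
                         [0.229, 0.224, 0.225]),           # ImageNet std
])
```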

---

## Model Architecture

- **Backbone:** EfficientNetV2 (pretrained)
  - All but the last blocks frozen initially
  - Custom classifier head for 196 classes (Linear → ReLU → Dropout → Linear)
- **Optimization:**
  - Adam optimizer
  - Cross-Entropy loss (with label smoothing)
  - Learning rate scheduling (ReduceLROnPlateau)
  - Early stopping (macro F1 on validation)
  - WeightedRandomSampler for class balance

**Flow:**
Input → [Augmentations] → EfficientNetV2 Backbone → Custom Head → Softmax (196 classes)
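
A minimal timm sketch of this layout, consistent with the `base_model` metadata; the hidden width, dropout rate, and how many blocks stay trainable are assumptions:

```python
import timm
import torch.nn as nn

# Backbone as a pooled feature extractor (num_classes=0 keeps timm's global pooling)
backbone = timm.create_model("efficientnetv2_rw_s", pretrained=True, num_classes=0)

# Custom head: Linear -> ReLU -> Dropout -> Linear (196 classes);
# hidden width 512 and dropout 0.3 are illustrative assumptions
head = nn.Sequential(
    nn.Linear(backbone.num_features, 512),
    nn.ReLU(),
    nn.Dropout(0.3),
    nn.Linear(512, 196),
)
model = nn.Sequential(backbone, head)

# Freeze everything except the final backbone stage for the initial epochs
for p in backbone.parameters():
    p.requires_grad = False
for p in backbone.blocks[-1].parameters():
    p.requires_grad = True
```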

---

## Training Pipeline

- **Epochs:** Up to 25 (early stopping enabled)
- **Batch size:** 32 (weighted sampling)
- **Validation:** Macro/micro metrics, confusion matrix, Top-3/Top-5 accuracy
- **Logging:** All metrics and losses to CSV, plus high-resolution plots:
  - Accuracy/F1 per epoch
  - Precision/Recall (macro, weighted)
  - Loss curve
  - Top-3/Top-5 accuracy
- **Artifacts:** All reports, CSVs, and visuals kept in the repo for transparency
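
A sketch of the optimization loop implied by the bullets above; `train_dataset`, `train_labels`, `val_loader`, and `evaluate_macro_f1` are assumed placeholders, and the hyperparameters are illustrative:

```python
import torch
from torch.utils.data import DataLoader, WeightedRandomSampler

# Inverse-frequency weights so rare classes are sampled as often as common ones
labels = torch.tensor(train_labels)                       # assumed label list
class_counts = torch.bincount(labels, minlength=196)
sampler = WeightedRandomSampler(weights=1.0 / class_counts[labels],
                                num_samples=len(labels))

loader = DataLoader(train_dataset, batch_size=32, sampler=sampler)  # assumed dataset
criterion = torch.nn.CrossEntropyLoss(label_smoothing=0.1)          # smoothing value assumed
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode="max", patience=2)

best_f1, patience, bad_epochs = 0.0, 5, 0
for epoch in range(25):
    # ... one epoch of training on `loader` ...
    val_f1 = evaluate_macro_f1(model, val_loader)  # assumed helper
    scheduler.step(val_f1)                         # LR drops when macro F1 plateaus
    if val_f1 > best_f1:
        best_f1, bad_epochs = val_f1, 0
        torch.save(model.state_dict(), "efficientnetv2_best_model.pth")
    else:
        bad_epochs += 1
        if bad_epochs >= patience:                 # early stopping
            break
```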

---

## Explainability (Grad-CAM)

Grad-CAM overlays highlight the image regions most responsible for a model's predictions, letting you "see" what the network is using for its decisions.
- *Why?* Trust, transparency, and debugging.
- *How?* For every prediction, a heatmap overlay shows the most influential pixels.


*Heatmaps visualize the key decision regions for each sample.*
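
A minimal sketch of producing one overlay with the `pytorch-grad-cam` package installed in the Usage section; `model`, `input_tensor`, and `pred` are taken from the inference example below, and the target-layer choice is an assumption:

```python
import numpy as np
from PIL import Image
from pytorch_grad_cam import GradCAM
from pytorch_grad_cam.utils.image import show_cam_on_image
from pytorch_grad_cam.utils.model_targets import ClassifierOutputTarget

# `model`, `input_tensor`, and `pred` come from the inference example below;
# using the last backbone block as the target layer is an assumption.
cam = GradCAM(model=model, target_layers=[model.blocks[-1]])
grayscale_cam = cam(input_tensor=input_tensor,
                    targets=[ClassifierOutputTarget(pred)])[0]

# Blend the heatmap with the 0-1 normalized RGB image
rgb = np.asarray(Image.open("your_image.jpg").convert("RGB").resize((224, 224)))
overlay = show_cam_on_image(rgb.astype(np.float32) / 255.0, grayscale_cam, use_rgb=True)
Image.fromarray(overlay).save("gradcam_overlay.png")
```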

---

## 📊 Visualizations

Key assets (see the repo for the full set):

| Visualization | Description |
|------------------------------------------|-------------------------------------------|
| `confusion_matrix_beautiful.png`         | Confusion matrix (validation set)          |
| `metrics_acc_f1_beautiful.png`           | Training & validation accuracy/F1 curves   |
| `metrics_loss_beautiful.png`             | Loss curves                                |
| `metrics_precision_recall_beautiful.png` | Precision/Recall by epoch                  |
| `metrics_topk_beautiful.png`             | Top-3/Top-5 accuracy                       |
| `top20_accuracy_beautiful.png`           | Top-20 class accuracy                      |
| `gradcam_grid.png`                       | Grad-CAM visualization grid                |

---

## 📈 Metrics & Results

| Metric                 | Value |
|------------------------|-------|
| train_loss             | 0.97  |
| train_acc              | 0.997 |
| val_loss               | 1.40  |
| val_acc                | 0.87  |
| val_precision_macro    | 0.89  |
| val_precision_weighted | 0.89  |
| val_recall_macro       | 0.87  |
| val_recall_weighted    | 0.87  |
| val_f1_macro           | 0.87  |
| val_f1_weighted        | 0.88  |
| val_top3               | 0.95  |
| val_top5               | 0.97  |
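
For reference, the Top-3/Top-5 rows can be computed from stacked validation outputs like this (a sketch; `val_logits` and `val_targets` are assumed to be collected over the validation set):

```python
import torch

def topk_accuracy(logits: torch.Tensor, targets: torch.Tensor, k: int) -> float:
    """Fraction of samples whose true class appears among the k highest logits."""
    topk = logits.topk(k, dim=1).indices               # (N, k) predicted classes
    hits = (topk == targets.unsqueeze(1)).any(dim=1)   # row is a hit if target in top k
    return hits.float().mean().item()

# e.g. topk_accuracy(val_logits, val_targets, 3) -> the val_top3 row above
```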

---

## 🤗 Hugging Face & Demo

**Model on Hugging Face:**
[EfficientNetV2 Car Classifier on Hugging Face](https://huggingface.co/kikogazda/Efficient_NetV2_Edition)

**Live Gradio Demo:**
[Gradio Demo Space](https://kikogazda-efficientnetv2-demo.hf.space)

[![Demo](https://img.shields.io/badge/🚀-Try_the_Demo-yellow?style=for-the-badge)](https://kikogazda-efficientnetv2-demo.hf.space)
[![Model](https://img.shields.io/badge/🤗-Model_Page-blue?style=for-the-badge)](https://huggingface.co/kikogazda/Efficient_NetV2_Edition)

---

## ⬇️ Download Resources

- **Stanford Cars 196 Dataset:**
  [Stanford Cars 196 on Hugging Face](https://huggingface.co/datasets/tanganke/stanford_cars)
- **Trained Model Weights:**
  [Download from Hugging Face (efficientnetv2_best_model.pth)](https://huggingface.co/kikogazda/Efficient_NetV2_Edition/resolve/main/efficientnetv2_best_model.pth)
- **Class mapping/metadata:**
  Included as `class_mapping.json` in this repo
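
A sketch of fetching the weights programmatically with `huggingface_hub` (repo id taken from the links above):

```python
from huggingface_hub import hf_hub_download

# Downloads to the local Hugging Face cache and returns the file path
weights_path = hf_hub_download(
    repo_id="kikogazda/Efficient_NetV2_Edition",
    filename="efficientnetv2_best_model.pth",
)
print(weights_path)
```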

---

## 💻 Usage & Inference

### 1. Install dependencies

```bash
pip install -r requirements.txt
pip install torch torchvision timm pytorch-grad-cam gradio
```

### 2. Run inference

```python
import json

import timm
import torch
from PIL import Image
from torchvision import transforms

# Recreate the architecture named in the base_model metadata, then load
# the fine-tuned weights
model = timm.create_model("efficientnetv2_rw_s", pretrained=False, num_classes=196)
model.load_state_dict(torch.load("efficientnetv2_best_model.pth", map_location="cpu"))
model.eval()

# Preprocess with the same ImageNet statistics used in training
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
img = Image.open("your_image.jpg").convert("RGB")
input_tensor = transform(img).unsqueeze(0)

# Predict
with torch.no_grad():
    output = model(input_tensor)
pred = output.argmax(1).item()

# Map the predicted index to a human-readable class name
with open("class_mapping.json") as f:
    class_map = json.load(f)
print("Predicted class:", class_map[str(pred)])
```