---
license: apache-2.0

datasets:
- tanganke/stanford_cars

language:
- en

metrics:
- accuracy

base_model:
- timm/efficientnetv2_rw_s.ra2_in1k

pipeline_tag: image-classification
---

# 🚗 EfficientNetV2 Car Classifier: Fine-Grained Vehicle Recognition

> **EfficientNetV2 Car Classifier** performs fine-grained recognition of 196 car makes and models, using an EfficientNetV2 backbone, a heavy augmentation pipeline, per-epoch metric tracking, and visual explanations with Grad-CAM.  
> Developed by kikogazda, 2025.

---

## 📁 Project Structure

<pre>
Efficient_NetV2_Edition/
├── efficientnetv2_best_model.pth      # Best model weights
├── Last_model.ipynb                   # Full training & evaluation pipeline
├── class_mapping.json                 # Class index to name mapping
├── *.csv                              # Logs, splits, labels, and metrics
├── *.png                              # Visualizations and Grad-CAM outputs
├── README.md                          # Model card (this file)
└── ...                                # Additional scripts, reports, and assets
</pre>

---

## 🚦 Table of Contents

- [Overview](#overview)
- [Dataset & Preprocessing](#dataset--preprocessing)
- [Model Architecture](#model-architecture)
- [Training Pipeline](#training-pipeline)
- [Explainability (Grad-CAM)](#explainability-grad-cam)
- [Visualizations](#visualizations)
- [Metrics & Results](#metrics--results)
- [Hugging Face Demo](#hugging-face-demo)
- [Download Resources](#download-resources)
- [Usage & Inference](#usage--inference)

---

## Overview

**EfficientNetV2 Car Classifier** tackles the real-world challenge of distinguishing between 196 car makes and models, even when differences are nearly imperceptible.  
**Highlights:**
- Modern EfficientNetV2 backbone with transfer learning
- Aggressive, real-world augmentation pipeline
- Class balancing for rare makes/models
- Extensive, scriptable metric tracking and reporting
- End-to-end explainability with Grad-CAM
- Fully reproducible, robust, and deployment-ready

---

## Dataset & Preprocessing

- **Dataset:** [Stanford Cars 196](https://huggingface.co/datasets/tanganke/stanford_cars)
    - 196 classes, 16,185 images (official train/test split)
    - Detailed make/model/year for each image
- **Preprocessing:**
    - Annotation CSV export and class mapping JSON
    - Stratified train/val/test split (maintains class distribution)
    - Outlier cleaning and normalization
    - Augmentations: random resized crop, flip, rotate, color jitter, blur
    - ImageNet mean/std normalization
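
A stratified split keeps each class's share roughly constant across subsets by splitting every class separately. The sketch below shows the idea with the standard library only; the function name `stratified_split`, the 15% validation and test fractions, and the seed are illustrative assumptions, not values taken from the notebook.

```python
import random
from collections import defaultdict


def stratified_split(labels, val_frac=0.15, test_frac=0.15, seed=42):
    """Split sample indices into train/val/test, one class at a time.

    The fractions and seed are illustrative assumptions.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_val = round(len(indices) * val_frac)
        n_test = round(len(indices) * test_frac)
        val.extend(indices[:n_val])
        test.extend(indices[n_val:n_val + n_test])
        train.extend(indices[n_val + n_test:])
    return train, val, test


# Toy example: 20 samples of class 0 and 40 samples of class 1.
toy_labels = [0] * 20 + [1] * 40
train_idx, val_idx, test_idx = stratified_split(toy_labels)
```

Because every class is split on its own, rare classes still appear in the validation and test subsets as long as they have enough images for the chosen fractions.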

---

## Model Architecture

- **Backbone:** EfficientNetV2 (pretrained)
    - All but the last blocks frozen initially
    - Custom classifier head for 196 classes (Linear → ReLU → Dropout → Linear)
- **Optimization:**
    - Adam optimizer
    - Cross-Entropy loss (with label smoothing)
    - Learning rate scheduling (ReduceLROnPlateau)
    - Early stopping (macro F1 on validation)
    - WeightedRandomSampler for class balance
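
The optimization setup listed above could be wired together roughly as follows. This is a sketch under stated assumptions: the stand-in `nn.Linear` model, the learning rate, the smoothing value of 0.1, the scheduler's `factor` and `patience`, and the choice to step the scheduler on validation macro F1 are all illustrative and not confirmed by this card.

```python
from collections import Counter

import torch
from torch import nn
from torch.utils.data import WeightedRandomSampler


def make_balanced_sampler(labels):
    """Sample each image with probability inversely proportional to its class size."""
    counts = Counter(labels)
    weights = torch.tensor([1.0 / counts[y] for y in labels], dtype=torch.double)
    return WeightedRandomSampler(weights, num_samples=len(labels), replacement=True)


# Stand-in model so the sketch runs on its own (assumption, not the real network).
model = nn.Linear(16, 196)

# Cross-entropy with label smoothing; 0.1 is an assumed value.
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

# Adam over the parameters that are still trainable; the learning rate is assumed.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)

# mode="max" assumes the monitored quantity is a score such as macro F1.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="max", factor=0.5, patience=2
)

# Toy labels: class 0 is three times as frequent as class 1.
sampler = make_balanced_sampler([0, 0, 0, 1])

# After each validation pass one would call, for example:
scheduler.step(0.5)  # hypothetical validation macro F1
```

When the sampler is passed to a `DataLoader` through its `sampler` argument, `shuffle` must be left unset, since the two options are mutually exclusive.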

**Flow:**  
Input → [Augmentations] → EfficientNetV2 Backbone → Custom Head → Softmax (196 classes)
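
As a rough illustration of the flow above, the following sketch builds a Linear → ReLU → Dropout → Linear head and freezes all but the last blocks of a backbone. The helper names `build_head` and `freeze_all_but_last`, the hidden width of 512, the dropout rate, and the tiny convolutional stand-in backbone are assumptions for demonstration; the real backbone is EfficientNetV2 and its head dimensions are defined in the notebook.

```python
import torch
from torch import nn


def build_head(in_features, num_classes=196, hidden=512, dropout=0.3):
    """Linear -> ReLU -> Dropout -> Linear; hidden size and dropout are assumed."""
    return nn.Sequential(
        nn.Linear(in_features, hidden),
        nn.ReLU(inplace=True),
        nn.Dropout(dropout),
        nn.Linear(hidden, num_classes),
    )


def freeze_all_but_last(blocks, n_trainable=1):
    """Disable gradients for every block except the last n_trainable ones."""
    blocks = list(blocks)
    cutoff = len(blocks) - n_trainable
    for block in blocks[:cutoff]:
        for param in block.parameters():
            param.requires_grad = False


# Tiny stand-in backbone made of three "blocks" (not EfficientNetV2).
backbone = nn.Sequential(
    nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU()),
    nn.Sequential(nn.Conv2d(8, 16, 3, padding=1), nn.ReLU()),
    nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU()),
)
freeze_all_but_last(backbone.children(), n_trainable=1)

model = nn.Sequential(
    backbone,
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    build_head(in_features=32),
)

logits = model(torch.randn(2, 3, 64, 64))   # shape: (2, 196)
probs = logits.softmax(dim=1)               # softmax over the 196 classes
```

Frozen blocks keep their pretrained weights, so early training only updates the last block and the new head.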

---

## Training Pipeline

- **Epochs:** Up to 25 (early stopping enabled)
- **Batch Size:** 32 (weighted sampling)
- **Validation:** Macro/micro metrics, confusion matrix, Top-3/Top-5 accuracy
- **Logging:** All metrics and losses to CSV, plus high-res visual plots:
    - Accuracy/F1 per epoch
    - Precision/Recall (macro, weighted)
    - Loss curve
    - Top-3/Top-5 accuracy
- **Artifacts:** All reports, CSVs, and visuals in repo for transparency
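
Top-k accuracy counts a prediction as correct when the true class is among the k highest-scoring classes. A minimal sketch of that computation is shown below; the function name `topk_accuracy` is hypothetical and the notebook may compute the metric differently.

```python
import torch


def topk_accuracy(logits, targets, ks=(1, 3, 5)):
    """Return {k: fraction of samples whose target is in the top-k predictions}."""
    max_k = max(ks)
    top = logits.topk(max_k, dim=1).indices      # (N, max_k) class indices
    hits = top.eq(targets.unsqueeze(1))          # (N, max_k) boolean matches
    return {k: hits[:, :k].any(dim=1).float().mean().item() for k in ks}


# Two toy samples over six classes.
toy_logits = torch.tensor([
    [0.1, 0.9, 0.3, 0.2, 0.0, 0.5],   # target 1 is ranked first
    [0.6, 0.5, 0.4, 0.3, 0.2, 0.1],   # target 3 is ranked fourth
])
toy_targets = torch.tensor([1, 3])
scores = topk_accuracy(toy_logits, toy_targets)
```

In the toy example the second sample misses at k=1 and k=3 but is counted as a hit at k=5.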

---

## Explainability (Grad-CAM)

Grad-CAM overlays highlight the image regions most responsible for a prediction, so you can see what the network bases its decisions on.
- *Why?* Trust, transparency, debugging.
- *How?* For every prediction, a heatmap overlay shows the most influential pixels.

![GradCAM Example](./gradcam_grid.png)  
*Heatmaps visualize key decision regions for each sample.*
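
For readers who want to see the mechanics, here is a minimal hand-written Grad-CAM in plain PyTorch. It is a sketch of the general technique, not the code used to produce the figure above: the function name `grad_cam`, the toy CNN, and the choice of target layer are assumptions.

```python
import torch
import torch.nn.functional as F
from torch import nn


def grad_cam(model, target_layer, image, class_idx=None):
    """Return an (H, W) heatmap in [0, 1] for an image tensor of shape (1, C, H, W)."""
    activations = {}
    # Capture the target layer's output during the forward pass.
    handle = target_layer.register_forward_hook(
        lambda module, inputs, output: activations.update(value=output)
    )
    try:
        logits = model(image)
    finally:
        handle.remove()

    if class_idx is None:
        class_idx = logits.argmax(dim=1).item()

    acts = activations["value"]                                  # (1, K, h, w)
    # Gradient of the class score with respect to the feature maps.
    grads = torch.autograd.grad(logits[0, class_idx], acts)[0]
    weights = grads.mean(dim=(2, 3), keepdim=True)               # per-channel weight
    cam = F.relu((weights * acts).sum(dim=1, keepdim=True))      # (1, 1, h, w)
    cam = F.interpolate(cam, size=image.shape[-2:], mode="bilinear",
                        align_corners=False)
    cam = cam - cam.min()
    cam = cam / (cam.max() + 1e-8)
    return cam[0, 0].detach()


# Toy CNN standing in for the real classifier (assumption).
toy_model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 196),
).eval()

heatmap = grad_cam(toy_model, toy_model[1], torch.randn(1, 3, 32, 32))
```

With a real network the target layer is usually the last convolutional block, and the heatmap is blended over the input image to produce the overlay.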

---

## Visualizations 

Key visualizations from training and evaluation: accuracy, F1, loss, precision/recall, and top-k curves per epoch, a ranking of the most accurate classes, and the confusion matrix.


### 🎯 Accuracy & F1 Score per Epoch
Visualizing training and validation accuracy alongside macro F1 score.

![Accuracy and F1](metrics_acc_f1_beautiful.png)

---

### 📉 Training vs Validation Loss
Clear comparison of model learning over time.

![Loss Curves](metrics_loss_beautiful.png)

---

### 📈 Precision & Recall Trends
Macro-averaged and weighted-averaged precision and recall per epoch.

![Precision and Recall](metrics_precision_recall_beautiful.png)

---

### 📊 Top-3 and Top-5 Accuracy Over Epochs
Measuring how often the correct class is within the top predictions.

![Top-k Accuracy](metrics_topk_beautiful.png)

---

### 🏆 Top-20 Most Accurate Classes
Sorted bar plot of classes the model predicts with the highest accuracy.

![Top 20 Accuracy](top20_accuracy_beautiful.png)

---

### 🧩 Confusion Matrix
High-resolution heatmap showing misclassifications and accuracy by class.

![Confusion Matrix](confusion_matrix_beautiful.png)

---

## 📈 Metrics & Results

| Metric                 | Value   |
|------------------------|---------|
| train_loss             | 0.97    |
| train_acc              | 0.997   |
| val_loss               | 1.40    |
| val_acc                | 0.87    |
| val_precision_macro    | 0.89    |
| val_precision_weighted | 0.89    |
| val_recall_macro       | 0.87    |
| val_recall_weighted    | 0.87    |
| val_f1_macro           | 0.87    |
| val_f1_weighted        | 0.88    |
| val_top3               | 0.95    |
| val_top5               | 0.97    |
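
The macro and weighted rows differ only in how per-class scores are averaged: macro gives every class equal weight, while weighted scales each class by its number of true samples. The standard-library sketch below illustrates this on toy labels; the function name `f1_scores` is hypothetical and the numbers are not related to the table above.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (macro F1, weighted F1) computed from per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)
    pairs = list(zip(y_true, y_pred))

    per_class = {}
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        denom = 2 * tp + fp + fn
        per_class[c] = 2 * tp / denom if denom else 0.0

    macro = sum(per_class.values()) / len(classes)
    weighted = sum(per_class[c] * support[c] for c in classes) / len(y_true)
    return macro, weighted


# Toy example: class 0 is frequent, class 1 is rare and half misclassified.
macro_f1, weighted_f1 = f1_scores([0, 0, 0, 0, 1, 1], [0, 0, 0, 0, 1, 0])
```

On imbalanced data the macro score drops more when a rare class performs poorly, which is why it is a common choice for early stopping.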

---

## Hugging Face Demo

**Live Gradio Demo:**  
[Click here to launch the demo](https://kikogazda-efficient-netv2.hf.space/)

---

## Download Resources

- **Stanford Cars 196 Dataset:**  
  [Download from Hugging Face (tanganke/stanford_cars)](https://huggingface.co/datasets/tanganke/stanford_cars)
- **Trained Model Weights:**  
  [Download from Hugging Face (efficientnetv2_best_model.pth)](https://huggingface.co/kikogazda/Efficient_NetV2_Edition/resolve/main/efficientnetv2_best_model.pth)
- **Class mapping/metadata:**  
  Included as `class_mapping.json` in this repo

---

## Usage & Inference

### 1. Install dependencies

```bash
pip install -r requirements.txt
pip install torch torchvision timm grad-cam gradio
```

### 2. Run inference

```python
import json

import timm
import torch
from PIL import Image
from torchvision import transforms

# Load model. The architecture name follows `base_model` in the card metadata.
# If the checkpoint was saved with the custom head described under
# "Model Architecture", rebuild that head as in Last_model.ipynb first;
# otherwise load_state_dict will report mismatched keys.
model = timm.create_model("efficientnetv2_rw_s", pretrained=False, num_classes=196)
state_dict = torch.load("efficientnetv2_best_model.pth", map_location="cpu")
model.load_state_dict(state_dict)
model.eval()

# Preprocess: resize and apply ImageNet mean/std normalization
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
img = Image.open("your_image.jpg").convert("RGB")
input_tensor = transform(img).unsqueeze(0)  # add batch dimension

# Predict
with torch.no_grad():
    output = model(input_tensor)
    pred = output.argmax(1).item()

# Map the class index to its name (JSON keys are strings)
with open("class_mapping.json") as f:
    class_map = json.load(f)
print("Predicted class:", class_map[str(pred)])
```