---
license: apache-2.0
datasets:
- tanganke/stanford_cars
language:
- en
metrics:
- accuracy
base_model:
- timm/efficientnetv2_rw_s.ra2_in1k
pipeline_tag: image-classification
---
# 🚗 EfficientNetV2 Car Classifier: Fine-Grained Vehicle Recognition
> **EfficientNetV2 Car Classifier** is a fine-grained image classifier for 196 car makes and models. It fine-tunes an EfficientNetV2 backbone with a heavy augmentation pipeline, tracks metrics at every epoch, and explains its predictions visually with Grad-CAM.
> Developed by kikogazda, 2025.
---
## 📁 Project Structure
<pre>
Efficient_NetV2_Edition/
├── efficientnetv2_best_model.pth   # Best model weights
├── Last_model.ipynb                # Full training & evaluation pipeline
├── class_mapping.json              # Class index to name mapping
├── *.csv                           # Logs, splits, labels, and metrics
├── *.png                           # Visualizations and Grad-CAM outputs
├── README.md                       # Model card (this file)
└── ...                             # Additional scripts, reports, and assets
</pre>
---
## 🚦 Table of Contents
- [Overview](#overview)
- [Dataset & Preprocessing](#dataset--preprocessing)
- [Model Architecture](#model-architecture)
- [Training Pipeline](#training-pipeline)
- [Explainability (Grad-CAM)](#explainability-grad-cam)
- [Visualizations](#visualizations)
- [Metrics & Results](#metrics--results)
- [Hugging Face Demo](#hugging-face-demo)
- [Download Resources](#download-resources)
- [Usage & Inference](#usage--inference)
---
## Overview
**EfficientNetV2 Car Classifier** tackles the real-world challenge of distinguishing between 196 car makes and models, even when differences are nearly imperceptible.
**Highlights:**
- Modern EfficientNetV2 backbone with transfer learning
- Aggressive, real-world augmentation pipeline
- Class balancing for rare makes/models
- Extensive, scriptable metric tracking and reporting
- End-to-end explainability with Grad-CAM
- Fully reproducible, robust, and deployment-ready
---
## Dataset & Preprocessing
- **Dataset:** [Stanford Cars 196](https://huggingface.co/datasets/tanganke/stanford_cars)
  - 196 classes, 16,185 images (official train/test split)
  - Detailed make/model/year for each image
- **Preprocessing:**
  - Annotation CSV export and class mapping JSON
  - Stratified train/val/test split (maintains class distribution)
  - Outlier cleaning and normalization
  - Augmentations: random resized crop, flip, rotate, color jitter, blur
  - ImageNet mean/std normalization
---
## Model Architecture
- **Backbone:** EfficientNetV2 (pretrained)
  - All but the last blocks frozen initially
  - Custom classifier head for 196 classes (Linear → ReLU → Dropout → Linear)
- **Optimization:**
  - Adam optimizer
  - Cross-Entropy loss (with label smoothing)
  - Learning rate scheduling (ReduceLROnPlateau)
  - Early stopping (macro F1 on validation)
  - WeightedRandomSampler for class balance

**Flow:**
Input → [Augmentations] → EfficientNetV2 Backbone → Custom Head → Softmax (196 classes)
---
## Training Pipeline
- **Epochs:** Up to 25 (early stopping enabled)
- **Batch Size:** 32 (weighted sampling)
- **Validation:** Macro/micro metrics, confusion matrix, Top-3/Top-5 accuracy
- **Logging:** All metrics and losses to CSV, plus high-res visual plots:
- Accuracy/F1 per epoch
- Precision/Recall (macro, weighted)
- Loss curve
- Top-3/Top-5 accuracy
- **Artifacts:** All reports, CSVs, and visuals in repo for transparency
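
To make the moving parts of this pipeline concrete, here is a toy training loop that wires together weighted sampling, label-smoothed cross-entropy, Adam, `ReduceLROnPlateau`, and early stopping on macro F1. It is a sketch under stated assumptions: the data is synthetic, the model is a single linear layer, validation reuses the training data, and the smoothing value, learning rate, scheduler settings, and patience are illustrative rather than taken from the notebook.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

torch.manual_seed(0)

# Toy, imbalanced 3-class data standing in for image features (hypothetical).
labels = torch.cat([torch.zeros(90), torch.ones(20), torch.full((10,), 2.0)]).long()
features = torch.randn(120, 8) + labels.unsqueeze(1).float()  # shift classes apart

# Inverse-frequency sample weights so rare classes are drawn more often.
class_counts = torch.bincount(labels)
sample_weights = 1.0 / class_counts[labels].float()
sampler = WeightedRandomSampler(sample_weights, num_samples=len(labels), replacement=True)
loader = DataLoader(TensorDataset(features, labels), batch_size=32, sampler=sampler)

model = nn.Linear(8, 3)
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)   # smoothing value is an assumption
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="max", factor=0.5, patience=2      # "max" because we track F1
)

def macro_f1(preds, targets, num_classes):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for c in range(num_classes):
        tp = ((preds == c) & (targets == c)).sum().item()
        fp = ((preds == c) & (targets != c)).sum().item()
        fn = ((preds != c) & (targets == c)).sum().item()
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / num_classes

best_f1, bad_epochs, patience = -1.0, 0, 5             # patience is an assumption
for epoch in range(25):                                # up to 25 epochs
    model.train()
    for x, y in loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()

    model.eval()
    with torch.no_grad():
        # Toy shortcut: "validation" reuses the training data.
        val_f1 = macro_f1(model(features).argmax(1), labels, 3)
    scheduler.step(val_f1)

    if val_f1 > best_f1:
        best_f1, bad_epochs = val_f1, 0
        best_state = {k: v.clone() for k, v in model.state_dict().items()}
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break                                      # early stopping

print(f"best macro F1: {best_f1:.3f}")
```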
---
## Explainability (Grad-CAM)
Grad-CAM overlays highlight the image regions most responsible for each prediction, letting you "see" what the network uses to make its decisions.
- *Why?* Trust, transparency, debugging.
- *How?* For every prediction, a heatmap overlay shows the most influential image regions.
![GradCAM Example](./gradcam_grid.png)
*Heatmaps visualize key decision regions for each sample.*
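
For readers who want to see the mechanics, the following sketch computes a Grad-CAM heatmap by hand with a forward hook and `torch.autograd.grad`. It is not the code that produced the figure above: `TinyCNN` is a hypothetical stand-in network, and in practice the target layer would be the last convolutional block of the trained classifier.

```python
import torch
import torch.nn.functional as F
from torch import nn

torch.manual_seed(0)

class TinyCNN(nn.Module):
    """Hypothetical stand-in for the real classifier."""
    def __init__(self, num_classes=196):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(),
        )
        self.head = nn.Linear(16, num_classes)

    def forward(self, x):
        x = self.features(x)
        return self.head(x.mean(dim=(2, 3)))  # global average pooling

def grad_cam(model, target_layer, image, class_idx=None):
    """Return an (H, W) heatmap in [0, 1] and the explained class index."""
    store = {}

    def fwd_hook(_module, _inputs, output):
        store["act"] = output                 # activations of the target layer

    handle = target_layer.register_forward_hook(fwd_hook)
    try:
        logits = model(image)
    finally:
        handle.remove()

    if class_idx is None:
        class_idx = logits.argmax(dim=1).item()

    # Gradient of the class score with respect to the stored activations.
    grads = torch.autograd.grad(logits[0, class_idx], store["act"])[0]
    weights = grads.mean(dim=(2, 3), keepdim=True)   # one weight per channel
    cam = F.relu((weights * store["act"]).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image.shape[-2:], mode="bilinear", align_corners=False)
    cam = cam[0, 0].detach()
    cam = cam - cam.min()
    return cam / (cam.max() + 1e-8), class_idx

model = TinyCNN().eval()
image = torch.randn(1, 3, 64, 64)
heatmap, predicted = grad_cam(model, model.features[-1], image)
print(heatmap.shape, predicted)
```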
---
## Visualizations
Here are key visualizations from the training and evaluation process, including loss curves, accuracy plots, and Grad-CAM++ overlays that illustrate what the model focuses on.
### 🎯 Accuracy & F1 Score per Epoch
Visualizing training and validation accuracy alongside macro F1 score.
![Accuracy and F1](metrics_acc_f1_beautiful.png)
---
### 📉 Training vs Validation Loss
Training and validation loss per epoch.
![Loss Curves](metrics_loss_beautiful.png)
---
### 📈 Precision & Recall Trends
Macro- and weighted-average precision and recall per epoch.
![Precision and Recall](metrics_precision_recall_beautiful.png)
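
The difference between the macro and weighted averages plotted here can be seen in a small worked example. The sketch below is a plain NumPy illustration on made-up labels, not the notebook's metric code: macro averaging gives every class equal weight, while weighted averaging weights each class by its number of true samples.

```python
import numpy as np

def precision_recall(y_true, y_pred, num_classes):
    """Per-class precision/recall, averaged both macro and support-weighted."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    precision = np.zeros(num_classes)
    recall = np.zeros(num_classes)
    support = np.zeros(num_classes)
    for c in range(num_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        predicted = np.sum(y_pred == c)
        actual = np.sum(y_true == c)
        precision[c] = tp / predicted if predicted else 0.0
        recall[c] = tp / actual if actual else 0.0
        support[c] = actual
    w = support / support.sum()
    return {
        "precision_macro": float(precision.mean()),
        "precision_weighted": float((precision * w).sum()),
        "recall_macro": float(recall.mean()),
        "recall_weighted": float((recall * w).sum()),
    }

# Made-up labels: class 0 is twice as common as classes 1 and 2.
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 2, 0]
metrics = precision_recall(y_true, y_pred, num_classes=3)
print(metrics)
```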
---
### 📊 Top-3 and Top-5 Accuracy Over Epochs
How often the correct class appears among the model's three or five highest-scoring predictions.
![Top-k Accuracy](metrics_topk_beautiful.png)
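
Top-k accuracy counts a prediction as correct when the true class is among the k highest-scoring classes. A minimal NumPy sketch on made-up scores follows; the function name `topk_accuracy` is illustrative and not taken from the notebook.

```python
import numpy as np

def topk_accuracy(scores, targets, ks=(1, 3, 5)):
    """Fraction of rows whose target is among the k highest-scoring classes."""
    order = np.argsort(-scores, axis=1)      # class indices, best first
    hits = order == targets[:, None]         # True where the target sits in the ranking
    return {k: float(hits[:, :k].any(axis=1).mean()) for k in ks}

scores = np.array([
    [0.1, 0.5, 0.2, 0.9, 0.3, 0.0],   # ranking: 3, 1, 4, 2, 0, 5
    [0.8, 0.1, 0.6, 0.2, 0.4, 0.3],   # ranking: 0, 2, 4, 5, 3, 1
])
targets = np.array([4, 1])            # ranked 3rd and 6th respectively

result = topk_accuracy(scores, targets)
print(result)  # {1: 0.0, 3: 0.5, 5: 0.5}
```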
---
### 🏆 Top-20 Most Accurate Classes
Sorted bar plot of classes the model predicts with the highest accuracy.
![Top 20 Accuracy](top20_accuracy_beautiful.png)
---
### 🧩 Confusion Matrix
High-resolution heatmap showing misclassifications and accuracy by class.
![Confusion Matrix](confusion_matrix_beautiful.png)
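
Both the confusion matrix and the per-class accuracy ranking can be derived from the same pair of label arrays. The sketch below uses made-up labels and assumes the common convention of true classes on rows and predicted classes on columns; the card does not say which orientation the plot uses.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """cm[i, j] = number of samples with true class i predicted as class j."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm

# Made-up labels for three classes.
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 2, 0]

cm = confusion_matrix(y_true, y_pred, num_classes=3)

# Per-class accuracy: diagonal divided by row sums (guarding empty classes).
per_class_acc = cm.diagonal() / np.maximum(cm.sum(axis=1), 1)

# Classes sorted from most to least accurate, as in a "top-N" bar plot.
ranking = np.argsort(-per_class_acc, kind="stable")

print(cm)
print(per_class_acc, ranking)
```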
---
## 📈 Metrics & Results
| Metric | Value |
|------------------------|---------|
| train_loss | 0.97 |
| train_acc | 0.997 |
| val_loss | 1.40 |
| val_acc | 0.87 |
| val_precision_macro | 0.89 |
| val_precision_weighted | 0.89 |
| val_recall_macro | 0.87 |
| val_recall_weighted | 0.87 |
| val_f1_macro | 0.87 |
| val_f1_weighted | 0.88 |
| val_top3 | 0.95 |
| val_top5 | 0.97 |
---
## Hugging Face Demo
**Live Gradio Demo:**
[Click here to launch the demo](https://kikogazda-efficient-netv2.hf.space/)
---
## Download Resources
- **Stanford Cars 196 Dataset:**
[Download from Hugging Face (tanganke/stanford_cars)](https://huggingface.co/datasets/tanganke/stanford_cars)
- **Trained Model Weights:**
[Download from Hugging Face (efficientnetv2_best_model.pth)](https://huggingface.co/kikogazda/Efficient_NetV2_Edition/resolve/main/efficientnetv2_best_model.pth)
- **Class mapping/metadata:**
Included as `class_mapping.json` in this repo
---
## Usage & Inference
### 1. Install dependencies
```bash
pip install -r requirements.txt
pip install torch torchvision timm grad-cam gradio
```
### 2. Run inference
```python
import json

import timm
import torch
from PIL import Image
from torchvision import transforms

# Rebuild the architecture, then load the fine-tuned weights.
# The base model listed in this card is timm/efficientnetv2_rw_s.ra2_in1k.
# NOTE: the classifier head must match the one defined in Last_model.ipynb
# (Linear -> ReLU -> Dropout -> Linear); if it differs, load_state_dict
# raises a key mismatch and the head has to be rebuilt as in the notebook.
model = timm.create_model("efficientnetv2_rw_s", pretrained=False, num_classes=196)
model.load_state_dict(torch.load("efficientnetv2_best_model.pth", map_location="cpu"))
model.eval()

# Preprocess: resize and apply ImageNet mean/std normalization
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
img = Image.open("your_image.jpg").convert("RGB")
input_tensor = transform(img).unsqueeze(0)  # add batch dimension

# Predict
with torch.no_grad():
    output = model(input_tensor)
    pred = output.argmax(1).item()

# Map the predicted index to a class name
with open("class_mapping.json") as f:
    class_map = json.load(f)
print("Predicted class:", class_map[str(pred)])
```