	---
	license: apache-2.0

	datasets:
	- tanganke/stanford_cars

	language:
	- en

	metrics:
	- accuracy

	base_model:
	- timm/efficientnetv2_rw_s.ra2_in1k

	pipeline_tag: image-classification
	---

	# 🚗 EfficientNetV2 Car Classifier: Fine-Grained Vehicle Recognition

	> EfficientNetV2 Car Classifier performs fine-grained recognition of 196 car makes and models. It combines an EfficientNetV2 backbone with heavy data augmentation, per-epoch metric tracking, and Grad-CAM visual explanations.
	> Developed by kikogazda, 2025.

	---

	## 📁 Project Structure

	<pre>
	Efficient_NetV2_Edition/
	├── efficientnetv2_best_model.pth # Best model weights
	├── Last_model.ipynb # Full training & evaluation pipeline
	├── class_mapping.json # Class index to name mapping
	├── *.csv # Logs, splits, labels, and metrics
	├── *.png # Visualizations and Grad-CAM outputs
	├── README.md # Model card (this file)
	└── ... # Additional scripts, reports, and assets
	</pre>

	---

	## 🚦 Table of Contents

	- [Overview](#overview)
	- [Dataset & Preprocessing](#dataset--preprocessing)
	- [Model Architecture](#model-architecture)
	- [Training Pipeline](#training-pipeline)
	- [Explainability (Grad-CAM)](#explainability-grad-cam)
	- [Visualizations](#visualizations)
	- [Metrics & Results](#metrics--results)
	- [Hugging Face Demo](#hugging-face-demo)
	- [Download Resources](#download-resources)
	- [Usage & Inference](#usage--inference)

	---

	## Overview

	EfficientNetV2 Car Classifier tackles the real-world challenge of distinguishing between 196 car makes and models, even when differences are nearly imperceptible.
	Highlights:
	- Modern EfficientNetV2 backbone with transfer learning
	- Aggressive, real-world augmentation pipeline
	- Class balancing for rare makes/models
	- Extensive, scriptable metric tracking and reporting
	- End-to-end explainability with Grad-CAM
	- Fully reproducible, robust, and deployment-ready

	---

	## Dataset & Preprocessing

	- Dataset: [Stanford Cars 196](https://huggingface.co/datasets/tanganke/stanford_cars)
	  - 196 classes, 16,185 images (official train/test split)
	  - Detailed make/model/year for each image
	- Preprocessing:
	  - Annotation CSV export and class mapping JSON
	  - Stratified train/val/test split (maintains class distribution)
	  - Outlier cleaning and normalization
	  - Augmentations: random resized crop, flip, rotate, color jitter, blur
	  - ImageNet mean/std normalization
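
The stratified split listed above can be sketched in plain Python. This is a minimal illustration, not the notebook's code: the function name `stratified_split`, the 70/15/15 fractions, and the seed are all assumptions.

```python
import random
from collections import defaultdict

def stratified_split(labels, val_frac=0.15, test_frac=0.15, seed=42):
    """Return (train, val, test) index lists with per-class proportions preserved."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)  # shuffle within each class before slicing
        n_val = round(len(indices) * val_frac)
        n_test = round(len(indices) * test_frac)
        val.extend(indices[:n_val])
        test.extend(indices[n_val:n_val + n_test])
        train.extend(indices[n_val + n_test:])
    return train, val, test

# Toy example: 3 classes with 20 images each (hypothetical data).
labels = [c for c in range(3) for _ in range(20)]
train_idx, val_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))
```

Because the slicing happens per class, every class keeps the same train/val/test ratio, which matters when some makes and models have few images.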

	---

	## Model Architecture

	- Backbone: EfficientNetV2, pretrained (the model card metadata lists `timm/efficientnetv2_rw_s.ra2_in1k`)
	  - All but the last blocks frozen initially
	- Custom classifier head for 196 classes (Linear → ReLU → Dropout → Linear)
	- Optimization:
	  - Adam optimizer
	  - Cross-Entropy loss (with label smoothing)
	  - Learning rate scheduling (ReduceLROnPlateau)
	  - Early stopping (macro F1 on validation)
	  - WeightedRandomSampler for class balance
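
The class-balancing idea can be shown without PyTorch. Each sample gets a weight inversely proportional to its class frequency, which is the kind of per-sample weight list `torch.utils.data.WeightedRandomSampler` takes. A minimal sketch: the helper name `inverse_frequency_weights` and the toy labels are assumptions, and the notebook may compute its weights differently.

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """One weight per sample: 1 / (number of samples sharing its class)."""
    counts = Counter(labels)
    return [1.0 / counts[label] for label in labels]

# Toy labels (hypothetical): class 0 is common, class 1 is rare.
labels = [0, 0, 0, 0, 1]
weights = inverse_frequency_weights(labels)
print(weights)  # [0.25, 0.25, 0.25, 0.25, 1.0]

# With PyTorch installed, these weights would typically be used as:
#   sampler = WeightedRandomSampler(weights, num_samples=len(weights), replacement=True)
#   loader = DataLoader(dataset, batch_size=32, sampler=sampler)
```

With these weights the total weight of every class is 1.0, so each class is equally likely to be drawn regardless of how many images it has.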

	Flow:
	Input → [Augmentations] → EfficientNetV2 Backbone → Custom Head → Softmax (196 classes)
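
A rough PyTorch sketch of the flow above, under stated assumptions. The backbone is a tiny stand-in so the example runs without downloading weights. The hidden size (512), dropout rate (0.3), learning rate, label-smoothing value, and scheduler settings are illustrative guesses rather than values taken from the notebook. The model returns logits, and softmax is applied only when probabilities are needed.

```python
import torch
from torch import nn

NUM_CLASSES = 196
FEATURE_DIM = 64  # stand-in; a real EfficientNetV2 backbone has a larger feature size

# Stand-in backbone (assumption) so the sketch is self-contained.
backbone = nn.Sequential(
    nn.Conv2d(3, FEATURE_DIM, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)
# Freeze the backbone. The real setup keeps the last blocks trainable.
for param in backbone.parameters():
    param.requires_grad = False

# Linear -> ReLU -> Dropout -> Linear head, as listed above.
head = nn.Sequential(
    nn.Linear(FEATURE_DIM, 512),
    nn.ReLU(),
    nn.Dropout(p=0.3),
    nn.Linear(512, NUM_CLASSES),
)
model = nn.Sequential(backbone, head)

criterion = nn.CrossEntropyLoss(label_smoothing=0.1)
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-3)
# Assumes validation loss is the monitored quantity (hence mode="min").
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=2
)

# One dummy training step on random data.
images = torch.randn(4, 3, 224, 224)
targets = torch.randint(0, NUM_CLASSES, (4,))
logits = model(images)
loss = criterion(logits, targets)
optimizer.zero_grad()
loss.backward()
optimizer.step()
scheduler.step(loss.item())  # in practice, pass the validation loss once per epoch

probs = logits.softmax(dim=1)  # probabilities over the 196 classes
print(logits.shape)
```

Passing only the trainable parameters to Adam keeps the frozen backbone out of the optimizer state.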

	---

	## Training Pipeline

	- Epochs: Up to 25 (early stopping enabled)
	- Batch Size: 32 (weighted sampling)
	- Validation: Macro/micro metrics, confusion matrix, Top-3/Top-5 accuracy
	- Logging: All metrics and losses to CSV, plus high-res visual plots:
	  - Accuracy/F1 per epoch
	  - Precision/Recall (macro, weighted)
	  - Loss curve
	  - Top-3/Top-5 accuracy
	- Artifacts: All reports, CSVs, and visuals in repo for transparency
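
Early stopping on validation macro F1 might look roughly like the helper below. It is a sketch under assumptions: the class name `EarlyStopping`, the patience of 3, and the F1 history are made up for illustration and are not taken from the notebook.

```python
class EarlyStopping:
    """Stop when the monitored score has not improved for `patience` epochs."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("-inf")
        self.bad_epochs = 0

    def step(self, score):
        """Record a new score; return True if training should stop."""
        if score > self.best + self.min_delta:
            self.best = score
            self.bad_epochs = 0  # improvement: reset the counter
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

# Made-up validation macro-F1 values, one per epoch.
val_f1_history = [0.60, 0.72, 0.80, 0.79, 0.80, 0.78]
stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, f1 in enumerate(val_f1_history, start=1):
    if stopper.step(f1):
        stopped_at = epoch
        break
print(stopped_at, stopper.best)
```

In a real loop, the model weights would be saved whenever `step` records a new best score, so the checkpoint matches the best validation epoch and not the last one.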

	---

	## Explainability (Grad-CAM)

	Grad-CAM overlays highlight the image regions that contribute most to a prediction, so you can see what the network bases its decisions on.
	- Why? Trust, transparency, debugging.
	- How? For every prediction, a heatmap overlay marks the most influential regions.

	![GradCAM Example](./gradcam_grid.png)
	Heatmaps visualize key decision regions for each sample.
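
For readers unfamiliar with the technique, here is a minimal Grad-CAM sketch written with plain PyTorch hooks. It is not the code that produced the figure above. The tiny CNN, the choice of `target_layer`, and the random input are stand-ins. With the real model, the target would be one of the last convolutional blocks.

```python
import torch
import torch.nn.functional as F
from torch import nn

torch.manual_seed(0)

# Tiny stand-in CNN (assumption); the real model is an EfficientNetV2.
conv = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
classifier = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 196))
model = nn.Sequential(conv, classifier).eval()
target_layer = conv

store = {}

def save_activation(module, inputs, output):
    """Keep the feature map and capture the gradient that flows back into it."""
    store["act"] = output
    output.register_hook(lambda grad: store.__setitem__("grad", grad))

handle = target_layer.register_forward_hook(save_activation)

image = torch.randn(1, 3, 64, 64)
logits = model(image)
class_idx = logits.argmax(dim=1).item()
model.zero_grad()
logits[0, class_idx].backward()  # gradient of the predicted class score
handle.remove()

# Grad-CAM: weight each channel by its spatially averaged gradient, sum, then ReLU.
weights = store["grad"].mean(dim=(2, 3), keepdim=True)            # (1, C, 1, 1)
cam = F.relu((weights * store["act"]).sum(dim=1, keepdim=True))   # (1, 1, h, w)
cam = F.interpolate(cam, size=image.shape[-2:], mode="bilinear", align_corners=False)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)          # scale to [0, 1]
heatmap = cam[0, 0].detach()
print(heatmap.shape)
```

The resulting `heatmap` has the same height and width as the input and can be blended over the image as a color overlay.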

	---

	## Visualizations

	Key visualizations from training and evaluation: accuracy and F1 curves, loss curves, precision/recall trends, top-k accuracy, per-class accuracy, and the confusion matrix. Grad-CAM overlays appear in the previous section.


	### 🎯 Accuracy & F1 Score per Epoch
	Visualizing training and validation accuracy alongside macro F1 score.

	![Accuracy and F1](metrics_acc_f1_beautiful.png)

	---

	### 📉 Training vs Validation Loss
	Clear comparison of model learning over time.

	![Loss Curves](metrics_loss_beautiful.png)

	---

	### 📈 Precision & Recall Trends
	Macro- and weighted-average precision and recall per epoch.

	![Precision and Recall](metrics_precision_recall_beautiful.png)

	---

	### 📊 Top-3 and Top-5 Accuracy Over Epochs
	Measuring how often the correct class is within the top predictions.

	![Top-k Accuracy](metrics_topk_beautiful.png)
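
Top-k accuracy counts a prediction as correct when the true class is among the k highest-scoring classes. The plain-Python sketch below illustrates the definition on toy scores. The helper name `topk_accuracy` and the data are assumptions, not the notebook's implementation.

```python
def topk_accuracy(scores, targets, k):
    """Fraction of samples whose true class is among the k highest-scoring classes."""
    hits = 0
    for row, target in zip(scores, targets):
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        hits += target in ranked[:k]
    return hits / len(targets)

# Toy scores for 4 samples over 5 classes (the real model has 196).
scores = [
    [0.10, 0.60, 0.10, 0.10, 0.10],  # true class 1 is ranked 1st
    [0.30, 0.10, 0.40, 0.10, 0.10],  # true class 0 is ranked 2nd
    [0.20, 0.30, 0.10, 0.35, 0.05],  # true class 0 is ranked 3rd
    [0.40, 0.30, 0.20, 0.06, 0.04],  # true class 4 is ranked 5th
]
targets = [1, 0, 0, 4]
for k in (1, 3, 5):
    print(k, topk_accuracy(scores, targets, k))
```

On this toy data the accuracy rises from 0.25 (top-1) to 0.75 (top-3) to 1.0 (top-5), which is why top-3 and top-5 are always at least as high as plain accuracy.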

	---

	### 🏆 Top-20 Most Accurate Classes
	Sorted bar plot of classes the model predicts with the highest accuracy.

	![Top 20 Accuracy](top20_accuracy_beautiful.png)

	---

	### 🧩 Confusion Matrix
	High-resolution heatmap showing misclassifications and accuracy by class.

	![Confusion Matrix](confusion_matrix_beautiful.png)

	---

	## 📈 Metrics & Results

	| Metric                 | Value |
	|------------------------|-------|
	| train_loss             | 0.97  |
	| train_acc              | 0.997 |
	| val_loss               | 1.40  |
	| val_acc                | 0.87  |
	| val_precision_macro    | 0.89  |
	| val_precision_weighted | 0.89  |
	| val_recall_macro       | 0.87  |
	| val_recall_weighted    | 0.87  |
	| val_f1_macro           | 0.87  |
	| val_f1_weighted        | 0.88  |
	| val_top3               | 0.95  |
	| val_top5               | 0.97  |
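
The macro and weighted rows in the table differ only in how per-class scores are averaged. Macro gives every class equal weight, while weighted scales each class by its number of samples. The sketch below shows the difference for F1 on a toy example. The function names and labels are illustrative assumptions, not the notebook's evaluation code.

```python
from collections import Counter

def per_class_f1(y_true, y_pred):
    """Return {class: F1} computed from true and predicted labels."""
    f1 = {}
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1[c] = 2 * tp / denom if denom else 0.0
    return f1

def macro_and_weighted_f1(y_true, y_pred):
    """Macro: plain mean over classes. Weighted: mean weighted by class support."""
    f1 = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    macro = sum(f1.values()) / len(f1)
    weighted = sum(f1[c] * support[c] for c in f1) / len(y_true)
    return macro, weighted

# Toy labels: class 0 has 4 samples, class 1 has 2 (one of them misclassified).
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 1, 0]
macro, weighted = macro_and_weighted_f1(y_true, y_pred)
print(round(macro, 3), round(weighted, 3))  # 0.778 0.815
```

Here the rare class drags the macro average down more than the weighted one, so a macro score below the weighted score hints that smaller classes are harder for the model.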

	---

	## Hugging Face Demo

	Live Gradio Demo:
	[Click here to launch the demo](https://kikogazda-efficient-netv2.hf.space/)

	---

	## Download Resources

	- Stanford Cars 196 Dataset:
	[Hugging Face dataset page (tanganke/stanford_cars)](https://huggingface.co/datasets/tanganke/stanford_cars)
	- Trained Model Weights:
	[Download from Hugging Face (efficientnetv2_best_model.pth)](https://huggingface.co/kikogazda/Efficient_NetV2_Edition/resolve/main/efficientnetv2_best_model.pth)
	- Class mapping/metadata:
	Included as `class_mapping.json` in this repo

	---

	## Usage & Inference

	### 1. Install dependencies

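	```bash
	pip install -r requirements.txt
	# grad-cam is the PyPI name of the package imported as pytorch_grad_cam
	pip install torch torchvision efficientnet_pytorch grad-cam gradio
	```

	### 2. Run inference

	```python
	import json

	import torch
	from efficientnet_pytorch import EfficientNet
	from PIL import Image
	from torchvision import transforms

	# Load model.
	# NOTE: the architecture built here must match the one the checkpoint was saved from.
	# The model card metadata lists timm/efficientnetv2_rw_s.ra2_in1k with a custom head;
	# if load_state_dict() reports missing or unexpected keys, rebuild the model as
	# defined in Last_model.ipynb before loading the weights.
	model = EfficientNet.from_pretrained('efficientnet-b2', num_classes=196)
	state_dict = torch.load("efficientnetv2_best_model.pth", map_location="cpu")
	model.load_state_dict(state_dict)
	model.eval()

	# Preprocess: resize, convert to a tensor, normalize with ImageNet mean/std
	transform = transforms.Compose([
	    transforms.Resize((224, 224)),
	    transforms.ToTensor(),
	    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
	])
	img = Image.open("your_image.jpg").convert("RGB")
	input_tensor = transform(img).unsqueeze(0)  # add batch dimension

	# Predict
	with torch.no_grad():
	    output = model(input_tensor)
	    pred = output.argmax(1).item()

	# Map the class index to its name
	with open("class_mapping.json") as f:
	    class_map = json.load(f)
	print("Predicted class:", class_map[str(pred)])
	```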