---
license: apache-2.0
datasets:
- tanganke/stanford_cars
language:
- en
metrics:
- accuracy
base_model:
- timm/efficientnetv2_rw_s.ra2_in1k
pipeline_tag: image-classification
---

# EfficientNetV2 Car Classifier: Fine-Grained Vehicle Recognition

> **EfficientNetV2 Car Classifier** recognizes 196 fine-grained car makes and models. It combines an EfficientNetV2 backbone with heavy data augmentation, per-epoch metric tracking, and Grad-CAM visual explanations.
> Developed by kikogazda, 2025.

---

## Project Structure

<pre>
Efficient_NetV2_Edition/
├── efficientnetv2_best_model.pth   # Best model weights
├── Last_model.ipynb                # Full training & evaluation pipeline
├── class_mapping.json              # Class index to name mapping
├── *.csv                           # Logs, splits, labels, and metrics
├── *.png                           # Visualizations and Grad-CAM outputs
├── README.md                       # Model card (this file)
└── ...                             # Additional scripts, reports, and assets
</pre>

---

## Table of Contents

- [Overview](#overview)
- [Dataset & Preprocessing](#dataset--preprocessing)
- [Model Architecture](#model-architecture)
- [Training Pipeline](#training-pipeline)
- [Explainability (Grad-CAM)](#explainability-grad-cam)
- [Visualizations](#visualizations)
- [Metrics & Results](#metrics--results)
- [Hugging Face Demo](#hugging-face-demo)
- [Download Resources](#download-resources)
- [Usage & Inference](#usage--inference)

---

## Overview

**EfficientNetV2 Car Classifier** tackles the real-world challenge of distinguishing between 196 car makes and models, even when the visual differences between classes are nearly imperceptible.

**Highlights:**
- Modern EfficientNetV2 backbone with transfer learning
- Aggressive augmentation pipeline (crop, flip, rotation, color jitter, blur)
- Class balancing for rare makes/models
- Extensive, scriptable metric tracking and reporting
- End-to-end explainability with Grad-CAM
- Reproducible pipeline with a live Gradio demo

---

## Dataset & Preprocessing

- **Dataset:** [Stanford Cars 196](https://huggingface.co/datasets/tanganke/stanford_cars)
  - 196 classes, 16,185 images (official train/test split)
  - Make, model, and year label for each image
- **Preprocessing:**
  - Annotations exported to CSV, with the class mapping stored as JSON
  - Stratified train/val/test split (maintains class distribution)
  - Outlier cleaning and normalization
  - Augmentations: random resized crop, flip, rotate, color jitter, blur
  - ImageNet mean/std normalization

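
The stratified split listed above can be sketched without any framework. This is a minimal illustration, not the notebook's actual code: the function name `stratified_split`, the 10%/10% validation/test fractions, and the seed are assumptions chosen for the example.

```python
import random
from collections import defaultdict


def stratified_split(labels, val_frac=0.1, test_frac=0.1, seed=42):
    """Split sample indices into train/val/test, one class at a time.

    Splitting per class keeps each class's share roughly equal in all
    three subsets. The fractions and seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val, test = [], [], []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        n_val = round(len(idxs) * val_frac)
        n_test = round(len(idxs) * test_frac)
        val.extend(idxs[:n_val])
        test.extend(idxs[n_val:n_val + n_test])
        train.extend(idxs[n_val + n_test:])
    return train, val, test


# Toy example: 3 classes with 20 samples each.
labels = [c for c in range(3) for _ in range(20)]
train_idx, val_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))  # 48 6 6
```

Each class contributes 16 training, 2 validation, and 2 test samples here, so the class distribution is identical across the three subsets.
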

---

## Model Architecture

- **Backbone:** EfficientNetV2 (pretrained; base model `timm/efficientnetv2_rw_s.ra2_in1k`)
  - All blocks except the last ones are frozen initially
  - Custom classifier head for 196 classes (Linear → ReLU → Dropout → Linear)
- **Optimization:**
  - Adam optimizer
  - Cross-entropy loss with label smoothing
  - Learning rate scheduling (ReduceLROnPlateau)
  - Early stopping (macro F1 on validation)
  - WeightedRandomSampler for class balance

**Flow:**
Input → [Augmentations] → EfficientNetV2 Backbone → Custom Head → Softmax (196 classes)

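
To illustrate the class-balancing step, here is a hedged sketch of how per-sample weights for a weighted sampler are commonly derived. The helper name `inverse_frequency_weights` and the toy labels are assumptions; the notebook may compute its weights differently. In PyTorch, a list like this would typically be passed to `torch.utils.data.WeightedRandomSampler(weights, num_samples=len(weights), replacement=True)`.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return one sampling weight per sample: 1 / (size of its class)."""
    counts = Counter(labels)
    return [1.0 / counts[label] for label in labels]


# Toy example: class 0 is common (6 images), class 1 is rare (2 images).
labels = [0, 0, 0, 0, 0, 0, 1, 1]
weights = inverse_frequency_weights(labels)

# Total sampling mass per class. The masses come out (numerically) equal,
# so a weighted sampler draws both classes about equally often.
mass = Counter()
for label, weight in zip(labels, weights):
    mass[label] += weight
print(dict(mass))
```

Rare classes get a larger weight per image, which is why they are seen as often as common classes during training even though they have fewer images.
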

---

## Training Pipeline

- **Epochs:** Up to 25 (early stopping enabled)
- **Batch Size:** 32 (weighted sampling)
- **Validation:** Macro/micro metrics, confusion matrix, Top-3/Top-5 accuracy
- **Logging:** All metrics and losses are written to CSV and plotted at high resolution:
  - Accuracy/F1 per epoch
  - Precision/Recall (macro, weighted)
  - Loss curve
  - Top-3/Top-5 accuracy
- **Artifacts:** All reports, CSVs, and visuals are included in the repo for transparency

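
Early stopping on validation macro F1 can be sketched as a small counter. This is an illustrative version only: the class name `EarlyStopping`, the `patience` value, and the F1 history are made up for the example and are not taken from the notebook.

```python
class EarlyStopping:
    """Stop training when a higher-is-better metric stops improving."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience      # epochs to wait without improvement (assumed)
        self.min_delta = min_delta    # minimum gain that counts as improvement
        self.best = float("-inf")
        self.bad_epochs = 0

    def step(self, metric):
        """Record one epoch's metric; return True if training should stop."""
        if metric > self.best + self.min_delta:
            self.best = metric
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


stopper = EarlyStopping(patience=3)
val_f1_history = [0.60, 0.72, 0.80, 0.79, 0.80, 0.78, 0.77]  # made-up values
stopped_at = None
for epoch, f1 in enumerate(val_f1_history, start=1):
    if stopper.step(f1):
        stopped_at = epoch
        break
print(stopped_at, stopper.best)  # 6 0.8
```

In a real training loop, the branch that updates `best` is also the natural place to save the checkpoint, so the stored weights always correspond to the best validation score.
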

---

## Explainability (Grad-CAM)

Grad-CAM overlays highlight the image regions most responsible for a prediction, letting you "see" what the network uses for its decisions.

- *Why?* Trust, transparency, debugging.
- *How?* For every prediction, a heatmap overlay shows the most influential regions.

![Grad-CAM Example](https://huggingface.co/kikogazda/Efficient_NetV2_Edition/resolve/main/gradcam_examples.png)
*Heatmaps visualize key decision regions for each sample.*

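
The arithmetic behind a Grad-CAM heatmap is compact enough to show on toy data. The sketch below is framework-free and assumes the feature maps and their gradients have already been extracted from a convolutional layer; the function name `grad_cam` and all numbers are illustrative, and the repo itself may rely on a library for this step.

```python
def grad_cam(activations, gradients):
    """Combine feature maps into a Grad-CAM heatmap.

    activations, gradients: lists of K channels, each an H x W nested list.
    gradients holds d(class score) / d(activation) for the target class.
    """
    height, width = len(activations[0]), len(activations[0][0])

    # 1. Channel weight = spatial mean of that channel's gradient.
    weights = [sum(map(sum, g)) / (height * width) for g in gradients]

    # 2. Weighted sum over channels, then ReLU to keep positive evidence only.
    cam = [[max(0.0, sum(w * a[i][j] for w, a in zip(weights, activations)))
            for j in range(width)] for i in range(height)]

    # 3. Scale to [0, 1] so the map can be rendered as an overlay.
    peak = max(map(max, cam))
    if peak > 0:
        cam = [[value / peak for value in row] for row in cam]
    return cam


# Toy input: two 2x2 feature maps with made-up values.
activations = [[[1.0, 0.0], [0.0, 2.0]],
               [[0.0, 3.0], [1.0, 0.0]]]
gradients = [[[0.5, 0.5], [0.5, 0.5]],        # channel 0 supports the class
             [[-1.0, -1.0], [-1.0, -1.0]]]    # channel 1 argues against it
heatmap = grad_cam(activations, gradients)
print(heatmap)  # [[0.5, 0.0], [0.0, 1.0]]
```

Locations where only the negatively weighted channel fires are clipped to zero, which is why the heatmap lights up only where channel 0 is active. In practice the low-resolution map is upsampled to the input size before being overlaid on the image.
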

---

## Visualizations

The plots below summarize training and evaluation: accuracy and loss curves, precision/recall trends, Top-k accuracy, per-class accuracy, and the confusion matrix. The Grad-CAM overlays appear in the previous section.

### Accuracy & F1 Score per Epoch
Training and validation accuracy alongside the macro F1 score.

![Accuracy and F1 Score](https://huggingface.co/kikogazda/Efficient_NetV2_Edition/resolve/main/Accuracy%20and%20F1%20Score%20per%20Epoch.png)

---

### Training vs Validation Loss
Training and validation loss per epoch.

![Loss Curve](https://huggingface.co/kikogazda/Efficient_NetV2_Edition/resolve/main/Training%20vs%20Validation%20Loss.png)

---

### Precision & Recall Trends
Macro and weighted precision/recall per epoch.

![Precision and Recall](https://huggingface.co/kikogazda/Efficient_NetV2_Edition/resolve/main/Precision%20%26%20Recall.png)

---

### Top-3 and Top-5 Accuracy Over Epochs
How often the correct class is among the three or five highest-scoring predictions.

![Top-3 and Top-5 Accuracy](https://huggingface.co/kikogazda/Efficient_NetV2_Edition/resolve/main/Top-3%20%26%20Top-5.png)

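
For readers unfamiliar with Top-k accuracy, here is a minimal, framework-free sketch of the metric. The function name `top_k_accuracy` and the scores are made up for illustration; the notebook may compute the metric with a library call instead.

```python
def top_k_accuracy(scores, targets, k):
    """Fraction of samples whose true class is among the k highest scores."""
    hits = 0
    for row, target in zip(scores, targets):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        hits += target in ranked[:k]
    return hits / len(targets)


# Made-up scores for 4 samples over 5 classes.
scores = [
    [0.10, 0.60, 0.10, 0.10, 0.10],  # true class 1 is ranked 1st
    [0.40, 0.30, 0.20, 0.05, 0.05],  # true class 2 is ranked 3rd
    [0.05, 0.10, 0.15, 0.30, 0.40],  # true class 0 is ranked 5th
    [0.30, 0.25, 0.20, 0.15, 0.10],  # true class 1 is ranked 2nd
]
targets = [1, 2, 0, 1]

top1 = top_k_accuracy(scores, targets, k=1)
top3 = top_k_accuracy(scores, targets, k=3)
top5 = top_k_accuracy(scores, targets, k=5)
print(top1, top3, top5)  # 0.25 0.75 1.0
```

Top-k accuracy can only grow as k increases, which is why the Top-5 curve always sits at or above the Top-3 curve.
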

---

### Top-20 Most Accurate Classes
Sorted bar plot of the classes the model predicts with the highest accuracy.

![Top 20 Classes](https://huggingface.co/kikogazda/Efficient_NetV2_Edition/resolve/main/top20_accurate_classes.png)

---

### Confusion Matrix
High-resolution heatmap showing misclassifications and accuracy by class.

![Confusion Matrix](https://huggingface.co/kikogazda/Efficient_NetV2_Edition/resolve/main/confusion_matrix_journal.png)

---

## Metrics & Results

| Metric                 | Value   |
|------------------------|---------|
| train_loss             | 0.97    |
| train_acc              | 0.997   |
| val_loss               | 1.40    |
| val_acc                | 0.87    |
| val_precision_macro    | 0.89    |
| val_precision_weighted | 0.89    |
| val_recall_macro       | 0.87    |
| val_recall_weighted    | 0.87    |
| val_f1_macro           | 0.87    |
| val_f1_weighted        | 0.88    |
| val_top3               | 0.95    |
| val_top5               | 0.97    |

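
The table reports both macro and weighted averages. The sketch below shows how the two differ on a small imbalanced example. It is an illustration under assumptions: the function name `f1_scores` and the labels are invented, and only classes present in `y_true` are averaged.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (macro F1, support-weighted F1) for two label lists."""
    classes = sorted(set(y_true))
    support = Counter(y_true)
    per_class = {}
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        per_class[c] = 2 * tp / denom if denom else 0.0

    # Macro: every class counts equally, regardless of size.
    macro = sum(per_class.values()) / len(classes)
    # Weighted: each class counts in proportion to its number of samples.
    weighted = sum(per_class[c] * support[c] for c in classes) / len(y_true)
    return macro, weighted


# Toy example: class 0 has 6 samples, class 1 has 2; one rare sample is missed.
y_true = [0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1]
macro, weighted = f1_scores(y_true, y_pred)
print(round(macro, 3), round(weighted, 3))
```

An error on a rare class lowers the macro average more than the weighted one, so a gap between the two usually points at weaker performance on small classes.
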

---

## Hugging Face Demo

**Live Gradio Demo:**
[Click here to launch the demo](https://kikogazda-efficient-netv2.hf.space/)

---

## Download Resources

- **Stanford Cars 196 Dataset:**
  [Download from Hugging Face](https://huggingface.co/datasets/tanganke/stanford_cars)
- **Trained Model Weights:**
  [Download from Hugging Face (efficientnetv2_best_model.pth)](https://huggingface.co/kikogazda/Efficient_NetV2_Edition/resolve/main/efficientnetv2_best_model.pth)
- **Class mapping/metadata:**
  Included as `class_mapping.json` in this repo

---

## Usage & Inference

### 1. Install dependencies

```bash
pip install -r requirements.txt
pip install torch torchvision pytorch-grad-cam gradio efficientnet_pytorch
```

### 2. Run inference

```python
import json

import torch
from efficientnet_pytorch import EfficientNet
from PIL import Image
from torchvision import transforms

# Load model.
# NOTE: the architecture built here must match the one that produced the
# checkpoint, otherwise load_state_dict() fails with a key mismatch. The card
# metadata lists timm/efficientnetv2_rw_s.ra2_in1k as the base model.
model = EfficientNet.from_pretrained('efficientnet-b2', num_classes=196)
model.load_state_dict(torch.load("efficientnetv2_best_model.pth", map_location="cpu"))
model.eval()  # inference mode: disables dropout, uses running batch-norm stats

# Preprocess: resize, convert to tensor, normalize with ImageNet mean/std
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
img = Image.open("your_image.jpg").convert("RGB")
input_tensor = transform(img).unsqueeze(0)  # add batch dimension

# Predict
with torch.no_grad():
    output = model(input_tensor)
    pred = output.argmax(1).item()

# Class name (assumes class_mapping.json maps string indices to class names)
with open("class_mapping.json") as f:
    class_map = json.load(f)
print("Predicted class:", class_map[str(pred)])
```