---
license: apache-2.0
datasets:
- tanganke/stanford_cars
language:
- en
metrics:
- accuracy
base_model:
- timm/efficientnetv2_rw_s.ra2_in1k
pipeline_tag: image-classification
---

# EfficientNetV2 Car Classifier: Fine-Grained Vehicle Recognition

> **EfficientNetV2 Car Classifier** delivers robust, fine-grained recognition for 196 car makes and models, powered by EfficientNetV2, state-of-the-art augmentations, rigorous metric tracking, and full visual explainability with Grad-CAM.
> Developed by kikogazda, 2025.

---

## Project Structure

    Efficient_NetV2_Edition/
    ├── efficientnetv2_best_model.pth   # Best model weights
    ├── Last_model.ipynb                # Full training & evaluation pipeline
    ├── class_mapping.json              # Class index to name mapping
    ├── *.csv                           # Logs, splits, labels, and metrics
    ├── *.png                           # Visualizations and Grad-CAM outputs
    ├── README.md                       # Model card (this file)
    └── ...                             # Additional scripts, reports, and assets

---

## Table of Contents

- [Overview](#overview)
- [Dataset & Preprocessing](#dataset--preprocessing)
- [Model Architecture](#model-architecture)
- [Training Pipeline](#training-pipeline)
- [Explainability (Grad-CAM)](#explainability-grad-cam)
- [Visualizations](#visualizations)
- [Metrics & Results](#metrics--results)
- [Hugging Face Demo](#hugging-face-demo)
- [Download Resources](#download-resources)
- [Usage & Inference](#usage--inference)

---

## Overview

**EfficientNetV2 Car Classifier** tackles the real-world challenge of distinguishing between 196 car makes and models, even when the visual differences between classes are nearly imperceptible.

**Highlights:**

- Modern EfficientNetV2 backbone with transfer learning
- Aggressive, real-world augmentation pipeline
- Class balancing for rare makes/models
- Extensive, scriptable metric tracking and reporting
- End-to-end explainability with Grad-CAM
- Fully reproducible, robust, and deployment-ready

---

## Dataset & Preprocessing

- **Dataset:** [Stanford Cars 196](https://huggingface.co/datasets/tanganke/stanford_cars)
  - 196 classes, 16,185 images (official train/test split)
  - Detailed make/model/year for each image
- **Preprocessing:**
  - Annotation CSV export and class mapping JSON
  - Stratified train/val/test split (maintains class distribution)
  - Outlier cleaning and normalization
  - Augmentations: random resized crop, flip, rotate, color jitter, blur
  - ImageNet mean/std normalization

---

## Model Architecture

- **Backbone:** EfficientNetV2 (pretrained)
  - All blocks except the last ones frozen initially
  - Custom classifier head for 196 classes (Linear → ReLU → Dropout → Linear)
- **Optimization:**
  - Adam optimizer
  - Cross-Entropy loss (with label smoothing)
  - Learning rate scheduling (ReduceLROnPlateau)
  - Early stopping (macro F1 on validation)
  - WeightedRandomSampler for class balance

**Flow:**
Input → [Augmentations] → EfficientNetV2 Backbone → Custom Head → Softmax (196 classes)

---

## Training Pipeline

- **Epochs:** Up to 25 (early stopping enabled)
- **Batch Size:** 32 (weighted sampling)
- **Validation:** Macro/micro metrics, confusion matrix, Top-3/Top-5 accuracy
- **Logging:** All metrics and losses to CSV, plus high-res visual plots:
  - Accuracy/F1 per epoch
  - Precision/Recall (macro, weighted)
  - Loss curve
  - Top-3/Top-5 accuracy
- **Artifacts:** All reports, CSVs, and visuals in repo for transparency

---

## Explainability (Grad-CAM)

Grad-CAM overlays highlight the image regions most responsible for the model's predictions, letting you "see" what the network is using for its decisions.

- *Why?* Trust, transparency, debugging.
- *How?* For every prediction, a heatmap overlay shows the most influential pixels.

*Heatmaps visualize key decision regions for each sample.*

---

## Visualizations

Here are key visualizations from the training and evaluation process, including loss curves, accuracy plots, and Grad-CAM++ overlays that illustrate what the model focuses on.

### Accuracy & F1 Score per Epoch

Visualizing training and validation accuracy alongside macro F1 score.

---

### Training vs Validation Loss

Clear comparison of model learning over time.

---

### Precision & Recall Trends

Macro and weighted precision/recall for detailed class-wise performance.

---

### Top-3 and Top-5 Accuracy Over Epochs

Measuring how often the correct class is within the top predictions.

---

### Top-20 Most Accurate Classes

Sorted bar plot of classes the model predicts with the highest accuracy.

---

### Confusion Matrix

High-resolution heatmap showing misclassifications and accuracy by class.

---

## Metrics & Results

| Metric                 | Value |
|------------------------|-------|
| train_loss             | 0.97  |
| train_acc              | 0.997 |
| val_loss               | 1.40  |
| val_acc                | 0.87  |
| val_precision_macro    | 0.89  |
| val_precision_weighted | 0.89  |
| val_recall_macro       | 0.87  |
| val_recall_weighted    | 0.87  |
| val_f1_macro           | 0.87  |
| val_f1_weighted        | 0.88  |
| val_top3               | 0.95  |
| val_top5               | 0.97  |

---

## Hugging Face Demo

**Live Gradio Demo:** [Click here to launch the demo](https://kikogazda-efficient-netv2.hf.space/)

---

## Download Resources

- **Stanford Cars 196 Dataset:** [Download from Hugging Face (tanganke/stanford_cars)](https://huggingface.co/datasets/tanganke/stanford_cars)
- **Trained Model Weights:** [Download from Hugging Face (efficientnetv2_best_model.pth)](https://huggingface.co/kikogazda/Efficient_NetV2_Edition/resolve/main/efficientnetv2_best_model.pth)
- **Class mapping/metadata:** Included as `class_mapping.json` in this repo

---

## Usage & Inference

### 1. Install dependencies

```bash
pip install -r requirements.txt
# The pytorch-grad-cam library is published on PyPI as "grad-cam".
# efficientnet_pytorch and pillow are imported by the inference snippet.
pip install torch torchvision grad-cam gradio efficientnet_pytorch pillow
```

### 2. Run inference

```python
import json

import torch
from efficientnet_pytorch import EfficientNet
from PIL import Image
from torchvision import transforms

# Load model. The architecture built here must match the checkpoint.
# The model card metadata lists timm/efficientnetv2_rw_s.ra2_in1k as the base
# model, so if load_state_dict reports mismatched keys, rebuild the model as
# it is defined in Last_model.ipynb instead.
model = EfficientNet.from_pretrained("efficientnet-b2", num_classes=196)
state_dict = torch.load("efficientnetv2_best_model.pth", map_location="cpu")
model.load_state_dict(state_dict)
model.eval()  # inference mode: disables dropout, uses running batch-norm stats

# Preprocess: resize and apply ImageNet mean/std normalization
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
img = Image.open("your_image.jpg").convert("RGB")
input_tensor = transform(img).unsqueeze(0)  # add a batch dimension

# Predict
with torch.no_grad():
    output = model(input_tensor)
    pred = output.argmax(1).item()

# Map the predicted index to its class name (JSON object keys are strings)
with open("class_mapping.json") as f:
    class_map = json.load(f)
print("Predicted class:", class_map[str(pred)])
```
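
Because the model's Top-3 and Top-5 validation accuracies are noticeably higher than its Top-1 accuracy, it can help to look at several candidate classes rather than only the argmax. The following is a minimal sketch using only the standard library. The helper name `top_k_predictions` is hypothetical and not part of this repo, and it assumes `class_mapping.json` maps stringified class indices to class names, as the inference snippet implies.

```python
import heapq
import math


def top_k_predictions(logits, class_map, k=5):
    """Return the k most likely (class_name, probability) pairs.

    logits: list of raw scores, one per class.
    class_map: dict mapping str(class index) -> class name
               (assumed layout of class_mapping.json).
    """
    # Numerically stable softmax: subtract the max before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Indices of the k largest probabilities, highest first.
    best = heapq.nlargest(k, range(len(probs)), key=probs.__getitem__)
    return [(class_map[str(i)], probs[i]) for i in best]


if __name__ == "__main__":
    # Toy example with 4 made-up classes instead of the real 196.
    demo_map = {"0": "class_a", "1": "class_b", "2": "class_c", "3": "class_d"}
    demo_logits = [0.1, 2.0, -1.0, 1.0]
    for name, prob in top_k_predictions(demo_logits, demo_map, k=3):
        print(f"{name}: {prob:.3f}")
```

With the inference code, calling `top_k_predictions(output[0].tolist(), class_map, k=5)` would list the five most likely classes with their softmax probabilities.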