---
title: CIFAR-100 Image Classifier
emoji: 🎯
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: "5.49.1"
app_file: app.py
pinned: false
license: mit
---

# CIFAR-100 ResNet Training from Scratch

A lightweight ResNet trained from scratch on the CIFAR-100 dataset. It reaches **76.68% top-1 test accuracy** after 100 epochs with OneCycle learning-rate scheduling.

The model class is named `ResNet34`, but it uses one BasicBlock per stage ([1, 1, 1, 1]) rather than ResNet-34's [3, 4, 6, 3]. That makes it a 10-layer network with about 5M parameters.

## Project Overview

This project trains a ResNet from scratch on CIFAR-100 without any pre-trained weights. The implementation combines data augmentation, OneCycle LR scheduling, and mixed-precision training.

## Results Summary

### Performance Metrics (100 Epochs)

| Metric | Score |
|--------|-------|
| **Top-1 Accuracy** (final epoch) | **76.68%** ✅ (Target: 73%) |
| **Top-3 Accuracy** | **90.95%** |
| **Top-5 Accuracy** | **94.07%** |
| **Best Test Accuracy** | **76.79%** (Epoch 99) |
| **Macro F1-Score** | **0.7670** |
| **Weighted F1-Score** | **0.7668** |

### Averaged Metrics

**Macro-averaged (unweighted mean over classes):**
- Precision: 0.7708
- Recall: 0.7668
- F1-Score: 0.7670

**Weighted-averaged (by class support):**
- Precision: 0.7708
- Recall: 0.7668
- F1-Score: 0.7668

## Training Configuration

### Model Architecture

**Custom Lightweight ResNet for CIFAR-100**

A ResNet variant adapted for small (32×32) images:

```
Model: ResNet34 (CIFAR-optimized)
Total Parameters: 4,949,412 (~5M)
Trainable Parameters: 4,949,412
Input Size: 32×32×3 (RGB)
Output Classes: 100
```

**Architecture Details** (from `model_cifar.py`):

### Layer-by-Layer Feature Map Progression

| Layer | Operation | Kernel | Stride | Padding | Input Size | Output Size | Channels | Receptive Field |
|-------|-----------|--------|--------|---------|------------|-------------|----------|-----------------|
| **Input** | - | - | - | - | 32×32×3 | 32×32×3 | 3 | 1×1 |
| **conv1** | Conv2d | 3×3 | 1 | 1 | 32×32×3 | 32×32×64 | 64 | **3×3** |
| **bn1+relu** | BN+ReLU | - | - | - | 32×32×64 | 32×32×64 | 64 | 3×3 |
| **layer1** | BasicBlock | 3×3, 3×3 | 1, 1 | 1, 1 | 32×32×64 | **32×32×64** | 64 | **7×7** |
| **layer2** | BasicBlock | 3×3, 3×3 | 2, 1 | 1, 1 | 32×32×64 | **16×16×128** | 128 | **13×13** |
| **layer3** | BasicBlock | 3×3, 3×3 | 2, 1 | 1, 1 | 16×16×128 | **8×8×256** | 256 | **25×25** |
| **layer4** | BasicBlock | 3×3, 3×3 | 2, 1 | 1, 1 | 8×8×256 | **4×4×512** | 512 | **49×49** |
| **avgpool** | AdaptiveAvgPool2d | 4×4 | - | - | 4×4×512 | 1×1×512 | 512 | **Full image** |
| **fc** | Linear | - | - | - | 512 | 100 | 100 | - |

**Key Observations:**
- **Receptive field at layer4**: 49×49 pixels, which is larger than the 32×32 input. After global average pooling, every logit depends on every input pixel.
- **Spatial downsampling**: three stride-2 convolutions reduce 32×32 → 4×4 (8× per side).
- **Channel expansion**: 3 → 64 → 128 → 256 → 512 (progressively richer features).
- **No stem MaxPool**: ImageNet ResNets pool in the stem; this model does not, so the first stage keeps the full 32×32 resolution.

### Detailed Architecture Components

1. **Initial Convolution Block**
   ```
   Input: 32×32×3
   → Conv2d(3→64, k=3×3, s=1, p=1) → BN → ReLU
   → Output: 32×32×64
   Receptive Field: 1×1 → 3×3
   ```
   - CIFAR-optimized: 3×3 conv (not 7×7 like ImageNet ResNets)
   - Preserves spatial resolution (no stride-2 conv or MaxPool)
   - Captures the fine-grained details that matter in small images

2. **Layer 1: Residual Stage 1** (64 channels, no downsampling)
   ```
   Input: 32×32×64
   BasicBlock:
   ├─ Conv(64→64, k=3×3, s=1, p=1) → BN → ReLU → 32×32×64
   ├─ Conv(64→64, k=3×3, s=1, p=1) → BN → 32×32×64
   └─ Add(identity) → ReLU → Output: 32×32×64
   Receptive Field: 3×3 → 7×7
   ```
   - No spatial downsampling (stride=1)
   - Identity skip connection (no projection needed)
   - RF grows by 4 pixels (2 conv layers × 2 pixels each)

3. **Layer 2: Residual Stage 2** (128 channels, downsample)
   ```
   Input: 32×32×64
   BasicBlock:
   ├─ Conv(64→128, k=3×3, s=2, p=1) → BN → ReLU → 16×16×128
   ├─ Conv(128→128, k=3×3, s=1, p=1) → BN → 16×16×128
   ├─ Skip: Conv(64→128, k=1×1, s=2) → BN → 16×16×128 (projection)
   └─ Add(skip) → ReLU → Output: 16×16×128
   Receptive Field: 7×7 → 13×13
   ```
   - **Spatial downsampling**: 32×32 → 16×16 (stride=2 in the first conv)
   - **Channel expansion**: 64 → 128
   - **Projection shortcut**: a 1×1 conv matches the dimensions
   - RF growth happens in two steps. The stride-2 conv adds 2 pixels (7 → 9) and doubles the spacing between neighboring outputs. The following 3×3 conv therefore adds 4 pixels (9 → 13).

4. **Layer 3: Residual Stage 3** (256 channels, downsample)
   ```
   Input: 16×16×128
   BasicBlock:
   ├─ Conv(128→256, k=3×3, s=2, p=1) → BN → ReLU → 8×8×256
   ├─ Conv(256→256, k=3×3, s=1, p=1) → BN → 8×8×256
   ├─ Skip: Conv(128→256, k=1×1, s=2) → BN → 8×8×256 (projection)
   └─ Add(skip) → ReLU → Output: 8×8×256
   Receptive Field: 13×13 → 25×25
   ```
   - **Spatial downsampling**: 16×16 → 8×8
   - **Channel expansion**: 128 → 256
   - RF now spans most of the image width (25 of 32 pixels)

5. **Layer 4: Residual Stage 4** (512 channels, downsample)
   ```
   Input: 8×8×256
   BasicBlock:
   ├─ Conv(256→512, k=3×3, s=2, p=1) → BN → ReLU → 4×4×512
   ├─ Conv(512→512, k=3×3, s=1, p=1) → BN → 4×4×512
   ├─ Skip: Conv(256→512, k=1×1, s=2) → BN → 4×4×512 (projection)
   └─ Add(skip) → ReLU → Output: 4×4×512
   Receptive Field: 25×25 → 49×49
   ```
   - **Final spatial downsampling**: 8×8 → 4×4
   - **Maximum channels**: 512 (richest features)
   - **RF exceeds input size**: 49×49 > 32×32

6. **Classification Head**
   ```
   Input: 4×4×512
   ├─ AdaptiveAvgPool2d((1,1)) → 1×1×512 (global spatial pooling)
   ├─ Flatten → 512
   └─ Linear(512 → 100) → 100 class logits
   ```
   - Global average pooling: each of the 512 channels is averaged to a single value
   - Adds no parameters, unlike flattening the 4×4×512 map (8,192 values) into a fully connected layer, which reduces overfitting risk
   - Pooling over positions makes the head largely insensitive to where a feature appears

7. **Initialization Strategy**
   - **Kaiming (He) normal** for Conv2d weights
     - Designed for ReLU activations
     - `std = sqrt(2 / fan_in)`
   - **Constant initialization** for BatchNorm
     - weight = 1, bias = 0

### Architecture Flow Diagram

```
                 Input Image (32×32×3, RF=1×1)
                             ↓
┌──────────────────────────────────────────────────────────┐
│ STEM: Conv 3×3 → BN → ReLU                               │
│ Output: 32×32×64, RF=3×3                                 │
└──────────────────────────────────────────────────────────┘
                             ↓
┌──────────────────────────────────────────────────────────┐
│ STAGE 1: BasicBlock (64 channels, stride=1)              │
│ Conv 3×3 → BN → ReLU → Conv 3×3 → BN → (+) → ReLU        │
│ Output: 32×32×64, RF=7×7                                 │
│ Skip: Identity (no projection needed)                    │
└──────────────────────────────────────────────────────────┘
                             ↓  [Spatial: 32×32, Channels: 64, RF: 7×7]
┌──────────────────────────────────────────────────────────┐
│ STAGE 2: BasicBlock (128 channels, stride=2) ↓↓          │
│ Conv 3×3,s2 → BN → ReLU → Conv 3×3 → BN → (+) → ReLU     │
│ Output: 16×16×128, RF=13×13                              │
│ Skip: Conv 1×1,s2 (projection: 64→128)                   │
└──────────────────────────────────────────────────────────┘
                             ↓  [Spatial: 16×16, Channels: 128, RF: 13×13]
┌──────────────────────────────────────────────────────────┐
│ STAGE 3: BasicBlock (256 channels, stride=2) ↓↓          │
│ Conv 3×3,s2 → BN → ReLU → Conv 3×3 → BN → (+) → ReLU     │
│ Output: 8×8×256, RF=25×25                                │
│ Skip: Conv 1×1,s2 (projection: 128→256)                  │
└──────────────────────────────────────────────────────────┘
                             ↓  [Spatial: 8×8, Channels: 256, RF: 25×25]
┌──────────────────────────────────────────────────────────┐
│ STAGE 4: BasicBlock (512 channels, stride=2) ↓↓          │
│ Conv 3×3,s2 → BN → ReLU → Conv 3×3 → BN → (+) → ReLU     │
│ Output: 4×4×512, RF=49×49 (exceeds 32×32)                │
│ Skip: Conv 1×1,s2 (projection: 256→512)                  │
└──────────────────────────────────────────────────────────┘
                             ↓  [Spatial: 4×4, Channels: 512, RF: 49×49]
┌──────────────────────────────────────────────────────────┐
│ HEAD: Global Average Pooling → FC                        │
│ AdaptiveAvgPool2d(1,1) → Flatten → Linear(512→100)       │
│ Output: 100 class logits                                 │
└──────────────────────────────────────────────────────────┘
                             ↓
                   Predictions (100 classes)
```

**Key Design Choices:**
- ✅ **CIFAR-specific stem**: 3×3 conv instead of the ImageNet-style 7×7
- ✅ **No aggressive downsampling**: preserves spatial information in 32×32 images
- ✅ **Lightweight**: 1 block per stage instead of [3,4,6,3], for efficient training
- ✅ **Residual connections**: ease gradient flow
- ✅ **Global Average Pooling**: reduces overfitting compared with a large fully connected head
- ✅ **Progressive RF growth**: each stage sees more context (7 → 13 → 25 → 49 pixels)

### Training Hyperparameters

```
Epochs:            100
Batch Size:        512
Optimizer:         SGD with Nesterov momentum
Momentum:          0.9
Weight Decay:      1e-4
Label Smoothing:   0.1
Mixed Precision:   Enabled (AMP)
Gradient Clipping: 1.0

# OneCycle Learning Rate Schedule
LR Schedule: OneCycle (Custom)
  - Phase 1 (Epochs 0-40):  0.01 → 0.1   (warmup)
  - Phase 2 (Epochs 41-81): 0.1 → 0.01   (cooldown)
  - Phase 3 (Epochs 82-99): 0.01 → 0.001 (annihilation)
```

### Data Augmentation

Using the Albumentations library:

- **Training:**
  - Random padding (32→36) + random crop (36→32)
  - Horizontal flip (p=0.5)
  - ShiftScaleRotate (shift=0.05, scale=0.05, rotate=5°, p=0.3)
  - CoarseDropout/Cutout (16×16, p=0.4)
  - Color jitter (brightness, contrast, saturation, hue, p=0.4)
  - Normalization (CIFAR-100 mean/std)
- **Testing:**
  - Normalization only

## Training Results

### Training Curves

![Training Curves](plots_complete/training_curves.png)

The training curves show:
- **Steady convergence** with little overfitting
- **An effective learning-rate schedule** under the OneCycle policy
- **A generalization gap** kept below 5 percentage points throughout training
- **Final training accuracy:** 80.47% (vs. 76.68% test)

### Learning Rate Schedule

![Learning Rate Schedule](plots_complete/learning_rate_schedule.png)

The custom OneCycle schedule has three linear phases:
1. **Warmup Phase (41 epochs):** linear increase from 0.01 to 0.1
2. **Cooldown Phase (41 epochs):** linear decrease from 0.1 to 0.01
3. **Annihilation Phase (18 epochs):** linear decrease from 0.01 toward 0.001 (the training log reports a final learning rate of 0.0015)

This schedule helps the model:
- Escape poor local minima early in training, while the learning rate is large
- Settle into a wide minimum for better generalization
- Fine-tune with very small learning rates at the end

### Per-Class Performance

![Class Metrics](plots_complete/class_metrics.png)

**Top 5 Best Performing Classes:**
1. **wardrobe** - F1: 0.9458 (Precision: 0.9320, Recall: 0.9600)
2. **sunflower** - F1: 0.9381 (Precision: 0.9681, Recall: 0.9100)
3. **poppy** - F1: 0.9315 (Precision: 0.9444, Recall: 0.9189)
4. **can** - F1: 0.9310 (Precision: 0.9000, Recall: 0.9643)
5. **skyscraper** - F1: 0.9100 (Precision: 0.9100, Recall: 0.9100)

**Most Challenging Classes:**
- **boy** - F1: 0.4286 (fine-grained human features)
- **girl** - F1: 0.4646 (similar to boy)
- **baby** - F1: 0.5079 (fine-grained human features)
- **man** - F1: 0.5758 (similar to boy)
- **plate** - F1: 0.5797 (simple object, easily confused)

The model performs very well on visually distinctive objects (flowers, buildings, furniture). It struggles to separate the people categories, which is expected at CIFAR-100's 32×32 resolution.
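The macro- and weighted-averaged scores in the results summary differ only in how per-class scores are combined:

- **Macro** takes a plain mean over classes.
- **Weighted** weights each class by its support (the number of samples whose true class it is).

The minimal sketch below computes both from a confusion matrix in plain Python. It follows the same definitions as scikit-learn's `precision_recall_fscore_support` with `average="macro"` or `average="weighted"`. The function names and the 3-class toy matrix are hypothetical illustrations, not this project's code or results.

```python
def per_class_scores(cm):
    """Return (precision, recall, f1, support) per class.

    cm[i][j] counts samples whose true class is i and predicted class is j.
    """
    n = len(cm)
    scores = []
    for c in range(n):
        tp = cm[c][c]
        predicted = sum(cm[r][c] for r in range(n))  # column sum: predicted as c
        support = sum(cm[c])                          # row sum: truly class c
        precision = tp / predicted if predicted else 0.0
        recall = tp / support if support else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores.append((precision, recall, f1, support))
    return scores


def macro_and_weighted(cm):
    """Macro = plain mean over classes; weighted = mean weighted by support."""
    scores = per_class_scores(cm)
    total = sum(s[3] for s in scores)
    macro = [sum(s[k] for s in scores) / len(scores) for k in range(3)]
    weighted = [sum(s[k] * s[3] for s in scores) / total for k in range(3)]
    return macro, weighted  # each is [precision, recall, f1]


if __name__ == "__main__":
    # Toy, imbalanced 3-class confusion matrix (made up, not CIFAR-100 results)
    cm = [[8, 2, 0],
          [1, 3, 0],
          [0, 1, 5]]
    macro, weighted = macro_and_weighted(cm)
    print("macro    P/R/F1:", [round(v, 4) for v in macro])
    print("weighted P/R/F1:", [round(v, 4) for v in weighted])
```

Weighted recall always equals overall accuracy (16/20 in the toy example), because each class's recall is multiplied back by its own support. That is why the weighted recall in the results summary (0.7668) matches the 76.68% top-1 accuracy.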
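The three-phase learning-rate schedule described under Training Hyperparameters and Learning Rate Schedule can be written as a piecewise-linear function of the epoch index. This is a sketch, not the code in `main.py`. It assumes:

- epochs are 0-indexed;
- the phases last 41, 41 and 18 epochs;
- each phase interpolates linearly using `(epoch - phase_start) / phase_length`.

The names `PHASES` and `three_phase_lr` are hypothetical. Under these assumptions, epoch 99 gets 0.0015, which matches the final learning rate (0.001500) reported in the training log.

```python
# (start_epoch, num_epochs, lr_at_start, lr_at_end) per phase; assumed layout
PHASES = [
    (0, 41, 0.01, 0.1),     # warmup
    (41, 41, 0.1, 0.01),    # cooldown
    (82, 18, 0.01, 0.001),  # annihilation
]


def three_phase_lr(epoch):
    """Piecewise-linear learning rate for a 0-indexed epoch in [0, 100)."""
    for start, length, lr_start, lr_end in PHASES:
        if start <= epoch < start + length:
            t = (epoch - start) / length  # fraction of this phase already completed
            return lr_start + (lr_end - lr_start) * t
    raise ValueError(f"epoch {epoch} is outside the 100-epoch schedule")


if __name__ == "__main__":
    for e in (0, 40, 41, 81, 82, 99):
        print(e, round(three_phase_lr(e), 5))
```

To drive a PyTorch optimizer with a function like this, one option is `torch.optim.lr_scheduler.LambdaLR(optimizer, lambda e: three_phase_lr(e) / 0.1)`. This assumes the optimizer's base learning rate is 0.1 and that `scheduler.step()` is called once per epoch. `LambdaLR` multiplies the base rate by the returned factor.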
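Receptive fields for a stack of convolutions follow a standard recurrence. Each layer with kernel size k adds (k - 1) × j input pixels, where the jump j is the product of all earlier strides. The minimal sketch below walks the main 3×3 path of the network, with layer list transcribed from the architecture table. The 1×1 projection shortcuts see a smaller window, so the main path determines the receptive field. The names `LAYERS` and `receptive_fields` are hypothetical helpers, not part of `model_cifar.py`.

```python
# (name, kernel_size, stride) along the main path; 1x1 shortcuts see less, so omitted
LAYERS = [
    ("conv1", 3, 1),
    ("layer1.conv1", 3, 1), ("layer1.conv2", 3, 1),
    ("layer2.conv1", 3, 2), ("layer2.conv2", 3, 1),
    ("layer3.conv1", 3, 2), ("layer3.conv2", 3, 1),
    ("layer4.conv1", 3, 2), ("layer4.conv2", 3, 1),
]


def receptive_fields(layers):
    """Return {layer_name: (receptive_field, jump)} using r += (k - 1) * j, j *= s."""
    r, j = 1, 1  # start from a single input pixel with unit spacing
    out = {}
    for name, k, s in layers:
        r += (k - 1) * j  # the kernel widens the window by (k - 1) steps of the current jump
        j *= s            # the stride multiplies the spacing between neighboring outputs
        out[name] = (r, j)
    return out


if __name__ == "__main__":
    for name, (r, j) in receptive_fields(LAYERS).items():
        print(f"{name:14s} RF={r}x{r} jump={j}")
```

Walking this path gives 3, 7, 13, 25 and 49 pixels after the stem and after each of the four stages. These are theoretical receptive fields: the input window that can influence one output position.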
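The 4,949,412-parameter total quoted in this README can be checked by hand, which also confirms the one-block-per-stage layout. The minimal sketch below counts parameters analytically. It assumes:

- all convolutions are bias-free;
- every convolution is followed by a BatchNorm with 2 learnable values per channel, including the 1×1 projection shortcuts;
- the final `Linear(512, 100)` has a bias.

These assumptions are inferred from the fact that they reproduce the quoted count, not read from `model_cifar.py`. All function names are hypothetical.

```python
def conv_params(c_in, c_out, k):
    return c_in * c_out * k * k  # bias=False


def bn_params(c):
    return 2 * c  # learnable weight and bias; running stats are buffers, not parameters


def basic_block_params(c_in, c_out, stride):
    total = conv_params(c_in, c_out, 3) + bn_params(c_out)
    total += conv_params(c_out, c_out, 3) + bn_params(c_out)
    if stride != 1 or c_in != c_out:  # 1x1 projection shortcut + BN
        total += conv_params(c_in, c_out, 1) + bn_params(c_out)
    return total


def resnet_params(blocks_per_stage=(1, 1, 1, 1), num_classes=100):
    total = conv_params(3, 64, 3) + bn_params(64)  # CIFAR stem: 3x3 conv + BN
    c_in = 64
    for stage, (c_out, n) in enumerate(zip((64, 128, 256, 512), blocks_per_stage)):
        for b in range(n):
            stride = 2 if (stage > 0 and b == 0) else 1  # first block of stages 2-4 downsamples
            total += basic_block_params(c_in, c_out, stride)
            c_in = c_out
    return total + 512 * num_classes + num_classes  # Linear weight + bias


if __name__ == "__main__":
    print(f"[1, 1, 1, 1]: {resnet_params():,}")
    print(f"[3, 4, 6, 3]: {resnet_params((3, 4, 6, 3)):,}")
```

The same counting gives about 21.3M parameters for the [3, 4, 6, 3] layout. For the real model, `sum(p.numel() for p in model.parameters())` in PyTorch gives the count directly.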
## Model Architecture Summary

From `model_cifar.py`:

| Component | Specification |
|-----------|---------------|
| **Model Name** | `ResNet34` (CIFAR-optimized; 1 BasicBlock per stage) |
| **Total Parameters** | 4,949,412 (~5M) |
| **Architecture Depth** | 10 weight layers (1 stem conv + 8 block convs + 1 FC; the three 1×1 projection convs are not counted) |
| **Residual Blocks** | 4 BasicBlocks (1 per stage) |
| **Channel Progression** | 3 → 64 → 128 → 256 → 512 → 100 |
| **Spatial Downsampling** | 32×32 → 16×16 → 8×8 → 4×4 → 1×1 |
| **Receptive Field Growth** | 1×1 → 3×3 → 7×7 → 13×13 → 25×25 → 49×49 |
| **Skip Connections** | 4 (1 identity + 3 projection shortcuts) |
| **Pooling Strategy** | Global Average Pooling (4×4 → 1×1) |
| **Initialization** | Kaiming Normal (He) for Conv, Constant for BN |
| **Downsampling Method** | Strided convolutions (no MaxPool) |

**Why This Architecture Works for CIFAR-100:**
1. **Right-sized capacity**: ~5M parameters balance expressiveness against overfitting risk
2. **Preserved resolution**: no aggressive early downsampling, so spatial detail in 32×32 images survives into the first stage
3. **Sufficient receptive field**: the 49×49 RF at layer4 exceeds the 32×32 input
4. **Progressive downsampling**: 3 stride-2 convolutions (vs. 1 MaxPool + 4 stride-2 convolutions in ImageNet ResNets)
5. **Residual learning**: skip connections ease gradient flow through the 10 weight layers
6. **Efficient computation**: the lightweight design trains in ~2-3 hours on a single GPU

**Receptive Field Analysis:**
- By **layer2** (16×16×128): RF = 13×13, about 40% of the image width
- By **layer3** (8×8×256): RF = 25×25, about 78% of the image width
- By **layer4** (4×4×512): RF = 49×49, wider than the whole 32×32 image
- Global average pooling then combines all 4×4 positions, so each class logit draws on the entire input image

## Project Structure

```
CIFAR100/
├── main.py                    # Main training script with OneCycle LR
├── model_cifar.py             # Custom ResNet architecture (5M params)
│   ├── BasicBlock             # 2-layer residual block with skip connection
│   └── ResNet34               # CIFAR-optimized ResNet variant
├── train.py                   # Training and evaluation loops
├── preprocess.py              # Data loading with Albumentations
├── visualization.py           # Metrics calculation and plotting
├── inference.py               # Model inference utilities
├── app.py                     # Gradio web interface for demo
├── run_complete_training.py   # Full training pipeline with logging
├── requirements.txt           # Python dependencies
├── log/                       # Training logs
│   └── training_complete_20251010-103227.log
└── plots_complete/            # Training visualizations
    ├── training_curves.png
    ├── learning_rate_schedule.png
    ├── class_metrics.png
    ├── confusion_matrix.png
    └── classification_report.txt
```

## Quick Start

### Installation

```bash
# Clone the repository
git clone <repository-url>
cd CIFAR100

# Install dependencies
pip install -r requirements.txt
```

### Training

```bash
# Train with OneCycle LR for 100 epochs
python main.py \
    --scheduler onecycle \
    --epochs 100 \
    --batch_size 512 \
    --lr 0.1 \
    --momentum 0.9 \
    --weight_decay 1e-4 \
    --amp \
    --plot_training \
    --plot_evaluation

# Or use the complete training script with logging
python run_complete_training.py
```

### Inference

```bash
# Run interactive web demo
python app.py

# Or use inference script
python inference.py --image path/to/image.jpg --model snapshots/best_model.pth
```

## Key Features

### 1. **OneCycle Learning Rate Policy**

Implements a custom three-phase version of the OneCycle policy from Smith and Topin's "Super-Convergence" paper. It aims for:
- Faster convergence
- Better generalization
- Higher final accuracy

### 2. **Comprehensive Metrics Logging**

After each training run, the script automatically outputs:
- Training and test accuracy/loss curves
- Top-1, Top-3, Top-5 accuracies
- Precision, Recall, F1-Score (macro and weighted)
- Per-class performance breakdown
- Confusion matrix and classification report

### 3. **Mixed Precision Training (AMP)**

- Faster training on GPUs with Tensor Cores (the speedup depends on the hardware)
- Reduced memory usage
- Accuracy is maintained with float16/float32 mixed precision

### 4. **Advanced Data Augmentation**

Uses Albumentations for efficient augmentation:
- Often benchmarked as faster than torchvision transforms
- More augmentation options
- Runs on the CPU (NumPy/OpenCV) inside DataLoader workers, with low overhead

### 5. **Model Checkpointing**

- Automatic snapshot saving at specified intervals
- Best-model tracking based on test accuracy
- Resume training from any checkpoint

## Detailed Training Log

Full training logs are available in `log/training_complete_20251010-103227.log`, including:
- Per-epoch train/test loss and accuracy
- Learning rate at each epoch
- Final comprehensive evaluation with per-class metrics
- Training time and resource utilization

Example final output:

```
======================================================================
TRAINING COMPLETED - FINAL EVALUATION
======================================================================

TRAINING SUMMARY
----------------------------------------------------------------------
Total Epochs Trained: 100
Final Training Loss: 0.5584
Final Training Accuracy: 80.47%
Best Training Accuracy: 81.05% (Epoch 94)
Final Learning Rate: 0.001500

TEST/VALIDATION SUMMARY
----------------------------------------------------------------------
Final Test Loss: 0.8985
Final Test Accuracy: 76.68%
Best Test Accuracy: 76.79% (Epoch 99)

COMPREHENSIVE TEST SET METRICS
----------------------------------------------------------------------
Top-1 Accuracy (Test): 76.68%
Top-3 Accuracy (Test): 90.95%
Top-5 Accuracy (Test): 94.07%
```

## Requirements Met

- ✅ **Training from Scratch**: custom ResNet (5M params) trained without pre-trained weights
- ✅ **CIFAR-100 Dataset**: all 100 classes used (50,000 train / 10,000 test)
- ✅ **Target Accuracy**: **76.68% achieved** (target: 73%), **exceeded by 3.68 percentage points**
- ✅ **Training Duration**: 100 epochs with a OneCycle LR schedule
- ✅ **Modern Tools**: extensive use of ChatGPT/Cursor for development
- ✅ **Comprehensive Evaluation**: full metrics, plots, and detailed analysis
- ✅ **Model Architecture**: custom lightweight ResNet optimized for CIFAR-100
- ✅ **Reproducibility**: complete logs, checkpoints, and configuration documented

## Technologies Used

- **PyTorch** - Deep learning framework
- **Albumentations** - Data augmentation
- **Gradio** - Web interface for inference
- **scikit-learn** - Metrics calculation
- **matplotlib/seaborn** - Visualization
- **numpy** - Numerical operations

## Model Comparison

| Model Variant | Parameters | Top-1 Accuracy | Notes |
|---------------|------------|----------------|-------|
| **Our Model** (4 blocks) | **~5M** | **76.68%** (measured) | Balanced efficiency and accuracy |
| Standard ResNet-18 | 11M | ~76-78% | Good baseline for CIFAR |
| Standard ResNet-34 | 21M | ~78-80% | More capacity, slower training |
| Wide-ResNet-28-10 | 36M | ~80-82% | Strong baseline, requires more resources |
| PyramidNet | 26M | ~82-84% | Complex architecture |

Accuracies for the other models are approximate figures commonly reported for CIFAR-100. They were not reproduced in this project.

**Our lightweight design achieves competitive accuracy with roughly 2-4× fewer parameters than standard ResNets.**

## Future Improvements

Potential enhancements to reach higher accuracy (78%+):

1. **Architecture upgrades**:
   - Increase blocks per stage: [2, 2, 2, 2] or [3, 3, 3, 3]
   - Try Wide-ResNet with wider channels
   - Add Squeeze-and-Excitation (SE) blocks
2. **Training tricks**:
   - Mixup (α=0.2) for better generalization
   - CutMix for spatial regularization
   - AutoAugment or RandAugment policies
3. **Regularization**:
   - Stochastic Depth (survival probability 0.8-0.9)
   - DropBlock for spatial dropout
   - Increased label smoothing (0.2)
4. **Ensemble methods**:
   - Train 3-5 models with different seeds
   - Snapshot ensembles (save last N checkpoints)
5. **Longer training**:
   - 200-300 epochs with cosine annealing
   - Multi-step or exponential LR decay
6. **Knowledge distillation**:
   - Train a larger teacher model first
   - Use soft targets for student training

## Technical Implementation Details

### Architecture Design Rationale

**Why a lightweight ResNet variant?**

1. **CIFAR-100 Image Size**: at 32×32 pixels, CIFAR images carry far less spatial information than ImageNet images (224×224).
   - Standard ResNet-34's [3,4,6,3] block structure needs ~21M parameters, about 4× this model.
   - The [1,1,1,1] structure used here still clears the 73% target without heavy overfitting.
2. **Parameter Efficiency**:
   - ~5M parameters: a middle ground between underfitting and overfitting
   - Faster training: 100 epochs in ~2-3 hours vs. 5-6 hours for ResNet-34
   - Lower memory footprint: allows larger batch sizes
3. **CIFAR-Specific Modifications**:
   - **3×3 initial conv** (vs 7×7): preserves fine details in small images
   - **No MaxPool layer**: keeps full resolution in the first stage (32×32 → 4×4 over the 4 stages)
   - **Stride-2 convolutions**: gradual downsampling for a feature hierarchy

### Code Reference

Abridged from `model_cifar.py`. The imports, `_make_layer`, both `forward` methods and the shortcut branch are filled in here so the snippets run together as one module; they may differ in detail from the actual file. With bias-free convolutions and BatchNorm on the projection shortcuts, this version has 4,949,412 parameters, matching the count above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ResNet34(nn.Module):
    def __init__(self, num_classes=100):
        super().__init__()
        self.in_channels = 64
        # CIFAR-specific: 3×3 conv, no maxpool
        self.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        # 4 stages with 1 BasicBlock each
        self.layer1 = self._make_layer(64, 1)             # 32×32×64
        self.layer2 = self._make_layer(128, 1, stride=2)  # 16×16×128
        self.layer3 = self._make_layer(256, 1, stride=2)  # 8×8×256
        self.layer4 = self._make_layer(512, 1, stride=2)  # 4×4×512
        # Classification head
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(512, num_classes)

    def _make_layer(self, out_channels, num_blocks, stride=1):
        # Only the first block of a stage downsamples; later blocks use stride 1
        layers = []
        for s in [stride] + [1] * (num_blocks - 1):
            layers.append(BasicBlock(self.in_channels, out_channels, s))
            self.in_channels = out_channels
        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.relu(self.bn1(self.conv1(x)))                     # 32×32×64
        x = self.layer4(self.layer3(self.layer2(self.layer1(x))))  # 4×4×512
        x = torch.flatten(self.avgpool(x), 1)                      # 512
        return self.fc(x)                                          # 100 logits
```

**BasicBlock** (2 conv layers + skip connection). The skip goes through `self.shortcut`. This is a 1×1 conv + BN projection whenever the stride or channel count changes (stages 2 to 4). Adding the raw input `x` there would fail with a shape mismatch.

```python
class BasicBlock(nn.Module):
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        # 1×1 conv + BN projection when the shape changes; identity otherwise
        self.shortcut = nn.Sequential()
        if stride != 1 or in_ch != out_ch:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False), nn.BatchNorm2d(out_ch))

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))  # Conv → BN → ReLU
        out = self.bn2(self.conv2(out))        # Conv → BN
        out = out + self.shortcut(x)           # Add identity or projected skip
        return F.relu(out)                     # ReLU after the addition
```

## References

**Papers:**
- He et al., "Deep Residual Learning for Image Recognition" (2016) - ResNet architecture
- Smith and Topin, "Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates" (2018) - OneCycle LR
- Krizhevsky, "Learning Multiple Layers of Features from Tiny Images" (2009) - CIFAR-100

**Implementation Resources:**
- PyTorch official ResNet implementation
- Albumentations library for efficient augmentation
- torchvision.datasets for CIFAR-100 loading

## License

MIT License

## Acknowledgments

This project was developed with extensive assistance from:
- ChatGPT for architecture design and debugging
- Cursor AI for code completion and refactoring
- PyTorch and torchvision communities for reference implementations

---

**Note:** Training logs and detailed per-class metrics are in the `log/` and `plots_complete/` directories. The inference example expects model checkpoints under `snapshots/`.