| --- |
| title: CIFAR-100 Image Classifier |
| emoji: 🎯 |
| colorFrom: blue |
| colorTo: purple |
| sdk: gradio |
| sdk_version: "5.49.1" |
| app_file: app.py |
| pinned: false |
| license: mit |
| --- |
| |
| # CIFAR-100 ResNet Training from Scratch |
|
|
| A lightweight ResNet (the `ResNet34` class in `model_cifar.py`, with one BasicBlock per stage and ~5M parameters) trained from scratch on the CIFAR-100 dataset, reaching **76.68% top-1 accuracy** in 100 epochs with OneCycle learning rate scheduling. |
|
|
| ## Project Overview |
|
|
| This project demonstrates training a ResNet architecture from scratch on the CIFAR-100 dataset without using any pre-trained models. The implementation leverages modern deep learning techniques including data augmentation, OneCycle LR scheduling, and mixed precision training. |
|
|
| ## Results Summary |
|
|
| ### Performance Metrics (100 Epochs) |
|
|
| | Metric | Score | |
| |--------|-------| |
| | **Top-1 Accuracy** | **76.68%** ✅ (Target: 73%) | |
| | **Top-3 Accuracy** | **90.95%** | |
| | **Top-5 Accuracy** | **94.07%** | |
| | **Best Test Accuracy** | **76.79%** (Epoch 99) | |
| | **Macro F1-Score** | **0.7670** | |
| | **Weighted F1-Score** | **0.7668** | |
|
|
| ### Averaged Metrics |
|
|
| **Macro-Averaged (unweighted):** |
| - Precision: 0.7708 |
| - Recall: 0.7668 |
| - F1-Score: 0.7670 |
|
|
| **Weighted-Averaged (by class support):** |
| - Precision: 0.7708 |
| - Recall: 0.7668 |
| - F1-Score: 0.7668 |
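
To make the difference between the two averages concrete, here is a minimal sketch in plain Python. The per-class F1 values and supports are made-up illustration values, not results from this project; the function names are hypothetical as well. When every class has the same support, the two averages coincide.

```python
def macro_average(scores):
    """Unweighted mean: every class counts equally."""
    return sum(scores) / len(scores)


def weighted_average(scores, supports):
    """Mean weighted by the number of true samples (support) per class."""
    total = sum(supports)
    return sum(s * n for s, n in zip(scores, supports)) / total


# Hypothetical per-class F1 scores and supports (illustration only).
f1_per_class = [0.90, 0.60, 0.75]
support = [100, 50, 50]

macro_f1 = macro_average(f1_per_class)                 # (0.90 + 0.60 + 0.75) / 3
weighted_f1 = weighted_average(f1_per_class, support)  # (90 + 30 + 37.5) / 200
print(f"macro={macro_f1:.4f} weighted={weighted_f1:.4f}")
```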
|
|
| ## Training Configuration |
|
|
| ### Model Architecture |
|
|
| **Custom Lightweight ResNet for CIFAR-100** |
|
|
| A specially designed ResNet variant optimized for small image classification: |
|
|
| ``` |
| Model: ResNet34 (CIFAR-optimized) |
| Total Parameters: 4,949,412 (~5M) |
| Trainable Parameters: 4,949,412 |
| Input Size: 32×32×3 (RGB) |
| Output Classes: 100 |
| ``` |
|
|
| **Architecture Details** (from `model_cifar.py`): |
|
|
| ### Layer-by-Layer Feature Map Progression |
|
|
| | Layer | Operation | Kernel | Stride | Padding | Input Size | Output Size | Channels | Receptive Field | |
| |-------|-----------|--------|--------|---------|------------|-------------|----------|-----------------| |
| | **Input** | - | - | - | - | 32×32 | 32×32 | 3 | 1×1 | |
| | **conv1** | Conv2d | 3×3 | 1 | 1 | 32×32×3 | 32×32×64 | 64 | **3×3** | |
| | **bn1+relu** | BN+ReLU | - | - | - | 32×32×64 | 32×32×64 | 64 | 3×3 | |
| | **layer1** | BasicBlock | 3×3,3×3 | 1,1 | 1,1 | 32×32×64 | **32×32×64** | 64 | **7×7** | |
| | **layer2** | BasicBlock | 3×3,3×3 | 2,1 | 1,1 | 32×32×64 | **16×16×128** | 128 | **13×13** | |
| | **layer3** | BasicBlock | 3×3,3×3 | 2,1 | 1,1 | 16×16×128 | **8×8×256** | 256 | **25×25** | |
| | **layer4** | BasicBlock | 3×3,3×3 | 2,1 | 1,1 | 8×8×256 | **4×4×512** | 512 | **49×49** | |
| | **avgpool** | AdaptiveAvgPool2d | 4×4 | - | - | 4×4×512 | 1×1×512 | 512 | **Full image** | |
| | **fc** | Linear | - | - | - | 512 | 100 | 100 | - | |
|
|
| **Key Observations:** |
| - **Receptive field at layer4**: 49×49 pixels (larger than the **full 32×32 image**, by about 1.5× per side) |
| - **Spatial downsampling**: 3 stride-2 operations reduce 32×32 → 4×4 (8× reduction) |
| - **Channel expansion**: 3 → 64 → 128 → 256 → 512 (progressive feature richness) |
| - **Feature map efficiency**: No MaxPool layer (common in ImageNet models), so no early loss of spatial detail |
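
The feature-map sizes and receptive fields can be checked by hand with two standard recurrences: the output size is `(n + 2p - k) // s + 1`, and the receptive field grows by `(k - 1)` times the current spacing between taps, which multiplies by the stride after each layer. The sketch below applies them to the main-path convolutions listed in the table; the layer labels are illustrative names, not identifiers taken from `model_cifar.py`. Under these recurrences the main path reaches 7×7 after layer1, 13×13 after layer2, 25×25 after layer3 and 49×49 after layer4.

```python
# (name, kernel, stride, padding) for every conv on the main path.
CONVS = [
    ("conv1", 3, 1, 1),
    ("layer1.conv1", 3, 1, 1), ("layer1.conv2", 3, 1, 1),
    ("layer2.conv1", 3, 2, 1), ("layer2.conv2", 3, 1, 1),
    ("layer3.conv1", 3, 2, 1), ("layer3.conv2", 3, 1, 1),
    ("layer4.conv1", 3, 2, 1), ("layer4.conv2", 3, 1, 1),
]


def trace(size=32):
    """Return {layer: (spatial_size, receptive_field)} after each conv."""
    rf, jump, out = 1, 1, {}
    for name, k, s, p in CONVS:
        size = (size + 2 * p - k) // s + 1   # conv output size
        rf += (k - 1) * jump                 # RF grows by (k-1) tap spacings
        jump *= s                            # stride widens the spacing between taps
        out[name] = (size, rf)
    return out


stats = trace()
for stage in ("conv1", "layer1.conv2", "layer2.conv2", "layer3.conv2", "layer4.conv2"):
    print(stage, stats[stage])
```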
|
|
| ### Detailed Architecture Components |
|
|
| 1. **Initial Convolution Block** |
| ``` |
| Input: 32×32×3 → Conv2d(3→64, k=3×3, s=1, p=1) → BN → ReLU → Output: 32×32×64 |
| Receptive Field: 1×1 → 3×3 |
| ``` |
| - CIFAR-optimized: 3×3 conv (not 7×7 like ImageNet ResNets) |
| - Preserves spatial resolution (no stride-2 or MaxPool) |
| - Captures fine-grained details essential for small images |
|
|
| 2. **Layer 1: Residual Stage 1** (64 channels, no downsampling) |
| ``` |
| Input: 32×32×64 |
| BasicBlock: |
|   ├─ Conv(64→64, k=3×3, s=1, p=1) → BN → ReLU → 32×32×64 |
|   ├─ Conv(64→64, k=3×3, s=1, p=1) → BN → 32×32×64 |
|   └─ Add(identity) → ReLU → Output: 32×32×64 |
| Receptive Field: 3×3 → 7×7 |
| ``` |
| - No spatial downsampling (stride=1) |
| - Identity skip connection (no projection needed) |
| - RF grows by 4 pixels (2 conv layers × 2 pixels each) |
|
|
| 3. **Layer 2: Residual Stage 2** (128 channels, downsample) |
| ``` |
| Input: 32×32×64 |
| BasicBlock: |
|   ├─ Conv(64→128, k=3×3, s=2, p=1) → BN → ReLU → 16×16×128 |
|   ├─ Conv(128→128, k=3×3, s=1, p=1) → BN → 16×16×128 |
|   ├─ Skip: Conv(64→128, k=1×1, s=2) → BN → 16×16×128 (projection) |
|   └─ Add(skip) → ReLU → Output: 16×16×128 |
| Receptive Field: 7×7 → 13×13 |
| ``` |
| - **Spatial downsampling**: 32×32 → 16×16 (stride=2 in first conv) |
| - **Channel expansion**: 64 → 128 |
| - **Projection shortcut**: 1×1 conv matches dimensions |
| - RF grows by 6 pixels: +2 from the stride-2 conv, then +4 from the second conv, whose taps sit 2 input pixels apart |
|
|
| 4. **Layer 3: Residual Stage 3** (256 channels, downsample) |
| ``` |
| Input: 16×16×128 |
| BasicBlock: |
|   ├─ Conv(128→256, k=3×3, s=2, p=1) → BN → ReLU → 8×8×256 |
|   ├─ Conv(256→256, k=3×3, s=1, p=1) → BN → 8×8×256 |
|   ├─ Skip: Conv(128→256, k=1×1, s=2) → BN → 8×8×256 (projection) |
|   └─ Add(skip) → ReLU → Output: 8×8×256 |
| Receptive Field: 13×13 → 25×25 |
| ``` |
| - **Spatial downsampling**: 16×16 → 8×8 |
| - **Channel expansion**: 128 → 256 |
| - RF now spans 25 of the 32 pixels along each side of the input image |
|
|
| 5. **Layer 4: Residual Stage 4** (512 channels, downsample) |
| ``` |
| Input: 8×8×256 |
| BasicBlock: |
|   ├─ Conv(256→512, k=3×3, s=2, p=1) → BN → ReLU → 4×4×512 |
|   ├─ Conv(512→512, k=3×3, s=1, p=1) → BN → 4×4×512 |
|   ├─ Skip: Conv(256→512, k=1×1, s=2) → BN → 4×4×512 (projection) |
|   └─ Add(skip) → ReLU → Output: 4×4×512 |
| Receptive Field: 25×25 → 49×49 |
| ``` |
| - **Final spatial downsampling**: 8×8 → 4×4 |
| - **Maximum channels**: 512 (highest feature richness) |
| - **RF exceeds input size**: 49×49 > 32×32 (image-wide context) |
|
|
| 6. **Classification Head** |
| ``` |
| Input: 4×4×512 |
|   ├─ AdaptiveAvgPool2d((1,1)) → 1×1×512 (global spatial pooling) |
|   ├─ Flatten → 512 |
|   └─ Linear(512 → 100) → 100 class logits |
| ``` |
| - Global Average Pooling: Each of 512 channels → single value |
| - Reduces overfitting vs fully-connected layers |
| - Translation invariant features |
|
|
| 7. **Initialization Strategy** |
|    - **Kaiming (He) Normal** for Conv2d weights |
|      - Designed for ReLU activations |
|      - `std = sqrt(2 / fan_in)` |
|    - **Constant initialization** for BatchNorm |
|      - weight = 1, bias = 0 |
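
As a worked example of the `std = sqrt(2 / fan_in)` rule, the sketch below evaluates it for a few of the 3×3 convolutions in this network, taking `fan_in = in_channels × k × k`. The helper name `kaiming_std` is hypothetical. In PyTorch this rule corresponds to `nn.init.kaiming_normal_` with `nonlinearity="relu"`; whether `model_cifar.py` uses `fan_in` or `fan_out` mode is not shown here, so the `fan_in` formula is taken at face value.

```python
import math


def kaiming_std(in_channels, kernel_size):
    """std = sqrt(2 / fan_in) with fan_in = in_channels * k * k."""
    fan_in = in_channels * kernel_size * kernel_size
    return math.sqrt(2.0 / fan_in)


for name, c_in, k in [("conv1", 3, 3), ("layer1 conv", 64, 3), ("layer4 conv2", 512, 3)]:
    print(f"{name}: fan_in={c_in * k * k}, std={kaiming_std(c_in, k):.4f}")
```

Wider layers get a smaller standard deviation, which keeps the variance of the activations roughly constant from layer to layer.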
|
|
| ### Architecture Flow Diagram |
|
|
| ``` |
|                Input Image (32×32×3, RF=1×1) |
|                              ↓ |
| ┌─────────────────────────────────────────────────────────┐ |
| │ STEM: Conv 3×3 → BN → ReLU                              │ |
| │ Output: 32×32×64, RF=3×3                                │ |
| └─────────────────────────────────────────────────────────┘ |
|                              ↓ |
| ┌─────────────────────────────────────────────────────────┐ |
| │ STAGE 1: BasicBlock (64 channels, stride=1)             │ |
| │ Conv 3×3 → BN → ReLU → Conv 3×3 → BN → (+) → ReLU       │ |
| │ Output: 32×32×64, RF=7×7                                │ |
| │ Skip: Identity (no projection needed)                   │ |
| └─────────────────────────────────────────────────────────┘ |
|                              ↓  [Spatial: 32×32, Channels: 64, RF: 7×7] |
| ┌─────────────────────────────────────────────────────────┐ |
| │ STAGE 2: BasicBlock (128 channels, stride=2)            │ |
| │ Conv 3×3,s2 → BN → ReLU → Conv 3×3 → BN → (+) → ReLU    │ |
| │ Output: 16×16×128, RF=13×13                             │ |
| │ Skip: Conv 1×1,s2 (projection: 64→128)                  │ |
| └─────────────────────────────────────────────────────────┘ |
|                              ↓  [Spatial: 16×16, Channels: 128, RF: 13×13] |
| ┌─────────────────────────────────────────────────────────┐ |
| │ STAGE 3: BasicBlock (256 channels, stride=2)            │ |
| │ Conv 3×3,s2 → BN → ReLU → Conv 3×3 → BN → (+) → ReLU    │ |
| │ Output: 8×8×256, RF=25×25                               │ |
| │ Skip: Conv 1×1,s2 (projection: 128→256)                 │ |
| └─────────────────────────────────────────────────────────┘ |
|                              ↓  [Spatial: 8×8, Channels: 256, RF: 25×25] |
| ┌─────────────────────────────────────────────────────────┐ |
| │ STAGE 4: BasicBlock (512 channels, stride=2)            │ |
| │ Conv 3×3,s2 → BN → ReLU → Conv 3×3 → BN → (+) → ReLU    │ |
| │ Output: 4×4×512, RF=49×49 (exceeds 32×32!)              │ |
| │ Skip: Conv 1×1,s2 (projection: 256→512)                 │ |
| └─────────────────────────────────────────────────────────┘ |
|                              ↓  [Spatial: 4×4, Channels: 512, RF: 49×49] |
| ┌─────────────────────────────────────────────────────────┐ |
| │ HEAD: Global Average Pooling → FC                       │ |
| │ AdaptiveAvgPool2d(1,1) → Flatten → Linear(512→100)      │ |
| │ Output: 100 class logits                                │ |
| └─────────────────────────────────────────────────────────┘ |
|                              ↓ |
|                  Predictions (100 classes) |
| ``` |
|
|
| **Key Design Choices:** |
| - ✅ **CIFAR-specific stem**: 3×3 conv instead of 7×7 (ImageNet-style) |
| - ✅ **No aggressive downsampling**: Preserves spatial information for 32×32 images |
| - ✅ **Lightweight**: 1 block per stage instead of [3,4,6,3] for efficient training |
| - ✅ **Residual connections**: Enable gradient flow for deeper networks |
| - ✅ **Global Average Pooling**: Reduces overfitting vs fully-connected layers |
| - ✅ **Progressive RF growth**: Each layer sees more context (7→13→25→49 pixels) |
|
|
| ### Training Hyperparameters |
| ``` |
| Epochs: 100 |
| Batch Size: 512 |
| Optimizer: SGD with Nesterov momentum |
| Momentum: 0.9 |
| Weight Decay: 1e-4 |
| Label Smoothing: 0.1 |
| Mixed Precision: Enabled (AMP) |
| Gradient Clipping: 1.0 |
| |
| # OneCycle Learning Rate Schedule |
| LR Schedule: OneCycle (Custom) |
|   - Phase 1 (Epochs 0-40): 0.01 → 0.1 (warmup) |
|   - Phase 2 (Epochs 41-81): 0.1 → 0.01 (cooldown) |
|   - Phase 3 (Epochs 82-99): 0.01 → 0.001 (annihilation) |
| ``` |
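
To illustrate what the `Label Smoothing: 0.1` setting does to the loss, here is a small sketch in plain Python. It mixes the one-hot target with a uniform distribution over the K classes, `(1 - eps) * one_hot + eps / K`, which is the convention used by `torch.nn.CrossEntropyLoss(label_smoothing=...)`. Whether the training script uses that class is an assumption; the function name and the logits are hypothetical.

```python
import math


def smoothed_cross_entropy(logits, target, smoothing=0.1):
    """Cross-entropy against the target (1 - eps) * one_hot + eps / K."""
    k = len(logits)
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))  # stable log-sum-exp
    log_probs = [x - log_z for x in logits]
    loss = 0.0
    for i, lp in enumerate(log_probs):
        q = smoothing / k + (1.0 - smoothing) * (1.0 if i == target else 0.0)
        loss -= q * lp
    return loss


logits = [2.0, 0.5, -1.0, 0.0]  # hypothetical logits for a 4-class problem
plain = smoothed_cross_entropy(logits, target=0, smoothing=0.0)
smooth = smoothed_cross_entropy(logits, target=0, smoothing=0.1)
print(f"plain={plain:.4f} smoothed={smooth:.4f}")
```

With smoothing, a confident correct prediction is penalized slightly more than under plain cross-entropy, which discourages the network from pushing logits to extremes.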
|
|
| ### Data Augmentation |
| Using Albumentations library: |
| - **Training:** |
|   - Pad 32→36, then random crop 36→32 |
|   - Horizontal flip (p=0.5) |
|   - ShiftScaleRotate (shift=0.05, scale=0.05, rotate=5°, p=0.3) |
|   - CoarseDropout/Cutout (16×16, p=0.4) |
|   - Color jitter (brightness, contrast, saturation, hue, p=0.4) |
|   - Normalization (CIFAR-100 mean/std) |
|
|
| - **Testing:** |
|   - Normalization only |
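
The pad-then-crop and Cutout steps are easy to misread, so the sketch below spells out their geometry on a dummy single-channel image using only the standard library. It is a simplified stand-in, not the project's Albumentations pipeline: zero padding, nested lists instead of NumPy arrays, and the function names are all assumptions made for illustration.

```python
import random


def pad_and_random_crop(img, pad=2, rng=random):
    """Zero-pad a square 2D image by `pad` pixels per side, then crop back to the original size."""
    n = len(img)
    padded = [[0] * (n + 2 * pad) for _ in range(n + 2 * pad)]
    for r in range(n):
        for c in range(n):
            padded[r + pad][c + pad] = img[r][c]
    top = rng.randint(0, 2 * pad)    # randint bounds are inclusive
    left = rng.randint(0, 2 * pad)
    return [row[left:left + n] for row in padded[top:top + n]]


def cutout(img, size=16, p=0.4, rng=random):
    """With probability p, zero out one size x size square at a random position."""
    if rng.random() >= p:
        return img
    n = len(img)
    top = rng.randint(0, n - size)
    left = rng.randint(0, n - size)
    out = [row[:] for row in img]
    for r in range(top, top + size):
        for c in range(left, left + size):
            out[r][c] = 0
    return out


rng = random.Random(0)
image = [[1] * 32 for _ in range(32)]  # dummy 32x32 image of ones
augmented = cutout(pad_and_random_crop(image, pad=2, rng=rng), size=16, p=1.0, rng=rng)
print(len(augmented), len(augmented[0]), sum(map(sum, augmented)))
```

Padding by 2 pixels per side gives the 36×36 canvas, so the crop shifts the image by at most 2 pixels in each direction.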
|
|
| ## Training Results |
|
|
| ### Training Curves |
|
|
|  |
|
|
| The training curves show: |
| - **Steady convergence** with minimal overfitting |
| - **Effective learning rate schedule** with OneCycle policy |
| - **Generalization gap** maintained below 5% throughout training |
| - **Final training accuracy:** 80.47% |
|
|
| ### Learning Rate Schedule |
|
|
|  |
|
|
| The OneCycle LR schedule implementation: |
| 1. **Warmup Phase (41 epochs):** Linear increase from 0.01 to 0.1 |
| 2. **Cooldown Phase (41 epochs):** Linear decrease from 0.1 to 0.01 |
| 3. **Annihilation Phase (18 epochs):** Linear decrease from 0.01 to 0.001 |
|
|
| This schedule helps the model: |
| - Escape local minima early in training |
| - Find a wide minimum for better generalization |
| - Fine-tune with very small learning rates at the end |
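
The three phases can be written as one piecewise-linear function of the epoch index. The sketch below is an assumption about how the interpolation is done, not the project's actual scheduler code: each phase moves linearly towards its end value and would reach it at the first epoch of the next phase. This convention was chosen because it reproduces the `Final Learning Rate: 0.001500` reported in the training log at epoch 99. The function name `onecycle_lr` is hypothetical.

```python
def onecycle_lr(epoch):
    """Piecewise-linear learning rate for epochs 0..99."""
    phases = [
        # (first_epoch, length, lr_from, lr_to)
        (0, 41, 0.01, 0.1),     # warmup:       epochs 0-40
        (41, 41, 0.1, 0.01),    # cooldown:     epochs 41-81
        (82, 18, 0.01, 0.001),  # annihilation: epochs 82-99
    ]
    for start, length, lr_from, lr_to in phases:
        if start <= epoch < start + length:
            frac = (epoch - start) / length  # assumed: end value is reached at the next phase start
            return lr_from + frac * (lr_to - lr_from)
    raise ValueError("epoch must be in [0, 99]")


for e in (0, 40, 41, 81, 82, 99):
    print(e, round(onecycle_lr(e), 6))
```

In a PyTorch training loop, one way to apply such a function is to set `group["lr"] = onecycle_lr(epoch)` for every `group` in `optimizer.param_groups` at the start of each epoch.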
|
|
| ### Per-Class Performance |
|
|
|  |
|
|
| **Top 5 Best Performing Classes:** |
| 1. **wardrobe** - F1: 0.9458 (Precision: 0.9320, Recall: 0.9600) |
| 2. **sunflower** - F1: 0.9381 (Precision: 0.9681, Recall: 0.9100) |
| 3. **poppy** - F1: 0.9315 (Precision: 0.9444, Recall: 0.9189) |
| 4. **can** - F1: 0.9310 (Precision: 0.9000, Recall: 0.9643) |
| 5. **skyscraper** - F1: 0.9100 (Precision: 0.9100, Recall: 0.9100) |
|
|
| **Most Challenging Classes:** |
| - **boy** - F1: 0.4286 (Fine-grained human features) |
| - **girl** - F1: 0.4646 (Similar to boy) |
| - **baby** - F1: 0.5079 (Fine-grained human features) |
| - **man** - F1: 0.5758 (Similar to boy) |
| - **plate** - F1: 0.5797 (Simple objects, easily confused) |
|
|
| The model performs exceptionally well on distinct objects (flowers, buildings, furniture) but struggles with fine-grained human categorization, which is expected for CIFAR-100's 32×32 resolution. |
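
The per-class F1 values above are the harmonic mean of precision and recall. A minimal sketch of that relationship, checked against two of the listed classes (the function name is hypothetical):

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


wardrobe_f1 = f1_score(0.9320, 0.9600)
skyscraper_f1 = f1_score(0.9100, 0.9100)
print(round(wardrobe_f1, 4), round(skyscraper_f1, 4))
```

Because the harmonic mean is dominated by the smaller of the two values, a class with high precision but low recall (or the reverse) still gets a low F1.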
|
|
| ## Model Architecture Summary |
|
|
| From `model_cifar.py`: |
|
|
| | Component | Specification | |
| |-----------|---------------| |
| | **Model Name** | ResNet34 (CIFAR-optimized) | |
| | **Total Parameters** | 4,949,412 (~5M) | |
| | **Architecture Depth** | 10 weight layers (1 initial + 8 residual + 1 FC) | |
| | **Residual Blocks** | 4 BasicBlocks (1 per stage) | |
| | **Channel Progression** | 3 → 64 → 128 → 256 → 512 → 100 | |
| | **Spatial Downsampling** | 32×32 → 16×16 → 8×8 → 4×4 → 1×1 | |
| | **Receptive Field Growth** | 1×1 → 3×3 → 7×7 → 13×13 → 25×25 → 49×49 | |
| | **Skip Connections** | 4 (1 identity + 3 projection shortcuts) | |
| | **Pooling Strategy** | Global Average Pooling (4×4 → 1×1) | |
| | **Initialization** | Kaiming Normal (He) for Conv, Constant for BN | |
| | **Downsampling Method** | Strided convolutions (no MaxPool) | |
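
The parameter total can be reproduced with simple arithmetic from the layer shapes. The sketch below assumes bias-free convolutions (as in the `conv1` definition from `model_cifar.py`), two learnable parameters per BatchNorm channel, and a BatchNorm after each 1×1 projection shortcut, as shown in the stage listings. The helper names are hypothetical.

```python
def conv_params(c_in, c_out, k):
    """Weights of a k x k convolution without bias."""
    return c_in * c_out * k * k


def bn_params(c):
    """BatchNorm weight + bias (running statistics are buffers, not parameters)."""
    return 2 * c


def basic_block_params(c_in, c_out, stride):
    n = conv_params(c_in, c_out, 3) + bn_params(c_out)
    n += conv_params(c_out, c_out, 3) + bn_params(c_out)
    if stride != 1 or c_in != c_out:  # projection shortcut: 1x1 conv + BN
        n += conv_params(c_in, c_out, 1) + bn_params(c_out)
    return n


total = conv_params(3, 64, 3) + bn_params(64)  # stem
for c_in, c_out, stride in [(64, 64, 1), (64, 128, 2), (128, 256, 2), (256, 512, 2)]:
    total += basic_block_params(c_in, c_out, stride)
total += 512 * 100 + 100  # fc weight + bias
print(f"{total:,}")
```

Under these assumptions the sum comes to 4,949,412, with layer4 alone contributing about 3.7M of it.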
|
|
| **Why This Architecture Works for CIFAR-100:** |
|
|
| 1. **Right-sized capacity**: 5M parameters balance expressiveness against overfitting risk |
| 2. **Preserved resolution**: No aggressive downsampling maintains spatial detail in 32×32 images |
| 3. **Sufficient receptive field**: 49×49 RF exceeds the input size (32×32), capturing image-wide context |
| 4. **Progressive downsampling**: 3 stride-2 ops (vs 1 MaxPool + 4 stride-2 convs in an ImageNet ResNet) |
| 5. **Residual learning**: Skip connections enable gradient flow through 10 weight layers |
| 6. **Efficient computation**: Lightweight design trains in ~2-3 hours on a single GPU |
|
|
| **Receptive Field Analysis:** |
| - By **layer2** (16×16×128): RF = 13×13 → spans ~40% of each image side |
| - By **layer3** (8×8×256): RF = 25×25 → spans ~78% of each image side |
| - By **layer4** (4×4×512): RF = 49×49 → larger than the **full image** |
| - After global average pooling, every feature aggregates information from the entire input image |
|
|
| ## Project Structure |
|
|
| ``` |
| CIFAR100/ |
| ├── main.py                     # Main training script with OneCycle LR |
| ├── model_cifar.py              # Custom ResNet architecture (5M params) |
| │   ├── BasicBlock              # 2-layer residual block with skip connection |
| │   └── ResNet34                # CIFAR-optimized ResNet variant |
| ├── train.py                    # Training and evaluation loops |
| ├── preprocess.py               # Data loading with Albumentations |
| ├── visualization.py            # Metrics calculation and plotting |
| ├── inference.py                # Model inference utilities |
| ├── app.py                      # Gradio web interface for demo |
| ├── run_complete_training.py    # Full training pipeline with logging |
| ├── requirements.txt            # Python dependencies |
| ├── log/                        # Training logs |
| │   └── training_complete_20251010-103227.log |
| └── plots_complete/             # Training visualizations |
|     ├── training_curves.png |
|     ├── learning_rate_schedule.png |
|     ├── class_metrics.png |
|     ├── confusion_matrix.png |
|     └── classification_report.txt |
| ``` |
|
|
| ## Quick Start |
|
|
| ### Installation |
|
|
| ```bash |
| # Clone the repository |
| git clone <your-repo-url> |
| cd CIFAR100 |
| |
| # Install dependencies |
| pip install -r requirements.txt |
| ``` |
|
|
| ### Training |
|
|
| ```bash |
| # Train with OneCycle LR for 100 epochs |
| python main.py \ |
| --scheduler onecycle \ |
| --epochs 100 \ |
| --batch_size 512 \ |
| --lr 0.1 \ |
| --momentum 0.9 \ |
| --weight_decay 1e-4 \ |
| --amp \ |
| --plot_training \ |
| --plot_evaluation |
| |
| # Or use the complete training script with logging |
| python run_complete_training.py |
| ``` |
|
|
| ### Inference |
|
|
| ```bash |
| # Run interactive web demo |
| python app.py |
| |
| # Or use inference script |
| python inference.py --image path/to/image.jpg --model snapshots/best_model.pth |
| ``` |
|
|
| ## Key Features |
|
|
| ### 1. **OneCycle Learning Rate Policy** |
| Implements the OneCycle LR schedule from the paper "Super-Convergence: Very Fast Training of Neural Networks". The policy is designed to give: |
| - Faster convergence |
| - Better generalization |
| - Higher final accuracy |
|
|
| ### 2. **Comprehensive Metrics Logging** |
| After each training run, the script automatically outputs: |
| - Training and test accuracy/loss curves |
| - Top-1, Top-3, Top-5 accuracies |
| - Precision, Recall, F1-Score (macro and weighted) |
| - Per-class performance breakdown |
| - Confusion matrix and classification report |
|
|
| ### 3. **Mixed Precision Training (AMP)** |
| - Up to 2-3× faster training on GPUs with Tensor Cores (the actual speedup depends on hardware and model) |
| - Reduced memory usage |
| - Maintains accuracy with float16/float32 mixed precision |
|
|
| ### 4. **Advanced Data Augmentation** |
| Uses Albumentations for efficient augmentation: |
| - Often faster than torchvision transforms on NumPy image arrays |
| - More augmentation options |
| - Runs on the CPU during data loading, so it adds little overhead to GPU training |
|
|
| ### 5. **Model Checkpointing** |
| - Automatic snapshot saving at specified intervals |
| - Best model tracking based on test accuracy |
| - Resume training from any checkpoint |
|
|
| ## Detailed Training Log |
|
|
| Full training logs are available in `log/training_complete_20251010-103227.log`, including: |
| - Per-epoch train/test loss and accuracy |
| - Learning rate at each epoch |
| - Final comprehensive evaluation with per-class metrics |
| - Training time and resource utilization |
|
|
| Example final output: |
| ``` |
| ====================================================================== |
| TRAINING COMPLETED - FINAL EVALUATION |
| ====================================================================== |
|
|
| TRAINING SUMMARY |
| ---------------------------------------------------------------------- |
| Total Epochs Trained: 100 |
| Final Training Loss: 0.5584 |
| Final Training Accuracy: 80.47% |
| Best Training Accuracy: 81.05% (Epoch 94) |
| Final Learning Rate: 0.001500 |
|
|
| TEST/VALIDATION SUMMARY |
| ---------------------------------------------------------------------- |
| Final Test Loss: 0.8985 |
| Final Test Accuracy: 76.68% |
| Best Test Accuracy: 76.79% (Epoch 99) |
|
|
| COMPREHENSIVE TEST SET METRICS |
| ---------------------------------------------------------------------- |
| Top-1 Accuracy (Test): 76.68% |
| Top-3 Accuracy (Test): 90.95% |
| Top-5 Accuracy (Test): 94.07% |
| ``` |
| |
| ## Requirements Met |
| |
| - ✅ **Training from Scratch**: Custom ResNet (5M params) trained without pre-trained weights |
| - ✅ **CIFAR-100 Dataset**: All 100 classes used (50,000 train / 10,000 test) |
| - ✅ **Target Accuracy**: **76.68% achieved** (target: 73%) - **Exceeded by 3.68 percentage points** |
| - ✅ **Training Duration**: 100 epochs with OneCycle LR schedule |
| - ✅ **Modern Tools**: Extensive use of ChatGPT/Cursor for development |
| - ✅ **Comprehensive Evaluation**: Full metrics, plots, and detailed analysis |
| - ✅ **Model Architecture**: Custom lightweight ResNet optimized for CIFAR-100 |
| - ✅ **Reproducibility**: Complete logs, checkpoints, and configuration documented |
| |
| ## Technologies Used |
| |
| - **PyTorch** - Deep learning framework |
| - **Albumentations** - Data augmentation |
| - **Gradio** - Web interface for inference |
| - **scikit-learn** - Metrics calculation |
| - **matplotlib/seaborn** - Visualization |
| - **numpy** - Numerical operations |
| |
| ## Model Comparison |
| |
| | Model Variant | Parameters | Expected Accuracy | Notes | |
| |---------------|------------|-------------------|-------| |
| | **Our Model** (4 blocks) | **5M** | **76.68%** | Balanced efficiency & accuracy | |
| | Standard ResNet-18 | 11M | ~76-78% | Good baseline for CIFAR | |
| | Standard ResNet-34 | 21M | ~78-80% | More capacity, slower training | |
| | Wide-ResNet-28-10 | 36M | ~80-82% | State-of-art, requires more resources | |
| | PyramidNet | 26M | ~82-84% | Complex architecture | |
| |
| **Our lightweight design achieves competitive accuracy with 2-4× fewer parameters than standard ResNets.** |
| |
| ## Future Improvements |
| |
| Potential enhancements to reach higher accuracy (78%+): |
| 1. **Architecture upgrades**: |
| - Increase blocks per stage: [2, 2, 2, 2] or [3, 3, 3, 3] |
| - Try Wide-ResNet with wider channels |
| - Add Squeeze-and-Excitation (SE) blocks |
| 2. **Training tricks**: |
| - Mixup (α=0.2) for better generalization |
| - CutMix for spatial regularization |
| - AutoAugment or RandAugment policies |
| 3. **Regularization**: |
| - Stochastic Depth (survival probability 0.8-0.9) |
| - DropBlock for spatial dropout |
| - Increased label smoothing (0.2) |
| 4. **Ensemble methods**: |
| - Train 3-5 models with different seeds |
| - Snapshot ensembles (save last N checkpoints) |
| 5. **Longer training**: |
| - 200-300 epochs with cosine annealing |
| - Multi-step or exponential LR decay |
| 6. **Knowledge distillation**: |
| - Train larger teacher model first |
| - Use soft targets for student training |
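
As a concrete picture of the Mixup suggestion above, the sketch below blends two samples and their one-hot labels with a coefficient drawn from Beta(α, α). It uses plain Python lists and the standard library only; the function name and the toy inputs are hypothetical, and this technique is not part of the current training code.

```python
import random


def mixup(x1, y1, x2, y2, alpha=0.2, rng=random):
    """Blend two samples and their one-hot labels with lam ~ Beta(alpha, alpha)."""
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam


rng = random.Random(0)
x, y, lam = mixup([1.0, 0.0], [1, 0, 0], [0.0, 1.0], [0, 0, 1], alpha=0.2, rng=rng)
print(round(lam, 3), [round(v, 3) for v in y])
```

With α=0.2 the Beta distribution puts most of its mass near 0 and 1, so most mixed samples stay close to one of the two originals.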
| |
| ## Technical Implementation Details |
| |
| ### Architecture Design Rationale |
| |
| **Why a lightweight ResNet variant?** |
| |
| 1. **CIFAR-100 Image Size**: At 32×32 pixels, CIFAR images contain far less spatial information than ImageNet images (224×224) |
|    - Standard ResNet-34's [3,4,6,3] block structure is over-parameterized |
|    - Our [1,1,1,1] structure provides sufficient capacity without overfitting |
| |
| 2. **Parameter Efficiency**: |
| - 5M parameters: Sweet spot between underfitting and overfitting |
| - Faster training: 100 epochs in ~2-3 hours vs 5-6 hours for ResNet-34 |
| - Lower memory footprint: Can use larger batch sizes |
| |
| 3. **CIFAR-Specific Modifications**: |
|    - **3×3 initial conv** (vs 7×7): Preserves fine details in small images |
|    - **No MaxPool layer**: Maintains spatial resolution (32×32 → 4×4 over 4 stages) |
|    - **Stride-2 convolutions**: Gradual downsampling for feature hierarchy |
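
To see why the ImageNet-style stem is a poor fit for 32×32 inputs, the sketch below compares the spatial size after each stem using the usual output-size formula. The ImageNet stem parameters (7×7 conv with stride 2 and padding 3, then 3×3 max-pool with stride 2 and padding 1) are those of the standard torchvision ResNet; the helper name `out_size` is hypothetical.

```python
def out_size(n, kernel, stride, padding):
    """Spatial size after a conv or pooling layer (floor division, as in PyTorch)."""
    return (n + 2 * padding - kernel) // stride + 1


n = 32
imagenet_stem = out_size(out_size(n, 7, 2, 3), 3, 2, 1)  # 7x7/2 conv, then 3x3/2 max-pool
cifar_stem = out_size(n, 3, 1, 1)                        # single 3x3/1 conv
print(imagenet_stem, cifar_stem)
```

The ImageNet stem leaves only an 8×8 map before the first residual stage, and three further stride-2 stages would shrink it to 1×1, whereas the CIFAR stem keeps the full 32×32 resolution and ends at 4×4.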
| |
| ### Code Reference |
| |
| From `model_cifar.py`: |
| ```python |
| import torch.nn as nn |
| |
| class ResNet34(nn.Module): |
|     # Excerpt: _make_layer() and forward() are not shown here. |
|     def __init__(self, num_classes=100): |
|         super().__init__() |
|         self.in_channels = 64 |
| |
|         # CIFAR-specific: 3×3 conv, no maxpool |
|         self.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False) |
|         self.bn1 = nn.BatchNorm2d(64) |
|         self.relu = nn.ReLU(inplace=True) |
| |
|         # 4 stages with 1 BasicBlock each |
|         self.layer1 = self._make_layer(64, 1)             # 32×32×64 |
|         self.layer2 = self._make_layer(128, 1, stride=2)  # 16×16×128 |
|         self.layer3 = self._make_layer(256, 1, stride=2)  # 8×8×256 |
|         self.layer4 = self._make_layer(512, 1, stride=2)  # 4×4×512 |
| |
|         # Classification head |
|         self.avgpool = nn.AdaptiveAvgPool2d((1, 1)) |
|         self.fc = nn.Linear(512, num_classes) |
| ``` |
| |
| **BasicBlock** (2 conv layers + skip connection): |
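| ```python |
| import torch.nn as nn |
| import torch.nn.functional as F |
| |
| class BasicBlock(nn.Module): |
|     # Excerpt: __init__ (conv1, bn1, conv2, bn2 and the skip path) is not shown here. |
|     def forward(self, x): |
|         identity = x                           # Identity skip; stride-2 blocks need the 1×1 projection of x here |
|         out = F.relu(self.bn1(self.conv1(x)))  # Conv → BN → ReLU |
|         out = self.bn2(self.conv2(out))        # Conv → BN |
|         out += identity                        # Add skip connection |
|         out = F.relu(out)                      # ReLU |
|         return out |
| ``` |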
|
|
| ## References |
|
|
| **Papers:** |
| - He et al., "Deep Residual Learning for Image Recognition" (2016) - ResNet architecture |
| - Smith, "Super-Convergence: Very Fast Training of Neural Networks" (2018) - OneCycle LR |
| - Krizhevsky, "Learning Multiple Layers of Features from Tiny Images" (2009) - CIFAR-100 |
|
|
| **Implementation Resources:** |
| - PyTorch official ResNet implementation |
| - Albumentations library for efficient augmentation |
| - torchvision.datasets for CIFAR-100 loading |
|
|
| ## License |
|
|
| MIT License |
|
|
| ## Acknowledgments |
|
|
| This project was developed with extensive assistance from: |
| - ChatGPT for architecture design and debugging |
| - Cursor AI for code completion and refactoring |
| - PyTorch and torchvision communities for reference implementations |
|
|
| --- |
|
|
| **Note:** Training logs, model checkpoints, and detailed per-class metrics are available in the `log/` and `plots_complete/` directories. |
|
|