---
title: CIFAR-100 Image Classifier
emoji: 🎯
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: "5.49.1"
app_file: app.py
pinned: false
license: mit
---
# CIFAR-100 ResNet Training from Scratch
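A lightweight ResNet (the `ResNet34` class in `model_cifar.py`, with one BasicBlock per stage and ~5M parameters) trained from scratch on the CIFAR-100 dataset, achieving **76.68% top-1 accuracy** in 100 epochs with OneCycle learning rate scheduling.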
## Project Overview
This project demonstrates training a ResNet architecture from scratch on the CIFAR-100 dataset without using any pre-trained models. The implementation leverages modern deep learning techniques including data augmentation, OneCycle LR scheduling, and mixed precision training.
## Results Summary
### Performance Metrics (100 Epochs)
| Metric | Score |
|--------|-------|
| **Top-1 Accuracy** | **76.68%** ✅ (Target: 73%) |
| **Top-3 Accuracy** | **90.95%** |
| **Top-5 Accuracy** | **94.07%** |
| **Best Test Accuracy** | **76.79%** (Epoch 99) |
| **Macro F1-Score** | **0.7670** |
| **Weighted F1-Score** | **0.7668** |
### Averaged Metrics
**Macro-Averaged (unweighted):**
- Precision: 0.7708
- Recall: 0.7668
- F1-Score: 0.7670
**Weighted-Averaged (by class support):**
- Precision: 0.7708
- Recall: 0.7668
- F1-Score: 0.7668
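The difference between the two averaging modes can be sketched in plain Python. The per-class F1 scores and supports below are made-up illustration values, not numbers from this run, and the helper names are hypothetical.

```python
def macro_average(scores):
    """Unweighted mean: every class counts equally."""
    return sum(scores) / len(scores)


def weighted_average(scores, supports):
    """Mean weighted by the number of test samples (support) per class."""
    total = sum(supports)
    return sum(s * n for s, n in zip(scores, supports)) / total


# Hypothetical per-class F1 scores and supports (illustration only).
f1_per_class = [0.90, 0.60, 0.75]
support = [100, 50, 50]

macro_f1 = macro_average(f1_per_class)
weighted_f1 = weighted_average(f1_per_class, support)
print(f"macro F1:    {macro_f1:.4f}")
print(f"weighted F1: {weighted_f1:.4f}")
```

When every class has the same support the two averages coincide, which is why the macro and weighted figures reported above are nearly identical.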
## Training Configuration
### Model Architecture
**Custom Lightweight ResNet for CIFAR-100**
A specially designed ResNet variant optimized for small image classification:
```
Model: ResNet34 (CIFAR-optimized)
Total Parameters: 4,949,412 (~5M)
Trainable Parameters: 4,949,412
Input Size: 32×32×3 (RGB)
Output Classes: 100
```
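The parameter total can be reproduced by hand. The sketch below assumes bias-free convolutions (as shown for `conv1` in the code reference), BatchNorm with one scale and one shift per channel, a 1×1 conv + BN projection shortcut in stages 2 to 4, and a final linear layer with bias. The helper names are hypothetical; under these assumptions the count matches the reported figure.

```python
def conv_params(c_in, c_out, k):
    """Weights of a bias-free k×k convolution."""
    return c_in * c_out * k * k


def bn_params(channels):
    """BatchNorm2d learnable parameters: one scale and one shift per channel."""
    return 2 * channels


def basic_block_params(c_in, c_out, projection):
    """Two 3×3 convs with BN, plus an optional 1×1 conv + BN shortcut."""
    total = conv_params(c_in, c_out, 3) + bn_params(c_out)
    total += conv_params(c_out, c_out, 3) + bn_params(c_out)
    if projection:
        total += conv_params(c_in, c_out, 1) + bn_params(c_out)
    return total


stem = conv_params(3, 64, 3) + bn_params(64)
stages = (
    basic_block_params(64, 64, projection=False)
    + basic_block_params(64, 128, projection=True)
    + basic_block_params(128, 256, projection=True)
    + basic_block_params(256, 512, projection=True)
)
fc = 512 * 100 + 100  # Linear weights + bias

total_params = stem + stages + fc
print(f"{total_params:,}")  # 4,949,412
```

Roughly three quarters of the parameters sit in the 512-channel stage, so widening or deepening that stage dominates any change in model size.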
**Architecture Details** (from `model_cifar.py`):
### Layer-by-Layer Feature Map Progression
| Layer | Operation | Kernel | Stride | Padding | Input Size | Output Size | Channels | Receptive Field |
|-------|-----------|--------|--------|---------|------------|-------------|----------|-----------------|
| **Input** | - | - | - | - | 32×32 | 32×32 | 3 | 1×1 |
| **conv1** | Conv2d | 3×3 | 1 | 1 | 32×32×3 | 32×32×64 | 64 | **3×3** |
| **bn1+relu** | BN+ReLU | - | - | - | 32×32×64 | 32×32×64 | 64 | 3×3 |
| **layer1** | BasicBlock | 3×3,3×3 | 1,1 | 1,1 | 32×32×64 | **32×32×64** | 64 | **7×7** |
| **layer2** | BasicBlock | 3×3,3×3 | 2,1 | 1,1 | 32×32×64 | **16×16×128** | 128 | **13×13** |
| **layer3** | BasicBlock | 3×3,3×3 | 2,1 | 1,1 | 16×16×128 | **8×8×256** | 256 | **25×25** |
| **layer4** | BasicBlock | 3×3,3×3 | 2,1 | 1,1 | 8×8×256 | **4×4×512** | 512 | **49×49** |
| **avgpool** | AdaptiveAvgPool2d | 4×4 | - | - | 4×4×512 | 1×1×512 | 512 | **Full image** |
| **fc** | Linear | - | - | - | 512 | 100 | 100 | - |
**Key Observations:**
- **Receptive field at layer4**: 49×49 pixels (larger than the **full 32×32 image**, about 1.5× per side)
- **Spatial downsampling**: 3 stride-2 operations reduce 32×32 → 4×4 (8× reduction per side)
- **Channel expansion**: 3 → 64 → 128 → 256 → 512 (progressive feature richness)
- **Feature map efficiency**: No information loss from MaxPooling (common in ImageNet models)
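The spatial sizes and receptive fields can be checked with the standard recurrences: a layer with kernel `k`, stride `s` and padding `p` maps a side length `n` to `(n + 2p - k) // s + 1`, and grows the receptive field by `(k - 1)` times the current jump (the distance in input pixels between neighbouring feature-map positions). A minimal sketch, following only the main path of each block; the `trace` helper and the layer labels are hypothetical:

```python
def trace(layers, size=32):
    """Track spatial size, receptive field and jump through a conv stack.

    Each layer is (name, kernel, stride, padding). `jump` is the distance in
    input pixels between neighbouring positions of the current feature map.
    """
    rf, jump = 1, 1
    rows = []
    for name, k, s, p in layers:
        size = (size + 2 * p - k) // s + 1   # standard conv output size
        rf += (k - 1) * jump                 # RF grows by (k-1) * current jump
        jump *= s                            # a stride widens all later steps
        rows.append((name, size, rf))
    return rows


# Main path only: the 1×1 skip projections have a smaller receptive field.
layers = [
    ("conv1", 3, 1, 1),
    ("layer1.conv1", 3, 1, 1), ("layer1.conv2", 3, 1, 1),
    ("layer2.conv1", 3, 2, 1), ("layer2.conv2", 3, 1, 1),
    ("layer3.conv1", 3, 2, 1), ("layer3.conv2", 3, 1, 1),
    ("layer4.conv1", 3, 2, 1), ("layer4.conv2", 3, 1, 1),
]

rows = trace(layers)
for name, size, rf in rows:
    print(f"{name:13s} size={size:2d}x{size:<2d} RF={rf}x{rf}")
```

With these recurrences the stage outputs come out at 7, 13, 25 and 49 pixels of receptive field, ending on a 4×4 feature map.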
### Detailed Architecture Components
1. **Initial Convolution Block**
```
Input: 32×32×3 → Conv2d(3→64, k=3×3, s=1, p=1) → BN → ReLU → Output: 32×32×64
Receptive Field: 1×1 → 3×3
```
- CIFAR-optimized: 3×3 conv (not 7×7 like ImageNet ResNets)
- Preserves spatial resolution (no stride-2 or MaxPool)
- Captures fine-grained details essential for small images
2. **Layer 1: Residual Stage 1** (64 channels, no downsampling)
```
Input: 32×32×64
BasicBlock:
├─ Conv(64→64, k=3×3, s=1, p=1) → BN → ReLU → 32×32×64
├─ Conv(64→64, k=3×3, s=1, p=1) → BN → 32×32×64
└─ Add(identity) → ReLU → Output: 32×32×64
Receptive Field: 3×3 → 7×7
```
- No spatial downsampling (stride=1)
- Identity skip connection (no projection needed)
- RF grows by 4 pixels (2 conv layers × 2 pixels each)
3. **Layer 2: Residual Stage 2** (128 channels, downsample)
```
Input: 32×32×64
BasicBlock:
├─ Conv(64→128, k=3×3, s=2, p=1) → BN → ReLU → 16×16×128
├─ Conv(128→128, k=3×3, s=1, p=1) → BN → 16×16×128
├─ Skip: Conv(64→128, k=1×1, s=2) → BN → 16×16×128 (projection)
└─ Add(skip) → ReLU → Output: 16×16×128
Receptive Field: 7×7 → 13×13
```
- **Spatial downsampling**: 32×32 → 16×16 (stride=2 in first conv)
- **Channel expansion**: 64 → 128
- **Projection shortcut**: 1×1 conv matches dimensions
- RF grows by 6 pixels: the stride-2 conv adds 2, and the second conv adds 4 because each step on the 16×16 grid spans 2 input pixels
4. **Layer 3: Residual Stage 3** (256 channels, downsample)
```
Input: 16×16×128
BasicBlock:
├─ Conv(128→256, k=3×3, s=2, p=1) → BN → ReLU → 8×8×256
├─ Conv(256→256, k=3×3, s=1, p=1) → BN → 8×8×256
├─ Skip: Conv(128→256, k=1×1, s=2) → BN → 8×8×256 (projection)
└─ Add(skip) → ReLU → Output: 8×8×256
Receptive Field: 13×13 → 25×25
```
- **Spatial downsampling**: 16×16 → 8×8
- **Channel expansion**: 128 → 256
- RF now spans 25 of the 32 pixels per side, most of the input image
5. **Layer 4: Residual Stage 4** (512 channels, downsample)
```
Input: 8×8×256
BasicBlock:
├─ Conv(256→512, k=3×3, s=2, p=1) → BN → ReLU → 4×4×512
├─ Conv(512→512, k=3×3, s=1, p=1) → BN → 4×4×512
├─ Skip: Conv(256→512, k=1×1, s=2) → BN → 4×4×512 (projection)
└─ Add(skip) → ReLU → Output: 4×4×512
Receptive Field: 25×25 → 49×49
```
- **Final spatial downsampling**: 8×8 → 4×4
- **Maximum channels**: 512 (highest feature richness)
- **RF exceeds input size**: 49×49 > 32×32
6. **Classification Head**
```
Input: 4×4×512
├─ AdaptiveAvgPool2d((1,1)) → 1×1×512 (global spatial pooling)
├─ Flatten → 512
└─ Linear(512 → 100) → 100 class logits
```
- Global Average Pooling: Each of 512 channels → single value
- Reduces overfitting vs fully-connected layers
- Translation invariant features
7. **Initialization Strategy**
- **Kaiming (He) Normal** for Conv2d weights
- Optimal for ReLU activations
- `std = sqrt(2 / fan_in)`
- **Constant initialization** for BatchNorm
- weight = 1, bias = 0
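As a quick numeric illustration of the `std = sqrt(2 / fan_in)` rule, the sketch below evaluates it for the first and last 3×3 convolutions. It assumes `fan_in = in_channels × kernel_height × kernel_width`; the helper name is hypothetical, and whether `model_cifar.py` uses the fan-in or fan-out mode of PyTorch's `kaiming_normal_` is not shown here.

```python
import math


def kaiming_normal_std(in_channels, kernel_size):
    """Std of He-normal init for a ReLU conv layer, using fan_in."""
    fan_in = in_channels * kernel_size * kernel_size
    return math.sqrt(2.0 / fan_in)


std_conv1 = kaiming_normal_std(3, 3)      # stem: fan_in = 3 * 3 * 3 = 27
std_layer4 = kaiming_normal_std(512, 3)   # last 3×3 conv: fan_in = 4608
print(f"conv1 std:  {std_conv1:.4f}")
print(f"layer4 std: {std_layer4:.4f}")
```

Wider layers get proportionally smaller initial weights, which keeps the variance of ReLU activations roughly constant from layer to layer.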
### Architecture Flow Diagram
```
Input Image (32×32×3, RF=1×1)
                            ↓
┌────────────────────────────────────────────────────────┐
│ STEM: Conv 3×3 → BN → ReLU                             │
│ Output: 32×32×64, RF=3×3                               │
└────────────────────────────────────────────────────────┘
                            ↓
┌────────────────────────────────────────────────────────┐
│ STAGE 1: BasicBlock (64 channels, stride=1)            │
│ Conv 3×3 → BN → ReLU → Conv 3×3 → BN → (+) → ReLU      │
│ Output: 32×32×64, RF=7×7                               │
│ Skip: Identity (no projection needed)                  │
└────────────────────────────────────────────────────────┘
                            ↓ [Spatial: 32×32, Channels: 64, RF: 7×7]
┌────────────────────────────────────────────────────────┐
│ STAGE 2: BasicBlock (128 channels, stride=2) ↓↓        │
│ Conv 3×3,s2 → BN → ReLU → Conv 3×3 → BN → (+) → ReLU   │
│ Output: 16×16×128, RF=13×13                            │
│ Skip: Conv 1×1,s2 (projection: 64→128)                 │
└────────────────────────────────────────────────────────┘
                            ↓ [Spatial: 16×16, Channels: 128, RF: 13×13]
┌────────────────────────────────────────────────────────┐
│ STAGE 3: BasicBlock (256 channels, stride=2) ↓↓        │
│ Conv 3×3,s2 → BN → ReLU → Conv 3×3 → BN → (+) → ReLU   │
│ Output: 8×8×256, RF=25×25                              │
│ Skip: Conv 1×1,s2 (projection: 128→256)                │
└────────────────────────────────────────────────────────┘
                            ↓ [Spatial: 8×8, Channels: 256, RF: 25×25]
┌────────────────────────────────────────────────────────┐
│ STAGE 4: BasicBlock (512 channels, stride=2) ↓↓        │
│ Conv 3×3,s2 → BN → ReLU → Conv 3×3 → BN → (+) → ReLU   │
│ Output: 4×4×512, RF=49×49 (exceeds 32×32!)             │
│ Skip: Conv 1×1,s2 (projection: 256→512)                │
└────────────────────────────────────────────────────────┘
                            ↓ [Spatial: 4×4, Channels: 512, RF: Full Image]
┌────────────────────────────────────────────────────────┐
│ HEAD: Global Average Pooling → FC                      │
│ AdaptiveAvgPool2d(1,1) → Flatten → Linear(512→100)     │
│ Output: 100 class logits                               │
└────────────────────────────────────────────────────────┘
                            ↓
Predictions (100 classes)
```
**Key Design Choices:**
- ✅ **CIFAR-specific stem**: 3×3 conv instead of 7×7 (ImageNet-style)
- ✅ **No aggressive downsampling**: Preserves spatial information for 32×32 images
- ✅ **Lightweight**: 1 block per stage instead of [3,4,6,3] for efficient training
- ✅ **Residual connections**: Enable gradient flow for deeper networks
- ✅ **Global Average Pooling**: Reduces overfitting vs fully-connected layers
- ✅ **Progressive RF growth**: Each stage sees more context (7→13→25→49 pixels)
### Training Hyperparameters
```
Epochs: 100
Batch Size: 512
Optimizer: SGD with Nesterov momentum
Momentum: 0.9
Weight Decay: 1e-4
Label Smoothing: 0.1
Mixed Precision: Enabled (AMP)
Gradient Clipping: 1.0
# OneCycle Learning Rate Schedule
LR Schedule: OneCycle (Custom)
- Phase 1 (Epochs 0-40): 0.01 → 0.1 (warmup)
- Phase 2 (Epochs 41-81): 0.1 → 0.01 (cooldown)
- Phase 3 (Epochs 82-99): 0.01 → 0.001 (annihilation)
```
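To make the `Label Smoothing: 0.1` setting concrete, here is a minimal pure-Python sketch of cross-entropy against a smoothed target. It assumes the common formulation in which the true class receives `(1 - ε) + ε/K` and every other class `ε/K` (the definition PyTorch's `nn.CrossEntropyLoss(label_smoothing=...)` uses); the function name and the example logits are hypothetical, and the training script may implement this differently.

```python
import math


def smoothed_cross_entropy(logits, target, smoothing=0.1):
    """Cross-entropy against a label-smoothed target distribution.

    The true class gets probability (1 - smoothing) + smoothing / K and every
    other class gets smoothing / K, where K is the number of classes.
    """
    k = len(logits)
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))  # stable logsumexp
    log_probs = [x - log_z for x in logits]
    loss = 0.0
    for i, lp in enumerate(log_probs):
        q = smoothing / k + ((1.0 - smoothing) if i == target else 0.0)
        loss -= q * lp
    return loss


logits = [4.0, 1.0, 0.5, 0.0]  # hypothetical 4-class logits, class 0 is correct
hard = smoothed_cross_entropy(logits, target=0, smoothing=0.0)
soft = smoothed_cross_entropy(logits, target=0, smoothing=0.1)
print(f"hard-label loss:     {hard:.4f}")
print(f"smoothed-label loss: {soft:.4f}")
```

A confident, correct prediction is penalised slightly more under smoothing, which discourages the network from pushing logits to extremes.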
### Data Augmentation
Using Albumentations library:
- **Training:**
- Random padding (32→36) + Random crop (36→32)
- Horizontal flip (p=0.5)
- ShiftScaleRotate (shift=0.05, scale=0.05, rotate=5°, p=0.3)
- CoarseDropout/Cutout (16×16, p=0.4)
- Color jitter (brightness, contrast, saturation, hue, p=0.4)
- Normalization (CIFAR-100 mean/std)
- **Testing:**
- Normalization only
## Training Results
### Training Curves
![Training Curves](plots_complete/training_curves.png)
The training curves show:
- **Steady convergence** with minimal overfitting
- **Effective learning rate schedule** with OneCycle policy
- **Generalization gap** maintained below 5 percentage points throughout training (final gap: 80.47% train vs 76.68% test, i.e. 3.79 points)
- **Final training accuracy:** 80.47%
### Learning Rate Schedule
![Learning Rate Schedule](plots_complete/learning_rate_schedule.png)
The OneCycle LR schedule implementation:
1. **Warmup Phase (41 epochs):** Linear increase from 0.01 to 0.1
2. **Cooldown Phase (41 epochs):** Linear decrease from 0.1 to 0.01
3. **Annihilation Phase (18 epochs):** Linear decrease from 0.01 to 0.001
This schedule helps the model:
- Escape local minima early in training
- Find a wide minimum for better generalization
- Fine-tune with very small learning rates at the end
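The three phases can be written as one piecewise-linear function of the 0-indexed epoch. This is a sketch, not the code in `main.py`: the function name, its parameters, and the choice to interpolate each phase up to the start of the next one are assumptions.

```python
def onecycle_lr(epoch, lr_min=0.01, lr_max=0.1, lr_final=0.001,
                warmup_end=41, cooldown_end=82, total_epochs=100):
    """Piecewise-linear learning rate for a 0-indexed epoch."""
    if epoch < warmup_end:                       # epochs 0-40: warmup
        t = epoch / warmup_end
        return lr_min + t * (lr_max - lr_min)
    if epoch < cooldown_end:                     # epochs 41-81: cooldown
        t = (epoch - warmup_end) / (cooldown_end - warmup_end)
        return lr_max + t * (lr_min - lr_max)
    # epochs 82-99: annihilation
    t = (epoch - cooldown_end) / (total_epochs - cooldown_end)
    return lr_min + t * (lr_final - lr_min)


for epoch in (0, 41, 82, 99):
    print(f"epoch {epoch:2d}: lr = {onecycle_lr(epoch):.6f}")
```

Under this interpolation the last epoch (99) runs at 0.0015 rather than 0.001, which agrees with the `Final Learning Rate: 0.001500` line in the training log.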
### Per-Class Performance
![Class Metrics](plots_complete/class_metrics.png)
**Top 5 Best Performing Classes:**
1. **wardrobe** - F1: 0.9458 (Precision: 0.9320, Recall: 0.9600)
2. **sunflower** - F1: 0.9381 (Precision: 0.9681, Recall: 0.9100)
3. **poppy** - F1: 0.9315 (Precision: 0.9444, Recall: 0.9189)
4. **can** - F1: 0.9310 (Precision: 0.9000, Recall: 0.9643)
5. **skyscraper** - F1: 0.9100 (Precision: 0.9100, Recall: 0.9100)
**Most Challenging Classes:**
- **boy** - F1: 0.4286 (Fine-grained human features)
- **girl** - F1: 0.4646 (Similar to boy)
- **baby** - F1: 0.5079 (Fine-grained human features)
- **man** - F1: 0.5758 (Similar to boy)
- **plate** - F1: 0.5797 (Simple objects, easily confused)
The model performs exceptionally well on distinct objects (flowers, buildings, furniture) but struggles with fine-grained human categorization, which is expected for CIFAR-100's 32×32 resolution.
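The per-class F1 scores follow from precision and recall as their harmonic mean, `F1 = 2PR / (P + R)`. The short check below recomputes F1 for the five best classes from the rounded precision and recall values listed above; the helper name is hypothetical.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) for the five best classes listed above.
reported = {
    "wardrobe": (0.9320, 0.9600, 0.9458),
    "sunflower": (0.9681, 0.9100, 0.9381),
    "poppy": (0.9444, 0.9189, 0.9315),
    "can": (0.9000, 0.9643, 0.9310),
    "skyscraper": (0.9100, 0.9100, 0.9100),
}

for name, (p, r, f1) in reported.items():
    print(f"{name:10s} recomputed F1 = {f1_score(p, r):.4f} (reported {f1:.4f})")
```

Small differences in the last digit are expected because the inputs are already rounded to four decimals.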
## Model Architecture Summary
From `model_cifar.py`:
| Component | Specification |
|-----------|---------------|
| **Model Name** | ResNet34 (CIFAR-optimized) |
| **Total Parameters** | 4,949,412 (~5M) |
| **Architecture Depth** | 10 weight layers (1 initial + 8 residual + 1 FC) |
| **Residual Blocks** | 4 BasicBlocks (1 per stage) |
| **Channel Progression** | 3 → 64 → 128 → 256 → 512 → 100 |
| **Spatial Downsampling** | 32×32 → 16×16 → 8×8 → 4×4 → 1×1 |
| **Receptive Field Growth** | 1×1 → 3×3 → 7×7 → 13×13 → 25×25 → 49×49 |
| **Skip Connections** | 4 (1 identity + 3 projection shortcuts) |
| **Pooling Strategy** | Global Average Pooling (4×4 → 1×1) |
| **Initialization** | Kaiming Normal (He) for Conv, Constant for BN |
| **Downsampling Method** | Strided convolutions (no MaxPool) |
**Why This Architecture Works for CIFAR-100:**
1. **Right-sized capacity**: 5M parameters balances expressiveness vs overfitting risk
2. **Preserved resolution**: No aggressive downsampling maintains spatial detail in 32×32 images
3. **Sufficient receptive field**: The 49×49 RF exceeds the input size (32×32), so the deepest features draw on context from across the image
4. **Progressive downsampling**: 3 stride-2 ops (vs 1 MaxPool + 4 stride-2 in ImageNet ResNet)
5. **Residual learning**: Skip connections enable gradient flow through 10 weight layers
6. **Efficient computation**: Lightweight design trains in ~2-3 hours on single GPU
**Receptive Field Analysis:**
- By **layer2** (16×16×128): RF = 13×13 → spans ~40% of the image width
- By **layer3** (8×8×256): RF = 25×25 → spans ~78% of the image width
- By **layer4** (4×4×512): RF = 49×49 → larger than the **full 32×32 image**
- After global average pooling, every output feature depends on the entire input image
## Project Structure
```
CIFAR100/
├── main.py                   # Main training script with OneCycle LR
├── model_cifar.py            # Custom ResNet architecture (5M params)
│   ├── BasicBlock            # 2-layer residual block with skip connection
│   └── ResNet34              # CIFAR-optimized ResNet variant
├── train.py                  # Training and evaluation loops
├── preprocess.py             # Data loading with Albumentations
├── visualization.py          # Metrics calculation and plotting
├── inference.py              # Model inference utilities
├── app.py                    # Gradio web interface for demo
├── run_complete_training.py  # Full training pipeline with logging
├── requirements.txt          # Python dependencies
├── log/                      # Training logs
│   └── training_complete_20251010-103227.log
└── plots_complete/           # Training visualizations
    ├── training_curves.png
    ├── learning_rate_schedule.png
    ├── class_metrics.png
    ├── confusion_matrix.png
    └── classification_report.txt
```
## Quick Start
### Installation
```bash
# Clone the repository
git clone <your-repo-url>
cd CIFAR100
# Install dependencies
pip install -r requirements.txt
```
### Training
```bash
# Train with OneCycle LR for 100 epochs
python main.py \
  --scheduler onecycle \
  --epochs 100 \
  --batch_size 512 \
  --lr 0.1 \
  --momentum 0.9 \
  --weight_decay 1e-4 \
  --amp \
  --plot_training \
  --plot_evaluation
# Or use the complete training script with logging
python run_complete_training.py
```
### Inference
```bash
# Run interactive web demo
python app.py
# Or use inference script
python inference.py --image path/to/image.jpg --model snapshots/best_model.pth
```
## Key Features
### 1. **OneCycle Learning Rate Policy**
Implements a custom three-phase OneCycle LR schedule based on the paper "Super-Convergence: Very Fast Training of Neural Networks" (Smith, 2018). The intended benefits are:
- Faster convergence
- Better generalization
- Higher final accuracy
### 2. **Comprehensive Metrics Logging**
After each training run, the script automatically outputs:
- Training and test accuracy/loss curves
- Top-1, Top-3, Top-5 accuracies
- Precision, Recall, F1-Score (macro and weighted)
- Per-class performance breakdown
- Confusion matrix and classification report
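Top-k accuracy counts a prediction as correct when the true label is among the k highest-scoring classes. A minimal sketch in plain Python follows; the function name and the toy scores are illustration values, not the project's evaluation code or data.

```python
def topk_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest scores."""
    hits = 0
    for row, label in zip(scores, labels):
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Hypothetical scores for 4 samples over 5 classes (illustration only).
scores = [
    [0.10, 0.60, 0.10, 0.10, 0.10],  # true class 1 is ranked 1st
    [0.30, 0.10, 0.40, 0.15, 0.05],  # true class 0 is ranked 2nd
    [0.05, 0.20, 0.30, 0.10, 0.35],  # true class 1 is ranked 3rd
    [0.40, 0.30, 0.20, 0.06, 0.04],  # true class 4 is ranked 5th
]
labels = [1, 0, 1, 4]

top1 = topk_accuracy(scores, labels, 1)
top3 = topk_accuracy(scores, labels, 3)
top5 = topk_accuracy(scores, labels, 5)
print(f"top-1: {top1:.2f}  top-3: {top3:.2f}  top-5: {top5:.2f}")
```

Top-k accuracy can only grow with k, which is why the reported Top-1, Top-3 and Top-5 figures increase in that order.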
### 3. **Mixed Precision Training (AMP)**
- Can give roughly 2-3x faster training on modern GPUs
- Reduced memory usage
- Maintains accuracy with float16/float32 mixed precision
### 4. **Advanced Data Augmentation**
Uses Albumentations for efficient augmentation:
- Often faster than torchvision transforms on the CPU
- More augmentation options
- Runs in the data-loading workers, so it adds little overhead to GPU training
### 5. **Model Checkpointing**
- Automatic snapshot saving at specified intervals
- Best model tracking based on test accuracy
- Resume training from any checkpoint
## Detailed Training Log
Full training logs are available in `log/training_complete_20251010-103227.log`, including:
- Per-epoch train/test loss and accuracy
- Learning rate at each epoch
- Final comprehensive evaluation with per-class metrics
- Training time and resource utilization
Example final output:
```
======================================================================
TRAINING COMPLETED - FINAL EVALUATION
======================================================================
TRAINING SUMMARY
----------------------------------------------------------------------
Total Epochs Trained: 100
Final Training Loss: 0.5584
Final Training Accuracy: 80.47%
Best Training Accuracy: 81.05% (Epoch 94)
Final Learning Rate: 0.001500
TEST/VALIDATION SUMMARY
----------------------------------------------------------------------
Final Test Loss: 0.8985
Final Test Accuracy: 76.68%
Best Test Accuracy: 76.79% (Epoch 99)
COMPREHENSIVE TEST SET METRICS
----------------------------------------------------------------------
Top-1 Accuracy (Test): 76.68%
Top-3 Accuracy (Test): 90.95%
Top-5 Accuracy (Test): 94.07%
```
## Requirements Met
- ✅ **Training from Scratch**: Custom ResNet (5M params) trained without pre-trained weights
- ✅ **CIFAR-100 Dataset**: All 100 classes used (50,000 train / 10,000 test)
- ✅ **Target Accuracy**: **76.68% achieved** (target: 73%), **exceeded by 3.68 percentage points**
- ✅ **Training Duration**: 100 epochs with OneCycle LR schedule
- ✅ **Modern Tools**: Extensive use of ChatGPT/Cursor for development
- ✅ **Comprehensive Evaluation**: Full metrics, plots, and detailed analysis
- ✅ **Model Architecture**: Custom lightweight ResNet optimized for CIFAR-100
- ✅ **Reproducibility**: Complete logs, checkpoints, and configuration documented
## Technologies Used
- **PyTorch** - Deep learning framework
- **Albumentations** - Data augmentation
- **Gradio** - Web interface for inference
- **scikit-learn** - Metrics calculation
- **matplotlib/seaborn** - Visualization
- **numpy** - Numerical operations
## Model Comparison
| Model Variant | Parameters | Expected Accuracy | Notes |
|---------------|------------|-------------------|-------|
| **Our Model** (4 blocks) | **5M** | **76.68%** | Balanced efficiency & accuracy |
| Standard ResNet-18 | 11M | ~76-78% | Good baseline for CIFAR |
| Standard ResNet-34 | 21M | ~78-80% | More capacity, slower training |
| Wide-ResNet-28-10 | 36M | ~80-82% | State-of-art, requires more resources |
| PyramidNet | 26M | ~82-84% | Complex architecture |
**Our lightweight design achieves competitive accuracy with 2-4× fewer parameters than standard ResNets.**
## Future Improvements
Potential enhancements to reach higher accuracy (78%+):
1. **Architecture upgrades**:
- Increase blocks per stage: [2, 2, 2, 2] or [3, 3, 3, 3]
- Try Wide-ResNet with wider channels
- Add Squeeze-and-Excitation (SE) blocks
2. **Training tricks**:
- Mixup (α=0.2) for better generalization
- CutMix for spatial regularization
- AutoAugment or RandAugment policies
3. **Regularization**:
- Stochastic Depth (survival probability 0.8-0.9)
- DropBlock for spatial dropout
- Increased label smoothing (0.2)
4. **Ensemble methods**:
- Train 3-5 models with different seeds
- Snapshot ensembles (save last N checkpoints)
5. **Longer training**:
- 200-300 epochs with cosine annealing
- Multi-step or exponential LR decay
6. **Knowledge distillation**:
- Train larger teacher model first
- Use soft targets for student training
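As an illustration of the Mixup idea listed above, the sketch below blends two samples and their one-hot labels with a coefficient drawn from Beta(α, α). It is not part of the current training code: the `mixup` helper, the flat-list "images" and the two-class labels are toy assumptions, and a real implementation would operate on batched tensors.

```python
import random


def mixup(x_a, y_a, x_b, y_b, alpha=0.2, rng=random):
    """Blend two samples and their one-hot labels with lam ~ Beta(alpha, alpha)."""
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y, lam


rng = random.Random(0)                    # seeded for a repeatable demo
x_a, y_a = [0.0, 1.0, 0.5], [1.0, 0.0]    # toy "image" and one-hot label
x_b, y_b = [1.0, 0.0, 0.5], [0.0, 1.0]
x_mix, y_mix, lam = mixup(x_a, y_a, x_b, y_b, alpha=0.2, rng=rng)
print(f"lam = {lam:.3f}, mixed label = {y_mix}")
```

With a small α such as 0.2 the Beta distribution concentrates near 0 and 1, so most mixed samples stay close to one of the two originals.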
## Technical Implementation Details
### Architecture Design Rationale
**Why a lightweight ResNet variant?**
1. **CIFAR-100 Image Size**: At 32×32 pixels, CIFAR images contain less spatial information than ImageNet (224×224)
- Standard ResNet-34's [3,4,6,3] block structure is over-parameterized
- Our [1,1,1,1] structure provides sufficient capacity without overfitting
2. **Parameter Efficiency**:
- 5M parameters: Sweet spot between underfitting and overfitting
- Faster training: 100 epochs in ~2-3 hours vs 5-6 hours for ResNet-34
- Lower memory footprint: Can use larger batch sizes
3. **CIFAR-Specific Modifications**:
- **3×3 initial conv** (vs 7×7): Preserves fine details in small images
- **No MaxPool layer**: Maintains spatial resolution (32×32 → 4×4 over 4 stages)
- **Stride-2 convolutions**: Gradual downsampling for feature hierarchy
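The effect of the stem choice on a 32×32 input can be checked with the usual output-size formula `(n + 2p - k) // s + 1`. The sketch below compares the CIFAR stem described here with the standard ImageNet ResNet stem (7×7 stride-2 conv with padding 3, then 3×3 stride-2 MaxPool with padding 1), each followed by three stride-2 stages; the helper names are hypothetical.

```python
def conv_out(size, kernel, stride, padding):
    """Output side length of a conv or pooling layer (floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1


def final_map_size(size, stem):
    """Apply a stem, then three stride-2 stages (3×3 conv, padding 1)."""
    for kernel, stride, padding in stem:
        size = conv_out(size, kernel, stride, padding)
    for _ in range(3):  # stages 2-4 each halve the resolution
        size = conv_out(size, 3, 2, 1)
    return size


cifar_stem = [(3, 1, 1)]                 # 3×3 conv, stride 1, no MaxPool
imagenet_stem = [(7, 2, 3), (3, 2, 1)]   # 7×7 stride-2 conv + 3×3 stride-2 MaxPool

cifar_on_32 = final_map_size(32, cifar_stem)
imagenet_on_32 = final_map_size(32, imagenet_stem)
imagenet_on_224 = final_map_size(224, imagenet_stem)
print("CIFAR stem on 32×32:     ", cifar_on_32)
print("ImageNet stem on 32×32:  ", imagenet_on_32)
print("ImageNet stem on 224×224:", imagenet_on_224)
```

With the ImageNet stem a 32×32 image collapses to a single spatial position before the last stage finishes, while the CIFAR stem keeps a 4×4 map for the pooling head.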
### Code Reference
From `model_cifar.py`:
```python
import torch.nn as nn


class ResNet34(nn.Module):
    # Excerpt: _make_layer() and forward() are not shown here.
    def __init__(self, num_classes=100):
        super().__init__()
        self.in_channels = 64
        # CIFAR-specific: 3×3 conv, no maxpool
        self.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        # 4 stages with 1 BasicBlock each
        self.layer1 = self._make_layer(64, 1)             # 32×32×64
        self.layer2 = self._make_layer(128, 1, stride=2)  # 16×16×128
        self.layer3 = self._make_layer(256, 1, stride=2)  # 8×8×256
        self.layer4 = self._make_layer(512, 1, stride=2)  # 4×4×512
        # Classification head
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(512, num_classes)
```
**BasicBlock** (2 conv layers + skip connection):
```python
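import torch.nn as nn
import torch.nn.functional as F


class BasicBlock(nn.Module):
    # Excerpt: __init__ (conv1, bn1, conv2, bn2) is not shown here.
    def forward(self, x):
        # Identity case (stage 1). In stages 2-4 the skip path is the
        # 1×1 conv + BN projection of x, so that shapes match at the add.
        identity = x
        out = F.relu(self.bn1(self.conv1(x)))  # Conv → BN → ReLU
        out = self.bn2(self.conv2(out))        # Conv → BN
        out += identity                        # Add skip connection
        out = F.relu(out)                      # ReLU
        return out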
```
## References
**Papers:**
- He et al., "Deep Residual Learning for Image Recognition" (2016) - ResNet architecture
- Smith, "Super-Convergence: Very Fast Training of Neural Networks" (2018) - OneCycle LR
- Krizhevsky, "Learning Multiple Layers of Features from Tiny Images" (2009) - CIFAR-100
**Implementation Resources:**
- PyTorch official ResNet implementation
- Albumentations library for efficient augmentation
- torchvision.datasets for CIFAR-100 loading
## License
MIT License
## Acknowledgments
This project was developed with extensive assistance from:
- ChatGPT for architecture design and debugging
- Cursor AI for code completion and refactoring
- PyTorch and torchvision communities for reference implementations
---
**Note:** Training logs, model checkpoints, and detailed per-class metrics are available in the `log/` and `plots_complete/` directories.