ResNet vs Plain CNN on CIFAR-10 (Degradation Study)
This repository contains implementations, trained weights, and experiment notebooks comparing plain convolutional networks and residual networks on the CIFAR-10 dataset, demonstrating the degradation problem and how residual connections mitigate it.
Contents
- `checkpoints/`: trained model weights (Plain20, Plain56, ResNet20, ResNet56)
- `notebooks/`: Jupyter notebooks covering architectures, training, evaluation, and comparisons
- `results/`: performance plots (accuracy, loss curves, degradation behaviour)
- `README.md`: this file
Experiment Summary
Motivation
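As depth increases, a plain convolutional network can show higher training error than a shallower counterpart (the "degradation" problem). Because the gap already appears in the training error, overfitting does not explain it. Residual networks (ResNets) address the problem with skip connections, which add a block's input to its output.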
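To make the difference concrete, here is a minimal sketch of a plain block and a residual block in PyTorch. The class names `PlainBlock` and `ResidualBlock` are hypothetical and are not taken from this repository's `models` module; the sketch also assumes equal input and output channels, so the shortcut is a plain identity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PlainBlock(nn.Module):
    """Two 3x3 conv layers with batch norm and no shortcut (hypothetical name)."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def branch(self, x: torch.Tensor) -> torch.Tensor:
        # The stacked layers F(x) shared by both block types
        out = F.relu(self.bn1(self.conv1(x)))
        return self.bn2(self.conv2(out))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Plain network: the output is just the stacked layers
        return F.relu(self.branch(x))


class ResidualBlock(PlainBlock):
    """Same layers, but the input is added back before the final ReLU."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual network: output = ReLU(F(x) + x)
        return F.relu(self.branch(x) + x)


x = torch.randn(2, 16, 32, 32)
out = ResidualBlock(16)(x)
print(out.shape)  # same shape as the input
```

If the stacked layers output zero, the residual block reduces to `ReLU(x)`, so extra blocks can fall back to a near-identity mapping instead of having to learn one from scratch.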
Models
Dataset
- CIFAR-10: 10 classes, 32×32 colour images
- 50,000 training examples
- 10,000 test examples
Key Findings
- Plain56 shows higher training/test error than Plain20 (degradation).
- ResNet56 trains and generalises better than Plain56, illustrating the value of skip-connections.
- Detailed curves and comparisons are in `notebooks/` and `results/`.
Results
1. The Degradation Problem (Plain Networks)
Notice how the 56-layer network (Red) has higher loss and error than the 20-layer network (Blue).
2. The ResNet Solution
With skip connections, the 56-layer network (Red) now outperforms the 20-layer network (Blue).
How to Use
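```python
import torch
from huggingface_hub import hf_hub_download

from models import create_model  # local module, assumed to ship with this repository's code

# Download the ResNet56 checkpoint from the Hugging Face Hub
repo_id = "arpit-gour02/resnet-vs-plainnets-cifar10"
ckpt = hf_hub_download(repo_id=repo_id, filename="resnet56.pth")

# Build the architecture and load the trained weights on the CPU
model = create_model("resnet56", num_classes=10)
state_dict = torch.load(ckpt, map_location="cpu")
model.load_state_dict(state_dict)
model.eval()

# Example inference on a random tensor shaped like one CIFAR-10 image
x = torch.randn(1, 3, 32, 32)
with torch.no_grad():
    logits = model(x)
pred = logits.argmax(dim=1).item()
print("Predicted class:", pred)
```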
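The predicted index can be turned into a label with a small lookup. The sketch below assumes the checkpoints were trained with the standard CIFAR-10 label order (the one used by `torchvision.datasets.CIFAR10`); the helper `class_name` is a hypothetical name, not part of this repository.

```python
# Standard CIFAR-10 label order (assumed to match the training labels)
CIFAR10_CLASSES = (
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
)


def class_name(index: int) -> str:
    """Map a predicted class index (0-9) to its CIFAR-10 label."""
    if not 0 <= index < len(CIFAR10_CLASSES):
        raise ValueError(f"class index out of range: {index}")
    return CIFAR10_CLASSES[index]


print(class_name(3))  # cat
```

A random input tensor only exercises the API, so the label it produces carries no meaning; for real predictions the input should be preprocessed the same way as the training data.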
Training Configuration
The training setup follows the original ResNet paper's CIFAR-10 configuration to ensure a fair comparison.
| Hyperparameter | Value |
|---|---|
| Dataset | CIFAR-10 |
| Batch Size | 128 |
| Optimizer | SGD (Stochastic Gradient Descent) |
| Initial Learning Rate | 0.1 |
| Momentum | 0.9 |
| Weight Decay | 0.0001 ($10^{-4}$) |
| Total Epochs | ~164 (64k iterations) |
| Initialization | He Normal (kaiming_normal_) |
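As a rough illustration of how the table maps onto PyTorch, the sketch below applies He-normal initialization and builds the SGD optimizer. The tiny `model` is a stand-in rather than one of the repository's networks, `init_he_normal` is a hypothetical helper, and `mode="fan_out"` is an assumption (the table only names `kaiming_normal_`).

```python
import torch
import torch.nn as nn


def init_he_normal(module: nn.Module) -> None:
    """Apply He-normal init to conv and linear weights (hypothetical helper)."""
    if isinstance(module, (nn.Conv2d, nn.Linear)):
        # mode="fan_out" is an assumption; PyTorch's default is "fan_in"
        nn.init.kaiming_normal_(module.weight, mode="fan_out", nonlinearity="relu")
        if module.bias is not None:
            nn.init.zeros_(module.bias)


# Stand-in model used only to make the sketch runnable
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(16, 10),
)
model.apply(init_he_normal)

# Optimizer settings taken from the table above
optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.1,
    momentum=0.9,
    weight_decay=1e-4,
)
```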
Learning Rate Schedule
We use a MultiStepLR scheduler, which divides the learning rate by 10 at fixed epoch milestones (82 and 123) rather than reacting to a loss plateau.
- Epoch 0 - 81: lr = 0.1
- Epoch 82 - 122: lr = 0.01
- Epoch 123 - End: lr = 0.001
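The epoch boundaries can be derived from iteration counts. The sketch below assumes the 32k/48k/64k iteration schedule of the original ResNet paper and one optimizer step per batch of 128 over all 50,000 training images; the dummy parameter exists only so the optimizer can be constructed.

```python
import math

import torch

TRAIN_EXAMPLES = 50_000
BATCH_SIZE = 128
iters_per_epoch = math.ceil(TRAIN_EXAMPLES / BATCH_SIZE)  # 391 batches per epoch


def iterations_to_epoch(iterations: int) -> int:
    """Convert an iteration count to the nearest epoch index."""
    return round(iterations / iters_per_epoch)


# Iteration counts assumed from the original paper's CIFAR-10 setup
milestones = [iterations_to_epoch(32_000), iterations_to_epoch(48_000)]  # [82, 123]
total_epochs = iterations_to_epoch(64_000)  # 164

# Dummy parameter so the optimizer and scheduler can be built without a model
params = [torch.nn.Parameter(torch.zeros(1))]
optimizer = torch.optim.SGD(params, lr=0.1, momentum=0.9, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=milestones, gamma=0.1
)

lr_by_epoch = []
for epoch in range(total_epochs):
    # Record the learning rate in effect during this epoch
    lr_by_epoch.append(optimizer.param_groups[0]["lr"])
    optimizer.step()   # stands in for one full epoch of training
    scheduler.step()   # advance the schedule once per epoch
```

With these numbers, `lr_by_epoch[81]` is still 0.1, while `lr_by_epoch[82]` and `lr_by_epoch[123]` are approximately 0.01 and 0.001 (up to floating-point rounding).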
Data Preprocessing
Each image is normalized, and standard data augmentation is applied to reduce overfitting on the small CIFAR-10 images:
- Normalization: Per-channel mean subtraction and division by standard deviation.
  - Mean: (0.4914, 0.4822, 0.4465)
  - Std: (0.2023, 0.1994, 0.2010)
- Padding: Pad 4 pixels on each side (image becomes $40 \times 40$).
- Random Crop: Crop back to $32 \times 32$.
- Horizontal Flip: Applied with probability 0.5.









