🧪 ResNet vs Plain CNN on CIFAR-10 (Degradation Study)

This repository contains implementations, trained weights, and experiment notebooks comparing plain convolutional networks and residual networks on the CIFAR-10 dataset, demonstrating the degradation problem and how residual connections mitigate it.

📋 Contents

checkpoints/ — trained model weights: Plain20, Plain56, ResNet20, ResNet56
notebooks/ — Jupyter notebooks covering architectures, training, evaluation, and comparisons
results/ — performance plots (accuracy, loss curves, degradation behaviour)
README.md — this file

🧠 Experiment Summary

Motivation

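As network depth increases, plain convolutional networks can show higher training error, not just higher test error, which points to an optimization difficulty rather than overfitting (the “degradation” problem). Residual networks (ResNets) address this with skip connections that add each block's input to its output.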
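
To make the skip connection concrete, here is a minimal PyTorch sketch of a basic block that can run with or without the shortcut. The class name `BasicBlock` and the `residual` flag are illustrative assumptions, not the definitions used in this repository's `models` module; the sketch also keeps the channel count and resolution fixed so the identity shortcut needs no projection.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BasicBlock(nn.Module):
    """Two 3x3 conv layers; adds the input back when residual=True (illustrative)."""

    def __init__(self, channels: int, residual: bool = True):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.residual = residual

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        if self.residual:
            out = out + x  # identity shortcut: the block computes F(x) + x
        return F.relu(out)


x = torch.randn(2, 16, 32, 32)
plain_out = BasicBlock(16, residual=False)(x)
res_out = BasicBlock(16, residual=True)(x)
print(plain_out.shape, res_out.shape)
```

If the convolutions in a residual block output zero, the block reduces to `relu(x)`, so a deeper residual network can fall back to behaving like a shallower one. A plain block has no such fallback.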

Models

Plain20 — plain CNN, 20 layers
Plain56 — plain CNN, 56 layers (demonstrates degradation)
ResNet20 — residual network, 20 layers
ResNet56 — residual network, 56 layers (shows that extra depth helps once skip connections are added)
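
The depths 20 and 56 match the CIFAR layout in the original ResNet paper: one initial convolution, three stages of n blocks with two convolutions each, and a final fully connected layer, for 6n + 2 weighted layers. The helper below is a small sketch of that arithmetic; `blocks_per_stage` is a hypothetical name, and it assumes this repository follows the paper's layout.

```python
def blocks_per_stage(depth: int) -> int:
    """Return n such that depth == 6 * n + 2 (CIFAR-style layout, assumed here)."""
    if depth < 8 or (depth - 2) % 6 != 0:
        raise ValueError(f"depth must have the form 6n + 2, got {depth}")
    return (depth - 2) // 6


for depth in (20, 56):
    n = blocks_per_stage(depth)
    print(f"{depth} layers -> {n} blocks per stage, {3 * n} blocks in total")
```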

Dataset

CIFAR-10: 10 classes, 32×32 colour images
50k examples in the training set
10k examples in the test set

Key Findings

Plain56 shows higher training and test error than Plain20 (degradation).
ResNet56 trains and generalises better than Plain56, illustrating the value of skip connections.
Detailed curves and comparisons are in notebooks/ and results/.

📊 Results

1. The Degradation Problem (Plain Networks)

Notice how the 56-layer network (Red) has higher loss and error than the 20-layer network (Blue).

2. The ResNet Solution

With skip connections, the 56-layer network (Red) now outperforms the 20-layer network (Blue).

⚙️ How to Use

import torch
from huggingface_hub import hf_hub_download
from models import create_model  # local module with the model definitions

repo_id = "arpit-gour02/resnet-vs-plainnets-cifar10"

# Download the checkpoint and load it into a freshly built ResNet56
ckpt = hf_hub_download(repo_id=repo_id, filename="resnet56.pth")
model = create_model("resnet56", num_classes=10)
state_dict = torch.load(ckpt, map_location="cpu")
model.load_state_dict(state_dict)
model.eval()

# Example inference on a random tensor standing in for one 32x32 RGB image
x = torch.randn(1, 3, 32, 32)
with torch.no_grad():  # gradients are not needed at inference time
    logits = model(x)
pred = logits.argmax(dim=1).item()
print("Predicted class:", pred)
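
The random tensor above only checks that the forward pass runs. For a real image, the input should receive the same per-channel normalization listed under Data Preprocessing. The sketch below assumes a `uint8` image tensor of shape (3, 32, 32) and that the checkpoint was trained with the standard CIFAR-10 label order; `predict_label` is a hypothetical helper, not part of this repository.

```python
import torch

# Standard CIFAR-10 label order (assumed to match the checkpoint's training labels)
CIFAR10_CLASSES = ("airplane", "automobile", "bird", "cat", "deer",
                   "dog", "frog", "horse", "ship", "truck")
MEAN = torch.tensor([0.4914, 0.4822, 0.4465]).view(3, 1, 1)
STD = torch.tensor([0.2023, 0.1994, 0.2010]).view(3, 1, 1)


def predict_label(model: torch.nn.Module, image_uint8: torch.Tensor) -> str:
    """Normalize one (3, 32, 32) uint8 image and return the predicted class name."""
    x = image_uint8.float() / 255.0        # scale pixel values to [0, 1]
    x = ((x - MEAN) / STD).unsqueeze(0)    # normalize, then add a batch dimension
    model.eval()
    with torch.no_grad():
        logits = model(x)
    return CIFAR10_CLASSES[logits.argmax(dim=1).item()]
```

With a model loaded as shown earlier, `predict_label(model, image)` returns one of the ten class names.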

⚙️ Training Configuration

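The training setup follows the original ResNet paper (He et al., "Deep Residual Learning for Image Recognition"), and the same settings are used for every model so that the plain and residual networks are compared fairly.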

Hyperparameter	Value
Dataset	CIFAR-10
Batch Size	128
Optimizer	SGD (Stochastic Gradient Descent)
Initial Learning Rate	0.1
Momentum	0.9
Weight Decay	0.0001 ($10^{-4}$)
Total Epochs	~164 (64k iterations)
Initialization	He Normal (`kaiming_normal_`)
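
As a rough guide, the table maps onto PyTorch as follows. This is a sketch under the listed hyperparameters, not the repository's training script: `init_weights` and `make_optimizer` are hypothetical helper names, and the `mode` and `nonlinearity` arguments to `kaiming_normal_` are assumptions because the table does not state them. The epoch count follows from the arithmetic: 50,000 images at batch size 128 give 391 iterations per epoch, so 64k iterations is about 164 epochs.

```python
import math

import torch
import torch.nn as nn


def init_weights(module: nn.Module) -> None:
    """He-normal init for conv and linear weights (fan_out/relu are assumptions)."""
    if isinstance(module, (nn.Conv2d, nn.Linear)):
        nn.init.kaiming_normal_(module.weight, mode="fan_out", nonlinearity="relu")


def make_optimizer(model: nn.Module) -> torch.optim.SGD:
    """SGD with the learning rate, momentum, and weight decay from the table."""
    return torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)


iters_per_epoch = math.ceil(50_000 / 128)        # 391 iterations per epoch
total_epochs = round(64_000 / iters_per_epoch)   # about 164 epochs

# Tiny stand-in network, used only to show the calls
model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.Flatten(),
                      nn.Linear(16 * 32 * 32, 10))
model.apply(init_weights)
optimizer = make_optimizer(model)
print(iters_per_epoch, total_epochs)
```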

Learning Rate Schedule

We use a MultiStepLR scheduler, which divides the learning rate by 10 at fixed epochs (82 and 123) rather than reacting to the loss.

Epoch 0 - 81: lr = 0.1
Epoch 82 - 122: lr = 0.01
Epoch 123 - End: lr = 0.001
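
The schedule above corresponds to `MultiStepLR` with milestones at epochs 82 and 123 and a decay factor of 0.1. The sketch below uses a single dummy parameter in place of a model and only steps the scheduler, to show the learning rate in effect at each epoch; the variable names are illustrative.

```python
import torch
from torch.optim.lr_scheduler import MultiStepLR

# A single dummy parameter stands in for the model's parameters
params = [torch.nn.Parameter(torch.zeros(1))]
optimizer = torch.optim.SGD(params, lr=0.1, momentum=0.9, weight_decay=1e-4)
scheduler = MultiStepLR(optimizer, milestones=[82, 123], gamma=0.1)

lr_by_epoch = []
for epoch in range(164):
    lr_by_epoch.append(optimizer.param_groups[0]["lr"])
    # ... one epoch of training would go here ...
    optimizer.step()   # step the optimizer before the scheduler
    scheduler.step()   # advance the schedule once per epoch

for epoch in (0, 81, 82, 122, 123, 163):
    print(f"epoch {epoch:3d}: lr = {lr_by_epoch[epoch]:.3f}")
```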

🖼️ Data Preprocessing

Images are normalized, and standard data augmentation (padding, random crop, horizontal flip) is applied to reduce overfitting on the relatively small CIFAR-10 training set:

Normalization: Per-channel mean subtraction and division by standard deviation.
- Mean: (0.4914, 0.4822, 0.4465)
- Std: (0.2023, 0.1994, 0.2010)
Padding: Pad 4 pixels on each side (image becomes $40 \times 40$).
Random Crop: Crop back to $32 \times 32$.
Horizontal Flip: Applied with probability 0.5.
