ResNet CIFAR-10 (Trained from Scratch)
A custom ResNet trained from scratch on CIFAR-10, with analysis of the generalization gap and out-of-distribution failure modes.
| Property | Value |
|---|---|
| Architecture | Custom ResNet (3 residual stages, BatchNorm, ReLU) |
| Parameters | ~1.1M |
| Dataset | CIFAR-10 (50k train / 10k test, 32×32) |
| Train Accuracy | ~99% |
| Test Accuracy | ~91% |
| Framework | PyTorch |
| Hardware | NVIDIA RTX 4070 / 4070 Super |
Model Description
This is a lightweight ResNet-style classifier built to study how a model trained from scratch on a small, low-resolution dataset generalizes, and where it breaks.
The architecture consists of an initial 3×64 convolution stem followed by three residual stages (64 → 128 → 256 channels), each containing two residual blocks with skip connections. Global average pooling feeds into a single linear classification head over 10 classes.
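The description above maps naturally onto a small PyTorch module. The sketch below is an illustrative reconstruction, not the repo's code: block counts and layer names are assumed from the text, and this exact configuration totals roughly 2.7M parameters, so the repo's ~1.1M version likely uses narrower stages or fewer blocks.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """Two 3x3 convs with BatchNorm/ReLU and a skip connection."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride, 1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, 1, 1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        # 1x1 projection when the shape changes, otherwise an identity skip
        self.skip = (
            nn.Sequential(nn.Conv2d(in_ch, out_ch, 1, stride, bias=False),
                          nn.BatchNorm2d(out_ch))
            if (stride != 1 or in_ch != out_ch) else nn.Identity())

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + self.skip(x))

class CifarResNet(nn.Module):
    """Stem -> three residual stages (64 -> 128 -> 256) -> GAP -> linear head."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(3, 64, 3, 1, 1, bias=False),
            nn.BatchNorm2d(64), nn.ReLU(inplace=True))
        self.stage1 = nn.Sequential(ResidualBlock(64, 64), ResidualBlock(64, 64))
        self.stage2 = nn.Sequential(ResidualBlock(64, 128, 2), ResidualBlock(128, 128))
        self.stage3 = nn.Sequential(ResidualBlock(128, 256, 2), ResidualBlock(256, 256))
        self.head = nn.Linear(256, num_classes)

    def forward(self, x):
        x = self.stage3(self.stage2(self.stage1(self.stem(x))))
        x = torch.flatten(F.adaptive_avg_pool2d(x, 1), 1)  # global average pool
        return self.head(x)
```

A forward pass on a `(N, 3, 32, 32)` batch yields `(N, 10)` logits.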
The model achieves near-perfect training accuracy (>99%) but plateaus at ~91% on the validation set, indicating a generalization gap of ~8%. This gap is analyzed in detail below.
GitHub Repository: github.com/DavidH2802/resnet-cifar-10-from-scratch
How to Use
```bash
# Install dependencies
pip install torch torchvision huggingface_hub

# 1. Clone the repo
git clone https://github.com/DavidH2802/resnet-cifar-10-from-scratch.git
```

```python
# 2. Download weights into checkpoints/
from huggingface_hub import hf_hub_download

hf_hub_download(
    repo_id="DavidH2802/resnet-cifar-10-from-scratch",
    filename="best_model.pth",
    local_dir="checkpoints",
    local_dir_use_symlinks=False,
)
```

```bash
# 3. Run inference
python src/infer.py your_image.jpg
# Provide image path: your_image.jpg
# Prediction: CAR | Confidence: 99.97%
```
Training Details
Hyperparameters
| Parameter | Value |
|---|---|
| Optimizer | AdamW |
| Learning Rate | 0.001 |
| Weight Decay | 1e-4 |
| Batch Size | 128 |
| Max Epochs | 200 |
| Early Stopping Patience | 15 epochs |
| Scheduler | None (training length controlled by early stopping on val loss) |
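The hyperparameters above translate into roughly the following setup. The patience logic is a generic sketch of early stopping on validation loss, not the repo's exact implementation:

```python
import torch

def make_optimizer(model):
    # AdamW with the table's settings: lr=0.001, weight decay=1e-4
    return torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)

class EarlyStopping:
    """Stop when val loss fails to improve for `patience` consecutive epochs."""
    def __init__(self, patience=15):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience  # True -> stop training
```

In the training loop, `if stopper.step(val_loss): break` ends training once the patience budget is exhausted.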
Data Augmentation
Training images were augmented with random cropping (32×32 with 4px padding) and random horizontal flips. Validation and test images received no augmentation. All images were normalized to the CIFAR-10 channel-wise mean and standard deviation.
Training Dynamics
The model was trained until early stopping triggered between roughly epoch 30 and 40. Training loss converged near zero while validation loss plateaued significantly earlier, producing the ~8% generalization gap. This suggests the model has enough capacity to memorize the training set but relies partially on dataset-specific texture and color shortcuts rather than fully generalized semantic features.
Limitations & Failure Modes
This model was trained on 32×32 images from a fixed distribution. It works well on inputs that resemble CIFAR-10's data characteristics but fails predictably in several ways:
Spurious Correlations (Background Dependence)
The model predicts "Airplane" with 100% confidence on a standard photo of a plane against a blue sky. While correct, this extreme confidence suggests reliance on the dominant blue background (a strong correlate of the airplane class in CIFAR-10) rather than the aircraft's geometry.
Feature Dependency (Wheel Detection)
A Lamborghini Veneno in side profile is correctly classified as "Car" at 99.97% confidence despite a complex crowd background. The likely driver is the wheel feature: high-contrast dark circles are the most discriminative signal for the car class at low resolution.
Out-of-Distribution Viewpoints
A top-down photo of a sedan (roof visible, no wheels) is misclassified as "Truck" at 97.42% confidence. Without the expected side-profile shape or visible wheels, the model defaults to "Truck" based on the large, rectangular metallic surface. This confirms the model has learned 2D texture/shape heuristics rather than any 3D geometric understanding of objects.
General Limitations
- Input must be resized to 32×32, causing severe information loss on high-resolution images.
- Performance degrades on viewpoints, lighting conditions, and backgrounds not represented in CIFAR-10.
- The model has no robustness to adversarial examples or significant domain shift.
Intended Use
This model is intended for educational and research purposes, specifically for studying generalization, overfitting dynamics, and the inductive biases of small convolutional networks trained from scratch on low-resolution data. It is not intended for production use.
Evaluation results
- Test Accuracy on CIFAR-10 (self-reported): 91.0%
- Training Accuracy on CIFAR-10 (self-reported): 99.0%