ResNet CIFAR-10 (Trained from Scratch)
A custom ResNet trained from scratch on CIFAR-10, with analysis of the generalization gap and out-of-distribution failure modes.
| Property | Value |
|---|---|
| Architecture | Custom ResNet (3 residual stages, BatchNorm, ReLU) |
| Parameters | ~1.1M |
| Dataset | CIFAR-10 (50k train / 10k test, 32×32) |
| Train Accuracy | ~99% |
| Test Accuracy | ~91% |
| Framework | PyTorch |
| Hardware | NVIDIA RTX 4070 / 4070 Super |
Model Description
This is a lightweight ResNet-style classifier built to study how a model trained from scratch on a small, low-resolution dataset generalizes, and where it breaks.
The architecture consists of an initial 3×64 convolution stem followed by three residual stages (64 → 128 → 256 channels), each containing two residual blocks with skip connections. Global average pooling feeds into a single linear classification head over 10 classes.
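The description above maps naturally onto a small PyTorch module. The sketch below is an illustrative reconstruction, not the repo's code: block counts and layer names are assumed from the text, and this exact configuration totals roughly 2.7M parameters, so the repo's ~1.1M version likely uses narrower stages or fewer blocks.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """Two 3x3 convs with BatchNorm/ReLU and a skip connection."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride, 1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, 1, 1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        # 1x1 projection when the shape changes, otherwise an identity skip
        self.skip = (
            nn.Sequential(nn.Conv2d(in_ch, out_ch, 1, stride, bias=False),
                          nn.BatchNorm2d(out_ch))
            if (stride != 1 or in_ch != out_ch) else nn.Identity())

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + self.skip(x))

class CifarResNet(nn.Module):
    """Stem -> three residual stages (64 -> 128 -> 256) -> GAP -> linear head."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(3, 64, 3, 1, 1, bias=False),
            nn.BatchNorm2d(64), nn.ReLU(inplace=True))
        self.stage1 = nn.Sequential(ResidualBlock(64, 64), ResidualBlock(64, 64))
        self.stage2 = nn.Sequential(ResidualBlock(64, 128, 2), ResidualBlock(128, 128))
        self.stage3 = nn.Sequential(ResidualBlock(128, 256, 2), ResidualBlock(256, 256))
        self.head = nn.Linear(256, num_classes)

    def forward(self, x):
        x = self.stage3(self.stage2(self.stage1(self.stem(x))))
        x = torch.flatten(F.adaptive_avg_pool2d(x, 1), 1)  # global average pool
        return self.head(x)
```

A forward pass on a `(N, 3, 32, 32)` batch yields `(N, 10)` logits.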
The model achieves near-perfect training accuracy (>99%) but plateaus at ~91% on the validation set, indicating a generalization gap of ~8%. This gap is analyzed in detail below.
GitHub Repository: github.com/DavidH2802/resnet-cifar-10-from-scratch
How to Use
```bash
# Install dependencies
pip install torch torchvision huggingface_hub

# 1. Clone the repo
git clone https://github.com/DavidH2802/resnet-cifar-10-from-scratch.git
```

```python
# 2. Download weights into checkpoints/
from huggingface_hub import hf_hub_download

hf_hub_download(
    repo_id="DavidH2802/resnet-cifar-10-from-scratch",
    filename="best_model.pth",
    local_dir="checkpoints",
    local_dir_use_symlinks=False,
)
```

```bash
# 3. Run inference
python src/infer.py your_image.jpg
# Provide image path: your_image.jpg
# Prediction: CAR | Confidence: 99.97%
```
Training Details
Hyperparameters
| Parameter | Value |
|---|---|
| Optimizer | AdamW |
| Learning Rate | 0.001 |
| Weight Decay | 1e-4 |
| Batch Size | 128 |
| Max Epochs | 200 |
| Early Stopping Patience | 15 epochs |
| Scheduler | None (training length controlled by early stopping on val loss) |
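The hyperparameters above translate into roughly the following setup. The patience logic is a generic sketch of early stopping on validation loss, not the repo's exact implementation:

```python
import torch

def make_optimizer(model):
    # AdamW with the table's settings: lr=0.001, weight decay=1e-4
    return torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)

class EarlyStopping:
    """Stop when val loss fails to improve for `patience` consecutive epochs."""
    def __init__(self, patience=15):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience  # True -> stop training
```

In the training loop, `if stopper.step(val_loss): break` ends training once the patience budget is exhausted.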
Data Augmentation
Training images were augmented with random cropping (32×32 with 4px padding) and random horizontal flips. Validation and test images received no augmentation. All images were normalized to the CIFAR-10 channel-wise mean and standard deviation.
Training Dynamics
The model was trained until early stopping triggered between roughly epoch 30 and 40. Training loss converged near zero while validation loss plateaued significantly earlier, producing the ~8% generalization gap. This suggests the model has enough capacity to memorize the training set but relies partially on dataset-specific texture and color shortcuts rather than fully generalized semantic features.
Limitations & Failure Modes
This model was trained on 32×32 images from a fixed distribution. It works well on inputs that resemble CIFAR-10's data characteristics but fails predictably in several ways:
Spurious Correlations (Background Dependence)
The model predicts "Airplane" with 100% confidence on a standard photo of a plane against a blue sky. While correct, this extreme confidence suggests reliance on the dominant blue background (a strong correlate of the airplane class in CIFAR-10) rather than the aircraft's geometry.
Feature Dependency (Wheel Detection)
A Lamborghini Veneno in side profile is correctly classified as "Car" at 99.97% confidence despite a complex crowd background. The likely driver is the wheel feature: high-contrast dark circles are the most discriminative signal for the car class at low resolution.
Out-of-Distribution Viewpoints
A top-down photo of a sedan (roof visible, no wheels) is misclassified as "Truck" at 97.42% confidence. Without the expected side-profile shape or visible wheels, the model defaults to "Truck" based on the large, rectangular metallic surface. This confirms the model has learned 2D texture/shape heuristics rather than any 3D geometric understanding of objects.
General Limitations
- Input must be resized to 32×32, causing severe information loss on high-resolution images.
- Performance degrades on viewpoints, lighting conditions, and backgrounds not represented in CIFAR-10.
- The model has no robustness to adversarial examples or significant domain shift.
Intended Use
This model is intended for educational and research purposes, specifically for studying generalization, overfitting dynamics, and the inductive biases of small convolutional networks trained from scratch on low-resolution data. It is not intended for production use.
Evaluation results
- Test Accuracy on CIFAR-10 (self-reported): 91.0%
- Training Accuracy on CIFAR-10 (self-reported): 99.0%