---
license: mit
datasets:
- Pie33000/image-net
metrics:
- accuracy
---

# AlexNet ImageNet Training

## 1. Introduction

This repository contains a **from-scratch PyTorch implementation of AlexNet** trained on the ImageNet-1K dataset. It reproduces the classic 2012 network with modern training utilities such as data augmentation, learning-rate warm-up, and cosine/step decay scheduling.
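As a quick sanity check on the layer geometry described below, each spatial size can be derived with the standard convolution-arithmetic formula `out = (in + 2·pad − kernel) // stride + 1`. The sketch assumes a 224 px input, padding 2 on the first conv, and max-pools after conv 1, 2, and 5, as in the original paper:

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Output spatial size of a conv/pool layer (floor division)."""
    return (size + 2 * pad - kernel) // stride + 1

s = 224                       # assumed input resolution
s = conv_out(s, 11, 4, 2)     # conv1: 96 filters  -> 55
s = conv_out(s, 3, 2)         # maxpool            -> 27
s = conv_out(s, 5, 1, 2)      # conv2: 256 filters -> 27
s = conv_out(s, 3, 2)         # maxpool            -> 13
s = conv_out(s, 3, 1, 1)      # conv3: 384 filters -> 13
s = conv_out(s, 3, 1, 1)      # conv4: 384 filters -> 13
s = conv_out(s, 3, 1, 1)      # conv5: 256 filters -> 13
s = conv_out(s, 3, 2)         # maxpool            -> 6
print(256 * s * s)            # flatten size fed to the first FC layer: 9216
```

The final `256 × 6 × 6 = 9216` flatten is what the first 4096-unit fully-connected layer consumes.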

## 2. Project Structure

```
├── model.py       # AlexNet architecture (5 conv + 3 fc)
├── load_data.py   # ImageNet dataloaders & preprocessing
├── train.py       # Training / validation loop & scheduler setup
├── models/        # (auto-created) checkpoints & logs
└── README.md      # You are here
```

### `model.py`

* **Features block** – 5 convolutional layers:
  1. 96 × \(11\times11\) conv, stride 4
  2. 256 × \(5\times5\) conv, padding 2
  3. 384 × \(3\times3\) conv, padding 1
  4. 384 × \(3\times3\) conv, padding 1
  5. 256 × \(3\times3\) conv, padding 1
* **Classifier** – flatten → 4096 → 4096 → 1000 with ReLU and Dropout.
* Optional Kaiming/Xavier weight initialisation via `--init_weights`.

### `load_data.py`

* **Training augmentations** – resize shorter side to 256 px → random 224 px crop → horizontal flip.
* **Validation augmentations** – resize 256 px → **TenCrop(224)** (4 corner crops + centre crop, each with its horizontal mirror) → normalisation.
* Returns two PyTorch `DataLoader`s.

### `train.py`

* Implements the epoch/iteration loop, the backward pass, accuracy calculation and checkpointing.
* Supports **learning-rate warm-up** for the first *N* epochs (`--warmup_epochs`).
* Choose between **step decay** or **cosine annealing** via `--scheduler`.
* Logs Top-1 accuracy & loss to `models/top1_accuracy.txt` and saves a checkpoint every 10 epochs.

## 3. Dataset

The code expects the ImageNet directory in the original layout:

```
ILSVRC2012
├── train
│   ├── n01440764
│   │   ├── n01440764_10026.JPEG
│   │   └── ...
└── val
    ├── n01440764
    │   ├── ILSVRC2012_val_00000293.JPEG
    │   └── ...
```

Pass the root directory with `--root /path/to/ILSVRC2012`.

> 💡 **ImageNet licence** – obtaining the dataset requires registration on the ImageNet website.

## 4. Installation

```bash
# (Optional) create a virtual environment
python -m venv .venv && source .venv/bin/activate

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
# or the CUDA wheels if you have a GPU
```

## 5. Training
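The step/cosine schedules selected via `--scheduler` both reduce to a simple function of the epoch index. Below is a pure-Python sketch of linear warm-up followed by cosine annealing; the function name and defaults are illustrative, and the real logic lives in `train.py` behind `--warmup_epochs` and `--scheduler`:

```python
import math

def lr_at_epoch(epoch, base_lr=0.01, warmup_epochs=5, total_epochs=100, min_lr=0.0):
    """Learning rate for a given epoch: linear warm-up, then cosine decay."""
    if epoch < warmup_epochs:
        # Ramp linearly from base_lr/warmup_epochs up to base_lr.
        return base_lr * (epoch + 1) / warmup_epochs
    # Cosine-anneal from base_lr down to min_lr over the remaining epochs.
    t = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * t))
```

A step schedule would instead compute `base_lr * lr_gamma ** (epoch // lr_step_size)` after the warm-up phase.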
Run:

```bash
python train.py \
  --root /datasets/ILSVRC2012 \
  --device cuda:0   # or cpu / mps
```

Common flags:

* `--epochs` (default 100)
* `--batch_size` (default 128)
* `--lr`, `--momentum`, `--weight_decay`
* `--scheduler` `step|cosine` + `--lr_step_size`, `--lr_gamma`
* `--warmup_epochs` – linear warm-up length
* `--save_dir` – directory for checkpoints & logs

### Resuming / fine-tuning

To resume from a checkpoint:

```bash
python train.py --root /datasets/ILSVRC2012 --device cuda \
  --init_weights False \
  --save_dir models \
  --epochs 30
# then, inside train.py, load the saved weights before training:
#   model.load_state_dict(torch.load('models/model_XX.pth'))
```

## 6. Metrics

The script prints **Top-1 Accuracy** after every epoch. You can extend it to Top-5 with:

```python
maxk = 5
_, pred = logits.topk(maxk, 1, True, True)               # (batch, 5) indices, sorted
correct = pred.eq(labels.view(-1, 1).expand_as(pred))    # True where a top-5 guess matches
correct_top5 += correct.any(1).float().sum().item()      # count samples with at least one hit
```

## 7. Citation

If you use this code in your research, please cite:

> Krizhevsky, Alex, Ilya Sutskever, and Geoffrey Hinton. "ImageNet classification with deep convolutional neural networks." *NeurIPS* 2012.

## 8. License

MIT.