alexnet / README.md
Pie33000's picture
Create README.md
1e19031 verified
---
license: mit
datasets:
- Pie33000/image-net
metrics:
- accuracy
---
# AlexNet ImageNet Training
## 1. Introduction
This repository contains a **from-scratch PyTorch implementation of AlexNet** trained on the ImageNet-1K dataset. It reproduces the classic 2012 network with modern training utilities such as data augmentation, learning-rate warm-up, and cosine/step decay scheduling.
<p align="center">
<img src="https://upload.wikimedia.org/wikipedia/commons/6/60/AlexNet.svg" width="550"/>
</p>
## 2. Project Structure
```
β”œβ”€β”€ model.py # AlexNet architecture (5 conv + 3 fc)
β”œβ”€β”€ load_data.py # ImageNet dataloaders & preprocessing
β”œβ”€β”€ train.py # Training / validation loop & scheduler setup
β”œβ”€β”€ models/ # (auto-created) checkpoints & logs
└── README.md # You are here
```
### `model.py`
* **Features block** – 5 convolutional layers:
1. 96 Γ— \(11\times11\) conv, stride 4
2. 256 Γ— \(5\times5\) conv, padding 2
3. 384 Γ— \(3\times3\) conv, padding 1
4. 384 Γ— \(3\times3\) conv, padding 1
5. 256 Γ— \(3\times3\) conv, padding 1
* **Classifier** – flatten β†’ 4096 β†’ 4096 β†’ 1000 with ReLU and Dropout.
* Optional Kaiming/Xavier weight initialisation via `--init_weights`.
### `load_data.py`
* **Training augmentations** – resize shorter side to 256 px β†’ random 224-px crop β†’ horizontal flip.
* **Validation augmentations** – resize 256 px β†’ **TenCrop(224)** (5 crops + mirror) β†’ normalisation.
* Returns two PyTorch `DataLoader`s.
### `train.py`
* Implements the epoch/iteration loop, loss backwards pass, accuracy calculation and checkpointing.
* Supports **learning-rate warm-up** for the first *N* epochs (`--warmup_epochs`).
* Choose between **step decay** or **cosine annealing** via `--scheduler`.
* Logs Top-1 accuracy & loss to `models/top1_accuracy.txt` and saves a checkpoint every 10 epochs.
## 3. Dataset
The code expects the ImageNet directory in the original layout:
```
ILSVRC2012
β”œβ”€β”€ train
β”‚ β”œβ”€β”€ n01440764
β”‚ β”‚ β”œβ”€β”€ n01440764_10026.JPEG
β”‚ β”‚ └── ...
└── val
β”œβ”€β”€ n01440764
β”‚ β”œβ”€β”€ ILSVRC2012_val_00000293.JPEG
β”‚ └── ...
```
Pass the root directory with `--root /path/to/ILSVRC2012`.
> πŸ’‘ **ImageNet licence** – obtaining the dataset requires registration with the ImageNet website.
## 4. Installation
```bash
# (Optional) create a virtual environment
python -m venv .venv && source .venv/bin/activate
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
# or the CUDA wheels if you have a GPU
```
## 5. Training
Run:
```bash
python train.py \
--root /datasets/ILSVRC2012 \
--device cuda:0 # or cpu / mps
```
Common flags:
* `--epochs` (default 100)
* `--batch_size` (default 128)
* `--lr`, `--momentum`, `--weight_decay`
* `--scheduler` `step|cosine` + `--lr_step_size`, `--lr_gamma`
* `--warmup_epochs` – linear warm-up length
* `--save_dir` – directory for checkpoints & logs
### Resuming / fine-tuning
To resume from a checkpoint:
```bash
python train.py --root /datasets/ILSVRC2012 --device cuda \
--init_weights False \
--save_dir models \
--epochs 30
# then inside train.py adapt: model.load_state_dict(torch.load('models/model_XX.pth'))
```
## 6. Metrics
The script prints **Top-1 Accuracy** after every epoch. You can extend it to Top-5 with:
```python
maxk = 5
_, pred = logits.topk(maxk, 1, True, True) # (batch, 5)
correct = pred.eq(labels.view(-1, 1).expand_as(pred))
correct_top5 += correct.any(1).float().sum().item()
```
## 7. Citation
If you use this code in your research, please cite:
> Krizhevsky, Alex, Ilya Sutskever, and Geoffrey Hinton. "ImageNet classification with deep convolutional neural networks." *NeurIPS* 2012.
## 8. License
license: mit