	---
	license: mit
	datasets:
	- Pie33000/image-net
	metrics:
	- accuracy
	---
	# AlexNet ImageNet Training

	## 1. Introduction
	This repository contains a from-scratch PyTorch implementation of AlexNet trained on the ImageNet-1K dataset. It reproduces the classic 2012 network with modern training utilities such as data augmentation, learning-rate warm-up, and cosine/step decay scheduling.

	<p align="center">
	<img src="https://upload.wikimedia.org/wikipedia/commons/6/60/AlexNet.svg" width="550"/>
	</p>

	## 2. Project Structure
	```
	├── model.py # AlexNet architecture (5 conv + 3 fc)
	├── load_data.py # ImageNet dataloaders & preprocessing
	├── train.py # Training / validation loop & scheduler setup
	├── models/ # (auto-created) checkpoints & logs
	└── README.md # You are here
	```

	### `model.py`
	* Features block – 5 convolutional layers:
	  1. 96 filters, 11 × 11 conv, stride 4
	  2. 256 filters, 5 × 5 conv, padding 2
	  3. 384 filters, 3 × 3 conv, padding 1
	  4. 384 filters, 3 × 3 conv, padding 1
	  5. 256 filters, 3 × 3 conv, padding 1
	* Classifier – flatten → 4096 → 4096 → 1000 with ReLU and Dropout.
	* Optional Kaiming/Xavier weight initialisation via `--init_weights`.
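The kernel sizes, strides and paddings listed above determine the size of the flattened vector that the classifier receives. The sketch below works through that arithmetic. The padding of 2 on the first convolution and the 3 × 3, stride-2 max-pooling layers after conv 1, 2 and 5 are assumptions borrowed from the common torchvision variant of AlexNet; they are not stated in this README, so check them against `model.py`.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a conv or pooling layer (floor division, as in PyTorch)."""
    return (size + 2 * padding - kernel) // stride + 1


def alexnet_feature_shape(input_size=224, conv1_padding=2):
    """Trace the spatial size through the five conv layers listed above.

    Assumed (not taken from model.py): the conv1 padding and the 3x3,
    stride-2 max pooling after conv 1, 2 and 5.
    """
    s = conv_out(input_size, 11, stride=4, padding=conv1_padding)  # conv1
    s = conv_out(s, 3, stride=2)                                   # pool (assumed)
    s = conv_out(s, 5, padding=2)                                  # conv2
    s = conv_out(s, 3, stride=2)                                   # pool (assumed)
    s = conv_out(s, 3, padding=1)                                  # conv3
    s = conv_out(s, 3, padding=1)                                  # conv4
    s = conv_out(s, 3, padding=1)                                  # conv5
    s = conv_out(s, 3, stride=2)                                   # pool (assumed)
    return 256, s, s  # channels, height, width


channels, height, width = alexnet_feature_shape()
print(channels * height * width)  # 9216 inputs to the first 4096-unit layer
```

Under these assumptions the first fully connected layer takes 256 × 6 × 6 = 9216 inputs. Without padding on the first convolution the feature map shrinks to 5 × 5, so the classifier's input size has to match whatever `model.py` actually uses.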

	### `load_data.py`
	* Training augmentations – resize shorter side to 256 px → random 224-px crop → horizontal flip.
	* Validation preprocessing – resize to 256 px → `TenCrop(224)` (four corners + centre, each with its horizontal mirror) → normalisation.
	* Returns two PyTorch `DataLoader`s.

	### `train.py`
	* Implements the epoch/iteration loop, backward pass, accuracy calculation and checkpointing.
	* Supports learning-rate warm-up for the first N epochs (`--warmup_epochs`).
	* Selects step decay or cosine annealing via `--scheduler`.
	* Logs Top-1 accuracy & loss to `models/top1_accuracy.txt` and saves a checkpoint every 10 epochs.
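To make the interaction between warm-up and decay concrete, here is a minimal, framework-free sketch of a per-epoch learning-rate schedule with linear warm-up followed by either step decay or cosine annealing. The function name `lr_at_epoch`, its default values and the exact warm-up formula are illustrative assumptions; `train.py` may combine these pieces differently.

```python
import math


def lr_at_epoch(epoch, base_lr=0.01, epochs=100, warmup_epochs=5,
                scheduler="cosine", lr_step_size=30, lr_gamma=0.1):
    """Learning rate for a 0-indexed epoch (hypothetical helper, not from train.py)."""
    if epoch < warmup_epochs:
        # Linear warm-up: ramp from base_lr / warmup_epochs up to base_lr.
        return base_lr * (epoch + 1) / warmup_epochs
    t = epoch - warmup_epochs  # epochs elapsed since warm-up ended
    if scheduler == "step":
        # Multiply by lr_gamma once every lr_step_size epochs.
        return base_lr * lr_gamma ** (t // lr_step_size)
    if scheduler == "cosine":
        # Half cosine from base_lr down towards 0 over the remaining epochs.
        total = max(1, epochs - warmup_epochs)
        return 0.5 * base_lr * (1 + math.cos(math.pi * t / total))
    raise ValueError(f"unknown scheduler: {scheduler!r}")


for e in (0, 4, 5, 50, 99):
    print(e, round(lr_at_epoch(e), 6))
```

The warm-up phase avoids large early updates while the weights are still close to their initial values; after it ends, the chosen scheduler takes over from the full base learning rate.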

	## 3. Dataset
	The code expects the ImageNet directory to contain one sub-directory per class (named by WordNet ID) under both `train` and `val`:
	```
	ILSVRC2012
	├── train
	│   ├── n01440764
	│   │   ├── n01440764_10026.JPEG
	│   │   └── ...
	└── val
	    ├── n01440764
	    │   ├── ILSVRC2012_val_00000293.JPEG
	    │   └── ...
	```
	Pass the root directory with `--root /path/to/ILSVRC2012`.
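Before starting a long run it can be worth confirming that the directory really has this shape. The helper below is a hypothetical sanity check, not part of this repository: it counts the class folders and JPEG files under each split using only the standard library.

```python
from pathlib import Path


def summarize_imagenet_root(root):
    """Return {split: (num_class_dirs, num_jpeg_files)} for train/ and val/.

    Hypothetical helper for checking the layout shown above.
    """
    summary = {}
    for split in ("train", "val"):
        split_dir = Path(root) / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"missing split directory: {split_dir}")
        class_dirs = sorted(d for d in split_dir.iterdir() if d.is_dir())
        n_images = sum(
            1
            for d in class_dirs
            for f in d.iterdir()
            if f.suffix.lower() in {".jpeg", ".jpg"}
        )
        summary[split] = (len(class_dirs), n_images)
    return summary


# Example call (path is a placeholder):
# print(summarize_imagenet_root("/datasets/ILSVRC2012"))
```

For a complete ImageNet-1K copy, both splits should report 1000 class folders. A `val` split with zero class folders usually means the validation images are still in the flat layout of the original archive and need to be sorted into per-class directories first.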

	> 💡 ImageNet licence – obtaining the dataset requires registration with the ImageNet website.

	## 4. Installation
	```bash
	# (Optional) create a virtual environment
	python -m venv .venv && source .venv/bin/activate

	pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
	# or the CUDA wheels if you have a GPU
	```

	## 5. Training
	Run:
	```bash
	python train.py \
	--root /datasets/ILSVRC2012 \
	--device cuda:0 # or cpu / mps
	```

	Common flags:
	* `--epochs` (default 100)
	* `--batch_size` (default 128)
	* `--lr`, `--momentum`, `--weight_decay`
	* `--scheduler step|cosine`, plus `--lr_step_size` and `--lr_gamma`
	* `--warmup_epochs` – linear warm-up length
	* `--save_dir` – directory for checkpoints & logs
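For orientation, the flags listed above could be declared with `argparse` roughly as follows. Only the defaults for `--epochs` and `--batch_size` come from this README; every other default, the choice to make `--root` required, and the `str2bool` helper are assumptions for illustration. The helper is included because `argparse` with `type=bool` treats any non-empty string, including `False`, as true.

```python
import argparse


def str2bool(value):
    """Parse 'True'/'False'-style strings (hypothetical helper, not from train.py)."""
    if value.lower() in {"true", "1", "yes"}:
        return True
    if value.lower() in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser():
    p = argparse.ArgumentParser(description="AlexNet ImageNet training (sketch)")
    p.add_argument("--root", required=True, help="ImageNet root directory")
    p.add_argument("--device", default="cpu", help="cuda:0, cpu or mps")
    p.add_argument("--epochs", type=int, default=100)       # default stated in this README
    p.add_argument("--batch_size", type=int, default=128)   # default stated in this README
    p.add_argument("--lr", type=float, default=0.01)            # assumed default
    p.add_argument("--momentum", type=float, default=0.9)       # assumed default
    p.add_argument("--weight_decay", type=float, default=5e-4)  # assumed default
    p.add_argument("--scheduler", choices=["step", "cosine"], default="step")  # assumed default
    p.add_argument("--lr_step_size", type=int, default=30)      # assumed default
    p.add_argument("--lr_gamma", type=float, default=0.1)       # assumed default
    p.add_argument("--warmup_epochs", type=int, default=0)      # assumed default
    p.add_argument("--save_dir", default="models")              # assumed default
    p.add_argument("--init_weights", type=str2bool, default=True)  # assumed default
    return p


args = build_parser().parse_args(
    ["--root", "/datasets/ILSVRC2012", "--init_weights", "False"])
print(args.epochs, args.batch_size, args.init_weights)  # 100 128 False
```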

	### Resuming / fine-tuning
	To resume from a checkpoint:
	```bash
	python train.py --root /datasets/ILSVRC2012 --device cuda \
	--init_weights False \
	--save_dir models \
	--epochs 30
	# then, inside train.py, load the weights manually: model.load_state_dict(torch.load('models/model_XX.pth'))
	```

	## 6. Metrics
	The script prints Top-1 Accuracy after every epoch. You can extend it to Top-5 with:
	```python
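	# Assumes `logits` is (batch, 1000), `labels` is (batch,), and `correct_top5` starts at 0.
	maxk = 5
	_, pred = logits.topk(maxk, dim=1, largest=True, sorted=True)  # (batch, 5) class indices
	correct = pred.eq(labels.view(-1, 1).expand_as(pred))          # True where a top-5 guess matches
	correct_top5 += correct.any(dim=1).float().sum().item()        # samples with at least one match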
	```

	## 7. Citation
	If you use this code in your research, please cite:
	> Krizhevsky, Alex, Ilya Sutskever, and Geoffrey Hinton. "ImageNet classification with deep convolutional neural networks." NeurIPS 2012.

	## 8. License
	This project is released under the MIT License.