MNIST MLP Classifier

This is a fully connected MNIST digit classifier trained as part of the dlab deep-learning experimentation roadmap.

The model is intentionally an MLP rather than a CNN. It is useful as a strong baseline for studying optimization, regularization, augmentation, seed variance, and the ceiling of non-convolutional models on MNIST.

Results

10-seed confirmation sweep:

Metric               Result
-------------------  ----------------------
Validation accuracy  99.3600% ± 0.0817 pp
Validation loss      0.15172 ± 0.00235
Test accuracy        99.4470% ± 0.0195 pp
Test loss            0.14746 ± 0.00034
Test errors          55.3 ± 1.95 / 10000
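The "± pp" figures above are the mean and sample standard deviation of each metric across the 10 seeds, reported in percentage points. A minimal sketch of that aggregation, using hypothetical per-seed accuracies (the actual per-seed values are not listed in this card):

```python
import numpy as np

# Hypothetical per-seed validation accuracies (percent), for illustration only.
accs = np.array([99.30, 99.42, 99.35, 99.38, 99.41,
                 99.28, 99.36, 99.40, 99.33, 99.37])

mean = accs.mean()
std = accs.std(ddof=1)  # sample std across seeds, in percentage points
print(f"{mean:.4f}% ± {std:.4f} pp")
```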

Architecture

  • Dataset: MNIST
  • Model: MLP
  • Hidden width: 1024
  • Hidden layers: 3
  • Activation: ReLU
  • Batch normalization: enabled
  • Dropout: 0.2
  • Optimizer: Adam
  • Learning rate: 0.001
  • Scheduler: OneCycleLR
  • Weight decay: 0.0001
  • Label smoothing: 0.02
  • Weight averaging: EMA
  • Train augmentation: random affine (rotation up to 10 degrees, translation up to 0.05, scale 0.9-1.1)
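A minimal PyTorch sketch of the architecture described above: three hidden layers of width 1024 with batch normalization, ReLU, and dropout 0.2. The layer ordering and class name are assumptions; the resolved config.yaml is the authoritative definition.

```python
import torch
import torch.nn as nn

class MNISTMLP(nn.Module):
    """Sketch of the MLP above; layer ordering within each block is assumed."""
    def __init__(self, width=1024, num_layers=3, dropout=0.2):
        super().__init__()
        layers, in_dim = [], 28 * 28
        for _ in range(num_layers):
            layers += [nn.Linear(in_dim, width),
                       nn.BatchNorm1d(width),
                       nn.ReLU(),
                       nn.Dropout(dropout)]
            in_dim = width
        layers.append(nn.Linear(in_dim, 10))  # 10 digit classes
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x.flatten(1))  # flatten [B, 1, 28, 28] -> [B, 784]

model = MNISTMLP().eval()
logits = model(torch.randn(2, 1, 28, 28))
```

Adam (lr 0.001, weight decay 0.0001), OneCycleLR, label smoothing 0.02, and EMA weight averaging are applied at training time and do not change the module definition itself.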

Files

  • model.ckpt: PyTorch Lightning checkpoint from the best run.
  • model.onnx: ONNX export of the EMA/current model state used for evaluation.
  • config.yaml: resolved Hydra training config.
  • metrics.csv: training metrics from the run.
  • metadata.json: compact metadata for inference and provenance.

Preprocessing

Inputs should be MNIST grayscale images converted to tensors and normalized with:

mean = [0.1307]
std = [0.3081]

The ONNX model expects float tensors shaped [batch, 1, 28, 28] under the input name images, and returns class logits under the output name logits.
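Putting the two paragraphs above together, a sketch of the preprocessing and the inference call. The normalization constants come from this card; the commented inference lines assume the onnxruntime package and the model.onnx file listed under Files.

```python
import numpy as np

MEAN, STD = 0.1307, 0.3081  # MNIST normalization constants from this card

def preprocess(img_u8: np.ndarray) -> np.ndarray:
    """Convert a 28x28 uint8 grayscale image into the float tensor the
    ONNX model expects: shape [1, 1, 28, 28], normalized with MEAN/STD."""
    x = img_u8.astype(np.float32) / 255.0
    x = (x - MEAN) / STD
    return x[np.newaxis, np.newaxis, :, :]

# Inference sketch (requires onnxruntime and model.onnx):
# import onnxruntime as ort
# session = ort.InferenceSession("model.onnx")
# logits = session.run(["logits"], {"images": preprocess(img)})[0]
# prediction = int(np.argmax(logits, axis=1)[0])
```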
