# MNIST Distilled Student Model
A neural network trained on the MNIST dataset using knowledge distillation from a teacher model.
## Model Description

This is a StudentNet model trained on MNIST using knowledge distillation, with the following architecture:
- Fully connected: 28 × 28 → 128 → 10 (output)
- ReLU activation
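The usage example below requires the `StudentNet` class from the training script. A minimal sketch consistent with the layer sizes listed above (the actual training script's module layout may differ) could look like:

```python
import torch
import torch.nn as nn

class StudentNet(nn.Module):
    """Minimal student: flatten the 28x28 input, one hidden layer of 128, 10 logits."""
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(28 * 28, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = x.view(x.size(0), -1)    # flatten 28x28 images to 784 features
        x = torch.relu(self.fc1(x))  # ReLU activation, as described above
        return self.fc2(x)           # raw logits for the 10 digit classes
```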
The model was trained using knowledge distillation, combining:
- KL divergence between student and teacher logits (with temperature scaling)
- Cross-entropy loss on true labels
## Training Details

### Training Hyperparameters
- Batch size: 128
- Epochs: 10
- Learning rate: 0.001
- Weight decay: 0.0
- Optimizer: AdamW
- Training set size: 50,000
- Validation set size: 10,000
- Test set size: 10,000
- Device: cuda
- Seed: 42
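Concretely, the seed, data split, and optimizer settings above amount to something like the following sketch (stand-in tensors replace the real dataset, and the model is a hypothetical placeholder for the student network):

```python
import torch
from torch.utils.data import TensorDataset, random_split

torch.manual_seed(42)  # seed listed above

# Stand-in dataset with MNIST's 60,000 training items (real images not needed
# here; only the sizes matter for the split)
full_train = TensorDataset(torch.zeros(60000, 1),
                           torch.zeros(60000, dtype=torch.long))

# 50,000 / 10,000 train/validation split, matching the sizes listed above
train_set, val_set = random_split(full_train, [50000, 10000])

# AdamW with the listed learning rate and weight decay
model = torch.nn.Linear(28 * 28, 10)  # hypothetical stand-in for the student
optimizer = torch.optim.AdamW(model.parameters(), lr=0.001, weight_decay=0.0)
```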
### Distillation Parameters
- Temperature: 3.0
- Alpha (KL weight): 0.5
The loss function is: `loss = alpha × KL_loss + (1 - alpha) × CE_loss`
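One common way to implement this combined objective is the sketch below; it is not necessarily line-for-line what the training script did. In particular, the `T²` factor is the standard gradient rescaling from Hinton et al.'s distillation formulation, which this card does not explicitly confirm.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=3.0, alpha=0.5):
    """alpha * KL(student || teacher, temperature T) + (1 - alpha) * CE(student, labels)."""
    # Temperature-softened distributions; T*T restores the usual gradient scale
    kl = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # Ordinary cross-entropy against the true labels
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kl + (1 - alpha) * ce
```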
## Results
- Test Accuracy: 0.9785
- Test Loss: 0.0808
## Usage
```python
import torch

# Download the model weights
model_path = "model.pt"
# map_location lets a checkpoint saved on CUDA load on a CPU-only machine
state_dict = torch.load(model_path, map_location="cpu")

# Load into your StudentNet architecture
# (you'll need to define the StudentNet class from the training script)
model = StudentNet()
model.load_state_dict(state_dict)
model.eval()

# Make predictions
with torch.no_grad():
    predictions = model(images)
```
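The model returns raw logits, so the predicted digit is the argmax over the class dimension. For example, using a dummy batch and a stand-in module in place of the real images and loaded model:

```python
import torch

# Dummy batch of four 28x28 "images" standing in for real MNIST inputs
images = torch.rand(4, 1, 28, 28)

# Stand-in for the loaded model: any module mapping 784 features to 10 logits
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 10))
model.eval()

with torch.no_grad():
    logits = model(images)
    digits = logits.argmax(dim=1)  # predicted class 0-9 for each image
```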
## Dataset

The model was trained on the MNIST dataset, which contains 70,000 grayscale images of handwritten digits (0-9), each 28 × 28 pixels.
## Model Card Authors
Generated automatically during training.