MNIST Distilled Student Model

A neural network trained on the MNIST dataset using knowledge distillation from a teacher model.

Model Description

This is a StudentNet model trained on MNIST using knowledge distillation with the following architecture:

  • Fully connected: 28 × 28 → 128 → 10 (output)
  • ReLU activation
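
The usage snippet below loads the weights into a StudentNet class that must be defined locally. Based on the architecture above, a minimal sketch could look like the following (the layer names fc1/fc2 are assumptions and may not match the attribute names in the released state dict):

```python
import torch
import torch.nn as nn

class StudentNet(nn.Module):
    """Minimal MLP matching the card's architecture: 28*28 -> 128 -> 10."""

    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(28 * 28, 128)  # flattened 28x28 input -> hidden
        self.fc2 = nn.Linear(128, 10)       # hidden -> 10 digit classes

    def forward(self, x):
        x = x.view(x.size(0), -1)       # flatten each image to a 784-vector
        x = torch.relu(self.fc1(x))     # ReLU activation, as stated above
        return self.fc2(x)              # raw logits (no softmax)
```

If loading fails with unexpected-key errors, inspect `state_dict.keys()` and rename the layers accordingly.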

The model was trained using knowledge distillation, combining:

  • KL divergence between student and teacher logits (with temperature scaling)
  • Cross-entropy loss on true labels

Training Details

Training Hyperparameters

  • Batch size: 128
  • Epochs: 10
  • Learning rate: 0.001
  • Weight decay: 0.0
  • Optimizer: AdamW
  • Training set size: 50,000
  • Validation set size: 10,000
  • Test set size: 10,000
  • Device: cuda
  • Seed: 42

Distillation Parameters

  • Temperature: 3.0
  • Alpha (KL weight): 0.5

The loss function is: loss = alpha × KL_loss + (1 - alpha) × CE_loss
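
A sketch of this combined objective is below. The function name is illustrative, and the T² scaling of the KL term is an assumption borrowed from the standard Hinton-style distillation formulation (it keeps soft-target gradients comparable across temperatures); the actual training script may differ:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=3.0, alpha=0.5):
    # Soft targets: KL divergence between temperature-scaled distributions.
    kl = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2  # assumed T^2 rescaling (standard practice)
    # Hard targets: cross-entropy against the true labels.
    ce = F.cross_entropy(student_logits, labels)
    # loss = alpha * KL_loss + (1 - alpha) * CE_loss, as stated above.
    return alpha * kl + (1 - alpha) * ce
```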

Results

  • Test Accuracy: 0.9785
  • Test Loss: 0.0808

Usage

import torch

# Download the model
model_path = "model.pt"
state_dict = torch.load(model_path, map_location="cpu")

# Load into your StudentNet architecture
# (you'll need to define the StudentNet class from the training script)
model = StudentNet()
model.load_state_dict(state_dict)
model.eval()

# Make predictions (images: a batch of 28x28 MNIST tensors)
with torch.no_grad():
    predictions = model(images)
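
The forward pass returns raw logits, one per digit class; the predicted digit is the argmax, and probabilities come from a softmax. A self-contained sketch (the Sequential model and random batch below are stand-ins with the same shapes as StudentNet and real MNIST data, not the trained weights):

```python
import torch
import torch.nn as nn

# Stand-in model with the same output shape as StudentNet (10 logits).
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
model.eval()

images = torch.rand(4, 1, 28, 28)  # random stand-in for an MNIST batch

with torch.no_grad():
    logits = model(images)                   # shape (4, 10)

probabilities = logits.softmax(dim=1)        # per-class probabilities
predicted_digits = logits.argmax(dim=1)      # most likely digit per image
```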

Dataset

The model was trained on the MNIST dataset, which contains 70,000 grayscale images of handwritten digits (0-9), each 28 × 28 pixels.

Model Card Authors

Generated automatically during training.
