Pratik45's picture
Initial upload: MNIST CNN classifier with 99.60% accuracy
21f4ad5
metadata
language: en
tags:
  - pytorch
  - computer-vision
  - image-classification
  - mnist
  - digit-recognition
  - cnn
license: mit
datasets:
  - mnist
metrics:
  - accuracy
model-index:
  - name: mnist-cnn-classifier
    results:
      - task:
          type: image-classification
          name: Image Classification
        dataset:
          name: MNIST
          type: mnist
        metrics:
          - type: accuracy
            value: 99.6
            name: Test Accuracy
          - type: accuracy
            value: 99.27
            name: Validation Accuracy

MNIST CNN Classifier

A production-ready Convolutional Neural Network for handwritten digit recognition, achieving 99.60% accuracy on the MNIST test set.

Model Description

This model uses a 4-layer CNN architecture with batch normalization and dropout for robust digit classification. It's designed for production use with comprehensive training, evaluation, and inference pipelines.

Key Features:

  • 🎯 99.60% test accuracy on MNIST
  • πŸ—οΈ CNN Architecture: 4 convolutional layers + 3 fully connected layers
  • ⚑ Fast Inference: ~5ms per image on CPU
  • πŸ“¦ Lightweight: Only 271K parameters
  • πŸ”§ Production Ready: Complete preprocessing and error handling

Model Architecture

ConvNet(
  - Conv Block 1: Conv2d(1β†’32) + BatchNorm + ReLU + Conv2d(32β†’64) + BatchNorm + ReLU + MaxPool + Dropout
  - Conv Block 2: Conv2d(64β†’128) + BatchNorm + ReLU + Conv2d(128β†’128) + BatchNorm + ReLU + MaxPool + Dropout
  - FC Block 1: Linear(6272β†’256) + BatchNorm + ReLU + Dropout
  - FC Block 2: Linear(256β†’128) + BatchNorm + ReLU + Dropout
  - Output: Linear(128β†’10)
)

Total Parameters: 271,114

Training Details

Training Data

  • Dataset: MNIST (60,000 training images)
  • Split: 54,000 train / 6,000 validation / 10,000 test
  • Augmentation: Random rotation (Β±10Β°), affine transforms, random erasing

Training Hyperparameters

  • Optimizer: AdamW
  • Learning Rate: 0.001 with OneCycleLR scheduler
  • Batch Size: 128
  • Epochs: 20 (early stopping after 17)
  • Weight Decay: 0.0001
  • Dropout: 0.3
  • Gradient Clipping: 1.0

Training Results

Metric Value
Training Accuracy 98.74%
Validation Accuracy 99.27%
Test Accuracy 99.60%
Training Time ~85 minutes (CPU)

Per-Class Performance

Digit Precision Recall F1-Score Support
0 1.00 1.00 1.00 980
1 1.00 1.00 1.00 1135
2 0.99 1.00 0.99 1032
3 0.99 1.00 1.00 1010
4 1.00 1.00 1.00 982
5 1.00 0.99 0.99 892
6 1.00 0.99 1.00 958
7 0.99 0.99 0.99 1028
8 1.00 1.00 1.00 974
9 1.00 0.99 1.00 1009

Usage

Installation

pip install torch torchvision pillow numpy

Quick Start

import torch
from PIL import Image
from torchvision import transforms

# Load model
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = torch.load('best_model.pth', map_location=device)
model.eval()

# Preprocess image
transform = transforms.Compose([
    transforms.Resize((28, 28)),
    transforms.Grayscale(),
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))
])

# Load and predict
image = Image.open('digit.png')
image_tensor = transform(image).unsqueeze(0).to(device)

with torch.no_grad():
    output = model(image_tensor)
    prediction = output.argmax(dim=1).item()
    confidence = torch.softmax(output, dim=1).max().item()

print(f"Predicted digit: {prediction} (confidence: {confidence:.2%})")

Using the Inference Script

# Single image
python inference.py --model-path best_model.pth --image-path digit.png

# Batch inference
python inference.py --model-path best_model.pth --image-dir ./images/

Training Your Own Model

# Install requirements
pip install -r requirements.txt

# Train with default settings
python improved_mnist_classifier.py --use-gpu

# Train with custom settings
python improved_mnist_classifier.py \
    --epochs 20 \
    --batch-size 128 \
    --lr 0.001 \
    --use-gpu \
    --use-amp

Limitations and Biases

  • Domain: Only works for handwritten digits (0-9), not letters or symbols
  • Image Format: Expects 28Γ—28 grayscale images or will resize
  • Background: Trained on white/light digits on dark background (MNIST format)
  • Quality: Performance may degrade on very blurry or distorted digits
  • Real-world: May need fine-tuning for specific use cases (checks, forms, etc.)

Ethical Considerations

This model is designed for digit recognition and should not be used for:

  • Automated decision-making without human oversight
  • Privacy-sensitive applications without proper consent
  • High-stakes scenarios without validation on domain-specific data

Citation

If you use this model, please cite:

@misc{mnist-cnn-classifier,
  author = {Your Name},
  title = {MNIST CNN Classifier: Production-Ready Digit Recognition},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/your-username/mnist-cnn-classifier}}
}

Model Card Authors

License

MIT License - See LICENSE file for details

Acknowledgments

  • MNIST dataset: LeCun et al.
  • PyTorch framework
  • Hugging Face for hosting