mnist-cnn-classifier / README.md

Pratik45

Initial upload: MNIST CNN classifier with 99.60% accuracy

21f4ad5 1 day ago

preview code

raw

history blame contribute delete

5.71 kB

metadata

language: en
tags:
  - pytorch
  - computer-vision
  - image-classification
  - mnist
  - digit-recognition
  - cnn
license: mit
datasets:
  - mnist
metrics:
  - accuracy
model-index:
  - name: mnist-cnn-classifier
    results:
      - task:
          type: image-classification
          name: Image Classification
        dataset:
          name: MNIST
          type: mnist
        metrics:
          - type: accuracy
            value: 99.6
            name: Test Accuracy
          - type: accuracy
            value: 99.27
            name: Validation Accuracy

MNIST CNN Classifier

A production-ready Convolutional Neural Network for handwritten digit recognition, achieving 99.60% accuracy on the MNIST test set.

Model Description

This model uses a 4-layer CNN architecture with batch normalization and dropout for robust digit classification. It's designed for production use with comprehensive training, evaluation, and inference pipelines.

Key Features:

🎯 99.60% test accuracy on MNIST
🏗️ CNN Architecture: 4 convolutional layers + 3 fully connected layers
⚡ Fast Inference: ~5ms per image on CPU
📦 Lightweight: Only 271K parameters
🔧 Production Ready: Complete preprocessing and error handling

Model Architecture

ConvNet(
  - Conv Block 1: Conv2d(1→32) + BatchNorm + ReLU + Conv2d(32→64) + BatchNorm + ReLU + MaxPool + Dropout
  - Conv Block 2: Conv2d(64→128) + BatchNorm + ReLU + Conv2d(128→128) + BatchNorm + ReLU + MaxPool + Dropout
  - FC Block 1: Linear(6272→256) + BatchNorm + ReLU + Dropout
  - FC Block 2: Linear(256→128) + BatchNorm + ReLU + Dropout
  - Output: Linear(128→10)
)

Total Parameters: 271,114

Training Details

Training Data

Dataset: MNIST (60,000 training images)
Split: 54,000 train / 6,000 validation / 10,000 test
Augmentation: Random rotation (±10°), affine transforms, random erasing

Training Hyperparameters

Optimizer: AdamW
Learning Rate: 0.001 with OneCycleLR scheduler
Batch Size: 128
Epochs: 20 (early stopping after 17)
Weight Decay: 0.0001
Dropout: 0.3
Gradient Clipping: 1.0

Training Results

Metric	Value
Training Accuracy	98.74%
Validation Accuracy	99.27%
Test Accuracy	99.60%
Training Time	~85 minutes (CPU)

Per-Class Performance

Digit	Precision	Recall	F1-Score	Support
0	1.00	1.00	1.00	980
1	1.00	1.00	1.00	1135
2	0.99	1.00	0.99	1032
3	0.99	1.00	1.00	1010
4	1.00	1.00	1.00	982
5	1.00	0.99	0.99	892
6	1.00	0.99	1.00	958
7	0.99	0.99	0.99	1028
8	1.00	1.00	1.00	974
9	1.00	0.99	1.00	1009

Usage

Installation

pip install torch torchvision pillow numpy

Quick Start

import torch
from PIL import Image
from torchvision import transforms

# Load model
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = torch.load('best_model.pth', map_location=device)
model.eval()

# Preprocess image
transform = transforms.Compose([
    transforms.Resize((28, 28)),
    transforms.Grayscale(),
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))
])

# Load and predict
image = Image.open('digit.png')
image_tensor = transform(image).unsqueeze(0).to(device)

with torch.no_grad():
    output = model(image_tensor)
    prediction = output.argmax(dim=1).item()
    confidence = torch.softmax(output, dim=1).max().item()

print(f"Predicted digit: {prediction} (confidence: {confidence:.2%})")

Using the Inference Script

# Single image
python inference.py --model-path best_model.pth --image-path digit.png

# Batch inference
python inference.py --model-path best_model.pth --image-dir ./images/

Training Your Own Model

# Install requirements
pip install -r requirements.txt

# Train with default settings
python improved_mnist_classifier.py --use-gpu

# Train with custom settings
python improved_mnist_classifier.py \
    --epochs 20 \
    --batch-size 128 \
    --lr 0.001 \
    --use-gpu \
    --use-amp

Limitations and Biases

Domain: Only works for handwritten digits (0-9), not letters or symbols
Image Format: Expects 28×28 grayscale images or will resize
Background: Trained on white/light digits on dark background (MNIST format)
Quality: Performance may degrade on very blurry or distorted digits
Real-world: May need fine-tuning for specific use cases (checks, forms, etc.)

Ethical Considerations

This model is designed for digit recognition and should not be used for:

Automated decision-making without human oversight
Privacy-sensitive applications without proper consent
High-stakes scenarios without validation on domain-specific data

Citation

If you use this model, please cite:

@misc{mnist-cnn-classifier,
  author = {Your Name},
  title = {MNIST CNN Classifier: Production-Ready Digit Recognition},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/your-username/mnist-cnn-classifier}}
}

Model Card Authors

Your Name - GitHub | LinkedIn

License

MIT License - See LICENSE file for details

Acknowledgments

MNIST dataset: LeCun et al.
PyTorch framework
Hugging Face for hosting