Pratik45's picture
Initial upload: MNIST CNN classifier with 99.60% accuracy
21f4ad5
---
language: en
tags:
- pytorch
- computer-vision
- image-classification
- mnist
- digit-recognition
- cnn
license: mit
datasets:
- mnist
metrics:
- accuracy
model-index:
- name: mnist-cnn-classifier
results:
- task:
type: image-classification
name: Image Classification
dataset:
name: MNIST
type: mnist
metrics:
- type: accuracy
value: 99.60
name: Test Accuracy
- type: accuracy
value: 99.27
name: Validation Accuracy
---
# MNIST CNN Classifier
A production-ready Convolutional Neural Network for handwritten digit recognition, achieving **99.60% accuracy** on the MNIST test set.
## Model Description
This model uses a 4-layer CNN architecture with batch normalization and dropout for robust digit classification. It's designed for production use with comprehensive training, evaluation, and inference pipelines.
**Key Features:**
- 🎯 **99.60% test accuracy** on MNIST
- πŸ—οΈ **CNN Architecture**: 4 convolutional layers + 3 fully connected layers
- ⚑ **Fast Inference**: ~5ms per image on CPU
- πŸ“¦ **Lightweight**: Only 271K parameters
- πŸ”§ **Production Ready**: Complete preprocessing and error handling
## Model Architecture
```
ConvNet(
- Conv Block 1: Conv2d(1β†’32) + BatchNorm + ReLU + Conv2d(32β†’64) + BatchNorm + ReLU + MaxPool + Dropout
- Conv Block 2: Conv2d(64β†’128) + BatchNorm + ReLU + Conv2d(128β†’128) + BatchNorm + ReLU + MaxPool + Dropout
- FC Block 1: Linear(6272β†’256) + BatchNorm + ReLU + Dropout
- FC Block 2: Linear(256β†’128) + BatchNorm + ReLU + Dropout
- Output: Linear(128β†’10)
)
```
**Total Parameters:** 271,114
## Training Details
### Training Data
- **Dataset**: MNIST (60,000 training images)
- **Split**: 54,000 train / 6,000 validation / 10,000 test
- **Augmentation**: Random rotation (Β±10Β°), affine transforms, random erasing
### Training Hyperparameters
- **Optimizer**: AdamW
- **Learning Rate**: 0.001 with OneCycleLR scheduler
- **Batch Size**: 128
- **Epochs**: 20 (early stopping after 17)
- **Weight Decay**: 0.0001
- **Dropout**: 0.3
- **Gradient Clipping**: 1.0
### Training Results
| Metric | Value |
|--------|-------|
| Training Accuracy | 98.74% |
| Validation Accuracy | 99.27% |
| Test Accuracy | **99.60%** |
| Training Time | ~85 minutes (CPU) |
### Per-Class Performance
| Digit | Precision | Recall | F1-Score | Support |
|-------|-----------|--------|----------|---------|
| 0 | 1.00 | 1.00 | 1.00 | 980 |
| 1 | 1.00 | 1.00 | 1.00 | 1135 |
| 2 | 0.99 | 1.00 | 0.99 | 1032 |
| 3 | 0.99 | 1.00 | 1.00 | 1010 |
| 4 | 1.00 | 1.00 | 1.00 | 982 |
| 5 | 1.00 | 0.99 | 0.99 | 892 |
| 6 | 1.00 | 0.99 | 1.00 | 958 |
| 7 | 0.99 | 0.99 | 0.99 | 1028 |
| 8 | 1.00 | 1.00 | 1.00 | 974 |
| 9 | 1.00 | 0.99 | 1.00 | 1009 |
## Usage
### Installation
```bash
pip install torch torchvision pillow numpy
```
### Quick Start
```python
import torch
from PIL import Image
from torchvision import transforms
# Load model
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = torch.load('best_model.pth', map_location=device)
model.eval()
# Preprocess image
transform = transforms.Compose([
transforms.Resize((28, 28)),
transforms.Grayscale(),
transforms.ToTensor(),
transforms.Normalize((0.1307,), (0.3081,))
])
# Load and predict
image = Image.open('digit.png')
image_tensor = transform(image).unsqueeze(0).to(device)
with torch.no_grad():
output = model(image_tensor)
prediction = output.argmax(dim=1).item()
confidence = torch.softmax(output, dim=1).max().item()
print(f"Predicted digit: {prediction} (confidence: {confidence:.2%})")
```
### Using the Inference Script
```bash
# Single image
python inference.py --model-path best_model.pth --image-path digit.png
# Batch inference
python inference.py --model-path best_model.pth --image-dir ./images/
```
## Training Your Own Model
```bash
# Install requirements
pip install -r requirements.txt
# Train with default settings
python improved_mnist_classifier.py --use-gpu
# Train with custom settings
python improved_mnist_classifier.py \
--epochs 20 \
--batch-size 128 \
--lr 0.001 \
--use-gpu \
--use-amp
```
## Limitations and Biases
- **Domain**: Only works for handwritten digits (0-9), not letters or symbols
- **Image Format**: Expects 28Γ—28 grayscale images or will resize
- **Background**: Trained on white/light digits on dark background (MNIST format)
- **Quality**: Performance may degrade on very blurry or distorted digits
- **Real-world**: May need fine-tuning for specific use cases (checks, forms, etc.)
## Ethical Considerations
This model is designed for digit recognition and should not be used for:
- Automated decision-making without human oversight
- Privacy-sensitive applications without proper consent
- High-stakes scenarios without validation on domain-specific data
## Citation
If you use this model, please cite:
```bibtex
@misc{mnist-cnn-classifier,
author = {Your Name},
title = {MNIST CNN Classifier: Production-Ready Digit Recognition},
year = {2026},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/your-username/mnist-cnn-classifier}}
}
```
## Model Card Authors
- **Your Name** - [GitHub](https://github.com/your-username) | [LinkedIn](https://linkedin.com/in/your-profile)
## License
MIT License - See LICENSE file for details
## Acknowledgments
- MNIST dataset: LeCun et al.
- PyTorch framework
- Hugging Face for hosting