# Model Card – Handwritten Digit Classifier (CNN)

A Convolutional Neural Network (CNN) trained on the MNIST dataset to classify handwritten digits (0–9) with high accuracy. Designed for real-time inference in a web-based drawing interface.
## Model Details

### Model Description

This model is a CNN trained from scratch on the MNIST benchmark dataset. It accepts 28×28 grayscale images of handwritten digits and outputs a probability distribution over 10 classes (digits 0–9). It is the backbone of the Digit Classifier web app.

- Developed by: Abdul Rafay
- Model type: Convolutional Neural Network (CNN)
- Language(s): N/A (computer vision — image input only)
- License: MIT
- Framework: PyTorch 2.0+
- Finetuned from: Trained from scratch (no pretrained base)
### Model Sources

- Demo: Hugging Face Space
## Uses

### Direct Use

This model can be used directly to classify 28×28 grayscale images of handwritten digits — no fine-tuning required. It is best suited for:

- Educational demos of deep learning and CNNs
- Handwritten digit recognition in controlled environments
- Integration into apps via the provided web UI or API
### Downstream Use

The model can be fine-tuned or adapted for:

- Multi-digit number recognition (e.g., street numbers, forms)
- Similar single-character classification tasks
- A transfer learning baseline for other image classification problems
### Out-of-Scope Use

This model is not suitable for:

- Recognizing letters, symbols, or non-digit characters
- Noisy, real-world document scans without preprocessing
- Multi-digit or multi-character sequences in a single image
- Safety-critical systems (e.g., medical or legal document processing)
## Bias, Risks, and Limitations

- Dataset bias: MNIST digits are clean, centered, and size-normalized. The model may underperform on digits written in non-Western styles, with extreme stroke widths, or at unusual orientations.
- Domain shift: Performance degrades on images that differ significantly from the MNIST distribution (e.g., photos of digits on paper, printed fonts).
- No uncertainty calibration: The model outputs softmax probabilities, which can appear confident even on out-of-distribution inputs.
### Recommendations

- Preprocess input images to 28×28 grayscale and center/normalize digits before inference.
- Do not rely on model confidence scores alone — add a rejection threshold for production use.
- Evaluate on your specific data distribution before deploying in any real-world scenario.
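A rejection threshold can be added as a thin wrapper around inference. The sketch below is illustrative — the function name and the 0.9 default are assumptions, not part of the released code; tune the threshold on your own data:

```python
import torch
import torch.nn.functional as F

def predict_with_rejection(model, tensor, threshold=0.9):
    """Classify a [1, 1, 28, 28] tensor, rejecting low-confidence inputs.

    Returns (digit, confidence); digit is None when the top softmax
    probability falls below `threshold` (an illustrative value).
    """
    with torch.no_grad():
        probs = F.softmax(model(tensor), dim=1)
    confidence, digit = probs.max(dim=1)
    if confidence.item() < threshold:
        return None, confidence.item()  # reject: e.g., ask the user to redraw
    return digit.item(), confidence.item()
```

In a web app, a rejected input can trigger a "please redraw" prompt instead of displaying a possibly wrong digit.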
## How to Get Started with the Model

```python
import torch
from torchvision import transforms
from PIL import Image

from model import Model  # your model definition

# Load model weights
model = Model()
model.load_state_dict(torch.load("model.pt", map_location="cpu"))
model.eval()

# Preprocess image: grayscale, resize, and normalize with MNIST statistics
transform = transforms.Compose([
    transforms.Grayscale(),
    transforms.Resize((28, 28)),
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,)),
])
img = Image.open("digit.png")
tensor = transform(img).unsqueeze(0)  # shape: [1, 1, 28, 28]

# Predict
with torch.no_grad():
    output = model(tensor)
prediction = output.argmax(dim=1).item()
print(f"Predicted digit: {prediction}")
```
## Training Details

### Training Data

- Dataset: MNIST — 70,000 grayscale images (60,000 train / 10,000 test)
- Input size: 28×28 pixels, single channel
- Classes: 10 (digits 0–9)
### Training Procedure

#### Preprocessing

- Images converted to tensors and normalized using the MNIST dataset mean (0.1307) and std (0.3081)
- Training augmentation: random rotation (±10°), random affine with translation (±10%), scale (0.9–1.1×), and shear (±5°)
- Test images: normalization only — no augmentation
#### Training Hyperparameters
| Parameter | Value |
|---|---|
| Optimizer | AdamW |
| Learning Rate | 3e-3 (max, OneCycleLR) |
| Weight Decay | 1e-4 |
| Batch Size | 64 |
| Epochs | 50 |
| Loss Function | CrossEntropyLoss |
| Label Smoothing | 0.1 |
| LR Scheduler | OneCycleLR (10% warmup, cosine anneal) |
| Dropout (conv) | 0.25 (Dropout2d) |
| Dropout (FC) | 0.25 |
| Random Seed | 23 |
| Training regime | fp32 |
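The optimizer, scheduler, and loss from the table can be wired together as below. This is a sketch: the stand-in model and the steps-per-epoch value (derived from 60,000 images at batch size 64) are assumptions for illustration:

```python
import torch
from torch import nn, optim

# Stand-in model for illustration; substitute the real CNN.
model = nn.Sequential(nn.Flatten(), nn.Linear(784, 10))

epochs = 50
steps_per_epoch = 60_000 // 64 + 1           # 938 batches per epoch

optimizer = optim.AdamW(model.parameters(), lr=3e-3, weight_decay=1e-4)
scheduler = optim.lr_scheduler.OneCycleLR(
    optimizer,
    max_lr=3e-3,
    epochs=epochs,
    steps_per_epoch=steps_per_epoch,
    pct_start=0.1,                           # 10% warmup
    anneal_strategy="cos",                   # cosine anneal
)
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

# Inside the training loop, the scheduler steps once per batch:
#   loss = criterion(model(x), y)
#   loss.backward()
#   optimizer.step()
#   scheduler.step()
#   optimizer.zero_grad()
```

Note that OneCycleLR starts well below `max_lr` and must be stepped per batch, not per epoch.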
#### Speeds, Sizes, Times

- Training time: ~10 minutes on a single GPU (NVIDIA T4, Google Colab)
- Model parameters: 160,842
- Inference speed: <50 ms per image (CPU)
## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

Evaluated on the standard MNIST test split — 10,000 images not seen during training.

#### Factors

Evaluation was performed across all 10 digit classes. No disaggregation by subpopulation was conducted (MNIST does not include demographic metadata).

#### Metrics

- Accuracy — primary metric; the proportion of correctly classified digits
- Confusion matrix — to identify per-class error patterns
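A confusion matrix and per-class accuracies can be computed in a few lines of PyTorch; the helper names here are illustrative, not from the released code:

```python
import torch

def confusion_matrix(preds, labels, num_classes=10):
    """Count predictions into a matrix: rows = true digit, cols = predicted."""
    cm = torch.zeros(num_classes, num_classes, dtype=torch.long)
    for t, p in zip(labels, preds):
        cm[t, p] += 1
    return cm

def per_class_accuracy(cm):
    """Diagonal (correct) divided by row totals (all examples of that class)."""
    correct = cm.diag()
    total = cm.sum(dim=1)
    return correct.float() / total.clamp(min=1).float()
```

Off-diagonal entries directly reveal which digit pairs get confused (e.g., a large `cm[9, 4]` would mean 9s misread as 4s).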
### Results
| Metric | Value |
|---|---|
| Test Accuracy | 99.43% |
#### Per-Class Accuracy
| Digit | Correct | Errors | Accuracy |
|---|---|---|---|
| 0 | 980 | 0 | 100.0% |
| 1 | 1132 | 3 | 99.7% |
| 2 | 1025 | 7 | 99.3% |
| 3 | 1008 | 2 | 99.8% |
| 4 | 976 | 6 | 99.4% |
| 5 | 885 | 7 | 99.2% |
| 6 | 949 | 9 | 99.1% |
| 7 | 1020 | 8 | 99.2% |
| 8 | 968 | 6 | 99.4% |
| 9 | 1000 | 9 | 99.1% |
#### Summary

The model achieves 99.43% accuracy on the MNIST test set (57 errors out of 10,000). Digit 0 is classified perfectly. The most challenging classes are 6 and 9 (9 errors each), consistent with their visual similarity.
## Model Examination

The model's convolutional filters learn edge detectors and stroke patterns in early layers, which compose into digit-specific features in deeper layers. Standard CNN interpretability techniques (e.g., Grad-CAM) can be applied to visualize which regions most influence predictions.
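As an illustration, Grad-CAM can be implemented with forward/backward hooks on any convolutional layer. This is a generic sketch, not code shipped with the model:

```python
import torch
from torch import nn

def grad_cam(model, conv_layer, x, target_class=None):
    """Minimal Grad-CAM: heatmap over `conv_layer`'s spatial grid showing
    which regions drive the (predicted or given) class score."""
    acts, grads = {}, {}
    h1 = conv_layer.register_forward_hook(lambda m, i, o: acts.update(a=o))
    h2 = conv_layer.register_full_backward_hook(
        lambda m, gi, go: grads.update(g=go[0]))
    try:
        logits = model(x)
        cls = logits.argmax(dim=1) if target_class is None else target_class
        logits[range(len(x)), cls].sum().backward()
    finally:
        h1.remove()
        h2.remove()
    weights = grads["g"].mean(dim=(2, 3), keepdim=True)  # per-channel importance
    cam = torch.relu((weights * acts["a"]).sum(dim=1))   # (B, H, W) heatmap
    return cam / cam.amax(dim=(1, 2), keepdim=True).clamp(min=1e-8)
```

Upsampled to 28×28 and overlaid on the input, the heatmap highlights the strokes most responsible for a prediction.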
## Environmental Impact

Carbon emissions were estimated using the ML Impact Calculator.
| Factor | Value |
|---|---|
| Hardware Type | NVIDIA T4 GPU |
| Hours Used | ~0.2 hrs (10 min) |
| Cloud Provider | Google Colab |
| Compute Region | Singapore |
| Carbon Emitted | ~0.01 kg CO₂eq (est.) |
## Technical Specifications

### Model Architecture

The model uses 4 convolutional blocks followed by a compact fully connected head.
#### Convolutional Blocks

| Block | Layer | Output Shape | Details |
|---|---|---|---|
| Block 1 | Conv2d | (32, 28, 28) | 32 filters, 3×3, padding=1 |
| | BatchNorm2d | (32, 28, 28) | — |
| | ReLU | (32, 28, 28) | — |
| | MaxPool2d | (32, 14, 14) | 2×2 |
| | Dropout2d | (32, 14, 14) | p=0.25 |
| Block 2 | Conv2d | (64, 14, 14) | 64 filters, 3×3, padding=1 |
| | BatchNorm2d | (64, 14, 14) | — |
| | ReLU | (64, 14, 14) | — |
| | MaxPool2d | (64, 7, 7) | 2×2 |
| | Dropout2d | (64, 7, 7) | p=0.25 |
| Block 3 | Conv2d | (128, 7, 7) | 128 filters, 3×3, padding=1 |
| | BatchNorm2d | (128, 7, 7) | — |
| | ReLU | (128, 7, 7) | — |
| | MaxPool2d | (128, 3, 3) | 2×2 |
| | Dropout2d | (128, 3, 3) | p=0.25 |
| Block 4 | Conv2d | (256, 3, 3) | 256 filters, 1×1 kernel (no padding) |
| | BatchNorm2d | (256, 3, 3) | — |
| | ReLU | (256, 3, 3) | — |
| | MaxPool2d | (256, 1, 1) | 2×2 |
| | Dropout2d | (256, 1, 1) | p=0.25 |
#### Fully Connected Layers

| Layer | Output | Details |
|---|---|---|
| Flatten | 256 | 256 × 1 × 1 = 256 |
| Linear | 128 | + ReLU + Dropout(0.25) |
| Linear | 10 | Raw logits |

Total parameters: 160,842
#### Shape Flow

```
Input:   (B, 1, 28, 28)
Block 1: (B, 32, 14, 14)
Block 2: (B, 64, 7, 7)
Block 3: (B, 128, 3, 3)
Block 4: (B, 256, 1, 1)
Flatten: (B, 256)
FC1:     (B, 128)
Output:  (B, 10)
```
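The tables and shape flow above fully determine the architecture. The reconstruction below is illustrative — the released `model.py` may group and name layers differently — but it reproduces the stated parameter count of 160,842:

```python
import torch
from torch import nn

class DigitCNN(nn.Module):
    """Reconstruction of the architecture described above (illustrative)."""

    def __init__(self, p=0.25):
        super().__init__()

        def block(c_in, c_out, kernel, pad):
            # Conv -> BatchNorm -> ReLU -> MaxPool -> Dropout2d
            return nn.Sequential(
                nn.Conv2d(c_in, c_out, kernel, padding=pad),
                nn.BatchNorm2d(c_out),
                nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Dropout2d(p),
            )

        self.features = nn.Sequential(
            block(1, 32, 3, 1),     # -> (B, 32, 14, 14)
            block(32, 64, 3, 1),    # -> (B, 64, 7, 7)
            block(64, 128, 3, 1),   # -> (B, 128, 3, 3)
            block(128, 256, 1, 0),  # -> (B, 256, 1, 1)
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),           # -> (B, 256)
            nn.Linear(256, 128),
            nn.ReLU(),
            nn.Dropout(p),
            nn.Linear(128, 10),     # raw logits
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```

Summing `p.numel()` over `DigitCNN().parameters()` yields 160,842, matching the table.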
### Compute Infrastructure

- Hardware: NVIDIA T4 GPU (Google Colab)
- Software: Python 3.10+, PyTorch 2.0+, torchvision
## Citation

If you use this model in your work, please cite:

BibTeX:

```bibtex
@misc{digit-classifier-2026,
  author    = {Abdul Rafay},
  title     = {Handwritten Digit Classifier (CNN on MNIST)},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/abdurafay19/Digit-Classifier}
}
```

APA:

Abdul Rafay. (2026). Handwritten Digit Classifier (CNN on MNIST). Hugging Face. https://huggingface.co/abdurafay19/Digit-Classifier
## Glossary

| Term | Definition |
|---|---|
| CNN | Convolutional Neural Network — a deep learning architecture suited to image data |
| MNIST | A benchmark dataset of 70,000 handwritten digit images |
| Softmax | Activation function that converts raw outputs to probabilities summing to 1 |
| Dropout | Regularization technique that randomly disables neurons during training |
| BatchNorm | Batch Normalization — normalizes layer activations to stabilize and speed up training |
| OneCycleLR | Learning rate schedule with warmup and cosine decay for faster convergence |
| Label Smoothing | Softens hard targets to reduce overconfidence and improve generalization |
| Grad-CAM | Gradient-weighted Class Activation Mapping — a model interpretability technique |
## Model Card Authors

Abdul Rafay — abdulrafay17wolf@gmail.com

## Model Card Contact

For questions or issues, open a GitHub issue at github.com/abdurafay19/Digit-Classifier or reach out via Hugging Face.