---
language:
- en
license: mit
tags:
- image-classification
- digit-recognition
- cnn
- mnist
- pytorch
datasets:
- mnist
metrics:
- accuracy
---
# Model Card – Handwritten Digit Classifier (CNN)
A Convolutional Neural Network (CNN) trained on the MNIST dataset to classify handwritten digits (0–9) with high accuracy. Designed for real-time inference in a web-based drawing interface.
---
## Model Details
### Model Description
This model is a CNN trained from scratch on the MNIST benchmark dataset. It accepts 28×28 grayscale images of handwritten digits and outputs a probability distribution over 10 classes (digits 0–9). It is the backbone of the [Digit Classifier web app](https://huggingface.co/spaces/abdurafay19/Digit-Classifier).
- **Developed by:** [Abdul Rafay](https://www.linkedin.com/in/abdurafay19)
- **Model type:** Convolutional Neural Network (CNN)
- **Language(s):** N/A (Computer Vision – image input only)
- **License:** MIT
- **Framework:** PyTorch 2.0+
- **Finetuned from:** Trained from scratch (no pretrained base)
### Model Sources
- **Demo:** [Hugging Face Space](https://huggingface.co/spaces/abdurafay19/Digit-Classifier)
---
## Uses
### Direct Use
This model can be used directly to classify 28×28 grayscale images of handwritten digits – no fine-tuning required. It is best suited for:
- Educational demos of deep learning and CNNs
- Handwritten digit recognition in controlled environments
- Integration into apps via the provided web UI or API
### Downstream Use
The model can be fine-tuned or adapted for:
- Multi-digit number recognition (e.g., street numbers, forms)
- Similar single-character classification tasks
- Transfer learning baseline for other image classification problems
### Out-of-Scope Use
This model is **not** suitable for:
- Recognizing letters, symbols, or non-digit characters
- Noisy, real-world document scans without preprocessing
- Multi-digit or multi-character sequences in a single image
- Safety-critical systems (e.g., medical, legal document processing)
---
## Bias, Risks, and Limitations
- **Dataset bias:** MNIST digits are clean, centered, and size-normalized. The model may underperform on digits written in non-Western styles, extreme stroke widths, or unusual orientations.
- **Domain shift:** Performance degrades on images that differ significantly from the MNIST distribution (e.g., photos of digits on paper, different fonts).
- **No uncertainty calibration:** The model outputs softmax probabilities, which may appear confident even on out-of-distribution inputs.
### Recommendations
- Preprocess input images to 28×28 grayscale and center/normalize digits before inference.
- Do not rely on model confidence scores alone – add a rejection threshold for production use.
- Evaluate on your specific distribution before deploying in any real-world scenario.
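
The rejection-threshold recommendation can be sketched as a small wrapper around the model's softmax output. This is an illustrative helper, not part of the released code; the function name and the 0.9 threshold are placeholders to tune for your own distribution:

```python
import torch
import torch.nn.functional as F

def predict_with_rejection(model, tensor, threshold=0.9):
    """Return (digit, confidence), or (None, confidence) when the model's
    top softmax probability falls below the rejection threshold."""
    with torch.no_grad():
        probs = F.softmax(model(tensor), dim=1)
    confidence, prediction = probs.max(dim=1)
    if confidence.item() < threshold:
        return None, confidence.item()  # reject: too uncertain
    return prediction.item(), confidence.item()
```

Note that softmax confidence is not calibrated uncertainty, so the threshold should be chosen empirically on held-out data from your target domain.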
---
## How to Get Started with the Model
```python
import torch
from torchvision import transforms
from PIL import Image
from model import Model # your model definition
# Load trained weights (assumes model.pt holds a state_dict)
model = Model()
model.load_state_dict(torch.load("model.pt", map_location="cpu"))
model.eval()
# Preprocess image
transform = transforms.Compose([
transforms.Grayscale(),
transforms.Resize((28, 28)),
transforms.ToTensor(),
transforms.Normalize((0.1307,), (0.3081,))
])
img = Image.open("digit.png")
tensor = transform(img).unsqueeze(0) # shape: [1, 1, 28, 28]
# Predict
with torch.no_grad():
output = model(tensor)
prediction = output.argmax(dim=1).item()
print(f"Predicted digit: {prediction}")
```
---
## Training Details
### Training Data
- **Dataset:** [MNIST](https://huggingface.co/datasets/mnist) – 70,000 grayscale images (60,000 train / 10,000 test)
- **Input size:** 28×28 pixels, single channel
- **Classes:** 10 (digits 0–9)
### Training Procedure
#### Preprocessing
- Images converted to tensors and normalized using the MNIST dataset mean (0.1307) and std (0.3081)
- Training augmentation: random rotation (±10°), random affine with translation (±10%), scale (0.9–1.1×), and shear (±5°)
- Test images: normalization only – no augmentation
#### Training Hyperparameters
| Parameter | Value |
|-----------------|------------------------------|
| Optimizer | AdamW |
| Learning Rate | 3e-3 (max, OneCycleLR) |
| Weight Decay | 1e-4 |
| Batch Size | 64 |
| Epochs | 50 |
| Loss Function | CrossEntropyLoss |
| Label Smoothing | 0.1 |
| LR Scheduler | OneCycleLR (10% warmup, cosine anneal) |
| Dropout (conv) | 0.25 (Dropout2d) |
| Dropout (FC) | 0.25 |
| Random Seed | 23 |
| Training regime | fp32 |
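
The optimizer, loss, and schedule in the table can be wired up roughly as follows. The function name and `steps_per_epoch` argument are placeholders; this is a sketch of the listed hyperparameters, not the original training script:

```python
import torch
import torch.nn as nn

def build_training_objects(model, steps_per_epoch, epochs=50):
    """Loss, optimizer, and LR schedule matching the hyperparameter table."""
    criterion = nn.CrossEntropyLoss(label_smoothing=0.1)
    optimizer = torch.optim.AdamW(model.parameters(), lr=3e-3, weight_decay=1e-4)
    scheduler = torch.optim.lr_scheduler.OneCycleLR(
        optimizer,
        max_lr=3e-3,
        epochs=epochs,
        steps_per_epoch=steps_per_epoch,
        pct_start=0.1,             # 10% of steps spent warming up
        anneal_strategy="cos",     # cosine annealing back down
    )
    return criterion, optimizer, scheduler
```

With OneCycleLR, `scheduler.step()` is called once per batch (not per epoch), so the schedule needs the number of steps per epoch up front.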
#### Speeds, Sizes, Times
- **Training time:** ~10 minutes on a single GPU (NVIDIA T4, Google Colab)
- **Model parameters:** 160,842
- **Inference speed:** <50ms per image (CPU)
---
## Evaluation
### Testing Data, Factors & Metrics
#### Testing Data
Evaluated on the standard MNIST test split – 10,000 images not seen during training.
#### Factors
Evaluation was performed across all 10 digit classes. No disaggregation by subpopulation was conducted (MNIST does not include demographic metadata).
#### Metrics
- **Accuracy** – primary metric; proportion of correctly classified digits
- **Confusion Matrix** – to identify per-class error patterns
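
Both metrics can be computed with a simple evaluation loop. This sketch assumes a standard iterable of `(images, labels)` batches, such as a test-split `DataLoader`:

```python
import torch

def evaluate(model, loader, num_classes=10):
    """Return overall accuracy and a num_classes x num_classes confusion
    matrix (rows = true digit, columns = predicted digit)."""
    confusion = torch.zeros(num_classes, num_classes, dtype=torch.long)
    model.eval()
    with torch.no_grad():
        for images, labels in loader:
            preds = model(images).argmax(dim=1)
            for true, pred in zip(labels, preds):
                confusion[true, pred] += 1
    accuracy = confusion.diag().sum().item() / confusion.sum().item()
    return accuracy, confusion
```

Per-class accuracy, as reported below, is the diagonal of this matrix divided by each row's total.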
### Results
| Metric | Value |
|---------------|---------|
| Test Accuracy | 99.43% |
#### Per-Class Accuracy
| Digit | Correct | Errors | Accuracy |
|-------|---------|--------|----------|
| 0 | 980 | 0 | 100.0% |
| 1 | 1132 | 3 | 99.7% |
| 2 | 1025 | 7 | 99.3% |
| 3 | 1008 | 2 | 99.8% |
| 4 | 976 | 6 | 99.4% |
| 5 | 885 | 7 | 99.2% |
| 6 | 949 | 9 | 99.1% |
| 7 | 1020 | 8 | 99.2% |
| 8 | 968 | 6 | 99.4% |
| 9 | 1000 | 9 | 99.1% |
#### Summary
The model achieves **99.43% accuracy** on the MNIST test set (57 total errors out of 10,000). Digit 0 achieves perfect classification. The most challenging classes are 6 and 9 (9 errors each), consistent with their visual similarity.
---
## Model Examination
The model's convolutional filters learn edge detectors and stroke patterns in early layers, which compose into digit-specific features in deeper layers. Standard CNN interpretability techniques (e.g., Grad-CAM) can be applied to visualize which regions most influence predictions.
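
As a hedged illustration of the Grad-CAM idea mentioned above (this generic sketch works on any conv layer via hooks; it is not code shipped with the model):

```python
import torch
import torch.nn.functional as F

def grad_cam(model, conv_layer, image, target_class=None):
    """Minimal Grad-CAM: weight a conv layer's activations by the spatially
    averaged gradients of the class score, ReLU, and upsample to input size."""
    activations, gradients = {}, {}

    def fwd_hook(module, inputs, output):
        activations["value"] = output

    def bwd_hook(module, grad_input, grad_output):
        gradients["value"] = grad_output[0]

    h_fwd = conv_layer.register_forward_hook(fwd_hook)
    h_bwd = conv_layer.register_full_backward_hook(bwd_hook)
    try:
        logits = model(image)                       # (1, num_classes)
        if target_class is None:
            target_class = logits.argmax(dim=1).item()
        model.zero_grad()
        logits[0, target_class].backward()
    finally:
        h_fwd.remove()
        h_bwd.remove()

    acts = activations["value"]                     # (1, C, H, W)
    grads = gradients["value"]                      # (1, C, H, W)
    weights = grads.mean(dim=(2, 3), keepdim=True)  # per-channel importance
    cam = F.relu((weights * acts).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image.shape[2:], mode="bilinear",
                        align_corners=False)
    return (cam / (cam.max() + 1e-8))[0, 0].detach()  # heatmap in [0, 1]
```

The returned heatmap can be overlaid on the input digit to see which strokes drove the prediction.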
---
## Environmental Impact
Carbon emissions estimated using the [ML Impact Calculator](https://mlco2.github.io/impact#compute).
| Factor | Value |
|-----------------|------------------------|
| Hardware Type | NVIDIA T4 GPU |
| Hours Used | ~0.2 hrs (10 min) |
| Cloud Provider | Google Colab |
| Compute Region | Singapore |
| Carbon Emitted | ~0.01 kg CO₂eq (est.) |
---
## Technical Specifications
### Model Architecture
The model uses 4 convolutional blocks followed by a compact fully connected head.
#### Convolutional Blocks
| Block | Layer | Output Shape | Details |
|---------|-------------|----------------|--------------------------------------|
| Block 1 | Conv2d | (32, 28, 28) | 32 filters, 3×3, padding=1 |
| | BatchNorm2d | (32, 28, 28) | – |
| | ReLU | (32, 28, 28) | – |
| | MaxPool2d | (32, 14, 14) | 2×2 |
| | Dropout2d | (32, 14, 14) | p=0.25 |
| Block 2 | Conv2d | (64, 14, 14) | 64 filters, 3×3, padding=1 |
| | BatchNorm2d | (64, 14, 14) | – |
| | ReLU | (64, 14, 14) | – |
| | MaxPool2d | (64, 7, 7) | 2×2 |
| | Dropout2d | (64, 7, 7) | p=0.25 |
| Block 3 | Conv2d | (128, 7, 7) | 128 filters, 3×3, padding=1 |
| | BatchNorm2d | (128, 7, 7) | – |
| | ReLU | (128, 7, 7) | – |
| | MaxPool2d | (128, 3, 3) | 2×2 |
| | Dropout2d | (128, 3, 3) | p=0.25 |
| Block 4 | Conv2d | (256, 3, 3) | 256 filters, **1×1** kernel (no padding) |
| | BatchNorm2d | (256, 3, 3) | – |
| | ReLU | (256, 3, 3) | – |
| | MaxPool2d | (256, 1, 1) | 2×2 |
| | Dropout2d | (256, 1, 1) | p=0.25 |
#### Fully Connected Layers
| Layer | Output | Details |
|----------|--------|----------------------|
| Flatten | 256 | 256 × 1 × 1 = 256 |
| Linear | 128 | + ReLU + Dropout(0.25) |
| Linear | 10 | Raw logits |
**Total Parameters: 160,842**
#### Shape Flow
```
Input: (B, 1, 28, 28)
Block 1: (B, 32, 14, 14)
Block 2: (B, 64, 7, 7)
Block 3: (B, 128, 3, 3)
Block 4: (B, 256, 1, 1)
Flatten: (B, 256)
FC1: (B, 128)
Output: (B, 10)
```
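
The tables and shape flow above translate into the following `nn.Module` sketch. Layer and attribute names are illustrative (the released checkpoint may use different ones), but the layer sequence and shapes match the specification, and the parameter count comes out to the stated 160,842:

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch, kernel, padding):
    """Conv -> BatchNorm -> ReLU -> 2x2 MaxPool -> Dropout2d, as in the table."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=kernel, padding=padding),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(),
        nn.MaxPool2d(2),
        nn.Dropout2d(0.25),
    )

class DigitCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            conv_block(1, 32, 3, 1),     # (B, 32, 14, 14)
            conv_block(32, 64, 3, 1),    # (B, 64, 7, 7)
            conv_block(64, 128, 3, 1),   # (B, 128, 3, 3)
            conv_block(128, 256, 1, 0),  # (B, 256, 1, 1) - 1x1 kernel, no pad
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),                # (B, 256)
            nn.Linear(256, 128),
            nn.ReLU(),
            nn.Dropout(0.25),
            nn.Linear(128, num_classes), # raw logits
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```

The 1×1 convolution in Block 4 mixes channels without growing the receptive field, keeping the head compact before the final pooling collapses each feature map to a single value.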
### Compute Infrastructure
- **Hardware:** NVIDIA T4 GPU (Google Colab)
- **Software:** Python 3.10+, PyTorch 2.0, torchvision
---
## Citation
If you use this model in your work, please cite:
**BibTeX:**
```bibtex
@misc{digit-classifier-2026,
author = {Abdul Rafay},
title = {Handwritten Digit Classifier (CNN on MNIST)},
year = {2026},
publisher = {Hugging Face},
url = {https://huggingface.co/abdurafay19/Digit-Classifier}
}
```
**APA:**
> Abdul Rafay. (2026). *Handwritten Digit Classifier (CNN on MNIST)*. Hugging Face. https://huggingface.co/abdurafay19/Digit-Classifier
---
## Glossary
| Term | Definition |
|--------------|------------|
| CNN | Convolutional Neural Network – a deep learning architecture suited for image data |
| MNIST | A benchmark dataset of 70,000 handwritten digit images |
| Softmax | Activation function that converts raw outputs to probabilities summing to 1 |
| Dropout | Regularization technique that randomly disables neurons during training |
| BatchNorm | Batch Normalization – normalizes layer activations to stabilize and speed up training |
| OneCycleLR | Learning rate schedule with warmup and cosine decay for faster convergence |
| Label Smoothing | Softens hard targets to reduce overconfidence and improve generalization |
| Grad-CAM | Gradient-weighted Class Activation Mapping – a model interpretability technique |
---
## Model Card Authors
Abdul Rafay – abdulrafay17wolf@gmail.com
## Model Card Contact
For questions or issues, open a GitHub issue at [github.com/abdurafay19/Digit-Classifier](https://github.com/abdurafay19/Digit-Classifier) or reach out via Hugging Face.