---
language:
- en
license: mit
tags:
- image-classification
- digit-recognition
- cnn
- mnist
- pytorch
datasets:
- mnist
metrics:
- accuracy
---

# Model Card — Handwritten Digit Classifier (CNN)

A Convolutional Neural Network (CNN) trained on the MNIST dataset to classify handwritten digits (0–9) with high accuracy. Designed for real-time inference in a web-based drawing interface.

---

## Model Details

### Model Description

This model is a CNN trained from scratch on the MNIST benchmark dataset. It accepts 28×28 grayscale images of handwritten digits and outputs a probability distribution over 10 classes (digits 0–9). It is the backbone of the [Digit Classifier web app](https://huggingface.co/spaces/abdurafay19/Digit-Classifier).

- **Developed by:** [Abdul Rafay](https://www.linkedin.com/in/abdurafay19)
- **Model type:** Convolutional Neural Network (CNN)
- **Language(s):** N/A (Computer Vision — image input only)
- **License:** MIT
- **Framework:** PyTorch 2.0+
- **Finetuned from:** Trained from scratch (no pretrained base)

### Model Sources

- **Demo:** [Hugging Face Space](https://huggingface.co/spaces/abdurafay19/Digit-Classifier)

---

## Uses

### Direct Use

This model can be used directly to classify 28×28 grayscale images of handwritten digits — no fine-tuning required.
It is best suited for:

- Educational demos of deep learning and CNNs
- Handwritten digit recognition in controlled environments
- Integration into apps via the provided web UI or API

### Downstream Use

The model can be fine-tuned or adapted for:

- Multi-digit number recognition (e.g., street numbers, forms)
- Similar single-character classification tasks
- A transfer learning baseline for other image classification problems

### Out-of-Scope Use

This model is **not** suitable for:

- Recognizing letters, symbols, or non-digit characters
- Noisy, real-world document scans without preprocessing
- Multi-digit or multi-character sequences in a single image
- Safety-critical systems (e.g., medical or legal document processing)

---

## Bias, Risks, and Limitations

- **Dataset bias:** MNIST digits are clean, centered, and size-normalized. The model may underperform on digits written in non-Western styles, with extreme stroke widths, or in unusual orientations.
- **Domain shift:** Performance degrades on images that differ significantly from the MNIST distribution (e.g., photos of digits on paper, printed fonts).
- **No uncertainty calibration:** The model outputs softmax probabilities, which may appear confident even on out-of-distribution inputs.

### Recommendations

- Preprocess input images to 28×28 grayscale and center/normalize digits before inference.
- Do not rely on model confidence scores alone — add a rejection threshold for production use.
- Evaluate on your specific data distribution before deploying in any real-world scenario.
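The rejection-threshold recommendation can be implemented as a thin wrapper around the softmax output. This is a minimal sketch; the `0.9` cutoff and the function name are illustrative choices, not part of the released model — tune the threshold on your own validation data:

```python
import torch
import torch.nn.functional as F

REJECT_THRESHOLD = 0.9  # illustrative cutoff; tune on held-out data


def predict_with_rejection(logits: torch.Tensor, threshold: float = REJECT_THRESHOLD):
    """Return (digit, confidence), or (None, confidence) when the model abstains."""
    probs = F.softmax(logits, dim=1)
    confidence, prediction = probs.max(dim=1)
    if confidence.item() < threshold:
        # Abstain: low max-softmax often signals an out-of-distribution input
        return None, confidence.item()
    return prediction.item(), confidence.item()


# Hand-made logits for demonstration (in practice: logits = model(tensor))
confident = torch.tensor([[0.0, 0.0, 12.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]])
ambiguous = torch.tensor([[1.0, 1.1, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]])

print(predict_with_rejection(confident))  # strong mass on class 2 → accepted
print(predict_with_rejection(ambiguous))  # near-uniform distribution → rejected
```

Max-softmax is a weak uncertainty signal on its own, so treat this as a first filter rather than calibrated confidence.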
---

## How to Get Started with the Model

```python
import torch
from torchvision import transforms
from PIL import Image

from model import Model  # your model definition

# Load model (map_location lets a GPU-trained checkpoint load on CPU)
model = Model()
model.load_state_dict(torch.load("model.pt", map_location="cpu"))
model.eval()

# Preprocess image
transform = transforms.Compose([
    transforms.Grayscale(),
    transforms.Resize((28, 28)),
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))
])
img = Image.open("digit.png")
tensor = transform(img).unsqueeze(0)  # shape: [1, 1, 28, 28]

# Predict
with torch.no_grad():
    output = model(tensor)
    prediction = output.argmax(dim=1).item()

print(f"Predicted digit: {prediction}")
```

---

## Training Details

### Training Data

- **Dataset:** [MNIST](https://huggingface.co/datasets/mnist) — 70,000 grayscale images (60,000 train / 10,000 test)
- **Input size:** 28×28 pixels, single channel
- **Classes:** 10 (digits 0–9)

### Training Procedure

#### Preprocessing

- Images converted to tensors and normalized using the MNIST dataset mean (0.1307) and std (0.3081)
- Training augmentation: random rotation (±10°), random affine with translation (±10%), scale (0.9–1.1×), and shear (±5°)
- Test images: normalization only — no augmentation

#### Training Hyperparameters

| Parameter       | Value                                  |
|-----------------|----------------------------------------|
| Optimizer       | AdamW                                  |
| Learning Rate   | 3e-3 (max, OneCycleLR)                 |
| Weight Decay    | 1e-4                                   |
| Batch Size      | 64                                     |
| Epochs          | 50                                     |
| Loss Function   | CrossEntropyLoss                       |
| Label Smoothing | 0.1                                    |
| LR Scheduler    | OneCycleLR (10% warmup, cosine anneal) |
| Dropout (conv)  | 0.25 (Dropout2d)                       |
| Dropout (FC)    | 0.25                                   |
| Random Seed     | 23                                     |
| Training regime | fp32                                   |

#### Speeds, Sizes, Times

- **Training time:** ~10 minutes on a single GPU (NVIDIA T4, Google Colab)
- **Model parameters:** 160,842
- **Inference speed:** <50 ms per image (CPU)

---

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

Evaluated on the standard MNIST test split — 10,000 images not seen during
training.

#### Factors

Evaluation was performed across all 10 digit classes. No disaggregation by subpopulation was conducted (MNIST does not include demographic metadata).

#### Metrics

- **Accuracy** — primary metric; proportion of correctly classified digits
- **Confusion Matrix** — to identify per-class error patterns

### Results

| Metric        | Value  |
|---------------|--------|
| Test Accuracy | 99.43% |

#### Per-Class Accuracy

| Digit | Correct | Errors | Accuracy |
|-------|---------|--------|----------|
| 0     | 980     | 0      | 100.0%   |
| 1     | 1132    | 3      | 99.7%    |
| 2     | 1025    | 7      | 99.3%    |
| 3     | 1008    | 2      | 99.8%    |
| 4     | 976     | 6      | 99.4%    |
| 5     | 885     | 7      | 99.2%    |
| 6     | 949     | 9      | 99.1%    |
| 7     | 1020    | 8      | 99.2%    |
| 8     | 968     | 6      | 99.4%    |
| 9     | 1000    | 9      | 99.1%    |

#### Summary

The model achieves **99.43% accuracy** on the MNIST test set (57 total errors out of 10,000). Digit 0 achieves perfect classification. The most challenging classes are 6 and 9 (9 errors each), consistent with their visual similarity.

---

## Model Examination

The model's convolutional filters learn edge detectors and stroke patterns in early layers, which compose into digit-specific features in deeper layers. Standard CNN interpretability techniques (e.g., Grad-CAM) can be applied to visualize which regions most influence predictions.

---

## Environmental Impact

Carbon emissions were estimated using the [ML Impact Calculator](https://mlco2.github.io/impact#compute).

| Factor         | Value                 |
|----------------|-----------------------|
| Hardware Type  | NVIDIA T4 GPU         |
| Hours Used     | ~0.2 hrs (10 min)     |
| Cloud Provider | Google Colab          |
| Compute Region | Singapore             |
| Carbon Emitted | ~0.01 kg CO₂eq (est.) |

---

## Technical Specifications

### Model Architecture

The model uses 4 convolutional blocks followed by a compact fully connected head.
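The four-block design can be sketched as a PyTorch module. The class and helper names below are illustrative (the repository's `model.py` may organize the layers differently), but the layer shapes and parameter count reproduce the architecture tables in this card:

```python
import torch
import torch.nn as nn


def conv_block(in_ch: int, out_ch: int, kernel: int) -> nn.Sequential:
    """Conv → BatchNorm → ReLU → MaxPool → Dropout2d, one block of the backbone."""
    padding = 1 if kernel == 3 else 0  # 3×3 convs preserve spatial size; the 1×1 conv needs no padding
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=kernel, padding=padding),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(2),
        nn.Dropout2d(p=0.25),
    )


class DigitCNN(nn.Module):  # illustrative name; the repo's class is imported as `Model`
    def __init__(self) -> None:
        super().__init__()
        self.features = nn.Sequential(
            conv_block(1, 32, 3),     # → (B, 32, 14, 14)
            conv_block(32, 64, 3),    # → (B, 64, 7, 7)
            conv_block(64, 128, 3),   # → (B, 128, 3, 3)
            conv_block(128, 256, 1),  # → (B, 256, 1, 1)
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),             # → (B, 256)
            nn.Linear(256, 128),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.25),
            nn.Linear(128, 10),       # raw logits
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))


model = DigitCNN().eval()  # eval mode so BatchNorm uses running statistics
n_params = sum(p.numel() for p in model.parameters())
print(n_params)                                # 160842, matching the card
print(model(torch.zeros(1, 1, 28, 28)).shape)  # torch.Size([1, 10])
```

The parameter sum of this sketch (160,842) matches the total reported below, which is a useful sanity check when re-implementing the network.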
#### Convolutional Blocks

| Block   | Layer       | Output Shape | Details                              |
|---------|-------------|--------------|--------------------------------------|
| Block 1 | Conv2d      | (32, 28, 28) | 32 filters, 3×3, padding=1           |
|         | BatchNorm2d | (32, 28, 28) | —                                    |
|         | ReLU        | (32, 28, 28) | —                                    |
|         | MaxPool2d   | (32, 14, 14) | 2×2                                  |
|         | Dropout2d   | (32, 14, 14) | p=0.25                               |
| Block 2 | Conv2d      | (64, 14, 14) | 64 filters, 3×3, padding=1           |
|         | BatchNorm2d | (64, 14, 14) | —                                    |
|         | ReLU        | (64, 14, 14) | —                                    |
|         | MaxPool2d   | (64, 7, 7)   | 2×2                                  |
|         | Dropout2d   | (64, 7, 7)   | p=0.25                               |
| Block 3 | Conv2d      | (128, 7, 7)  | 128 filters, 3×3, padding=1          |
|         | BatchNorm2d | (128, 7, 7)  | —                                    |
|         | ReLU        | (128, 7, 7)  | —                                    |
|         | MaxPool2d   | (128, 3, 3)  | 2×2                                  |
|         | Dropout2d   | (128, 3, 3)  | p=0.25                               |
| Block 4 | Conv2d      | (256, 3, 3)  | 256 filters, **1×1** kernel (no pad) |
|         | BatchNorm2d | (256, 3, 3)  | —                                    |
|         | ReLU        | (256, 3, 3)  | —                                    |
|         | MaxPool2d   | (256, 1, 1)  | 2×2                                  |
|         | Dropout2d   | (256, 1, 1)  | p=0.25                               |

#### Fully Connected Layers

| Layer   | Output | Details                |
|---------|--------|------------------------|
| Flatten | 256    | 256 × 1 × 1 = 256      |
| Linear  | 128    | + ReLU + Dropout(0.25) |
| Linear  | 10     | Raw logits             |

**Total Parameters: 160,842**

#### Shape Flow

```
Input:   (B, 1, 28, 28)
Block 1: (B, 32, 14, 14)
Block 2: (B, 64, 7, 7)
Block 3: (B, 128, 3, 3)
Block 4: (B, 256, 1, 1)
Flatten: (B, 256)
FC1:     (B, 128)
Output:  (B, 10)
```

### Compute Infrastructure

- **Hardware:** NVIDIA T4 GPU (Google Colab)
- **Software:** Python 3.10+, PyTorch 2.0, torchvision

---

## Citation

If you use this model in your work, please cite:

**BibTeX:**

```bibtex
@misc{digit-classifier-2026,
  author    = {Abdul Rafay},
  title     = {Handwritten Digit Classifier (CNN on MNIST)},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/abdurafay19/Digit-Classifier}
}
```

**APA:**

> Abdul Rafay. (2026). *Handwritten Digit Classifier (CNN on MNIST)*. Hugging Face.
> https://huggingface.co/abdurafay19/Digit-Classifier

---

## Glossary

| Term            | Definition |
|-----------------|------------|
| CNN             | Convolutional Neural Network — a deep learning architecture suited for image data |
| MNIST           | A benchmark dataset of 70,000 handwritten digit images |
| Softmax         | Activation function that converts raw outputs to probabilities summing to 1 |
| Dropout         | Regularization technique that randomly disables neurons during training |
| BatchNorm       | Batch Normalization — normalizes layer activations to stabilize and speed up training |
| OneCycleLR      | Learning rate schedule with warmup and cosine decay for faster convergence |
| Label Smoothing | Softens hard targets to reduce overconfidence and improve generalization |
| Grad-CAM        | Gradient-weighted Class Activation Mapping — a model interpretability technique |

---

## Model Card Authors

Abdul Rafay — abdulrafay17wolf@gmail.com

## Model Card Contact

For questions or issues, open a GitHub issue at [github.com/abdurafay19/Digit-Classifier](https://github.com/abdurafay19/Digit-Classifier) or reach out via Hugging Face.