---
language:
- en
license: mit
tags:
- image-classification
- digit-recognition
- cnn
- mnist
- pytorch
datasets:
- mnist
metrics:
- accuracy
---

# Model Card – Handwritten Digit Classifier (CNN)

A Convolutional Neural Network (CNN) trained on the MNIST dataset to classify handwritten digits (0–9) with high accuracy. Designed for real-time inference in a web-based drawing interface.

---

## Model Details

### Model Description

This model is a CNN trained from scratch on the MNIST benchmark dataset. It accepts 28×28 grayscale images of handwritten digits and outputs a probability distribution over 10 classes (digits 0–9). It is the backbone of the [Digit Classifier web app](https://huggingface.co/spaces/abdurafay19/Digit-Classifier).

- **Developed by:** [Abdul Rafay](https://www.linkedin.com/in/abdurafay19)
- **Model type:** Convolutional Neural Network (CNN)
- **Language(s):** N/A (Computer Vision – image input only)
- **License:** MIT
- **Framework:** PyTorch 2.0+
- **Finetuned from:** Trained from scratch (no pretrained base)

### Model Sources

- **Demo:** [Hugging Face Space](https://huggingface.co/spaces/abdurafay19/Digit-Classifier)

---
## Uses

### Direct Use

This model can be used directly to classify 28×28 grayscale images of handwritten digits – no fine-tuning required. It is best suited for:

- Educational demos of deep learning and CNNs
- Handwritten digit recognition in controlled environments
- Integration into apps via the provided web UI or API

### Downstream Use

The model can be fine-tuned or adapted for:

- Multi-digit number recognition (e.g., street numbers, forms)
- Similar single-character classification tasks
- Transfer learning baseline for other image classification problems

### Out-of-Scope Use

This model is **not** suitable for:

- Recognizing letters, symbols, or non-digit characters
- Noisy, real-world document scans without preprocessing
- Multi-digit or multi-character sequences in a single image
- Safety-critical systems (e.g., medical or legal document processing)

---

## Bias, Risks, and Limitations

- **Dataset bias:** MNIST digits are clean, centered, and size-normalized. The model may underperform on digits written in non-Western styles, with extreme stroke widths, or in unusual orientations.
- **Domain shift:** Performance degrades on images that differ significantly from the MNIST distribution (e.g., photos of digits on paper, different fonts).
- **No uncertainty calibration:** The model outputs softmax probabilities, which may appear confident even on out-of-distribution inputs.

### Recommendations

- Preprocess input images to 28×28 grayscale and center/normalize digits before inference.
- Do not rely on model confidence scores alone – add a rejection threshold for production use.
- Evaluate on your specific distribution before deploying in any real-world scenario.
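The rejection-threshold recommendation can be sketched as follows. This is an illustrative helper, not part of the released app; the `0.9` cutoff is an assumed value you should tune on your own data.

```python
import torch
import torch.nn.functional as F

def predict_with_rejection(model: torch.nn.Module,
                           tensor: torch.Tensor,
                           threshold: float = 0.9):
    """Return (digit, confidence); digit is None when the top softmax
    probability falls below the rejection threshold."""
    model.eval()
    with torch.no_grad():
        probs = F.softmax(model(tensor), dim=1)
    confidence, digit = probs.max(dim=1)
    if confidence.item() < threshold:
        return None, confidence.item()  # reject: prediction too uncertain
    return digit.item(), confidence.item()
```

Rejected inputs can then be routed to a fallback (e.g., asking the user to redraw) instead of being silently misclassified.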

---

## How to Get Started with the Model

```python
import torch
from torchvision import transforms
from PIL import Image

from model import Model  # your model definition

# Load model weights onto the CPU
model = Model()
model.load_state_dict(torch.load("model.pt", map_location="cpu"))
model.eval()

# Preprocess image to match the training distribution
transform = transforms.Compose([
    transforms.Grayscale(),
    transforms.Resize((28, 28)),
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))
])

img = Image.open("digit.png")
tensor = transform(img).unsqueeze(0)  # shape: [1, 1, 28, 28]

# Predict
with torch.no_grad():
    output = model(tensor)
prediction = output.argmax(dim=1).item()

print(f"Predicted digit: {prediction}")
```

---

## Training Details

### Training Data

- **Dataset:** [MNIST](https://huggingface.co/datasets/mnist) – 70,000 grayscale images (60,000 train / 10,000 test)
- **Input size:** 28×28 pixels, single channel
- **Classes:** 10 (digits 0–9)

### Training Procedure

#### Preprocessing

- Images converted to tensors and normalized using the MNIST dataset mean (0.1307) and std (0.3081)
- Training augmentation: random rotation (±10°), random affine with translation (±10%), scale (0.9–1.1×), and shear (±5°)
- Test images: normalization only – no augmentation

#### Training Hyperparameters

| Parameter       | Value                                  |
|-----------------|----------------------------------------|
| Optimizer       | AdamW                                  |
| Learning Rate   | 3e-3 (max, OneCycleLR)                 |
| Weight Decay    | 1e-4                                   |
| Batch Size      | 64                                     |
| Epochs          | 50                                     |
| Loss Function   | CrossEntropyLoss                       |
| Label Smoothing | 0.1                                    |
| LR Scheduler    | OneCycleLR (10% warmup, cosine anneal) |
| Dropout (conv)  | 0.25 (Dropout2d)                       |
| Dropout (FC)    | 0.25                                   |
| Random Seed     | 23                                     |
| Training regime | fp32                                   |
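Wired up in PyTorch, the table corresponds to roughly the following. This is a sketch: the stand-in model and the steps-per-epoch value (938 = ⌈60,000 / 64⌉) are assumptions, not taken from the original training script.

```python
import torch
from torch import nn

torch.manual_seed(23)  # random seed from the table

# Stand-in for the real CNN so the snippet runs on its own.
model = nn.Linear(28 * 28, 10)
epochs = 50
steps_per_epoch = 938  # ceil(60000 / 64) batches per epoch

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-3, weight_decay=1e-4)
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer,
    max_lr=3e-3,
    epochs=epochs,
    steps_per_epoch=steps_per_epoch,
    pct_start=0.1,          # 10% of all steps spent warming up
    anneal_strategy="cos",  # cosine annealing after the peak
)

# Note: OneCycleLR is stepped once per batch, not once per epoch:
#   loss.backward(); optimizer.step(); scheduler.step()
```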

#### Speeds, Sizes, Times

- **Training time:** ~10 minutes on a single GPU (NVIDIA T4, Google Colab)
- **Model parameters:** 160,842
- **Inference speed:** <50 ms per image (CPU)

---

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

Evaluated on the standard MNIST test split – 10,000 images not seen during training.

#### Factors

Evaluation was performed across all 10 digit classes. No disaggregation by subpopulation was conducted (MNIST does not include demographic metadata).

#### Metrics

- **Accuracy** – primary metric; proportion of correctly classified digits
- **Confusion Matrix** – to identify per-class error patterns

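Both metrics can be accumulated from test-set predictions along these lines (a minimal sketch; the original evaluation script is not published with this card):

```python
import torch

def confusion_matrix(preds: torch.Tensor,
                     labels: torch.Tensor,
                     num_classes: int = 10) -> torch.Tensor:
    """Rows index the true digit, columns the predicted digit."""
    cm = torch.zeros(num_classes, num_classes, dtype=torch.long)
    for t, p in zip(labels.tolist(), preds.tolist()):
        cm[t, p] += 1
    return cm

def per_class_accuracy(cm: torch.Tensor) -> torch.Tensor:
    """Diagonal (correct counts) divided by row sums (per-class totals)."""
    return cm.diag().float() / cm.sum(dim=1).clamp(min=1).float()
```

Overall accuracy is then `cm.diag().sum() / cm.sum()`, and off-diagonal cells reveal which digit pairs are confused.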
### Results

| Metric        | Value  |
|---------------|--------|
| Test Accuracy | 99.43% |

#### Per-Class Accuracy

| Digit | Correct | Errors | Accuracy |
|-------|---------|--------|----------|
| 0     | 980     | 0      | 100.0%   |
| 1     | 1132    | 3      | 99.7%    |
| 2     | 1025    | 7      | 99.3%    |
| 3     | 1008    | 2      | 99.8%    |
| 4     | 976     | 6      | 99.4%    |
| 5     | 885     | 7      | 99.2%    |
| 6     | 949     | 9      | 99.1%    |
| 7     | 1020    | 8      | 99.2%    |
| 8     | 968     | 6      | 99.4%    |
| 9     | 1000    | 9      | 99.1%    |

#### Summary

The model achieves **99.43% accuracy** on the MNIST test set (57 total errors out of 10,000). Digit 0 achieves perfect classification. The most challenging classes are 6 and 9 (9 errors each), consistent with their visual similarity.

---

## Model Examination

The model's convolutional filters learn edge detectors and stroke patterns in early layers, which compose into digit-specific features in deeper layers. Standard CNN interpretability techniques (e.g., Grad-CAM) can be applied to visualize which regions most influence predictions.
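As a rough illustration of the Grad-CAM idea, a conv layer can be hooked to capture activations and gradients, which are then combined into a class activation map. This is a generic sketch on a toy network, not code shipped with the model; in practice the classifier's last conv layer would be hooked.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Toy conv net standing in for the classifier.
net = nn.Sequential(
    nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10),
)

store = {}
layer = net[0]  # layer whose activations we visualize
layer.register_forward_hook(lambda m, i, o: store.update(act=o))
layer.register_full_backward_hook(lambda m, gi, go: store.update(grad=go[0]))

x = torch.randn(1, 1, 28, 28)
net(x)[0].max().backward()  # backprop the top class score

# Channel weights = gradients averaged over the spatial dimensions;
# CAM = ReLU of the weighted sum of activation maps, scaled to [0, 1].
weights = store["grad"].mean(dim=(2, 3), keepdim=True)
cam = torch.relu((weights * store["act"]).sum(dim=1))
cam = cam / (cam.max() + 1e-8)
```

The resulting `cam` heatmap can be upsampled and overlaid on the input to show which strokes drove the prediction.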

---

## Environmental Impact

Carbon emissions estimated using the [ML Impact Calculator](https://mlco2.github.io/impact#compute).

| Factor         | Value                 |
|----------------|-----------------------|
| Hardware Type  | NVIDIA T4 GPU         |
| Hours Used     | ~0.2 hrs (10 min)     |
| Cloud Provider | Google Colab          |
| Compute Region | Singapore             |
| Carbon Emitted | ~0.01 kg CO₂eq (est.) |

---

## Technical Specifications

### Model Architecture

The model uses 4 convolutional blocks followed by a compact fully connected head.

#### Convolutional Blocks

| Block   | Layer       | Output Shape | Details                              |
|---------|-------------|--------------|--------------------------------------|
| Block 1 | Conv2d      | (32, 28, 28) | 32 filters, 3×3, padding=1           |
|         | BatchNorm2d | (32, 28, 28) | –                                    |
|         | ReLU        | (32, 28, 28) | –                                    |
|         | MaxPool2d   | (32, 14, 14) | 2×2                                  |
|         | Dropout2d   | (32, 14, 14) | p=0.25                               |
| Block 2 | Conv2d      | (64, 14, 14) | 64 filters, 3×3, padding=1           |
|         | BatchNorm2d | (64, 14, 14) | –                                    |
|         | ReLU        | (64, 14, 14) | –                                    |
|         | MaxPool2d   | (64, 7, 7)   | 2×2                                  |
|         | Dropout2d   | (64, 7, 7)   | p=0.25                               |
| Block 3 | Conv2d      | (128, 7, 7)  | 128 filters, 3×3, padding=1          |
|         | BatchNorm2d | (128, 7, 7)  | –                                    |
|         | ReLU        | (128, 7, 7)  | –                                    |
|         | MaxPool2d   | (128, 3, 3)  | 2×2                                  |
|         | Dropout2d   | (128, 3, 3)  | p=0.25                               |
| Block 4 | Conv2d      | (256, 3, 3)  | 256 filters, **1×1** kernel (no pad) |
|         | BatchNorm2d | (256, 3, 3)  | –                                    |
|         | ReLU        | (256, 3, 3)  | –                                    |
|         | MaxPool2d   | (256, 1, 1)  | 2×2                                  |
|         | Dropout2d   | (256, 1, 1)  | p=0.25                               |

#### Fully Connected Layers

| Layer   | Output | Details                |
|---------|--------|------------------------|
| Flatten | 256    | 256 × 1 × 1 = 256      |
| Linear  | 128    | + ReLU + Dropout(0.25) |
| Linear  | 10     | Raw logits             |

**Total Parameters: 160,842**

#### Shape Flow

```
Input:   (B, 1, 28, 28)
Block 1: (B, 32, 14, 14)
Block 2: (B, 64, 7, 7)
Block 3: (B, 128, 3, 3)
Block 4: (B, 256, 1, 1)
Flatten: (B, 256)
FC1:     (B, 128)
Output:  (B, 10)
```
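Put together as a PyTorch module, the tables above correspond to roughly the following. This is a reconstruction from this card, not the author's source file; names such as `DigitCNN` are assumptions.

```python
import torch
from torch import nn

class DigitCNN(nn.Module):
    """CNN matching the block and FC tables above."""

    def __init__(self, p: float = 0.25):
        super().__init__()

        def block(cin, cout, kernel, pad):
            return nn.Sequential(
                nn.Conv2d(cin, cout, kernel, padding=pad),
                nn.BatchNorm2d(cout),
                nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Dropout2d(p),
            )

        self.features = nn.Sequential(
            block(1, 32, 3, 1),     # -> (B, 32, 14, 14)
            block(32, 64, 3, 1),    # -> (B, 64, 7, 7)
            block(64, 128, 3, 1),   # -> (B, 128, 3, 3)
            block(128, 256, 1, 0),  # 1x1 kernel -> (B, 256, 1, 1)
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),           # -> (B, 256)
            nn.Linear(256, 128),
            nn.ReLU(),
            nn.Dropout(p),
            nn.Linear(128, 10),     # raw logits
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

model = DigitCNN()
n_params = sum(p.numel() for p in model.parameters())
# n_params == 160842, matching the total stated above
```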

### Compute Infrastructure

- **Hardware:** NVIDIA T4 GPU (Google Colab)
- **Software:** Python 3.10+, PyTorch 2.0, torchvision

---

## Citation

If you use this model in your work, please cite:

**BibTeX:**
```bibtex
@misc{digit-classifier-2026,
  author    = {Abdul Rafay},
  title     = {Handwritten Digit Classifier (CNN on MNIST)},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/abdurafay19/Digit-Classifier}
}
```

**APA:**
> Abdul Rafay. (2026). *Handwritten Digit Classifier (CNN on MNIST)*. Hugging Face. https://huggingface.co/abdurafay19/Digit-Classifier

---

## Glossary

| Term            | Definition |
|-----------------|------------|
| CNN             | Convolutional Neural Network – a deep learning architecture suited for image data |
| MNIST           | A benchmark dataset of 70,000 handwritten digit images |
| Softmax         | Activation function that converts raw outputs to probabilities summing to 1 |
| Dropout         | Regularization technique that randomly disables neurons during training |
| BatchNorm       | Batch Normalization – normalizes layer activations to stabilize and speed up training |
| OneCycleLR      | Learning rate schedule with warmup and cosine decay for faster convergence |
| Label Smoothing | Softens hard targets to reduce overconfidence and improve generalization |
| Grad-CAM        | Gradient-weighted Class Activation Mapping – a model interpretability technique |

---

## Model Card Authors

Abdul Rafay – abdulrafay17wolf@gmail.com

## Model Card Contact

For questions or issues, open a GitHub issue at [github.com/abdurafay19/Digit-Classifier](https://github.com/abdurafay19/Digit-Classifier) or reach out via Hugging Face.