---
language:
- en
license: mit
tags:
- image-classification
- digit-recognition
- cnn
- mnist
- pytorch
datasets:
- mnist
metrics:
- accuracy
---

# Model Card — Handwritten Digit Classifier (CNN)

A Convolutional Neural Network (CNN) trained on the MNIST dataset to classify handwritten digits (0–9) with high accuracy. Designed for real-time inference in a web-based drawing interface.

---

## Model Details

### Model Description

This model is a CNN trained from scratch on the MNIST benchmark dataset. It accepts 28×28 grayscale images of handwritten digits and outputs a probability distribution over 10 classes (digits 0–9). It is the backbone of the [Digit Classifier web app](https://huggingface.co/spaces/abdurafay19/Digit-Classifier).

- **Developed by:** [Abdul Rafay](https://www.linkedin.com/in/abdurafay19)
- **Model type:** Convolutional Neural Network (CNN)
- **Language(s):** N/A (Computer Vision — image input only)
- **License:** MIT
- **Framework:** PyTorch 2.0+
- **Finetuned from:** Trained from scratch (no pretrained base)

### Model Sources

- **Demo:** [Hugging Face Space](https://huggingface.co/spaces/abdurafay19/Digit-Classifier)

---

## Uses

### Direct Use

This model can be used directly to classify 28×28 grayscale images of handwritten digits — no fine-tuning required.
It is best suited for:

- Educational demos of deep learning and CNNs
- Handwritten digit recognition in controlled environments
- Integration into apps via the provided web UI or API

### Downstream Use

The model can be fine-tuned or adapted for:

- Multi-digit number recognition (e.g., street numbers, forms)
- Similar single-character classification tasks
- A transfer learning baseline for other image classification problems

### Out-of-Scope Use

This model is **not** suitable for:

- Recognizing letters, symbols, or non-digit characters
- Noisy, real-world document scans without preprocessing
- Multi-digit or multi-character sequences in a single image
- Safety-critical systems (e.g., medical or legal document processing)

---

## Bias, Risks, and Limitations

- **Dataset bias:** MNIST digits are clean, centered, and size-normalized. The model may underperform on digits written in non-Western styles, with extreme stroke widths, or in unusual orientations.
- **Domain shift:** Performance degrades on images that differ significantly from the MNIST distribution (e.g., photos of digits on paper, printed fonts).
- **No uncertainty calibration:** The model outputs softmax probabilities, which may appear confident even on out-of-distribution inputs.

### Recommendations

- Preprocess input images to 28×28 grayscale and center/normalize digits before inference.
- Do not rely on model confidence scores alone — add a rejection threshold for production use.
- Evaluate on your specific data distribution before deploying in any real-world scenario.
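The rejection-threshold recommendation can be implemented as a thin wrapper around the softmax output. This is a minimal sketch; the `0.9` cutoff and the function name are illustrative choices, not part of the released model — tune the threshold on your own validation data:

```python
import torch
import torch.nn.functional as F

REJECT_THRESHOLD = 0.9  # illustrative cutoff; tune on held-out data


def predict_with_rejection(logits: torch.Tensor, threshold: float = REJECT_THRESHOLD):
    """Return (digit, confidence), or (None, confidence) when the model abstains."""
    probs = F.softmax(logits, dim=1)
    confidence, prediction = probs.max(dim=1)
    if confidence.item() < threshold:
        # Abstain: low max-softmax often signals an out-of-distribution input
        return None, confidence.item()
    return prediction.item(), confidence.item()


# Hand-made logits for demonstration (in practice: logits = model(tensor))
confident = torch.tensor([[0.0, 0.0, 12.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]])
ambiguous = torch.tensor([[1.0, 1.1, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]])

print(predict_with_rejection(confident))  # strong mass on class 2 → accepted
print(predict_with_rejection(ambiguous))  # near-uniform distribution → rejected
```

Max-softmax is a weak uncertainty signal on its own, so treat this as a first filter rather than calibrated confidence.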
---

## How to Get Started with the Model

```python
import torch
from torchvision import transforms
from PIL import Image

from model import Model  # your model definition

# Load model (map_location lets a GPU-trained checkpoint load on CPU)
model = Model()
model.load_state_dict(torch.load("model.pt", map_location="cpu"))
model.eval()

# Preprocess image
transform = transforms.Compose([
    transforms.Grayscale(),
    transforms.Resize((28, 28)),
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))
])
img = Image.open("digit.png")
tensor = transform(img).unsqueeze(0)  # shape: [1, 1, 28, 28]

# Predict
with torch.no_grad():
    output = model(tensor)
    prediction = output.argmax(dim=1).item()

print(f"Predicted digit: {prediction}")
```

---

## Training Details

### Training Data

- **Dataset:** [MNIST](https://huggingface.co/datasets/mnist) — 70,000 grayscale images (60,000 train / 10,000 test)
- **Input size:** 28×28 pixels, single channel
- **Classes:** 10 (digits 0–9)

### Training Procedure

#### Preprocessing

- Images converted to tensors and normalized using the MNIST dataset mean (0.1307) and std (0.3081)
- Training augmentation: random rotation (±10°), random affine with translation (±10%), scale (0.9–1.1×), and shear (±5°)
- Test images: normalization only — no augmentation

#### Training Hyperparameters

| Parameter       | Value                                  |
|-----------------|----------------------------------------|
| Optimizer       | AdamW                                  |
| Learning Rate   | 3e-3 (max, OneCycleLR)                 |
| Weight Decay    | 1e-4                                   |
| Batch Size      | 64                                     |
| Epochs          | 50                                     |
| Loss Function   | CrossEntropyLoss                       |
| Label Smoothing | 0.1                                    |
| LR Scheduler    | OneCycleLR (10% warmup, cosine anneal) |
| Dropout (conv)  | 0.25 (Dropout2d)                       |
| Dropout (FC)    | 0.25                                   |
| Random Seed     | 23                                     |
| Training regime | fp32                                   |

#### Speeds, Sizes, Times

- **Training time:** ~10 minutes on a single GPU (NVIDIA T4, Google Colab)
- **Model parameters:** 160,842
- **Inference speed:** <50 ms per image (CPU)

---

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

Evaluated on the standard MNIST test split — 10,000 images not seen during
training.

#### Factors

Evaluation was performed across all 10 digit classes. No disaggregation by subpopulation was conducted (MNIST does not include demographic metadata).

#### Metrics

- **Accuracy** — primary metric; proportion of correctly classified digits
- **Confusion Matrix** — to identify per-class error patterns

### Results

| Metric        | Value  |
|---------------|--------|
| Test Accuracy | 99.43% |

#### Per-Class Accuracy

| Digit | Correct | Errors | Accuracy |
|-------|---------|--------|----------|
| 0     | 980     | 0      | 100.0%   |
| 1     | 1132    | 3      | 99.7%    |
| 2     | 1025    | 7      | 99.3%    |
| 3     | 1008    | 2      | 99.8%    |
| 4     | 976     | 6      | 99.4%    |
| 5     | 885     | 7      | 99.2%    |
| 6     | 949     | 9      | 99.1%    |
| 7     | 1020    | 8      | 99.2%    |
| 8     | 968     | 6      | 99.4%    |
| 9     | 1000    | 9      | 99.1%    |

#### Summary

The model achieves **99.43% accuracy** on the MNIST test set (57 total errors out of 10,000). Digit 0 achieves perfect classification. The most challenging classes are 6 and 9 (9 errors each), consistent with their visual similarity.

---

## Model Examination

The model's convolutional filters learn edge detectors and stroke patterns in early layers, which compose into digit-specific features in deeper layers. Standard CNN interpretability techniques (e.g., Grad-CAM) can be applied to visualize which regions most influence predictions.

---

## Environmental Impact

Carbon emissions were estimated using the [ML Impact Calculator](https://mlco2.github.io/impact#compute).

| Factor         | Value                 |
|----------------|-----------------------|
| Hardware Type  | NVIDIA T4 GPU         |
| Hours Used     | ~0.2 hrs (10 min)     |
| Cloud Provider | Google Colab          |
| Compute Region | Singapore             |
| Carbon Emitted | ~0.01 kg CO₂eq (est.) |

---

## Technical Specifications

### Model Architecture

The model uses 4 convolutional blocks followed by a compact fully connected head.
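The four-block design can be sketched as a PyTorch module. The class and helper names below are illustrative (the repository's `model.py` may organize the layers differently), but the layer shapes and parameter count reproduce the architecture tables in this card:

```python
import torch
import torch.nn as nn


def conv_block(in_ch: int, out_ch: int, kernel: int) -> nn.Sequential:
    """Conv → BatchNorm → ReLU → MaxPool → Dropout2d, one block of the backbone."""
    padding = 1 if kernel == 3 else 0  # 3×3 convs preserve spatial size; the 1×1 conv needs no padding
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=kernel, padding=padding),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(2),
        nn.Dropout2d(p=0.25),
    )


class DigitCNN(nn.Module):  # illustrative name; the repo's class is imported as `Model`
    def __init__(self) -> None:
        super().__init__()
        self.features = nn.Sequential(
            conv_block(1, 32, 3),     # → (B, 32, 14, 14)
            conv_block(32, 64, 3),    # → (B, 64, 7, 7)
            conv_block(64, 128, 3),   # → (B, 128, 3, 3)
            conv_block(128, 256, 1),  # → (B, 256, 1, 1)
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),             # → (B, 256)
            nn.Linear(256, 128),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.25),
            nn.Linear(128, 10),       # raw logits
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))


model = DigitCNN().eval()  # eval mode so BatchNorm uses running statistics
n_params = sum(p.numel() for p in model.parameters())
print(n_params)                                # 160842, matching the card
print(model(torch.zeros(1, 1, 28, 28)).shape)  # torch.Size([1, 10])
```

The parameter sum of this sketch (160,842) matches the total reported below, which is a useful sanity check when re-implementing the network.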
#### Convolutional Blocks

| Block   | Layer       | Output Shape | Details                              |
|---------|-------------|--------------|--------------------------------------|
| Block 1 | Conv2d      | (32, 28, 28) | 32 filters, 3×3, padding=1           |
|         | BatchNorm2d | (32, 28, 28) | —                                    |
|         | ReLU        | (32, 28, 28) | —                                    |
|         | MaxPool2d   | (32, 14, 14) | 2×2                                  |
|         | Dropout2d   | (32, 14, 14) | p=0.25                               |
| Block 2 | Conv2d      | (64, 14, 14) | 64 filters, 3×3, padding=1           |
|         | BatchNorm2d | (64, 14, 14) | —                                    |
|         | ReLU        | (64, 14, 14) | —                                    |
|         | MaxPool2d   | (64, 7, 7)   | 2×2                                  |
|         | Dropout2d   | (64, 7, 7)   | p=0.25                               |
| Block 3 | Conv2d      | (128, 7, 7)  | 128 filters, 3×3, padding=1          |
|         | BatchNorm2d | (128, 7, 7)  | —                                    |
|         | ReLU        | (128, 7, 7)  | —                                    |
|         | MaxPool2d   | (128, 3, 3)  | 2×2                                  |
|         | Dropout2d   | (128, 3, 3)  | p=0.25                               |
| Block 4 | Conv2d      | (256, 3, 3)  | 256 filters, **1×1** kernel (no pad) |
|         | BatchNorm2d | (256, 3, 3)  | —                                    |
|         | ReLU        | (256, 3, 3)  | —                                    |
|         | MaxPool2d   | (256, 1, 1)  | 2×2                                  |
|         | Dropout2d   | (256, 1, 1)  | p=0.25                               |

#### Fully Connected Layers

| Layer   | Output | Details                |
|---------|--------|------------------------|
| Flatten | 256    | 256 × 1 × 1 = 256      |
| Linear  | 128    | + ReLU + Dropout(0.25) |
| Linear  | 10     | Raw logits             |

**Total Parameters: 160,842**

#### Shape Flow

```
Input:   (B, 1, 28, 28)
Block 1: (B, 32, 14, 14)
Block 2: (B, 64, 7, 7)
Block 3: (B, 128, 3, 3)
Block 4: (B, 256, 1, 1)
Flatten: (B, 256)
FC1:     (B, 128)
Output:  (B, 10)
```

### Compute Infrastructure

- **Hardware:** NVIDIA T4 GPU (Google Colab)
- **Software:** Python 3.10+, PyTorch 2.0, torchvision

---

## Citation

If you use this model in your work, please cite:

**BibTeX:**

```bibtex
@misc{digit-classifier-2026,
  author    = {Abdul Rafay},
  title     = {Handwritten Digit Classifier (CNN on MNIST)},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/abdurafay19/Digit-Classifier}
}
```

**APA:**

> Abdul Rafay. (2026). *Handwritten Digit Classifier (CNN on MNIST)*. Hugging Face.
> https://huggingface.co/abdurafay19/Digit-Classifier

---

## Glossary

| Term            | Definition |
|-----------------|------------|
| CNN             | Convolutional Neural Network — a deep learning architecture suited for image data |
| MNIST           | A benchmark dataset of 70,000 handwritten digit images |
| Softmax         | Activation function that converts raw outputs to probabilities summing to 1 |
| Dropout         | Regularization technique that randomly disables neurons during training |
| BatchNorm       | Batch Normalization — normalizes layer activations to stabilize and speed up training |
| OneCycleLR      | Learning rate schedule with warmup and cosine decay for faster convergence |
| Label Smoothing | Softens hard targets to reduce overconfidence and improve generalization |
| Grad-CAM        | Gradient-weighted Class Activation Mapping — a model interpretability technique |

---

## Model Card Authors

Abdul Rafay — abdulrafay17wolf@gmail.com

## Model Card Contact

For questions or issues, open a GitHub issue at [github.com/abdurafay19/Digit-Classifier](https://github.com/abdurafay19/Digit-Classifier) or reach out via Hugging Face.