---
language:
- en
license: mit
tags:
- image-classification
- digit-recognition
- cnn
- mnist
- pytorch
datasets:
- mnist
metrics:
- accuracy
---

# Model Card – Handwritten Digit Classifier (CNN)

A Convolutional Neural Network (CNN) trained on the MNIST dataset to classify handwritten digits (0–9) with high accuracy. Designed for real-time inference in a web-based drawing interface.

---

## Model Details

### Model Description

This model is a CNN trained from scratch on the MNIST benchmark dataset. It accepts 28×28 grayscale images of handwritten digits and outputs a probability distribution over 10 classes (digits 0–9). It is the backbone of the [Digit Classifier web app](https://huggingface.co/spaces/abdurafay19/Digit-Classifier).

- **Developed by:** [Abdul Rafay](https://www.linkedin.com/in/abdurafay19)
- **Model type:** Convolutional Neural Network (CNN)
- **Language(s):** N/A (Computer Vision – image input only)
- **License:** MIT
- **Framework:** PyTorch 2.0+
- **Finetuned from:** Trained from scratch (no pretrained base)

### Model Sources

- **Demo:** [Hugging Face Space](https://huggingface.co/spaces/abdurafay19/Digit-Classifier)

---
## Uses

### Direct Use

This model can be used directly to classify 28×28 grayscale images of handwritten digits – no fine-tuning required. It is best suited for:

- Educational demos of deep learning and CNNs
- Handwritten digit recognition in controlled environments
- Integration into apps via the provided web UI or API

### Downstream Use

The model can be fine-tuned or adapted for:

- Multi-digit number recognition (e.g., street numbers, forms)
- Similar single-character classification tasks
- Transfer learning baseline for other image classification problems

### Out-of-Scope Use

This model is **not** suitable for:

- Recognizing letters, symbols, or non-digit characters
- Noisy, real-world document scans without preprocessing
- Multi-digit or multi-character sequences in a single image
- Safety-critical systems (e.g., medical or legal document processing)

---

## Bias, Risks, and Limitations

- **Dataset bias:** MNIST digits are clean, centered, and size-normalized. The model may underperform on digits written in non-Western styles, with extreme stroke widths, or in unusual orientations.
- **Domain shift:** Performance degrades on images that differ significantly from the MNIST distribution (e.g., photos of digits on paper, different fonts).
- **No uncertainty calibration:** The model outputs softmax probabilities, which may appear confident even on out-of-distribution inputs.

### Recommendations

- Preprocess input images to 28×28 grayscale and center/normalize digits before inference.
- Do not rely on model confidence scores alone – add a rejection threshold for production use.
- Evaluate on your specific distribution before deploying in any real-world scenario.
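The rejection-threshold recommendation can be sketched as follows. This is an illustrative helper, not part of the released app; the `0.9` cutoff is an assumed value you should tune on your own data.

```python
import torch
import torch.nn.functional as F

def predict_with_rejection(model: torch.nn.Module,
                           tensor: torch.Tensor,
                           threshold: float = 0.9):
    """Return (digit, confidence); digit is None when the top softmax
    probability falls below the rejection threshold."""
    model.eval()
    with torch.no_grad():
        probs = F.softmax(model(tensor), dim=1)
    confidence, digit = probs.max(dim=1)
    if confidence.item() < threshold:
        return None, confidence.item()  # reject: prediction too uncertain
    return digit.item(), confidence.item()
```

Rejected inputs can then be routed to a fallback (e.g., asking the user to redraw) instead of being silently misclassified.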

---

## How to Get Started with the Model

```python
import torch
from torchvision import transforms
from PIL import Image

from model import Model  # your model definition

# Load model weights onto the CPU
model = Model()
model.load_state_dict(torch.load("model.pt", map_location="cpu"))
model.eval()

# Preprocess image to match the training distribution
transform = transforms.Compose([
    transforms.Grayscale(),
    transforms.Resize((28, 28)),
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))
])

img = Image.open("digit.png")
tensor = transform(img).unsqueeze(0)  # shape: [1, 1, 28, 28]

# Predict
with torch.no_grad():
    output = model(tensor)
prediction = output.argmax(dim=1).item()

print(f"Predicted digit: {prediction}")
```

---

## Training Details

### Training Data

- **Dataset:** [MNIST](https://huggingface.co/datasets/mnist) – 70,000 grayscale images (60,000 train / 10,000 test)
- **Input size:** 28×28 pixels, single channel
- **Classes:** 10 (digits 0–9)

### Training Procedure

#### Preprocessing

- Images converted to tensors and normalized using the MNIST dataset mean (0.1307) and std (0.3081)
- Training augmentation: random rotation (±10°), random affine with translation (±10%), scale (0.9–1.1×), and shear (±5°)
- Test images: normalization only – no augmentation

#### Training Hyperparameters

| Parameter       | Value                                  |
|-----------------|----------------------------------------|
| Optimizer       | AdamW                                  |
| Learning Rate   | 3e-3 (max, OneCycleLR)                 |
| Weight Decay    | 1e-4                                   |
| Batch Size      | 64                                     |
| Epochs          | 50                                     |
| Loss Function   | CrossEntropyLoss                       |
| Label Smoothing | 0.1                                    |
| LR Scheduler    | OneCycleLR (10% warmup, cosine anneal) |
| Dropout (conv)  | 0.25 (Dropout2d)                       |
| Dropout (FC)    | 0.25                                   |
| Random Seed     | 23                                     |
| Training regime | fp32                                   |
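Wired up in PyTorch, the table corresponds to roughly the following. This is a sketch: the stand-in model and the steps-per-epoch value (938 = ⌈60,000 / 64⌉) are assumptions, not taken from the original training script.

```python
import torch
from torch import nn

torch.manual_seed(23)  # random seed from the table

# Stand-in for the real CNN so the snippet runs on its own.
model = nn.Linear(28 * 28, 10)
epochs = 50
steps_per_epoch = 938  # ceil(60000 / 64) batches per epoch

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-3, weight_decay=1e-4)
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer,
    max_lr=3e-3,
    epochs=epochs,
    steps_per_epoch=steps_per_epoch,
    pct_start=0.1,          # 10% of all steps spent warming up
    anneal_strategy="cos",  # cosine annealing after the peak
)

# Note: OneCycleLR is stepped once per batch, not once per epoch:
#   loss.backward(); optimizer.step(); scheduler.step()
```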

#### Speeds, Sizes, Times

- **Training time:** ~10 minutes on a single GPU (NVIDIA T4, Google Colab)
- **Model parameters:** 160,842
- **Inference speed:** <50 ms per image (CPU)

---

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

Evaluated on the standard MNIST test split – 10,000 images not seen during training.

#### Factors

Evaluation was performed across all 10 digit classes. No disaggregation by subpopulation was conducted (MNIST does not include demographic metadata).

#### Metrics

- **Accuracy** – primary metric; proportion of correctly classified digits
- **Confusion Matrix** – to identify per-class error patterns

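Both metrics can be accumulated from test-set predictions along these lines (a minimal sketch; the original evaluation script is not published with this card):

```python
import torch

def confusion_matrix(preds: torch.Tensor,
                     labels: torch.Tensor,
                     num_classes: int = 10) -> torch.Tensor:
    """Rows index the true digit, columns the predicted digit."""
    cm = torch.zeros(num_classes, num_classes, dtype=torch.long)
    for t, p in zip(labels.tolist(), preds.tolist()):
        cm[t, p] += 1
    return cm

def per_class_accuracy(cm: torch.Tensor) -> torch.Tensor:
    """Diagonal (correct counts) divided by row sums (per-class totals)."""
    return cm.diag().float() / cm.sum(dim=1).clamp(min=1).float()
```

Overall accuracy is then `cm.diag().sum() / cm.sum()`, and off-diagonal cells reveal which digit pairs are confused.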
### Results

| Metric        | Value  |
|---------------|--------|
| Test Accuracy | 99.43% |

#### Per-Class Accuracy

| Digit | Correct | Errors | Accuracy |
|-------|---------|--------|----------|
| 0     | 980     | 0      | 100.0%   |
| 1     | 1132    | 3      | 99.7%    |
| 2     | 1025    | 7      | 99.3%    |
| 3     | 1008    | 2      | 99.8%    |
| 4     | 976     | 6      | 99.4%    |
| 5     | 885     | 7      | 99.2%    |
| 6     | 949     | 9      | 99.1%    |
| 7     | 1020    | 8      | 99.2%    |
| 8     | 968     | 6      | 99.4%    |
| 9     | 1000    | 9      | 99.1%    |

#### Summary

The model achieves **99.43% accuracy** on the MNIST test set (57 total errors out of 10,000). Digit 0 achieves perfect classification. The most challenging classes are 6 and 9 (9 errors each), consistent with their visual similarity.

---

## Model Examination

The model's convolutional filters learn edge detectors and stroke patterns in early layers, which compose into digit-specific features in deeper layers. Standard CNN interpretability techniques (e.g., Grad-CAM) can be applied to visualize which regions most influence predictions.
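As a rough illustration of the Grad-CAM idea, a conv layer can be hooked to capture activations and gradients, which are then combined into a class activation map. This is a generic sketch on a toy network, not code shipped with the model; in practice the classifier's last conv layer would be hooked.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Toy conv net standing in for the classifier.
net = nn.Sequential(
    nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10),
)

store = {}
layer = net[0]  # layer whose activations we visualize
layer.register_forward_hook(lambda m, i, o: store.update(act=o))
layer.register_full_backward_hook(lambda m, gi, go: store.update(grad=go[0]))

x = torch.randn(1, 1, 28, 28)
net(x)[0].max().backward()  # backprop the top class score

# Channel weights = gradients averaged over the spatial dimensions;
# CAM = ReLU of the weighted sum of activation maps, scaled to [0, 1].
weights = store["grad"].mean(dim=(2, 3), keepdim=True)
cam = torch.relu((weights * store["act"]).sum(dim=1))
cam = cam / (cam.max() + 1e-8)
```

The resulting `cam` heatmap can be upsampled and overlaid on the input to show which strokes drove the prediction.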

---

## Environmental Impact

Carbon emissions estimated using the [ML Impact Calculator](https://mlco2.github.io/impact#compute).

| Factor         | Value                 |
|----------------|-----------------------|
| Hardware Type  | NVIDIA T4 GPU         |
| Hours Used     | ~0.2 hrs (10 min)     |
| Cloud Provider | Google Colab          |
| Compute Region | Singapore             |
| Carbon Emitted | ~0.01 kg CO₂eq (est.) |

---

## Technical Specifications

### Model Architecture

The model uses 4 convolutional blocks followed by a compact fully connected head.

#### Convolutional Blocks

| Block   | Layer       | Output Shape | Details                              |
|---------|-------------|--------------|--------------------------------------|
| Block 1 | Conv2d      | (32, 28, 28) | 32 filters, 3×3, padding=1           |
|         | BatchNorm2d | (32, 28, 28) | –                                    |
|         | ReLU        | (32, 28, 28) | –                                    |
|         | MaxPool2d   | (32, 14, 14) | 2×2                                  |
|         | Dropout2d   | (32, 14, 14) | p=0.25                               |
| Block 2 | Conv2d      | (64, 14, 14) | 64 filters, 3×3, padding=1           |
|         | BatchNorm2d | (64, 14, 14) | –                                    |
|         | ReLU        | (64, 14, 14) | –                                    |
|         | MaxPool2d   | (64, 7, 7)   | 2×2                                  |
|         | Dropout2d   | (64, 7, 7)   | p=0.25                               |
| Block 3 | Conv2d      | (128, 7, 7)  | 128 filters, 3×3, padding=1          |
|         | BatchNorm2d | (128, 7, 7)  | –                                    |
|         | ReLU        | (128, 7, 7)  | –                                    |
|         | MaxPool2d   | (128, 3, 3)  | 2×2                                  |
|         | Dropout2d   | (128, 3, 3)  | p=0.25                               |
| Block 4 | Conv2d      | (256, 3, 3)  | 256 filters, **1×1** kernel (no pad) |
|         | BatchNorm2d | (256, 3, 3)  | –                                    |
|         | ReLU        | (256, 3, 3)  | –                                    |
|         | MaxPool2d   | (256, 1, 1)  | 2×2                                  |
|         | Dropout2d   | (256, 1, 1)  | p=0.25                               |

#### Fully Connected Layers

| Layer   | Output | Details                |
|---------|--------|------------------------|
| Flatten | 256    | 256 × 1 × 1 = 256      |
| Linear  | 128    | + ReLU + Dropout(0.25) |
| Linear  | 10     | Raw logits             |

**Total Parameters: 160,842**

#### Shape Flow

```
Input:   (B, 1, 28, 28)
Block 1: (B, 32, 14, 14)
Block 2: (B, 64, 7, 7)
Block 3: (B, 128, 3, 3)
Block 4: (B, 256, 1, 1)
Flatten: (B, 256)
FC1:     (B, 128)
Output:  (B, 10)
```
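Put together as a PyTorch module, the tables above correspond to roughly the following. This is a reconstruction from this card, not the author's source file; names such as `DigitCNN` are assumptions.

```python
import torch
from torch import nn

class DigitCNN(nn.Module):
    """CNN matching the block and FC tables above."""

    def __init__(self, p: float = 0.25):
        super().__init__()

        def block(cin, cout, kernel, pad):
            return nn.Sequential(
                nn.Conv2d(cin, cout, kernel, padding=pad),
                nn.BatchNorm2d(cout),
                nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Dropout2d(p),
            )

        self.features = nn.Sequential(
            block(1, 32, 3, 1),     # -> (B, 32, 14, 14)
            block(32, 64, 3, 1),    # -> (B, 64, 7, 7)
            block(64, 128, 3, 1),   # -> (B, 128, 3, 3)
            block(128, 256, 1, 0),  # 1x1 kernel -> (B, 256, 1, 1)
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),           # -> (B, 256)
            nn.Linear(256, 128),
            nn.ReLU(),
            nn.Dropout(p),
            nn.Linear(128, 10),     # raw logits
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

model = DigitCNN()
n_params = sum(p.numel() for p in model.parameters())
# n_params == 160842, matching the total stated above
```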

### Compute Infrastructure

- **Hardware:** NVIDIA T4 GPU (Google Colab)
- **Software:** Python 3.10+, PyTorch 2.0, torchvision

---

## Citation

If you use this model in your work, please cite:

**BibTeX:**
```bibtex
@misc{digit-classifier-2026,
  author    = {Abdul Rafay},
  title     = {Handwritten Digit Classifier (CNN on MNIST)},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/abdurafay19/Digit-Classifier}
}
```

**APA:**
> Abdul Rafay. (2026). *Handwritten Digit Classifier (CNN on MNIST)*. Hugging Face. https://huggingface.co/abdurafay19/Digit-Classifier

---

## Glossary

| Term            | Definition |
|-----------------|------------|
| CNN             | Convolutional Neural Network – a deep learning architecture suited for image data |
| MNIST           | A benchmark dataset of 70,000 handwritten digit images |
| Softmax         | Activation function that converts raw outputs to probabilities summing to 1 |
| Dropout         | Regularization technique that randomly disables neurons during training |
| BatchNorm       | Batch Normalization – normalizes layer activations to stabilize and speed up training |
| OneCycleLR      | Learning rate schedule with warmup and cosine decay for faster convergence |
| Label Smoothing | Softens hard targets to reduce overconfidence and improve generalization |
| Grad-CAM        | Gradient-weighted Class Activation Mapping – a model interpretability technique |

---

## Model Card Authors

Abdul Rafay – abdulrafay17wolf@gmail.com

## Model Card Contact

For questions or issues, open a GitHub issue at [github.com/abdurafay19/Digit-Classifier](https://github.com/abdurafay19/Digit-Classifier) or reach out via Hugging Face.