abdurafay19 committed
Commit ddef3d5 · verified · 1 Parent(s): 038d8bd

Update README.md

Files changed (1):
  1. README.md +93 -44
README.md CHANGED
@@ -37,7 +37,7 @@ This model is a CNN trained from scratch on the MNIST benchmark dataset. It acce
37
  - **Demo:** [Hugging Face Space](https://huggingface.co/spaces/abdurafay19/Digit-Classifier)
38
 
39
  ---
40
-
41
  ## Uses
42
 
43
  ### Direct Use
@@ -87,11 +87,11 @@ This model is **not** suitable for:
87
  import torch
88
  from torchvision import transforms
89
  from PIL import Image
90
- from model import CNN # your model definition
91
 
92
  # Load model
93
- model = CNN()
94
- model.load_state_dict(torch.load("model.pt"))
95
  model.eval()
96
 
97
  # Preprocess image
@@ -122,31 +122,36 @@ print(f"Predicted digit: {prediction}")
122
  - **Dataset:** [MNIST](https://huggingface.co/datasets/mnist) — 70,000 grayscale images (60,000 train / 10,000 test)
123
  - **Input size:** 28×28 pixels, single channel
124
  - **Classes:** 10 (digits 0–9)
125
- -
126
  ### Training Procedure
127
 
128
  #### Preprocessing
129
 
130
- - Images resized to 28×28 and converted to grayscale tensors
131
- - Pixel values normalized using MNIST dataset mean and standard deviation
132
- - Random horizontal flips and small rotations applied for data augmentation
133
 
134
  #### Training Hyperparameters
135
 
136
- | Parameter | Value |
137
- |------------------|----------------|
138
- | Optimizer | Adam |
139
- | Learning Rate | 1e-3 |
140
- | Batch Size | 64 |
141
- | Epochs | 28 |
142
- | Loss Function | CrossEntropyLoss |
143
- | Dropout | 0.5 |
144
- | Training regime | fp32 |
145
 
146
  #### Speeds, Sizes, Times
147
 
148
- - **Training time:** ~10 minutes on a single GPU (NVIDIA T4)
149
- - **Model size:** ~2.4 MB (`.pt` file)
150
  - **Inference speed:** <50ms per image (CPU)
151
 
152
  ---
@@ -161,7 +166,7 @@ Evaluated on the standard MNIST test split — 10,000 images not seen during tra
161
 
162
  #### Factors
163
 
164
- Evaluation was performed across all 10 digit classes equally. No disaggregation by subpopulation was conducted (MNIST does not include demographic metadata).
165
 
166
  #### Metrics
167
 
@@ -172,17 +177,32 @@ Evaluation was performed across all 10 digit classes equally. No disaggregation
172
 
173
  | Metric | Value |
174
  |---------------|---------|
175
- | Test Accuracy | 99.16% |
176
 
177
  #### Summary
178
 
179
- The model achieves **99.16% accuracy** on the MNIST test set, consistent with state-of-the-art results for CNN-based approaches on this benchmark.
180
 
181
  ---
182
 
183
  ## Model Examination
184
 
185
- The model's convolutional filters learn edge detectors and stroke patterns in the first layers, which compose into digit-specific features in deeper layers. Standard CNN interpretability techniques (e.g., Grad-CAM) can be applied to visualize which regions most influence predictions.
186
 
187
  ---
188
 
@@ -193,8 +213,8 @@ Carbon emissions estimated using the [ML Impact Calculator](https://mlco2.github
193
  | Factor | Value |
194
  |-----------------|------------------------|
195
  | Hardware Type | NVIDIA T4 GPU |
196
- | Hours Used | ~0.2 hrs (10 min) |
197
- | Cloud Provider | Google Colab / Local |
198
  | Compute Region | Singapore |
199
  | Carbon Emitted | ~0.01 kg CO₂eq (est.) |
200
 
@@ -204,33 +224,59 @@ Carbon emissions estimated using the [ML Impact Calculator](https://mlco2.github
204
 
205
  ### Model Architecture
206
207
  #### Convolutional Blocks
208
 
209
- | Layer | Output Shape | Details |
210
- |-------------|---------------|------------------------------|
211
- | Conv2d | (64, 28, 28) | 64 filters, 3×3, padding=1 |
212
- | BatchNorm2d | (64, 28, 28) | — |
213
- | ReLU | (64, 28, 28) | inplace=True |
214
- | MaxPool2d | (64, 14, 14) | 2×2 |
215
- | Conv2d | (128, 14, 14) | 128 filters, 3×3, padding=1 |
216
- | BatchNorm2d | (128, 14, 14) | — |
217
- | ReLU | (128, 14, 14) | inplace=True |
218
- | MaxPool2d | (128, 7, 7) | 2×2 |
219
 
220
  #### Fully Connected Layers
221
 
222
- | Layer | Output | Details |
223
- |----------|--------|-------------------------------------|
224
- | Flatten | 6272 | — |
225
- | Linear | 512 | + BatchNorm1d + ReLU + Dropout(0.4) |
226
- | Linear | 128 | + BatchNorm1d + ReLU + Dropout(0.2) |
227
- | Linear | 10 | Raw logits |
228
 
229
- **Total Parameters: ~3.5M** — Kaiming Normal initialization throughout.
230
 
231
  ### Compute Infrastructure
232
 
233
- - **Hardware:** NVIDIA T4 / any CUDA-capable GPU (or CPU for inference)
234
  - **Software:** Python 3.10+, PyTorch 2.0, torchvision
235
 
236
  ---
@@ -242,7 +288,7 @@ If you use this model in your work, please cite:
242
  **BibTeX:**
243
  ```bibtex
244
  @misc{digit-classifier-2026,
245
- author = Abdul Rafay,
246
  title = {Handwritten Digit Classifier (CNN on MNIST)},
247
  year = {2026},
248
  publisher = {Hugging Face},
@@ -263,6 +309,9 @@ If you use this model in your work, please cite:
263
  | MNIST | A benchmark dataset of 70,000 handwritten digit images |
264
  | Softmax | Activation function that converts raw outputs to probabilities summing to 1 |
265
  | Dropout | Regularization technique that randomly disables neurons during training |
266
  | Grad-CAM | Gradient-weighted Class Activation Mapping — a model interpretability technique |
267
 
268
  ---
 
37
  - **Demo:** [Hugging Face Space](https://huggingface.co/spaces/abdurafay19/Digit-Classifier)
38
 
39
  ---
41
  ## Uses
42
 
43
  ### Direct Use
 
87
  import torch
88
  from torchvision import transforms
89
  from PIL import Image
90
+ from model import Model # your model definition
91
 
92
  # Load model
93
+ model = Model()
94
+ model.load_state_dict(torch.load("mnist_best.pth", map_location="cpu"))
95
  model.eval()
96
 
97
  # Preprocess image
 
122
  - **Dataset:** [MNIST](https://huggingface.co/datasets/mnist) — 70,000 grayscale images (60,000 train / 10,000 test)
123
  - **Input size:** 28×28 pixels, single channel
124
  - **Classes:** 10 (digits 0–9)
125
+
126
  ### Training Procedure
127
 
128
  #### Preprocessing
129
 
130
+ - Images converted to tensors and normalized using MNIST dataset mean (0.1307) and std (0.3081)
131
+ - Training augmentation: random rotation (±10°), random affine with translation (±10%), scale (0.9–1.1×), and shear (±5°)
132
+ - Test images: normalization only — no augmentation
133
 
134
  #### Training Hyperparameters
135
 
136
+ | Parameter | Value |
137
+ |-----------------|------------------------------|
138
+ | Optimizer | AdamW |
139
+ | Learning Rate | 3e-3 (max, OneCycleLR) |
140
+ | Weight Decay | 1e-4 |
141
+ | Batch Size | 64 |
142
+ | Epochs | 50 |
143
+ | Loss Function | CrossEntropyLoss |
144
+ | Label Smoothing | 0.1 |
145
+ | LR Scheduler | OneCycleLR (10% warmup, cosine anneal) |
146
+ | Dropout (conv) | 0.25 (Dropout2d) |
147
+ | Dropout (FC) | 0.25 |
148
+ | Random Seed | 23 |
149
+ | Training regime | fp32 |
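
A minimal sketch of this optimizer/scheduler/loss setup (the steps-per-epoch value assumes the 60,000-image train split at batch size 64; the stand-in model is illustrative):

```python
import torch
import torch.nn as nn

torch.manual_seed(23)  # random seed from the table

model = nn.Linear(784, 10)  # stand-in for the CNN

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-3, weight_decay=1e-4)
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

steps_per_epoch = 60_000 // 64 + 1  # 938 batches per epoch
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer,
    max_lr=3e-3,
    epochs=50,
    steps_per_epoch=steps_per_epoch,
    pct_start=0.1,          # 10% warmup
    anneal_strategy="cos",  # cosine anneal after the peak
)
```

With OneCycleLR, `scheduler.step()` is called after every `optimizer.step()` (per batch), not once per epoch.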
150
 
151
  #### Speeds, Sizes, Times
152
 
153
+ - **Training time:** ~10 minutes on a single GPU (NVIDIA T4, Google Colab)
154
+ - **Model parameters:** 160,842
155
  - **Inference speed:** <50ms per image (CPU)
156
 
157
  ---
 
166
 
167
  #### Factors
168
 
169
+ Evaluation was performed across all 10 digit classes. No disaggregation by subpopulation was conducted (MNIST does not include demographic metadata).
170
 
171
  #### Metrics
172
 
 
177
 
178
  | Metric | Value |
179
  |---------------|---------|
180
+ | Test Accuracy | 99.43% |
181
+
182
+ #### Per-Class Accuracy
183
+
184
+ | Digit | Correct | Errors | Accuracy |
185
+ |-------|---------|--------|----------|
186
+ | 0 | 980 | 0 | 100.0% |
187
+ | 1 | 1132 | 3 | 99.7% |
188
+ | 2 | 1025 | 7 | 99.3% |
189
+ | 3 | 1008 | 2 | 99.8% |
190
+ | 4 | 976 | 6 | 99.4% |
191
+ | 5 | 885 | 7 | 99.2% |
192
+ | 6 | 949 | 9 | 99.1% |
193
+ | 7 | 1020 | 8 | 99.2% |
194
+ | 8 | 968 | 6 | 99.4% |
195
+ | 9 | 1000 | 9 | 99.1% |
196
 
197
  #### Summary
198
 
199
+ The model achieves **99.43% accuracy** on the MNIST test set (57 total errors out of 10,000). Digit 0 achieves perfect classification. The most challenging classes are 6 and 9 (9 errors each), consistent with their visual similarity.
200
 
201
  ---
202
 
203
  ## Model Examination
204
 
205
+ The model's convolutional filters learn edge detectors and stroke patterns in early layers, which compose into digit-specific features in deeper layers. Standard CNN interpretability techniques (e.g., Grad-CAM) can be applied to visualize which regions most influence predictions.
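
As an illustration, a minimal Grad-CAM pass can be written with forward/backward hooks. This is a generic sketch, not code from this repository; `conv_layer` is whichever convolution you want to inspect:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def grad_cam(model, conv_layer, x, class_idx):
    """Return a (B, H, W) heatmap of where conv_layer supports class_idx."""
    acts, grads = {}, {}
    h1 = conv_layer.register_forward_hook(lambda m, i, o: acts.update(a=o))
    h2 = conv_layer.register_full_backward_hook(lambda m, gi, go: grads.update(g=go[0]))
    try:
        logits = model(x)
        model.zero_grad()
        logits[0, class_idx].backward()  # gradient of the target class score
    finally:
        h1.remove()
        h2.remove()
    # Channel weights: global-average-pool the gradients over spatial dims
    weights = grads["g"].mean(dim=(2, 3), keepdim=True)
    cam = F.relu((weights * acts["a"]).sum(dim=1))  # weighted sum of activations
    return cam / (cam.max() + 1e-8)                 # normalize to [0, 1]
```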
206
 
207
  ---
208
 
 
213
  | Factor | Value |
214
  |-----------------|------------------------|
215
  | Hardware Type | NVIDIA T4 GPU |
216
+ | Hours Used | ~0.2 hrs (10 min) |
217
+ | Cloud Provider | Google Colab |
218
  | Compute Region | Singapore |
219
+ | Carbon Emitted | ~0.01 kg CO₂eq (est.) |
220
 
 
224
 
225
  ### Model Architecture
226
 
227
+ The model uses 4 convolutional blocks followed by a compact fully connected head.
228
+
229
  #### Convolutional Blocks
230
 
231
+ | Block | Layer | Output Shape | Details |
232
+ |---------|-------------|----------------|--------------------------------------|
233
+ | Block 1 | Conv2d | (32, 28, 28) | 32 filters, 3×3, padding=1 |
234
+ | | BatchNorm2d | (32, 28, 28) | — |
235
+ | | ReLU | (32, 28, 28) | — |
236
+ | | MaxPool2d | (32, 14, 14) | 2×2 |
237
+ | | Dropout2d | (32, 14, 14) | p=0.25 |
238
+ | Block 2 | Conv2d | (64, 14, 14) | 64 filters, 3×3, padding=1 |
239
+ | | BatchNorm2d | (64, 14, 14) | — |
240
+ | | ReLU | (64, 14, 14) | — |
241
+ | | MaxPool2d | (64, 7, 7) | 2×2 |
242
+ | | Dropout2d | (64, 7, 7) | p=0.25 |
243
+ | Block 3 | Conv2d | (128, 7, 7) | 128 filters, 3×3, padding=1 |
244
+ | | BatchNorm2d | (128, 7, 7) | — |
245
+ | | ReLU | (128, 7, 7) | — |
246
+ | | MaxPool2d | (128, 3, 3) | 2×2 |
247
+ | | Dropout2d | (128, 3, 3) | p=0.25 |
248
+ | Block 4 | Conv2d | (256, 3, 3) | 256 filters, **1×1** kernel (no pad) |
249
+ | | BatchNorm2d | (256, 3, 3) | — |
250
+ | | ReLU | (256, 3, 3) | — |
251
+ | | MaxPool2d | (256, 1, 1) | 2×2 |
252
+ | | Dropout2d | (256, 1, 1) | p=0.25 |
253
 
254
  #### Fully Connected Layers
255
 
256
+ | Layer | Output | Details |
257
+ |----------|--------|----------------------|
258
+ | Flatten | 256 | 256 × 1 × 1 = 256 |
259
+ | Linear | 128 | + ReLU + Dropout(0.25) |
260
+ | Linear | 10 | Raw logits |
261
+
262
+ **Total Parameters: 160,842**
263
 
264
+ #### Shape Flow
265
+
266
+ ```
267
+ Input: (B, 1, 28, 28)
268
+ Block 1: (B, 32, 14, 14)
269
+ Block 2: (B, 64, 7, 7)
270
+ Block 3: (B, 128, 3, 3)
271
+ Block 4: (B, 256, 1, 1)
272
+ Flatten: (B, 256)
273
+ FC1: (B, 128)
274
+ Output: (B, 10)
275
+ ```
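
Putting the tables together, the architecture can be sketched as a PyTorch module. Layer and class names here are illustrative, not the author's source, but the parameter count matches the 160,842 reported above:

```python
import torch
import torch.nn as nn

class DigitCNN(nn.Module):
    """Sketch of the 4-block CNN described in the tables above."""

    def __init__(self, p_drop: float = 0.25):
        super().__init__()

        def block(c_in, c_out, kernel, padding):
            return nn.Sequential(
                nn.Conv2d(c_in, c_out, kernel, padding=padding),
                nn.BatchNorm2d(c_out),
                nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Dropout2d(p_drop),
            )

        self.features = nn.Sequential(
            block(1, 32, 3, 1),     # (B, 32, 14, 14)
            block(32, 64, 3, 1),    # (B, 64, 7, 7)
            block(64, 128, 3, 1),   # (B, 128, 3, 3)
            block(128, 256, 1, 0),  # (B, 256, 1, 1) — 1×1 kernel, no padding
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),           # (B, 256)
            nn.Linear(256, 128),
            nn.ReLU(),
            nn.Dropout(p_drop),
            nn.Linear(128, 10),     # raw logits
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```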
276
 
277
  ### Compute Infrastructure
278
 
279
+ - **Hardware:** NVIDIA T4 GPU (Google Colab)
280
  - **Software:** Python 3.10+, PyTorch 2.0, torchvision
281
 
282
  ---
 
288
  **BibTeX:**
289
  ```bibtex
290
  @misc{digit-classifier-2026,
291
+ author = {Abdul Rafay},
292
  title = {Handwritten Digit Classifier (CNN on MNIST)},
293
  year = {2026},
294
  publisher = {Hugging Face},
 
309
  | MNIST | A benchmark dataset of 70,000 handwritten digit images |
310
  | Softmax | Activation function that converts raw outputs to probabilities summing to 1 |
311
  | Dropout | Regularization technique that randomly disables neurons during training |
312
+ | BatchNorm | Batch Normalization — normalizes layer activations to stabilize and speed up training |
313
+ | OneCycleLR | Learning rate schedule with warmup and cosine decay for faster convergence |
314
+ | Label Smoothing | Softens hard targets to reduce overconfidence and improve generalization |
315
  | Grad-CAM | Gradient-weighted Class Activation Mapping — a model interpretability technique |
316
 
317
  ---