---

license: mit
library_name: pytorch
tags:
- image-classification
- mnist
- cnn
- computer-vision
- pytorch
- mini-vision
- mini-vision-series
metrics:
- accuracy
pipeline_tag: image-classification
datasets:
- ylecun/mnist
---


# Mini-Vision-V2: MNIST Handwritten Digit Classifier

![Model Size](https://img.shields.io/badge/Params-0.82M-blue) ![Accuracy](https://img.shields.io/badge/Accuracy-99.3%25-green)

Welcome to **Mini-Vision-V2**, the second model in the Mini-Vision series. Following the CIFAR-10 classification task in V1, this model focuses on the classic MNIST handwritten digit recognition task. It features a lightweight CNN architecture optimized for grayscale images, achieving high accuracy with extremely low computational cost.

## Model Description

Mini-Vision-V2 is a custom 2-layer CNN architecture tailored for 28x28 grayscale images. Despite having only **0.82M parameters** (significantly smaller than V1), it achieves **99.3% accuracy** on the MNIST test set. This project serves as an excellent example of how efficient CNNs can be for simpler, structured datasets.

- **Dataset**: [MNIST](https://huggingface.co/datasets/ylecun/mnist) (28x28 grayscale images, 10 classes)
- **Framework**: PyTorch
- **Total Parameters**: 0.82M

## Model Architecture

The network utilizes a compact structure with two convolutional blocks and a fully connected classifier. Batch Normalization and Dropout are used to ensure generalization.

| Layer | Input Channels | Output Channels | Kernel Size | Stride | Padding | Activation | Other |
| :--- | :---: | :---: | :---: | :---: | :---: | :--- | :--- |
| **Conv Block 1** | 1 | 32 | 3 | 1 | 1 | ReLU | MaxPool(2), BatchNorm |
| **Conv Block 2** | 32 | 64 | 3 | 1 | 1 | ReLU | MaxPool(2), BatchNorm |
| **Flatten** | - | - | - | - | - | - | Output: 3136 |
| **Linear 1** | 3136 | 256 | - | - | - | ReLU | Dropout(0.3) |
| **Linear 2** | 256 | 10 | - | - | - | - | - |
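The table above can be sketched directly in PyTorch. This is a minimal reconstruction from the table, not the repository's `model.py`; the class name `MiniVisionV2` is assumed for illustration:

```python
import torch
import torch.nn as nn

class MiniVisionV2(nn.Module):
    """Compact 2-block CNN for 28x28 grayscale digits (reconstructed from the table above)."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),       # 28x28 -> 14x14
            nn.BatchNorm2d(32),
            nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),       # 14x14 -> 7x7
            nn.BatchNorm2d(64),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),          # 64 * 7 * 7 = 3136
            nn.Linear(3136, 256),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(256, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

model = MiniVisionV2()
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.2f}M parameters")  # 0.82M
```

Note that almost all of the 0.82M parameters sit in the first linear layer (3136 Γ— 256), which is typical for small CNNs with a dense classifier head.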

## Training Strategy

The training strategy focuses on rapid convergence using SGD with momentum and a StepLR scheduler.

- **Optimizer**: SGD (Momentum=0.8)
- **Initial Learning Rate**: 0.01
- **Scheduler**: StepLR (Step size=3, Gamma=0.5)
- **Loss Function**: CrossEntropyLoss
- **Batch Size**: 256
- **Epochs**: 40 (Best model)
- **Data Augmentation**:
  - Random Crop (28x28 with padding=2)
  - Random Rotation (Β±10 degrees)

## Performance

The model achieves the following results on the MNIST test set:

| Metric | Value |
| :--- | :---: |
| **Test Accuracy** | **99.3%** |
| Test Loss | 0.0235 |
| Train Loss | 0.0615 |
| Parameters | 0.82M |

### Training Visualization (TensorBoard)

Below are the training and testing curves visualized via TensorBoard.

#### 1. Training Loss

![Training Loss](assets/train_loss.png)
*(Recorded every epoch)*

#### 2. Test Loss & Accuracy

![Test Loss](assets/test_loss.png)
*(Recorded every epoch)*

## Quick Start

### Dependencies
- Python 3.x
- PyTorch
- Torchvision
- Gradio (for demo)
- Datasets

### Inference / Web Demo

Run the Gradio demo to draw digits and see predictions in real time:

```bash
python demo.py
```
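For programmatic inference without the web UI, the following minimal sketch shows the expected flow. The placeholder network stands in for the real model so the snippet is self-contained; in practice you would import the architecture from `model.py` and load `Mini-Vision-V2.pth`:

```python
import torch
import torch.nn as nn

# Placeholder model; in practice, build the real network and load the weights:
#   model.load_state_dict(torch.load("Mini-Vision-V2.pth", map_location="cpu"))
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
model.eval()  # disable Dropout / use BatchNorm running stats

image = torch.rand(1, 1, 28, 28)  # one 28x28 grayscale digit, values in [0, 1]
with torch.no_grad():
    logits = model(image)
    pred = logits.argmax(dim=1).item()

print(f"Predicted digit: {pred}")
```

Remember to call `model.eval()` before inference; otherwise Dropout and BatchNorm behave in training mode and predictions become nondeterministic.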

## File Structure

```
.
β”œβ”€β”€ model.py               # Model architecture definition
β”œβ”€β”€ train.py               # Training script
β”œβ”€β”€ demo.py                # Gradio web interface
β”œβ”€β”€ Mini-Vision-V2.pth     # Trained model weights
β”œβ”€β”€ config.json
β”œβ”€β”€ README.md
└── assets
    β”œβ”€β”€ train_loss.png     # Train loss curve
    └── test_loss.png      # Test loss curve
```

## License

This project is licensed under the MIT License.