---
license: mit
library_name: pytorch
tags:
- image-classification
- mnist
- cnn
- computer-vision
- pytorch
- mini-vision
- mini-vision-series
metrics:
- accuracy
pipeline_tag: image-classification
datasets:
- ylecun/mnist
---
# Mini-Vision-V2: MNIST Handwritten Digit Classifier
 
Welcome to **Mini-Vision-V2**, the second model in the Mini-Vision series. Following the CIFAR-10 classification task in V1, this model focuses on the classic MNIST handwritten digit recognition task. It features a lightweight CNN architecture optimized for grayscale images, achieving high accuracy with extremely low computational cost.
## Model Description
Mini-Vision-V2 is a custom 2-layer CNN architecture tailored for 28x28 grayscale images. Despite having only **0.82M parameters** (significantly smaller than V1), it achieves **99.3% accuracy** on the MNIST test set. This project serves as an excellent example of how efficient CNNs can be for simpler, structured datasets.
- **Dataset**: [MNIST](https://huggingface.co/datasets/ylecun/mnist) (28x28 grayscale images, 10 classes)
- **Framework**: PyTorch
- **Total Parameters**: 0.82M
## Model Architecture
The network utilizes a compact structure with two convolutional blocks and a fully connected classifier. Batch Normalization and Dropout are used to ensure generalization.
| Layer | Input Channels | Output Channels | Kernel Size | Stride | Padding | Activation | Other |
| :--- | :---: | :---: | :---: | :---: | :---: | :--- | :--- |
| **Conv Block 1** | 1 | 32 | 3 | 1 | 1 | ReLU | MaxPool(2), BatchNorm |
| **Conv Block 2** | 32 | 64 | 3 | 1 | 1 | ReLU | MaxPool(2), BatchNorm |
| **Flatten** | - | - | - | - | - | - | Output: 3136 |
| **Linear 1** | 3136 | 256 | - | - | - | ReLU | Dropout(0.3) |
| **Linear 2** | 256 | 10 | - | - | - | - | - |
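The table above can be sketched in PyTorch as follows (a minimal illustration; layer names and the exact ordering within each block are assumptions, since the original `model.py` is not shown here):

```python
import torch
import torch.nn as nn

class MiniVisionV2(nn.Module):
    """Sketch of the 2-block CNN described above (~0.82M parameters)."""

    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            # Conv Block 1: 1 -> 32 channels, 28x28 -> 14x14 after pooling
            nn.Conv2d(1, 32, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.BatchNorm2d(32),
            # Conv Block 2: 32 -> 64 channels, 14x14 -> 7x7 after pooling
            nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.BatchNorm2d(64),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),           # 64 * 7 * 7 = 3136
            nn.Linear(3136, 256),
            nn.ReLU(inplace=True),
            nn.Dropout(0.3),
            nn.Linear(256, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))
```

With this layout the parameter count works out to roughly 0.82M, dominated by the first linear layer (3136 × 256 ≈ 0.80M weights), matching the total stated above.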
## Training Strategy
The training strategy focuses on rapid convergence using SGD with momentum and a StepLR scheduler.
- **Optimizer**: SGD (Momentum=0.8)
- **Initial Learning Rate**: 0.01
- **Scheduler**: StepLR (Step size=3, Gamma=0.5)
- **Loss Function**: CrossEntropyLoss
- **Batch Size**: 256
- **Epochs**: 40 (Best model)
- **Data Augmentation**:
  - Random Crop (28x28 with padding=2)
  - Random Rotation (10 degrees)
## Performance
The model achieved outstanding results on the MNIST test set:
| Metric | Value |
| :--- | :---: |
| **Test Accuracy** | **99.3%** |
| Test Loss | 0.0235 |
| Train Loss | 0.0615 |
| Parameters | 0.82M |
### Training Visualization (TensorBoard)
Below are the training and testing curves visualized via TensorBoard.
#### 1. Training Loss
![Training loss curve](assets/train_loss.png)
*(Recorded every epoch)*
#### 2. Test Loss & Accuracy
![Test loss and accuracy curves](assets/test_loss.png)
*(Recorded every epoch)*
## Quick Start
### Dependencies
- Python 3.x
- PyTorch
- Torchvision
- Gradio (for demo)
- Datasets
### Inference / Web Demo
Run the Gradio demo to draw numbers and see predictions in real-time:
```bash
python demo.py
```
## File Structure
```
.
├── model.py            # Model architecture definition
├── train.py            # Training script
├── demo.py             # Gradio web interface
├── Mini-Vision-V2.pth  # Trained model weights
├── config.json
├── README.md
└── assets
    ├── train_loss.png  # Visualized train loss graph
    └── test_loss.png   # Visualized test loss graph
```
## License
This project is licensed under the MIT License.