| ---
|
| license: mit
|
| library_name: pytorch
|
| tags:
|
| - image-classification
|
| - mnist
|
| - cnn
|
| - computer-vision
|
| - pytorch
|
| - mini-vision
|
| - mini-vision-series
|
| metrics:
|
| - accuracy
|
| pipeline_tag: image-classification
|
| datasets:
|
| - ylecun/mnist
|
| ---
|
|
|
| # Mini-Vision-V2: MNIST Handwritten Digit Classifier
|
|
|
| Welcome to **Mini-Vision-V2**, the second model in the Mini-Vision series. Following the CIFAR-10 classification task in V1, this model targets the classic MNIST handwritten digit recognition task. It uses a lightweight CNN designed for grayscale images and achieves high accuracy with fewer than one million parameters.
|
|
|
| ## Model Description
|
|
|
| Mini-Vision-V2 is a custom CNN with two convolutional blocks and a two-layer fully connected classifier, tailored for 28x28 grayscale images. Despite having only **0.82M parameters** (significantly fewer than V1), it achieves **99.3% accuracy** on the MNIST test set. The project illustrates how well a small CNN can perform on a simple, structured dataset.
|
|
|
| - **Dataset**: [MNIST](https://huggingface.co/datasets/ylecun/mnist) (28x28 grayscale images, 10 classes)
|
| - **Framework**: PyTorch
|
| - **Total Parameters**: 0.82M
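
The 0.82M figure can be checked by hand from the layer sizes listed in the Model Architecture table. The following is plain Python arithmetic rather than code from this repository. It assumes that every convolution and linear layer has a bias term and that each BatchNorm layer has a learnable scale and shift per channel (the PyTorch defaults); the helper names are illustrative.

```python
# Parameter count derived from the layer table.
# Assumptions: biases on all conv/linear layers, affine BatchNorm.

def conv_params(c_in, c_out, k):
    """Kernel weights (c_out * c_in * k * k) plus one bias per output channel."""
    return c_out * c_in * k * k + c_out


def bn_params(channels):
    """One learnable scale and one learnable shift per channel."""
    return 2 * channels


def linear_params(n_in, n_out):
    """Weight matrix plus one bias per output unit."""
    return n_in * n_out + n_out


# A 3x3 convolution with stride 1 and padding 1 keeps the 28x28 size;
# each MaxPool(2) halves it, so the side length goes 28 -> 14 -> 7.
side = 28 // 2 // 2
flat_features = 64 * side * side  # 64 channels * 7 * 7

layer_params = {
    "conv1": conv_params(1, 32, 3),
    "bn1": bn_params(32),
    "conv2": conv_params(32, 64, 3),
    "bn2": bn_params(64),
    "linear1": linear_params(flat_features, 256),
    "linear2": linear_params(256, 10),
}
total_params = sum(layer_params.values())

print(flat_features)  # 3136
print(total_params)   # 824650
```

Under these assumptions, 803,072 of the 824,650 parameters sit in the first linear layer, so the flattened feature size dominates the model's footprint.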
|
|
|
| ## Model Architecture
|
|
|
| The network uses a compact structure: two convolutional blocks followed by a fully connected classifier. Batch Normalization in each convolutional block and Dropout in the classifier help with generalization.
|
|
|
| | Layer | Input Channels | Output Channels | Kernel Size | Stride | Padding | Activation | Other |
|
| | :--- | :---: | :---: | :---: | :---: | :---: | :--- | :--- |
|
| | **Conv Block 1** | 1 | 32 | 3 | 1 | 1 | ReLU | MaxPool(2), BatchNorm |
|
| | **Conv Block 2** | 32 | 64 | 3 | 1 | 1 | ReLU | MaxPool(2), BatchNorm |
|
| | **Flatten** | - | - | - | - | - | - | Output: 3136 |
|
| | **Linear 1** | 3136 | 256 | - | - | - | ReLU | Dropout(0.3) |
|
| | **Linear 2** | 256 | 10 | - | - | - | - | - |
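
The table above can be written as a small PyTorch module. This is a hedged reconstruction, not the contents of `model.py`: the class name `MiniVisionV2Sketch` is hypothetical, and the order of operations inside each block (Conv, BatchNorm, ReLU, MaxPool) and the position of Dropout after the first linear layer's ReLU are assumptions, since the table does not state them.

```python
import torch
from torch import nn


class MiniVisionV2Sketch(nn.Module):
    """Hypothetical reconstruction of the layer table; not the repo's model.py."""

    def __init__(self, num_classes: int = 10, dropout: float = 0.3):
        super().__init__()
        self.features = nn.Sequential(
            # Conv Block 1: 1 -> 32 channels, 28x28 -> 14x14 after pooling
            nn.Conv2d(1, 32, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            # Conv Block 2: 32 -> 64 channels, 14x14 -> 7x7 after pooling
            nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),                # 64 * 7 * 7 = 3136 features
            nn.Linear(64 * 7 * 7, 256),
            nn.ReLU(inplace=True),
            nn.Dropout(dropout),
            nn.Linear(256, num_classes),  # raw logits, one per digit
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))


# Shape check on a dummy batch of four blank 28x28 grayscale images.
model = MiniVisionV2Sketch().eval()
with torch.no_grad():
    logits = model(torch.zeros(4, 1, 28, 28))
print(logits.shape)  # torch.Size([4, 10])
```

Because the layer names here are guesses, the state-dict keys of this sketch may not match those stored in `Mini-Vision-V2.pth`; use the class defined in `model.py` when loading the released weights.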
|
|
|
| ## Training Strategy
|
|
|
| The training strategy focuses on rapid convergence using SGD with momentum and a StepLR scheduler.
|
|
|
| - **Optimizer**: SGD (Momentum=0.8)
|
| - **Initial Learning Rate**: 0.01
|
| - **Scheduler**: StepLR (Step size=3, Gamma=0.5)
|
| - **Loss Function**: CrossEntropyLoss
|
| - **Batch Size**: 256
|
| - **Epochs**: 40 (Best model)
|
| - **Data Augmentation**:
|
|   - Random Crop (28x28 with padding=2)
|
|   - Random Rotation (10 degrees)
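
To see what the scheduler settings above imply over a full run, the learning rate can be computed in closed form. This sketch assumes the scheduler is stepped once per epoch and that epochs are counted from 0, which is the usual way `torch.optim.lr_scheduler.StepLR` is used; the training script itself may differ. The function name `lr_at_epoch` is illustrative.

```python
# Closed-form view of StepLR with step_size=3 and gamma=0.5.
# Assumption: one scheduler step per epoch, epochs counted from 0.
INITIAL_LR = 0.01
STEP_SIZE = 3
GAMMA = 0.5
EPOCHS = 40


def lr_at_epoch(epoch: int) -> float:
    """Learning rate in effect during the given 0-based epoch."""
    return INITIAL_LR * GAMMA ** (epoch // STEP_SIZE)


schedule = [lr_at_epoch(epoch) for epoch in range(EPOCHS)]

print(schedule[0])   # 0.01
print(schedule[3])   # 0.005
print(schedule[-1])  # 0.01 * 0.5**13, roughly 1.2e-06
```

Under these assumptions the learning rate has been halved 13 times by the last epoch, so the later epochs make only very small weight updates.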
|
|
|
| ## Performance
|
|
|
| The model's results on the MNIST test set, with the training loss for reference:
|
|
|
| | Metric | Value |
|
| | :--- | :---: |
|
| | **Test Accuracy** | **99.3%** |
|
| | Test Loss | 0.0235 |
|
| | Train Loss | 0.0615 |
|
| | Parameters | 0.82M |
|
|
|
| ### Training Visualization (TensorBoard)
|
|
|
| Below are the training and testing curves visualized via TensorBoard.
|
|
|
| #### 1. Training Loss
|
|
|
| 
|
| *(Recorded every epoch)*
|
|
|
| #### 2. Test Loss & Accuracy
|
| 
|
| *(Recorded every epoch)*
|
|
|
| ## Quick Start
|
|
|
| ### Dependencies
|
| - Python 3.x
|
| - PyTorch
|
| - Torchvision
|
| - Gradio (for demo)
|
| - Datasets
|
|
|
| ### Inference / Web Demo
|
|
|
| Run the Gradio demo to draw digits and see predictions in real time:
|
|
|
| ```bash
|
| python demo.py
|
| ```
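
For use outside the demo, a single prediction step might look like the sketch below. The contents of `demo.py` are not shown here, so this is an assumption-laden illustration: `predict_digit` is a hypothetical helper, the input is assumed to be a 28x28 grayscale tensor scaled to [0, 1], and no further normalization is applied (add it if training used any). A stand-in linear model keeps the snippet runnable without the checkpoint.

```python
import torch
from torch import nn


def predict_digit(model: nn.Module, image: torch.Tensor):
    """Return (predicted digit, class probabilities) for one 28x28 image.

    `image` is assumed to be a float tensor of shape (28, 28) in [0, 1].
    """
    model.eval()  # disable dropout and use BatchNorm running statistics
    batch = image.reshape(1, 1, 28, 28)  # add batch and channel dimensions
    with torch.no_grad():
        probs = torch.softmax(model(batch), dim=1).squeeze(0)
    return int(probs.argmax()), probs


# Stand-in network so the example runs on its own; replace it with the real
# model and its trained weights when working with the repository files.
stand_in = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
digit, probs = predict_digit(stand_in, torch.rand(28, 28))
print(digit, float(probs.sum()))
```

The softmax is applied only at inference time because `CrossEntropyLoss` expects raw logits during training.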
|
|
|
| ## File Structure
|
|
|
| ```
|
| .
|
| ├── model.py              # Model architecture definition
|
| ├── train.py              # Training script
|
| ├── demo.py               # Gradio web interface
|
| ├── Mini-Vision-V2.pth    # Trained model weights
|
| ├── config.json
|
| ├── README.md
|
| └── assets
|
|     ├── train_loss.png    # Visualized train loss graph
|
|     └── test_loss.png     # Visualized test loss graph
|
| ```
|
|
|
| ## License
|
|
|
| This project is licensed under the MIT License.
|
|
|