---
license: mit
library_name: pytorch
tags:
- image-classification
- mnist
- cnn
- computer-vision
- pytorch
- mini-vision
- mini-vision-series
metrics:
- accuracy
pipeline_tag: image-classification
datasets:
- ylecun/mnist
---

# Mini-Vision-V2: MNIST Handwritten Digit Classifier

![Model Size](https://img.shields.io/badge/Params-0.82M-blue)
![Accuracy](https://img.shields.io/badge/Accuracy-99.3%25-green)

Welcome to **Mini-Vision-V2**, the second model in the Mini-Vision series. Following the CIFAR-10 classification task in V1, this model targets the classic MNIST handwritten digit recognition task. It uses a lightweight CNN architecture designed for grayscale images and reaches high accuracy at very low computational cost.

## Model Description

Mini-Vision-V2 is a custom CNN with two convolutional layers, tailored for 28x28 grayscale images. Despite having only **0.82M parameters** (significantly fewer than V1), it achieves **99.3% accuracy** on the MNIST test set. The project shows how efficient a small CNN can be on a simple, structured dataset.

- **Dataset**: [MNIST](https://huggingface.co/datasets/ylecun/mnist) (28x28 grayscale images, 10 classes)
- **Framework**: PyTorch
- **Total Parameters**: 0.82M

## Model Architecture

The network has a compact structure: two convolutional blocks followed by a fully connected classifier. Batch Normalization and Dropout are used to improve generalization.
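As a sanity check on the layer sizes listed in this section's table, the flattened feature size (3136) and the parameter count (0.82M) can be reproduced with plain arithmetic. The sketch below is illustrative only: the helper names are hypothetical and not part of the repository, and it assumes that convolutional and linear layers use biases and that BatchNorm carries a learnable scale and shift per channel (the PyTorch defaults), which the README does not state.

```python
# Shape and parameter arithmetic for the layer table in this README.
# Assumptions (not confirmed by the source): conv and linear layers use
# biases, and BatchNorm has a learnable scale and shift per channel.

def window_out(size, kernel, stride, padding):
    """Spatial size after a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1

def conv_params(c_in, c_out, kernel):
    return c_in * c_out * kernel * kernel + c_out  # weights + biases

def bn_params(channels):
    return 2 * channels  # one scale and one shift per channel

def linear_params(n_in, n_out):
    return n_in * n_out + n_out  # weights + biases

size = 28   # MNIST images are 28x28
total = 0
for c_in, c_out in [(1, 32), (32, 64)]:
    size = window_out(size, kernel=3, stride=1, padding=1)  # 3x3 conv, pad 1 keeps size
    size = window_out(size, kernel=2, stride=2, padding=0)  # MaxPool(2) halves it
    total += conv_params(c_in, c_out, 3) + bn_params(c_out)

flat = 64 * size * size  # channels * height * width after the second block
total += linear_params(flat, 256) + linear_params(256, 10)

print(f"flattened features: {flat}")
print(f"total parameters:   {total:,} (~{total / 1e6:.2f}M)")
```

Under these assumptions nearly all of the parameters sit in the first linear layer (3136 x 256 weights), so the convolutional blocks contribute only a small fraction of the total.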

| Layer | Input Channels | Output Channels | Kernel Size | Stride | Padding | Activation | Other |
| :--- | :---: | :---: | :---: | :---: | :---: | :--- | :--- |
| **Conv Block 1** | 1 | 32 | 3 | 1 | 1 | ReLU | MaxPool(2), BatchNorm |
| **Conv Block 2** | 32 | 64 | 3 | 1 | 1 | ReLU | MaxPool(2), BatchNorm |
| **Flatten** | - | - | - | - | - | - | Output: 3136 |
| **Linear 1** | 3136 | 256 | - | - | - | ReLU | Dropout(0.3) |
| **Linear 2** | 256 | 10 | - | - | - | - | - |

## Training Strategy

The training strategy focuses on rapid convergence using SGD with momentum and a StepLR scheduler.

- **Optimizer**: SGD (Momentum=0.8)
- **Initial Learning Rate**: 0.01
- **Scheduler**: StepLR (Step size=3, Gamma=0.5)
- **Loss Function**: CrossEntropyLoss
- **Batch Size**: 256
- **Epochs**: 40 (Best model)
- **Data Augmentation**:
  - Random Crop (28x28 with padding=2)
  - Random Rotation (10 degrees)

## Performance

The model achieved outstanding results on the MNIST test set:

| Metric | Value |
| :--- | :---: |
| **Test Accuracy** | **99.3%** |
| Test Loss | 0.0235 |
| Train Loss | 0.0615 |
| Parameters | 0.82M |

### Training Visualization (TensorBoard)

Below are the training and testing curves visualized via TensorBoard.

#### 1. Training Loss

![Training Loss](assets/train_loss.png)
*(Recorded every epoch)*

#### 2. Test Loss & Accuracy

![Test Loss](assets/test_loss.png)
*(Recorded every epoch)*

## Quick Start

### Dependencies

- Python 3.x
- PyTorch
- Torchvision
- Gradio (for demo)
- Datasets

### Inference / Web Demo

Run the Gradio demo to draw numbers and see predictions in real-time:

```bash
python demo.py
```

## File Structure

```
.
├── model.py            # Model architecture definition
├── train.py            # Training script
├── demo.py             # Gradio Web Interface
├── Mini-Vision-V2.pth  # Trained model weights
├── config.json
├── README.md
└── assets
    ├── train_loss.png  # Visualized train loss graph
    └── test_loss.png   # Visualized test loss graph
```

## License

This project is licensed under the MIT License.