LWWZH/Mini-Vision-V2


	---
	license: mit
	library_name: pytorch
	tags:
	- image-classification
	- mnist
	- cnn
	- computer-vision
	- pytorch
	- mini-vision
	- mini-vision-series
	metrics:
	- accuracy
	pipeline_tag: image-classification
	datasets:
	- ylecun/mnist
	---

	# Mini-Vision-V2: MNIST Handwritten Digit Classifier

	![Model Size](https://img.shields.io/badge/Params-0.82M-blue) ![Accuracy](https://img.shields.io/badge/Accuracy-99.3%25-green)

	Welcome to Mini-Vision-V2, the second model in the Mini-Vision series. Where V1 tackled CIFAR-10 classification, this model targets the classic MNIST handwritten digit recognition task. It uses a lightweight CNN designed for 28x28 grayscale input and reaches 99.3% test accuracy with only 0.82M parameters.

	## Model Description

	Mini-Vision-V2 is a custom CNN tailored for 28x28 grayscale images: two convolutional blocks followed by a two-layer fully connected classifier. With only 0.82M parameters (significantly fewer than V1), it achieves 99.3% accuracy on the MNIST test set. The project shows how far a small, efficient CNN can go on a simple, well-structured dataset.

	- Dataset: [MNIST](https://huggingface.co/datasets/ylecun/mnist) (28x28 grayscale images, 10 classes)
	- Framework: PyTorch
	- Total Parameters: 0.82M
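
As a quick sanity check on the 0.82M figure, the parameter count can be worked out by hand from the layer sizes in the architecture table. The sketch below assumes every convolution and linear layer has a bias, and that each BatchNorm layer acts on the conv output channels (32 and 64). The helper names are illustrative only.

```python
# Hand count of trainable parameters for the layer sizes in the architecture table.
# Assumptions: every Conv2d/Linear has a bias; each BatchNorm2d has a learnable
# weight and bias (its running mean/variance are buffers, not parameters).

def conv2d_params(c_in, c_out, k):
    return c_in * c_out * k * k + c_out  # kernel weights + bias


def batchnorm_params(channels):
    return 2 * channels  # gamma (weight) + beta (bias)


def linear_params(n_in, n_out):
    return n_in * n_out + n_out  # weight matrix + bias


layers = {
    "conv1": conv2d_params(1, 32, 3),
    "bn1": batchnorm_params(32),
    "conv2": conv2d_params(32, 64, 3),
    "bn2": batchnorm_params(64),
    "fc1": linear_params(64 * 7 * 7, 256),  # flattened 64x7x7 = 3136 features
    "fc2": linear_params(256, 10),
}
total = sum(layers.values())

for name, n in layers.items():
    print(f"{name:>5}: {n:>7,}")
print(f"total: {total:,}")
```

Under these assumptions the total comes to 824,650. The 3136-to-256 linear layer alone accounts for roughly 97% of that. The convolutional feature extractor is therefore a tiny fraction of the model.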

	## Model Architecture

	The network uses a compact structure: two convolutional blocks followed by a fully connected classifier. Batch Normalization and Dropout are used to improve generalization.

	| Layer | Input Channels / Features | Output Channels / Features | Kernel Size | Stride | Padding | Activation | Other |
	| :--- | :---: | :---: | :---: | :---: | :---: | :--- | :--- |
	| Conv Block 1 | 1 | 32 | 3 | 1 | 1 | ReLU | MaxPool(2), BatchNorm |
	| Conv Block 2 | 32 | 64 | 3 | 1 | 1 | ReLU | MaxPool(2), BatchNorm |
	| Flatten | - | - | - | - | - | - | Output: 3136 (64x7x7) |
	| Linear 1 | 3136 | 256 | - | - | - | ReLU | Dropout(0.3) |
	| Linear 2 | 256 | 10 | - | - | - | - | - |
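
The table maps fairly directly onto PyTorch modules. Below is a minimal sketch, not the contents of `model.py`. The class name `MiniVisionV2Sketch` is hypothetical. The table lists each block's components but not their order, so the Conv, BatchNorm, ReLU, MaxPool ordering used here is an assumption.

```python
import torch
import torch.nn as nn


class MiniVisionV2Sketch(nn.Module):
    """Hypothetical re-implementation of the architecture table.

    The Conv -> BatchNorm -> ReLU -> MaxPool order within each block is an
    assumption; model.py may order the layers or name them differently.
    """

    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            # Conv Block 1: 1x28x28 -> 32x28x28 -> pooled to 32x14x14
            nn.Conv2d(1, 32, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            # Conv Block 2: 32x14x14 -> 64x14x14 -> pooled to 64x7x7
            nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),                    # 64 * 7 * 7 = 3136 features
            nn.Linear(64 * 7 * 7, 256),
            nn.ReLU(inplace=True),
            nn.Dropout(0.3),
            nn.Linear(256, num_classes),     # raw logits, suitable for CrossEntropyLoss
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))


model = MiniVisionV2Sketch()
n_params = sum(p.numel() for p in model.parameters())
logits = model(torch.randn(2, 1, 28, 28))  # a batch of two random 28x28 images
print(n_params, tuple(logits.shape))
```

Attribute names and layer order may differ from `model.py`, so this sketch is not guaranteed to load `Mini-Vision-V2.pth`. Use the class in `model.py` for the released weights. The parameter count does not depend on the ordering and still comes out at 824,650.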

	## Training Strategy

	The training strategy focuses on rapid convergence using SGD with momentum and a StepLR scheduler.

	- Optimizer: SGD (Momentum=0.8)
	- Initial Learning Rate: 0.01
	- Scheduler: StepLR (Step size=3, Gamma=0.5)
	- Loss Function: CrossEntropyLoss
	- Batch Size: 256
	- Epochs: 40 (Best model)
	- Data Augmentation:
	  - Random Crop (28x28 with padding=2)
	  - Random Rotation (10 degrees)

	## Performance

	The model achieves strong results on the MNIST test set (train loss and parameter count are included for reference):

	| Metric | Value |
	| :--- | :---: |
	| Test Accuracy | 99.3% |
	| Test Loss | 0.0235 |
	| Train Loss | 0.0615 |
	| Parameters | 0.82M |
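
For reference, test metrics of this kind are usually computed with a loop like the one below. This is a sketch under assumptions. Loss is averaged over samples (summed per batch, then divided by the dataset size), and accuracy is top-1. `train.py` may average differently. The function name `evaluate` is hypothetical.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


@torch.no_grad()
def evaluate(model: nn.Module, loader: DataLoader, device: str = "cpu"):
    """Return (mean per-sample cross-entropy loss, top-1 accuracy) over a loader."""
    model.eval()  # disable Dropout and use BatchNorm running statistics
    criterion = nn.CrossEntropyLoss(reduction="sum")  # sum, so uneven batches are weighted correctly
    total_loss, correct, count = 0.0, 0, 0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        logits = model(images)
        total_loss += criterion(logits, labels).item()
        correct += (logits.argmax(dim=1) == labels).sum().item()
        count += labels.size(0)
    return total_loss / count, correct / count


# Sanity check with an identity "model" whose inputs are already logits:
# rows predict classes 0, 1, 2, 3 while the labels are 0, 1, 2, 9.
logits = torch.eye(10)[[0, 1, 2, 3]] * 5.0
labels = torch.tensor([0, 1, 2, 9])
loader = DataLoader(TensorDataset(logits, labels), batch_size=3)
test_loss, test_acc = evaluate(nn.Identity(), loader)
print(f"loss={test_loss:.4f} acc={test_acc:.2%}")
```

To evaluate the real model, pass it together with a `DataLoader` over the MNIST test split. Calling `model.eval()` matters because Dropout and BatchNorm behave differently in training mode.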

	### Training Visualization (TensorBoard)

	Below are the training and testing curves visualized via TensorBoard.

	#### 1. Training Loss

	![Training Loss](assets/train_loss.png)
	(Recorded every epoch)

	#### 2. Test Loss & Accuracy

	![Test Loss](assets/test_loss.png)
	(Recorded every epoch)
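
Per-epoch curves like these are typically produced with `torch.utils.tensorboard.SummaryWriter`. The sketch below logs placeholder values. The tag names (`Loss/train`, `Loss/test`, `Accuracy/test`) and the temporary log directory are assumptions. The `tensorboard` package must be installed.

```python
import os
import tempfile

from torch.utils.tensorboard import SummaryWriter

log_dir = tempfile.mkdtemp()  # a real run might use something like "runs/mini-vision-v2"
writer = SummaryWriter(log_dir=log_dir)

for epoch in range(3):
    # Placeholder metrics; in training these come from the train/eval loops.
    train_loss = 1.0 / (epoch + 1)
    test_loss, test_acc = 0.5 / (epoch + 1), 0.9 + 0.01 * epoch
    writer.add_scalar("Loss/train", train_loss, epoch)   # one point per epoch
    writer.add_scalar("Loss/test", test_loss, epoch)
    writer.add_scalar("Accuracy/test", test_acc, epoch)

writer.close()  # flush events to disk
print(sorted(os.listdir(log_dir)))
```

Running `tensorboard --logdir <log_dir>` then serves the curves in a browser.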

	## Quick Start

	### Dependencies
	- Python 3.x
	- PyTorch
	- Torchvision
	- Gradio (for the web demo)
	- Datasets (Hugging Face `datasets` library)

	### Inference / Web Demo

	Run the Gradio demo to draw digits and see the model's predictions in real time:

	```bash
	python demo.py
	```

	## File Structure

	```
	.
	├── model.py              # Model architecture definition
	├── train.py              # Training script
	├── demo.py               # Gradio web interface
	├── Mini-Vision-V2.pth    # Trained model weights
	├── config.json
	├── README.md
	└── assets
	    ├── train_loss.png    # Training loss curve
	    └── test_loss.png     # Test loss curve
	```

	## License

	This project is licensed under the MIT License.