---
license: mit
library_name: pytorch
tags:
- image-classification
- cifar10
- cnn
- computer-vision
- pytorch
- mini-vision
- mini-vision-series
metrics:
- accuracy
pipeline_tag: image-classification
datasets:
- uoft-cs/cifar10
---

# Mini-Vision-V1: CIFAR-10 CNN Classifier

Welcome to **Mini-Vision-V1**, the first model in the Mini-Vision series. This project is a compact implementation of a Convolutional Neural Network (CNN) for image classification on the CIFAR-10 dataset. It is designed to be lightweight, efficient, and easy to understand, making it a good starting point for anyone learning PyTorch.

## Model Description

Mini-Vision-V1 is a custom 4-layer CNN. It uses Batch Normalization and Dropout to reduce overfitting and stabilize training. With only **1.34M parameters**, it achieves competitive accuracy on the CIFAR-10 test set.

- **Dataset**: [CIFAR-10](https://www.cs.toronto.edu/~kriz/cifar.html) (32x32 color images, 10 classes)
- **Framework**: PyTorch
- **Total Parameters**: 1.34M

## Model Architecture

The network consists of 4 convolutional blocks followed by a classifier head.

| Layer | Input Channels | Output Channels | Kernel Size | Stride | Padding | Activation | Other |
| :--- | :---: | :---: | :---: | :---: | :---: | :--- | :--- |
| **Conv Block 1** | 3 | 32 | 5 | 1 | 2 | ReLU | MaxPool(2), BatchNorm |
| **Conv Block 2** | 32 | 64 | 5 | 1 | 2 | ReLU | MaxPool(2), BatchNorm |
| **Conv Block 3** | 64 | 128 | 5 | 1 | 2 | ReLU | MaxPool(2), BatchNorm |
| **Conv Block 4** | 128 | 256 | 5 | 1 | 2 | ReLU | MaxPool(2), BatchNorm |
| **Flatten** | - | - | - | - | - | - | Output: 1024 |
| **Linear 1** | 1024 | 256 | - | - | - | ReLU | Dropout(0.5) |
| **Linear 2** | 256 | 10 | - | - | - | - | - |
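
The table above can be sketched as a PyTorch module. This is a sketch, not the exact contents of `model.py`: the class name `MiniVisionV1` and the in-block layer order (Conv → ReLU → MaxPool → BatchNorm, following the table's columns) are assumptions.

```python
import torch
import torch.nn as nn

class MiniVisionV1(nn.Module):
    """Sketch of the 4-block CNN described in the table above."""

    def __init__(self, num_classes: int = 10):
        super().__init__()

        def block(in_ch: int, out_ch: int) -> nn.Sequential:
            # Conv(5x5, stride 1, pad 2) -> ReLU -> MaxPool(2) -> BatchNorm
            return nn.Sequential(
                nn.Conv2d(in_ch, out_ch, kernel_size=5, stride=1, padding=2),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(2),
                nn.BatchNorm2d(out_ch),
            )

        self.features = nn.Sequential(
            block(3, 32),     # 32x32 -> 16x16
            block(32, 64),    # 16x16 -> 8x8
            block(64, 128),   # 8x8   -> 4x4
            block(128, 256),  # 4x4   -> 2x2
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),            # 256 * 2 * 2 = 1024
            nn.Linear(1024, 256),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(256, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

model = MiniVisionV1()
n_params = sum(p.numel() for p in model.parameters())
print(n_params)  # 1344010 -> ~1.34M, matching the parameter count above
out = model(torch.randn(1, 3, 32, 32))
print(out.shape)  # torch.Size([1, 10])
```

Note that this layer configuration reproduces the stated 1.34M parameter total, which suggests the table is faithful to the released weights.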

## Training Strategy

The model was trained with standard CIFAR-10 practices to maximize performance at a small footprint.

- **Optimizer**: SGD (Momentum=0.9)
- **Initial Learning Rate**: 0.007
- **Scheduler**: StepLR (Step size=5, Gamma=0.5)
- **Loss Function**: CrossEntropyLoss
- **Batch Size**: 256
- **Epochs**: 100 total; best accuracy reached at epoch 31
- **Data Augmentation**:
  - Random Crop (32x32 with padding=4)
  - Random Horizontal Flip

## Performance

The model achieved the following results on the CIFAR-10 test set:

| Metric | Value |
| :--- | :---: |
| **Test Accuracy** | **78%** |
| Parameters | 1.34M |
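
The test accuracy above comes from a standard evaluation loop. This is a generic sketch of such a loop, demonstrated on synthetic data (the `Oracle` stand-in and the tiny fake loader are assumptions for illustration; no weights or dataset are downloaded here):

```python
import torch

@torch.no_grad()
def accuracy(model: torch.nn.Module, loader, device: str = "cpu") -> float:
    """Fraction of correctly classified samples over a test loader."""
    model.eval()
    correct = total = 0
    for images, labels in loader:
        logits = model(images.to(device))
        correct += (logits.argmax(dim=1) == labels.to(device)).sum().item()
        total += labels.size(0)
    return correct / total

# Demo on synthetic data with a stand-in "model" that echoes stored labels.
class Oracle(torch.nn.Module):
    def __init__(self, labels: torch.Tensor):
        super().__init__()
        self.labels = labels

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.nn.functional.one_hot(self.labels, 10).float()

labels = torch.randint(0, 10, (8,))
loader = [(torch.randn(8, 3, 32, 32), labels)]
print(accuracy(Oracle(labels), loader))  # 1.0
```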

### Training Visualization (TensorBoard)

Below are the training and test curves visualized via TensorBoard.

#### 1. Training Loss

![Training loss curve](assets/train_loss.png)
*(Recorded every step)*

#### 2. Test Loss

![Test loss curve](assets/test_loss.png)
*(Recorded every epoch)*

## Quick Start

### Dependencies

- Python 3.x
- PyTorch
- Torchvision

See `requirements.txt` for the full dependency list.

### Inference

You can load the model and run inference on a single image using **test.py**.

## File Structure

```
.
├── model.py            # Model architecture definition
├── train.py            # Training script
├── test.py             # Inference script
├── Mini-Vision-V1.pth  # Trained model weights
├── config.json
├── README.md
└── assets
    ├── train_loss.png  # Visualized train loss graph
    └── test_loss.png   # Visualized test loss graph
```

## License

This project is licensed under the MIT License.