---
license: mit
library_name: pytorch
tags:
- image-classification
- cifar10
- cnn
- computer-vision
- pytorch
- mini-vision
- mini-vision-series
metrics:
- accuracy
pipeline_tag: image-classification
datasets:
- uoft-cs/cifar10
---
# Mini-Vision-V1: CIFAR-10 CNN Classifier
![Model Size](https://img.shields.io/badge/Params-1.34M-blue) ![Accuracy](https://img.shields.io/badge/Accuracy-78%25-green)
Welcome to **Mini-Vision-V1**, the first model in the Mini-Vision series. This project demonstrates a robust implementation of a Convolutional Neural Network (CNN) for image classification using the CIFAR-10 dataset. It is designed to be lightweight, efficient, and easy to understand, making it perfect for beginners learning PyTorch.
## Model Description
Mini-Vision-V1 is a custom 4-layer CNN architecture. It utilizes Batch Normalization and Dropout to prevent overfitting and ensure stable training. With only **1.34M parameters**, it achieves a competitive accuracy on the CIFAR-10 test set.
- **Dataset**: [CIFAR-10](https://www.cs.toronto.edu/~kriz/cifar.html) (32x32 color images, 10 classes)
- **Framework**: PyTorch
- **Total Parameters**: 1.34M
## Model Architecture
The network consists of 4 convolutional blocks followed by a classifier head.
| Layer | Input Channels | Output Channels | Kernel Size | Stride | Padding | Activation | Other |
| :--- | :---: | :---: | :---: | :---: | :---: | :--- | :--- |
| **Conv Block 1** | 3 | 32 | 5 | 1 | 2 | ReLU | MaxPool(2), BatchNorm |
| **Conv Block 2** | 32 | 64 | 5 | 1 | 2 | ReLU | MaxPool(2), BatchNorm |
| **Conv Block 3** | 64 | 128 | 5 | 1 | 2 | ReLU | MaxPool(2), BatchNorm |
| **Conv Block 4** | 128 | 256 | 5 | 1 | 2 | ReLU | MaxPool(2), BatchNorm |
| **Flatten** | - | - | - | - | - | - | Output: 1024 |
| **Linear 1** | 1024 | 256 | - | - | - | ReLU | Dropout(0.5) |
| **Linear 2** | 256 | 10 | - | - | - | - | - |
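The table above can be sketched directly in PyTorch. This is a minimal reconstruction, not the repository's `model.py`: the exact ordering of BatchNorm relative to ReLU and MaxPool inside each block is an assumption (the table only lists them under "Other"), though it does not change the parameter count.

```python
import torch
import torch.nn as nn

class MiniVisionV1(nn.Module):
    """Sketch of the 4-block CNN described in the architecture table."""
    def __init__(self, num_classes: int = 10):
        super().__init__()

        def block(c_in: int, c_out: int) -> nn.Sequential:
            # Conv(5x5, stride 1, pad 2) -> ReLU -> MaxPool(2) -> BatchNorm
            return nn.Sequential(
                nn.Conv2d(c_in, c_out, kernel_size=5, stride=1, padding=2),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(2),
                nn.BatchNorm2d(c_out),
            )

        self.features = nn.Sequential(
            block(3, 32), block(32, 64), block(64, 128), block(128, 256)
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),            # 256 channels x 2 x 2 spatial -> 1024
            nn.Linear(1024, 256),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(256, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))
```

A 32x32 input is halved by each of the four MaxPool layers down to 2x2, which is where the flattened size of 1024 comes from; summing the layer parameters of this sketch lands at roughly 1.34M, matching the badge.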
## Training Strategy
The model was trained using standard practices for CIFAR-10 to maximize performance on a small footprint.
- **Optimizer**: SGD (Momentum=0.9)
- **Initial Learning Rate**: 0.007
- **Scheduler**: StepLR (Step size=5, Gamma=0.5)
- **Loss Function**: CrossEntropyLoss
- **Batch Size**: 256
- **Epochs**: 100 total; best test accuracy reached at epoch 31
- **Data Augmentation**:
- Random Crop (32x32 with padding=4)
- Random Horizontal Flip
## Performance
The model achieved the following results on the CIFAR-10 test set:
| Metric | Value |
| :--- | :---: |
| **Test Accuracy** | **78%** |
| Parameters | 1.34M |
### Training Visualization (TensorBoard)
Below are the training and testing curves visualized via TensorBoard.
#### 1. Training Loss
![Training Loss](assets/train_loss.png)
*(Recorded every step)*
#### 2. Test Loss
![Test Loss](assets/test_loss.png)
*(Recorded every epoch)*
## Quick Start
### Dependencies
- Python 3.x
- PyTorch
- Torchvision
- See `requirements.txt` for the full pinned dependency list
### Inference
Load the trained weights and run inference on a single image with **test.py**.
## File Structure
```
.
β”œβ”€β”€ model.py # Model architecture definition
β”œβ”€β”€ train.py # Training script
β”œβ”€β”€ test.py # Inference script
β”œβ”€β”€ Mini-Vision-V1.pth # Trained model weights
β”œβ”€β”€ config.json
β”œβ”€β”€ README.md
└── assets
    β”œβ”€β”€ train_loss.png # Visualized train loss graph
    └── test_loss.png # Visualized test loss graph
```
## License
This project is licensed under the MIT License.