---
license: mit
library_name: pytorch
tags:
- image-classification
- cifar10
- cnn
- computer-vision
- pytorch
- mini-vision
- mini-vision-series
metrics:
- accuracy
pipeline_tag: image-classification
datasets:
- uoft-cs/cifar10
---

# Mini-Vision-V1: CIFAR-10 CNN Classifier

Welcome to **Mini-Vision-V1**, the first model in the Mini-Vision series. This project is a compact implementation of a Convolutional Neural Network (CNN) for image classification on the CIFAR-10 dataset. It is designed to be lightweight, efficient, and easy to understand, making it a good starting point for anyone learning PyTorch.

## Model Description

Mini-Vision-V1 is a custom 4-layer CNN. It uses Batch Normalization and Dropout to reduce overfitting and stabilize training. With only **1.34M parameters**, it achieves competitive accuracy on the CIFAR-10 test set.

- **Dataset**: [CIFAR-10](https://www.cs.toronto.edu/~kriz/cifar.html) (32x32 color images, 10 classes)
- **Framework**: PyTorch
- **Total Parameters**: 1.34M

## Model Architecture

The network consists of 4 convolutional blocks followed by a classifier head.

| Layer | Input Channels | Output Channels | Kernel Size | Stride | Padding | Activation | Other |
| :--- | :---: | :---: | :---: | :---: | :---: | :--- | :--- |
| **Conv Block 1** | 3 | 32 | 5 | 1 | 2 | ReLU | MaxPool(2), BatchNorm |
| **Conv Block 2** | 32 | 64 | 5 | 1 | 2 | ReLU | MaxPool(2), BatchNorm |
| **Conv Block 3** | 64 | 128 | 5 | 1 | 2 | ReLU | MaxPool(2), BatchNorm |
| **Conv Block 4** | 128 | 256 | 5 | 1 | 2 | ReLU | MaxPool(2), BatchNorm |
| **Flatten** | - | - | - | - | - | - | Output: 1024 |
| **Linear 1** | 1024 | 256 | - | - | - | ReLU | Dropout(0.5) |
| **Linear 2** | 256 | 10 | - | - | - | - | - |
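
The table above can be sketched as a PyTorch module. This is a sketch, not the exact contents of `model.py`: the class name `MiniVisionV1` and the in-block layer order (Conv → ReLU → MaxPool → BatchNorm, following the table's columns) are assumptions.

```python
import torch
import torch.nn as nn

class MiniVisionV1(nn.Module):
    """Sketch of the 4-block CNN described in the table above."""

    def __init__(self, num_classes: int = 10):
        super().__init__()

        def block(in_ch: int, out_ch: int) -> nn.Sequential:
            # Conv(5x5, stride 1, pad 2) -> ReLU -> MaxPool(2) -> BatchNorm
            return nn.Sequential(
                nn.Conv2d(in_ch, out_ch, kernel_size=5, stride=1, padding=2),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(2),
                nn.BatchNorm2d(out_ch),
            )

        self.features = nn.Sequential(
            block(3, 32),     # 32x32 -> 16x16
            block(32, 64),    # 16x16 -> 8x8
            block(64, 128),   # 8x8   -> 4x4
            block(128, 256),  # 4x4   -> 2x2
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),            # 256 * 2 * 2 = 1024
            nn.Linear(1024, 256),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(256, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

model = MiniVisionV1()
n_params = sum(p.numel() for p in model.parameters())
print(n_params)  # 1344010 -> ~1.34M, matching the parameter count above
out = model(torch.randn(1, 3, 32, 32))
print(out.shape)  # torch.Size([1, 10])
```

Note that this layer configuration reproduces the stated 1.34M parameter total, which suggests the table is faithful to the released weights.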

## Training Strategy

The model was trained with standard CIFAR-10 practices to maximize performance at a small footprint.

- **Optimizer**: SGD (Momentum=0.9)
- **Initial Learning Rate**: 0.007
- **Scheduler**: StepLR (Step size=5, Gamma=0.5)
- **Loss Function**: CrossEntropyLoss
- **Batch Size**: 256
- **Epochs**: 100 total; best accuracy reached at epoch 31
- **Data Augmentation**:
  - Random Crop (32x32 with padding=4)
  - Random Horizontal Flip

## Performance

The model achieved the following results on the CIFAR-10 test set:

| Metric | Value |
| :--- | :---: |
| **Test Accuracy** | **78%** |
| Parameters | 1.34M |
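
The test accuracy above comes from a standard evaluation loop. This is a generic sketch of such a loop, demonstrated on synthetic data (the `Oracle` stand-in and the tiny fake loader are assumptions for illustration; no weights or dataset are downloaded here):

```python
import torch

@torch.no_grad()
def accuracy(model: torch.nn.Module, loader, device: str = "cpu") -> float:
    """Fraction of correctly classified samples over a test loader."""
    model.eval()
    correct = total = 0
    for images, labels in loader:
        logits = model(images.to(device))
        correct += (logits.argmax(dim=1) == labels.to(device)).sum().item()
        total += labels.size(0)
    return correct / total

# Demo on synthetic data with a stand-in "model" that echoes stored labels.
class Oracle(torch.nn.Module):
    def __init__(self, labels: torch.Tensor):
        super().__init__()
        self.labels = labels

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.nn.functional.one_hot(self.labels, 10).float()

labels = torch.randint(0, 10, (8,))
loader = [(torch.randn(8, 3, 32, 32), labels)]
print(accuracy(Oracle(labels), loader))  # 1.0
```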

### Training Visualization (TensorBoard)

Below are the training and test curves visualized via TensorBoard.

#### 1. Training Loss

![Training loss curve](assets/train_loss.png)
*(Recorded every step)*

#### 2. Test Loss

![Test loss curve](assets/test_loss.png)
*(Recorded every epoch)*

## Quick Start

### Dependencies

- Python 3.x
- PyTorch
- Torchvision

See `requirements.txt` for the full dependency list.

### Inference

You can load the model and run inference on a single image using **test.py**.

## File Structure

```
.
├── model.py            # Model architecture definition
├── train.py            # Training script
├── test.py             # Inference script
├── Mini-Vision-V1.pth  # Trained model weights
├── config.json
├── README.md
└── assets
    ├── train_loss.png  # Visualized train loss graph
    └── test_loss.png   # Visualized test loss graph
```

## License

This project is licensed under the MIT License.