---
license: mit
library_name: pytorch
tags:
- image-classification
- cifar10
- cnn
- computer-vision
- pytorch
- mini-vision
- mini-vision-series
metrics:
- accuracy
pipeline_tag: image-classification
datasets:
- uoft-cs/cifar10
---

# Mini-Vision-V1: CIFAR-10 CNN Classifier

![Model Size](https://img.shields.io/badge/Params-1.34M-blue) ![Accuracy](https://img.shields.io/badge/Accuracy-78%25-green)

Welcome to **Mini-Vision-V1**, the first model in the Mini-Vision series. This project provides a compact Convolutional Neural Network (CNN) for image classification on the CIFAR-10 dataset. It is designed to be lightweight, efficient, and easy to understand, making it a good starting point for learning PyTorch.

## Model Description

Mini-Vision-V1 is a custom 4-block CNN architecture. It uses Batch Normalization and Dropout to prevent overfitting and ensure stable training. With only **1.34M parameters**, it reaches 78% accuracy on the CIFAR-10 test set.

- **Dataset**: [CIFAR-10](https://www.cs.toronto.edu/~kriz/cifar.html) (32x32 color images, 10 classes)
- **Framework**: PyTorch
- **Total Parameters**: 1.34M

## Model Architecture

The network consists of 4 convolutional blocks followed by a classifier head.
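For reference, the architecture can be sketched in PyTorch roughly as follows. The class name, block ordering (Conv → ReLU → MaxPool → BatchNorm), and attribute names are illustrative assumptions; the authoritative definition lives in `model.py`.

```python
import torch
import torch.nn as nn


class MiniVisionV1(nn.Module):
    """Sketch of the 4-block CNN described in this card.

    Assumption: each block applies Conv -> ReLU -> MaxPool -> BatchNorm;
    see model.py for the actual ordering and names.
    """

    def __init__(self, num_classes: int = 10):
        super().__init__()
        channels = [3, 32, 64, 128, 256]
        blocks = []
        for in_c, out_c in zip(channels[:-1], channels[1:]):
            blocks += [
                nn.Conv2d(in_c, out_c, kernel_size=5, stride=1, padding=2),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(2),       # halves spatial size: 32 -> 16 -> 8 -> 4 -> 2
                nn.BatchNorm2d(out_c),
            ]
        self.features = nn.Sequential(*blocks)
        self.classifier = nn.Sequential(
            nn.Flatten(),              # 256 x 2 x 2 -> 1024
            nn.Linear(1024, 256),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(256, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))
```

Summing the layer parameters in this sketch (conv and linear weights plus biases, and the BatchNorm affine parameters) gives 1,344,010 parameters, which matches the 1.34M figure above.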
| Layer | Input Channels | Output Channels | Kernel Size | Stride | Padding | Activation | Other |
| :--- | :---: | :---: | :---: | :---: | :---: | :--- | :--- |
| **Conv Block 1** | 3 | 32 | 5 | 1 | 2 | ReLU | MaxPool(2), BatchNorm |
| **Conv Block 2** | 32 | 64 | 5 | 1 | 2 | ReLU | MaxPool(2), BatchNorm |
| **Conv Block 3** | 64 | 128 | 5 | 1 | 2 | ReLU | MaxPool(2), BatchNorm |
| **Conv Block 4** | 128 | 256 | 5 | 1 | 2 | ReLU | MaxPool(2), BatchNorm |
| **Flatten** | - | - | - | - | - | - | Output: 1024 |
| **Linear 1** | 1024 | 256 | - | - | - | ReLU | Dropout(0.5) |
| **Linear 2** | 256 | 10 | - | - | - | - | - |

## Training Strategy

The model was trained with standard CIFAR-10 practices to maximize performance within a small parameter budget.

- **Optimizer**: SGD (Momentum=0.9)
- **Initial Learning Rate**: 0.007
- **Scheduler**: StepLR (Step size=5, Gamma=0.5)
- **Loss Function**: CrossEntropyLoss
- **Batch Size**: 256
- **Epochs**: 100 total; best accuracy reached at epoch 31
- **Data Augmentation**:
  - Random Crop (32x32 with padding=4)
  - Random Horizontal Flip

## Performance

The model achieved the following results on the CIFAR-10 test set:

| Metric | Value |
| :--- | :---: |
| **Test Accuracy** | **78%** |
| Parameters | 1.34M |

### Training Visualization (TensorBoard)

Below are the training and test curves visualized via TensorBoard.

#### 1. Training Loss

![Training Loss](assets/train_loss.png)
*(Recorded every step)*

#### 2. Test Loss

![Test Loss](assets/test_loss.png)
*(Recorded every epoch)*

## Quick Start

### Dependencies

- Python 3.x
- PyTorch
- Torchvision
- See requirements.txt for the full list

### Inference

You can load the model and run inference on a single image with **test.py**.

## File Structure

```
.
├── model.py            # Model architecture definition
├── train.py            # Training script
├── test.py             # Inference script
├── Mini-Vision-V1.pth  # Trained model weights
├── config.json
├── README.md
└── assets
    ├── train_loss.png  # Visualized train loss graph
    └── test_loss.png   # Visualized test loss graph
```

## License

This project is licensed under the MIT License.