Upload Mini-Vision-V1 of the Mini-Vision-Series


Files changed (8)

Mini-Vision-V1.pth +3 -0
README.md +107 -0
assets/test_loss.png +0 -0
assets/train_loss.png +0 -0
model.py +48 -0
requirements.txt +4 -0
test.py +29 -0
train.py +120 -0

Mini-Vision-V1.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:bcfd936affb42b0f41a13b831ed643f54623abe4e02b1adc1758dc7d27cc25e9
+size 597966

README.md ADDED

	@@ -0,0 +1,107 @@

+---
+license: mit
+library_name: pytorch
+tags:
+- image-classification
+- cifar10
+- cnn
+- computer-vision
+- pytorch
+- mini-vision
+- mini-vision-series
+metrics:
+- accuracy
+pipeline_tag: image-classification
+---
+# Mini-Vision-V1: CIFAR-10 CNN Classifier
+Welcome to **Mini-Vision-V1**, the first model in the Mini-Vision series. It is a compact Convolutional Neural Network (CNN) for image classification on the CIFAR-10 dataset, written in PyTorch and kept small and readable so that it can serve as a learning example for beginners.
+## Model Description
+Mini-Vision-V1 is a custom CNN with 4 convolutional layers and 2 fully connected layers. It uses Batch Normalization to stabilize training and Dropout to reduce overfitting. With only **1.34M parameters**, it reaches about 78% accuracy on the CIFAR-10 test set.
+- **Dataset**: [CIFAR-10](https://www.cs.toronto.edu/~kriz/cifar.html) (32x32 color images, 10 classes)
+- **Framework**: PyTorch
+- **Total Parameters**: 1.34M
+## Model Architecture
+The network consists of 4 convolutional blocks followed by a classifier head.
+| Layer | Input Channels | Output Channels | Kernel Size | Stride | Padding | Activation | Other |
+| :--- | :---: | :---: | :---: | :---: | :---: | :--- | :--- |
+| **Conv Block 1** | 3 | 32 | 5 | 1 | 2 | ReLU | BatchNorm (before ReLU), MaxPool(2) |
+| **Conv Block 2** | 32 | 64 | 5 | 1 | 2 | ReLU | BatchNorm (before ReLU), MaxPool(2) |
+| **Conv Block 3** | 64 | 128 | 5 | 1 | 2 | ReLU | BatchNorm (before ReLU), MaxPool(2) |
+| **Conv Block 4** | 128 | 256 | 5 | 1 | 2 | ReLU | BatchNorm (before ReLU), MaxPool(2) |
+| **Flatten** | - | - | - | - | - | - | Output: 1024 |
+| **Linear 1** | 1024 | 256 | - | - | - | ReLU | Dropout(0.5) |
+| **Linear 2** | 256 | 10 | - | - | - | - | - |
+## Training Strategy
+The model was trained using standard practices for CIFAR-10 to maximize performance on a small footprint.
+- **Optimizer**: SGD (Momentum=0.9)
+- **Initial Learning Rate**: 0.007
+- **Scheduler**: StepLR (Step size=5, Gamma=0.5)
+- **Loss Function**: CrossEntropyLoss
+- **Batch Size**: 1024
+- **Epochs**: 100 in total; the best test accuracy was reached at epoch 31
+- **Data Augmentation**:
+  - Random Crop (32x32 with padding=4)
+  - Random Horizontal Flip
+## Performance
+The model achieved the following results on the CIFAR-10 test set:
+| Metric | Value |
+| :--- | :---: |
+| **Test Accuracy** | **78%** |
+| Parameters | 1.34M |
+### Training Visualization (TensorBoard)
+Below are the training and testing curves visualized via TensorBoard.
+#### 1. Training Loss
+![Training Loss](assets/train_loss.png)
+*(Recorded every step)*
+#### 2. Test Loss
+![Test Loss](assets/test_loss.png)
+*(Recorded every epoch)*
+## Quick Start
+### Dependencies
+- Python 3.x
+- PyTorch
+- Torchvision
+- The remaining packages listed in `requirements.txt` (install with `pip install -r requirements.txt`)
+### Inference
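+To classify a single image, set `image_path` in **test.py** to your image file and run `python test.py`. The script resizes the image to 32x32, loads `Mini-Vision-V1.pth`, and prints the raw outputs and the predicted class index. It also reads the class-to-index mapping from the CIFAR-10 dataset, so it expects the data to already exist in the local `CIFAR10` directory.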
+## File Structure
+```
+.
+├── model.py               # Model architecture definition
+├── train.py               # Training script
+├── test.py                # Inference script
+├── Mini-Vision-V1.pth     # Trained model weights
+├── README.md
+└── assets
+      ├── train_loss.png   # Visualized train loss graph
+      └── test_loss.png    # Visualized test loss graph
+```
+## License
+This project is licensed under the MIT License.
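
The `Output: 1024` entry in the Flatten row of the README's architecture table follows from the layer settings: a 5x5 convolution with stride 1 and padding 2 keeps the spatial size, and each MaxPool(2) halves it. A minimal pure-Python sketch of that arithmetic is shown here; the helper names `conv_out` and `pool_out` are illustrative and are not part of this repository.

```python
def conv_out(size, kernel=5, stride=1, padding=2):
    """Spatial size after a convolution: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel=2):
    """Spatial size after MaxPool2d(kernel), whose stride defaults to the kernel size."""
    return size // kernel


size = 32                      # CIFAR-10 images are 32x32
channels = [32, 64, 128, 256]  # output channels of the four conv blocks

for out_channels in channels:
    size = pool_out(conv_out(size))
    print(f"{out_channels} channels, {size}x{size}")

# Flatten turns (channels, height, width) into a single feature vector.
flatten_features = channels[-1] * size * size
print(flatten_features)
```

The last feature map is 256 channels at 2x2, which gives the 1024 inputs expected by `nn.Linear(1024, 256)`.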

assets/test_loss.png ADDED

assets/train_loss.png ADDED

model.py ADDED

	@@ -0,0 +1,48 @@

+import torch
+import torch.nn as nn
+class MyNetwork(nn.Module):
+    def __init__(self):
+        super().__init__()
+        self.model = nn.Sequential(
+            nn.Conv2d(3, 32, 5, padding=2),
+            nn.BatchNorm2d(32),
+            nn.ReLU(),
+            nn.MaxPool2d(2),
+            nn.Conv2d(32, 64, 5, padding=2),
+            nn.BatchNorm2d(64),
+            nn.ReLU(),
+            nn.MaxPool2d(2),
+            nn.Conv2d(64, 128, 5, padding=2),
+            nn.BatchNorm2d(128),
+            nn.ReLU(),
+            nn.MaxPool2d(2),
+            nn.Conv2d(128, 256, 5, padding=2),
+            nn.BatchNorm2d(256),
+            nn.ReLU(),
+            nn.MaxPool2d(2),
+            nn.Flatten(),
+            nn.Linear(1024, 256),
+            nn.ReLU(),
+            nn.Dropout(0.5),
+            nn.Linear(256, 10)
+        )
+    def forward(self, x):
+        x = self.model(x)
+        return x
+if __name__ == '__main__':
+    mynetwork = MyNetwork()
+    # Dummy batch of 64 CIFAR-10-sized images (avoid shadowing the builtin `input`)
+    dummy_input = torch.ones((64, 3, 32, 32))
+    output = mynetwork(dummy_input)
+    print(output.shape)
+    total_params = sum(p.numel() for p in mynetwork.parameters())
+    print(f"Total params: {total_params} ({total_params / 1000000}M)")
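
The parameter total that the `__main__` block of model.py prints can be cross-checked by hand from the layer definitions. The sketch below does this in plain Python, assuming the default `bias=True` for the convolution and linear layers and counting only learnable BatchNorm parameters (scale and shift), since the running statistics are buffers and are not returned by `parameters()`. The helper names are illustrative.

```python
def conv_params(c_in, c_out, kernel=5):
    """Weights (c_in * c_out * k * k) plus one bias per output channel."""
    return c_in * c_out * kernel * kernel + c_out


def bn_params(channels):
    """Learnable scale and shift per channel."""
    return 2 * channels


def linear_params(n_in, n_out):
    """Weight matrix plus bias vector."""
    return n_in * n_out + n_out


channels = [3, 32, 64, 128, 256]

total = 0
for c_in, c_out in zip(channels, channels[1:]):
    total += conv_params(c_in, c_out) + bn_params(c_out)
total += linear_params(1024, 256) + linear_params(256, 10)

print(total)                   # 1344010
print(f"{total / 1e6:.2f}M")   # 1.34M
```

The fourth convolution alone accounts for 819,456 of these parameters, so most of the model's size sits in the last conv block rather than in the classifier head.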

requirements.txt ADDED

	@@ -0,0 +1,4 @@

+torch
+torchvision
+tqdm
+pillow

test.py ADDED

	@@ -0,0 +1,29 @@

+import torch
+import torchvision
+from PIL import Image
+from model import *
+test_data = torchvision.datasets.CIFAR10("CIFAR10", False, download=False)
+print(test_data.class_to_idx)
+image_path = ""     # Your test image
+image = Image.open(image_path)
+print(image)
+image = image.convert("RGB")
+transform = torchvision.transforms.Compose([torchvision.transforms.Resize((32, 32)),
+                                            torchvision.transforms.ToTensor()])
+image = transform(image)
+print(image.shape)
+model = torch.load("./Mini-Vision-V1.pth", map_location="cpu", weights_only=False)  # map to CPU so GPU-saved weights load on any machine
+image = torch.reshape(image, (1, 3, 32, 32))
+model.eval()
+with torch.no_grad():
+    output = model(image)
+print(output)
+print(output.argmax(1))
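
test.py prints the raw model outputs and the winning class index. To turn those into a readable label with a confidence value, a softmax over the 10 outputs can be applied and the index looked up in the CIFAR-10 class list. The sketch below does this in plain Python on a hypothetical list of logits (for example, what `output[0].tolist()` would return); `top_prediction` and `example_logits` are illustrative names, not part of this repository.

```python
import math

# Standard CIFAR-10 label order (index -> class name).
CIFAR10_CLASSES = [
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
]


def top_prediction(logits):
    """Return (class_name, probability) for a list of 10 raw logits."""
    peak = max(logits)
    # Subtract the maximum before exponentiating for numerical stability.
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return CIFAR10_CLASSES[best], probs[best]


# Hypothetical logits for one image; real values come from the model.
example_logits = [0.2, -1.0, 0.5, 3.1, 0.0, 1.7, -0.3, 0.1, -2.0, 0.4]
label, prob = top_prediction(example_logits)
print(label, round(prob, 3))
```

The same mapping is what `test_data.class_to_idx` prints in test.py, only inverted.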

train.py ADDED

	@@ -0,0 +1,120 @@

+import os
+import sys
+import torchvision
+from model import *
+import torch.nn as nn
+from torch.utils.data import DataLoader
+from torch.utils.tensorboard import SummaryWriter
+from tqdm import tqdm
+# dir configs
+save_dir = "mini-vision"
+if not os.path.exists(save_dir):
+    os.mkdir(save_dir)
+# visualization
+writer = SummaryWriter("mini-vision-logs")
+# training config
+device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+batchsize = 256
+learning_rate = 7e-3
+# dataset preprocessing
+train_transforms = torchvision.transforms.Compose([torchvision.transforms.RandomCrop(32, 4),
+                                                   torchvision.transforms.RandomHorizontalFlip(),
+                                                   torchvision.transforms.ToTensor()])
+# dataset
+train_data = torchvision.datasets.CIFAR10("CIFAR10", True, train_transforms,
+                                          download=True)
+test_data = torchvision.datasets.CIFAR10("CIFAR10", False, torchvision.transforms.ToTensor(),
+                                          download=True)
+# dataset length
+train_data_size = len(train_data)
+test_data_size = len(test_data)
+print(train_data_size)
+print(test_data_size)
+# load dataset
+train_dataloader = DataLoader(train_data, batchsize, True)
+test_dataloader = DataLoader(test_data, batchsize, False)
+# create model
+mynetwork = MyNetwork().to(device)
+# loss function
+loss_fn = nn.CrossEntropyLoss().to(device)
+# optimizer
+optimizer = torch.optim.SGD(mynetwork.parameters(), learning_rate, 0.9)
+schedular = torch.optim.lr_scheduler.StepLR(optimizer, 5, 0.5)
+# training records
+# record train step
+total_train_step = 0
+# record test step
+total_test_step = 0
+# training epochs
+epoch = 100
+for i in range(epoch):
+    print(f"---------------Epoch {i + 1} start, LR: {optimizer.param_groups[0]['lr']}---------------")
+    # start training
+    mynetwork.train()
+    total_train_loss = 0
+    print("Training Progress: ", flush=True)
+    for data in tqdm(train_dataloader, file=sys.stdout):
+        imgs, targets = data
+        imgs = imgs.to(device)
+        targets = targets.to(device)
+        output = mynetwork(imgs)
+        loss = loss_fn(output, targets)
+        # optim model
+        optimizer.zero_grad()
+        loss.backward()
+        optimizer.step()
+        total_train_loss += loss.item()
+        total_train_step += 1
+        writer.add_scalar("train_loss", loss.item(), total_train_step)  # step already incremented above
+    train_loss_num = len(train_dataloader)  # number of batches, including the last partial one
+    total_train_loss /= train_loss_num
+    print(f"Total avg loss on train data: {total_train_loss:.2f}", flush=True)
+    # start testing
+    mynetwork.eval()
+    total_test_loss = 0
+    total_accuracy = 0
+    with torch.no_grad():
+        print("Testing Progress", flush=True)
+        for data in tqdm(test_dataloader, file=sys.stdout):
+            imgs, targets = data
+            imgs = imgs.to(device)
+            targets = targets.to(device)
+            output = mynetwork(imgs)
+            loss = loss_fn(output, targets)
+            total_test_loss += loss.item()
+            accuracy = (output.argmax(1) == targets).sum()
+            total_accuracy += accuracy
+        accuracy_percentage = round(float(total_accuracy / test_data_size * 100), 2)
+        test_loss_num = len(test_dataloader)  # number of batches, including the last partial one
+        total_test_loss /= test_loss_num
+        print(f"Total avg loss on test data: {total_test_loss:.2f}", flush=True)
+        print(f"Accuracy on test data: {accuracy_percentage}%", flush=True)
+        writer.add_scalar("test_loss", total_test_loss, total_test_step + 1)
+        writer.add_scalar("test_accuracy", accuracy_percentage, total_test_step + 1)
+        total_test_step += 1
+    schedular.step()
+    torch.save(mynetwork, f"{save_dir}/Mini-Vision-V1-epoch{i + 1}.pth")     # save every epoch
+    print("Model saved", flush=True)
+writer.close()
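
The `StepLR(optimizer, 5, 0.5)` schedule in train.py halves the learning rate every 5 epochs, starting from 7e-3. The sketch below reproduces that schedule in closed form without PyTorch, so the value used in any epoch can be read off directly. `step_lr` is an illustrative helper, and PyTorch's own scheduler may differ in the last floating-point digits because it updates the rate multiplicatively.

```python
def step_lr(epoch_index, base_lr=7e-3, step_size=5, gamma=0.5):
    """Learning rate used during a 0-based epoch under a StepLR-style schedule."""
    return base_lr * gamma ** (epoch_index // step_size)


# Print the rate for a few 1-based epoch numbers, as in the training log.
for epoch_number in (1, 5, 6, 31, 100):
    lr = step_lr(epoch_number - 1)
    print(f"epoch {epoch_number:3d}: lr = {lr:.3e}")
```

By epoch 31 the rate has been halved six times (7e-3 / 64), and by epoch 100 it has been halved nineteen times, so later epochs change the weights very little.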