LWWZH committed
Commit c3f9ef7 · verified · 1 parent: 3655cc2

Upload Mini-Vision-V1 of the Mini-Vision-Series

Files changed (8):
  1. Mini-Vision-V1.pth +3 -0
  2. README.md +107 -0
  3. assets/test_loss.png +0 -0
  4. assets/train_loss.png +0 -0
  5. model.py +48 -0
  6. requirements.txt +4 -0
  7. test.py +29 -0
  8. train.py +120 -0
Mini-Vision-V1.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bcfd936affb42b0f41a13b831ed643f54623abe4e02b1adc1758dc7d27cc25e9
+ size 597966
README.md ADDED
@@ -0,0 +1,107 @@
+ ---
+ license: mit
+ library_name: pytorch
+ tags:
+ - image-classification
+ - cifar10
+ - cnn
+ - computer-vision
+ - pytorch
+ - mini-vision
+ - mini-vision-series
+ metrics:
+ - accuracy
+ pipeline_tag: image-classification
+ ---
+
+ # Mini-Vision-V1: CIFAR-10 CNN Classifier
+
+ Welcome to **Mini-Vision-V1**, the first model in the Mini-Vision series. This project is a compact Convolutional Neural Network (CNN) for image classification on the CIFAR-10 dataset. It is designed to be lightweight, efficient, and easy to understand, making it a good starting point for learning PyTorch.
+
+ ## Model Description
+
+ Mini-Vision-V1 is a custom 4-block CNN. It uses Batch Normalization and Dropout to stabilize training and reduce overfitting. With only **1.34M parameters**, it reaches competitive accuracy on the CIFAR-10 test set.
+
+ - **Dataset**: [CIFAR-10](https://www.cs.toronto.edu/~kriz/cifar.html) (32x32 color images, 10 classes)
+ - **Framework**: PyTorch
+ - **Total Parameters**: 1.34M
+
+ ## Model Architecture
+
+ The network consists of 4 convolutional blocks followed by a classifier head.
+
+ | Layer | Input Channels | Output Channels | Kernel Size | Stride | Padding | Activation | Other |
+ | :--- | :---: | :---: | :---: | :---: | :---: | :--- | :--- |
+ | **Conv Block 1** | 3 | 32 | 5 | 1 | 2 | ReLU | BatchNorm, MaxPool(2) |
+ | **Conv Block 2** | 32 | 64 | 5 | 1 | 2 | ReLU | BatchNorm, MaxPool(2) |
+ | **Conv Block 3** | 64 | 128 | 5 | 1 | 2 | ReLU | BatchNorm, MaxPool(2) |
+ | **Conv Block 4** | 128 | 256 | 5 | 1 | 2 | ReLU | BatchNorm, MaxPool(2) |
+ | **Flatten** | - | - | - | - | - | - | Output: 1024 |
+ | **Linear 1** | 1024 | 256 | - | - | - | ReLU | Dropout(0.5) |
+ | **Linear 2** | 256 | 10 | - | - | - | - | - |
+
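The architecture table can be sanity-checked with plain arithmetic: each MaxPool(2) halves the spatial size, so the 32x32 input shrinks to 2x2 after four blocks (256 × 2 × 2 = 1024 flattened features), and summing conv/BatchNorm/linear weights and biases reproduces the stated parameter count. A minimal sketch, independent of PyTorch:

```python
# Sanity-check spatial sizes and parameter count from the architecture table.
channels = [3, 32, 64, 128, 256]
k = 5  # kernel size

size = 32   # CIFAR-10 spatial resolution
params = 0
for c_in, c_out in zip(channels, channels[1:]):
    params += c_in * c_out * k * k + c_out  # conv weights + bias
    params += 2 * c_out                     # BatchNorm gamma + beta
    size //= 2                              # MaxPool(2)

flat = channels[-1] * size * size           # 256 * 2 * 2 = 1024
params += flat * 256 + 256                  # Linear 1
params += 256 * 10 + 10                     # Linear 2

print(size, flat, params)  # 2 1024 1344010
```

The total, 1,344,010, matches the rounded 1.34M figure above (BatchNorm running statistics are buffers, not trainable parameters, so they are excluded).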
+ ## Training Strategy
+
+ The model was trained with standard CIFAR-10 practices to maximize performance at a small footprint.
+
+ - **Optimizer**: SGD (Momentum=0.9)
+ - **Initial Learning Rate**: 0.007
+ - **Scheduler**: StepLR (Step size=5, Gamma=0.5)
+ - **Loss Function**: CrossEntropyLoss
+ - **Batch Size**: 1024
+ - **Epochs**: 100 total; best test accuracy reached at epoch 31
+ - **Data Augmentation**:
+   - Random Crop (32x32 with padding=4)
+   - Random Horizontal Flip
+
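With StepLR stepped once per epoch, the learning rate in effect during epoch n (1-indexed) is 0.007 × 0.5^((n−1)//5), so by the best epoch it has decayed by a factor of 64. A small sketch of that schedule (the helper name is illustrative, not part of this repo):

```python
# StepLR decay: halve the base learning rate every 5 epochs.
def lr_at_epoch(n, base_lr=0.007, step=5, gamma=0.5):
    """Learning rate in effect during 1-indexed epoch n."""
    return base_lr * gamma ** ((n - 1) // step)

print(lr_at_epoch(1))   # 0.007
print(lr_at_epoch(6))   # 0.0035
print(lr_at_epoch(31))  # 0.007 * 0.5**6 ≈ 0.000109
```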
+ ## Performance
+
+ The model achieved the following results on the CIFAR-10 test set:
+
+ | Metric | Value |
+ | :--- | :---: |
+ | **Test Accuracy** | **78%** |
+ | Parameters | 1.34M |
+
+ ### Training Visualization (TensorBoard)
+
+ Below are the training and test curves visualized with TensorBoard.
+
+ #### 1. Training Loss
+
+ ![Training Loss](assets/train_loss.png)
+ *(Recorded every step)*
+
+ #### 2. Test Loss
+
+ ![Test Loss](assets/test_loss.png)
+ *(Recorded every epoch)*
+
+ ## Quick Start
+
+ ### Dependencies
+
+ - Python 3.x
+ - PyTorch
+ - Torchvision
+
+ Install everything listed in `requirements.txt`.
+
+ ### Inference
+
+ You can load the model and run inference on a single image with **test.py**.
+
+ ## File Structure
+
+ ```
+ .
+ ├── model.py             # Model architecture definition
+ ├── train.py             # Training script
+ ├── test.py              # Inference script
+ ├── Mini-Vision-V1.pth   # Trained model weights
+ ├── README.md
+ └── assets
+     ├── train_loss.png   # Visualized train loss graph
+     └── test_loss.png    # Visualized test loss graph
+ ```
+
+ ## License
+
+ This project is licensed under the MIT License.
assets/test_loss.png ADDED
assets/train_loss.png ADDED
model.py ADDED
@@ -0,0 +1,48 @@
+ import torch
+ import torch.nn as nn
+
+
+ class MyNetwork(nn.Module):
+     def __init__(self):
+         super().__init__()
+         self.model = nn.Sequential(
+             # Block 1: 3 -> 32 channels, 32x32 -> 16x16
+             nn.Conv2d(3, 32, 5, padding=2),
+             nn.BatchNorm2d(32),
+             nn.ReLU(),
+             nn.MaxPool2d(2),
+             # Block 2: 32 -> 64 channels, 16x16 -> 8x8
+             nn.Conv2d(32, 64, 5, padding=2),
+             nn.BatchNorm2d(64),
+             nn.ReLU(),
+             nn.MaxPool2d(2),
+             # Block 3: 64 -> 128 channels, 8x8 -> 4x4
+             nn.Conv2d(64, 128, 5, padding=2),
+             nn.BatchNorm2d(128),
+             nn.ReLU(),
+             nn.MaxPool2d(2),
+             # Block 4: 128 -> 256 channels, 4x4 -> 2x2
+             nn.Conv2d(128, 256, 5, padding=2),
+             nn.BatchNorm2d(256),
+             nn.ReLU(),
+             nn.MaxPool2d(2),
+             # Classifier head: 256 * 2 * 2 = 1024 features -> 10 classes
+             nn.Flatten(),
+             nn.Linear(1024, 256),
+             nn.ReLU(),
+             nn.Dropout(0.5),
+             nn.Linear(256, 10),
+         )
+
+     def forward(self, x):
+         return self.model(x)
+
+
+ if __name__ == '__main__':
+     mynetwork = MyNetwork()
+     dummy_input = torch.ones((64, 3, 32, 32))  # renamed to avoid shadowing the builtin input()
+     output = mynetwork(dummy_input)
+     print(output.shape)
+     total_params = sum(p.numel() for p in mynetwork.parameters())
+     print(f"Total params: {total_params}")
+     print(f"Total params: {total_params / 1_000_000}M")
requirements.txt ADDED
@@ -0,0 +1,4 @@
+ torch
+ torchvision
+ tqdm
+ pillow
test.py ADDED
@@ -0,0 +1,29 @@
+ import torch
+ import torchvision
+ from PIL import Image
+ from model import *
+
+ # Class-name/index mapping from the test split (assumes the dataset is already cached)
+ test_data = torchvision.datasets.CIFAR10("CIFAR10", False, download=False)
+ print(test_data.class_to_idx)
+
+ image_path = ""  # Your test image
+ image = Image.open(image_path)
+ print(image)
+ image = image.convert("RGB")
+
+ transform = torchvision.transforms.Compose([torchvision.transforms.Resize((32, 32)),
+                                             torchvision.transforms.ToTensor()])
+
+ image = transform(image)
+ print(image.shape)
+
+ # The checkpoint stores the whole module, so weights_only must be False
+ model = torch.load("./Mini-Vision-V1.pth", weights_only=False)
+
+ image = torch.reshape(image, (1, 3, 32, 32))
+
+ model.eval()
+ with torch.no_grad():
+     output = model(image)
+
+ print(output)
+ print(output.argmax(1))
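The `class_to_idx` printout follows CIFAR-10's standard alphabetical class order, so the model's argmax index can be mapped back to a human-readable label with a plain list (the class names below are the standard CIFAR-10 labels, not defined anywhere in this repo):

```python
# Standard CIFAR-10 class order (alphabetical), matching class_to_idx.
CIFAR10_CLASSES = ["airplane", "automobile", "bird", "cat", "deer",
                   "dog", "frog", "horse", "ship", "truck"]

def label_for(index: int) -> str:
    """Map a model's argmax index to its CIFAR-10 class name."""
    return CIFAR10_CLASSES[index]

print(label_for(0))  # airplane
print(label_for(9))  # truck
```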
train.py ADDED
@@ -0,0 +1,120 @@
+ import os
+ import sys
+ import torchvision
+ from model import *
+ import torch.nn as nn
+ from torch.utils.data import DataLoader
+ from torch.utils.tensorboard import SummaryWriter
+ from tqdm import tqdm
+
+ # dir configs
+ save_dir = "mini-vision"
+ os.makedirs(save_dir, exist_ok=True)
+
+ # visualization
+ writer = SummaryWriter("mini-vision-logs")
+
+ # training config
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+ batchsize = 256
+ learning_rate = 7e-3
+
+ # dataset preprocessing
+ train_transforms = torchvision.transforms.Compose([torchvision.transforms.RandomCrop(32, 4),
+                                                    torchvision.transforms.RandomHorizontalFlip(),
+                                                    torchvision.transforms.ToTensor()])
+
+ # dataset
+ train_data = torchvision.datasets.CIFAR10("CIFAR10", True, train_transforms,
+                                           download=True)
+ test_data = torchvision.datasets.CIFAR10("CIFAR10", False, torchvision.transforms.ToTensor(),
+                                          download=True)
+
+ # dataset length
+ train_data_size = len(train_data)
+ test_data_size = len(test_data)
+ print(f"Train samples: {train_data_size}")
+ print(f"Test samples: {test_data_size}")
+
+ # load dataset
+ train_dataloader = DataLoader(train_data, batchsize, True)
+ test_dataloader = DataLoader(test_data, batchsize, False)
+
+ # create model
+ mynetwork = MyNetwork().to(device)
+
+ # loss function
+ loss_fn = nn.CrossEntropyLoss().to(device)
+
+ # optimizer and LR scheduler (halve the LR every 5 epochs)
+ optimizer = torch.optim.SGD(mynetwork.parameters(), learning_rate, 0.9)
+ scheduler = torch.optim.lr_scheduler.StepLR(optimizer, 5, 0.5)
+
+ # training records
+ total_train_step = 0  # global batch counter
+ total_test_step = 0   # test-epoch counter
+ epoch = 100           # training epochs
+
+ for i in range(epoch):
+     print(f"---------------Epoch {i + 1} start, LR: {optimizer.param_groups[0]['lr']}---------------")
+     # start training
+     mynetwork.train()
+     total_train_loss = 0
+     print("Training Progress: ", flush=True)
+     for data in tqdm(train_dataloader, file=sys.stdout):
+         imgs, targets = data
+         imgs = imgs.to(device)
+         targets = targets.to(device)
+
+         output = mynetwork(imgs)
+         loss = loss_fn(output, targets)
+
+         # optimize model
+         optimizer.zero_grad()
+         loss.backward()
+         optimizer.step()
+
+         total_train_loss += loss.item()
+         total_train_step += 1
+         writer.add_scalar("train_loss", loss.item(), total_train_step)
+     train_loss_num = train_data_size / batchsize  # batches per epoch
+     total_train_loss /= train_loss_num
+     print(f"Total avg loss on train data: {total_train_loss:.2f}", flush=True)
+
+     # start testing
+     mynetwork.eval()
+     total_test_loss = 0
+     total_accuracy = 0
+     with torch.no_grad():
+         print("Testing Progress", flush=True)
+         for data in tqdm(test_dataloader, file=sys.stdout):
+             imgs, targets = data
+             imgs = imgs.to(device)
+             targets = targets.to(device)
+
+             output = mynetwork(imgs)
+             loss = loss_fn(output, targets)
+             total_test_loss += loss.item()
+             accuracy = (output.argmax(1) == targets).sum().item()
+             total_accuracy += accuracy
+
+     accuracy_percentage = round(total_accuracy / test_data_size * 100, 2)
+     test_loss_num = test_data_size / batchsize
+     total_test_loss /= test_loss_num
+     print(f"Total avg loss on test data: {total_test_loss:.2f}", flush=True)
+     print(f"Accuracy on test data: {accuracy_percentage}%", flush=True)
+     writer.add_scalar("test_loss", total_test_loss, total_test_step + 1)
+     writer.add_scalar("test_accuracy", accuracy_percentage, total_test_step + 1)
+     total_test_step += 1
+
+     scheduler.step()
+     torch.save(mynetwork, f"{save_dir}/Mini-Vision-V1{i + 1}.pth")  # save every epoch
+     print("Model saved", flush=True)
+
+ writer.close()