LWWZH committed (verified)
Commit 5b6d90c · Parent(s): 3c3868c

Upload Mini-Vision-V2

Files changed (7):
  1. Mini-Vision-V2.pth +3 -0
  2. README.md +116 -0
  3. assets/test_loss.png +0 -0
  4. assets/train_loss.png +0 -0
  5. demo.py +36 -0
  6. model.py +38 -0
  7. train.py +104 -0
Mini-Vision-V2.pth ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:956dc9f1d99f82ca47163888b760fcf1080379972c6c1651ca73db49c9956851
size 3310189
README.md ADDED
@@ -0,0 +1,116 @@
---
license: mit
library_name: pytorch
tags:
- image-classification
- mnist
- cnn
- computer-vision
- pytorch
- mini-vision
- mini-vision-series
metrics:
- accuracy
pipeline_tag: image-classification
datasets:
- ylecun/mnist
---

# Mini-Vision-V2: MNIST Handwritten Digit Classifier

![Model Size](https://img.shields.io/badge/Params-0.82M-blue) ![Accuracy](https://img.shields.io/badge/Accuracy-99.3%25-green)

Welcome to **Mini-Vision-V2**, the second model in the Mini-Vision series. Following the CIFAR-10 classification task of V1, this model tackles the classic MNIST handwritten digit recognition task. It features a lightweight CNN architecture optimized for grayscale images, achieving high accuracy at very low computational cost.

## Model Description

Mini-Vision-V2 is a custom two-block CNN tailored for 28x28 grayscale images. Despite having only **0.82M parameters** (significantly fewer than V1), it achieves **99.3% accuracy** on the MNIST test set. The project is a compact example of how efficient CNNs can be on simpler, highly structured datasets.

- **Dataset**: [MNIST](https://huggingface.co/datasets/ylecun/mnist) (28x28 grayscale images, 10 classes)
- **Framework**: PyTorch
- **Total Parameters**: 0.82M

## Model Architecture

The network uses a compact structure: two convolutional blocks followed by a fully connected classifier. Batch Normalization and Dropout are used to improve generalization.

| Layer | Input Channels | Output Channels | Kernel Size | Stride | Padding | Activation | Other |
| :--- | :---: | :---: | :---: | :---: | :---: | :--- | :--- |
| **Conv Block 1** | 1 | 32 | 3 | 1 | 1 | ReLU | MaxPool(2), BatchNorm |
| **Conv Block 2** | 32 | 64 | 3 | 1 | 1 | ReLU | MaxPool(2), BatchNorm |
| **Flatten** | - | - | - | - | - | - | Output: 3136 |
| **Linear 1** | 3136 | 256 | - | - | - | ReLU | Dropout(0.3) |
| **Linear 2** | 256 | 10 | - | - | - | - | - |
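The `Output: 3136` flatten size in the table follows directly from the two pooling stages; a quick back-of-the-envelope check (plain Python, no framework needed):

```python
# 3x3 convs with padding=1 preserve height/width;
# each MaxPool(2) halves the spatial size.
size = 28
size = size // 2   # after Conv Block 1 + MaxPool -> 14
size = size // 2   # after Conv Block 2 + MaxPool -> 7
channels = 64      # output channels of Conv Block 2

flatten_size = channels * size * size
print(flatten_size)  # 3136
```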
## Training Strategy

The training recipe aims for rapid convergence using SGD with momentum and a StepLR scheduler.

- **Optimizer**: SGD (Momentum=0.8)
- **Initial Learning Rate**: 0.01
- **Scheduler**: StepLR (Step size=3, Gamma=0.5)
- **Loss Function**: CrossEntropyLoss
- **Batch Size**: 256
- **Epochs**: 40 (best checkpoint; the training script runs for 50)
- **Data Augmentation**:
  - Random Crop (28x28 with padding=2)
  - Random Rotation (±10 degrees)
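With Step size=3 and Gamma=0.5, the learning rate halves every three epochs. A small sketch of the resulting schedule, mirroring StepLR's update rule in plain Python:

```python
initial_lr, step_size, gamma = 0.01, 3, 0.5

# StepLR rule: lr(epoch) = initial_lr * gamma ** (epoch // step_size)
schedule = [initial_lr * gamma ** (e // step_size) for e in range(12)]

# LR halves every 3 epochs: ~0.01 for epochs 0-2, ~0.005 for 3-5, ~0.0025 for 6-8, ...
print(schedule)
```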
## Performance

The model achieved the following results on the MNIST test set:

| Metric | Value |
| :--- | :---: |
| **Test Accuracy** | **99.3%** |
| Test Loss | 0.0235 |
| Train Loss | 0.0615 |
| Parameters | 0.82M |
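The 0.82M figure can be verified directly from the architecture table (weights plus biases per layer; each BatchNorm contributes a scale and shift per channel):

```python
conv1 = 1 * 32 * 3 * 3 + 32    # Conv Block 1 weights + biases: 320
bn1   = 2 * 32                 # BatchNorm(32) scale + shift: 64
conv2 = 32 * 64 * 3 * 3 + 64   # Conv Block 2 weights + biases: 18496
bn2   = 2 * 64                 # BatchNorm(64) scale + shift: 128
fc1   = 3136 * 256 + 256       # Linear 1: 803072
fc2   = 256 * 10 + 10          # Linear 2: 2570

total = conv1 + bn1 + conv2 + bn2 + fc1 + fc2
print(f"{total:,} parameters (~{total / 1e6:.2f}M)")  # 824,650 parameters (~0.82M)
```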
### Training Visualization (TensorBoard)

Below are the training and test curves visualized with TensorBoard.

#### 1. Training Loss

![Training Loss](assets/train_loss.png)
*(Recorded every epoch)*

#### 2. Test Loss & Accuracy

![Test Loss](assets/test_loss.png)
*(Recorded every epoch)*
## Quick Start

### Dependencies

- Python 3.x
- PyTorch
- Torchvision
- Gradio (for the demo)
- Datasets
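Assuming a standard Python environment, the dependencies above (plus `tqdm` and `tensorboard`, which `train.py` uses) can be installed with pip; the package names below are the usual PyPI ones:

```shell
pip install torch torchvision gradio datasets tqdm tensorboard
```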
### Inference / Web Demo

Run the Gradio demo to draw digits and see predictions in real time:

```bash
python demo.py
```
## File Structure

```
.
├── model.py             # Model architecture definition
├── train.py             # Training script
├── demo.py              # Gradio web interface
├── Mini-Vision-V2.pth   # Trained model weights
├── README.md
└── assets
    ├── train_loss.png   # Train loss curve
    └── test_loss.png    # Test loss curve
```
## License

This project is licensed under the MIT License.
assets/test_loss.png ADDED
assets/train_loss.png ADDED
demo.py ADDED
@@ -0,0 +1,36 @@
from model import MiniVisionV2
import torch
import torchvision
import gradio as gr
import webbrowser

# Load the full serialized module saved by train.py
# (weights_only=False because the checkpoint contains the whole model object).
minivisionv2 = torch.load("Mini-Vision-V2.pth", weights_only=False)
minivisionv2.eval()

transform = torchvision.transforms.Compose([torchvision.transforms.Resize(28),
                                            torchvision.transforms.ToTensor()])

def classifier(img):
    # The sketchpad delivers a dict; "composite" holds the drawn PIL image.
    img_tensor = transform(img["composite"])
    # Invert: the sketchpad is black-on-white, MNIST digits are white-on-black.
    img_tensor = 1.0 - img_tensor
    batch = img_tensor.unsqueeze(0)  # add batch dimension: (1, 1, 28, 28)
    with torch.no_grad():
        output = minivisionv2(batch)
        output = torch.softmax(output, dim=1)

    # Map each digit to its probability for gr.Label
    return {str(i): output[0][i].item() for i in range(10)}


demo = gr.Interface(fn=classifier,
                    inputs=gr.Sketchpad(height=280, width=280, image_mode="L", label="Sketch Pad", type="pil"),
                    outputs=gr.Label(label="Classifying Results"),
                    title="Mini-Vision-V2",
                    description="Write a digit 0-9 in the sketch pad below")

if __name__ == '__main__':
    webbrowser.open("http://127.0.0.1:7860")
    demo.launch(share=True)
model.py ADDED
@@ -0,0 +1,38 @@
import torch
import torch.nn as nn

class MiniVisionV2(nn.Module):
    def __init__(self):
        super().__init__()

        self.model = nn.Sequential(
            # Conv Block 1: 1x28x28 -> 32x14x14
            nn.Conv2d(1, 32, 3, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),

            # Conv Block 2: 32x14x14 -> 64x7x7
            nn.Conv2d(32, 64, 3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),

            # Classifier: 64 * 7 * 7 = 3136 -> 256 -> 10
            nn.Flatten(),
            nn.Linear(3136, 256),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(256, 10)
        )

    def forward(self, x):
        return self.model(x)

if __name__ == '__main__':
    minivisionv2 = MiniVisionV2()
    params = sum(p.numel() for p in minivisionv2.parameters())
    print(f"Total params: {params / 1e6:.2f}M")

    # Sanity check: a batch of 64 random images should yield (64, 10) logits
    x = torch.randn(64, 1, 28, 28)
    with torch.no_grad():
        output = minivisionv2(x)
    print(output.shape)
train.py ADDED
@@ -0,0 +1,104 @@
import os
import sys

import torch
import torchvision
from torch import nn
from torch.utils.data import DataLoader
from torch.utils.tensorboard import SummaryWriter
from datasets import load_dataset
from tqdm import tqdm
from model import MiniVisionV2


save_path = "minivisionv2_model"
batchsize = 256
learningrate = 1e-2
epoch = 50

if not os.path.exists(save_path):
    os.mkdir(save_path)

writer = SummaryWriter("minivisionv2_logs")

dataset = load_dataset("ylecun/mnist")

# Augment the training set with random crops and small rotations
transform_train = torchvision.transforms.Compose([
    torchvision.transforms.RandomCrop(28, padding=2),
    torchvision.transforms.RandomRotation(10),
    torchvision.transforms.ToTensor()
])
transform_test = torchvision.transforms.Compose([
    torchvision.transforms.ToTensor(),
])

def transforms_train(data):
    data["tensor"] = [transform_train(img) for img in data["image"]]
    return data

def transforms_test(data):
    data["tensor"] = [transform_test(img) for img in data["image"]]
    return data

train_dataset = dataset["train"].with_transform(transforms_train)
test_dataset = dataset["test"].with_transform(transforms_test)

def collate_fn(batch):
    return {
        "tensor": torch.stack([x["tensor"] for x in batch]),
        "label": torch.tensor([x["label"] for x in batch])
    }

train_loader = DataLoader(train_dataset, batch_size=batchsize, shuffle=True, collate_fn=collate_fn)
test_loader = DataLoader(test_dataset, batch_size=batchsize, shuffle=False, collate_fn=collate_fn)


minivisionv2 = MiniVisionV2()
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(minivisionv2.parameters(), lr=learningrate, momentum=0.8)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=3, gamma=0.5)

for i in range(epoch):
    print(f"=============== Epoch {i} Start | LR: {optimizer.param_groups[0]['lr']} ===============")

    # Training loop
    minivisionv2.train()
    total_train_loss = 0
    for data in tqdm(train_loader, file=sys.stdout):
        optimizer.zero_grad()
        imgs = data["tensor"]
        labels = data["label"]
        output = minivisionv2(imgs)
        loss = loss_fn(output, labels)
        loss.backward()
        optimizer.step()

        total_train_loss += loss.item()
    total_avg_train_loss = total_train_loss / len(train_loader)
    print(f"Train loss: {total_avg_train_loss}")
    writer.add_scalar("Train Loss", total_avg_train_loss, i)

    # Evaluation loop
    minivisionv2.eval()
    with torch.no_grad():
        total_accuracy = 0
        total_test_loss = 0
        for data in tqdm(test_loader, file=sys.stdout):
            imgs = data["tensor"]
            labels = data["label"]
            output = minivisionv2(imgs)
            loss = loss_fn(output, labels)
            total_test_loss += loss.item()
            total_accuracy += (output.argmax(1) == labels).sum().item()

        total_avg_test_loss = total_test_loss / len(test_loader)
        total_accuracy_percentage = round(total_accuracy / len(test_dataset) * 100, 2)
        print(f"Test loss: {total_avg_test_loss}")
        print(f"Test Accuracy Percentage: {total_accuracy_percentage}%")
        writer.add_scalar("Test Loss", total_avg_test_loss, i)
        writer.add_scalar("Test Accuracy Percentage", total_accuracy_percentage, i)

    # Save the full module each epoch (demo.py loads it with weights_only=False)
    torch.save(minivisionv2, f"./{save_path}/Mini-Vision-V2-Epoch-{i}.pth")
    print("Model Saved!")
    scheduler.step()

writer.close()