Upload Mini-Vision-V2

Files changed (7)

Mini-Vision-V2.pth +3 -0
README.md +116 -0
assets/test_loss.png +0 -0
assets/train_loss.png +0 -0
demo.py +36 -0
model.py +38 -0
train.py +104 -0

Mini-Vision-V2.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:956dc9f1d99f82ca47163888b760fcf1080379972c6c1651ca73db49c9956851
+size 3310189

README.md ADDED

	@@ -0,0 +1,116 @@

+---
+license: mit
+library_name: pytorch
+tags:
+- image-classification
+- mnist
+- cnn
+- computer-vision
+- pytorch
+- mini-vision
+- mini-vision-series
+metrics:
+- accuracy
+pipeline_tag: image-classification
+datasets:
+- ylecun/mnist
+---
+# Mini-Vision-V2: MNIST Handwritten Digit Classifier
+![Model Size](https://img.shields.io/badge/Params-0.82M-blue) ![Accuracy](https://img.shields.io/badge/Accuracy-99.3%25-green)
+Welcome to **Mini-Vision-V2**, the second model in the Mini-Vision series. Following the CIFAR-10 classification task in V1, this model focuses on the classic MNIST handwritten digit recognition task. It features a lightweight CNN architecture optimized for grayscale images, achieving high accuracy with extremely low computational cost.
+## Model Description
+Mini-Vision-V2 is a custom CNN for 28x28 grayscale images, built from two convolutional blocks followed by a two-layer fully connected classifier. Despite having only **0.82M parameters** (significantly smaller than V1), it achieves **99.3% accuracy** on the MNIST test set. The project shows how far a small CNN can go on a simple, well-structured dataset.
+- **Dataset**: [MNIST](https://huggingface.co/datasets/ylecun/mnist) (28x28 grayscale images, 10 classes)
+- **Framework**: PyTorch
+- **Total Parameters**: 0.82M
+## Model Architecture
+The network uses a compact structure with two convolutional blocks and a fully connected classifier. Batch Normalization and Dropout are used to improve generalization.
+| Layer | Input Channels | Output Channels | Kernel Size | Stride | Padding | Activation | Other |
+| :--- | :---: | :---: | :---: | :---: | :---: | :--- | :--- |
+| **Conv Block 1** | 1 | 32 | 3 | 1 | 1 | ReLU | BatchNorm (before ReLU), MaxPool(2) |
+| **Conv Block 2** | 32 | 64 | 3 | 1 | 1 | ReLU | BatchNorm (before ReLU), MaxPool(2) |
+| **Flatten** | - | - | - | - | - | - | Output: 3136 (64 × 7 × 7) |
+| **Linear 1** | 3136 | 256 | - | - | - | ReLU | Dropout(0.3) |
+| **Linear 2** | 256 | 10 | - | - | - | - | - |
+## Training Strategy
+The training strategy focuses on rapid convergence using SGD with momentum and a StepLR scheduler.
+- **Optimizer**: SGD (Momentum=0.8)
+- **Initial Learning Rate**: 0.01
+- **Scheduler**: StepLR (Step size=3, Gamma=0.5)
+- **Loss Function**: CrossEntropyLoss
+- **Batch Size**: 256
+- **Epochs**: 40 (best checkpoint; `train.py` is configured to run 50)
+- **Data Augmentation**:
+  - Random Crop (28x28 with padding=2)
+  - Random Rotation (10 degrees)
+## Performance
+The model reaches the following results on MNIST:
+| Metric | Value |
+| :--- | :---: |
+| **Test Accuracy** | **99.3%** |
+| Test Loss | 0.0235 |
+| Train Loss | 0.0615 |
+| Parameters | 0.82M |
+### Training Visualization (TensorBoard)
+Below are the training and testing curves visualized via TensorBoard.
+#### 1. Training Loss
+![Training Loss](assets/train_loss.png)
+*(Recorded every epoch)*
+#### 2. Test Loss & Accuracy
+![Test Loss](assets/test_loss.png)
+*(Recorded every epoch)*
+## Quick Start
+### Dependencies
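+- Python 3.x
+- PyTorch
+- Torchvision
+- Gradio (for demo)
+- Datasets (Hugging Face `datasets`, used by `train.py` to load MNIST)
+- TensorBoard (for training logs)
+- tqdm (for progress bars)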
+### Inference / Web Demo
+Run the Gradio demo to draw a digit and see the predicted class probabilities:
+```bash
+python demo.py
+```
+## File Structure
+```
+.
+├── model.py               # Model architecture definition
+├── train.py               # Training script
+├── demo.py                # Gradio Web Interface
+├── Mini-Vision-V2.pth     # Trained model weights
+├── README.md
+└── assets
+      ├── train_loss.png   # Visualized train loss graph
+      └── test_loss.png    # Visualized test loss graph
+```
+## License
+This project is licensed under the MIT License.
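
Note on the README figures: the batch size and accuracy quoted above translate into concrete counts that are easy to sanity-check. The sketch below is not part of the commit. It assumes the standard MNIST split of 60,000 training and 10,000 test images and the `DataLoader` default of keeping the last partial batch; the function names are illustrative only.

```python
import math

# Standard MNIST split sizes (assumption stated in the note above).
TRAIN_IMAGES = 60_000
TEST_IMAGES = 10_000
BATCH_SIZE = 256        # from the README and train.py
TEST_ACCURACY = 0.993   # from the README


def batches_per_epoch(num_images: int, batch_size: int) -> int:
    """Number of DataLoader iterations when the last partial batch is kept."""
    return math.ceil(num_images / batch_size)


def misclassified(num_images: int, accuracy: float) -> int:
    """Approximate number of wrong predictions implied by an accuracy figure."""
    return round(num_images * (1.0 - accuracy))


if __name__ == "__main__":
    print(batches_per_epoch(TRAIN_IMAGES, BATCH_SIZE))   # optimizer steps per epoch
    print(batches_per_epoch(TEST_IMAGES, BATCH_SIZE))    # evaluation batches
    print(misclassified(TEST_IMAGES, TEST_ACCURACY))     # test images predicted wrongly
```

Under these assumptions each epoch performs 235 optimizer steps, and 99.3% accuracy corresponds to roughly 70 misclassified test images.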

assets/test_loss.png ADDED

assets/train_loss.png ADDED

demo.py ADDED

	@@ -0,0 +1,36 @@

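+from model import MiniVisionV2  # must be importable so the pickled model can be restored
+import torch
+import torchvision
+import gradio as gr
+# The checkpoint is a fully pickled module (train.py calls torch.save on the model itself),
+# so weights_only=False is required. Only load checkpoints from a trusted source.
+minivisionv2 = torch.load("Mini-Vision-V2.pth", weights_only=False)
+minivisionv2.eval()
+# Resize to exactly 28x28 so the flattened feature size matches the model.
+transform = torchvision.transforms.Compose([torchvision.transforms.Resize((28, 28)),
+                                            torchvision.transforms.ToTensor()])
+def classifier(img):
+    # Sketchpad passes a dict; "composite" holds the drawing as a PIL image.
+    pixels = transform(img["composite"])
+    # Invert intensities so the digit is bright on a dark background, as in MNIST.
+    pixels = 1.0 - pixels
+    batch = pixels.unsqueeze(0)  # add the batch dimension: (1, 1, 28, 28)
+    with torch.no_grad():
+        probs = torch.softmax(minivisionv2(batch), dim=1)
+    # gr.Label accepts a {label: confidence} dict.
+    return {str(i): probs[0][i].item() for i in range(10)}
+demo = gr.Interface(fn=classifier,
+                    inputs=gr.Sketchpad(height=280, width=280, image_mode="L", label="Sketch Pad", type="pil"),
+                    outputs=gr.Label(label="Classifying Results"),
+                    title="Mini-Vision-V2",
+                    description="Write a digit (0-9) in the sketch pad below"
+                    )
+if __name__ == '__main__':
+    # inbrowser=True opens the local URL once the server is up.
+    demo.launch(share=True, inbrowser=True)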
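
Note on the demo output: `classifier` turns the model's ten logits into a probability per digit and hands them to `gr.Label` as a dict. The sketch below is not part of the commit. It reproduces that post-processing step in plain Python so it can be checked without the checkpoint; `softmax`, `to_label_scores`, and the example logits are hypothetical names and values chosen for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a flat list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def to_label_scores(logits):
    """Map ten logits to a {"0": p0, ..., "9": p9} dict of class probabilities."""
    return {str(digit): p for digit, p in enumerate(softmax(logits))}


if __name__ == "__main__":
    # Hypothetical logits for one drawing; index 7 is the largest.
    example_logits = [0.1, -1.2, 0.3, 0.0, -0.5, 0.2, -2.0, 4.0, 0.4, -0.1]
    scores = to_label_scores(example_logits)
    print(max(scores, key=scores.get))
```

Subtracting the maximum logit before exponentiating does not change the result but avoids overflow for large logits; `torch.softmax` in `demo.py` handles this internally.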

model.py ADDED

	@@ -0,0 +1,38 @@

+import torch
+import torch.nn as nn
+class MiniVisionV2(nn.Module):
+    def __init__(self):
+        super().__init__()
+        self.model = nn.Sequential(
+            nn.Conv2d(1, 32, 3, padding=1),
+            nn.BatchNorm2d(32),
+            nn.ReLU(),
+            nn.MaxPool2d(2, 2),
+            nn.Conv2d(32, 64, 3, padding=1),
+            nn.BatchNorm2d(64),
+            nn.ReLU(),
+            nn.MaxPool2d(2, 2),
+            nn.Flatten(),
+            nn.Linear(3136, 256),
+            nn.ReLU(),
+            nn.Dropout(0.3),
+            nn.Linear(256, 10)
+        )
+    def forward(self, x):
+        x = self.model(x)
+        return x
+if __name__ == '__main__':
+    minivisionv2 = MiniVisionV2()
+    params = sum(p.numel() for p in minivisionv2.parameters())
+    print(f"Total params: {params / 1000000:,}M")
+    dummy_batch = torch.randn(64, 1, 28, 28)  # 64 random 28x28 grayscale images
+    with torch.no_grad():
+        output = minivisionv2(dummy_batch)
+    print(output)
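
Note on the parameter count: the 0.82M figure and the 3136-wide flatten layer can be confirmed by hand from the layer shapes in `model.py`, without instantiating the network. The sketch below is not part of the commit; the helper names are illustrative, and it assumes the default bias terms on the conv and linear layers and the default affine BatchNorm.

```python
def conv2d_params(in_ch, out_ch, kernel):
    """Kernel weights plus one bias per output channel."""
    return in_ch * out_ch * kernel * kernel + out_ch


def batchnorm2d_params(channels):
    """Learnable scale and shift; running statistics are buffers, not parameters."""
    return 2 * channels


def linear_params(in_features, out_features):
    """Weight matrix plus one bias per output feature."""
    return in_features * out_features + out_features


def flatten_width(side, channels, num_pools):
    """3x3 convs with padding=1 keep the spatial size; each MaxPool2d(2, 2) halves it."""
    for _ in range(num_pools):
        side //= 2
    return channels * side * side


def total_params():
    flat = flatten_width(side=28, channels=64, num_pools=2)
    return (
        conv2d_params(1, 32, 3) + batchnorm2d_params(32)
        + conv2d_params(32, 64, 3) + batchnorm2d_params(64)
        + linear_params(flat, 256) + linear_params(256, 10)
    )


if __name__ == "__main__":
    print(flatten_width(28, 64, 2))
    print(f"{total_params() / 1_000_000:.2f}M")
```

Almost all of the parameters sit in the first linear layer (3136 × 256 weights), which is why the model stays small despite 64 feature maps.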

train.py ADDED

	@@ -0,0 +1,104 @@

+import os
+import torch
+import sys
+from torch import nn
+import torchvision
+from datasets import load_dataset
+from torch.utils.data import DataLoader
+from model import MiniVisionV2
+from torch.utils.tensorboard import SummaryWriter
+from tqdm import tqdm
+save_path = "minivisionv2_model"
+batchsize = 256
+learningrate = 1e-2
+epoch = 50
+os.makedirs(save_path, exist_ok=True)
+writer = SummaryWriter("minivisionv2_logs")
+dataset = load_dataset("ylecun/mnist")
+transform_train = torchvision.transforms.Compose([
+    torchvision.transforms.RandomCrop(28, padding=2),
+    torchvision.transforms.RandomRotation(10),
+    torchvision.transforms.ToTensor()
+])
+transform_test = torchvision.transforms.Compose([
+    torchvision.transforms.ToTensor(),
+])
+def transforms_train(data):
+    data["tensor"] = [transform_train(img) for img in data["image"]]
+    return data
+def transforms_test(data):
+    data["tensor"] = [transform_test(img) for img in data["image"]]
+    return data
+train_dataset = dataset["train"].with_transform(transforms_train)
+test_dataset = dataset["test"].with_transform(transforms_test)
+def collate_fn(batch):
+    return {
+        "tensor": torch.stack([x["tensor"] for x in batch]),
+        "label": torch.tensor([x["label"] for x in batch])
+    }
+train_loader = DataLoader(train_dataset, batch_size=batchsize, shuffle=True, collate_fn=collate_fn)
+test_loader = DataLoader(test_dataset, batch_size=batchsize, shuffle=False, collate_fn=collate_fn)
+minivisionv2 = MiniVisionV2()
+loss_fn = nn.CrossEntropyLoss()
+optimizer = torch.optim.SGD(minivisionv2.parameters(), lr=learningrate, momentum=0.8)
+scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=3, gamma=0.5)
+for i in range(epoch):
+    print(f"=============== Epoch {i} Start | LR: {optimizer.param_groups[0]['lr']} ===============")
+    minivisionv2.train()
+    total_train_loss = 0
+    for data in tqdm(train_loader, file=sys.stdout):
+        optimizer.zero_grad()
+        imgs = data["tensor"]
+        labels = data["label"]
+        output = minivisionv2(imgs)
+        loss = loss_fn(output, labels)
+        loss.backward()
+        optimizer.step()
+        total_train_loss += loss.item()
+    total_avg_train_loss = total_train_loss / len(train_loader)
+    print(f"Train loss: {total_avg_train_loss}")
+    writer.add_scalar("Train Loss", total_avg_train_loss, i)
+    minivisionv2.eval()
+    with torch.no_grad():
+        total_accuracy = 0
+        total_test_loss = 0
+        for data in tqdm(test_loader, file=sys.stdout):
+            imgs = data["tensor"]
+            labels = data["label"]
+            output = minivisionv2(imgs)
+            loss = loss_fn(output, labels)
+            total_test_loss += loss.item()  # accumulate a Python float, as in the training loop
+            accuracy = (output.argmax(1) == labels).sum()
+            total_accuracy += accuracy.item()
+        total_avg_test_loss = total_test_loss / len(test_loader)
+        total_accuracy_percentage = round(float(total_accuracy / len(test_dataset) * 100), 2)
+        print(f"Test loss: {total_avg_test_loss}")
+        print(f"Test Accuracy Percentage: {total_accuracy_percentage}%")
+        writer.add_scalar("Test Loss", total_avg_test_loss, i)
+        writer.add_scalar("Test Accuracy Percentage", total_accuracy_percentage, i)
+    torch.save(minivisionv2, f"./{save_path}/Mini-Vision-V2-Epoch-{i}.pth")
+    print("Model Saved!")
+    scheduler.step()
+writer.close()
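
Note on the learning-rate schedule: with `StepLR(step_size=3, gamma=0.5)` and one `scheduler.step()` per epoch, the rate is halved every three epochs. The sketch below is not part of the commit. It evaluates the closed-form schedule in plain Python; `step_lr` is an illustrative helper, and PyTorch applies the decay incrementally, so its values may differ by floating-point rounding.

```python
def step_lr(initial_lr: float, step_size: int, gamma: float, epoch: int) -> float:
    """Learning rate used during a zero-indexed epoch under a step decay schedule."""
    return initial_lr * gamma ** (epoch // step_size)


if __name__ == "__main__":
    # Values from train.py: lr=1e-2, step_size=3, gamma=0.5, 50 epochs.
    for epoch in (0, 3, 12, 39, 49):
        print(epoch, step_lr(1e-2, 3, 0.5, epoch))
```

By epoch 39 the rate has been halved 13 times and is on the order of 1e-6, so later epochs change the weights very little. This fits the README reporting the best checkpoint at epoch 40 out of the 50 configured.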