---

license: mit
library_name: pytorch
tags:
- image-classification
- emnist
- cnn
- computer-vision
- pytorch
- mini-vision
- mini-vision-series
metrics:
- accuracy
pipeline_tag: image-classification
datasets:
- emnist
---


# Mini-Vision-V3: EMNIST Balanced Handwritten Character Classifier

![Model Size](https://img.shields.io/badge/Params-0.40M-blue) ![Accuracy](https://img.shields.io/badge/Accuracy-90.06%25-green)

Welcome to **Mini-Vision-V3**, the third model in the Mini-Vision series. Following the MNIST digit recognition task in V2, this model expands the task to **47 classes** of handwritten characters (digits plus uppercase and lowercase letters) using the EMNIST Balanced dataset. It features a deeper yet highly efficient 3-layer CNN architecture, achieving over 90% accuracy with fewer than half a million parameters.

## Model Description

Mini-Vision-V3 is a custom 3-layer CNN architecture tailored for 28x28 grayscale images. While maintaining a lightweight footprint with only **0.40M parameters** (half the size of V2), it handles the significantly increased complexity of 47 character classes. This project demonstrates how depth and Batch Normalization can improve performance on more complex classification tasks without increasing model size.

- **Dataset**: [EMNIST Balanced](https://www.nist.gov/itl/products-and-services/emnist-dataset) (28x28 grayscale images, 47 classes)
- **Framework**: PyTorch
- **Total Parameters**: 0.40M

## Model Architecture

The network utilizes a deeper structure compared to V2, featuring three convolutional blocks. This allows for better feature extraction in the more complex 47-class task.

| Layer | Input Channels | Output Channels | Kernel Size | Stride | Padding | Activation | Other |
| :--- | :---: | :---: | :---: | :---: | :---: | :--- | :--- |
| **Conv Block 1** | 1 | 32 | 3 | 1 | 1 | ReLU | MaxPool(2), BatchNorm |
| **Conv Block 2** | 32 | 64 | 3 | 1 | 1 | ReLU | MaxPool(2), BatchNorm |
| **Conv Block 3** | 64 | 128 | 3 | 1 | 1 | ReLU | MaxPool(2), BatchNorm |
| **Flatten** | - | - | - | - | - | - | Output: 1152 |
| **Linear 1** | 1152 | 256 | - | - | - | ReLU | Dropout(0.3) |
| **Linear 2** | 256 | 47 | - | - | - | - | - |
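
As a reference, the table above corresponds to roughly the following PyTorch module. This is a sketch: the class name matches `MiniVisionV3` from `model.py`, but the exact module layout is inferred from the table, not copied from the repo.

```python
import torch
import torch.nn as nn

class MiniVisionV3(nn.Module):
    """3-block CNN for 28x28 grayscale EMNIST Balanced images (sketch)."""

    def __init__(self, num_classes: int = 47):
        super().__init__()
        self.features = nn.Sequential(
            # Block 1: 1 -> 32 channels, 28x28 -> 14x14
            nn.Conv2d(1, 32, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.BatchNorm2d(32),
            # Block 2: 32 -> 64 channels, 14x14 -> 7x7
            nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.BatchNorm2d(64),
            # Block 3: 64 -> 128 channels, 7x7 -> 3x3
            nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.BatchNorm2d(128),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),               # 128 * 3 * 3 = 1152
            nn.Linear(1152, 256),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(256, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))
```

A quick shape check: a `(N, 1, 28, 28)` batch passes through three stride-2 pools (28 β†’ 14 β†’ 7 β†’ 3), which yields the 128 Γ— 3 Γ— 3 = 1152 flatten size in the table, and the total parameter count comes to β‰ˆ0.40M.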

## Training Strategy

The training strategy was adjusted for the larger dataset and increased class count, using a higher initial learning rate paired with a StepLR scheduler to keep convergence stable.

- **Optimizer**: SGD (Momentum=0.8)
- **Initial Learning Rate**: 0.05
- **Scheduler**: StepLR (Step size=5, Gamma=0.5)
- **Loss Function**: CrossEntropyLoss
- **Batch Size**: 256
- **Epochs**: 50 (Best model at Epoch 40)
- **Data Preprocessing**:
  - EMNIST specific alignment: Rotate -90 degrees and Flip Horizontal (to match standard image orientation).
  - Random Crop (28x28 with padding=2)
  - Random Rotation (10 degrees)

## Performance

The model achieved solid results on the EMNIST Balanced test set (18,800 samples); the reported checkpoint is the best-performing epoch (Epoch 40):

| Metric | Value |
| :--- | :---: |
| **Test Accuracy** | **90.06%** |
| Test Loss | 0.28 |
| Train Loss | 0.28 |
| Parameters | 0.40M |

### Training Visualization (TensorBoard)

Below are the training and testing curves visualized via TensorBoard.

#### 1. Training Loss

![Training Loss](assets/train_loss.png)
*(Recorded every epoch)*

#### 2. Test Loss & Accuracy

![Test Loss](assets/test_loss.png)
*(Recorded every epoch)*

## Quick Start

### Dependencies
- Python 3.x
- PyTorch
- Torchvision
- Gradio (for demo)
- Pillow

### Inference / Web Demo

Run the Gradio demo to draw characters and see predictions in real-time:

```bash
python demo.py
```

*Note: The demo supports inverted drawing (white ink on black background) to match the EMNIST format.*

## File Structure

```
.
β”œβ”€β”€ model.py                    # Model architecture definition (MiniVisionV3)
β”œβ”€β”€ train.py                    # Training script
β”œβ”€β”€ demo.py                     # Gradio web interface
β”œβ”€β”€ Mini-Vision-V3.pth          # Trained model weights (Epoch 40)
β”œβ”€β”€ Mini-Vision-V3.safetensors  # Model weights in safetensors format
β”œβ”€β”€ config.json
β”œβ”€β”€ README.md
└── assets
    β”œβ”€β”€ train_loss.png          # Visualized train loss graph
    └── test_loss.png           # Visualized test loss graph
```

## License

This project is licensed under the MIT License.