compendious committed on
Commit bba885e · verified · 1 Parent(s): e6f5f64

Upload folder using huggingface_hub

Files changed (3):
  1. README.md +40 -41
  2. backend/model.py +36 -16
  3. frontend/main.js +1 -4
README.md CHANGED
@@ -11,7 +11,7 @@ app_port: 7860
 
 ![Demonstration](https://varak.dev/host/EMNIST-OCR-DEMO.webp)
 
- Use the tool live [here](https://ocr.varak.dev).
+ Use the tool [live here](https://ocr.varak.dev).
 
 This is an optical character recognition (OCR) tool that extracts characters from drawings. It's made with FastAPI for the backend, vanilla JS/HTML for the frontend, and the model was trained using PyTorch.
 
@@ -21,7 +21,7 @@ The only two routes for the backend are `/` for the home page and `/predict` for
 
 ## Usage
 
- Once again, the tool is hosted [here](https://ocr.varak.dev) for easy access.
+ Once again, the tool is hosted [on this webpage](https://ocr.varak.dev) for easy access.
 
 Draw the character you want to recognize on the left canvas. The right canvas will display the top-k predictions, where k can be adjusted using the slider below it. The slider is capped at 15 since, after 15, all of the predictions are basically guaranteed to be at 0% probability.
 
@@ -31,67 +31,66 @@ To run this project locally, take the following steps:
 
 1. Clone this repository.
 
- ```bash
- git clone https://github.com/intelligent-username/OCR
- cd OCR
- ```
+ ```bash
+ git clone https://github.com/intelligent-username/OCR
+ cd OCR
+ ```
 
 2. Install the Python dependencies to a virtual environment.
 
- ```bash
- python -m venv OCR-env
- OCR-env\Scripts\activate # On Windows
- source OCR-env/bin/activate # On mac/Linux
- pip install -r requirements.txt
- ```
+ ```bash
+ python -m venv OCR-env
+ OCR-env\Scripts\activate # On Windows
+ source OCR-env/bin/activate # On mac/Linux
+ pip install -r requirements.txt
+ ```
 
 3. Run the backend.
 
+ ```bash
+ uvicorn backend.app:app --reload
+ ```
+
- <!--
+ <!--
 
- Note that, to run the backend from the `backend/` folder, some adjustments to the file paths in `app.py` need to be made, since this version of the project is for the HuggingFace deployment, which uses the root directory as the working directory. The only real difference will be to add `../` to the files paths. Here's the list of changes to make in `app.py`:
+ Note that, to run the backend from the `backend/` folder, some adjustments to the file paths in `app.py` need to be made, since this version of the project is for the HuggingFace deployment, which uses the root directory as the working directory. The only real difference will be to add `../` to the file paths. Here's the list of changes to make in `app.py`:
 
- Change lines 5 and 6 to:
+ Change lines 5 and 6 to:
 
- ```python
- from utils import predict_image
- from model import EMNIST_VGG
- ```
+ ```python
+ from utils import predict_image
+ from model import EMNIST_VGG
+ ```
 
- - Change line 40 to:
+ - Change line 40 to:
 
- ```python
- model.load_state_dict(t.load("EMNIST_CNN.pth", map_location=device, weights_only=True))
- ```
+ ```python
+ model.load_state_dict(t.load("EMNIST_CNN.pth", map_location=device, weights_only=True))
+ ```
 
- - Change line 43 to:
+ - Change line 43 to:
 
- ```python
- model = t.load("EMNIST_CNN.pth", map_location=device, weights_only=False)
- ```
+ ```python
+ model = t.load("EMNIST_CNN.pth", map_location=device, weights_only=False)
+ ```
 
- - Change line 47 to:
+ - Change line 47 to:
 
- ```python
- app.mount("/static", StaticFiles(directory="frontend"), name="static")
- ```
+ ```python
+ app.mount("/static", StaticFiles(directory="frontend"), name="static")
+ ```
 
- - Change line 51 to:
+ - Change line 51 to:
 
- ```python
- path = os.path.join("..", "frontend", "index.html")
- ```
+ ```python
+ path = os.path.join("..", "frontend", "index.html")
+ ```
 
- You may want to run it from the backend folder if you really want to avoid typing `backend.` at the beginning of the uvicorn command.
+ You may want to run it from the backend folder if you really want to avoid typing `backend.` at the beginning of the uvicorn command.
 
- -->
+ -->
 
- ```bash
- uvicorn backend.app:app --reload
- ```
 
 4. Once the backend is running, go to [http://127.0.0.1:8000/](http://127.0.0.1:8000/) in your web browser to access the frontend. This link will appear in the terminal when you run the backend.
backend/model.py CHANGED
@@ -1,28 +1,31 @@
 """
 VGG-style CNN for EMNIST character classification.
 See the README for a more detailed description.
- """
 
- # The .pth file (weights) for this model will be downloaded from HuggingFace by app.py
- # It's hosted at https://huggingface.co/compendious/EMNIST-OCR-WEIGHTS/
- # The file is EMNIST_CNN.pth
- # Go here to download directly:
- # https://huggingface.co/compendious/EMNIST-OCR-WEIGHTS/resolve/main/EMNIST_CNN.pth?download=true
+ The .pth file (weights) for this model will be downloaded from HuggingFace by app.py
+ It's hosted at https://huggingface.co/compendious/EMNIST-OCR-WEIGHTS/
+ The file is EMNIST_CNN.pth
+ Go here to download directly:
+ https://huggingface.co/compendious/EMNIST-OCR-WEIGHTS/resolve/main/EMNIST_CNN.pth?download=true
+
+ """
 
 import torch
 import torch.nn as nn
 import torch.nn.functional as F
 
 class ConvBlock(nn.Module):
-     """Convolutional block: 2 conv layers, ReLU, MaxPool"""
+     """Convolutional block: 2 conv layers, LeakyReLU, MaxPool"""
     def __init__(self, in_channels, out_channels, padding=1, pool_kernel=2, pool_stride=2):
         super(ConvBlock, self).__init__()
         self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=padding)
         self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=padding)
         self.pool = nn.MaxPool2d(kernel_size=pool_kernel, stride=pool_stride)
+
     def forward(self, x):
-         x = F.relu(self.conv1(x))
-         x = F.relu(self.conv2(x))
+         # CHANGE 1: LeakyReLU prevents "dead neurons," critical for 62-class differentiation.
+         x = F.leaky_relu(self.conv1(x), negative_slope=0.1)
+         x = F.leaky_relu(self.conv2(x), negative_slope=0.1)
         x = self.pool(x)
         return x
 
@@ -35,41 +38,58 @@ class EMNIST_VGG(nn.Module):
     def __init__(self, num_classes=62):
         super(EMNIST_VGG, self).__init__()
 
-         # The two blocks
+         # The four blocks
         self.conv1 = ConvBlock(in_channels=1, out_channels=32, pool_kernel=2, pool_stride=2)
         self.bn1 = nn.BatchNorm2d(32)
+
         self.conv2 = ConvBlock(in_channels=32, out_channels=64, pool_stride=2)
         self.bn2 = nn.BatchNorm2d(64)
+
         self.conv3 = ConvBlock(in_channels=64, out_channels=128, pool_stride=1)
         self.bn3 = nn.BatchNorm2d(128)
+
         self.conv4 = ConvBlock(in_channels=128, out_channels=256, pool_stride=1)
         self.bn4 = nn.BatchNorm2d(256)
 
+         # CHANGE 2: Spatial Dropout.
+         # Drops entire feature maps to force redundancy, unlike standard dropout.
+         self.spatial_drop = nn.Dropout2d(p=0.1)
+
         # Flatten layer (no parameters needed, only reshaping)
         self.flatten = nn.Flatten()
 
-         # (Since the Dense layers just take flat inputs)
-
         # Two fully-connected layers
 
-         # For the first layer, notice that, due to the stride and pool sizes, we need to adjust the input size to 256 * 5 * 5
-         self.fc1 = nn.Linear(256 * 5 * 5, 256)
+         # CHANGE 3: Expanded Width (256 -> 512).
+         # Your Keras model used 512; 256 is a bottleneck for 62 classes.
+         self.fc1 = nn.Linear(256 * 5 * 5, 512)
+         self.bn_fc = nn.BatchNorm1d(512)  # Added BN to the dense layer for stability
         self.dropout = nn.Dropout(p=0.5)
 
         # Classifier
-         self.fc2 = nn.Linear(256, num_classes)
+         self.fc2 = nn.Linear(512, num_classes)
 
     def forward(self, x):
         x = self.conv1(x)
         x = self.bn1(x)
+
         x = self.conv2(x)
         x = self.bn2(x)
+         x = self.spatial_drop(x)  # Apply mild spatial regularization
+
         x = self.conv3(x)
         x = self.bn3(x)
+         x = self.spatial_drop(x)
+
         x = self.conv4(x)
         x = self.bn4(x)
+
         x = self.flatten(x)
-         x = F.relu(self.fc1(x))
+
+         # Dense Pass
+         x = self.fc1(x)
+         x = self.bn_fc(x)
+         x = F.leaky_relu(x, negative_slope=0.1)
         x = self.dropout(x)
         x = self.fc2(x)
         return x
frontend/main.js CHANGED
@@ -46,9 +46,6 @@ function performPrediction() {
     // console.log("Input Array: ", inputArray);
     // console.log("Temp: ", temp);
 
-     // (Preview drawing removed — bar graph will visualize predictions)
-
-     // Send to server with error handling (non-blocking)
     fetch("/predict", {
         method: "POST",
         headers: { "Content-Type": "application/json" },
@@ -85,7 +82,7 @@ canvas.addEventListener("mousedown", (e) => {
     ctx.beginPath();
     ctx.moveTo(x, y);
 
-     if (e.button === 2) { // Right click
+     if (e.button === 2) { // Right click erases
         erase = true;
         ctx.strokeStyle = "white";
     } else {
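The frontend's `fetch("/predict", ...)` call above sends the drawn canvas as a JSON body. A minimal Python sketch of that round trip, assuming a flattened 28x28 grayscale array; the actual field name used by `main.js` and the backend may differ:

```python
import json

# Hypothetical request body mirroring the frontend's POST to /predict
input_array = [0.0] * (28 * 28)  # flattened canvas, one grayscale value per pixel
body = json.dumps({"input": input_array})

# The backend would decode it the same way before running the model
decoded = json.loads(body)
print(len(decoded["input"]))  # 784
```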