AI in Oral Cancer Diagnosis

Convolutional Neural Network Visualized

Explore how CNNs process images through multiple layers of abstraction

Input Image

The CNN process begins with a raw input image. For our example, we'll use a handwritten digit '5'. The image is represented as a matrix of pixel values.

Handwritten digit 5
Raw Input Image

Digital Representation

Computers see images as arrays of numbers. Each pixel is represented as a value between 0 (black) and 255 (white) for grayscale images.

Matrix Representation
[
  [0, 0, 0, 0, 0, 0, 0, 0],
  [0, 0, 110, 190, 253, 70, 0, 0],
  [0, 0, 191, 40, 0, 191, 0, 0],
  [0, 0, 160, 0, 0, 120, 0, 0],
  [0, 0, 127, 195, 210, 20, 0, 0],
  [0, 0, 0, 0, 40, 173, 0, 0],
  [0, 0, 75, 60, 20, 230, 0, 0],
  [0, 0, 90, 230, 180, 35, 0, 0]
]
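The matrix above can be held directly as a NumPy array. A minimal sketch (the array values are copied from the example matrix; the normalization step is a common convention, not something the page specifies):

```python
import numpy as np

# 8x8 grayscale digit from the matrix above, values in [0, 255]
image = np.array([
    [0, 0,   0,   0,   0,   0, 0, 0],
    [0, 0, 110, 190, 253,  70, 0, 0],
    [0, 0, 191,  40,   0, 191, 0, 0],
    [0, 0, 160,   0,   0, 120, 0, 0],
    [0, 0, 127, 195, 210,  20, 0, 0],
    [0, 0,   0,   0,  40, 173, 0, 0],
    [0, 0,  75,  60,  20, 230, 0, 0],
    [0, 0,  90, 230, 180,  35, 0, 0],
], dtype=np.uint8)

# Networks typically work on values scaled to [0, 1]
normalized = image / 255.0

print(image.shape)       # (8, 8)
print(normalized.max())  # ≈ 0.992 (253/255)
```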

Convolution Operation

The convolution operation slides a filter (kernel) across the input image to detect features like edges, textures, or patterns.

Kernel Sliding

Kernel/Filter

[-1, -1, -1]
[-1,  8, -1]
[-1, -1, -1]

Edge detection filter

Feature Map

[0.7, 0.2, 0.0, 0.5, 0.3, 0.4]
[0.8, 0.1, 0.1, 0.7, 0.3, 0.8]
[0.1, 0.6, 0.5, 0.8, 0.5, 0.4]
[0.7, 0.4, 0.2, 0.1, 0.2, 0.5]
[0.3, 0.8, 0.6, 0.0, 0.1, 0.6]
[0.0, 0.4, 0.4, 0.3, 0.2, 0.4]

Resulting feature map from convolution operation
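The sliding operation described above can be sketched in a few lines of NumPy. This is an illustrative implementation of "valid" 2D convolution (the toy 5×5 input is an assumption made for the example, not the page's data):

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2D convolution: slide the kernel across the image,
    multiply element-wise, and sum at each position."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# The edge-detection kernel from the figure
kernel = np.array([[-1, -1, -1],
                   [-1,  8, -1],
                   [-1, -1, -1]])

# Toy input: a bright 3x3 square on a dark background
image = np.zeros((5, 5))
image[1:4, 1:4] = 1.0

feature_map = conv2d(image, kernel)
print(feature_map.shape)  # (3, 3): valid convolution shrinks each dim by 2
print(feature_map[1, 1])  # 0.0: uniform interior, no edge response
```

Note how the kernel responds strongly at the square's edges but cancels to zero over the uniform interior; that is exactly why this kernel detects edges.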

ReLU Activation

The Rectified Linear Unit (ReLU) introduces non-linearity to the network by converting all negative values to zero, allowing the network to learn complex patterns.

Before ReLU

[ 0.5, -0.3,  0.8, -0.7,  0.2,  0.9]
[-0.6,  0.4, -0.2,  0.5, -0.8,  0.1]
[ 0.7, -0.5,  0.3, -0.9,  0.2, -0.4]
[-0.1,  0.6, -0.3,  0.8, -0.5,  0.2]
[ 0.9, -0.2,  0.4, -0.7,  0.3, -0.6]
[-0.8,  0.1, -0.5,  0.3, -0.9,  0.7]

Feature map contains both positive and negative values

f(x) = max(0, x)

After ReLU

[0.5, 0.0, 0.8, 0.0, 0.2, 0.9]
[0.0, 0.4, 0.0, 0.5, 0.0, 0.1]
[0.7, 0.0, 0.3, 0.0, 0.2, 0.0]
[0.0, 0.6, 0.0, 0.8, 0.0, 0.2]
[0.9, 0.0, 0.4, 0.0, 0.3, 0.0]
[0.0, 0.1, 0.0, 0.3, 0.0, 0.7]

Negative values are replaced with zeros, introducing non-linearity
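The transformation from the "Before" grid to the "After" grid is a one-liner. A minimal sketch (the 2×3 sample values are taken from the top rows of the grid above):

```python
import numpy as np

def relu(x):
    # Element-wise max(0, x): negatives become zero, positives pass through
    return np.maximum(0, x)

before = np.array([[ 0.5, -0.3, 0.8],
                   [-0.7,  0.2, 0.9]])
after = relu(before)
print(after)  # negatives replaced with 0.0, positives unchanged
```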

Why Non-Linearity Matters

Without non-linear activation functions like ReLU, the neural network would only be able to learn linear relationships in the data, significantly limiting its ability to solve complex problems. ReLU enables the network to model more complex functions while being computationally efficient.

Pooling Layer

Pooling reduces the spatial dimensions of the feature maps, preserving the most important information while reducing computation and preventing overfitting.

Max Pooling (2×2 Window)

[0.5, 0.8, 0.2, 0.9, 0.3, 0.7]
[0.4, 0.0, 0.5, 0.1, 0.6, 0.0]
[0.7, 0.3, 0.2, 0.0, 0.4, 0.1]
[0.0, 0.6, 0.8, 0.2, 0.0, 0.5]
[0.9, 0.4, 0.3, 0.0, 0.2, 0.8]
[0.1, 0.0, 0.3, 0.7, 0.0, 0.4]

Max Pooling

For each 2×2 window, keep only the maximum value.

Pooled Feature Map

[0.8, 0.9, 0.7]
[0.7, 0.8, 0.5]
[0.9, 0.7, 0.8]
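The 6×6-to-3×3 reduction shown above can be reproduced with a reshape trick. A minimal NumPy sketch (the input grid is copied from the figure; it assumes the dimensions divide evenly by the window size):

```python
import numpy as np

def max_pool(x, size=2):
    """Non-overlapping max pooling: keep the largest value in each
    size x size window. Assumes dimensions divide evenly by size."""
    h, w = x.shape
    # Group rows and columns into size-sized blocks, then max over each block
    return x.reshape(h // size, size, w // size, size).max(axis=(1, 3))

# The 6x6 feature map from the figure
fm = np.array([
    [0.5, 0.8, 0.2, 0.9, 0.3, 0.7],
    [0.4, 0.0, 0.5, 0.1, 0.6, 0.0],
    [0.7, 0.3, 0.2, 0.0, 0.4, 0.1],
    [0.0, 0.6, 0.8, 0.2, 0.0, 0.5],
    [0.9, 0.4, 0.3, 0.0, 0.2, 0.8],
    [0.1, 0.0, 0.3, 0.7, 0.0, 0.4],
])
pooled = max_pool(fm)
print(pooled)
# [[0.8 0.9 0.7]
#  [0.7 0.8 0.5]
#  [0.9 0.7 0.8]]
```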

Benefits of Pooling

  • Reduces the number of values by 75% (each spatial dimension is halved)
  • Preserves important features
  • Makes detection more robust to position
  • Reduces overfitting

Deep Layer Abstraction

As we progress through deeper layers of the CNN, the network learns increasingly abstract representations of the input image, from simple edges to complex shapes and patterns.

Layer 1: Edges & Corners

Layer 2: Simple Shapes

Layer 3: Complex Features

Hierarchy of Features

Early Layers (e.g., Layer 1)

Detect low-level features like edges, corners, and basic textures. These are the building blocks for more complex pattern recognition.

Middle Layers (e.g., Layer 2)

Combine edges and textures into more complex patterns and shapes like circles, squares, and simple object parts.

Deep Layers (e.g., Layer 3)

Recognize complex, high-level concepts specific to the training dataset, such as eyes, faces, or entire objects.

Flattening and Fully Connected Layer

The final stage of a CNN involves flattening the feature maps into a single vector and passing it through fully connected layers to make predictions.

Feature Maps to Vector

Flattening

The 2D feature maps are converted into a 1D vector by arranging all the values in a single row. This allows the network to transition from convolutional layers to fully connected layers.
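A minimal sketch of flattening followed by one fully connected layer with a softmax output. The shapes, random weights, and seed here are illustrative assumptions for the example, not the trained network the page describes:

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax: subtract the max before exponentiating
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)

# Hypothetical stack of 4 pooled feature maps, each 3x3 -> 36 values total
feature_maps = rng.random((4, 3, 3))
flat = feature_maps.reshape(-1)  # flatten 2D maps into one 1D vector
print(flat.shape)                # (36,)

# One fully connected layer mapping 36 features to 10 digit classes
W = rng.normal(size=(10, 36))    # untrained (random) weights, for shape only
b = np.zeros(10)
probs = softmax(W @ flat + b)

print(probs.sum())               # ≈ 1.0: a valid probability distribution
print(int(probs.argmax()))       # index of the most probable class
```

With trained weights, `probs` would look like the table below the prediction, concentrating most of the mass on the correct digit.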

Fully Connected Network

Prediction: "5"

Class   Probability
0       1%
1       2%
2       3%
3       5%
4       8%
5       65%
6       5%
7       4%
8       5%
9       2%

Learning via Backpropagation

The CNN learns by comparing its predictions with the true labels, calculating the error, and then propagating this error backward through the network to update weights.

[Diagram: Forward Pass → Calculate Error (Error: 0.42) → Backward Pass, updating weights at each layer]

Backpropagation

Backpropagation calculates how much each neuron's weight contributed to the output error. It then adjusts these weights to minimize the error in future predictions, using the chain rule of calculus to distribute error responsibility throughout the network.

Gradient Descent

The network uses gradient descent to adjust weights in the direction that reduces error. By repeatedly processing many examples and making small weight updates, the model gradually improves its ability to recognize patterns and make accurate predictions.
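The update loop can be shown on the simplest possible case. This sketch trains one linear neuron with gradient descent (an illustration of the weight-update rule, not the full CNN backpropagation; the example values x, y, the initial weight, and the learning rate are all assumptions):

```python
# One linear neuron: y_hat = w * x, with squared-error loss (y_hat - y)^2
x, y = 2.0, 10.0   # a single training example and its true target
w = 0.5            # initial weight
lr = 0.05          # learning rate

for step in range(50):
    y_hat = w * x          # forward pass
    error = y_hat - y      # calculate error
    grad = 2 * error * x   # backward pass: dLoss/dw via the chain rule
    w -= lr * grad         # update the weight against the gradient

print(round(w, 3))  # 5.0 — converges to y / x, the error-minimizing weight
```

Each iteration makes a small step in the direction that reduces the error; a CNN does the same thing simultaneously for millions of weights across many examples.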

Key CNN Components Recap

Input Layer
Convolutional Layers
Activation Functions
Pooling Layers
Fully Connected Layers
Output Layer
Backpropagation