Update Google-ML-Crash-Course_MNIST_model.py
Google-ML-Crash-Course_MNIST_model.py
CHANGED
@@ -3,6 +3,29 @@
 """
 Corrected and upgraded by Martial Terran, from
 https://github.com/spiderPan/Google-Machine-Learning-Crash-Course/blob/master/multi-class_classfication_of_handwritten_digits.py
+
+The architecture of this model is not a CNN (Convolutional Neural Network).
+It is a Dense Neural Network (DNN), also commonly known as a Multilayer Perceptron (MLP).
+Let's break down why, and look at the specific architecture.
+
+Why it's a DNN and not a CNN
+
+The defining characteristic of a CNN is its use of convolutional layers (Conv2D). These layers are specifically designed for grid-like data, such as images: they slide filters (kernels) across the input, detecting spatial patterns like edges, textures, and shapes.
+
+This model does not use any convolutional layers. Instead, its core components are Dense layers (tf.keras.layers.Dense).
+
+DNN approach: the 28x28-pixel image is flattened into a single vector of 784 numbers. The Dense layers treat these numbers as a simple list, with no inherent understanding that pixel #29 is directly below pixel #1. The model learns patterns from the pixel values themselves, but loses all the spatial relationships between them.
+
+CNN approach: a CNN would take the input as a 2D grid (e.g., shape=(28, 28, 1)) and use Conv2D layers to analyze neighboring pixels, preserving the spatial structure of the image.
+
+The specific architecture of this model
+
+You can see the exact architecture from the code, or by printing the model's summary with model.summary(). With hidden_units = [100, 100], the architecture is as follows:
+
+Layer  Type     Description                                                                                                   Output shape
+1      Input    A flat vector of 784 pixel values (28x28).                                                                    (None, 784)
+2      Dense    First fully connected hidden layer; every one of its 100 neurons is connected to all 784 input pixels.        (None, 100)
+3      Dense    Second fully connected hidden layer; every one of its 100 neurons is connected to all 100 neurons before it.  (None, 100)
+4      Dropout  Regularization layer; randomly sets 20% of neuron activations to zero during training to prevent overfitting. (None, 100)
+5      Dense    Final output layer; 10 neurons, one for each class (digits 0-9).                                              (None, 10)
+-      Softmax  Activation on the output layer; converts the outputs into a probability distribution.                         (None, 10)
+
+(Note: "None" in the output shape refers to the batch size, which can vary.)
+
+In summary:
+
+It's a DNN/MLP: it uses stacked Dense (fully connected) layers.
+It's not a CNN: it lacks Conv2D and MaxPooling2D layers, and it flattens the image data, discarding the crucial 2D spatial information that CNNs are built to exploit.
 """

 import glob
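For reference, the DNN/MLP described in the docstring above could be sketched in Keras as follows. This is a minimal sketch, not the script's exact code: the helper name `build_mlp` and the relu activation are assumptions, while the layer sizes, 20% dropout, and softmax output follow the docstring.

```python
import tensorflow as tf

def build_mlp(hidden_units=(100, 100), dropout_rate=0.2, num_classes=10):
    """DNN/MLP sketch: flatten the 28x28 image, then stacked Dense layers (no Conv2D)."""
    layers = [
        tf.keras.layers.Input(shape=(28, 28)),  # 28x28 grid of pixels
        tf.keras.layers.Flatten(),              # -> flat vector of 784 numbers
    ]
    for units in hidden_units:
        # Fully connected: every neuron sees every output of the previous layer.
        layers.append(tf.keras.layers.Dense(units, activation="relu"))
    layers.append(tf.keras.layers.Dropout(dropout_rate))  # zero 20% of activations during training
    layers.append(tf.keras.layers.Dense(num_classes, activation="softmax"))  # one neuron per digit 0-9
    return tf.keras.Sequential(layers)

model = build_mlp()
model.summary()  # layer-by-layer output shapes: (None, 784) -> (None, 100) -> (None, 100) -> (None, 10)
```

Counting parameters confirms the fully connected structure: 784x100+100 = 78,500 weights into the first hidden layer, 100x100+100 = 10,100 into the second, and 100x10+10 = 1,010 into the output layer.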
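By contrast, the CNN approach the docstring mentions would keep the input as a 2D grid. The sketch below is hypothetical and not part of this script; the filter count (32) and kernel/pool sizes are arbitrary illustrative choices.

```python
import tensorflow as tf

# Hypothetical CNN counterpart, shown only for contrast with the DNN/MLP above.
cnn = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28, 1)),          # 2D grid input: spatial structure preserved
    tf.keras.layers.Conv2D(32, 3, activation="relu"),  # 32 filters slide over 3x3 pixel neighborhoods
    tf.keras.layers.MaxPooling2D(2),                   # downsample, keeping the strongest responses
    tf.keras.layers.Flatten(),                         # flatten only after spatial features are extracted
    tf.keras.layers.Dense(10, activation="softmax"),   # probability distribution over digits 0-9
])
cnn.summary()
```

The key difference is that flattening happens after the Conv2D/MaxPooling2D stages, so neighboring-pixel relationships inform the learned features instead of being discarded at the input.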