MartialTerran committed
Commit c5a62f5 · verified · 1 Parent(s): 8331011

Update Google-ML-Crash-Course_MNIST_model.py

Google-ML-Crash-Course_MNIST_model.py CHANGED
@@ -3,6 +3,29 @@
 """
 Corrected and upgraded by the Martial Terran, from
 https://github.com/spiderPan/Google-Machine-Learning-Crash-Course/blob/master/multi-class_classfication_of_handwritten_digits.py
+
+The architecture of this model is not a CNN (Convolutional Neural Network).
+It is a Dense Neural Network (DNN), also commonly known as a Multilayer Perceptron (MLP).
+Let's break down why, and look at the specific architecture.
+Why it's a DNN and Not a CNN
+The defining characteristic of a CNN is its use of convolutional layers (Conv2D). These layers are specifically designed for grid-like data such as images: they use filters (or kernels) that slide across the input image, detecting spatial patterns like edges, textures, and shapes.
+This model does not use any convolutional layers. Instead, its core components are Dense layers (tf.keras.layers.Dense).
+DNN Approach: The 28x28 pixel image is flattened into a single vector of 784 numbers. The Dense layers treat these numbers as a simple list, with no inherent understanding that pixel #29 is directly below pixel #1. The model learns patterns from the pixel values themselves, but loses all the spatial relationships between them.
+CNN Approach: A CNN would take the input as a 2D grid (e.g., shape=(28, 28, 1)) and use Conv2D layers to analyze neighboring pixels, preserving the spatial structure of the image.
+The Specific Architecture of this Model
+You can see the exact architecture from the code or by printing the model's summary (model.summary()).
+Based on the code with hidden_units = [100, 100], the architecture is as follows:
+Layer  Type     Description                                                                                                     Output Shape
+1      Input    A flat vector of 784 pixel values (28x28).                                                                      (None, 784)
+2      Dense    First fully-connected hidden layer. Every one of its 100 neurons is connected to all 784 input pixels.          (None, 100)
+3      Dense    Second fully-connected hidden layer. Every one of its 100 neurons is connected to all 100 neurons before it.    (None, 100)
+4      Dropout  Regularization layer. Randomly sets 20% of neuron activations to zero during training to prevent overfitting.   (None, 100)
+5      Dense    The final output layer. It has 10 neurons, one for each class (digits 0-9).                                     (None, 10)
+       Softmax  The activation function on the output layer that converts the outputs into a probability distribution.          (None, 10)
+(Note: "None" in the output shape refers to the batch size, which can vary.)
+In summary:
+It's a DNN/MLP: It uses stacked Dense (fully-connected) layers.
+It's not a CNN: It lacks Conv2D and MaxPooling2D layers, and it flattens the image data, discarding the crucial 2D spatial information that CNNs are built to exploit.
 """
 
 import glob
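The flattening point made in the added docstring (that "pixel #29 is directly below pixel #1") can be illustrated with a small NumPy sketch. Only the 28x28 image size comes from the diff; the example values are purely illustrative:

```python
import numpy as np

# A 28x28 "image" whose pixel value encodes its flat position, for clarity.
img = np.arange(28 * 28).reshape(28, 28)

# DNN approach: flatten to a 784-element vector. Spatial adjacency becomes
# implicit: the pixel directly below flat index i sits at flat index i + 28,
# but a Dense layer has no built-in notion of this relationship.
flat = img.reshape(-1)
assert flat[0 + 28] == img[1, 0]   # pixel #29 is directly below pixel #1
assert flat.shape == (784,)

# CNN approach: keep the input as a (28, 28, 1) grid, so that convolutional
# filters can examine genuine spatial neighbors.
as_grid = img.reshape(28, 28, 1)
assert as_grid.shape == (28, 28, 1)
```

A Dense layer would have to learn any row/column relationship from scratch, whereas a Conv2D filter gets it for free from the input shape.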
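The layer table in the docstring also implies specific parameter counts, which is what model.summary() would report. This short sketch works them out, assuming standard Dense layers with bias terms (the tf.keras.layers.Dense default); the 784/100/100/10 sizes come from the diff, the helper function is illustrative:

```python
# Parameter count of one fully-connected (Dense) layer:
# one weight per (input, unit) pair, plus one bias per unit.
def dense_params(n_in, n_out):
    return n_in * n_out + n_out

layer1 = dense_params(784, 100)   # 78,500 weights + biases
layer2 = dense_params(100, 100)   # 10,100
output = dense_params(100, 10)    # 1,010 (Dropout contributes no parameters)

total = layer1 + layer2 + output
print(total)  # 89610
```

The Dropout layer only masks activations during training, so it adds nothing to the parameter total.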