🎵 **Music Playing**
👋 **Welcome!** Today, we're learning about **Deep Neural Networks**, a cool way computers learn! 🧠💡
## 🤖 What is a Neural Network?
Imagine a brain made of tiny switches called **neurons**. These neurons work together to make smart decisions!
### 🟢 Input Layer
This is where we give the network information, like pictures or numbers.
### 🔵 Hidden Layers
These layers are like **magic helpers** that figure out patterns!
- More neurons = more room to learn complicated patterns 🤓
- Too many neurons = the network can **memorize** the training data instead of learning from it (overfitting) 😵
### 🔴 Output Layer
This is where the network **gives us answers!** 🏆
---
## 🏗 Building a Deep Neural Network in PyTorch
We can **build a deep neural network** using PyTorch, a tool that helps computers learn. 🖥️
### 🛠 Layers of Our Network
1️⃣ **First Hidden Layer:** Has `H1` neurons.
2️⃣ **Second Hidden Layer:** Has `H2` neurons.
3️⃣ **Output Layer:** Decides the final answer! 🎯
---
## 🔄 How Does It Work?
1️⃣ **Start with an input (x).**
2️⃣ **Pass through each layer:**
- Each layer applies a **linear transformation**, then an **activation function** (like `sigmoid`, `tanh`, or `ReLU`).
- The activation functions let the network learn patterns that aren't just straight lines! 🧩
3️⃣ **Get the final answer!** ✅
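
Here is a small sketch of those steps in PyTorch. The class name `Net`, the argument names, and the sizes at the bottom are made up for this example, and `sigmoid` is just one of the activation choices.

```python
import torch
import torch.nn as nn


class Net(nn.Module):
    """Two hidden layers followed by an output layer."""

    def __init__(self, d_in, h1, h2, d_out):
        super().__init__()
        self.linear1 = nn.Linear(d_in, h1)   # input -> first hidden layer (H1 neurons)
        self.linear2 = nn.Linear(h1, h2)     # first -> second hidden layer (H2 neurons)
        self.linear3 = nn.Linear(h2, d_out)  # second hidden layer -> output

    def forward(self, x):
        x = torch.sigmoid(self.linear1(x))   # could also be torch.tanh or torch.relu
        x = torch.sigmoid(self.linear2(x))
        return self.linear3(x)               # one raw score per class


model = Net(d_in=4, h1=5, h2=6, d_out=3)     # hypothetical sizes
out = model(torch.randn(8, 4))               # a batch of 8 made-up inputs
print(out.shape)
```

Each input row goes in with 4 numbers and comes out with 3 scores, one for each possible answer.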
---
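## 🎨 Different Activation Functions
Activation functions help the network **think better!** 🧠
- **Sigmoid** ➝ Squashes values between 0 and 1; fine for small, shallow networks 🤏
- **Tanh** ➝ Squashes values between -1 and 1, centered at 0; often trains better than sigmoid 🌊
- **ReLU** ➝ Keeps positive values and turns negatives into 0; a popular choice for deep networks! 🚀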
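
To see what each function does to the same numbers, here is a tiny sketch (the input values are made up):

```python
import torch

z = torch.tensor([-2.0, 0.0, 2.0])

sig = torch.sigmoid(z)   # every value lands between 0 and 1
tanh = torch.tanh(z)     # every value lands between -1 and 1
relu = torch.relu(z)     # negatives become 0, positives stay the same
print(sig, tanh, relu)
```

Notice that an input of 0 gives 0.5 for sigmoid but 0 for tanh and ReLU.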
---
## 🔢 Example: Recognizing Handwritten Numbers
We train the network with **MNIST**, a dataset of handwritten numbers. 📝🔢
- **Input:** 784 pixels (28x28 images) 📸
- **Hidden Layers:** 50 neurons each 🤖
- **Output:** 10 neurons (digits 0-9) 🔟
---
## 🚀 Training the Network
We use **Stochastic Gradient Descent (SGD)** to teach the network! 📚
- **Loss Function:** Measures how wrong the answers are, so the network can learn from mistakes. ❌➡✅
- **Validation Accuracy:** Checks how well the network does on data it did not train on! 🎯
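
Below is a rough sketch of what such a training loop can look like. To keep it self-contained it uses **random stand-in data** with MNIST shapes instead of the real dataset, so the accuracy it prints means nothing. The batch size, learning rate, number of epochs, `ReLU` activation, and `nn.CrossEntropyLoss` are assumptions for illustration.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Made-up stand-in data with MNIST shapes (784 pixels in, labels 0-9).
x_train, y_train = torch.randn(64, 784), torch.randint(0, 10, (64,))
x_val, y_val = torch.randn(32, 784), torch.randint(0, 10, (32,))
loader = DataLoader(TensorDataset(x_train, y_train), batch_size=16, shuffle=True)

model = nn.Sequential(
    nn.Linear(784, 50), nn.ReLU(),   # first hidden layer, 50 neurons
    nn.Linear(50, 50), nn.ReLU(),    # second hidden layer, 50 neurons
    nn.Linear(50, 10),               # output layer, one score per digit
)
criterion = nn.CrossEntropyLoss()                        # the loss function
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)  # SGD

epoch_losses = []
for epoch in range(20):
    total = 0.0
    for x_batch, y_batch in loader:   # one small batch at a time
        optimizer.zero_grad()
        loss = criterion(model(x_batch), y_batch)
        loss.backward()               # work out the gradients
        optimizer.step()              # nudge the weights
        total += loss.item()
    epoch_losses.append(total / len(loader))

with torch.no_grad():                 # no learning while we check
    predictions = model(x_val).argmax(dim=1)
    val_accuracy = (predictions == y_val).float().mean().item()

print(epoch_losses[0], epoch_losses[-1], val_accuracy)
```

With real MNIST you would swap the random tensors for the actual images and labels; the loop itself stays the same.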
---
## 🏆 What We Learned
✅ Deep Neural Networks have **many hidden layers**.
✅ Different **activation functions** help improve performance.
✅ Adding layers lets the network learn **more complex patterns**, but too many can cause overfitting and make training harder! 💡
---
🎉 **Great job!** Now, let's build and train our own deep neural networks! 🏗️🤖✨
-----------------------------------------------------------------------------------
🎵 **Music Playing**
👋 **Welcome!** Today, we'll learn how to **build a deep neural network** in PyTorch using `nn.ModuleList`. 🧠💡
## 🤖 Why Use `nn.ModuleList`?
Instead of adding layers **one by one** (which takes a long time ⏳), we can **automate** the process! 🚀
---
## 🏗 Building the Neural Network
We create a **list** called `layers` 📋:
- **First item:** Input size (e.g., `2` features).
- **Second item:** Neurons in the **first hidden layer** (e.g., `3`).
- **Third item:** Neurons in the **second hidden layer** (e.g., `4`).
- **Fourth item:** Output size (number of classes, e.g., `3`).
---
## 🔄 Constructing the Network
### 🔹 Step 1: Create Layers
- We loop through the list, taking each pair of **neighboring elements**:
  - **First element:** Input size 🎯
  - **Second element:** Output size (number of neurons) 🧩
### 🔹 Step 2: Connecting Layers
- First **hidden layer** ➝ Input size = `2`, Neurons = `3`
- Second **hidden layer** ➝ Input size = `3`, Neurons = `4`
- **Output layer** ➝ Input size = `4`, Output size = `3`
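
The pairing can be sketched in plain Python. Using `zip` is one possible way to walk through neighbors, not necessarily how the lab does it:

```python
layers = [2, 3, 4, 3]  # input size, two hidden layers, output size

# Pair each size with the one that comes right after it.
pairs = list(zip(layers, layers[1:]))
for input_size, output_size in pairs:
    print(f"Linear layer: {input_size} inputs -> {output_size} neurons")
```

Four numbers in the list give three pairs, so we get three layers.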
---
## ⚡ Forward Function
We **pass data** through the network:
1️⃣ **Apply linear transformation** to each layer ➝ Makes calculations 🧮
2️⃣ **Apply activation function** (`ReLU`) ➝ Helps the network learn 📈
3️⃣ **For the last layer**, we only apply the **linear transformation**. In a classification task 🎯, a loss such as `nn.CrossEntropyLoss` expects these raw scores and handles the rest.
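
Putting the list and the forward steps together, a minimal sketch could look like this. The class name `Net` and the attribute name `hidden` are made up for the example:

```python
import torch
import torch.nn as nn


class Net(nn.Module):
    def __init__(self, layers):
        super().__init__()
        self.hidden = nn.ModuleList()
        # One Linear layer for each pair of neighboring sizes.
        for input_size, output_size in zip(layers, layers[1:]):
            self.hidden.append(nn.Linear(input_size, output_size))

    def forward(self, x):
        last = len(self.hidden) - 1
        for i, linear in enumerate(self.hidden):
            x = linear(x)           # linear transformation
            if i < last:            # every layer except the last gets ReLU
                x = torch.relu(x)
        return x                    # raw class scores


model = Net([2, 3, 4, 3])
scores = model(torch.randn(5, 2))   # 5 made-up samples with 2 features each
print(scores.shape)
```

To try a different network, only the list changes, for example `Net([2, 10, 10, 10, 3])`.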
---
## 🎯 Training the Network
The **training process** is similar to before! We:
- Use a **dataset** 📊
- Try **different combinations** of neurons and layers 🤖
- See which setup gives the **best performance**! 🏆
---
🎉 **Awesome!** Now, let's explore ways to make these networks even **better!** 🚀
-----------------------------------------------------------------------------------
🎵 **Music Playing**
👋 **Welcome!** Today, we're learning about **weight initialization** in Neural Networks! 🧠⚡
## 🤔 Why Does Weight Initialization Matter?
If we **don't** choose good starting weights, our neural network **won't learn properly**! 🚨
If **all neurons** in a layer start with the **same weights**, they all compute the same thing and get the same updates, so the layer acts like a single neuron.
---
## 🚀 How PyTorch Handles Weights
PyTorch **automatically** picks starting weights, but we can also set them **ourselves**! 🔧
Let's see what happens when we:
- Set **all weights to 1** and **bias to 0** ➝ ❌ **Bad idea!**
- Randomly choose weights from a **uniform distribution** ➝ ✅ **Better!**
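
Here is a small sketch of both choices on one layer. The layer size and inputs are made up:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(4, 3)   # 4 inputs, 3 neurons

# Bad idea: every neuron starts with identical weights.
nn.init.constant_(layer.weight, 1.0)
nn.init.zeros_(layer.bias)
same = layer(torch.randn(2, 4))        # all 3 neurons give the same output

# Better: draw each weight at random from a uniform range.
nn.init.uniform_(layer.weight, -1.0, 1.0)
different = layer(torch.randn(2, 4))   # now each neuron gives its own output
print(same)
print(different)
```

In the first printout every row repeats one number three times, because three identical neurons are no more useful than one.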
---
## 🔄 The Problem with Random Weights
We use a **uniform distribution** (random values between -1 and 1). But:
- **Range too small?** ➝ Signals are tiny, so weights don't change much 🤏
- **Range too large?** ➝ **Vanishing gradient** problem 😵
### 📉 What's a Vanishing Gradient?
If weights are **too big**, the values going into `sigmoid` or `tanh` get **too large**. They land on the flat part of the curve, and the **gradient shrinks to zero**.
That means the network **stops learning**! 🚫
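
A quick way to see this is to ask PyTorch for the slope of `sigmoid` at a small input and at a large one. The two input values are made up:

```python
import torch

z = torch.tensor([0.0, 10.0], requires_grad=True)
torch.sigmoid(z).sum().backward()   # fills z.grad with the slope at each input
print(z.grad)
```

The slope at `0` is 0.25, while at `10` it is smaller than 0.0001, so almost no learning signal gets through.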
---
## 🛠 Fixing the Problem
### 🎯 Solution: Scale Weights Based on Neurons
We scale the weight range based on **how many neurons** feed into the layer:
- **2 neurons?** ➝ Scale by **1/2**
- **4 neurons?** ➝ Scale by **1/4**
- **100 neurons?** ➝ Scale by **1/100**
This helps prevent the vanishing gradient issue! ✅
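
The sketch below compares the plain range with the scaled one for 100 input neurons. The inputs, the 50 output neurons, and the use of `sigmoid` are assumptions for illustration:

```python
import torch

torch.manual_seed(0)
n = 100                           # neurons feeding into the layer
x = torch.randn(1000, n)          # made-up inputs

w_wide = torch.empty(n, 50).uniform_(-1.0, 1.0)            # range -1 to 1
w_scaled = torch.empty(n, 50).uniform_(-1.0 / n, 1.0 / n)  # range scaled by 1/n

z_wide = x @ w_wide               # values going into the activation
z_scaled = x @ w_scaled

# Slope of the sigmoid at each value: s * (1 - s), never more than 0.25.
s_wide, s_scaled = torch.sigmoid(z_wide), torch.sigmoid(z_scaled)
slope_wide = (s_wide * (1 - s_wide)).mean().item()
slope_scaled = (s_scaled * (1 - s_scaled)).mean().item()

print(z_wide.abs().mean().item(), z_scaled.abs().mean().item())
print(slope_wide, slope_scaled)
```

With the scaled range the values stay close to 0, where the sigmoid slope is near its maximum of 0.25.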
---
## 🔬 Different Weight Initialization Methods
### 🏗 **1. Default PyTorch Method**
- PyTorch **automatically** picks a range, where `L_in` is the number of input neurons:
  - **Lower bound:** `-1 / sqrt(L_in)`
  - **Upper bound:** `+1 / sqrt(L_in)`
### 🔵 **2. Xavier Initialization**
- A good match for **tanh** activation
- Uses the **number of input and output neurons** to set the range
- We apply `torch.nn.init.xavier_uniform_()` to set the weights
### 🔴 **3. He Initialization**
- A good match for **ReLU** activation
- Uses the **number of input neurons** to set the range
- We apply `torch.nn.init.kaiming_uniform_()` to set the weights (PyTorch names He initialization after Kaiming He)
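
Here is a sketch that applies all three to same-sized layers. PyTorch exposes He initialization as `nn.init.kaiming_uniform_`. The layer sizes are made up, and the bound formulas in the comments are the standard uniform ranges for each method as I understand them:

```python
import math
import torch
import torch.nn as nn

torch.manual_seed(0)
l_in, l_out = 100, 50   # hypothetical layer sizes

default_layer = nn.Linear(l_in, l_out)          # PyTorch's default range
default_bound = 1 / math.sqrt(l_in)             # 1 / sqrt(L_in)

xavier_layer = nn.Linear(l_in, l_out)
nn.init.xavier_uniform_(xavier_layer.weight)    # pairs with tanh
xavier_bound = math.sqrt(6 / (l_in + l_out))    # uses inputs and outputs

he_layer = nn.Linear(l_in, l_out)
nn.init.kaiming_uniform_(he_layer.weight, nonlinearity="relu")  # pairs with ReLU
he_bound = math.sqrt(6 / l_in)                  # uses inputs only

for name, layer, bound in [
    ("default", default_layer, default_bound),
    ("xavier", xavier_layer, xavier_bound),
    ("he", he_layer, he_bound),
]:
    print(name, layer.weight.abs().max().item(), "<=", bound)
```

The printed maximum weight in each layer stays inside the range that method allows.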
---
## 🏆 Which One is Best?
We compare:
✅ **PyTorch Default**
✅ **Xavier Method** (tanh)
✅ **He Method** (ReLU)
The **Xavier and He methods** often help the network **learn faster** when paired with the matching activation! 🚀
---
🎉 **Great job!** Now, let's try different weight initializations and see what works best! 🏗️🔬
------------------------------------------------------------------------------------------------
🎵 **Music Playing**
👋 **Welcome!** Today, we're learning about **Gradient Descent with Momentum**! 🚀🔄
## 🤔 What's the Problem?
Sometimes, when training a neural network, the model can get **stuck**:
- **Saddle Points** ➝ Flat areas where learning stops 🏔️
- **Local Minima** ➝ Not the best solution, but we get trapped 😞
---
## 🏃‍♂️ What is Momentum?
Momentum helps the model **keep moving** even when it gets stuck! 💨
It's like rolling a ball downhill:
- **Gradient (Force)** ➝ Tells us where to go 🏀
- **Momentum (Mass)** ➝ Helps us keep moving even on flat surfaces ⚡
---
## 🔄 How Does It Work?
### 🔹 Step 1: Compute Velocity
- New velocity (`v_k+1`) = Momentum term (ρ) * Old velocity (`v_k`) + Gradient
- The **momentum term** (ρ) controls how much we keep from the past.
### 🔹 Step 2: Update Weights
- New weight (`w_k+1`) = Old weight (`w_k`) - Learning rate * New velocity (`v_k+1`)
The bigger the **momentum**, the harder it is to stop moving! 🏃‍♂️💨
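
Here is a sketch of the two steps done by hand, next to PyTorch's own optimizer. It assumes the form `v = ρ * v + gradient` followed by `w = w - learning rate * v`, which is how `torch.optim.SGD` applies momentum by default. The loss function and the numbers are made up:

```python
import torch

lr, rho = 0.1, 0.5


def loss_fn(w):
    return (w - 3.0) ** 2      # made-up loss with its lowest point at w = 3


# The two steps by hand.
w, v = 0.0, 0.0
for _ in range(5):
    gradient = 2 * (w - 3.0)   # slope of the loss at the current weight
    v = rho * v + gradient     # Step 1: new velocity
    w = w - lr * v             # Step 2: new weight

# The same thing with PyTorch's optimizer.
w_torch = torch.tensor(0.0, requires_grad=True)
optimizer = torch.optim.SGD([w_torch], lr=lr, momentum=rho)
for _ in range(5):
    optimizer.zero_grad()
    loss_fn(w_torch).backward()
    optimizer.step()

print(w, w_torch.item())
```

Both versions end up at the same weight, close to 3.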
---
## ⚠️ Why Does It Help?
### 🏔️ **Saddle Points**
- **Without Momentum** ➝ Model **stops** moving in flat areas ❌
- **With Momentum** ➝ Keeps moving **past** the flat spots ✅
### ⬇ **Local Minima**
- **Without Momentum** ➝ Gets **stuck** in a bad spot 😖
- **With Momentum** ➝ Can push through and **find a better solution!** 🎯
---
## 🏆 Picking the Right Momentum
- **Too Small?** ➝ Model gets **stuck** 😕
- **Too Large?** ➝ Model **overshoots** the best answer 🚀
- **Best Choice?** ➝ We test different values and pick what works! 🔬
---
## 🛠 Using Momentum in PyTorch
Just add the **momentum** value to the optimizer!
```python
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.5)
```
In the lab, we test **different momentum values** on a dataset and see how they affect learning! 📊
---
🎉 **Great job!** Now, let's experiment with momentum and see how it helps our model! 🏗️⚡
------------------------------------------------------------------------------------------
🎵 **Music Playing**
👋 **Welcome!** Today, we're learning about **Batch Normalization**! 🚀🔄
## 🤔 What's the Problem?
When training a neural network, the activations (outputs) can vary a lot, making learning **slower** and **unstable**. 😖
Batch Normalization **fixes this** by:
✅ Making activations more consistent
✅ Helping the network learn faster
✅ Reducing problems like vanishing gradients
---
## 🔄 How Does Batch Normalization Work?
### 🏗 Step 1: Normalize Each Mini-Batch
For each neuron in a layer:
1️⃣ Compute the **mean** and **standard deviation** of its activations. 📊
2️⃣ Normalize the outputs using:
\[
z' = \frac{z - \text{mean}}{\sqrt{(\text{std dev})^2 + \epsilon}}
\]
(We add a **small** value `ε` under the square root to avoid division by zero.)
### 🏗 Step 2: Scale and Shift
- Instead of leaving activations at mean 0 and standard deviation 1, we **scale** and **shift** them:
\[
z'' = \gamma \cdot z' + \beta
\]
- **γ (scale) and β (shift)** are **learned** during training! 🏋️‍♂️
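
The two steps can be sketched by hand and checked against `nn.BatchNorm1d`. This assumes `ε` is added to the variance under the square root, which is what `nn.BatchNorm1d` does, and that γ starts at 1 and β at 0 before any training. The batch itself is made up:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
z = torch.randn(8, 3) * 5 + 2         # mini-batch: 8 samples, 3 neurons
eps = 1e-5

mean = z.mean(dim=0)                  # one mean per neuron
var = z.var(dim=0, unbiased=False)    # one variance (std dev squared) per neuron
z_norm = (z - mean) / torch.sqrt(var + eps)    # Step 1: normalize

gamma, beta = torch.ones(3), torch.zeros(3)    # values before any learning
z_out = gamma * z_norm + beta                  # Step 2: scale and shift

bn = nn.BatchNorm1d(3, eps=eps)       # a new layer starts in train mode
bn_out = bn(z)
print(torch.allclose(z_out, bn_out, atol=1e-5))
```

After Step 1, each neuron's column has a mean of about 0 and a spread of about 1, no matter how large the original numbers were.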
---
## 🔬 Example: Normalizing Activations
- **First Mini-Batch (X1)** ➝ Compute mean & std for each neuron, normalize, then scale & shift
- **Second Mini-Batch (X2)** ➝ Repeat for new batch! ♻
- **Next Layer** ➝ Apply batch normalization again! 🔄
### 🏆 Prediction Time
- During **training**, we compute the mean & std for **each batch**.
- During **testing**, we use the **population mean & std** instead, estimated from running averages collected during training. 📊
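
Here is a sketch of the difference between the two modes. The batches are made up (mean 5, std 2), and `running_mean` is the attribute where `nn.BatchNorm1d` keeps its estimate of the population mean:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(3)

bn.train()                              # training: use each batch's own mean & std
for _ in range(100):
    bn(torch.randn(32, 3) * 2 + 5)      # made-up batches with mean 5, std 2

bn.eval()                               # testing: use the stored running averages
single = torch.full((1, 3), 5.0)        # just one sample, so no batch statistics
out = bn(single)
print(bn.running_mean)
print(out)
```

The running mean settles near 5, so a test input of 5 comes out close to 0, even though a single sample has no batch statistics of its own.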
---
## 🛠 Using Batch Normalization in PyTorch
```python
import torch.nn as nn


class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(10, 3)   # First layer (10 inputs, 3 neurons)
        self.bn1 = nn.BatchNorm1d(3)  # Batch Norm for first layer
        self.fc2 = nn.Linear(3, 4)    # Second layer (3 inputs, 4 neurons)
        self.bn2 = nn.BatchNorm1d(4)  # Batch Norm for second layer

    def forward(self, x):
        x = self.bn1(self.fc1(x))  # Apply Batch Norm
        x = self.bn2(self.fc2(x))  # Apply Batch Norm again
        return x
```
- **Training?** Set the model to **train mode** 🏋️‍♂️
```python
model.train()
```
- **Predicting?** Use **evaluation mode** 📈
```python
model.eval()
```
---
## 🚀 Why Does Batch Normalization Work?
### ✅ Helps Gradient Descent Work Better
- Normalized data = **smoother** loss function 🎯
- Gradients point in the **right** direction = Faster learning! 🚀
### ✅ Reduces Vanishing Gradient Problem
- Sigmoid & Tanh activations suffer from small gradients 😢
- Normalization **keeps activations in a good range** 📊
### ✅ Allows Higher Learning Rates
- Networks can **train faster** without getting unstable ⏩
### ✅ Reduces Need for Dropout
- Some studies suggest **Batch Norm can reduce the need for Dropout**, and sometimes replace it 🤯
---
🎉 **Great job!** Now, let's try batch normalization in our own models! 🏗️📈