**Music Playing**

**Welcome!** Today, we're learning about **Deep Neural Networks**, a cool way computers learn!

## What is a Neural Network?

Imagine a brain made of tiny switches called **neurons**. These neurons work together to make smart decisions!

### Input Layer

This is where we give the network information, like pictures or numbers.

### Hidden Layers

These layers are like **magic helpers** that figure out patterns!

- More neurons = more room to learn complicated patterns
- Too many neurons = the network can **memorize** the training data instead of learning the pattern (overfitting)

### Output Layer

This is where the network **gives us answers!**

---

## Building a Deep Neural Network in PyTorch

We can **build a deep neural network** using PyTorch, a tool that helps computers learn.

### Layers of Our Network

1. **First Hidden Layer:** Has `H1` neurons.
2. **Second Hidden Layer:** Has `H2` neurons.
3. **Output Layer:** Decides the final answer!

---

## How Does It Work?

1. **Start with an input (x).**
2. **Pass through each layer:**
   - Apply a linear step, then an **activation function** (like `sigmoid`, `tanh`, or `ReLU`).
   - The activation lets the network learn patterns that are not just straight lines!
3. **Get the final answer!**
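
The steps above can be sketched as a small PyTorch class. This is only a minimal sketch: the class name `Net`, the sizes passed in, and the choice of `sigmoid` as the activation are assumptions made for illustration, not something the lesson fixes.

```python
import torch
import torch.nn as nn


class Net(nn.Module):
    """Two hidden layers (H1 and H2 neurons) followed by an output layer."""

    def __init__(self, d_in, h1, h2, d_out):
        super().__init__()
        self.linear1 = nn.Linear(d_in, h1)   # input -> first hidden layer
        self.linear2 = nn.Linear(h1, h2)     # first -> second hidden layer
        self.linear3 = nn.Linear(h2, d_out)  # second hidden -> output layer

    def forward(self, x):
        x = torch.sigmoid(self.linear1(x))   # linear step, then activation
        x = torch.sigmoid(self.linear2(x))   # linear step, then activation
        return self.linear3(x)               # final answer (raw scores)


# Hypothetical sizes: 4 input features, H1 = 5, H2 = 6, 3 outputs.
model = Net(d_in=4, h1=5, h2=6, d_out=3)
out = model(torch.randn(8, 4))               # a batch of 8 made-up inputs
print(out.shape)
```

Each row of `out` holds one score per output neuron for one input in the batch.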
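
---

## Different Activation Functions

Activation functions help the network **think better!**

- **Sigmoid** → Squashes values between 0 and 1; fine for small networks, but its gradients get tiny in deep ones
- **Tanh** → Squashes values between -1 and 1 and is centered at zero, which often trains better than sigmoid
- **ReLU** → Keeps positive values and turns negatives into 0; a common choice for deep networks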
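
To see what each function does to a number, here is a tiny sketch that applies all three to the same made-up values (the tensor `z` is just an example input):

```python
import torch

z = torch.tensor([-2.0, 0.0, 2.0])   # example pre-activation values

sig = torch.sigmoid(z)    # always between 0 and 1
tanh = torch.tanh(z)      # always between -1 and 1, zero stays zero
relu = torch.relu(z)      # negatives become 0, positives pass through

print(sig)
print(tanh)
print(relu)
```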

---

## Example: Recognizing Handwritten Numbers

We train the network with **MNIST**, a dataset of handwritten numbers.

- **Input:** 784 pixels (28x28 images)
- **Hidden Layers:** 50 neurons each
- **Output:** 10 neurons (digits 0-9)
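
The sizes above can be checked with a short sketch. It assumes two hidden layers and ReLU activations (the lesson does not say which are used here), and it feeds random tensors shaped like MNIST images instead of downloading the real dataset.

```python
import torch
import torch.nn as nn

# 784 inputs -> 50 -> 50 -> 10 outputs (two hidden layers is an assumption)
mnist_net = nn.Sequential(
    nn.Linear(28 * 28, 50), nn.ReLU(),
    nn.Linear(50, 50), nn.ReLU(),
    nn.Linear(50, 10),
)

images = torch.rand(16, 1, 28, 28)        # fake batch shaped like MNIST
flat = images.view(images.size(0), -1)    # flatten each image to 784 pixels
scores = mnist_net(flat)                  # one score per digit, per image
predicted_digit = scores.argmax(dim=1)    # digit with the highest score

print(flat.shape, scores.shape)
```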

---

## Training the Network

We use **Stochastic Gradient Descent (SGD)** to teach the network!

- **Loss Function:** Measures how wrong the predictions are, so the network can learn from its mistakes.
- **Validation Accuracy:** Checks how well the network does on data it did not train on!
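
A training loop with these pieces might look like the sketch below. The data is a made-up toy problem (not MNIST), and the model size, learning rate, batch size, and number of epochs are all assumptions picked to keep the example short.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in data: 2 features, label is 1 when the features sum above 0.
X = torch.randn(200, 2)
y = (X.sum(dim=1) > 0).long()
X_train, y_train = X[:150], y[:150]
X_val, y_val = X[150:], y[150:]

model = nn.Sequential(nn.Linear(2, 8), nn.ReLU(), nn.Linear(8, 2))
criterion = nn.CrossEntropyLoss()                         # the loss function
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)   # plain SGD

losses = []
for epoch in range(20):
    model.train()
    for i in range(0, len(X_train), 25):                  # mini-batches of 25
        xb, yb = X_train[i:i + 25], y_train[i:i + 25]
        optimizer.zero_grad()                             # clear old gradients
        loss = criterion(model(xb), yb)                   # how wrong are we?
        loss.backward()                                   # compute gradients
        optimizer.step()                                  # update the weights
    losses.append(loss.item())                            # last batch loss of the epoch

# Validation accuracy: fraction of held-out examples predicted correctly.
model.eval()
with torch.no_grad():
    preds = model(X_val).argmax(dim=1)
    val_accuracy = (preds == y_val).float().mean().item()

print(f"validation accuracy: {val_accuracy:.2f}")
```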

---

## What We Learned

- Deep Neural Networks have **many hidden layers**.
- Different **activation functions** help improve performance.
- More layers let the network learn **more complex patterns**, but they also make it harder to train and easier to overfit!

---

**Great job!** Now, let's build and train our own deep neural networks!

---

**Welcome!** Today, we'll learn how to **build a deep neural network** in PyTorch using `nn.ModuleList`.

## Why Use `nn.ModuleList`?

Instead of adding layers **one by one** (which takes a long time), we can **automate** the process!

---

## Building the Neural Network

We create a **list** called `layers`:

- **First item:** Input size (e.g., `2` features).
- **Second item:** Neurons in the **first hidden layer** (e.g., `3`).
- **Third item:** Neurons in the **second hidden layer** (e.g., `4`).
- **Fourth item:** Output size (number of classes, e.g., `3`).

---

## Constructing the Network

### Step 1: Create Layers

- We loop through the list, taking **two neighboring elements at a time**:
  - **First element:** Input size
  - **Second element:** Output size (number of neurons)

### Step 2: Connecting Layers

- First **hidden layer** → Input size = `2`, Neurons = `3`
- Second **hidden layer** → Input size = `3`, Neurons = `4`
- **Output layer** → Input size = `4`, Output size = `3`
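
The "two elements at a time" loop can be sketched in plain Python. Pairing the list with itself shifted by one (using `zip`) is one common way to do it; the lesson does not say which loop it uses, so treat this as an assumption.

```python
layers = [2, 3, 4, 3]   # input size, hidden 1, hidden 2, output size

# Pair each size with the next one: (input size, output size) per layer.
pairs = list(zip(layers, layers[1:]))

for input_size, output_size in pairs:
    print(f"Linear layer: {input_size} inputs -> {output_size} neurons")

print(pairs)  # [(2, 3), (3, 4), (4, 3)]
```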

---

## Forward Function

We **pass data** through the network:

1. **Apply linear transformation** in each layer → Makes calculations
2. **Apply activation function** (`ReLU`) after each hidden layer → Helps the network learn
3. **For the last layer**, we only apply the **linear transformation**. This is a classification task, and a loss such as `nn.CrossEntropyLoss` expects the raw scores and applies softmax itself.
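
Putting the list, the loop, and the forward function together gives something like the sketch below. The class name `Net` and the attribute name `hidden` are assumptions chosen for this example.

```python
import torch
import torch.nn as nn


class Net(nn.Module):
    def __init__(self, layers):
        super().__init__()
        self.hidden = nn.ModuleList()
        # Build one Linear layer per (input size, output size) pair.
        for input_size, output_size in zip(layers, layers[1:]):
            self.hidden.append(nn.Linear(input_size, output_size))

    def forward(self, x):
        last = len(self.hidden) - 1
        for i, linear in enumerate(self.hidden):
            x = linear(x)              # linear transformation
            if i < last:
                x = torch.relu(x)      # ReLU on hidden layers only
        return x                       # last layer stays linear (raw scores)


model = Net([2, 3, 4, 3])
scores = model(torch.randn(5, 2))      # 5 made-up samples with 2 features
print(scores.shape)
```

Using `nn.ModuleList` instead of a plain Python list matters: it registers every layer, so `model.parameters()` includes all of their weights.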

---

## Training the Network

The **training process** is similar to before! We:

- Use a **dataset**
- Try **different combinations** of neurons and layers
- See which setup gives the **best performance**!

---

**Awesome!** Now, let's explore ways to make these networks even **better!**

---

**Welcome!** Today, we're learning about **weight initialization** in Neural Networks!

## Why Does Weight Initialization Matter?

If we **don't** choose good starting weights, our neural network **won't learn properly**!
If **all neurons** in a layer start with the **same weights**, they all compute the same output and get the same update, so they never learn different things.

---

## How PyTorch Handles Weights

PyTorch **automatically** picks starting weights, but we can also set them **ourselves**!
Let's see what happens when we:

- Set **all weights to 1** and **bias to 0** → **Bad idea!**
- Randomly choose weights from a **uniform distribution** → **Better!**
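
Here is a small sketch of both options on a single layer. The layer size and the input values are made up for the example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(2, 3)                # 2 inputs, 3 neurons
x = torch.tensor([[0.5, -1.5]])        # one made-up input

# Option 1: all weights 1, all biases 0. Every neuron computes x1 + x2.
with torch.no_grad():
    layer.weight.fill_(1.0)
    layer.bias.fill_(0.0)
same = layer(x)
print(same)                            # three identical outputs

# Option 2: weights drawn from a uniform distribution between -1 and 1.
nn.init.uniform_(layer.weight, -1.0, 1.0)
different = layer(x)
print(different)                       # the neurons now disagree
```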

---

## The Problem with Random Weights

We use a **uniform distribution** (random values between -1 and 1). But:

- **Too small?** → Weights don't change much
- **Too large?** → **Vanishing gradient** problem

### What's a Vanishing Gradient?

If weights are **too big**, the values going into activations like sigmoid or tanh get **too large**. The activation flattens out there, and the **gradient shrinks to zero**.
That means the network **stops learning**!
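
A quick sketch makes this visible. It assumes a sigmoid activation and asks PyTorch for the gradient at a small, a medium, and a large input:

```python
import torch

# Values going into a sigmoid: small, medium, and very large.
z = torch.tensor([0.0, 2.0, 10.0], requires_grad=True)

torch.sigmoid(z).sum().backward()      # gradient of sigmoid at each value

print(z.grad)   # largest at 0 (0.25), nearly zero at 10
```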

---

## Fixing the Problem

### Solution: Scale Weights Based on Neurons

We scale the weight range based on **how many neurons** feed into the layer:

- **2 neurons?** → Scale by **1/2**
- **4 neurons?** → Scale by **1/4**
- **100 neurons?** → Scale by **1/100**

This keeps the values going into each activation small, which helps prevent the vanishing gradient issue!
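
The scaling rule can be sketched as a small helper. The function name `scaled_uniform_` is made up for this example, and it assumes "number of neurons" means the number of inputs to the layer.

```python
import torch
import torch.nn as nn


def scaled_uniform_(linear):
    """Fill weights uniformly in [-1/n, 1/n], where n is the input count."""
    bound = 1.0 / linear.in_features
    nn.init.uniform_(linear.weight, -bound, bound)
    return bound


small = nn.Linear(2, 3)      # 2 inputs   -> range scaled by 1/2
wide = nn.Linear(100, 3)     # 100 inputs -> range scaled by 1/100

b_small = scaled_uniform_(small)
b_wide = scaled_uniform_(wide)

print(b_small, small.weight.abs().max().item())
print(b_wide, wide.weight.abs().max().item())
```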

---

## Different Weight Initialization Methods

### **1. Default PyTorch Method**

- PyTorch **automatically** picks a range based on the layer's input size `L_in`:
  - **Lower bound:** `-1 / sqrt(L_in)`
  - **Upper bound:** `+1 / sqrt(L_in)`

### **2. Xavier Initialization**

- Best for **tanh** activation
- Uses the **number of input and output neurons**
- We apply `torch.nn.init.xavier_uniform_()` to set the weights

### **3. He Initialization**

- Best for **ReLU** activation
- Uses the **number of input neurons**
- We apply `torch.nn.init.kaiming_uniform_()` to set the weights (PyTorch calls the He method "Kaiming")
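
The three methods can be compared side by side. In `torch.nn.init`, He initialization goes by the name `kaiming_uniform_`. The layer sizes below are made up, and the printed bounds are what each method should stay within for those sizes.

```python
import math

import torch
import torch.nn as nn

torch.manual_seed(0)
L_in, L_out = 100, 50                          # example layer sizes

# 1. Default: nn.Linear already draws weights within +/- 1/sqrt(L_in).
default = nn.Linear(L_in, L_out)
default_bound = 1 / math.sqrt(L_in)

# 2. Xavier: range depends on both input and output sizes.
xavier = nn.Linear(L_in, L_out)
nn.init.xavier_uniform_(xavier.weight)
xavier_bound = math.sqrt(6 / (L_in + L_out))

# 3. He (called "kaiming" in PyTorch): range depends on the input size.
he = nn.Linear(L_in, L_out)
nn.init.kaiming_uniform_(he.weight, nonlinearity="relu")
he_bound = math.sqrt(6 / L_in)

for name, layer, bound in [("default", default, default_bound),
                           ("xavier", xavier, xavier_bound),
                           ("he", he, he_bound)]:
    print(name, round(bound, 4), round(layer.weight.abs().max().item(), 4))
```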

---

## Which One is Best?

We compare:

- **PyTorch Default**
- **Xavier Method** (tanh)
- **He Method** (ReLU)

The **Xavier and He methods** help the network **learn faster**!

---

**Great job!** Now, let's try different weight initializations and see what works best!

---

**Welcome!** Today, we're learning about **Gradient Descent with Momentum**!

## What's the Problem?

Sometimes, when training a neural network, the model can get **stuck**:

- **Saddle Points** → Flat areas where the gradient is close to zero, so learning stalls
- **Local Minima** → Not the best solution, but we get trapped

---

## What is Momentum?

Momentum helps the model **keep moving** even when it gets stuck!
It's like rolling a ball downhill:

- **Gradient (Force)** → Tells us where to go
- **Momentum (Mass)** → Helps us keep moving even on flat surfaces

---

## How Does It Work?

### Step 1: Compute Velocity

- New velocity (`v_k+1`) = Momentum (`ρ`) * Old velocity (`v_k`) + Gradient
- The **momentum term** (`ρ`) controls how much we keep from the past.

### Step 2: Update Weights

- New weight (`w_k+1`) = Old weight (`w_k`) - Learning rate * New velocity (`v_k+1`)

The bigger the **momentum**, the harder it is to stop moving!
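
The two steps can be checked by hand against PyTorch. This sketch follows the convention used by `torch.optim.SGD` (velocity = momentum * old velocity + gradient, then weight = weight - learning rate * velocity) on a made-up loss, `w * w`, with example values for the learning rate and momentum.

```python
import torch

lr, rho = 0.1, 0.5                       # example learning rate and momentum

# PyTorch version: minimize loss = w^2 starting from w = 2.
w = torch.tensor([2.0], requires_grad=True)
optimizer = torch.optim.SGD([w], lr=lr, momentum=rho)

# Hand-written version of the same two steps.
w_manual, v = 2.0, 0.0

for _ in range(3):
    optimizer.zero_grad()
    loss = (w ** 2).sum()
    loss.backward()
    optimizer.step()

    grad = 2 * w_manual                  # derivative of w^2
    v = rho * v + grad                   # Step 1: compute velocity
    w_manual = w_manual - lr * v         # Step 2: update weight

print(w.item(), w_manual)                # both end near 0.604
```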

---

## Why Does It Help?

### **Saddle Points**

- **Without Momentum** → Model **stops** moving in flat areas
- **With Momentum** → Keeps moving **past** the flat spots

### **Local Minima**

- **Without Momentum** → Gets **stuck** in a bad spot
- **With Momentum** → Pushes through and **finds a better solution!**

---

## Picking the Right Momentum

- **Too Small?** → Model gets **stuck**
- **Too Large?** → Model **overshoots** the best answer
- **Best Choice?** → We test different values and pick what works!

---

## Using Momentum in PyTorch

Just add the **momentum** value to the optimizer!

```python
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.5)
```

In the lab, we test **different momentum values** on a dataset and see how they affect learning!

---

**Great job!** Now, let's experiment with momentum and see how it helps our model!

---

**Welcome!** Today, we're learning about **Batch Normalization**!

## What's the Problem?

When training a neural network, the activations (outputs) can vary a lot, making learning **slower** and **unstable**.
Batch Normalization **fixes this** by:

- Making activations more consistent
- Helping the network learn faster
- Reducing problems like vanishing gradients

---

## How Does Batch Normalization Work?

### Step 1: Normalize Each Mini-Batch

For each neuron in a layer:

1. Compute the **mean** and **standard deviation** of its activations over the mini-batch.
2. Normalize the outputs using:

\[
z' = \frac{z - \mu}{\sqrt{\sigma^2 + \epsilon}}
\]

Here `μ` is the mini-batch mean and `σ` is the mini-batch standard deviation. (We add a **small** value `ε` to avoid division by zero.)

### Step 2: Scale and Shift

- Instead of leaving activations at mean 0 and standard deviation 1, we **scale** and **shift** them:

\[
z'' = \gamma \cdot z' + \beta
\]

- **γ (scale) and β (shift)** are **learned** during training!
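
Both steps can be written out by hand and compared with `nn.BatchNorm1d`. This is a sketch on made-up activations; it assumes `γ = 1` and `β = 0` (the values PyTorch starts with) and follows PyTorch in placing `ε` inside the square root.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
z = torch.randn(6, 3) * 4 + 2            # mini-batch: 6 samples, 3 neurons
eps = 1e-5

# Step 1: normalize each neuron (column) with the mini-batch statistics.
mean = z.mean(dim=0)
var = z.var(dim=0, unbiased=False)       # variance = (std dev) squared
z_norm = (z - mean) / torch.sqrt(var + eps)

# Step 2: scale and shift (starting values: gamma = 1, beta = 0).
gamma, beta = torch.ones(3), torch.zeros(3)
z_out = gamma * z_norm + beta

# PyTorch's layer does the same thing while in training mode.
bn = nn.BatchNorm1d(3, eps=eps)
print(torch.allclose(bn(z), z_out, atol=1e-5))
```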

---

## Example: Normalizing Activations

- **First Mini-Batch (X1)** → Compute mean & std for each neuron, normalize, then scale & shift
- **Second Mini-Batch (X2)** → Repeat for new batch!
- **Next Layer** → Apply batch normalization again!

### Prediction Time

- During **training**, we compute the mean & std for **each batch**.
- During **testing**, we use the **population mean & std** instead. PyTorch estimates these with running averages collected during training.
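
The difference between the two modes shows up in a short sketch. The data is made up, and the same batch is repeated many times only so the running averages have time to settle.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(2)
x = torch.randn(64, 2) * 3 + 5           # made-up activations, mean near 5

# Training mode: normalize with batch statistics, update running averages.
bn.train()
for _ in range(100):
    bn(x)

print(bn.running_mean)                    # close to the batch mean
print(x.mean(dim=0))

# Evaluation mode: use the stored running averages, even for one sample.
bn.eval()
single = torch.tensor([[5.0, 5.0]])
out1 = bn(single)
out2 = bn(single)                         # same input -> same output
print(out1)
```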

---

## Using Batch Normalization in PyTorch

```python
import torch.nn as nn


class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(10, 3)    # First layer (10 inputs, 3 neurons)
        self.bn1 = nn.BatchNorm1d(3)   # Batch Norm for first layer
        self.fc2 = nn.Linear(3, 4)     # Second layer (3 inputs, 4 neurons)
        self.bn2 = nn.BatchNorm1d(4)   # Batch Norm for second layer

    def forward(self, x):
        # Each Batch Norm runs right after its linear layer.
        x = self.bn1(self.fc1(x))      # Apply Batch Norm
        x = self.bn2(self.fc2(x))      # Apply Batch Norm again
        return x
```

- **Training?** Set the model to **train mode**

```python
model.train()
```

- **Predicting?** Use **evaluation mode**

```python
model.eval()
```

---

## Why Does Batch Normalization Work?

### Helps Gradient Descent Work Better

- Normalized data = **smoother** loss function
- Gradients point in the **right** direction = Faster learning!

### Reduces Vanishing Gradient Problem

- Sigmoid & Tanh activations suffer from small gradients
- Normalization **keeps activations in a good range**

### Allows Higher Learning Rates

- Networks can **train faster** without getting unstable

### Reduces Need for Dropout

- Some studies show **Batch Norm can reduce the need for Dropout**, and sometimes replace it

---

**Great job!** Now, let's try batch normalization in our own models!