Aashish34 commited on
Commit
18550fb
·
1 Parent(s): 6b48067

add deeplearnin

DeepLearning/Deep Learning Curriculum.html CHANGED
@@ -995,6 +995,48 @@
995
  W_out = floor((W_in + 2×padding - kernel_size) / stride) + 1<br>
996
  H_out = floor((H_in + 2×padding - kernel_size) / stride) + 1
997
  </div>
998
  `
999
  },
1000
  "yolo": {
@@ -1113,6 +1155,28 @@
1113
  Tumor localization, cell counting, anatomical structure detection in X-rays/CT scans
1114
  </div>
1115
  </div>
1116
  `
1117
  },
1118
  "transformers": {
@@ -1854,11 +1918,34 @@
1854
  `,
1855
  applications: `
1856
  <div class="info-box">
1857
- <div class="box-title">📸 All Computer Vision Tasks</div>
1858
  <div class="box-content">
1859
- Image classification, object detection, segmentation, face recognition, OCR, medical imaging
1860
  </div>
1861
  </div>
1862
  `
1863
  },
1864
  "pooling": {
@@ -1918,6 +2005,37 @@
1918
  Used after conv layers in AlexNet, VGG, and most classic CNNs to progressively reduce spatial dimensions
1919
  </div>
1920
  </div>
1921
  `
1922
  },
1923
  "cnn-basics": {
@@ -2124,24 +2242,56 @@
2124
  Sparked AI renaissance - Google, Facebook, Microsoft pivoted to deep learning after AlexNet
2125
  </div>
2126
  </div>
2127
  `
2128
  },
2129
  "vgg": {
2130
  overview: `
2131
  <h3>VGGNet (2014) - The Power of Depth</h3>
2132
  <p>VGG showed that depth matters - 16-19 layers using only small 3×3 filters.</p>
2133
-
2134
- <h3>Key Insight: Stacking Small Filters</h3>
2135
- <p>Two 3×3 conv layers = same receptive field as one 5×5, but:</p>
2136
- <ul>
2137
- <li><strong>Fewer Parameters:</strong> 2×(3²) = 18 vs 5² = 25</li>
2138
- <li><strong>More Non-linearity:</strong> Two ReLUs instead of one</li>
2139
- <li><strong>Deeper Network:</strong> Better feature learning</li>
2140
- </ul>
2141
-
2142
- <div class="callout warning">
2143
- <div class="callout-title">⚠️ Limitation</div>
2144
- 138M parameters (VGG-16) - very memory intensive for deployment
2145
  </div>
2146
  `
2147
  },
@@ -2169,6 +2319,35 @@
2169
  • Won ImageNet 2015<br>
2170
  • Skip connections now used everywhere (U-Net, Transformers, etc.)
2171
  </div>
2172
  `
2173
  },
2174
  "inception": {
@@ -2180,17 +2359,35 @@
2180
  <div class="formula">
2181
  Input → [1×1 conv] ⊕ [3×3 conv] ⊕ [5×5 conv] ⊕ [3×3 pool] → Concatenate
2182
  </div>
2183
-
2184
- <h3>Key Innovation: 1×1 Convolutions</h3>
2185
- <ul>
2186
- <li><strong>Dimensionality Reduction:</strong> Reduce channels before expensive 3×3, 5×5</li>
2187
- <li><strong>Non-linearity:</strong> Add extra ReLU</li>
2188
- <li><strong>Bottleneck Design:</strong> Reduces FLOPs by 10×</li>
2189
- </ul>
2190
-
2191
  <div class="callout insight">
2192
- <div class="callout-title">💡 Efficiency</div>
2193
- 22 layers but only 5M parameters (27× less than AlexNet!)
2194
  </div>
2195
  `
2196
  },
@@ -2230,6 +2427,33 @@
2230
  • Latency-critical systems<br>
2231
  • Good accuracy with 10-20× speedup
2232
  </div>
2233
  `
2234
  },
2235
  "transfer-learning": {
 
995
  W_out = floor((W_in + 2×padding - kernel_size) / stride) + 1<br>
996
  H_out = floor((H_in + 2×padding - kernel_size) / stride) + 1
997
  </div>
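The output-size formula above is easy to sanity-check in plain Python (a minimal sketch; the helper name `conv_output_size` is illustrative, not from the curriculum's code):

```python
import math

def conv_output_size(w_in, kernel_size, stride=1, padding=0):
    """Spatial output size of a conv layer, per the formula above."""
    return math.floor((w_in + 2 * padding - kernel_size) / stride) + 1

# 3x3 kernel, stride 1, padding 1: spatial size is preserved
print(conv_output_size(224, 3, stride=1, padding=1))   # 224
# 7x7 kernel, stride 2, padding 3 (a common ResNet stem setting)
print(conv_output_size(224, 7, stride=2, padding=3))   # 112
```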
998
+ `,
999
+ math: `
1000
+ <h3>The Mathematical Operation: Cross-Correlation</h3>
1001
+ <p>In deep learning, what we call "convolution" is mathematically "cross-correlation". It is a local dot product of the kernel and image patch.</p>
1002
+
1003
+ <div class="formula">
1004
+ S(i, j) = (I * K)(i, j) = Σ_m Σ_n I(i+m, j+n) K(m, n)
1005
+ </div>
1006
+
1007
+ <div class="callout insight">
1008
+ <div class="callout-title">📝 Paper & Pain: Manual Convolution</div>
1009
+ <strong>Input (3x3):</strong><br>
1010
+ [1 2 0]<br>
1011
+ [0 1 1]<br>
1012
+ [1 0 2]<br>
1013
+ <br>
1014
+ <strong>Kernel (2x2):</strong><br>
1015
+ [1 0]<br>
1016
+ [0 1]<br>
1017
+ <br>
1018
+ <strong>Calculation:</strong><br>
1019
+ Step 1 (Top-Left): (1x1) + (2x0) + (0x0) + (1x1) = <strong>2</strong><br>
1020
+ Step 2 (Top-Right): (2x1) + (0x0) + (1x0) + (1x1) = <strong>3</strong><br>
1021
+ ... Output is a 2x2 matrix.
1022
+ </div>
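The worked example above can be reproduced with a short pure-Python sketch (the helper `cross_correlate` is illustrative, not part of the curriculum's code):

```python
def cross_correlate(image, kernel):
    """Valid cross-correlation: slide the kernel, take local dot products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + m][j + n] * kernel[m][n]
                 for m in range(kh) for n in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

image = [[1, 2, 0],
         [0, 1, 1],
         [1, 0, 2]]
kernel = [[1, 0],
          [0, 1]]
print(cross_correlate(image, kernel))  # [[2, 3], [0, 3]]
```

The first row matches Step 1 and Step 2 of the manual calculation.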
1023
+
1024
+ <h3>Backprop through Conv</h3>
1025
+ <p>The gradient with respect to the input is computed with the same sliding-window formula, but with the kernel flipped vertically and horizontally (a true convolution)!</p>
1026
+ `,
1027
+ applications: `
1028
+ <div class="info-box">
1029
+ <div class="box-title">🔍 Feature Extraction</div>
1030
+ <div class="box-content">
1031
+ Early layers learn edges (Gabor-like filters), middle layers learn textures, deep layers learn specific object parts (eyes, wheels).
1032
+ </div>
1033
+ </div>
1034
+ <div class="info-box">
1035
+ <div class="box-title">🎨 Image Processing</div>
1036
+ <div class="box-content">
1037
+ Blurring, sharpening, and edge detection in Photoshop/GIMP are all done with 2D convolutions using fixed kernels.
1038
+ </div>
1039
+ </div>
1040
  `
1041
  },
1042
  "yolo": {
 
1155
  Tumor localization, cell counting, anatomical structure detection in X-rays/CT scans
1156
  </div>
1157
  </div>
1158
+ `,
1159
+ math: `
1160
+ <h3>Intersection over Union (IoU)</h3>
1161
+ <p>How do we measure if a predicted box is correct? We use the geometric ratio of intersection and union.</p>
1162
+ <div class="formula">
1163
+ IoU = Area of Overlap / Area of Union
1164
+ </div>
1165
+
1166
+ <div class="callout insight">
1167
+ <div class="callout-title">📝 Paper & Pain: Manual IoU</div>
1168
+ <strong>Box A (GT):</strong> [0,0,10,10] (Area=100)<br>
1169
+ <strong>Box B (Pred):</strong> [5,5,15,15] (Area=100)<br>
1170
+ 1. <strong>Intersection:</strong> Area between [5,5] and [10,10] = 5x5 = 25<br>
1171
+ 2. <strong>Union:</strong> Area A + Area B - Intersection = 100 + 100 - 25 = 175<br>
1172
+ 3. <strong>IoU:</strong> 25 / 175 ≈ <strong>0.142</strong> (Poor match!)
1173
+ </div>
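The manual IoU calculation above translates directly into a small Python sketch (the `iou` helper is illustrative, assuming `[x1, y1, x2, y2]` corner coordinates):

```python
def iou(box_a, box_b):
    """IoU for axis-aligned boxes given as [x1, y1, x2, y2]."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(round(iou([0, 0, 10, 10], [5, 5, 15, 15]), 3))  # 0.143 (25 / 175)
```

Clamping the intersection width and height at zero handles non-overlapping boxes (IoU = 0).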
1174
+
1175
+ <h3>YOLO Multi-Part Loss</h3>
1176
+ <p>YOLO uses a composite loss function combining localization, confidence, and classification errors.</p>
1177
+ <div class="formula">
1178
+ L = λ_coord Σ(Localization Loss) + Σ(Confidence Loss) + Σ(Classification Loss)
1179
+ </div>
1180
  `
1181
  },
1182
  "transformers": {
 
1918
  `,
1919
  applications: `
1920
  <div class="info-box">
1921
+ <div class="box-title">📸 Real-World CV</div>
1922
  <div class="box-content">
1923
+ Face ID, medical imaging (MRI/CT), autonomous drone navigation, manufacturing defect detection, and satellite imagery analysis
1924
  </div>
1925
  </div>
1926
+ `,
1927
+ math: `
1928
+ <h3>The Parameter Explosion Problem</h3>
1929
+ <p>Why do standard Neural Networks fail on images? Let's calculate the parameters for a small image.</p>
1930
+
1931
+ <div class="callout insight">
1932
+ <div class="callout-title">📝 Paper & Pain: MLP vs Images</div>
1933
+ 1. <strong>Input:</strong> 224 × 224 pixels with 3 color channels (RGB)<br>
1934
+ 2. <strong>Input Size:</strong> 224 × 224 × 3 = <strong>150,528 features</strong><br>
1935
+ 3. <strong>Hidden Layer:</strong> Suppose we want just 1000 neurons.<br>
1936
+ 4. <strong>Matrix size:</strong> [1000, 150528]<br>
1937
+ 5. <strong>Total Weights:</strong> 1000 × 150528 ≈ <strong>150 Million parameters</strong> for just ONE layer!
1938
+ </div>
1939
+
1940
+ <h3>The CNN Solution: Weight Sharing</h3>
1941
+ <p>Instead of every neuron looking at every pixel, we share one small filter across all positions (<strong>translation equivariance</strong>): if an edge detector works in the top-left, it works in the bottom-right too.</p>
1942
+
1943
+ <div class="formula">
1944
+ Total Params = (Kernel_H × Kernel_W × Input_Channels) × Num_Filters<br>
1945
+ <br>
1946
+ For a 3x3 filter: (3 × 3 × 3) × 64 = <strong>1,728 parameters</strong><br>
1947
+ Reduction: 150M / 1.7k ≈ <strong>86,000× more efficient!</strong>
1948
+ </div>
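The parameter counts above can be verified with a few lines of Python (a minimal sketch with biases omitted; variable names are illustrative):

```python
# Fully connected layer vs. conv layer weight count (biases omitted)
h, w, c = 224, 224, 3
hidden = 1000
dense_params = (h * w * c) * hidden        # every pixel -> every neuron
print(dense_params)                        # 150,528,000

kernel, filters = 3, 64
conv_params = (kernel * kernel * c) * filters   # one shared filter bank
print(conv_params)                         # 1,728
print(dense_params // conv_params)         # roughly 87,000x fewer weights
```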
1949
  `
1950
  },
1951
  "pooling": {
 
2005
  Used after conv layers in AlexNet, VGG, and most classic CNNs to progressively reduce spatial dimensions
2006
  </div>
2007
  </div>
2008
+ `,
2009
+ math: `
2010
+ <h3>Max Pooling: Winning Signal Selection</h3>
2011
+ <p>Pooling operations are non-parametric (no weights). They simply select or average values within a local window.</p>
2012
+
2013
+ <div class="callout insight">
2014
+ <div class="callout-title">📝 Paper & Pain: 2x2 Max Pooling</div>
2015
+ <strong>Input (4x4):</strong><br>
2016
+ [1 3 | 2 1]<br>
2017
+ [5 1 | 0 2]<br>
2018
+ -----------<br>
2019
+ [1 1 | 8 2]<br>
2020
+ [0 2 | 4 1]<br>
2021
+ <br>
2022
+ <strong>Output (2x2):</strong><br>
2023
+ Step 1: max(1, 3, 5, 1) = <strong>5</strong><br>
2024
+ Step 2: max(2, 1, 0, 2) = <strong>2</strong><br>
2025
+ Step 3: max(1, 1, 0, 2) = <strong>2</strong><br>
2026
+ Step 4: max(8, 2, 4, 1) = <strong>8</strong><br>
2027
+ <strong>Final:</strong> [5 2] / [2 8]
2028
+ </div>
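The four steps above can be checked with a minimal pure-Python sketch (the helper `max_pool_2x2` is illustrative; it assumes even input dimensions):

```python
def max_pool_2x2(x):
    """2x2 max pooling with stride 2; no learnable weights."""
    return [[max(x[i][j], x[i][j + 1], x[i + 1][j], x[i + 1][j + 1])
             for j in range(0, len(x[0]), 2)]
            for i in range(0, len(x), 2)]

x = [[1, 3, 2, 1],
     [5, 1, 0, 2],
     [1, 1, 8, 2],
     [0, 2, 4, 1]]
print(max_pool_2x2(x))  # [[5, 2], [2, 8]]
```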
2029
+
2030
+ <h3>Backprop through Pooling</h3>
2031
+ <div class="list-item">
2032
+ <div class="list-num">💡</div>
2033
+ <div><strong>Max Pooling:</strong> Gradient is routed ONLY to the neuron that had the maximum value. All others get 0.</div>
2034
+ </div>
2035
+ <div class="list-item">
2036
+ <div class="list-num">💡</div>
2037
+ <div><strong>Average Pooling:</strong> Gradient is distributed evenly among all neurons in the window.</div>
2038
+ </div>
2039
  `
2040
  },
2041
  "cnn-basics": {
 
2242
  Sparked AI renaissance - Google, Facebook, Microsoft pivoted to deep learning after AlexNet
2243
  </div>
2244
  </div>
2245
+ `,
2246
+ math: `
2247
+ <h3>Paper & Pain: Parameter Counting</h3>
2248
+ <p>Understanding AlexNet's 60M parameters:</p>
2249
+ <div class="list-item">
2250
+ <div class="list-num">01</div>
2251
+ <div><strong>Conv Layers:</strong> Only ~2.3 million parameters - they do most of the computational work with a small memory footprint!</div>
2252
+ </div>
2253
+ <div class="list-item">
2254
+ <div class="list-num">02</div>
2255
+ <div><strong>FC Layers:</strong> Over <strong>58 million parameters</strong>. The first FC layer (FC6) alone takes 4096 × (6×6×256) ≈ 37M params!</div>
2256
+ </div>
2257
+ <div class="callout warning">
2258
+ <div class="callout-title">⚠️ The Design Flaw</div>
2259
+ FC layers are the memory bottleneck. Modern models (ResNet, Inception) replace these with Global Average Pooling to save 90% parameters.
2260
+ </div>
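The FC6 figure above can be reproduced in a few lines; the FC7/FC8 shapes (4096→4096 and 4096→1000) are the standard AlexNet classifier sizes, added here for context (biases omitted):

```python
# AlexNet classifier-head weight counts (biases omitted)
fc6 = (6 * 6 * 256) * 4096   # flattened conv features -> 4096 neurons
fc7 = 4096 * 4096
fc8 = 4096 * 1000            # 1000 ImageNet classes
total_fc = fc6 + fc7 + fc8
print(fc6)       # 37,748,736 (~37M)
print(total_fc)  # 58,621,952 -> the "over 58 million" quoted above
```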
2261
  `
2262
  },
2263
  "vgg": {
2264
  overview: `
2265
  <h3>VGGNet (2014) - The Power of Depth</h3>
2266
  <p>VGG showed that depth matters - 16-19 layers using only small 3×3 filters.</p>
2267
+ `,
2268
+ concepts: `
2269
+ <h3>Small Filters, Receptive Field</h3>
2270
+ <div class="list-item">
2271
+ <div class="list-num">01</div>
2272
+ <div><strong>Uniformity:</strong> Uses 3×3 filters everywhere with stride 1, padding 1.</div>
2273
+ </div>
2274
+ <div class="list-item">
2275
+ <div class="list-num">02</div>
2276
+ <div><strong>Pooling Pattern:</strong> 2×2 max pooling after every 2-3 conv layers.</div>
2277
+ </div>
2278
+ `,
2279
+ math: `
2280
+ <h3>The 5×5 vs 3×3+3×3 Equivalence</h3>
2281
+ <p>Why stack 3x3 filters instead of one large filter?</p>
2282
+ <div class="callout insight">
2283
+ <div class="callout-title">📝 Paper & Pain: Parameter Efficiency</div>
2284
+ 1. <strong>Receptive Field:</strong> Two 3x3 layers cover a 5x5 area. Three 3x3 layers cover a 7x7 area.<br>
2285
+ 2. <strong>Param Count (C input and output channels):</strong><br>
2286
+ • One 7x7 layer: 7² × C² = 49C² parameters.<br>
2287
+ • Three 3x3 layers: 3 × (3² × C²) = 27C² parameters.<br>
2288
+ <strong>Result:</strong> ~45% fewer weights for the SAME "view" of the image!
2289
+ </div>
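The 49C² vs 27C² comparison above, evaluated for a concrete channel count (C = 256 is an arbitrary example):

```python
# Weights needed to cover a 7x7 receptive field with C channels in and out
C = 256
one_7x7 = 7 * 7 * C * C              # a single large conv layer
three_3x3 = 3 * (3 * 3 * C * C)      # three stacked 3x3 conv layers
print(one_7x7, three_3x3)            # 3211264 1769472
print(round(1 - three_3x3 / one_7x7, 2))  # 0.45 -> ~45% fewer weights
```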
2290
+ `,
2291
+ applications: `
2292
+ <div class="info-box">
2293
+ <div class="box-title">🖼️ Feature Backbone</div>
2294
+ <div class="box-content">VGG is the preferred architectural backbone for Neural Style Transfer and early GANs due to its simple, clean feature extraction properties.</div>
2295
  </div>
2296
  `
2297
  },
 
2319
  • Won ImageNet 2015<br>
2320
  • Skip connections now used everywhere (U-Net, Transformers, etc.)
2321
  </div>
2322
+ `,
2323
+ concepts: `
2324
+ <h3>Identity & Projection Shortcuts</h3>
2325
+ <div class="list-item">
2326
+ <div class="list-num">01</div>
2327
+ <div><strong>Identity Shortcut:</strong> Used when dimensions match. y = F(x, {W}) + x</div>
2328
+ </div>
2329
+ <div class="list-item">
2330
+ <div class="list-num">02</div>
2331
+ <div><strong>Projection Shortcut (1×1 Conv):</strong> Used when dimensions change. y = F(x, {W}) + W_s x</div>
2332
+ </div>
2333
+ `,
2334
+ math: `
2335
+ <h3>The Vanishing Gradient Solution</h3>
2336
+ <p>Why do skip connections help? Let's differentiate the output y = F(x) + x:</p>
2337
+ <div class="formula">
2338
+ ∂y/∂x = ∂F/∂x + 1
2339
+ </div>
2340
+ <div class="callout insight">
2341
+ <div class="callout-title">📝 Paper & Pain: Gradient Flow</div>
2342
+ The "+1" term acts as a <strong>gradient highway</strong>. Even if the weights in F(x) are small (causing ∂F/∂x → 0), the gradient can still flow through the +1 term. <br>
2343
+ This prevents the gradient from vanishing even in networks with 1000+ layers!
2344
+ </div>
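A toy numeric sketch of the effect (the layer count and the 0.1 local derivative are arbitrary illustrative values, not from the paper):

```python
# Toy gradient flow through 20 layers, each with local derivative dF/dx = 0.1
layers, local_grad = 20, 0.1

plain = local_grad ** layers            # plain chain: product of small terms
residual = (1 + local_grad) ** layers   # residual chain: each factor is dF/dx + 1

print(plain)     # ~1e-20: effectively vanished
print(residual)  # ~6.7: a usable gradient survives
```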
2345
+ `,
2346
+ applications: `
2347
+ <div class="info-box">
2348
+ <div class="box-title">🏗️ Modern Vision Backbones</div>
2349
+ <div class="box-content">ResNet is the default starting point for nearly all computer vision tasks today (Mask R-CNN, YOLO, etc.).</div>
2350
+ </div>
2351
  `
2352
  },
2353
  "inception": {
 
2359
  <div class="formula">
2360
  Input → [1×1 conv] ⊕ [3×3 conv] ⊕ [5×5 conv] ⊕ [3×3 pool] → Concatenate
2361
  </div>
2362
+ `,
2363
+ concepts: `
2364
+ <h3>Core Innovations</h3>
2365
+ <div class="list-item">
2366
+ <div class="list-num">01</div>
2367
+ <div><strong>1×1 Bottlenecks:</strong> Dimensionality reduction before expensive convolutions.</div>
2368
+ </div>
2369
+ <div class="list-item">
2370
+ <div class="list-num">02</div>
2371
+ <div><strong>Auxiliary Classifiers:</strong> Used during training to combat gradient vanishing in middle layers.</div>
2372
+ </div>
2373
+ `,
2374
+ math: `
2375
+ <h3>1×1 Convolution Math (Network-in-Network)</h3>
2376
+ <p>A 1×1 convolution acts like a channel-wise MLP. It maps input channels C to output channels C' using 1×1×C parameters per filter.</p>
2377
  <div class="callout insight">
2378
+ <div class="callout-title">📝 Paper & Pain: Compression</div>
2379
+ Input: 28x28x256 | Target: 28x28x512 with 3x3 Filters.<br>
2380
+ <strong>Direct:</strong> 512 × (3×3×256) ≈ 1.18 Million params.<br>
2381
+ <strong>Inception (1x1 bottleneck to 64):</strong><br>
2382
+ Step 1 (1x1): 64 × (1×1×256) ≈ 16k params.<br>
2383
+ Step 2 (3x3): 512 × (3×3×64) ≈ 295k params.<br>
2384
+ <strong>Total:</strong> ~311k params. <strong>~3.8× reduction in parameters!</strong>
2385
+ </div>
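The bottleneck arithmetic above, spelled out in Python (a minimal sketch; biases omitted, variable names illustrative):

```python
# 28x28x256 -> 28x28x512 with 3x3 filters (biases omitted)
c_in, c_out, bottleneck = 256, 512, 64

direct = c_out * (3 * 3 * c_in)                              # fat 3x3 conv
squeezed = bottleneck * c_in + c_out * (3 * 3 * bottleneck)  # 1x1 then 3x3
print(direct, squeezed)             # 1179648 311296
print(round(direct / squeezed, 1))  # 3.8 -> ~3.8x fewer parameters
```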
2386
+ `,
2387
+ applications: `
2388
+ <div class="info-box">
2389
+ <div class="box-title">🏎️ Computational Efficiency</div>
2390
+ <div class="box-content">Inception designs are optimized for running deep networks on limited compute budgets.</div>
2391
  </div>
2392
  `
2393
  },
 
2427
  • Latency-critical systems<br>
2428
  • Good accuracy with 10-20× speedup
2429
  </div>
2430
+ `,
2431
+ concepts: `
2432
+ <h3>Efficiency Factors</h3>
2433
+ <div class="list-item">
2434
+ <div class="list-num">01</div>
2435
+ <div><strong>Width Multiplier (α):</strong> Thins the network by reducing channels.</div>
2436
+ </div>
2437
+ <div class="list-item">
2438
+ <div class="list-num">02</div>
2439
+ <div><strong>Resolution Multiplier (ρ):</strong> Reduces input image size.</div>
2440
+ </div>
2441
+ `,
2442
+ math: `
2443
+ <h3>Depthwise Separable Math</h3>
2444
+ <p>Standard convolution complexity: F² × C_in × C_out × H × W</p>
2445
+ <p>Separable complexity: (F² × C_in + C_in × C_out) × H × W</p>
2446
+ <div class="callout insight">
2447
+ <div class="callout-title">📝 Paper & Pain: The 9× Speedup</div>
2448
+ Reduction ratio is roughly: 1/C_out + 1/F². <br>
2449
+ For 3x3 filters (F=3): Reduction is roughly <strong>1/9th</strong> the computation of standard conv!
2450
+ </div>
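The reduction ratio above, evaluated for concrete sizes (F = 3 with 256 channels in and out is an arbitrary example):

```python
# Multiply-accumulate cost per output position (the H x W factor cancels)
f, c_in, c_out = 3, 256, 256

standard = f * f * c_in * c_out           # dense 3x3 convolution
separable = f * f * c_in + c_in * c_out   # depthwise 3x3 + pointwise 1x1
ratio = separable / standard
print(round(ratio, 4))  # 0.115, matching 1/F^2 + 1/C_out = 1/9 + 1/256
```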
2451
+ `,
2452
+ applications: `
2453
+ <div class="info-box">
2454
+ <div class="box-title">📱 Edge Devices</div>
2455
+ <div class="box-content">Real-time object detection on smartphones, web browsers (TensorFlow.js), and IoT devices.</div>
2456
+ </div>
2457
  `
2458
  },
2459
  "transfer-learning": {