Aashish34 committed on
Commit
6b48067
·
1 Parent(s): 6d6a6aa

add deeplearning

Files changed (2)
  1. DeepLearning/Deep Learning Curriculum.html +1649 -95
  2. README.md +18 -0
DeepLearning/Deep Learning Curriculum.html CHANGED
@@ -880,6 +880,52 @@
880
  • Try <strong>Leaky ReLU</strong> or <strong>ELU</strong> if ReLU neurons are dying<br>
881
  • Avoid Sigmoid/Tanh in deep networks (gradient vanishing)
882
  </div>
883
  `
884
  },
885
  "conv-layer": {
@@ -1214,116 +1260,1624 @@
1214
  <strong>CodeGen, AlphaCode:</strong> Automated coding, bug detection
1215
  </div>
1216
  </div>
1217
- `
1218
- }
1219
- };
1220
-
1221
- function createModuleHTML(module) {
1222
- const content = MODULE_CONTENT[module.id] || {};
 
 
1223
 
1224
- return `
1225
- <div class="module" id="${module.id}-module">
1226
- <button class="btn-back" onclick="switchTo('dashboard')">← Back to Dashboard</button>
1227
- <header>
1228
- <h1>${module.icon} ${module.title}</h1>
1229
- <p class="subtitle">${module.description}</p>
1230
- </header>
1231
 
1232
- <div class="tabs">
1233
- <button class="tab-btn active" onclick="switchTab(event, '${module.id}-overview')">Overview</button>
1234
- <button class="tab-btn" onclick="switchTab(event, '${module.id}-concepts')">Key Concepts</button>
1235
- <button class="tab-btn" onclick="switchTab(event, '${module.id}-visualization')">📊 Visualization</button>
1236
- <button class="tab-btn" onclick="switchTab(event, '${module.id}-math')">Math</button>
1237
- <button class="tab-btn" onclick="switchTab(event, '${module.id}-applications')">Applications</button>
1238
- <button class="tab-btn" onclick="switchTab(event, '${module.id}-summary')">Summary</button>
1239
  </div>
1240
 
1241
- <div id="${module.id}-overview" class="tab active">
1242
- <div class="section">
1243
- <h2>📖 Overview</h2>
1244
- ${content.overview || `
1245
- <p>Complete coverage of ${module.title.toLowerCase()}. Learn the fundamentals, mathematics, real-world applications, and implementation details.</p>
1246
- <div class="info-box">
1247
- <div class="box-title">Learning Objectives</div>
1248
- <div class="box-content">
1249
- ✓ Understand core concepts and theory<br>
1250
- ✓ Master mathematical foundations<br>
1251
- ✓ Learn practical applications<br>
1252
- ✓ Implement and experiment
1253
- </div>
1254
- </div>
1255
- `}
1256
  </div>
1257
  </div>
1258
-
1259
- <div id="${module.id}-concepts" class="tab">
1260
- <div class="section">
1261
- <h2>🎯 Key Concepts</h2>
1262
- ${content.concepts || `
1263
- <p>Fundamental concepts and building blocks for ${module.title.toLowerCase()}.</p>
1264
- <div class="callout insight">
1265
- <div class="callout-title">💡 Main Ideas</div>
1266
- This section covers the core ideas you need to understand before diving into mathematics.
1267
- </div>
1268
- `}
1269
  </div>
1270
  </div>
1271
 
1272
- <div id="${module.id}-visualization" class="tab">
1273
- <div class="section">
1274
- <h2>📊 Interactive Visualization</h2>
1275
- <p>Visual representation to help understand ${module.title.toLowerCase()} concepts intuitively.</p>
1276
- <div id="${module.id}-viz" class="viz-container">
1277
- <canvas id="${module.id}-canvas" width="800" height="400" style="border: 1px solid rgba(0, 212, 255, 0.3); border-radius: 8px; background: rgba(0, 212, 255, 0.02);"></canvas>
1278
- </div>
1279
- <div class="viz-controls">
1280
- <button onclick="drawVisualization('${module.id}')" class="btn-viz">🔄 Refresh Visualization</button>
1281
- <button onclick="toggleVizAnimation('${module.id}')" class="btn-viz">▶️ Animate</button>
1282
- <button onclick="downloadViz('${module.id}')" class="btn-viz">⬇️ Save Image</button>
1283
- </div>
1284
- </div>
1285
  </div>
1286
 
1287
- <div id="${module.id}-math" class="tab">
1288
- <div class="section">
1289
- <h2>πŸ“ Mathematical Foundation</h2>
1290
- <p>Rigorous mathematical treatment of ${module.title.toLowerCase()}.</p>
1291
- <div class="formula">
1292
- Mathematical formulas and derivations go here
1293
- </div>
1294
  </div>
1295
- <div class="section">
1296
- <h2>📊 Mathematical Visualization</h2>
1297
- <div id="${module.id}-math-viz" class="viz-container">
1298
- <canvas id="${module.id}-math-canvas" width="800" height="400" style="border: 1px solid rgba(0, 212, 255, 0.3); border-radius: 8px; background: rgba(0, 212, 255, 0.02);"></canvas>
1299
- </div>
1300
- <div class="viz-controls">
1301
- <button onclick="drawMathVisualization('${module.id}')" class="btn-viz">🔄 Visualize Equations</button>
1302
- </div>
1303
  </div>
1304
  </div>
1305
 
1306
- <div id="${module.id}-applications" class="tab">
1307
- <div class="section">
1308
- <h2>🌍 Real-World Applications</h2>
1309
- ${content.applications || `
1310
- <p>How ${module.title.toLowerCase()} is used in practice across different industries.</p>
1311
- <div class="info-box">
1312
- <div class="box-title">Use Cases</div>
1313
- <div class="box-content">
1314
- Common applications and practical examples
1315
- </div>
1316
- </div>
1317
- `}
1318
- </div>
1319
- <div class="section">
1320
- <h2>📊 Application Scenarios Visualization</h2>
1321
- <div id="${module.id}-app-viz" class="viz-container">
1322
- <canvas id="${module.id}-app-canvas" width="800" height="400" style="border: 1px solid rgba(0, 212, 255, 0.3); border-radius: 8px; background: rgba(0, 212, 255, 0.02);"></canvas>
1323
- </div>
1324
- <div class="viz-controls">
1325
- <button onclick="drawApplicationVisualization('${module.id}')" class="btn-viz">🔄 Show Applications</button>
1326
- </div>
1327
  </div>
1328
  </div>
1329
 
 
880
  • Try <strong>Leaky ReLU</strong> or <strong>ELU</strong> if ReLU neurons are dying<br>
881
  • Avoid Sigmoid/Tanh in deep networks (gradient vanishing)
882
  </div>
883
+ `,
884
+ applications: `
885
+ <div class="info-box">
886
+ <div class="box-title">🧠 Neural Network Design</div>
887
+ <div class="box-content">
888
+ Critical choice for every neural network - affects training speed, convergence, and final accuracy
889
+ </div>
890
+ </div>
891
+ <div class="info-box">
892
+ <div class="box-title">🎯 Task-Specific Selection</div>
893
+ <div class="box-content">
894
+ Different tasks need different outputs: Sigmoid for binary, Softmax for multi-class, Linear for regression
895
+ </div>
896
+ </div>
897
+ `,
898
+ math: `
899
+ <h3>Derivatives: The Backprop Fuel</h3>
900
+ <p>Activation functions must be differentiable for backpropagation to work. Let's look at the derivatives on paper:</p>
901
+
902
+ <div class="list-item">
903
+ <div class="list-num">01</div>
904
+ <div><strong>Sigmoid:</strong> σ(z) = 1 / (1 + e⁻ᶻ)<br>
905
+ <strong>Derivative:</strong> σ'(z) = σ(z)(1 - σ(z))<br>
906
+ <span class="formula-caption">Max gradient is 0.25 (at z=0). This is why gradients vanish in deep networks!</span></div>
907
+ </div>
908
+
909
+ <div class="list-item">
910
+ <div class="list-num">02</div>
911
+ <div><strong>Tanh:</strong> tanh(z) = (eᶻ - e⁻ᶻ) / (eᶻ + e⁻ᶻ)<br>
912
+ <strong>Derivative:</strong> tanh'(z) = 1 - tanh²(z)<br>
913
+ <span class="formula-caption">Max gradient is 1.0 (at z=0). Better than Sigmoid, but still vanishes.</span></div>
914
+ </div>
915
+
916
+ <div class="list-item">
917
+ <div class="list-num">03</div>
918
+ <div><strong>ReLU:</strong> max(0, z)<br>
919
+ <strong>Derivative:</strong> 1 if z > 0, else 0<br>
920
+ <span class="formula-caption">Gradient is 1.0 for all positive z. No vanishing! But 0 for negative (Dying ReLU).</span></div>
921
+ </div>
922
+
923
+ <div class="callout insight">
924
+ <div class="callout-title">📝 Paper & Pain: The Chain Effect</div>
925
+ Each layer multiplies the gradient by σ'(z).<br>
926
+ For 10 Sigmoid layers: Total gradient ≈ (0.25)¹⁰ ≈ <strong>0.00000095</strong><br>
927
+ This is the mathematical proof of the Vanishing Gradient Problem!
928
+ </div>
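The chain-effect numbers in the callout above can be checked directly. A minimal sketch (NumPy assumed, function names are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    # σ'(z) = σ(z)(1 - σ(z)), as in the derivative table above
    s = sigmoid(z)
    return s * (1.0 - s)

# Maximum gradient of sigmoid, reached at z = 0
print(sigmoid_prime(0.0))        # 0.25
# Ten stacked sigmoid layers, each at the *best case* z = 0
print(sigmoid_prime(0.0) ** 10)  # ~9.5e-07: the vanishing gradient
```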
929
  `
930
  },
931
  "conv-layer": {
 
1260
  <strong>CodeGen, AlphaCode:</strong> Automated coding, bug detection
1261
  </div>
1262
  </div>
1263
+ `,
1264
+ math: `
1265
+ <h3>Scaled Dot-Product Attention</h3>
1266
+ <p>The "heart" of the Transformer. It computes how much "attention" to pay to different parts of the input sequence.</p>
1267
+
1268
+ <div class="formula" style="font-size: 1.3rem; text-align: center; margin: 20px 0; background: rgba(0, 212, 255, 0.05); padding: 20px; border-radius: 8px;">
1269
+ Attention(Q, K, V) = softmax( (QKᵀ) / √dₖ ) V
1270
+ </div>
1271

1272
+ <h3>Step-by-Step Derivation</h3>
1273
+ <div class="list-item">
1274
+ <div class="list-num">01</div>
1275
+ <div><strong>Dot Product (QKᵀ):</strong> Compute raw similarity scores between Queries (what we want) and Keys (what we have)</div>
1276
+ </div>
1277
+ <div class="list-item">
1278
+ <div class="list-num">02</div>
1279
+ <div><strong>Scaling (1/√dₖ):</strong> Divide by square root of key dimension. <strong>Why?</strong> With high dimensions, dot products grow large, pushing softmax into regions with vanishing gradients. Scaling prevents this.</div>
1280
+ </div>
1281
+ <div class="list-item">
1282
+ <div class="list-num">03</div>
1283
+ <div><strong>Softmax:</strong> Convert similarity scores into probabilities (attention weights) that sum to 1</div>
1284
+ </div>
1285
+ <div class="list-item">
1286
+ <div class="list-num">04</div>
1287
+ <div><strong>Weighted Sum (×V):</strong> Use attention weights to pull information from Values.</div>
1288
+ </div>
1289

1290
+ <div class="callout insight">
1291
+ <div class="callout-title">📝 Paper & Pain: Multi-Head Attention</div>
1292
+ Instead of one big attention, we split Q, K, V into <em>h</em> heads:<br>
1293
+ 1. Heads learn <strong>different aspects</strong> (e.g., syntax vs semantics)<br>
1294
+ 2. Concat all heads: MultiHead = Concat(head₁, ..., headₕ)Wᴼ<br>
1295
+ 3. Complexity: <strong>O(n² · d)</strong> - This is why long sequences are hard!

1296
  </div>
1297

1298
+ <div class="callout warning">
1299
+ <div class="callout-title">📝 Sinusoidal Positional Encoding</div>
1300
+ PE(pos, 2i) = sin(pos / 10000^{2i/d})<br>
1301
+ PE(pos, 2i+1) = cos(pos / 10000^{2i/d})<br>
1302
+ This allows the model to learn relative positions since PE(pos+k) is a linear function of PE(pos).
1303
+ </div>
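The four attention steps can be sketched in a few lines of NumPy. The shapes and random inputs below are illustrative assumptions, not part of any specific model:

```python
import numpy as np

def attention(Q, K, V):
    # Q: [n_q, d_k], K: [n_k, d_k], V: [n_k, d_v]
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # steps 1-2: similarity, then scaling
    # step 3: row-wise softmax (max subtracted for numerical stability)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V               # step 4: weighted sum of Values

rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))
K = rng.normal(size=(5, 4))
V = rng.normal(size=(5, 2))
out = attention(Q, K, V)
print(out.shape)  # (3, 2): one d_v-dimensional output per query
```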
1304
+ `
1305
+ },
1306
+ "perceptron": {
1307
+ overview: `
1308
+ <h3>What is a Perceptron?</h3>
1309
+ <p>The perceptron is the simplest neural network, invented in 1958. It's a binary linear classifier that makes predictions based on weighted inputs.</p>
1310
+
1311
+ <div class="callout tip">
1312
+ <div class="callout-title">✅ Advantages</div>
1313
+ • Simple and fast<br>
1314
+ • Guaranteed convergence for linearly separable data<br>
1315
+ • Interpretable weights
1316
+ </div>
1317
+
1318
+ <div class="callout warning">
1319
+ <div class="callout-title">⚠️ Key Limitation</div>
1320
+ <strong>Cannot solve XOR:</strong> Limited to linear decision boundaries only
1321
+ </div>
1322
+ `,
1323
+ concepts: `
1324
+ <h3>How Perceptron Works</h3>
1325
+ <div class="list-item">
1326
+ <div class="list-num">01</div>
1327
+ <div><strong>Weighted Sum:</strong> z = w₁x₁ + w₂x₂ + ... + b</div>
1328
+ </div>
1329
+ <div class="list-item">
1330
+ <div class="list-num">02</div>
1331
+ <div><strong>Step Function:</strong> Output = 1 if z ≥ 0, else 0</div>
1332
+ </div>
1333
+ <div class="formula">
1334
+ Learning Rule: w_new = w_old + α(y_true - y_pred)x
1335
+ </div>
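The learning rule above can be run on a small linearly separable problem. A minimal sketch on AND (learning rate and epoch count are illustrative):

```python
import numpy as np

# AND truth table: linearly separable, so the perceptron is guaranteed to converge
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
w, b, lr = np.zeros(2), 0.0, 0.1

for _ in range(20):  # a few epochs suffice here
    for xi, yi in zip(X, y):
        pred = 1 if xi @ w + b >= 0 else 0  # step function
        w += lr * (yi - pred) * xi          # w_new = w_old + α(y_true - y_pred)x
        b += lr * (yi - pred)

preds = [1 if xi @ w + b >= 0 else 0 for xi in X]
print(preds)  # [0, 0, 0, 1]
```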
1336
+ `,
1337
+ applications: `
1338
+ <div class="info-box">
1339
+ <div class="box-title">📚 Educational</div>
1340
+ <div class="box-content">
1341
+ Historical importance - first trainable neural model. Perfect for teaching ML fundamentals
1342
  </div>
1343
  </div>
1344
+ <div class="info-box">
1345
+ <div class="box-title">🔬 Simple Classification</div>
1346
+ <div class="box-content">
1347
+ Linearly separable problems: basic pattern recognition, simple binary decisions
1348
  </div>
1349
  </div>
1350
+ `
1351
+ },
1352
+ "mlp": {
1353
+ overview: `
1354
+ <h3>Multi-Layer Perceptron (MLP)</h3>
1355
+ <p>MLP adds hidden layers between input and output, enabling non-linear decision boundaries and solving the XOR problem that single perceptrons cannot.</p>
1356
+
1357
+ <h3>Why MLPs?</h3>
1358
+ <ul>
1359
+ <li><strong>Universal Approximation:</strong> Can approximate any continuous function</li>
1360
+ <li><strong>Non-Linear Learning:</strong> Solves complex problems</li>
1361
+ <li><strong>Feature Extraction:</strong> Hidden layers learn hierarchical features</li>
1362
+ </ul>
1363
+
1364
+ <div class="callout insight">
1365
+ <div class="callout-title">πŸ’‘ The XOR Breakthrough</div>
1366
+ Single perceptron: Cannot solve XOR<br>
1367
+ MLP with 1 hidden layer (2 neurons): Solves XOR!<br>
1368
+ This proves the power of depth.
1369
+ </div>
1370
+ `,
1371
+ concepts: `
1372
+ <h3>Architecture Components</h3>
1373
+ <div class="list-item">
1374
+ <div class="list-num">01</div>
1375
+ <div><strong>Input Layer:</strong> Raw features (no computation)</div>
1376
+ </div>
1377
+ <div class="list-item">
1378
+ <div class="list-num">02</div>
1379
+ <div><strong>Hidden Layers:</strong> Extract progressively abstract features</div>
1380
+ </div>
1381
+ <div class="list-item">
1382
+ <div class="list-num">03</div>
1383
+ <div><strong>Output Layer:</strong> Final predictions</div>
1384
+ </div>
1385
+ `,
1386
+ applications: `
1387
+ <div class="info-box">
1388
+ <div class="box-title">πŸ“Š Tabular Data</div>
1389
+ <div class="box-content">Credit scoring, fraud detection, customer churn, sales forecasting</div>
1390
+ </div>
1391
+ <div class="info-box">
1392
+ <div class="box-title">🏭 Manufacturing</div>
1393
+ <div class="box-content">Quality control, predictive maintenance, demand forecasting</div>
1394
+ </div>
1395
+ `,
1396
+ math: `
1397
+ <h3>Neural Network Forward Pass (Matrix Form)</h3>
1398
+ <p>Vectorization is key to modern deep learning. We process entire layers as matrix multiplications.</p>
1399
+
1400
+ <div class="formula">
1401
+ Layer 1: z⁽¹⁾ = W⁽¹⁾x + b⁽¹⁾ | a⁽¹⁾ = Οƒ(z⁽¹⁾)<br>
1402
+ Layer 2: z⁽²⁾ = W⁽²⁾a⁽¹⁾ + b⁽²⁾ | a⁽²⁾ = Οƒ(z⁽²⁾)<br>
1403
+ ...<br>
1404
+ Layer L: ŷ = Softmax(W⁽ᴸ⁾a⁽ᴸ⁻¹⁾ + b⁽ᴸ⁾)
1405
+ </div>
1406
 
1407
+ <h3>Paper & Pain: Dimensionality Audit</h3>
1408
+ <p>Understanding tensor shapes is the #1 skill for debugging neural networks.</p>
1409
+ <div class="list-item">
1410
+ <div class="list-num">01</div>
1411
+ <div><strong>Input x:</strong> [n_features, 1]</div>
1412
+ </div>
1413
+ <div class="list-item">
1414
+ <div class="list-num">02</div>
1415
+ <div><strong>Weights W⁽¹⁾:</strong> [n_hidden, n_features]</div>
1416
+ </div>
1417
+ <div class="list-item">
1418
+ <div class="list-num">03</div>
1419
+ <div><strong>Bias b⁽¹⁾:</strong> [n_hidden, 1]</div>
1420
  </div>
1421
 
1422
+ <div class="callout insight">
1423
+ <div class="callout-title">πŸ“ Paper & Pain: Solving XOR</div>
1424
+ Input: [0,1], Target: 1<br>
1425
+ Layer 1 (2 neurons):<br>
1426
+ z₁ = 10x₁ + 10xβ‚‚ - 5 &nbsp; | &nbsp; a₁ = Οƒ(z₁)<br>
1427
+ zβ‚‚ = 10x₁ + 10xβ‚‚ - 15 | &nbsp; aβ‚‚ = Οƒ(zβ‚‚)<br>
1428
+ Layer 2 (1 neuron):<br>
1429
+ y = Οƒ(20a₁ - 20aβ‚‚ - 10)<br>
1430
+ <strong>Try it on paper!</strong> This specific configuration correctly outputs XOR values.
1431
+ </div>
1432
+ `
1433
+ },
1434
+ "weight-init": {
1435
+ overview: `
1436
+ <h3>Weight Initialization Strategies</h3>
1437
+ <table>
1438
+ <tr>
1439
+ <th>Method</th>
1440
+ <th>Best For</th>
1441
+ <th>Formula</th>
1442
+ </tr>
1443
+ <tr>
1444
+ <td>Xavier/Glorot</td>
1445
+ <td>Sigmoid, Tanh</td>
1446
+ <td>N(0, √(2/(n_in+n_out)))</td>
1447
+ </tr>
1448
+ <tr>
1449
+ <td>He/Kaiming</td>
1450
+ <td>ReLU</td>
1451
+ <td>N(0, √(2/n_in))</td>
1452
+ </tr>
1453
+ </table>
1454
+
1455
+ <div class="callout warning">
1456
+ <div class="callout-title">⚠️ Never Initialize to Zero!</div>
1457
+ All neurons learn identical features (symmetry problem)
1458
+ </div>
1459
+ `,
1460
+ concepts: `
1461
+ <h3>Key Principles</h3>
1462
+ <div class="list-item">
1463
+ <div class="list-num">01</div>
1464
+ <div><strong>Variance Preservation:</strong> Keep activation variance similar across layers</div>
1465
+ </div>
1466
+ <div class="list-item">
1467
+ <div class="list-num">02</div>
1468
+ <div><strong>Symmetry Breaking:</strong> Different weights force different features</div>
1469
+ </div>
1470
+ `,
1471
+ applications: `
1472
+ <div class="info-box">
1473
+ <div class="box-title">🎯 Critical for Deep Networks</div>
1474
+ <div class="box-content">
1475
+ Proper initialization is essential for training networks >10 layers. Wrong init = training failure
1476
  </div>
1477
+ </div>
1478
+ <div class="info-box">
1479
+ <div class="box-title">⚑ Faster Convergence</div>
1480
+ <div class="box-content">
1481
+ Good initialization reduces training time by 2-10Γ—, especially with modern optimizers
1482
  </div>
1483
  </div>
1484
+ `,
1485
+ math: `
1486
+ <h3>The Variance Preservation Principle</h3>
1487
+ <p>To prevent gradients from vanishing or exploding, we want the variance of the activations to remain constant across layers.</p>
1488
+
1489
+ <div class="formula">
1490
+ For a linear layer: y = Σ wᵢxᵢ<br>
1491
+ Var(y) = Var(Σ wᵢxᵢ) = Σ Var(wᵢxᵢ)<br>
1492
+ Assuming w and x are independent with mean 0:<br>
1493
+ Var(wᵢxᵢ) = E[wᵢ²]E[xᵢ²] - E[wᵢ]²E[xᵢ]² = Var(wᵢ)Var(xᵢ)<br>
1494
+ So, Var(y) = n_in × Var(w) × Var(x)
1495
+ </div>
1496

1497
+ <h3>1. Xavier (Glorot) Initialization</h3>
1498
+ <p>Goal: Var(y) = Var(x) and Var(grad_out) = Var(grad_in)</p>
1499
+ <div class="list-item">
1500
+ <div class="list-num">01</div>
1501
+ <div><strong>Forward Pass:</strong> n_in × Var(w) = 1 ⇒ Var(w) = 1/n_in</div>
1502
+ </div>
1503
+ <div class="list-item">
1504
+ <div class="list-num">02</div>
1505
+ <div><strong>Backward Pass:</strong> n_out × Var(w) = 1 ⇒ Var(w) = 1/n_out</div>
1506
+ </div>
1507
+ <div class="list-item">
1508
+ <div class="list-num">03</div>
1509
+ <div><strong>Compromise:</strong> Var(w) = 2 / (n_in + n_out)</div>
1510
+ </div>
1511
+
1512
+ <h3>2. He (Kaiming) Initialization</h3>
1513
+ <p>For ReLU activation, half the neurons are inactive (output 0), which halves the variance. We must compensate.</p>
1514
+ <div class="formula">
1515
+ Var(ReLU(y)) = 1/2 × Var(y)<br>
1516
+ To keep Var(ReLU(y)) = Var(x):<br>
1517
+ 1/2 × n_in × Var(w) = 1<br>
1518
+ <strong>Var(w) = 2 / n_in</strong>
1519
+ </div>
1520
+
1521
+ <div class="callout insight">
1522
+ <div class="callout-title">📝 Paper & Pain Calculation</div>
1523
+ If n_in = 256 and you use ReLU:<br>
1524
+ Weight Std Dev = √(2/256) = √(1/128) ≈ <strong>0.088</strong><br>
1525
+ Initializing with std=1.0 or std=0.01 would cause immediate failure in a deep net!
1526
+ </div>
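The failure mode above is easy to see experimentally. A sketch that pushes random data through 20 ReLU layers (n_in = 256 as in the calculation; depth, sample count, and seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, depth = 256, 20
x = rng.normal(size=(n, 100))  # 100 random input samples

results = {}
for name, std in [("he", np.sqrt(2.0 / n)), ("naive", 0.01)]:
    a = x
    for _ in range(depth):
        W = rng.normal(scale=std, size=(n, n))
        a = np.maximum(0.0, W @ a)  # ReLU layer
    results[name] = float(a.std())

print(results)
# He init (std ≈ 0.088) keeps activations at a healthy scale;
# std = 0.01 collapses them toward 0 layer after layer.
```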
1527
+ `
1528
+ },
1529
+ "loss": {
1530
+ overview: `
1531
+ <h3>Loss Functions Guide</h3>
1532
+ <table>
1533
+ <tr>
1534
+ <th>Task</th>
1535
+ <th>Loss Function</th>
1536
+ </tr>
1537
+ <tr>
1538
+ <td>Binary Classification</td>
1539
+ <td>Binary Cross-Entropy</td>
1540
+ </tr>
1541
+ <tr>
1542
+ <td>Multi-class</td>
1543
+ <td>Categorical Cross-Entropy</td>
1544
+ </tr>
1545
+ <tr>
1546
+ <td>Regression</td>
1547
+ <td>MSE or MAE</td>
1548
+ </tr>
1549
+ </table>
1550
+ `,
1551
+ concepts: `
1552
+ <h3>Common Loss Functions</h3>
1553
+ <div class="list-item">
1554
+ <div class="list-num">01</div>
1555
+ <div><strong>MSE:</strong> (1/n)Σ(y - ŷ)² - Penalizes large errors</div>
1556
+ </div>
1557
+ <div class="list-item">
1558
+ <div class="list-num">02</div>
1559
+ <div><strong>Cross-Entropy:</strong> -Σ(y·log(ŷ)) - For classification</div>
1560
+ </div>
1561
+ `,
1562
+ applications: `
1563
+ <div class="info-box">
1564
+ <div class="box-title">🎯 Task-Dependent Selection</div>
1565
+ <div class="box-content">
1566
+ Every ML task needs appropriate loss: classification (cross-entropy), regression (MSE/MAE), ranking (triplet loss)
1567
+ </div>
1568
+ </div>
1569
+ <div class="info-box">
1570
+ <div class="box-title">📊 Custom Losses</div>
1571
+ <div class="box-content">
1572
+ Business-specific objectives: Focal Loss (imbalanced data), Dice Loss (segmentation), Contrastive Loss (similarity learning)
1573
+ </div>
1574
+ </div>
1575
+ `,
1576
+ math: `
1577
+ <h3>Binary Cross-Entropy (BCE) Derivation</h3>
1578
+ <p>Why do we use logs? BCE is derived from Maximum Likelihood Estimation (MLE) assuming a Bernoulli distribution.</p>
1579
+
1580
+ <div class="formula">
1581
+ L(ŷ, y) = -(y log(ŷ) + (1-y) log(1-ŷ))
1582
+ </div>
1583
+
1584
+ <h3>Paper & Pain: Why not MSE for Classification?</h3>
1585
+ <p>If we use MSE for a sigmoid output, the gradient is:</p>
1586
+ <div class="formula">
1587
+ ∂L/∂w = (ŷ - y) <strong>σ'(z)</strong> x
1588
+ </div>
1589
+ <div class="callout warning">
1590
+ <div class="callout-title">⚠️ The Saturation Problem</div>
1591
+ If the model is very wrong (e.g., target 1, output 0.001), σ'(z) is near 0.<br>
1592
+ The gradient vanishes, and the model <strong>stops learning!</strong>
1593
+ </div>
1594
+
1595
+ <h3>The BCE Advantage</h3>
1596
+ <p>When using BCE, the σ'(z) term cancels out! The gradient becomes:</p>
1597
+ <div class="formula" style="font-size: 1.2rem; color: #00d4ff;">
1598
+ ∂L/∂w = (ŷ - y) x
1599
+ </div>
1600
+ <div class="list-item">
1601
+ <div class="list-num">💡</div>
1602
+ <div>This is beautiful: the gradient depends <strong>only on the error</strong> (ŷ-y), not on how saturated the neuron is. This enables much faster training.</div>
1603
+ </div>
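The two gradients can be compared numerically for exactly the saturated case described above. A sketch for one sigmoid neuron with x = 1 (the specific z value is an illustrative assumption):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = -7.0                       # badly wrong: ŷ = σ(-7) ≈ 0.0009, target is 1
y_hat, y, x = sigmoid(z), 1.0, 1.0

mse_grad = (y_hat - y) * y_hat * (1 - y_hat) * x  # (ŷ - y)·σ'(z)·x
bce_grad = (y_hat - y) * x                        # σ'(z) has cancelled out
print(mse_grad)  # ≈ -0.0009: vanished, almost no learning signal
print(bce_grad)  # ≈ -0.999: the full error drives the update
```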
1604
+ `
1605
+ },
1606
+ "optimizers": {
1607
+ overview: `
1608
+ <h3>Optimizer Selection Guide</h3>
1609
+ <table>
1610
+ <tr>
1611
+ <th>Optimizer</th>
1612
+ <th>When to Use</th>
1613
+ </tr>
1614
+ <tr>
1615
+ <td>Adam/AdamW</td>
1616
+ <td><strong>Default choice</strong> - works 90% of time</td>
1617
+ </tr>
1618
+ <tr>
1619
+ <td>SGD + Momentum</td>
1620
+ <td>CNNs (better final accuracy with patience)</td>
1621
+ </tr>
1622
+ <tr>
1623
+ <td>RMSprop</td>
1624
+ <td>RNNs</td>
1625
+ </tr>
1626
+ </table>
1627
+
1628
+ <div class="formula">
1629
+ Adam: m_t = β₁·m + (1-β₁)·∇L<br>
1630
+ v_t = β₂·v + (1-β₂)·(∇L)²<br>
1631
+ w = w - α·m_t/(√(v_t) + ε)
1632
+ </div>
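A minimal single-parameter sketch of that update, minimizing f(w) = w². Note this follows the simplified form above: ε is the usual small stability constant, and the bias-correction terms of full Adam are omitted here. Hyperparameters and iteration count are illustrative:

```python
import numpy as np

def adam_step(w, grad, m, v, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * grad           # first-moment (momentum) estimate
    v = b2 * v + (1 - b2) * grad ** 2      # second-moment (scale) estimate
    w = w - lr * m / (np.sqrt(v) + eps)    # adaptive update
    return w, m, v

w, m, v = 5.0, 0.0, 0.0
for _ in range(2000):
    grad = 2 * w            # ∇f for f(w) = w²
    w, m, v = adam_step(w, grad, m, v)
print(w)  # small: oscillates near the minimum at 0
```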
1633
+ `,
1634
+ concepts: `
1635
+ <h3>Optimizer Evolution</h3>
1636
+ <div class="list-item">
1637
+ <div class="list-num">01</div>
1638
+ <div><strong>SGD:</strong> Simple but requires careful learning rate tuning</div>
1639
+ </div>
1640
+ <div class="list-item">
1641
+ <div class="list-num">02</div>
1642
+ <div><strong>Adam:</strong> Adaptive rates + momentum = works out-of-box</div>
1643
+ </div>
1644
+ `,
1645
+ applications: `
1646
+ <div class="info-box">
1647
+ <div class="box-title">🚀 Training Acceleration</div>
1648
+ <div class="box-content">
1649
+ Modern optimizers (Adam) reduce training time by 5-10× compared to basic SGD
1650
+ </div>
1651
+ </div>
1652
+ <div class="info-box">
1653
+ <div class="box-title">🎯 Architecture-Specific</div>
1654
+ <div class="box-content">
1655
+ CNNs: SGD+Momentum | Transformers: AdamW | RNNs: RMSprop | Default: Adam
1656
+ </div>
1657
+ </div>
1658
+ `
1659
+ },
1660
+ "backprop": {
1661
+ overview: `
1662
+ <h3>Backpropagation Algorithm</h3>
1663
+ <p>Backprop efficiently computes gradients by applying the chain rule from output to input, enabling training of deep networks.</p>
1664
+
1665
+ <h3>Why Backpropagation?</h3>
1666
+ <ul>
1667
+ <li><strong>Efficient:</strong> Computes all gradients in single backward pass</li>
1668
+ <li><strong>Scalable:</strong> Works for networks of any depth</li>
1669
+ <li><strong>Automatic:</strong> Modern frameworks do it automatically</li>
1670
+ </ul>
1671
+ `,
1672
+ concepts: `
1673
+ <div class="formula">
1674
+ Chain Rule:<br>
1675
+ ∂L/∂w = ∂L/∂y × ∂y/∂z × ∂z/∂w<br>
1676
+ <br>
1677
+ For layer l:<br>
1678
+ δˡ = (Wˡ⁺¹)ᵀ δˡ⁺¹ ⊙ σ'(zˡ)<br>
1679
+ ∂L/∂Wˡ = δˡ (aˡ⁻¹)ᵀ
1680
+ </div>
1681
+ `,
1682
+ applications: `
1683
+ <div class="info-box">
1684
+ <div class="box-title">🧠 Universal Training Method</div>
1685
+ <div class="box-content">
1686
+ Every modern neural network uses backprop - from CNNs to Transformers to GANs
1687
+ </div>
1688
+ </div>
1689
+ <div class="info-box">
1690
+ <div class="box-title">🔧 Automatic Differentiation</div>
1691
+ <div class="box-content">
1692
+ PyTorch, TensorFlow implement automatic backprop - you define forward pass, framework does backward
1693
+ </div>
1694
+ </div>
1695
+ `,
1696
+ math: `
1697
+ <h3>The 4 Fundamental Equations of Backprop</h3>
1698
+ <p>Backpropagation is essentially the chain rule applied iteratively. We define the error signal δ = ∂L/∂z.</p>
1699
+
1700
+ <div class="list-item">
1701
+ <div class="list-num">01</div>
1702
+ <div><strong>Error at Output Layer (L):</strong><br>
1703
+ δᴸ = ∇ₐL ⊙ σ'(zᴸ)<br>
1704
+ <span class="formula-caption">Example for MSE: (aᴸ - y) ⊙ σ'(zᴸ)</span></div>
1705
+ </div>
1706
+
1707
+ <div class="list-item">
1708
+ <div class="list-num">02</div>
1709
+ <div><strong>Error at Layer l (Backwards):</strong><br>
1710
+ δˡ = ((Wˡ⁺¹)ᵀ δˡ⁺¹) ⊙ σ'(zˡ)</div>
1711
+ </div>
1712
+
1713
+ <div class="list-item">
1714
+ <div class="list-num">03</div>
1715
+ <div><strong>Gradient w.r.t Bias:</strong><br>
1716
+ ∂L / ∂bˡ = δˡ</div>
1717
+ </div>
1718
+
1719
+ <div class="list-item">
1720
+ <div class="list-num">04</div>
1721
+ <div><strong>Gradient w.r.t Weights:</strong><br>
1722
+ ∂L / ∂Wˡ = δˡ (aˡ⁻¹)ᵀ</div>
1723
+ </div>
1724
+
1725
+ <div class="callout insight">
1726
+ <div class="callout-title">📝 Paper & Pain Walkthrough</div>
1727
+ Suppose a single neuron: z = wx + b, Loss L = (σ(z) - y)²/2<br>
1728
+ 1. <strong>Forward:</strong> z=2, a=σ(2)≈0.88, y=1, L=0.007<br>
1729
+ 2. <strong>Backward:</strong><br>
1730
+ &nbsp;&nbsp;&nbsp;∂L/∂a = (a-y) = -0.12<br>
1731
+ &nbsp;&nbsp;&nbsp;∂a/∂z = σ(z)(1-σ(z)) = 0.88 * 0.12 = 0.1056<br>
1732
+ &nbsp;&nbsp;&nbsp;δ = ∂L/∂z = -0.12 * 0.1056 = -0.01267<br>
1733
+ &nbsp;&nbsp;&nbsp;<strong>∂L/∂w = δ * x</strong> | <strong>∂L/∂b = δ</strong>
1734
+ </div>
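The walkthrough above can be re-run in code (x = 1 is assumed so that z = w·x + b = 2; exact values differ slightly from the hand numbers, which use rounded intermediates):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x, y, z = 1.0, 1.0, 2.0       # single neuron: z = wx + b
a = sigmoid(z)                # ≈ 0.8808
L = 0.5 * (a - y) ** 2        # ≈ 0.0071
dL_da = a - y                 # ≈ -0.1192
da_dz = a * (1 - a)           # σ'(z) ≈ 0.1050
delta = dL_da * da_dz         # δ = ∂L/∂z
dL_dw, dL_db = delta * x, delta
print(round(delta, 4))  # -0.0125
```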
1735
+ `
1736
+ },
1737
+ "regularization": {
1738
+ overview: `
1739
+ <h3>Regularization Techniques</h3>
1740
+ <table>
1741
+ <tr>
1742
+ <th>Method</th>
1743
+ <th>How It Works</th>
1744
+ <th>When to Use</th>
1745
+ </tr>
1746
+ <tr>
1747
+ <td>L2 (Ridge)</td>
1748
+ <td>Adds λΣw² to loss</td>
1749
+ <td>Keeps all features, reduces magnitude</td>
1750
+ </tr>
1751
+ <tr>
1752
+ <td>L1 (Lasso)</td>
1753
+ <td>Adds λΣ|w| to loss</td>
1754
+ <td>Feature selection (zeros out weights)</td>
1755
+ </tr>
1756
+ <tr>
1757
+ <td>Dropout</td>
1758
+ <td>Randomly drops neurons (p=0.5 typical)</td>
1759
+ <td><strong>Most effective for deep networks</strong></td>
1760
+ </tr>
1761
+ <tr>
1762
+ <td>Early Stopping</td>
1763
+ <td>Stop when validation loss increases</td>
1764
+ <td>Prevents overfitting during training</td>
1765
+ </tr>
1766
+ <tr>
1767
+ <td>Data Augmentation</td>
1768
+ <td>Artificially expand dataset</td>
1769
+ <td>Computer vision (rotations, flips, crops)</td>
1770
+ </tr>
1771
+ </table>
1772
+ `,
1773
+ applications: `
1774
+ <div class="info-box">
1775
+ <div class="box-title">🎯 Best Practices</div>
1776
+ <div class="box-content">
1777
+ • Start with Dropout (0.5) for hidden layers<br>
1778
+ • Add L2 if still overfitting (λ=0.01, 0.001)<br>
1779
+ • Always use Early Stopping<br>
1780
+ • Data Augmentation for images
1781
+ </div>
1782
+ </div>
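Dropout, the most effective technique in the table, is usually implemented in its "inverted" form: survivors are rescaled at train time so inference needs no adjustment. A minimal sketch (the drop rate and shapes are illustrative):

```python
import numpy as np

def dropout(a, p=0.5, training=True, rng=np.random.default_rng(0)):
    if not training:
        return a                      # inference: identity, no rescaling needed
    mask = rng.random(a.shape) >= p   # drop each unit with probability p
    return a * mask / (1.0 - p)       # rescale survivors so E[output] = input

a = np.ones((4, 4))
out = dropout(a, p=0.5)
print(out)  # entries are 0.0 (dropped) or 2.0 (kept and rescaled)
```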
1783
+ `
1784
+ },
1785
+ "batch-norm": {
1786
+ overview: `
1787
+ <h3>Batch Normalization</h3>
1788
+ <p>Normalizes layer inputs to have mean=0 and variance=1, stabilizing and accelerating training.</p>
1789
+
1790
+ <div class="callout tip">
1791
+ <div class="callout-title">✅ Benefits</div>
1792
+ • <strong>Faster Training:</strong> Allows higher learning rates<br>
1793
+ • <strong>Reduces Vanishing Gradients:</strong> Better gradient flow<br>
1794
+ • <strong>Regularization Effect:</strong> Adds slight noise<br>
1795
+ • <strong>Less Sensitive to Init:</strong> Reduces initialization impact
1796
+ </div>
1797
+ `,
1798
+ math: `
1799
+ <h3>The 4 Steps of Batch Normalization</h3>
1800
+ <p>Calculated per mini-batch B = {x₁, ..., xβ‚˜}:</p>
1801
+
1802
+ <div class="list-item">
1803
+ <div class="list-num">01</div>
1804
+ <div><strong>Mini-Batch Mean:</strong> ΞΌ_B = (1/m) Ξ£ xα΅’</div>
1805
+ </div>
1806
+ <div class="list-item">
1807
+ <div class="list-num">02</div>
1808
+ <div><strong>Mini-Batch Variance:</strong> σ²_B = (1/m) Ξ£ (xα΅’ - ΞΌ_B)Β²</div>
1809
+ </div>
1810
+ <div class="list-item">
1811
+ <div class="list-num">03</div>
1812
+ <div><strong>Normalize:</strong> xΜ‚α΅’ = (xα΅’ - ΞΌ_B) / √(σ²_B + Ξ΅)</div>
1813
+ </div>
1814
+ <div class="list-item">
1815
+ <div class="list-num">04</div>
1816
+ <div><strong>Scale and Shift:</strong> yα΅’ = Ξ³ xΜ‚α΅’ + Ξ²</div>
1817
+ </div>
1818
+
1819
+ <div class="callout insight">
1820
+ <div class="callout-title">πŸ“ Paper & Pain: Why Ξ³ and Ξ²?</div>
1821
+ If we only normalized to mean 0 and variance 1, we might restrict the representational power of the network. <br>
1822
+ Ξ³ and Ξ² allow the network to <strong>undo</strong> the normalization if that's optimal: <br>
1823
+ If Ξ³ = √(σ² + Ξ΅) and Ξ² = ΞΌ, the layer reproduces its original input exactly!
1824
+ </div>
1825
+ `
1826
+ },
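The four steps above map one-to-one onto code. A minimal NumPy sketch for fully connected activations (training-mode batch statistics only; the running averages used at inference are omitted):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    mu = x.mean(axis=0)                    # 1. mini-batch mean
    var = x.var(axis=0)                    # 2. mini-batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)  # 3. normalize
    return gamma * x_hat + beta            # 4. scale and shift

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(64, 10))

y = batch_norm(x, gamma=np.ones(10), beta=np.zeros(10))
print(y.mean(), y.std())  # ~0 and ~1 after normalization

# Setting gamma = sqrt(var + eps) and beta = mu undoes the normalization,
# confirming the "why gamma and beta" insight above
y_undo = batch_norm(x, gamma=np.sqrt(x.var(axis=0) + 1e-5), beta=x.mean(axis=0))
print(np.allclose(y_undo, x))  # True
```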
1827
+ "cv-intro": {
1828
+ overview: `
1829
+ <h3>Why Computer Vision Needs Special Architectures</h3>
1830
+ <p><strong>Problem:</strong> Images have huge dimensionality</p>
1831
+ <ul>
1832
+ <li>224Γ—224 RGB image = 150,528 input features</li>
1833
+ <li>Fully connected layer with 1000 neurons = 150M parameters!</li>
1834
+ <li>Result: Overfitting, slow training, memory issues</li>
1835
+ </ul>
1836
+
1837
+ <h3>Solution: Convolutional Neural Networks</h3>
1838
+ <ul>
1839
+ <li><strong>Weight Sharing:</strong> Same filter applied everywhere (1000x fewer parameters)</li>
1840
+ <li><strong>Local Connectivity:</strong> Neurons see small patches</li>
1841
+ <li><strong>Translation Invariance:</strong> Detect cat anywhere in image</li>
1842
+ </ul>
1843
+ `,
1844
+ concepts: `
1845
+ <h3>Why CNNs Beat Fully Connected</h3>
1846
+ <div class="list-item">
1847
+ <div class="list-num">01</div>
1848
+ <div><strong>Parameter Efficiency:</strong> 1000Γ— fewer parameters through weight sharing</div>
1849
+ </div>
1850
+ <div class="list-item">
1851
+ <div class="list-num">02</div>
1852
+ <div><strong>Translation Equivariance:</strong> Shift the input and the feature map shifts with it (pooling then adds invariance)</div>
1853
+ </div>
1854
+ `,
1855
+ applications: `
1856
+ <div class="info-box">
1857
+ <div class="box-title">πŸ“Έ All Computer Vision Tasks</div>
1858
+ <div class="box-content">
1859
+ Image classification, object detection, segmentation, face recognition, OCR, medical imaging
1860
+ </div>
1861
+ </div>
1862
+ `
1863
+ },
1864
+ "pooling": {
1865
+ overview: `
1866
+ <h3>Pooling Layers</h3>
1867
+ <p>Pooling reduces spatial dimensions while retaining important information.</p>
1868
+
1869
+ <table>
1870
+ <tr>
1871
+ <th>Type</th>
1872
+ <th>Operation</th>
1873
+ <th>Use Case</th>
1874
+ </tr>
1875
+ <tr>
1876
+ <td>Max Pooling</td>
1877
+ <td>Take maximum value</td>
1878
+ <td><strong>Most common</strong> - preserves strong activations</td>
1879
+ </tr>
1880
+ <tr>
1881
+ <td>Average Pooling</td>
1882
+ <td>Take average</td>
1883
+ <td>Smoother, less common (used in final layers)</td>
1884
+ </tr>
1885
+ <tr>
1886
+ <td>Global Pooling</td>
1887
+ <td>Pool entire feature map</td>
1888
+ <td>Replace FC layers (reduces parameters)</td>
1889
+ </tr>
1890
+ </table>
1891
+
1892
+ <div class="callout tip">
1893
+ <div class="callout-title">βœ… Benefits</div>
1894
+ β€’ Reduces spatial size (faster computation)<br>
1895
+ β€’ Adds translation invariance<br>
1896
+ β€’ Prevents overfitting<br>
1897
+ β€’ Typical: 2Γ—2 window, stride 2 (halves dimensions)
1898
+ </div>
1899
+ `,
1900
+ concepts: `
1901
+ <h3>Pooling Mechanics</h3>
1902
+ <div class="list-item">
1903
+ <div class="list-num">01</div>
1904
+ <div><strong>Downsampling:</strong> Reduces HΓ—W by pooling factor (typically 2Γ—)</div>
1905
+ </div>
1906
+ <div class="list-item">
1907
+ <div class="list-num">02</div>
1908
+ <div><strong>No Learnable Parameters:</strong> Fixed operation (max/average)</div>
1909
+ </div>
1910
+ <div class="formula">
1911
+ Example: 4Γ—4 input β†’ 2Γ—2 max pooling β†’ 2Γ—2 output
1912
+ </div>
1913
+ `,
1914
+ applications: `
1915
+ <div class="info-box">
1916
+ <div class="box-title">🎯 Standard CNN Component</div>
1917
+ <div class="box-content">
1918
+ Used after conv layers in AlexNet, VGG, and most classic CNNs to progressively reduce spatial dimensions
1919
+ </div>
1920
+ </div>
1921
+ `
1922
+ },
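The 4Γ—4 β†’ 2Γ—2 example in the formula above, as a small NumPy sketch (single channel, non-overlapping 2Γ—2 windows with stride 2):

```python
import numpy as np

def max_pool2d(x, k=2, s=2):
    """Non-overlapping kxk max pooling with stride s on one channel."""
    H, W = x.shape
    out = np.zeros((H // s, W // s))
    for i in range(H // s):
        for j in range(W // s):
            out[i, j] = x[i*s:i*s+k, j*s:j*s+k].max()  # strongest activation wins
    return out

x = np.array([[1, 3, 2, 4],
              [5, 6, 1, 0],
              [7, 2, 9, 8],
              [1, 4, 3, 5]], dtype=float)
print(max_pool2d(x))
# [[6. 4.]
#  [7. 9.]]
```

Note there is nothing to learn here: the operation is fixed, which is why pooling adds no parameters.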
1923
+ "cnn-basics": {
1924
+ overview: `
1925
+ <h3>CNN Architecture Pattern</h3>
1926
+ <div class="formula">
1927
+ Input β†’ [Conv β†’ ReLU β†’ Pool] Γ— N β†’ Flatten β†’ FC β†’ Softmax
1928
+ </div>
1929
+
1930
+ <h3>Typical Layering Strategy</h3>
1931
+ <ul>
1932
+ <li><strong>Early Layers:</strong> Detect low-level features (edges, textures) - small filters (3Γ—3)</li>
1933
+ <li><strong>Middle Layers:</strong> Combine into patterns, parts - more filters, same size</li>
1934
+ <li><strong>Deep Layers:</strong> High-level concepts (faces, objects) - many filters</li>
1935
+ <li><strong>Final FC Layers:</strong> Classification based on learned features</li>
1936
+ </ul>
1937
+
1938
+ <div class="callout insight">
1939
+ <div class="callout-title">πŸ’‘ Filter Progression</div>
1940
+ Layer 1: 32 filters (edges)<br>
1941
+ Layer 2: 64 filters (textures)<br>
1942
+ Layer 3: 128 filters (patterns)<br>
1943
+ Layer 4: 256 filters (parts)<br>
1944
+ Common pattern: double filters after each pooling
1945
+ </div>
1946
+ `,
1947
+ concepts: `
1948
+ <h3>Module Design Principles</h3>
1949
+ <div class="list-item">
1950
+ <div class="list-num">01</div>
1951
+ <div><strong>Spatial Reduction:</strong> Progressively downsample (224β†’112β†’56β†’28...)</div>
1952
+ </div>
1953
+ <div class="list-item">
1954
+ <div class="list-num">02</div>
1955
+ <div><strong>Channel Expansion:</strong> Increase filters as spatial dims decrease</div>
1956
+ </div>
1957
+ `,
1958
+ applications: `
1959
+ <div class="info-box">
1960
+ <div class="box-title">🎯 All Modern Vision Models</div>
1961
+ <div class="box-content">
1962
+ This pattern forms the backbone of ResNet, MobileNet, EfficientNet - fundamental CNN design
1963
+ </div>
1964
+ </div>
1965
+ `,
1966
+ math: `
1967
+ <h3>1. The Golden Formula for Output Size</h3>
1968
+ <p>Given Input (W), Filter Size (F), Padding (P), and Stride (S):</p>
1969
+ <div class="formula" style="font-size: 1.2rem; text-align: center; margin: 20px 0;">
1970
+ Output Size = ⌊(W - F + 2P) / SβŒ‹ + 1
1971
+ </div>
1972
+
1973
+ <h3>2. Parameter Count Calculation</h3>
1974
+ <div class="list-item">
1975
+ <div class="list-num">01</div>
1976
+ <div><strong>Parameters PER Filter:</strong> (F Γ— F Γ— C_in) + 1 (bias)</div>
1977
+ </div>
1978
+ <div class="list-item">
1979
+ <div class="list-num">02</div>
1980
+ <div><strong>Total Parameters:</strong> N_filters Γ— ((F Γ— F Γ— C_in) + 1)</div>
1981
+ </div>
1982
+
1983
+ <div class="callout insight">
1984
+ <div class="callout-title">πŸ“ Paper & Pain Calculation</div>
1985
+ <strong>Input:</strong> 224x224x3 | <strong>Layer:</strong> 64 filters of 3x3 | <strong>Stride:</strong> 1 | <strong>Padding:</strong> 1<br>
1986
+ 1. <strong>Output Size:</strong> (224 - 3 + 2(1))/1 + 1 = 224 (Same Padding)<br>
1987
+ 2. <strong>Params:</strong> 64 * (3 * 3 * 3 + 1) = 64 * 28 = <strong>1,792 parameters</strong><br>
1988
+ 3. <strong>FLOPs:</strong> 224 * 224 * 1792 β‰ˆ <strong>90 Million operations</strong> per image!
1989
+ </div>
1990
+ `
1991
+ },
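The golden formula and the parameter count are easy to check in code. This reproduces the Paper & Pain calculation above (224Γ—224Γ—3 input, 64 filters of 3Γ—3, stride 1, padding 1):

```python
def conv_output_size(W, F, P, S):
    """Floor((W - F + 2P) / S) + 1 -- the golden formula."""
    return (W - F + 2 * P) // S + 1

def conv_params(n_filters, F, c_in):
    """Each filter holds F*F*C_in weights plus one bias."""
    return n_filters * (F * F * c_in + 1)

out = conv_output_size(224, 3, 1, 1)
params = conv_params(64, 3, 3)
print(out, params)  # 224 1792
```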
1992
+ "viz-filters": {
1993
+ overview: `
1994
+ <h3>What CNNs Learn</h3>
1995
+ <p>CNN filters automatically learn hierarchical visual features:</p>
1996
+
1997
+ <h3>Layer-by-Layer Visualization</h3>
1998
+ <div class="list-item">
1999
+ <div class="list-num">01</div>
2000
+ <div><strong>Layer 1:</strong> Edges and colors (horizontal, vertical, diagonal lines)</div>
2001
+ </div>
2002
+ <div class="list-item">
2003
+ <div class="list-num">02</div>
2004
+ <div><strong>Layer 2:</strong> Textures and patterns (corners, curves, simple shapes)</div>
2005
+ </div>
2006
+ <div class="list-item">
2007
+ <div class="list-num">03</div>
2008
+ <div><strong>Layer 3:</strong> Object parts (eyes, wheels, windows)</div>
2009
+ </div>
2010
+ <div class="list-item">
2011
+ <div class="list-num">04</div>
2012
+ <div><strong>Layer 4-5:</strong> Whole objects (faces, cars, animals)</div>
2013
+ </div>
2014
+ `,
2015
+ concepts: `
2016
+ <h3>Visualization Techniques</h3>
2017
+ <div class="list-item">
2018
+ <div class="list-num">01</div>
2019
+ <div><strong>Activation Maximization:</strong> Find input that maximizes filter response</div>
2020
+ </div>
2021
+ <div class="list-item">
2022
+ <div class="list-num">02</div>
2023
+ <div><strong>Grad-CAM:</strong> Highlight important regions for predictions</div>
2024
+ </div>
2025
+ `,
2026
+ applications: `
2027
+ <div class="info-box">
2028
+ <div class="box-title">πŸ” Model Interpretability</div>
2029
+ <div class="box-content">
2030
+ Understanding what CNNs learn helps debug failures, build trust, and improve architecture design
2031
+ </div>
2032
+ </div>
2033
+ <div class="info-box">
2034
+ <div class="box-title">🎨 Art & Style Transfer</div>
2035
+ <div class="box-content">
2036
+ Filter visualizations inspired neural style transfer (VGG features)
2037
+ </div>
2038
+ </div>
2039
+ `
2040
+ },
2041
+ "lenet": {
2042
+ overview: `
2043
+ <h3>LeNet-5 (1998) - The Pioneer</h3>
2044
+ <p>First successful CNN for digit recognition (MNIST). Introduced the Conv β†’ Pool β†’ Conv β†’ Pool pattern still used today.</p>
2045
+
2046
+ <h3>Architecture</h3>
2047
+ <div class="formula">
2048
+ Input 32Γ—32 β†’ Conv(6 filters, 5Γ—5) β†’ AvgPool β†’ Conv(16 filters, 5Γ—5) β†’ AvgPool β†’ FC(120) β†’ FC(84) β†’ FC(10)
2049
+ </div>
2050
+
2051
+ <div class="callout insight">
2052
+ <div class="callout-title">πŸ† Historical Impact</div>
2053
+ β€’ Used by US Postal Service for zip code recognition<br>
2054
+ β€’ Proved CNNs work for real-world tasks<br>
2055
+ β€’ Template for modern architectures
2056
+ </div>
2057
+ `,
2058
+ concepts: `
2059
+ <h3>Key Innovations</h3>
2060
+ <div class="list-item">
2061
+ <div class="list-num">01</div>
2062
+ <div><strong>Layered Architecture:</strong> Hierarchical feature extraction</div>
2063
+ </div>
2064
+ <div class="list-item">
2065
+ <div class="list-num">02</div>
2066
+ <div><strong>Shared Weights:</strong> Convolutional parameter sharing</div>
2067
+ </div>
2068
+ `,
2069
+ applications: `
2070
+ <div class="info-box">
2071
+ <div class="box-title">βœ‰οΈ Handwriting Recognition</div>
2072
+ <div class="box-content">
2073
+ USPS mail sorting, check processing, form digitization
2074
+ </div>
2075
+ </div>
2076
+ <div class="info-box">
2077
+ <div class="box-title">πŸ“š Educational Foundation</div>
2078
+ <div class="box-content">
2079
+ Perfect starting point for learning CNNs - simple enough to understand, complex enough to be useful
2080
+ </div>
2081
+ </div>
2082
+ `
2083
+ },
2084
+ "alexnet": {
2085
+ overview: `
2086
+ <h3>AlexNet (2012) - The Deep Learning Revolution</h3>
2087
+ <p>Won ImageNet 2012 by huge margin (15.3% vs 26.2% error), igniting the deep learning revolution.</p>
2088
+
2089
+ <h3>Key Innovations</h3>
2090
+ <ul>
2091
+ <li><strong>ReLU Activation:</strong> Faster training than sigmoid/tanh</li>
2092
+ <li><strong>Dropout:</strong> Prevents overfitting (p=0.5)</li>
2093
+ <li><strong>Data Augmentation:</strong> Random crops/flips</li>
2094
+ <li><strong>GPU Training:</strong> Used 2 GTX580 GPUs</li>
2095
+ <li><strong>Deep:</strong> 8 layers (5 conv + 3 FC), 60M parameters</li>
2096
+ </ul>
2097
+
2098
+ <div class="callout tip">
2099
+ <div class="callout-title">πŸ’‘ Why So Important?</div>
2100
+ First to show that deeper networks + more data + GPU compute = breakthrough performance
2101
+ </div>
2102
+ `,
2103
+ concepts: `
2104
+ <h3>Technical Contributions</h3>
2105
+ <div class="list-item">
2106
+ <div class="list-num">01</div>
2107
+ <div><strong>ReLU:</strong> Solved vanishing gradients, enabled deeper networks</div>
2108
+ </div>
2109
+ <div class="list-item">
2110
+ <div class="list-num">02</div>
2111
+ <div><strong>Dropout:</strong> First major regularization for deep nets</div>
2112
+ </div>
2113
+ `,
2114
+ applications: `
2115
+ <div class="info-box">
2116
+ <div class="box-title">🎯 ImageNet Challenge</div>
2117
+ <div class="box-content">
2118
+ Shattered records on 1000-class classification, proving deep learning superiority
2119
+ </div>
2120
+ </div>
2121
+ <div class="info-box">
2122
+ <div class="box-title">πŸš€ Industry Catalyst</div>
2123
+ <div class="box-content">
2124
+ Sparked AI renaissance - Google, Facebook, Microsoft pivoted to deep learning after AlexNet
2125
+ </div>
2126
+ </div>
2127
+ `
2128
+ },
2129
+ "vgg": {
2130
+ overview: `
2131
+ <h3>VGGNet (2014) - The Power of Depth</h3>
2132
+ <p>VGG showed that depth matters - 16-19 layers using only small 3Γ—3 filters.</p>
2133
+
2134
+ <h3>Key Insight: Stacking Small Filters</h3>
2135
+ <p>Two 3Γ—3 conv layers = same receptive field as one 5Γ—5, but:</p>
2136
+ <ul>
2137
+ <li><strong>Fewer Parameters:</strong> 2Γ—(3Β²) = 18 vs 5Β² = 25</li>
2138
+ <li><strong>More Non-linearity:</strong> Two ReLUs instead of one</li>
2139
+ <li><strong>Deeper Network:</strong> Better feature learning</li>
2140
+ </ul>
2141
+
2142
+ <div class="callout warning">
2143
+ <div class="callout-title">⚠️ Limitation</div>
2144
+ 138M parameters (VGG-16) - very memory intensive for deployment
2145
+ </div>
2146
+ `
2147
+ },
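The 18-vs-25 comparison above scales with channel count. A quick check with C input and output channels (biases omitted), e.g. C = 64:

```python
def stacked_3x3_params(C):
    """Two 3x3 conv layers, C channels in and out each (biases omitted)."""
    return 2 * (3 * 3 * C * C)

def single_5x5_params(C):
    """One 5x5 conv layer covering the same 5x5 receptive field."""
    return 5 * 5 * C * C

C = 64
print(stacked_3x3_params(C), single_5x5_params(C))  # 73728 102400
# The ratio is always 18/25 = 0.72, independent of C
```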
2148
+ "resnet": {
2149
+ overview: `
2150
+ <h3>ResNet (2015) - Residual Connections</h3>
2151
+ <p><strong>Problem:</strong> Very deep networks (>20 layers) had degradation - training accuracy got worse!</p>
2152
+
2153
+ <h3>Solution: Skip Connections</h3>
2154
+ <div class="formula">
2155
+ Instead of learning H(x), learn residual F(x) = H(x) - x<br>
2156
+ Output: y = F(x) + x (shortcut connection)
2157
+ </div>
2158
+
2159
+ <h3>Why Skip Connections Work</h3>
2160
+ <ul>
2161
+ <li><strong>Gradient Flow:</strong> Gradients flow directly through shortcuts</li>
2162
+ <li><strong>Identity Mapping:</strong> Easy to learn identity (just set F(x)=0)</li>
2163
+ <li><strong>Feature Reuse:</strong> Earlier features directly available to later layers</li>
2164
+ </ul>
2165
+
2166
+ <div class="callout tip">
2167
+ <div class="callout-title">πŸ† Impact</div>
2168
+ β€’ Enabled training of 152-layer networks (even 1000+ layers)<br>
2169
+ β€’ Won ImageNet 2015<br>
2170
+ β€’ Skip connections now used everywhere (U-Net, Transformers, etc.)
2171
+ </div>
2172
+ `
2173
+ },
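The identity-mapping claim ("just set F(x) = 0") can be verified directly. A toy NumPy residual block with a two-layer branch; zero weights make the whole block an exact identity:

```python
import numpy as np

def residual_block(x, W1, W2):
    """y = F(x) + x, with a two-layer residual branch F(x) = W2 @ relu(W1 @ x)."""
    return W2 @ np.maximum(W1 @ x, 0) + x

x = np.array([1.0, -2.0, 3.0])
W_zero = np.zeros((3, 3))
# Zero weights in the branch give F(x) = 0, so the block passes x straight through
print(residual_block(x, W_zero, W_zero))  # [ 1. -2.  3.]
```

This is why depth stops hurting: a layer the network does not need can cheaply collapse to the identity instead of having to learn it.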
2174
+ "inception": {
2175
+ overview: `
2176
+ <h3>Inception/GoogLeNet (2014) - Going Wider</h3>
2177
+ <p>Instead of going deeper, Inception modules go wider - using multiple filter sizes in parallel.</p>
2178
+
2179
+ <h3>Inception Module</h3>
2180
+ <div class="formula">
2181
+ Input β†’ [1Γ—1 conv] βŠ• [3Γ—3 conv] βŠ• [5Γ—5 conv] βŠ• [3Γ—3 pool] β†’ Concatenate
2182
+ </div>
2183
+
2184
+ <h3>Key Innovation: 1Γ—1 Convolutions</h3>
2185
+ <ul>
2186
+ <li><strong>Dimensionality Reduction:</strong> Reduce channels before expensive 3Γ—3, 5Γ—5</li>
2187
+ <li><strong>Non-linearity:</strong> Add extra ReLU</li>
2188
+ <li><strong>Bottleneck Design:</strong> Reduces FLOPs by 10Γ—</li>
2189
+ </ul>
2190
+
2191
+ <div class="callout insight">
2192
+ <div class="callout-title">πŸ’‘ Efficiency</div>
2193
+ 22 layers but only ~5M parameters (about 12Γ— fewer than AlexNet!)
2194
+ </div>
2195
+ `
2196
+ },
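The FLOP reduction from the 1Γ—1 bottleneck is simple arithmetic. A sketch counting multiply-accumulates for a 5Γ—5 conv, using GoogLeNet-like sizes assumed here for illustration (28Γ—28 map, 192β†’32 channels, 16-channel bottleneck):

```python
def direct_5x5_macs(H, W, c_in, c_out):
    """Multiply-accumulates for a plain 5x5 conv over an HxW feature map."""
    return H * W * c_out * 5 * 5 * c_in

def bottleneck_macs(H, W, c_in, c_mid, c_out):
    reduce_1x1 = H * W * c_mid * c_in          # 1x1 conv shrinks channels first
    conv_5x5 = H * W * c_out * 5 * 5 * c_mid   # 5x5 now sees far fewer channels
    return reduce_1x1 + conv_5x5

direct = direct_5x5_macs(28, 28, 192, 32)
cheap = bottleneck_macs(28, 28, 192, 16, 32)
print(direct, cheap, direct // cheap)  # the bottleneck is roughly 10x cheaper
```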
2197
+ "mobilenet": {
2198
+ overview: `
2199
+ <h3>MobileNet - CNNs for Mobile Devices</h3>
2200
+ <p>Designed for mobile/embedded vision using depthwise separable convolutions.</p>
2201
+
2202
+ <h3>Depthwise Separable Convolution</h3>
2203
+ <div class="formula">
2204
+ Standard Conv = Depthwise Conv + Pointwise (1Γ—1) Conv
2205
+ </div>
2206
+
2207
+ <h3>Computation Reduction</h3>
2208
+ <table>
2209
+ <tr>
2210
+ <th>Method</th>
2211
+ <th>Parameters</th>
2212
+ <th>FLOPs</th>
2213
+ </tr>
2214
+ <tr>
2215
+ <td>Standard 3Γ—3 Conv</td>
2216
+ <td>3Γ—3Γ—C_inΓ—C_out</td>
2217
+ <td>High</td>
2218
+ </tr>
2219
+ <tr>
2220
+ <td>Depthwise Separable</td>
2221
+ <td>3Γ—3Γ—C_in + C_inΓ—C_out</td>
2222
+ <td><strong>8-9Γ— less!</strong></td>
2223
+ </tr>
2224
+ </table>
2225
+
2226
+ <div class="callout tip">
2227
+ <div class="callout-title">βœ… Applications</div>
2228
+ β€’ Real-time mobile apps (camera filters, AR)<br>
2229
+ β€’ Edge devices (drones, IoT)<br>
2230
+ β€’ Latency-critical systems<br>
2231
+ β€’ Good accuracy with 10-20Γ— speedup
2232
+ </div>
2233
+ `
2234
+ },
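The parameter formulas in the table can be evaluated directly; with a hypothetical 3Γ—3 layer at 128 input and 128 output channels, the saving lands in the promised 8β€“9Γ— range:

```python
def standard_conv_params(F, c_in, c_out):
    return F * F * c_in * c_out

def depthwise_separable_params(F, c_in, c_out):
    depthwise = F * F * c_in   # one FxF filter per input channel
    pointwise = c_in * c_out   # 1x1 conv mixes the channels back together
    return depthwise + pointwise

std = standard_conv_params(3, 128, 128)
sep = depthwise_separable_params(3, 128, 128)
print(std, sep, round(std / sep, 1))  # 147456 17536 8.4
```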
2235
+ "transfer-learning": {
2236
+ overview: `
2237
+ <h3>Transfer Learning - Don't Train from Scratch!</h3>
2238
+ <p>Use pre-trained models (ImageNet) as feature extractors for your custom task.</p>
2239
+
2240
+ <h3>Two Strategies</h3>
2241
+ <table>
2242
+ <tr>
2243
+ <th>Approach</th>
2244
+ <th>When to Use</th>
2245
+ <th>How</th>
2246
+ </tr>
2247
+ <tr>
2248
+ <td>Feature Extraction</td>
2249
+ <td><strong>Small dataset</strong> (&lt;10K images)</td>
2250
+ <td>Freeze all layers, train only final FC layer</td>
2251
+ </tr>
2252
+ <tr>
2253
+ <td>Fine-tuning</td>
2254
+ <td><strong>Medium dataset</strong> (10K-100K)</td>
2255
+ <td>Freeze early layers, train last few + FC</td>
2256
+ </tr>
2257
+ <tr>
2258
+ <td>Full Training</td>
2259
+ <td><strong>Large dataset</strong> (>1M images)</td>
2260
+ <td>Use pre-trained as initialization, train all</td>
2261
+ </tr>
2262
+ </table>
2263
+
2264
+ <div class="callout tip">
2265
+ <div class="callout-title">πŸ’‘ Best Practices</div>
2266
+ β€’ Use pre-trained models when dataset < 100K images<br>
2267
+ β€’ Start with low learning rate (1e-4) for fine-tuning<br>
2268
+ β€’ Popular backbones: ResNet50, EfficientNet, ViT
2269
+ </div>
2270
+ `
2271
+ },
2272
+ "localization": {
2273
+ overview: `
2274
+ <h3>Object Localization</h3>
2275
+ <p>Predict both class and bounding box for a single object in image.</p>
2276
+
2277
+ <h3>Multi-Task Loss</h3>
2278
+ <div class="formula">
2279
+ Total Loss = L_classification + Ξ» Γ— L_bbox<br>
2280
+ <br>
2281
+ Where:<br>
2282
+ L_classification = Cross-Entropy<br>
2283
+ L_bbox = Smooth L1 or IoU loss<br>
2284
+ Ξ» = balance term (typically 1-10)
2285
+ </div>
2286
+
2287
+ <h3>Bounding Box Representation</h3>
2288
+ <ul>
2289
+ <li><strong>Option 1:</strong> (x_min, y_min, x_max, y_max)</li>
2290
+ <li><strong>Option 2:</strong> (x_center, y_center, width, height) ← Most common</li>
2291
+ </ul>
2292
+ `
2293
+ },
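The IoU loss mentioned above needs the IoU itself, computed here for boxes in the (x_min, y_min, x_max, y_max) representation:

```python
def iou(box_a, box_b):
    """Intersection over Union for (x_min, y_min, x_max, y_max) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)  # 0 when boxes don't overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7: small overlap
print(iou((0, 0, 2, 2), (0, 0, 2, 2)))  # 1.0: perfect match
```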
2294
+ "rcnn": {
2295
+ overview: `
2296
+ <h3>R-CNN Family Evolution</h3>
2297
+ <table>
2298
+ <tr>
2299
+ <th>Model</th>
2300
+ <th>Year</th>
2301
+ <th>Speed (FPS)</th>
2302
+ <th>Key Innovation</th>
2303
+ </tr>
2304
+ <tr>
2305
+ <td>R-CNN</td>
2306
+ <td>2014</td>
2307
+ <td>0.05</td>
2308
+ <td>Selective Search + CNN features</td>
2309
+ </tr>
2310
+ <tr>
2311
+ <td>Fast R-CNN</td>
2312
+ <td>2015</td>
2313
+ <td>0.5</td>
2314
+ <td>RoI Pooling (share conv features)</td>
2315
+ </tr>
2316
+ <tr>
2317
+ <td>Faster R-CNN</td>
2318
+ <td>2015</td>
2319
+ <td>7</td>
2320
+ <td>Region Proposal Network (RPN)</td>
2321
+ </tr>
2322
+ <tr>
2323
+ <td>Mask R-CNN</td>
2324
+ <td>2017</td>
2325
+ <td>5</td>
2326
+ <td>+ Instance Segmentation masks</td>
2327
+ </tr>
2328
+ </table>
2329
+
2330
+ <div class="callout tip">
2331
+ <div class="callout-title">πŸ’‘ When to Use</div>
2332
+ Faster R-CNN: Best accuracy for detection (not real-time)<br>
2333
+ Mask R-CNN: Detection + instance segmentation
2334
+ </div>
2335
+ `
2336
+ },
2337
+ "ssd": {
2338
+ overview: `
2339
+ <h3>SSD (Single Shot MultiBox Detector)</h3>
2340
+ <p>Balances speed and accuracy by predicting boxes at multiple scales.</p>
2341
+
2342
+ <h3>Key Ideas</h3>
2343
+ <ul>
2344
+ <li><strong>Multi-Scale:</strong> Predictions from different layers (early = small objects, deep = large)</li>
2345
+ <li><strong>Default Boxes (Anchors):</strong> Pre-defined boxes of various aspect ratios</li>
2346
+ <li><strong>Single Pass:</strong> No separate region proposal step</li>
2347
+ </ul>
2348
+
2349
+ <div class="callout insight">
2350
+ <div class="callout-title">πŸ“Š Performance</div>
2351
+ SSD300: 59 FPS, 74.3% mAP<br>
2352
+ SSD512: 22 FPS, 76.8% mAP<br>
2353
+ <br>
2354
+ Sweet spot between YOLO (faster) and Faster R-CNN (more accurate)
2355
+ </div>
2356
+ `
2357
+ },
2358
+ "semantic-seg": {
2359
+ overview: `
2360
+ <h3>Semantic Segmentation</h3>
2361
+ <p>Classify every pixel in the image (pixel-wise classification).</p>
2362
+
2363
+ <h3>Popular Architectures</h3>
2364
+ <table>
2365
+ <tr>
2366
+ <th>Model</th>
2367
+ <th>Key Feature</th>
2368
+ </tr>
2369
+ <tr>
2370
+ <td>FCN</td>
2371
+ <td>Fully Convolutional (no FC layers)</td>
2372
+ </tr>
2373
+ <tr>
2374
+ <td>U-Net</td>
2375
+ <td>Skip connections from encoder to decoder</td>
2376
+ </tr>
2377
+ <tr>
2378
+ <td>DeepLab</td>
2379
+ <td>Atrous (dilated) convolutions + ASPP</td>
2380
+ </tr>
2381
+ </table>
2382
+
2383
+ <div class="formula">
2384
+ U-Net Pattern:<br>
2385
+ Input β†’ Encoder (downsample) β†’ Bottleneck β†’ Decoder (upsample) β†’ Pixel-wise Output<br>
2386
+ With skip connections from encoder to decoder at each level
2387
+ </div>
2388
+ `,
2389
+ applications: `
2390
+ <div class="info-box">
2391
+ <div class="box-title">πŸ₯ Medical Imaging</div>
2392
+ <div class="box-content">Tumor segmentation, organ delineation, cell analysis</div>
2393
+ </div>
2394
+ <div class="info-box">
2395
+ <div class="box-title">πŸš— Autonomous Driving</div>
2396
+ <div class="box-content">Road segmentation, free space detection, drivable area</div>
2397
+ </div>
2398
+ `
2399
+ },
2400
+ "instance-seg": {
2401
+ overview: `
2402
+ <h3>Instance Segmentation</h3>
2403
+ <p>Detect AND segment each individual object (combines object detection + semantic segmentation).</p>
2404
+
2405
+ <h3>Difference from Semantic Segmentation</h3>
2406
+ <ul>
2407
+ <li><strong>Semantic:</strong> All "person" pixels get same label</li>
2408
+ <li><strong>Instance:</strong> Person #1, Person #2, Person #3 (separate instances)</li>
2409
+ </ul>
2410
+
2411
+ <h3>Main Approach: Mask R-CNN</h3>
2412
+ <div class="formula">
2413
+ Faster R-CNN + Segmentation Branch<br>
2414
+ <br>
2415
+ For each RoI:<br>
2416
+ 1. Bounding box regression<br>
2417
+ 2. Class prediction<br>
2418
+ 3. <strong>Binary mask for the object</strong>
2419
+ </div>
2420
+ `
2421
+ },
2422
+ "face-recog": {
2423
+ overview: `
2424
+ <h3>Face Recognition with Siamese Networks</h3>
2425
+ <p>Learn similarity between faces using metric learning instead of classification.</p>
2426
+
2427
+ <h3>Triplet Loss Training</h3>
2428
+ <div class="formula">
2429
+ Loss = max(||f(A) - f(P)||Β² - ||f(A) - f(N)||Β² + margin, 0)<br>
2430
+ <br>
2431
+ Where:<br>
2432
+ A = Anchor (reference face)<br>
2433
+ P = Positive (same person)<br>
2434
+ N = Negative (different person)<br>
2435
+ margin = minimum separation (e.g., 0.2)
2436
+ </div>
2437
+
2438
+ <div class="callout tip">
2439
+ <div class="callout-title">πŸ’‘ One-Shot Learning</div>
2440
+ After training, recognize new people with just 1-2 photos!<br>
2441
+ No retraining needed - just compare embeddings.
2442
+ </div>
2443
+ `,
2444
+ applications: `
2445
+ <div class="info-box">
2446
+ <div class="box-title">πŸ“± Phone Unlock</div>
2447
+ <div class="box-content">Face ID, biometric authentication</div>
2448
+ </div>
2449
+ <div class="info-box">
2450
+ <div class="box-title">πŸ”’ Security</div>
2451
+ <div class="box-content">Access control, surveillance, identity verification</div>
2452
+ </div>
2453
+ `
2454
+ },
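The triplet loss formula translates line-for-line into NumPy; the toy 2-D embeddings below are made up purely for illustration:

```python
import numpy as np

def triplet_loss(a, p, n, margin=0.2):
    """max(||A-P||^2 - ||A-N||^2 + margin, 0) on embedding vectors."""
    d_ap = np.sum((a - p) ** 2)   # anchor-positive distance (same person)
    d_an = np.sum((a - n) ** 2)   # anchor-negative distance (different person)
    return max(d_ap - d_an + margin, 0.0)

anchor   = np.array([0.0, 1.0])
positive = np.array([0.1, 0.9])   # close to the anchor
negative = np.array([1.0, 0.0])   # far from the anchor
print(triplet_loss(anchor, positive, negative))  # 0.0: already separated by the margin
print(triplet_loss(anchor, negative, positive))  # ~2.18: violation, loss pushes pairs apart
```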
2455
+ "autoencoders": {
2456
+ overview: `
2457
+ <h3>Autoencoders</h3>
2458
+ <p>Unsupervised learning to compress data into latent representation and reconstruct it.</p>
2459
+
2460
+ <h3>Architecture</h3>
2461
+ <div class="formula">
2462
+ Input β†’ Encoder β†’ Latent Code (bottleneck) β†’ Decoder β†’ Reconstruction<br>
2463
+ <br>
2464
+ Loss = ||Input - Reconstruction||Β² (MSE)
2465
+ </div>
2466
+
2467
+ <h3>Variants</h3>
2468
+ <ul>
2469
+ <li><strong>Vanilla:</strong> Basic autoencoder</li>
2470
+ <li><strong>Denoising:</strong> Input corrupted, output clean (learns robust features)</li>
2471
+ <li><strong>Variational (VAE):</strong> Probabilistic latent space (for generation)</li>
2472
+ <li><strong>Sparse:</strong> Encourage sparse activations</li>
2473
+ </ul>
2474
+ `,
2475
+ applications: `
2476
+ <div class="info-box">
2477
+ <div class="box-title">πŸ—œοΈ Compression</div>
2478
+ <div class="box-content">Dimensionality reduction, data compression, feature extraction</div>
2479
+ </div>
2480
+ <div class="info-box">
2481
+ <div class="box-title">πŸ” Anomaly Detection</div>
2482
+ <div class="box-content">High reconstruction error = anomaly (fraud detection, defect detection)</div>
2483
+ </div>
2484
+ `
2485
+ },
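The anomaly-detection idea (high reconstruction error = anomaly) can be seen even with a *linear, untrained* sketch: a random encoder plus its pseudo-inverse as decoder reconstructs points inside the latent subspace perfectly and everything else poorly. A real autoencoder learns this subspace (nonlinearly) from data; this is only an illustration of the bottleneck effect:

```python
import numpy as np

rng = np.random.default_rng(0)
W_enc = rng.normal(size=(2, 4))     # encoder: 4-D input -> 2-D latent code
W_dec = np.linalg.pinv(W_enc)       # decoder: best linear "inverse"

def reconstruction_error(x):
    z = W_enc @ x                   # encode (bottleneck)
    x_hat = W_dec @ z               # decode
    return float(np.sum((x - x_hat) ** 2))

in_subspace = W_dec @ np.array([1.0, -1.0])  # a point the bottleneck can represent
generic = rng.normal(size=4)                 # an arbitrary 4-D point ("anomaly")
print(reconstruction_error(in_subspace), reconstruction_error(generic))
```

Points the bottleneck can represent reconstruct with near-zero error; everything else does not, which is exactly the signal used for anomaly detection.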
2486
+ "gans": {
2487
+ overview: `
2488
+ <h3>GANs (Generative Adversarial Networks)</h3>
2489
+ <p>Two networks compete: Generator creates fake data, Discriminator tries to detect fakes.</p>
2490
+
2491
+ <h3>The GAN Game</h3>
2492
+ <div class="formula">
2493
+ Generator: Creates fake images from random noise<br>
2494
+ Goal: Fool discriminator<br>
2495
+ <br>
2496
+ Discriminator: Classifies real vs fake<br>
2497
+ Goal: Correctly identify fakes<br>
2498
+ <br>
2499
+ Minimax Loss:<br>
2500
+ min_G max_D E[log D(x)] + E[log(1 - D(G(z)))]
2501
+ </div>
2502
+
2503
+ <div class="callout warning">
2504
+ <div class="callout-title">⚠️ Training Challenges</div>
2505
+ β€’ Mode collapse (Generator produces limited variety)<br>
2506
+ β€’ Training instability (careful tuning needed)<br>
2507
+ β€’ Convergence issues<br>
2508
+ β€’ Solutions: Wasserstein GAN, Spectral Normalization, StyleGAN improvements
2509
+ </div>
2510
+ `,
2511
+ applications: `
2512
+ <div class="info-box">
2513
+ <div class="box-title">🎨 Image Generation</div>
2514
+ <div class="box-content">
2515
+ <strong>StyleGAN:</strong> Photorealistic faces, art generation<br>
2516
+ <strong>DCGAN:</strong> Bedroom images, object generation
2517
+ </div>
2518
+ </div>
2519
+ `,
2520
+ math: `
2521
+ <h3>The Minimax Game Objective</h3>
2522
+ <p>The original GAN objective from Ian Goodfellow (2014) is a zero-sum game between Discriminator (D) and Generator (G).</p>
2523
+
2524
+ <div class="formula" style="font-size: 1.1rem; padding: 20px;">
2525
+ min_G max_D V(D, G) = E_x∼p_data[log D(x)] + E_z∼p_z[log(1 - D(G(z)))]
2526
+ </div>
2527
+
2528
+ <h3>Paper & Pain: Finding the Optimal Discriminator</h3>
2529
+ <p>For a fixed Generator, the optimal Discriminator D* is:</p>
2530
+ <div class="formula">
2531
+ D*(x) = p_data(x) / (p_data(x) + p_g(x))
2532
+ </div>
2533
+
2534
+ <div class="callout insight">
2535
+ <div class="callout-title">πŸ“ Theoretical Insight</div>
2536
+ When the Discriminator is optimal, the Generator's task is essentially to minimize the <strong>Jensen-Shannon Divergence (JSD)</strong> between the data distribution and the model distribution. <br>
2537
+ <strong>Problem:</strong> JSD is "flat" when distributions don't overlap, leading to vanishing gradients. This is why <strong>Wasserstein GAN (WGAN)</strong> was inventedβ€”using Earth Mover's distance instead!
2538
+ </div>
2539
+
2540
+ <h3>Generator Gradient Problem</h3>
2541
+ <p>Early in training, D(G(z)) is near 0. The term log(1-D(G(z))) has a very small gradient. </p>
2542
+ <div class="list-item">
2543
+ <div class="list-num">πŸ’‘</div>
2544
+ <div><strong>Heuristic Fix:</strong> Instead of minimizing log(1-D(G(z))), we maximize <strong>log D(G(z))</strong>. This provides much stronger gradients early on!</div>
2545
+ </div>
2546
+ `
2547
+ },
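The "heuristic fix" in the Math tab is just a statement about derivative magnitudes. With D(G(z)) β‰ˆ 0.01 early in training, the saturating loss barely moves the Generator while the non-saturating one does:

```python
def saturating_grad(d):
    """|d/dD log(1 - D)| -- the gradient seen when minimizing log(1 - D(G(z)))."""
    return 1.0 / (1.0 - d)

def non_saturating_grad(d):
    """|d/dD log D| -- the gradient seen when maximizing log D(G(z))."""
    return 1.0 / d

d = 0.01  # early training: the Discriminator confidently rejects fakes
print(saturating_grad(d), non_saturating_grad(d))  # ~1.01 vs 100.0
```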
2548
+ "diffusion": {
2549
+ overview: `
2550
+ <h3>Diffusion Models</h3>
2551
+ <p>Learn to reverse a gradual noising process, generating high-quality images.</p>
2552
+
2553
+ <h3>How Diffusion Works</h3>
2554
+ <div class="list-item">
2555
+ <div class="list-num">01</div>
2556
+ <div><strong>Forward Process:</strong> Gradually add Gaussian noise over T steps (xβ‚€ β†’ x₁ β†’ ... β†’ x_T = pure noise)</div>
2557
+ </div>
2558
+ <div class="list-item">
2559
+ <div class="list-num">02</div>
2560
+ <div><strong>Reverse Process:</strong> Train neural network to denoise (x_T β†’ x_{T-1} β†’ ... β†’ xβ‚€ = clean image)</div>
2561
+ </div>
2562
+ <div class="list-item">
2563
+ <div class="list-num">03</div>
2564
+ <div><strong>Generation:</strong> Start from random noise, iteratively denoise T steps</div>
2565
+ </div>
2566
+
2567
+ <div class="callout tip">
2568
+ <div class="callout-title">βœ… Advantages over GANs</div>
2569
+ β€’ More stable training (no adversarial dynamics)<br>
2570
+ β€’ Better sample quality and diversity<br>
2571
+ β€’ Mode coverage (no mode collapse)<br>
2572
+ β€’ Controllable generation (text-to-image)
2573
+ </div>
2574
+ `,
2575
+ applications: `
2576
+ <div class="info-box">
2577
+ <div class="box-title">πŸ–ΌοΈ Text-to-Image</div>
2578
+ <div class="box-content">
2579
+ <strong>Stable Diffusion:</strong> Open-source, runs on consumer GPUs<br>
2580
+ <strong>DALL-E 2:</strong> OpenAI's photorealistic generator<br>
2581
+ <strong>Midjourney:</strong> Artistic image generation
2582
+ </div>
2583
+ </div>
2584
+ `
2585
+ },
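The forward process has a convenient closed form, x_t = √(ᾱ_t)Β·xβ‚€ + √(1βˆ’αΎ±_t)Β·Ξ΅, so any noise level can be sampled in one step. A NumPy sketch (the ᾱ values are chosen arbitrarily for illustration):

```python
import numpy as np

def forward_diffuse(x0, alpha_bar_t, rng):
    """x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps"""
    eps = rng.normal(size=x0.shape)
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps

rng = np.random.default_rng(0)
x0 = np.ones(100_000)                                   # a "clean image"
x_mid = forward_diffuse(x0, alpha_bar_t=0.5, rng=rng)   # halfway: signal + noise
x_end = forward_diffuse(x0, alpha_bar_t=1e-4, rng=rng)  # late: almost pure noise
print(x_mid.mean(), x_end.std())  # signal fades toward 0, variance approaches 1
```

The reverse model is trained to predict Ξ΅ from x_t, which is what makes generation (iteratively removing that noise) possible.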
2586
+ "rnn": {
2587
+ overview: `
2588
+ <h3>RNNs & LSTMs</h3>
2589
+ <p>Process sequences by maintaining hidden state that captures past information.</p>
2590
+
2591
+ <h3>The Vanishing Gradient Problem</h3>
2592
+ <p><strong>Problem:</strong> Standard RNNs can't learn long-term dependencies (gradients vanish over many time steps)</p>
2593
+ <p><strong>Solution:</strong> LSTM (Long Short-Term Memory) with gating mechanisms</p>
2594
+
2595
+ <h3>LSTM Gates</h3>
2596
+ <ul>
2597
+ <li><strong>Forget Gate:</strong> What to remove from cell state</li>
2598
+ <li><strong>Input Gate:</strong> What new information to add</li>
2599
+ <li><strong>Output Gate:</strong> What to output as hidden state</li>
2600
+ </ul>
2601
+
2602
+ <div class="callout warning">
2603
+ <div class="callout-title">⚠️ Limitation</div>
2604
+ Sequential processing (can't parallelize) - Transformers solved this!
2605
+ </div>
2606
+ `,
2607
+ applications: `
2608
+ <div class="info-box">
2609
+ <div class="box-title">πŸ“ Text Generation</div>
2610
+ <div class="box-content">Character-level generation, autocomplete (before Transformers)</div>
2611
+ </div>
2612
+ <div class="info-box">
2613
+ <div class="box-title">🎡 Time Series</div>
2614
+ <div class="box-content">Stock prediction, weather forecasting, music generation</div>
2615
+ </div>
2616
+ `,
2617
+ math: `
2618
+ <h3>RNN State Equations</h3>
2619
+ <p>Standard RNN processes a sequence x₁, xβ‚‚, ..., xβ‚œ using a recurring hidden state hβ‚œ.</p>
2620
+
2621
+ <div class="formula">
2622
+ hβ‚œ = tanh(Wβ‚•β‚•hβ‚œβ‚‹β‚ + Wβ‚“β‚•xβ‚œ + bβ‚•)<br>
2623
+ yβ‚œ = Wβ‚•α΅§hβ‚œ + bα΅§
2624
+ </div>
2625
+
2626
+ <h3>Paper & Pain: The Vanishing Gradient Derivation</h3>
2627
+ <p>Why do RNNs fail on long sequences? Let's check the gradient βˆ‚L/βˆ‚h₁:</p>
2628
+ <div class="formula">
2629
+ βˆ‚L/βˆ‚h₁ = (βˆ‚L/βˆ‚hβ‚œ) Γ— (βˆ‚hβ‚œ/βˆ‚hβ‚œβ‚‹β‚) Γ— (βˆ‚hβ‚œβ‚‹β‚/βˆ‚hβ‚œβ‚‹β‚‚) Γ— ... Γ— (βˆ‚hβ‚‚/βˆ‚h₁)<br>
2630
+ <br>
2631
+ Where βˆ‚hβ±Ό/βˆ‚hⱼ₋₁ = Wβ‚•β‚•α΅€ diag(tanh'(zβ±Ό))
2632
+ </div>
2633
+ <div class="callout warning">
2634
+ <div class="callout-title">⚠️ The Power Effect</div>
2635
+ If the largest eigenvalue of Wβ‚•β‚• < 1: Gradients <strong>shrink exponentially</strong> (0.9¹⁰⁰ β‰ˆ 0.00003).<br>
2636
+ If > 1: Gradients <strong>explode</strong>.<br>
2637
+ <strong>LSTM Solution:</strong> The "Constant Error Carousel" (CEC) ensures gradients flow via the cell state without multiplication.
2638
+ </div>
+
+ <h3>LSTM Gating Math</h3>
+ <div class="list-item">
+ <div class="list-num">01</div>
+ <div>Forget Gate: fβ‚œ = Οƒ(W_f[hβ‚œβ‚‹β‚, xβ‚œ] + b_f)</div>
+ </div>
+ <div class="list-item">
+ <div class="list-num">02</div>
+ <div>Input Gate: iβ‚œ = Οƒ(W_i[hβ‚œβ‚‹β‚, xβ‚œ] + b_i)</div>
+ </div>
+ <div class="list-item">
+ <div class="list-num">03</div>
+ <div>Cell State Update: cβ‚œ = fβ‚œcβ‚œβ‚‹β‚ + iβ‚œtanh(W_c[hβ‚œβ‚‹β‚, xβ‚œ] + b_c)</div>
+ </div>
+ <div class="list-item">
+ <div class="list-num">04</div>
+ <div>Output Gate & Hidden State: oβ‚œ = Οƒ(W_o[hβ‚œβ‚‹β‚, xβ‚œ] + b_o), hβ‚œ = oβ‚œ Β· tanh(cβ‚œ)</div>
+ </div>
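The gate equations translate almost line for line into code. A scalar sketch (every weight collapsed to one hypothetical value `w`, biases omitted, purely to keep it short); it also includes the output gate oβ‚œ that produces hβ‚œ = oβ‚œ Β· tanh(cβ‚œ):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(h_prev, c_prev, x_t, w=0.5):
    """One scalar LSTM step following the gate equations above."""
    z = w * h_prev + w * x_t       # shared pre-activation for this toy case
    f = sigmoid(z)                 # forget gate f_t
    i = sigmoid(z)                 # input gate i_t
    c_tilde = math.tanh(z)         # candidate cell state
    c = f * c_prev + i * c_tilde   # additive cell update (the CEC path)
    o = sigmoid(z)                 # output gate o_t
    h = o * math.tanh(c)           # new hidden state h_t
    return h, c

h, c = lstm_step(h_prev=0.0, c_prev=1.0, x_t=1.0)
```

The key detail is the additive update of c: the gradient can flow through cβ‚œ = fβ‚œcβ‚œβ‚‹β‚ + ... without being squashed by a nonlinearity at every step.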
+ `
+ },
+ "bert": {
+ overview: `
+ <h3>BERT (Bidirectional Encoder Representations from Transformers)</h3>
+ <p>A pre-trained, encoder-only Transformer built for language understanding rather than generation.</p>
+
+ <h3>Key Innovation: Bidirectional Context</h3>
+ <p>Unlike GPT (left-to-right), BERT sees both left AND right context simultaneously.</p>
+
+ <h3>Pre-training Tasks</h3>
+ <ul>
+ <li><strong>Masked Language Modeling:</strong> Mask 15% of tokens, predict them (e.g., "The cat [MASK] on the mat" β†’ predict "sat")</li>
+ <li><strong>Next Sentence Prediction:</strong> Predict whether sentence B actually follows sentence A</li>
+ </ul>
+
+ <div class="callout tip">
+ <div class="callout-title">πŸ’‘ Fine-tuning BERT</div>
+ 1. Start with pre-trained BERT (trained on billions of words)<br>
+ 2. Add a task-specific head (classification, QA, NER)<br>
+ 3. Fine-tune on your dataset (10K-100K examples)<br>
+ 4. Achieves state-of-the-art results with minimal labeled data!
+ </div>
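A toy version of the masking step, in pure Python. This is a sketch only: real BERT operates on WordPiece tokens and also sometimes keeps or randomizes the selected positions instead of always writing [MASK]:

```python
import random

def mask_tokens(tokens, mask_rate=0.15, seed=0):
    """Hide ~mask_rate of the tokens; the training target is to
    predict the original token at each masked position."""
    rng = random.Random(seed)
    masked, targets = [], {}
    for pos, tok in enumerate(tokens):
        if rng.random() < mask_rate:
            targets[pos] = tok       # remember what the model must predict
            masked.append("[MASK]")
        else:
            masked.append(tok)
    return masked, targets

masked, targets = mask_tokens("the cat sat on the mat".split())
```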
+ `,
+ applications: `
+ <div class="info-box">
+ <div class="box-title">πŸ” Search & QA</div>
+ <div class="box-content">
+ <strong>Google Search:</strong> Uses BERT for understanding queries<br>
+ Question answering systems, document retrieval
+ </div>
+ </div>
+ <div class="info-box">
+ <div class="box-title">πŸ“Š Text Classification</div>
+ <div class="box-content">Sentiment analysis, topic classification, spam detection</div>
+ </div>
+ `
+ },
+ "gpt": {
+ overview: `
+ <h3>GPT (Generative Pre-trained Transformer)</h3>
+ <p>A decoder-only Transformer trained to predict the next token (autoregressive language modeling).</p>
+
+ <h3>GPT Evolution</h3>
+ <table>
+ <tr>
+ <th>Model</th>
+ <th>Params</th>
+ <th>Training Data</th>
+ <th>Capability</th>
+ </tr>
+ <tr>
+ <td>GPT-1</td>
+ <td>117M</td>
+ <td>BooksCorpus</td>
+ <td>Basic text generation</td>
+ </tr>
+ <tr>
+ <td>GPT-2</td>
+ <td>1.5B</td>
+ <td>WebText (40GB)</td>
+ <td>Coherent paragraphs</td>
+ </tr>
+ <tr>
+ <td>GPT-3</td>
+ <td>175B</td>
+ <td>570GB text</td>
+ <td>Few-shot learning</td>
+ </tr>
+ <tr>
+ <td>GPT-4</td>
+ <td>Undisclosed (rumored ~1.8T)</td>
+ <td>Undisclosed (multi-modal)</td>
+ <td>Reasoning, coding, images</td>
+ </tr>
+ </table>
+
+ <div class="callout insight">
+ <div class="callout-title">πŸš€ Emergent Abilities</div>
+ As models scale, new capabilities emerge:<br>
+ β€’ In-context learning (learning from examples in the prompt)<br>
+ β€’ Chain-of-thought reasoning<br>
+ β€’ Code generation<br>
+ β€’ Multi-step problem solving
+ </div>
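One autoregressive decoding step can be sketched as softmax-plus-sampling over the model's output logits. The logits below are toy values, and real systems typically add top-k/top-p filtering on top of this:

```python
import math
import random

def sample_next(logits, temperature=1.0, seed=0):
    """Turn logits into a probability distribution and sample one token id.
    Lower temperature sharpens the distribution toward the argmax."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                           # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r = random.Random(seed).random()          # deterministic for the demo
    acc = 0.0
    for token_id, p in enumerate(probs):
        acc += p
        if r < acc:
            return token_id, probs
    return len(probs) - 1, probs

token_id, probs = sample_next([2.0, 1.0, 0.1], temperature=0.5)
```

Generation is just this step in a loop: append the sampled token to the prompt, recompute logits, sample again.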
+ `,
+ applications: `
+ <div class="info-box">
+ <div class="box-title">πŸ’¬ ChatGPT & Assistants</div>
+ <div class="box-content">
+ Conversational AI, customer support, tutoring, brainstorming
+ </div>
+ </div>
+ <div class="info-box">
+ <div class="box-title">πŸ’» Code Generation</div>
+ <div class="box-content">
+ GitHub Copilot, code completion, bug fixing, documentation
+ </div>
+ </div>
+ `
+ },
+ "vit": {
+ overview: `
+ <h3>Vision Transformer (ViT)</h3>
+ <p>Applies the Transformer architecture directly to images by treating them as sequences of patches.</p>
+
+ <h3>How ViT Works</h3>
+ <div class="list-item">
+ <div class="list-num">01</div>
+ <div><strong>Patchify:</strong> Split a 224Γ—224 image into 16Γ—16 patches (14Γ—14 = 196 patches)</div>
+ </div>
+ <div class="list-item">
+ <div class="list-num">02</div>
+ <div><strong>Linear Projection:</strong> Flatten each patch β†’ linear embedding (like word embeddings)</div>
+ </div>
+ <div class="list-item">
+ <div class="list-num">03</div>
+ <div><strong>Positional Encoding:</strong> Add position information</div>
+ </div>
+ <div class="list-item">
+ <div class="list-num">04</div>
+ <div><strong>Transformer Encoder:</strong> Standard Transformer (self-attention, FFN)</div>
+ </div>
+ <div class="list-item">
+ <div class="list-num">05</div>
+ <div><strong>Classification:</strong> Use the [CLS] token for the final prediction</div>
+ </div>
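The patch count in step 01 is just arithmetic, and worth checking:

```python
def num_patches(height, width, patch_size):
    """Number of non-overlapping square patches in a ViT-style split."""
    assert height % patch_size == 0 and width % patch_size == 0
    return (height // patch_size) * (width // patch_size)

n = num_patches(224, 224, 16)  # 14 x 14 = 196 patch "tokens"
```

So a 224Γ—224 image becomes a sequence of 196 patch tokens (plus the [CLS] token), which the encoder treats exactly like a 197-word sentence.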
+
+ <div class="callout tip">
+ <div class="callout-title">πŸ’‘ When ViT Shines</div>
+ β€’ <strong>Large Datasets:</strong> Needs 10M+ images (or pre-training on ImageNet-21K)<br>
+ β€’ <strong>Transfer Learning:</strong> Pre-trained ViT beats CNNs on many tasks<br>
+ β€’ <strong>Long-Range Dependencies:</strong> Global attention vs CNN's local receptive field
+ </div>
+ `
+ }
+ };
+
+ function createModuleHTML(module) {
+ const content = MODULE_CONTENT[module.id] || {};
+
+ return `
+ <div class="module" id="${module.id}-module">
+ <button class="btn-back" onclick="switchTo('dashboard')">← Back to Dashboard</button>
+ <header>
+ <h1>${module.icon} ${module.title}</h1>
+ <p class="subtitle">${module.description}</p>
+ </header>
+
+ <div class="tabs">
+ <button class="tab-btn active" onclick="switchTab(event, '${module.id}-overview')">Overview</button>
+ <button class="tab-btn" onclick="switchTab(event, '${module.id}-concepts')">Key Concepts</button>
+ <button class="tab-btn" onclick="switchTab(event, '${module.id}-visualization')">πŸ“Š Visualization</button>
+ <button class="tab-btn" onclick="switchTab(event, '${module.id}-math')">Math</button>
+ <button class="tab-btn" onclick="switchTab(event, '${module.id}-applications')">Applications</button>
+ <button class="tab-btn" onclick="switchTab(event, '${module.id}-summary')">Summary</button>
+ </div>
+
+ <div id="${module.id}-overview" class="tab active">
+ <div class="section">
+ <h2>πŸ“– Overview</h2>
+ ${content.overview || `
+ <p>Complete coverage of ${module.title.toLowerCase()}. Learn the fundamentals, mathematics, real-world applications, and implementation details.</p>
+ <div class="info-box">
+ <div class="box-title">Learning Objectives</div>
+ <div class="box-content">
+ βœ“ Understand core concepts and theory<br>
+ βœ“ Master mathematical foundations<br>
+ βœ“ Learn practical applications<br>
+ βœ“ Implement and experiment
+ </div>
+ </div>
+ `}
+ </div>
+ </div>
+
+ <div id="${module.id}-concepts" class="tab">
+ <div class="section">
+ <h2>🎯 Key Concepts</h2>
+ ${content.concepts || `
+ <p>Fundamental concepts and building blocks for ${module.title.toLowerCase()}.</p>
+ <div class="callout insight">
+ <div class="callout-title">πŸ’‘ Main Ideas</div>
+ This section covers the core ideas you need to understand before diving into mathematics.
+ </div>
+ `}
+ </div>
+ </div>
+
+ <div id="${module.id}-visualization" class="tab">
+ <div class="section">
+ <h2>πŸ“Š Interactive Visualization</h2>
+ <p>Visual representation to help understand ${module.title.toLowerCase()} concepts intuitively.</p>
+ <div id="${module.id}-viz" class="viz-container">
+ <canvas id="${module.id}-canvas" width="800" height="400" style="border: 1px solid rgba(0, 212, 255, 0.3); border-radius: 8px; background: rgba(0, 212, 255, 0.02);"></canvas>
+ </div>
+ <div class="viz-controls">
+ <button onclick="drawVisualization('${module.id}')" class="btn-viz">πŸ”„ Refresh Visualization</button>
+ <button onclick="toggleVizAnimation('${module.id}')" class="btn-viz">▢️ Animate</button>
+ <button onclick="downloadViz('${module.id}')" class="btn-viz">⬇️ Save Image</button>
+ </div>
+ </div>
+ </div>
+
+ <div id="${module.id}-math" class="tab">
+ <div class="section">
+ <h2>πŸ“ Mathematical Foundation</h2>
+ ${content.math || `
+ <p>Rigorous mathematical treatment of ${module.title.toLowerCase()}.</p>
+ <div class="formula">
+ Mathematical formulas and derivations go here
+ </div>
+ `}
+ </div>
+ </div>
+
+ <div id="${module.id}-applications" class="tab">
+ <div class="section">
+ <h2>🌍 Real-World Applications</h2>
+ ${content.applications || `
+ <p>How ${module.title.toLowerCase()} is used in practice across different industries.</p>
+ <div class="info-box">
+ <div class="box-title">Use Cases</div>
+ <div class="box-content">
+ Common applications and practical examples
+ </div>
+ </div>
+ `}
+ </div>
+ </div>

README.md CHANGED
@@ -8,6 +8,7 @@ Visit our courses directly in your browser:
 
 - [πŸ“ˆ Interactive Statistics Course](https://aashishgarg13.github.io/DataScience/complete-statistics/)
 - [πŸ€– Machine Learning Guide](https://aashishgarg13.github.io/DataScience/ml_complete-all-topics/)
+ - [🧠 Deep Learning Masterclass](https://aashishgarg13.github.io/DataScience/DeepLearning/Deep%20Learning%20Curriculum.html)
 - [πŸ“Š Data Visualization](https://aashishgarg13.github.io/DataScience/Visualization/)
 - [πŸ”’ Mathematics for Data Science](https://aashishgarg13.github.io/DataScience/math-ds-complete/)
 - [βš™οΈ Feature Engineering Guide](https://aashishgarg13.github.io/DataScience/feature-engineering/)
@@ -42,6 +43,16 @@ Essential resources for mastering AI prompt engineering:
 - Visual Learning Aids
 - Step-by-Step Explanations
 
+ ### 🧠 Deep Learning Masterclass
+ - **Location:** `DeepLearning/`
+ - **Features:**
+ - **"Paper & Pain" Methodology:** Rigorous mathematical derivations
+ - Neural Network Foundations (MLP, Backprop, Optimizers)
+ - Convolutional Neural Networks (CNNs) & Computer Vision
+ - Generative AI (GANs, Diffusion Models)
+ - Transformers & Large Language Models (LLMs)
+ - Interactive Canvas Visualizations
+
 ### πŸ“Š Data Visualization
 - **Location:** `Visualization/`
 - **Features:**
@@ -82,6 +93,7 @@ The repository supports automatic updates for:
 Visit our GitHub Pages hosted versions:
 1. [Statistics Course](https://aashishgarg13.github.io/DataScience/complete-statistics/)
 2. [Machine Learning Guide](https://aashishgarg13.github.io/DataScience/ml_complete-all-topics/)
+ 3. [Deep Learning Masterclass](https://aashishgarg13.github.io/DataScience/DeepLearning/Deep%20Learning%20Curriculum.html)
 
 ### Option B: Run Locally (Recommended for Development)
 
@@ -130,6 +142,12 @@ ml_complete-all-topics/
 └── app.js # Interactive components
 ```
 
+ ### Deep Learning Masterclass
+ ```
+ DeepLearning/
+ └── Deep Learning Curriculum.html # All-in-one interactive curriculum
+ ```
+
 ### Data Visualization
 ```
 Visualization/