add deeplearning

- DeepLearning/Deep Learning Curriculum.html (+1649 -95)
- README.md (+18 -0)

DeepLearning/Deep Learning Curriculum.html
CHANGED
@@ -880,6 +880,52 @@
| 880 |   • Try <strong>Leaky ReLU</strong> or <strong>ELU</strong> if ReLU neurons are dying<br>
| 881 |   • Avoid Sigmoid/Tanh in deep networks (gradient vanishing)
| 882 |   </div>
| 883 |   `
| 884 |   },
| 885 |   "conv-layer": {
@@ -1214,116 +1260,1624 @@
| 1214 |   <strong>CodeGen, AlphaCode:</strong> Automated coding, bug detection
| 1215 |   </div>
| 1216 |   </div>
| 1217-1222 | - (six deleted lines; content truncated in this view)
| 1223 |
| 1224-1230 | - (seven deleted lines; content truncated in this view)
| 1231 |
| 1232 | - <div class="
| 1233 | - <
| 1234 | -
| 1235 | - <
| 1236 | -
| 1237 | -
| 1238 | - <button class="tab-btn" onclick="switchTab(event, '${module.id}-summary')">Summary</button>
| 1239 |   </div>
| 1240 |
| 1241 | - <div
| 1242 | - <div class="
| 1243-1255 | - (thirteen deleted lines; content truncated in this view)
| 1256 |   </div>
| 1257 |   </div>
| 1258 | -
| 1259 | -
| 1260 | - <div class="
| 1261 | -
| 1262 | - ${content.concepts || `
| 1263 | - <p>Fundamental concepts and building blocks for ${module.title.toLowerCase()}.</p>
| 1264 | - <div class="callout insight">
| 1265 | - <div class="callout-title">💡 Main Ideas</div>
| 1266 | - This section covers the core ideas you need to understand before diving into mathematics.
| 1267 | - </div>
| 1268 | - `}
| 1269 |   </div>
| 1270 |   </div>
| 1271 |
| 1272 | - <
| 1273-1283 | - (eleven deleted lines; content truncated in this view)
| 1284 | - </div>
| 1285 |   </div>
| 1286 |
| 1287 | - <div
| 1288 | - <div class="
| 1289-1293 | - (five deleted lines; content truncated in this view)
| 1294 |   </div>
| 1295-1299 | - (five deleted lines; content truncated in this view)
| 1300 | - <div class="viz-controls">
| 1301 | - <button onclick="drawMathVisualization('${module.id}')" class="btn-viz">📊 Visualize Equations</button>
| 1302 | - </div>
| 1303 |   </div>
| 1304 |   </div>
| 1305 |
| 1306 | - <
| 1307-1317 | - (eleven deleted lines; content truncated in this view)
| 1318 | - </div>
| 1319-1326 | - (eight deleted lines; content truncated in this view)
| 1327 |   </div>
| 1328 |   </div>
| 1329 |
@@ -880,6 +880,52 @@
| 880 |   • Try <strong>Leaky ReLU</strong> or <strong>ELU</strong> if ReLU neurons are dying<br>
| 881 |   • Avoid Sigmoid/Tanh in deep networks (gradient vanishing)
| 882 |   </div>
| 883 | + `,
| 884 | + applications: `
| 885 | +   <div class="info-box">
| 886 | +     <div class="box-title">🧠 Neural Network Design</div>
| 887 | +     <div class="box-content">
| 888 | +       Critical choice for every neural network - affects training speed, convergence, and final accuracy
| 889 | +     </div>
| 890 | +   </div>
| 891 | +   <div class="info-box">
| 892 | +     <div class="box-title">🎯 Task-Specific Selection</div>
| 893 | +     <div class="box-content">
| 894 | +       Different tasks need different outputs: Sigmoid for binary, Softmax for multi-class, Linear for regression
| 895 | +     </div>
| 896 | +   </div>
| 897 | + `,
| 898 | + math: `
| 899 | +   <h3>Derivatives: The Backprop Fuel</h3>
| 900 | +   <p>Activation functions must be differentiable for backpropagation to work. Let's look at the derivatives on paper:</p>
| 901 | +
| 902 | +   <div class="list-item">
| 903 | +     <div class="list-num">01</div>
| 904 | +     <div><strong>Sigmoid:</strong> σ(z) = 1 / (1 + e⁻ᶻ)<br>
| 905 | +     <strong>Derivative:</strong> σ'(z) = σ(z)(1 - σ(z))<br>
| 906 | +     <span class="formula-caption">Max gradient is 0.25 (at z=0). This is why deep networks vanish!</span></div>
| 907 | +   </div>
| 908 | +
| 909 | +   <div class="list-item">
| 910 | +     <div class="list-num">02</div>
| 911 | +     <div><strong>Tanh:</strong> tanh(z) = (eᶻ - e⁻ᶻ) / (eᶻ + e⁻ᶻ)<br>
| 912 | +     <strong>Derivative:</strong> tanh'(z) = 1 - tanh²(z)<br>
| 913 | +     <span class="formula-caption">Max gradient is 1.0 (at z=0). Better than Sigmoid, but still vanishes.</span></div>
| 914 | +   </div>
| 915 | +
| 916 | +   <div class="list-item">
| 917 | +     <div class="list-num">03</div>
| 918 | +     <div><strong>ReLU:</strong> max(0, z)<br>
| 919 | +     <strong>Derivative:</strong> 1 if z > 0, else 0<br>
| 920 | +     <span class="formula-caption">Gradient is 1.0 for all positive z. No vanishing! But 0 for negative (Dying ReLU).</span></div>
| 921 | +   </div>
| 922 | +
| 923 | +   <div class="callout insight">
| 924 | +     <div class="callout-title">📝 Paper & Pain: The Chain Effect</div>
| 925 | +     Each layer multiplies the gradient by σ'(z).<br>
| 926 | +     For 10 Sigmoid layers: Total gradient ≈ (0.25)¹⁰ ≈ <strong>0.00000095</strong><br>
| 927 | +     This is the mathematical proof of the Vanishing Gradient Problem!
| 928 | +   </div>
| 929 |   `
| 930 |   },
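The chain-effect arithmetic in the math tab above is easy to verify; a minimal pure-Python check (no framework needed):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime(z):
    # sigma'(z) = sigma(z) * (1 - sigma(z))
    s = sigmoid(z)
    return s * (1.0 - s)

# The sigmoid gradient peaks at z = 0
max_grad = sigmoid_prime(0.0)   # 0.25
# Ten stacked sigmoid layers multiply gradients by at most 0.25 each
chained = max_grad ** 10        # ~9.5e-7, the vanishing-gradient number above
print(max_grad, chained)
```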
| 931 |   "conv-layer": {

@@ -1214,116 +1260,1624 @@
| 1260 |   <strong>CodeGen, AlphaCode:</strong> Automated coding, bug detection
| 1261 |   </div>
| 1262 |   </div>
| 1263 | + `,
| 1264 | + math: `
| 1265 | +   <h3>Scaled Dot-Product Attention</h3>
| 1266 | +   <p>The "heart" of the Transformer. It computes how much "attention" to pay to different parts of the input sequence.</p>
| 1267 | +
| 1268 | +   <div class="formula" style="font-size: 1.3rem; text-align: center; margin: 20px 0; background: rgba(0, 212, 255, 0.05); padding: 20px; border-radius: 8px;">
| 1269 | +     Attention(Q, K, V) = softmax( (QKᵀ) / √dₖ ) V
| 1270 | +   </div>
| 1271 |
| 1272 | +   <h3>Step-by-Step Derivation</h3>
| 1273 | +   <div class="list-item">
| 1274 | +     <div class="list-num">01</div>
| 1275 | +     <div><strong>Dot Product (QKᵀ):</strong> Compute raw similarity scores between Queries (what we want) and Keys (what we have)</div>
| 1276 | +   </div>
| 1277 | +   <div class="list-item">
| 1278 | +     <div class="list-num">02</div>
| 1279 | +     <div><strong>Scaling (1/√dₖ):</strong> Divide by the square root of the key dimension. <strong>Why?</strong> With high dimensions, dot products grow large, pushing softmax into regions with vanishing gradients. Scaling prevents this.</div>
| 1280 | +   </div>
| 1281 | +   <div class="list-item">
| 1282 | +     <div class="list-num">03</div>
| 1283 | +     <div><strong>Softmax:</strong> Convert similarity scores into probabilities (attention weights) that sum to 1</div>
| 1284 | +   </div>
| 1285 | +   <div class="list-item">
| 1286 | +     <div class="list-num">04</div>
| 1287 | +     <div><strong>Weighted Sum (×V):</strong> Use attention weights to pull information from Values.</div>
| 1288 | +   </div>
| 1289 |
| 1290 | +   <div class="callout insight">
| 1291 | +     <div class="callout-title">📝 Paper & Pain: Multi-Head Attention</div>
| 1292 | +     Instead of one big attention, we split Q, K, V into <em>h</em> heads:<br>
| 1293 | +     1. Heads learn <strong>different aspects</strong> (e.g., syntax vs semantics)<br>
| 1294 | +     2. Concat all heads: MultiHead = Concat(head₁, ..., headₕ)Wᴼ<br>
| 1295 | +     3. Complexity: <strong>O(n² · d)</strong> - This is why long sequences are hard!
| 1296 |   </div>
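The four steps above can be sketched for a single query in pure Python (toy dimensions and illustrative values, not a real model):

```python
import math

def softmax(xs):
    m = max(xs)                              # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(q, keys, values):
    """Scaled dot-product attention for one query vector (illustrative sketch)."""
    d_k = len(q)
    # Steps 1-2: similarity scores q·k, scaled by sqrt(d_k)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
    # Step 3: softmax turns scores into weights that sum to 1
    weights = softmax(scores)
    # Step 4: weighted sum of the value vectors
    out = [sum(wi * v[j] for wi, v in zip(weights, values))
           for j in range(len(values[0]))]
    return out, weights

# Keys 1 and 3 match the query equally well; key 2 is orthogonal to it
out, w = attention([1.0, 0.0],
                   [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
                   [[1.0], [2.0], [3.0]])
print(w, out)
```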
| 1297 |
| 1298 | +   <div class="callout warning">
| 1299 | +     <div class="callout-title">📐 Sinusoidal Positional Encoding</div>
| 1300 | +     PE(pos, 2i) = sin(pos / 10000^(2i/d))<br>
| 1301 | +     PE(pos, 2i+1) = cos(pos / 10000^(2i/d))<br>
| 1302 | +     This allows the model to learn relative positions since PE(pos+k) is a linear function of PE(pos).
| 1303 | +   </div>
| 1304 | + `
| 1305 | + },
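The two encoding formulas above can be sketched directly (a minimal illustration; real implementations vectorize this over all positions at once):

```python
import math

def positional_encoding(pos, d):
    """Sinusoidal positional encoding for a single position (sketch of the formulas above)."""
    pe = []
    for i in range(d // 2):
        angle = pos / (10000 ** (2 * i / d))
        pe.append(math.sin(angle))   # even dimension 2i
        pe.append(math.cos(angle))   # odd dimension 2i+1
    return pe

pe0 = positional_encoding(0, 4)
print(pe0)  # position 0: alternating sin(0)=0 and cos(0)=1
```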
| 1306 | + "perceptron": {
| 1307 | + overview: `
| 1308 | +   <h3>What is a Perceptron?</h3>
| 1309 | +   <p>The perceptron is the simplest neural network, invented in 1958. It's a binary linear classifier that makes predictions based on weighted inputs.</p>
| 1310 | +
| 1311 | +   <div class="callout tip">
| 1312 | +     <div class="callout-title">✅ Advantages</div>
| 1313 | +     • Simple and fast<br>
| 1314 | +     • Guaranteed convergence for linearly separable data<br>
| 1315 | +     • Interpretable weights
| 1316 | +   </div>
| 1317 | +
| 1318 | +   <div class="callout warning">
| 1319 | +     <div class="callout-title">⚠️ Key Limitation</div>
| 1320 | +     <strong>Cannot solve XOR:</strong> Limited to linear decision boundaries only
| 1321 | +   </div>
| 1322 | + `,
| 1323 | + concepts: `
| 1324 | +   <h3>How Perceptron Works</h3>
| 1325 | +   <div class="list-item">
| 1326 | +     <div class="list-num">01</div>
| 1327 | +     <div><strong>Weighted Sum:</strong> z = w₁x₁ + w₂x₂ + ... + b</div>
| 1328 | +   </div>
| 1329 | +   <div class="list-item">
| 1330 | +     <div class="list-num">02</div>
| 1331 | +     <div><strong>Step Function:</strong> Output = 1 if z ≥ 0, else 0</div>
| 1332 | +   </div>
| 1333 | +   <div class="formula">
| 1334 | +     Learning Rule: w_new = w_old + α(y_true - y_pred)x
| 1335 | +   </div>
| 1336 | + `,
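The weighted sum, step function, and learning rule above fit in a few lines; a sketch trained on the linearly separable AND gate (toy data, learning rate and epoch count chosen arbitrarily):

```python
def perceptron_train(data, lr=1, epochs=10):
    """Train a perceptron with the rule w_new = w_old + lr*(y_true - y_pred)*x.
    Converges for linearly separable data such as AND."""
    w = [0, 0]
    b = 0
    for _ in range(epochs):
        for x, y in data:
            z = w[0] * x[0] + w[1] * x[1] + b
            y_pred = 1 if z >= 0 else 0        # step function
            err = y - y_pred
            w[0] += lr * err * x[0]            # learning rule, per weight
            w[1] += lr * err * x[1]
            b += lr * err                      # bias uses a constant input of 1
    return w, b

AND = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = perceptron_train(AND)
pred = [1 if w[0] * x[0] + w[1] * x[1] + b >= 0 else 0 for x, _ in AND]
print(w, b, pred)
```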
| 1337 | + applications: `
| 1338 | +   <div class="info-box">
| 1339 | +     <div class="box-title">🎓 Educational</div>
| 1340 | +     <div class="box-content">
| 1341 | +       Historical importance - first trainable neural model. Perfect for teaching ML fundamentals
| 1342 |       </div>
| 1343 |     </div>
| 1344 | +   <div class="info-box">
| 1345 | +     <div class="box-title">🔬 Simple Classification</div>
| 1346 | +     <div class="box-content">
| 1347 | +       Linearly separable problems: basic pattern recognition, simple binary decisions
| 1348 |       </div>
| 1349 |     </div>
| 1350 | + `
| 1351 | + },
| 1352 | + "mlp": {
| 1353 | + overview: `
| 1354 | +   <h3>Multi-Layer Perceptron (MLP)</h3>
| 1355 | +   <p>An MLP adds hidden layers between input and output, enabling non-linear decision boundaries and solving the XOR problem that single perceptrons cannot.</p>
| 1356 | +
| 1357 | +   <h3>Why MLPs?</h3>
| 1358 | +   <ul>
| 1359 | +     <li><strong>Universal Approximation:</strong> Can approximate any continuous function</li>
| 1360 | +     <li><strong>Non-Linear Learning:</strong> Solves complex problems</li>
| 1361 | +     <li><strong>Feature Extraction:</strong> Hidden layers learn hierarchical features</li>
| 1362 | +   </ul>
| 1363 | +
| 1364 | +   <div class="callout insight">
| 1365 | +     <div class="callout-title">💡 The XOR Breakthrough</div>
| 1366 | +     Single perceptron: Cannot solve XOR<br>
| 1367 | +     MLP with 1 hidden layer (2 neurons): Solves XOR!<br>
| 1368 | +     This proves the power of depth.
| 1369 | +   </div>
| 1370 | + `,
| 1371 | + concepts: `
| 1372 | +   <h3>Architecture Components</h3>
| 1373 | +   <div class="list-item">
| 1374 | +     <div class="list-num">01</div>
| 1375 | +     <div><strong>Input Layer:</strong> Raw features (no computation)</div>
| 1376 | +   </div>
| 1377 | +   <div class="list-item">
| 1378 | +     <div class="list-num">02</div>
| 1379 | +     <div><strong>Hidden Layers:</strong> Extract progressively abstract features</div>
| 1380 | +   </div>
| 1381 | +   <div class="list-item">
| 1382 | +     <div class="list-num">03</div>
| 1383 | +     <div><strong>Output Layer:</strong> Final predictions</div>
| 1384 | +   </div>
| 1385 | + `,
| 1386 | + applications: `
| 1387 | +   <div class="info-box">
| 1388 | +     <div class="box-title">📊 Tabular Data</div>
| 1389 | +     <div class="box-content">Credit scoring, fraud detection, customer churn, sales forecasting</div>
| 1390 | +   </div>
| 1391 | +   <div class="info-box">
| 1392 | +     <div class="box-title">🏭 Manufacturing</div>
| 1393 | +     <div class="box-content">Quality control, predictive maintenance, demand forecasting</div>
| 1394 | +   </div>
| 1395 | + `,
| 1396 | + math: `
| 1397 | +   <h3>Neural Network Forward Pass (Matrix Form)</h3>
| 1398 | +   <p>Vectorization is key to modern deep learning. We process entire layers as matrix multiplications.</p>
| 1399 | +
| 1400 | +   <div class="formula">
| 1401 | +     Layer 1: z⁽¹⁾ = W⁽¹⁾x + b⁽¹⁾ | a⁽¹⁾ = σ(z⁽¹⁾)<br>
| 1402 | +     Layer 2: z⁽²⁾ = W⁽²⁾a⁽¹⁾ + b⁽²⁾ | a⁽²⁾ = σ(z⁽²⁾)<br>
| 1403 | +     ...<br>
| 1404 | +     Layer L: ŷ = Softmax(W⁽ᴸ⁾a⁽ᴸ⁻¹⁾ + b⁽ᴸ⁾)
| 1405 | +   </div>
| 1406 |
| 1407 | +   <h3>Paper & Pain: Dimensionality Audit</h3>
| 1408 | +   <p>Understanding tensor shapes is the #1 skill for debugging neural networks.</p>
| 1409 | +   <div class="list-item">
| 1410 | +     <div class="list-num">01</div>
| 1411 | +     <div><strong>Input x:</strong> [n_features, 1]</div>
| 1412 | +   </div>
| 1413 | +   <div class="list-item">
| 1414 | +     <div class="list-num">02</div>
| 1415 | +     <div><strong>Weights W⁽¹⁾:</strong> [n_hidden, n_features]</div>
| 1416 | +   </div>
| 1417 | +   <div class="list-item">
| 1418 | +     <div class="list-num">03</div>
| 1419 | +     <div><strong>Bias b⁽¹⁾:</strong> [n_hidden, 1]</div>
| 1420 |     </div>
| 1421 |
| 1422 | +   <div class="callout insight">
| 1423 | +     <div class="callout-title">📝 Paper & Pain: Solving XOR</div>
| 1424 | +     Input: [0,1], Target: 1<br>
| 1425 | +     Layer 1 (2 neurons):<br>
| 1426 | +     z₁ = 10x₁ + 10x₂ - 5 | a₁ = σ(z₁)<br>
| 1427 | +     z₂ = 10x₁ + 10x₂ - 15 | a₂ = σ(z₂)<br>
| 1428 | +     Layer 2 (1 neuron):<br>
| 1429 | +     y = σ(20a₁ - 20a₂ - 10)<br>
| 1430 | +     <strong>Try it on paper!</strong> This specific configuration correctly outputs XOR values.
| 1431 | +   </div>
| 1432 | + `
| 1433 | + },
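The XOR callout above can be executed as written; a pure-Python forward pass with those exact weights:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def xor_mlp(x1, x2):
    """Forward pass of the hand-built 2-2-1 XOR network from the callout above."""
    a1 = sigmoid(10 * x1 + 10 * x2 - 5)    # fires when at least one input is 1
    a2 = sigmoid(10 * x1 + 10 * x2 - 15)   # fires only when both inputs are 1
    y = sigmoid(20 * a1 - 20 * a2 - 10)    # "at least one AND NOT both" = XOR
    return round(y)

xor_out = [xor_mlp(x1, x2) for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]]
print(xor_out)  # [0, 1, 1, 0]
```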
| 1434 | + "weight-init": {
| 1435 | + overview: `
| 1436 | +   <h3>Weight Initialization Strategies</h3>
| 1437 | +   <table>
| 1438 | +     <tr>
| 1439 | +       <th>Method</th>
| 1440 | +       <th>Best For</th>
| 1441 | +       <th>Formula</th>
| 1442 | +     </tr>
| 1443 | +     <tr>
| 1444 | +       <td>Xavier/Glorot</td>
| 1445 | +       <td>Sigmoid, Tanh</td>
| 1446 | +       <td>N(0, √(2/(n_in+n_out)))</td>
| 1447 | +     </tr>
| 1448 | +     <tr>
| 1449 | +       <td>He/Kaiming</td>
| 1450 | +       <td>ReLU</td>
| 1451 | +       <td>N(0, √(2/n_in))</td>
| 1452 | +     </tr>
| 1453 | +   </table>
| 1454 | +
| 1455 | +   <div class="callout warning">
| 1456 | +     <div class="callout-title">⚠️ Never Initialize to Zero!</div>
| 1457 | +     All neurons learn identical features (symmetry problem)
| 1458 | +   </div>
| 1459 | + `,
| 1460 | + concepts: `
| 1461 | +   <h3>Key Principles</h3>
| 1462 | +   <div class="list-item">
| 1463 | +     <div class="list-num">01</div>
| 1464 | +     <div><strong>Variance Preservation:</strong> Keep activation variance similar across layers</div>
| 1465 | +   </div>
| 1466 | +   <div class="list-item">
| 1467 | +     <div class="list-num">02</div>
| 1468 | +     <div><strong>Symmetry Breaking:</strong> Different weights force different features</div>
| 1469 | +   </div>
| 1470 | + `,
| 1471 | + applications: `
| 1472 | +   <div class="info-box">
| 1473 | +     <div class="box-title">🎯 Critical for Deep Networks</div>
| 1474 | +     <div class="box-content">
| 1475 | +       Proper initialization is essential for training networks >10 layers. Wrong init = training failure
| 1476 |       </div>
| 1477 | +   </div>
| 1478 | +   <div class="info-box">
| 1479 | +     <div class="box-title">⚡ Faster Convergence</div>
| 1480 | +     <div class="box-content">
| 1481 | +       Good initialization reduces training time by 2-10×, especially with modern optimizers
| 1482 |       </div>
| 1483 |     </div>
| 1484 | + `,
| 1485 | + math: `
| 1486 | +   <h3>The Variance Preservation Principle</h3>
| 1487 | +   <p>To prevent gradients from vanishing or exploding, we want the variance of the activations to remain constant across layers.</p>
| 1488 | +
| 1489 | +   <div class="formula">
| 1490 | +     For a linear layer: y = Σ wᵢxᵢ<br>
| 1491 | +     Var(y) = Var(Σ wᵢxᵢ) = Σ Var(wᵢxᵢ)<br>
| 1492 | +     Assuming w and x are independent with mean 0:<br>
| 1493 | +     Var(wᵢxᵢ) = E[wᵢ²]E[xᵢ²] - E[wᵢ]²E[xᵢ]² = Var(wᵢ)Var(xᵢ)<br>
| 1494 | +     So, Var(y) = n_in × Var(w) × Var(x)
| 1495 | +   </div>
| 1496 |
| 1497 | +   <h3>1. Xavier (Glorot) Initialization</h3>
| 1498 | +   <p>Goal: Var(y) = Var(x) and Var(grad_out) = Var(grad_in)</p>
| 1499 | +   <div class="list-item">
| 1500 | +     <div class="list-num">01</div>
| 1501 | +     <div><strong>Forward Pass:</strong> n_in × Var(w) = 1 ⇒ Var(w) = 1/n_in</div>
| 1502 | +   </div>
| 1503 | +   <div class="list-item">
| 1504 | +     <div class="list-num">02</div>
| 1505 | +     <div><strong>Backward Pass:</strong> n_out × Var(w) = 1 ⇒ Var(w) = 1/n_out</div>
| 1506 | +   </div>
| 1507 | +   <div class="list-item">
| 1508 | +     <div class="list-num">03</div>
| 1509 | +     <div><strong>Compromise:</strong> Var(w) = 2 / (n_in + n_out)</div>
| 1510 | +   </div>
| 1511 | +
| 1512 | +   <h3>2. He (Kaiming) Initialization</h3>
| 1513 | +   <p>For ReLU activation, half the neurons are inactive (output 0), which halves the variance. We must compensate.</p>
| 1514 | +   <div class="formula">
| 1515 | +     Var(ReLU(y)) = 1/2 × Var(y)<br>
| 1516 | +     To keep Var(ReLU(y)) = Var(x):<br>
| 1517 | +     1/2 × n_in × Var(w) = 1<br>
| 1518 | +     <strong>Var(w) = 2 / n_in</strong>
| 1519 | +   </div>
| 1520 | +
| 1521 | +   <div class="callout insight">
| 1522 | +     <div class="callout-title">📝 Paper & Pain Calculation</div>
| 1523 | +     If n_in = 256 and you use ReLU:<br>
| 1524 | +     Weight Std Dev = √(2/256) = √(1/128) ≈ <strong>0.088</strong><br>
| 1525 | +     Initializing with std=1.0 or std=0.01 would cause immediate failure in a deep net!
| 1526 | +   </div>
| 1527 | + `
| 1528 | + },
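The two variance rules translate directly into code; a sketch using only the standard library (the function names are mine, not a library API):

```python
import math
import random

def xavier_std(n_in, n_out):
    # Xavier/Glorot: Var(w) = 2/(n_in + n_out), std = sqrt of that
    return math.sqrt(2.0 / (n_in + n_out))

def he_std(n_in):
    # He/Kaiming: Var(w) = 2/n_in, compensating for ReLU zeroing half the units
    return math.sqrt(2.0 / n_in)

def init_layer(n_out, n_in, std):
    """Sample an [n_out, n_in] weight matrix from N(0, std)."""
    return [[random.gauss(0.0, std) for _ in range(n_in)] for _ in range(n_out)]

print(round(he_std(256), 3))   # matches the 0.088 calculation above
W = init_layer(128, 256, he_std(256))
```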
| 1529 | + "loss": {
| 1530 | + overview: `
| 1531 | +   <h3>Loss Functions Guide</h3>
| 1532 | +   <table>
| 1533 | +     <tr>
| 1534 | +       <th>Task</th>
| 1535 | +       <th>Loss Function</th>
| 1536 | +     </tr>
| 1537 | +     <tr>
| 1538 | +       <td>Binary Classification</td>
| 1539 | +       <td>Binary Cross-Entropy</td>
| 1540 | +     </tr>
| 1541 | +     <tr>
| 1542 | +       <td>Multi-class</td>
| 1543 | +       <td>Categorical Cross-Entropy</td>
| 1544 | +     </tr>
| 1545 | +     <tr>
| 1546 | +       <td>Regression</td>
| 1547 | +       <td>MSE or MAE</td>
| 1548 | +     </tr>
| 1549 | +   </table>
| 1550 | + `,
| 1551 | + concepts: `
| 1552 | +   <h3>Common Loss Functions</h3>
| 1553 | +   <div class="list-item">
| 1554 | +     <div class="list-num">01</div>
| 1555 | +     <div><strong>MSE:</strong> (1/n)Σ(y - ŷ)² - Penalizes large errors</div>
| 1556 | +   </div>
| 1557 | +   <div class="list-item">
| 1558 | +     <div class="list-num">02</div>
| 1559 | +     <div><strong>Cross-Entropy:</strong> -Σ(y·log(ŷ)) - For classification</div>
| 1560 | +   </div>
| 1561 | + `,
| 1562 | + applications: `
| 1563 | +   <div class="info-box">
| 1564 | +     <div class="box-title">🎯 Task-Dependent Selection</div>
| 1565 | +     <div class="box-content">
| 1566 | +       Every ML task needs an appropriate loss: classification (cross-entropy), regression (MSE/MAE), ranking (triplet loss)
| 1567 | +     </div>
| 1568 | +   </div>
| 1569 | +   <div class="info-box">
| 1570 | +     <div class="box-title">🔧 Custom Losses</div>
| 1571 | +     <div class="box-content">
| 1572 | +       Business-specific objectives: Focal Loss (imbalanced data), Dice Loss (segmentation), Contrastive Loss (similarity learning)
| 1573 | +     </div>
| 1574 | +   </div>
| 1575 | + `,
| 1576 | + math: `
| 1577 | +   <h3>Binary Cross-Entropy (BCE) Derivation</h3>
| 1578 | +   <p>Why do we use logs? BCE is derived from Maximum Likelihood Estimation (MLE) assuming a Bernoulli distribution.</p>
| 1579 | +
| 1580 | +   <div class="formula">
| 1581 | +     L(ŷ, y) = -(y log(ŷ) + (1-y) log(1-ŷ))
| 1582 | +   </div>
| 1583 | +
| 1584 | +   <h3>Paper & Pain: Why not MSE for Classification?</h3>
| 1585 | +   <p>If we use MSE for a sigmoid output, the gradient is:</p>
| 1586 | +   <div class="formula">
| 1587 | +     ∂L/∂w = (ŷ - y) <strong>σ'(z)</strong> x
| 1588 | +   </div>
| 1589 | +   <div class="callout warning">
| 1590 | +     <div class="callout-title">⚠️ The Saturation Problem</div>
| 1591 | +     If the model is very wrong (e.g., target 1, output 0.001), σ'(z) is near 0.<br>
| 1592 | +     The gradient vanishes, and the model <strong>stops learning!</strong>
| 1593 | +   </div>
| 1594 | +
| 1595 | +   <h3>The BCE Advantage</h3>
| 1596 | +   <p>When using BCE, the σ'(z) term cancels out! The gradient becomes:</p>
| 1597 | +   <div class="formula" style="font-size: 1.2rem; color: #00d4ff;">
| 1598 | +     ∂L/∂w = (ŷ - y) x
| 1599 | +   </div>
| 1600 | +   <div class="list-item">
| 1601 | +     <div class="list-num">💡</div>
| 1602 | +     <div>This is beautiful: the gradient depends <strong>only on the error</strong> (ŷ-y), not on how saturated the neuron is. This enables much faster training.</div>
| 1603 | +   </div>
| 1604 | + `
| 1605 | + },
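The saturation argument can be made concrete; a small sketch comparing the two gradients on one badly mispredicted example:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def grads(z, y, x=1.0):
    """Per-example gradients w.r.t. w for MSE vs BCE on a single sigmoid unit."""
    y_hat = sigmoid(z)
    sig_prime = y_hat * (1.0 - y_hat)
    mse_grad = (y_hat - y) * sig_prime * x   # carries the sigma'(z) factor
    bce_grad = (y_hat - y) * x               # sigma'(z) has cancelled out
    return mse_grad, bce_grad

# Very wrong prediction: target 1, but z = -7 gives y_hat ~ 0.001
mse_g, bce_g = grads(-7.0, 1.0)
print(mse_g, bce_g)  # MSE gradient nearly vanishes; BCE gradient stays near -1
```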
| 1606 | + "optimizers": {
| 1607 | + overview: `
| 1608 | +   <h3>Optimizer Selection Guide</h3>
| 1609 | +   <table>
| 1610 | +     <tr>
| 1611 | +       <th>Optimizer</th>
| 1612 | +       <th>When to Use</th>
| 1613 | +     </tr>
| 1614 | +     <tr>
| 1615 | +       <td>Adam/AdamW</td>
| 1616 | +       <td><strong>Default choice</strong> - works 90% of the time</td>
| 1617 | +     </tr>
| 1618 | +     <tr>
| 1619 | +       <td>SGD + Momentum</td>
| 1620 | +       <td>CNNs (better final accuracy with patience)</td>
| 1621 | +     </tr>
| 1622 | +     <tr>
| 1623 | +       <td>RMSprop</td>
| 1624 | +       <td>RNNs</td>
| 1625 | +     </tr>
| 1626 | +   </table>
| 1627 | +
| 1628 | +   <div class="formula">
| 1629 | +     Adam: m_t = β₁·m + (1-β₁)·∇L<br>
| 1630 | +     v_t = β₂·v + (1-β₂)·(∇L)²<br>
| 1631 | +     w = w - α·m_t/√(v_t)
| 1632 | +   </div>
| 1633 | + `,
| 1634 | + concepts: `
| 1635 | +   <h3>Optimizer Evolution</h3>
| 1636 | +   <div class="list-item">
| 1637 | +     <div class="list-num">01</div>
| 1638 | +     <div><strong>SGD:</strong> Simple but requires careful learning rate tuning</div>
| 1639 | +   </div>
| 1640 | +   <div class="list-item">
| 1641 | +     <div class="list-num">02</div>
| 1642 | +     <div><strong>Adam:</strong> Adaptive rates + momentum = works out of the box</div>
| 1643 | +   </div>
| 1644 | + `,
| 1645 | + applications: `
| 1646 | +   <div class="info-box">
| 1647 | +     <div class="box-title">🚀 Training Acceleration</div>
| 1648 | +     <div class="box-content">
| 1649 | +       Modern optimizers (Adam) reduce training time by 5-10× compared to basic SGD
| 1650 | +     </div>
| 1651 | +   </div>
| 1652 | +   <div class="info-box">
| 1653 | +     <div class="box-title">🎯 Architecture-Specific</div>
| 1654 | +     <div class="box-content">
| 1655 | +       CNNs: SGD+Momentum | Transformers: AdamW | RNNs: RMSprop | Default: Adam
| 1656 | +     </div>
| 1657 | +   </div>
| 1658 | + `
| 1659 | + },
|
| 1660 |
+
"backprop": {
|
| 1661 |
+
overview: `
|
| 1662 |
+
<h3>Backpropagation Algorithm</h3>
|
| 1663 |
+
<p>Backprop efficiently computes gradients by applying the chain rule from output to input, enabling training of deep networks.</p>
|
| 1664 |
+
|
| 1665 |
+
<h3>Why Backpropagation?</h3>
|
| 1666 |
+
<ul>
|
| 1667 |
+
<li><strong>Efficient:</strong> Computes all gradients in single backward pass</li>
|
| 1668 |
+
<li><strong>Scalable:</strong> Works for networks of any depth</li>
|
| 1669 |
+
<li><strong>Automatic:</strong> Modern frameworks do it automatically</li>
|
| 1670 |
+
</ul>
|
| 1671 |
+
`,
|
| 1672 |
+
concepts: `
|
| 1673 |
+
<div class="formula">
|
| 1674 |
+
Chain Rule:<br>
|
| 1675 |
+
βL/βw = βL/βy Γ βy/βz Γ βz/βw<br>
|
| 1676 |
+
<br>
|
| 1677 |
+
For layer l:<br>
|
| 1678 |
+
Ξ΄Λ‘ = (W^(l+1))^T Ξ΄^(l+1) β Ο'(z^l)<br>
|
| 1679 |
+
βL/βW^l = Ξ΄^l (a^(l-1))^T
|
| 1680 |
+
</div>
|
| 1681 |
+
`,
|
| 1682 |
+
applications: `
|
| 1683 |
+
<div class="info-box">
|
| 1684 |
+
<div class="box-title">π§ Universal Training Method</div>
|
| 1685 |
+
<div class="box-content">
|
| 1686 |
+
Every modern neural network uses backprop - from CNNs to Transformers to GANs
|
| 1687 |
+
</div>
|
| 1688 |
+
</div>
|
| 1689 |
+
<div class="info-box">
|
| 1690 |
+
<div class="box-title">π§ Automatic Differentiation</div>
|
| 1691 |
+
<div class="box-content">
|
| 1692 |
+
PyTorch, TensorFlow implement automatic backprop - you define forward pass, framework does backward
|
| 1693 |
+
</div>
|
| 1694 |
+
</div>
|
| 1695 |
+
`,
|
| 1696 |
+
math: `
|
| 1697 |
+
<h3>The 4 Fundamental Equations of Backprop</h3>
|
| 1698 |
+
<p>Backpropagation is essentially the chain rule applied iteratively. We define the error signal Ξ΄ = βL/βz.</p>
|
| 1699 |
+
|
| 1700 |
+
<div class="list-item">
|
| 1701 |
+
<div class="list-num">01</div>
|
| 1702 |
+
<div><strong>Error at Output Layer (L):</strong><br>
|
| 1703 |
+
Ξ΄α΄Έ = ββL β Ο'(zα΄Έ)<br>
|
| 1704 |
+
<span class="formula-caption">Example for MSE: (aα΄Έ - y) β Ο'(zα΄Έ)</span></div>
|
| 1705 |
+
</div>
|
| 1706 |
+
|
| 1707 |
+
<div class="list-item">
|
| 1708 |
+
<div class="list-num">02</div>
|
| 1709 |
+
<div><strong>Error at Layer l (Backwards):</strong><br>
|
| 1710 |
+
Ξ΄Λ‘ = ((WΛ‘βΊΒΉ)α΅ Ξ΄Λ‘βΊΒΉ) β Ο'(zΛ‘)</div>
|
| 1711 |
+
</div>
|
| 1712 |
+
|
| 1713 |
+
<div class="list-item">
|
| 1714 |
+
<div class="list-num">03</div>
|
| 1715 |
+
<div><strong>Gradient w.r.t Bias:</strong><br>
|
| 1716 |
+
βL / βbΛ‘ = Ξ΄Λ‘</div>
|
| 1717 |
+
</div>
|
| 1718 |
+
|
| 1719 |
+
<div class="list-item">
|
| 1720 |
+
<div class="list-num">04</div>
|
| 1721 |
+
<div><strong>Gradient w.r.t Weights:</strong><br>
|
| 1722 |
+
βL / βWΛ‘ = Ξ΄Λ‘ (aΛ‘β»ΒΉ)α΅</div>
|
| 1723 |
+
</div>
|
| 1724 |
+
|
| 1725 |
+
<div class="callout insight">
|
| 1726 |
+
<div class="callout-title">π Paper & Pain Walkthrough</div>
|
| 1727 |
+
Suppose single neuron: z = wx + b, Loss L = (Ο(z) - y)Β²/2<br>
|
| 1728 |
+
1. <strong>Forward:</strong> z=2, a=Ο(2)β0.88, y=1, L=0.007<br>
|
| 1729 |
+
2. <strong>Backward:</strong><br>
|
| 1730 |
+
βL/βa = (a-y) = -0.12<br>
|
| 1731 |
+
βa/βz = Ο(z)(1-Ο(z)) = 0.88 * 0.12 = 0.1056<br>
|
| 1732 |
+
Ξ΄ = βL/βz = -0.12 * 0.1056 = -0.01267<br>
|
| 1733 |
+
<strong>βL/βw = Ξ΄ * x</strong> | <strong>βL/βb = Ξ΄</strong>
|
| 1734 |
+
</div>
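The walkthrough can be checked numerically. This sketch re-runs the same single sigmoid neuron; the values x = 1.0, w = 1.5, b = 0.5 are an assumption (the example only fixes z = 2), and the exact δ differs slightly from the hand calculation because the walkthrough rounds a to 0.88:

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

x, w, b, y = 1.0, 1.5, 0.5, 1.0   # chosen so that z = w*x + b = 2
z = w * x + b                      # forward pass
a = sigmoid(z)                     # a ≈ 0.88
L = (a - y) ** 2 / 2               # L ≈ 0.007

dL_da = a - y                      # ≈ -0.12
da_dz = a * (1 - a)                # σ'(z) ≈ 0.105
delta = dL_da * da_dz              # ∂L/∂z ≈ -0.0125
dL_dw = delta * x                  # gradient for the weight
dL_db = delta                      # gradient for the bias
print(round(a, 2), round(delta, 4))
```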
`
},
"regularization": {
overview: `
<h3>Regularization Techniques</h3>
<table>
<tr>
<th>Method</th>
<th>How It Works</th>
<th>When to Use</th>
</tr>
<tr>
<td>L2 (Ridge)</td>
<td>Adds λΣw² to the loss</td>
<td>Keeps all features, reduces magnitude</td>
</tr>
<tr>
<td>L1 (Lasso)</td>
<td>Adds λΣ|w| to the loss</td>
<td>Feature selection (zeros out weights)</td>
</tr>
<tr>
<td>Dropout</td>
<td>Randomly drops neurons (p=0.5 typical)</td>
<td><strong>Most effective for deep networks</strong></td>
</tr>
<tr>
<td>Early Stopping</td>
<td>Stop when validation loss increases</td>
<td>Prevents overfitting during training</td>
</tr>
<tr>
<td>Data Augmentation</td>
<td>Artificially expand the dataset</td>
<td>Computer vision (rotations, flips, crops)</td>
</tr>
</table>
`,
applications: `
<div class="info-box">
<div class="box-title">🎯 Best Practices</div>
<div class="box-content">
• Start with Dropout (0.5) for hidden layers<br>
• Add L2 if still overfitting (λ = 0.01 or 0.001)<br>
• Always use Early Stopping<br>
• Data Augmentation for images
</div>
</div>
`
},
"batch-norm": {
overview: `
<h3>Batch Normalization</h3>
<p>Normalizes layer inputs to have mean 0 and variance 1, stabilizing and accelerating training.</p>

<div class="callout tip">
<div class="callout-title">✅ Benefits</div>
• <strong>Faster Training:</strong> Allows higher learning rates<br>
• <strong>Reduces Vanishing Gradients:</strong> Better gradient flow<br>
• <strong>Regularization Effect:</strong> Adds slight noise<br>
• <strong>Less Sensitive to Init:</strong> Reduces initialization impact
</div>
`,
math: `
<h3>The 4 Steps of Batch Normalization</h3>
<p>Calculated per mini-batch B = {x₁, ..., xₘ}:</p>

<div class="list-item">
<div class="list-num">01</div>
<div><strong>Mini-Batch Mean:</strong> μ_B = (1/m) Σ xᵢ</div>
</div>
<div class="list-item">
<div class="list-num">02</div>
<div><strong>Mini-Batch Variance:</strong> σ²_B = (1/m) Σ (xᵢ - μ_B)²</div>
</div>
<div class="list-item">
<div class="list-num">03</div>
<div><strong>Normalize:</strong> x̂ᵢ = (xᵢ - μ_B) / √(σ²_B + ε)</div>
</div>
<div class="list-item">
<div class="list-num">04</div>
<div><strong>Scale and Shift:</strong> yᵢ = γx̂ᵢ + β</div>
</div>

<div class="callout insight">
<div class="callout-title">📝 Paper & Pen: Why γ and β?</div>
If we only normalized to mean 0 and variance 1, we might restrict the representation power of the network.<br>
γ and β allow the network to <strong>undo</strong> the normalization if that's optimal:<br>
if γ = √(σ²) and β = μ, we get the original data back!
</div>
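In plain Python, the four steps for a single feature look like this (γ = 1 and β = 0 are the usual initial values, and ε = 1e-5 is a common default):

```python
import math

def batch_norm(xs, gamma=1.0, beta=0.0, eps=1e-5):
    """The four steps above, for one feature across a mini-batch."""
    m = len(xs)
    mu = sum(xs) / m                                        # 1. mini-batch mean
    var = sum((x - mu) ** 2 for x in xs) / m                # 2. mini-batch variance
    x_hat = [(x - mu) / math.sqrt(var + eps) for x in xs]   # 3. normalize
    return [gamma * xh + beta for xh in x_hat]              # 4. scale and shift

out = batch_norm([2.0, 4.0, 6.0, 8.0])
print(round(sum(out) / len(out), 6))  # mean of the normalized batch is ~0
```

With γ=1, β=0 the output has mean ≈ 0 and variance ≈ 1; at inference time, frameworks replace μ_B and σ²_B with running averages collected during training.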
`
},
"cv-intro": {
overview: `
<h3>Why Computer Vision Needs Special Architectures</h3>
<p><strong>Problem:</strong> Images have huge dimensionality</p>
<ul>
<li>224×224 RGB image = 150,528 input features</li>
<li>Fully connected layer with 1000 neurons = 150M parameters!</li>
<li>Result: Overfitting, slow training, memory issues</li>
</ul>

<h3>Solution: Convolutional Neural Networks</h3>
<ul>
<li><strong>Weight Sharing:</strong> Same filter applied everywhere (1000× fewer parameters)</li>
<li><strong>Local Connectivity:</strong> Neurons see small patches</li>
<li><strong>Translation Invariance:</strong> Detect a cat anywhere in the image</li>
</ul>
`,
concepts: `
<h3>Why CNNs Beat Fully Connected</h3>
<div class="list-item">
<div class="list-num">01</div>
<div><strong>Parameter Efficiency:</strong> 1000× fewer parameters through weight sharing</div>
</div>
<div class="list-item">
<div class="list-num">02</div>
<div><strong>Translation Equivariance:</strong> Same object → same activation regardless of position</div>
</div>
`,
applications: `
<div class="info-box">
<div class="box-title">📸 All Computer Vision Tasks</div>
<div class="box-content">
Image classification, object detection, segmentation, face recognition, OCR, medical imaging
</div>
</div>
`
},
"pooling": {
overview: `
<h3>Pooling Layers</h3>
<p>Pooling reduces spatial dimensions while retaining important information.</p>

<table>
<tr>
<th>Type</th>
<th>Operation</th>
<th>Use Case</th>
</tr>
<tr>
<td>Max Pooling</td>
<td>Take the maximum value</td>
<td><strong>Most common</strong> - preserves strong activations</td>
</tr>
<tr>
<td>Average Pooling</td>
<td>Take the average</td>
<td>Smoother, less common (used in final layers)</td>
</tr>
<tr>
<td>Global Pooling</td>
<td>Pool the entire feature map</td>
<td>Replace FC layers (reduces parameters)</td>
</tr>
</table>

<div class="callout tip">
<div class="callout-title">✅ Benefits</div>
• Reduces spatial size (faster computation)<br>
• Adds translation invariance<br>
• Prevents overfitting<br>
• Typical: 2×2 window, stride 2 (halves dimensions)
</div>
`,
concepts: `
<h3>Pooling Mechanics</h3>
<div class="list-item">
<div class="list-num">01</div>
<div><strong>Downsampling:</strong> Reduces H×W by the pooling factor (typically 2×)</div>
</div>
<div class="list-item">
<div class="list-num">02</div>
<div><strong>No Learnable Parameters:</strong> Fixed operation (max/average)</div>
</div>
<div class="formula">
Example: 4×4 input → 2×2 max pooling → 2×2 output
</div>
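A minimal sketch of the 2×2, stride-2 max pooling described above, on nested Python lists:

```python
def max_pool2d(fmap, size=2, stride=2):
    """2x2 max pooling with stride 2: keep the strongest activation per window."""
    h, w = len(fmap), len(fmap[0])
    out = []
    for i in range(0, h - size + 1, stride):
        row = []
        for j in range(0, w - size + 1, stride):
            window = [fmap[i + di][j + dj] for di in range(size) for dj in range(size)]
            row.append(max(window))
        out.append(row)
    return out

fmap = [[1, 3, 2, 4],
        [5, 6, 1, 2],
        [7, 2, 9, 1],
        [3, 4, 5, 6]]
print(max_pool2d(fmap))  # [[6, 4], [7, 9]] - 4x4 input halved to 2x2
```

Note there is nothing to learn here: the operation is fixed, which is why pooling layers add no parameters.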
`,
applications: `
<div class="info-box">
<div class="box-title">🎯 Standard CNN Component</div>
<div class="box-content">
Used after conv layers in AlexNet, VGG, and most classic CNNs to progressively reduce spatial dimensions
</div>
</div>
`
},
"cnn-basics": {
overview: `
<h3>CNN Architecture Pattern</h3>
<div class="formula">
Input → [Conv → ReLU → Pool] × N → Flatten → FC → Softmax
</div>

<h3>Typical Layering Strategy</h3>
<ul>
<li><strong>Early Layers:</strong> Detect low-level features (edges, textures) - small filters (3×3)</li>
<li><strong>Middle Layers:</strong> Combine into patterns, parts - more filters, same size</li>
<li><strong>Deep Layers:</strong> High-level concepts (faces, objects) - many filters</li>
<li><strong>Final FC Layers:</strong> Classification based on learned features</li>
</ul>

<div class="callout insight">
<div class="callout-title">💡 Filter Progression</div>
Layer 1: 32 filters (edges)<br>
Layer 2: 64 filters (textures)<br>
Layer 3: 128 filters (patterns)<br>
Layer 4: 256 filters (parts)<br>
Common pattern: double the filters after each pooling
</div>
`,
concepts: `
<h3>Module Design Principles</h3>
<div class="list-item">
<div class="list-num">01</div>
<div><strong>Spatial Reduction:</strong> Progressively downsample (224→112→56→28...)</div>
</div>
<div class="list-item">
<div class="list-num">02</div>
<div><strong>Channel Expansion:</strong> Increase filters as spatial dims decrease</div>
</div>
`,
applications: `
<div class="info-box">
<div class="box-title">🎯 All Modern Vision Models</div>
<div class="box-content">
This pattern forms the backbone of ResNet, MobileNet, EfficientNet - fundamental CNN design
</div>
</div>
`,
math: `
<h3>1. The Golden Formula for Output Size</h3>
<p>Given Input width (W), Filter Size (F), Padding (P), and Stride (S):</p>
<div class="formula" style="font-size: 1.2rem; text-align: center; margin: 20px 0;">
Output Size = ⌊(W - F + 2P) / S⌋ + 1
</div>

<h3>2. Parameter Count Calculation</h3>
<div class="list-item">
<div class="list-num">01</div>
<div><strong>Parameters PER Filter:</strong> (F × F × C_in) + 1 (bias)</div>
</div>
<div class="list-item">
<div class="list-num">02</div>
<div><strong>Total Parameters:</strong> N_filters × ((F × F × C_in) + 1)</div>
</div>

<div class="callout insight">
<div class="callout-title">📝 Paper & Pen Calculation</div>
<strong>Input:</strong> 224×224×3 | <strong>Layer:</strong> 64 filters of 3×3 | <strong>Stride:</strong> 1 | <strong>Padding:</strong> 1<br>
1. <strong>Output Size:</strong> (224 - 3 + 2(1))/1 + 1 = 224 (same padding)<br>
2. <strong>Params:</strong> 64 × (3 × 3 × 3 + 1) = 64 × 28 = <strong>1,792 parameters</strong><br>
3. <strong>FLOPs:</strong> 224 × 224 × 1,792 ≈ <strong>90 million operations</strong> per image!
</div>
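Both formulas are one-liners in Python, and the worked example above can be reproduced directly:

```python
def conv_output_size(w_in, f, p, s):
    """The golden formula: floor((W - F + 2P) / S) + 1."""
    return (w_in - f + 2 * p) // s + 1

def conv_params(n_filters, f, c_in):
    """(F x F x C_in) weights plus one bias, per filter."""
    return n_filters * (f * f * c_in + 1)

# The worked example: 224x224x3 input, 64 filters of 3x3, stride 1, padding 1
print(conv_output_size(224, 3, 1, 1))  # 224 ("same" padding)
print(conv_params(64, 3, 3))           # 1792
```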
`
},
"viz-filters": {
overview: `
<h3>What CNNs Learn</h3>
<p>CNN filters automatically learn hierarchical visual features:</p>

<h3>Layer-by-Layer Visualization</h3>
<div class="list-item">
<div class="list-num">01</div>
<div><strong>Layer 1:</strong> Edges and colors (horizontal, vertical, diagonal lines)</div>
</div>
<div class="list-item">
<div class="list-num">02</div>
<div><strong>Layer 2:</strong> Textures and patterns (corners, curves, simple shapes)</div>
</div>
<div class="list-item">
<div class="list-num">03</div>
<div><strong>Layer 3:</strong> Object parts (eyes, wheels, windows)</div>
</div>
<div class="list-item">
<div class="list-num">04</div>
<div><strong>Layers 4-5:</strong> Whole objects (faces, cars, animals)</div>
</div>
`,
concepts: `
<h3>Visualization Techniques</h3>
<div class="list-item">
<div class="list-num">01</div>
<div><strong>Activation Maximization:</strong> Find the input that maximizes a filter's response</div>
</div>
<div class="list-item">
<div class="list-num">02</div>
<div><strong>Grad-CAM:</strong> Highlight important regions for predictions</div>
</div>
`,
applications: `
<div class="info-box">
<div class="box-title">🔍 Model Interpretability</div>
<div class="box-content">
Understanding what CNNs learn helps debug failures, build trust, and improve architecture design
</div>
</div>
<div class="info-box">
<div class="box-title">🎨 Art & Style Transfer</div>
<div class="box-content">
Filter visualizations inspired neural style transfer (VGG features)
</div>
</div>
`
},
"lenet": {
overview: `
<h3>LeNet-5 (1998) - The Pioneer</h3>
<p>The first successful CNN for digit recognition (MNIST). Introduced the Conv → Pool → Conv → Pool pattern still used today.</p>

<h3>Architecture</h3>
<div class="formula">
Input 32×32 → Conv(6 filters, 5×5) → AvgPool → Conv(16 filters, 5×5) → AvgPool → FC(120) → FC(84) → FC(10)
</div>

<div class="callout insight">
<div class="callout-title">📜 Historical Impact</div>
• Used by the US Postal Service for zip code recognition<br>
• Proved CNNs work for real-world tasks<br>
• Template for modern architectures
</div>
`,
concepts: `
<h3>Key Innovations</h3>
<div class="list-item">
<div class="list-num">01</div>
<div><strong>Layered Architecture:</strong> Hierarchical feature extraction</div>
</div>
<div class="list-item">
<div class="list-num">02</div>
<div><strong>Shared Weights:</strong> Convolutional parameter sharing</div>
</div>
`,
applications: `
<div class="info-box">
<div class="box-title">✍️ Handwriting Recognition</div>
<div class="box-content">
USPS mail sorting, check processing, form digitization
</div>
</div>
<div class="info-box">
<div class="box-title">📚 Educational Foundation</div>
<div class="box-content">
Perfect starting point for learning CNNs - simple enough to understand, complex enough to be useful
</div>
</div>
`
},
"alexnet": {
overview: `
<h3>AlexNet (2012) - The Deep Learning Revolution</h3>
<p>Won ImageNet 2012 by a huge margin (15.3% vs 26.2% error), igniting the deep learning revolution.</p>

<h3>Key Innovations</h3>
<ul>
<li><strong>ReLU Activation:</strong> Faster training than sigmoid/tanh</li>
<li><strong>Dropout:</strong> Prevents overfitting (p=0.5)</li>
<li><strong>Data Augmentation:</strong> Random crops/flips</li>
<li><strong>GPU Training:</strong> Used 2 GTX 580 GPUs</li>
<li><strong>Deep:</strong> 8 layers (5 conv + 3 FC), 60M parameters</li>
</ul>

<div class="callout tip">
<div class="callout-title">💡 Why So Important?</div>
First to show that deeper networks + more data + GPU compute = breakthrough performance
</div>
`,
concepts: `
<h3>Technical Contributions</h3>
<div class="list-item">
<div class="list-num">01</div>
<div><strong>ReLU:</strong> Solved vanishing gradients, enabled deeper networks</div>
</div>
<div class="list-item">
<div class="list-num">02</div>
<div><strong>Dropout:</strong> First major regularization for deep nets</div>
</div>
`,
applications: `
<div class="info-box">
<div class="box-title">🎯 ImageNet Challenge</div>
<div class="box-content">
Shattered records on 1000-class classification, proving deep learning's superiority
</div>
</div>
<div class="info-box">
<div class="box-title">🚀 Industry Catalyst</div>
<div class="box-content">
Sparked the AI renaissance - Google, Facebook, Microsoft pivoted to deep learning after AlexNet
</div>
</div>
`
},
"vgg": {
overview: `
<h3>VGGNet (2014) - The Power of Depth</h3>
<p>VGG showed that depth matters - 16-19 layers using only small 3×3 filters.</p>

<h3>Key Insight: Stacking Small Filters</h3>
<p>Two 3×3 conv layers = same receptive field as one 5×5, but:</p>
<ul>
<li><strong>Fewer Parameters:</strong> 2×(3²) = 18 vs 5² = 25 (per channel)</li>
<li><strong>More Non-linearity:</strong> Two ReLUs instead of one</li>
<li><strong>Deeper Network:</strong> Better feature learning</li>
</ul>

<div class="callout warning">
<div class="callout-title">⚠️ Limitation</div>
138M parameters (VGG-16) - very memory intensive for deployment
</div>
`
},
"resnet": {
overview: `
<h3>ResNet (2015) - Residual Connections</h3>
<p><strong>Problem:</strong> Very deep networks (>20 layers) suffered from degradation - even training accuracy got worse!</p>

<h3>Solution: Skip Connections</h3>
<div class="formula">
Instead of learning H(x), learn the residual F(x) = H(x) - x<br>
Output: y = F(x) + x (shortcut connection)
</div>

<h3>Why Skip Connections Work</h3>
<ul>
<li><strong>Gradient Flow:</strong> Gradients flow directly through the shortcuts</li>
<li><strong>Identity Mapping:</strong> Easy to learn the identity (just set F(x)=0)</li>
<li><strong>Feature Reuse:</strong> Earlier features are directly available to later layers</li>
</ul>

<div class="callout tip">
<div class="callout-title">🏆 Impact</div>
• Enabled training of 152-layer networks (even 1000+ layers)<br>
• Won ImageNet 2015<br>
• Skip connections are now used everywhere (U-Net, Transformers, etc.)
</div>
`
},
"inception": {
overview: `
<h3>Inception/GoogLeNet (2014) - Going Wider</h3>
<p>Instead of going deeper, Inception modules go wider - using multiple filter sizes in parallel.</p>

<h3>Inception Module</h3>
<div class="formula">
Input → [1×1 conv] ‖ [3×3 conv] ‖ [5×5 conv] ‖ [3×3 pool] → Concatenate (parallel branches)
</div>

<h3>Key Innovation: 1×1 Convolutions</h3>
<ul>
<li><strong>Dimensionality Reduction:</strong> Reduce channels before the expensive 3×3 and 5×5 convs</li>
<li><strong>Non-linearity:</strong> Add an extra ReLU</li>
<li><strong>Bottleneck Design:</strong> Reduces FLOPs by 10×</li>
</ul>

<div class="callout insight">
<div class="callout-title">💡 Efficiency</div>
22 layers but only 5M parameters (12× less than AlexNet's 60M!)
</div>
`
},
"mobilenet": {
overview: `
<h3>MobileNet - CNNs for Mobile Devices</h3>
<p>Designed for mobile/embedded vision using depthwise separable convolutions.</p>

<h3>Depthwise Separable Convolution</h3>
<div class="formula">
Standard Conv = Depthwise Conv + Pointwise (1×1) Conv
</div>

<h3>Computation Reduction</h3>
<table>
<tr>
<th>Method</th>
<th>Parameters</th>
<th>FLOPs</th>
</tr>
<tr>
<td>Standard 3×3 Conv</td>
<td>3×3×C_in×C_out</td>
<td>High</td>
</tr>
<tr>
<td>Depthwise Separable</td>
<td>3×3×C_in + C_in×C_out</td>
<td><strong>8-9× less!</strong></td>
</tr>
</table>
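The parameter comparison in the table can be reproduced in a few lines of Python (the channel counts 128 → 256 are illustrative, not taken from a specific MobileNet layer):

```python
def standard_conv_params(f, c_in, c_out):
    """One FxF filter per output channel, spanning all input channels."""
    return f * f * c_in * c_out

def depthwise_separable_params(f, c_in, c_out):
    """Depthwise: one FxF filter per input channel; pointwise: 1x1 conv mixing channels."""
    return f * f * c_in + c_in * c_out

f, c_in, c_out = 3, 128, 256
std = standard_conv_params(f, c_in, c_out)
sep = depthwise_separable_params(f, c_in, c_out)
print(std, sep, round(std / sep, 1))  # the ratio lands in the 8-9x range
```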

<div class="callout tip">
<div class="callout-title">✅ Applications</div>
• Real-time mobile apps (camera filters, AR)<br>
• Edge devices (drones, IoT)<br>
• Latency-critical systems<br>
• Good accuracy with 10-20× speedup
</div>
`
},
"transfer-learning": {
overview: `
<h3>Transfer Learning - Don't Train from Scratch!</h3>
<p>Use pre-trained models (ImageNet) as feature extractors for your custom task.</p>

<h3>Three Strategies</h3>
<table>
<tr>
<th>Approach</th>
<th>When to Use</th>
<th>How</th>
</tr>
<tr>
<td>Feature Extraction</td>
<td><strong>Small dataset</strong> (&lt;10K images)</td>
<td>Freeze all layers, train only the final FC layer</td>
</tr>
<tr>
<td>Fine-tuning</td>
<td><strong>Medium dataset</strong> (10K-100K)</td>
<td>Freeze early layers, train the last few + FC</td>
</tr>
<tr>
<td>Full Training</td>
<td><strong>Large dataset</strong> (&gt;1M images)</td>
<td>Use pre-trained weights as initialization, train all layers</td>
</tr>
</table>

<div class="callout tip">
<div class="callout-title">💡 Best Practices</div>
• Use pre-trained models when the dataset has fewer than 100K images<br>
• Start with a low learning rate (1e-4) for fine-tuning<br>
• Popular backbones: ResNet50, EfficientNet, ViT
</div>
`
},
"localization": {
overview: `
<h3>Object Localization</h3>
<p>Predict both the class and a bounding box for a single object in an image.</p>

<h3>Multi-Task Loss</h3>
<div class="formula">
Total Loss = L_classification + λ × L_bbox<br>
<br>
Where:<br>
L_classification = Cross-Entropy<br>
L_bbox = Smooth L1 or IoU loss<br>
λ = balance term (typically 1-10)
</div>

<h3>Bounding Box Representation</h3>
<ul>
<li><strong>Option 1:</strong> (x_min, y_min, x_max, y_max)</li>
<li><strong>Option 2:</strong> (x_center, y_center, width, height) → most common</li>
</ul>
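IoU (Intersection over Union), used by the bounding-box loss mentioned above, is simple to compute for boxes in the corner format of Option 1; a minimal sketch:

```python
def iou(box_a, box_b):
    """Intersection over Union for (x_min, y_min, x_max, y_max) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)   # overlap area (0 if disjoint)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # two 2x2 boxes overlapping in a 1x1 patch -> 1/7
```

IoU is scale-invariant, which is why IoU-based losses often behave better than plain coordinate regression.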
|
| 2292 |
+
`
|
| 2293 |
+
},
|
| 2294 |
+
"rcnn": {
|
| 2295 |
+
overview: `
|
| 2296 |
+
<h3>R-CNN Family Evolution</h3>
|
| 2297 |
+
<table>
|
| 2298 |
+
<tr>
|
| 2299 |
+
<th>Model</th>
|
| 2300 |
+
<th>Year</th>
|
| 2301 |
+
<th>Speed (FPS)</th>
|
| 2302 |
+
<th>Key Innovation</th>
|
| 2303 |
+
</tr>
|
| 2304 |
+
<tr>
|
| 2305 |
+
<td>R-CNN</td>
|
| 2306 |
+
<td>2014</td>
|
| 2307 |
+
<td>0.05</td>
|
| 2308 |
+
<td>Selective Search + CNN features</td>
|
| 2309 |
+
</tr>
|
| 2310 |
+
<tr>
|
| 2311 |
+
<td>Fast R-CNN</td>
|
| 2312 |
+
<td>2015</td>
|
| 2313 |
+
<td>0.5</td>
|
| 2314 |
+
<td>RoI Pooling (share conv features)</td>
|
| 2315 |
+
</tr>
|
| 2316 |
+
<tr>
|
| 2317 |
+
<td>Faster R-CNN</td>
|
| 2318 |
+
<td>2015</td>
|
| 2319 |
+
<td>7</td>
|
| 2320 |
+
<td>Region Proposal Network (RPN)</td>
|
| 2321 |
+
</tr>
|
| 2322 |
+
<tr>
|
| 2323 |
+
<td>Mask R-CNN</td>
|
| 2324 |
+
<td>2017</td>
|
| 2325 |
+
<td>5</td>
|
| 2326 |
+
<td>+ Instance Segmentation masks</td>
|
| 2327 |
+
</tr>
|
| 2328 |
+
</table>
|
| 2329 |
+
|
| 2330 |
+
<div class="callout tip">
|
| 2331 |
+
<div class="callout-title">π‘ When to Use</div>
|
| 2332 |
+
Faster R-CNN: Best accuracy for detection (not real-time)<br>
|
| 2333 |
+
Mask R-CNN: Detection + instance segmentation
|
| 2334 |
+
</div>
|
| 2335 |
+
`
|
| 2336 |
+
},
|
| 2337 |
+
"ssd": {
|
| 2338 |
+
overview: `
|
| 2339 |
+
<h3>SSD (Single Shot MultiBox Detector)</h3>
|
| 2340 |
+
<p>Balances speed and accuracy by predicting boxes at multiple scales.</p>
|
| 2341 |
+
|
| 2342 |
+
<h3>Key Ideas</h3>
|
| 2343 |
+
<ul>
|
| 2344 |
+
<li><strong>Multi-Scale:</strong> Predictions from different layers (early = small objects, deep = large)</li>
|
| 2345 |
+
<li><strong>Default Boxes (Anchors):</strong> Pre-defined boxes of various aspects ratios</li>
|
| 2346 |
+
<li><strong>Single Pass:</strong> No separate region proposal step</li>
|
| 2347 |
+
</ul>
|
| 2348 |
+
|
| 2349 |
+
<div class="callout insight">
|
| 2350 |
+
<div class="callout-title">π Performance</div>
|
| 2351 |
+
SSD300: 59 FPS, 74.3% mAP<br>
|
| 2352 |
+
SSD512: 22 FPS, 76.8% mAP<br>
|
| 2353 |
+
<br>
|
| 2354 |
+
Sweet spot between YOLO (faster) and Faster R-CNN (more accurate)
|
| 2355 |
+
</div>
|
| 2356 |
+
`
|
| 2357 |
+
},
|
| 2358 |
+
"semantic-seg": {
|
| 2359 |
+
overview: `
|
| 2360 |
+
<h3>Semantic Segmentation</h3>
|
| 2361 |
+
<p>Classify every pixel in the image (pixel-wise classification).</p>
|
| 2362 |
+
|
| 2363 |
+
<h3>Popular Architectures</h3>
|
| 2364 |
+
<table>
|
| 2365 |
+
<tr>
|
| 2366 |
+
<th>Model</th>
|
| 2367 |
+
<th>Key Feature</th>
|
| 2368 |
+
</tr>
|
| 2369 |
+
<tr>
|
| 2370 |
+
<td>FCN</td>
|
| 2371 |
+
<td>Fully Convolutional (no FC layers)</td>
|
| 2372 |
+
</tr>
|
| 2373 |
+
<tr>
|
| 2374 |
+
<td>U-Net</td>
|
| 2375 |
+
<td>Skip connections from encoder to decoder</td>
|
| 2376 |
+
</tr>
|
| 2377 |
+
<tr>
|
| 2378 |
+
<td>DeepLab</td>
|
| 2379 |
+
<td>Atrous (dilated) convolutions + ASPP</td>
|
| 2380 |
+
</tr>
|
| 2381 |
+
</table>
|
| 2382 |
+
|
| 2383 |
+
<div class="formula">
|
| 2384 |
+
U-Net Pattern:<br>
|
| 2385 |
+
Input β Encoder (downsample) β Bottleneck β Decoder (upsample) β Pixel-wise Output<br>
|
| 2386 |
+
With skip connections from encoder to decoder at each level
|
| 2387 |
+
</div>
|
| 2388 |
+
`,
|
| 2389 |
+
applications: `
|
| 2390 |
+
<div class="info-box">
|
| 2391 |
+
<div class="box-title">π₯ Medical Imaging</div>
|
| 2392 |
+
<div class="box-content">Tumor segmentation, organ delineation, cell analysis</div>
|
| 2393 |
+
</div>
|
| 2394 |
+
<div class="info-box">
|
| 2395 |
+
<div class="box-title">π Autonomous Driving</div>
|
| 2396 |
+
<div class="box-content">Road segmentation, free space detection, drivable area</div>
|
| 2397 |
+
</div>
|
| 2398 |
+
`
|
| 2399 |
+
},
|
"instance-seg": {
  overview: `
    <h3>Instance Segmentation</h3>
    <p>Detect AND segment each individual object (combines object detection + semantic segmentation).</p>

    <h3>Difference from Semantic Segmentation</h3>
    <ul>
      <li><strong>Semantic:</strong> All "person" pixels get same label</li>
      <li><strong>Instance:</strong> Person #1, Person #2, Person #3 (separate instances)</li>
    </ul>

    <h3>Main Approach: Mask R-CNN</h3>
    <div class="formula">
      Faster R-CNN + Segmentation Branch<br>
      <br>
      For each RoI:<br>
      1. Bounding box regression<br>
      2. Class prediction<br>
      3. <strong>Binary mask for the object</strong>
    </div>
  `
},
"face-recog": {
  overview: `
    <h3>Face Recognition with Siamese Networks</h3>
    <p>Learn similarity between faces using metric learning instead of classification.</p>

    <h3>Triplet Loss Training</h3>
    <div class="formula">
      Loss = max(||f(A) - f(P)||² - ||f(A) - f(N)||² + margin, 0)<br>
      <br>
      Where:<br>
      A = Anchor (reference face)<br>
      P = Positive (same person)<br>
      N = Negative (different person)<br>
      margin = minimum separation (e.g., 0.2)
    </div>

    <div class="callout tip">
      <div class="callout-title">💡 One-Shot Learning</div>
      After training, recognize new people with just 1-2 photos!<br>
      No retraining needed - just compare embeddings.
    </div>
  `,
  applications: `
    <div class="info-box">
      <div class="box-title">📱 Phone Unlock</div>
      <div class="box-content">Face ID, biometric authentication</div>
    </div>
    <div class="info-box">
      <div class="box-title">🔒 Security</div>
      <div class="box-content">Access control, surveillance, identity verification</div>
    </div>
  `
},
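The triplet loss above can be computed directly. A small NumPy sketch with hand-picked 2-D embeddings (the vectors and margin are illustrative):

```python
import numpy as np

def triplet_loss(a, p, n, margin=0.2):
    """max(||f(A)-f(P)||^2 - ||f(A)-f(N)||^2 + margin, 0)"""
    d_pos = np.sum((a - p) ** 2)   # squared distance anchor-positive
    d_neg = np.sum((a - n) ** 2)   # squared distance anchor-negative
    return max(d_pos - d_neg + margin, 0.0)

anchor   = np.array([1.0, 0.0])
positive = np.array([0.9, 0.1])    # same person: close to the anchor
negative = np.array([-1.0, 0.5])   # different person: far away

# Well-separated triplet: loss hits zero, nothing left to learn
print(triplet_loss(anchor, positive, negative))  # 0.0
# Hard triplet (negative too close): positive loss drives the update
print(triplet_loss(anchor, positive, np.array([0.95, 0.0])))
```

Training pipelines mine such "hard" triplets on purpose, since easy triplets contribute zero gradient.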
"autoencoders": {
  overview: `
    <h3>Autoencoders</h3>
    <p>Unsupervised learning to compress data into a latent representation and reconstruct it.</p>

    <h3>Architecture</h3>
    <div class="formula">
      Input → Encoder → Latent Code (bottleneck) → Decoder → Reconstruction<br>
      <br>
      Loss = ||Input - Reconstruction||² (MSE)
    </div>

    <h3>Variants</h3>
    <ul>
      <li><strong>Vanilla:</strong> Basic autoencoder</li>
      <li><strong>Denoising:</strong> Input corrupted, output clean (learns robust features)</li>
      <li><strong>Variational (VAE):</strong> Probabilistic latent space (for generation)</li>
      <li><strong>Sparse:</strong> Encourage sparse activations</li>
    </ul>
  `,
  applications: `
    <div class="info-box">
      <div class="box-title">🗜️ Compression</div>
      <div class="box-content">Dimensionality reduction, data compression, feature extraction</div>
    </div>
    <div class="info-box">
      <div class="box-title">🔍 Anomaly Detection</div>
      <div class="box-content">High reconstruction error = anomaly (fraud detection, defect detection)</div>
    </div>
  `
},
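Reconstruction-error anomaly scoring can be sketched without training a network: a linear autoencoder under MSE loss learns the same subspace as PCA, so an SVD stands in for the "trained" encoder/decoder here. The data shape and test points below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
# "Normal" training data lies near a 1-D subspace of 2-D space
t = rng.normal(size=200)
X = np.column_stack([t, 2 * t + rng.normal(scale=0.05, size=200)])
X = X - X.mean(axis=0)

# SVD stands in for training: encoder W (2 -> 1), decoder W.T (1 -> 2)
_, _, Vt = np.linalg.svd(X, full_matrices=False)
W = Vt[:1]

def recon_error(x):
    code = W @ x              # encode to the 1-D latent code
    x_hat = W.T @ code        # decode back to input space
    return float(np.sum((x - x_hat) ** 2))

normal_err = recon_error(np.array([1.0, 2.0]))     # lies on the subspace
anomaly_err = recon_error(np.array([2.0, -4.0]))   # far off the subspace

print(normal_err < anomaly_err)   # True: anomalies reconstruct poorly
```

In practice a threshold on the reconstruction error (e.g., a high percentile over held-out normal data) turns this score into an anomaly flag.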
"gans": {
  overview: `
    <h3>GANs (Generative Adversarial Networks)</h3>
    <p>Two networks compete: the Generator creates fake data, the Discriminator tries to detect fakes.</p>

    <h3>The GAN Game</h3>
    <div class="formula">
      Generator: Creates fake images from random noise<br>
      Goal: Fool the discriminator<br>
      <br>
      Discriminator: Classifies real vs fake<br>
      Goal: Correctly identify fakes<br>
      <br>
      Minimax Loss:<br>
      min_G max_D E[log D(x)] + E[log(1 - D(G(z)))]
    </div>

    <div class="callout warning">
      <div class="callout-title">⚠️ Training Challenges</div>
      • Mode collapse (Generator produces limited variety)<br>
      • Training instability (careful tuning needed)<br>
      • Convergence issues<br>
      • Solutions: Wasserstein GAN, Spectral Normalization, StyleGAN improvements
    </div>
  `,
  applications: `
    <div class="info-box">
      <div class="box-title">🎨 Image Generation</div>
      <div class="box-content">
        <strong>StyleGAN:</strong> Photorealistic faces, art generation<br>
        <strong>DCGAN:</strong> Bedroom images, object generation
      </div>
    </div>
  `,
  math: `
    <h3>The Minimax Game Objective</h3>
    <p>The original GAN objective from Ian Goodfellow (2014) is a zero-sum game between the Discriminator (D) and the Generator (G).</p>

    <div class="formula" style="font-size: 1.1rem; padding: 20px;">
      min_G max_D V(D, G) = E_x∼p_data[log D(x)] + E_z∼p_z[log(1 - D(G(z)))]
    </div>

    <h3>Paper & Pain: Finding the Optimal Discriminator</h3>
    <p>For a fixed Generator, the optimal Discriminator D* is:</p>
    <div class="formula">
      D*(x) = p_data(x) / (p_data(x) + p_g(x))
    </div>

    <div class="callout insight">
      <div class="callout-title">🔬 Theoretical Insight</div>
      When the Discriminator is optimal, the Generator's task is essentially to minimize the <strong>Jensen-Shannon Divergence (JSD)</strong> between the data distribution and the model distribution.<br>
      <strong>Problem:</strong> JSD is "flat" when the distributions don't overlap, leading to vanishing gradients. This is why <strong>Wasserstein GAN (WGAN)</strong> was invented, replacing JSD with the Earth Mover's distance!
    </div>

    <h3>Generator Gradient Problem</h3>
    <p>Early in training, D(G(z)) is near 0, so the term log(1 - D(G(z))) has a very small gradient.</p>
    <div class="list-item">
      <div class="list-num">💡</div>
      <div><strong>Heuristic Fix:</strong> Instead of minimizing log(1 - D(G(z))), we maximize <strong>log D(G(z))</strong>. This provides much stronger gradients early on!</div>
    </div>
  `
},
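The optimal-discriminator formula above is easy to sanity-check numerically on discrete distributions. A NumPy sketch (the 4-bin distributions are made up): when the generator matches the data, D* collapses to 1/2 everywhere and the objective value hits its minimum, -log 4.

```python
import numpy as np

def d_star(p_data, p_g):
    """Optimal discriminator for a fixed generator: p_data / (p_data + p_g)."""
    return p_data / (p_data + p_g)

# Discrete stand-ins for p_data and p_g over 4 bins (values made up)
p_data = np.array([0.1, 0.4, 0.4, 0.1])
p_g = np.array([0.25, 0.25, 0.25, 0.25])

D = d_star(p_data, p_g)
# Inner max of the minimax objective: E_data[log D] + E_g[log(1 - D)]
V = np.sum(p_data * np.log(D)) + np.sum(p_g * np.log(1 - D))

# When the generator matches the data, D* = 1/2 everywhere and V = -log 4
D_eq = d_star(p_data, p_data)
V_eq = np.sum(p_data * np.log(D_eq)) + np.sum(p_data * np.log(1 - D_eq))

print(float(V))      # strictly above -log 4 while p_g != p_data
print(float(V_eq))   # exactly -log 4 = -1.386...
```

The gap V - V_eq is exactly twice the JSD between p_data and p_g, which is why training the generator against an optimal discriminator minimizes JSD.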
"diffusion": {
  overview: `
    <h3>Diffusion Models</h3>
    <p>Learn to reverse a gradual noising process, generating high-quality images.</p>

    <h3>How Diffusion Works</h3>
    <div class="list-item">
      <div class="list-num">01</div>
      <div><strong>Forward Process:</strong> Gradually add Gaussian noise over T steps (x₀ → x₁ → ... → x_T = pure noise)</div>
    </div>
    <div class="list-item">
      <div class="list-num">02</div>
      <div><strong>Reverse Process:</strong> Train a neural network to denoise (x_T → x_{T-1} → ... → x₀ = clean image)</div>
    </div>
    <div class="list-item">
      <div class="list-num">03</div>
      <div><strong>Generation:</strong> Start from random noise, iteratively denoise over T steps</div>
    </div>

    <div class="callout tip">
      <div class="callout-title">✅ Advantages over GANs</div>
      • More stable training (no adversarial dynamics)<br>
      • Better sample quality and diversity<br>
      • Mode coverage (no mode collapse)<br>
      • Controllable generation (text-to-image)
    </div>
  `,
  applications: `
    <div class="info-box">
      <div class="box-title">🖼️ Text-to-Image</div>
      <div class="box-content">
        <strong>Stable Diffusion:</strong> Open-source, runs on consumer GPUs<br>
        <strong>DALL-E 2:</strong> OpenAI's photorealistic generator<br>
        <strong>Midjourney:</strong> Artistic image generation
      </div>
    </div>
  `
},
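The forward process above has a convenient closed form in the DDPM formulation, q(x_t | x_0) = N(√(ᾱ_t)·x_0, (1-ᾱ_t)·I), which lets you jump straight to any noise level t; this formula comes from the DDPM paper, not from the summary above. A NumPy sketch with an illustrative DDPM-style linear β schedule:

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)      # linear noise schedule (DDPM-style)
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)          # cumulative product: abar_t

def q_sample(x0, t, rng):
    """Jump straight to step t: x_t = sqrt(abar_t)*x0 + sqrt(1-abar_t)*eps."""
    eps = rng.normal(size=x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

rng = np.random.default_rng(0)
x0 = np.ones(4)                          # stand-in for a clean image
early = q_sample(x0, 10, rng)            # mostly signal
late = q_sample(x0, T - 1, rng)          # essentially pure noise

print(float(np.sqrt(alpha_bar[10])))     # close to 1: signal survives
print(float(np.sqrt(alpha_bar[T - 1])))  # close to 0: signal destroyed
```

The denoising network is trained to predict the ε that was mixed in at a random step t, which is what makes the reverse process learnable with a plain regression loss.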
"rnn": {
  overview: `
    <h3>RNNs & LSTMs</h3>
    <p>Process sequences by maintaining a hidden state that captures past information.</p>

    <h3>The Vanishing Gradient Problem</h3>
    <p><strong>Problem:</strong> Standard RNNs can't learn long-term dependencies (gradients vanish over many time steps)</p>
    <p><strong>Solution:</strong> LSTM (Long Short-Term Memory) with gating mechanisms</p>

    <h3>LSTM Gates</h3>
    <ul>
      <li><strong>Forget Gate:</strong> What to remove from the cell state</li>
      <li><strong>Input Gate:</strong> What new information to add</li>
      <li><strong>Output Gate:</strong> What to output as the hidden state</li>
    </ul>

    <div class="callout warning">
      <div class="callout-title">⚠️ Limitation</div>
      Sequential processing (can't parallelize) - Transformers solved this!
    </div>
  `,
  applications: `
    <div class="info-box">
      <div class="box-title">📝 Text Generation</div>
      <div class="box-content">Character-level generation, autocomplete (before Transformers)</div>
    </div>
    <div class="info-box">
      <div class="box-title">🎵 Time Series</div>
      <div class="box-content">Stock prediction, weather forecasting, music generation</div>
    </div>
  `,
  math: `
    <h3>RNN State Equations</h3>
    <p>A standard RNN processes a sequence x₁, x₂, ..., x_T using a recurring hidden state hₜ.</p>

    <div class="formula">
      hₜ = tanh(W_hh·hₜ₋₁ + W_xh·xₜ + b_h)<br>
      yₜ = W_hy·hₜ + b_y
    </div>

    <h3>Paper & Pain: The Vanishing Gradient Derivation</h3>
    <p>Why do RNNs fail on long sequences? Let's check the gradient ∂L/∂h₁:</p>
    <div class="formula">
      ∂L/∂h₁ = (∂L/∂hₜ) × (∂hₜ/∂hₜ₋₁) × (∂hₜ₋₁/∂hₜ₋₂) × ... × (∂h₂/∂h₁)<br>
      <br>
      Where ∂hⱼ/∂hⱼ₋₁ = W_hhᵀ · diag(tanh'(zⱼ))
    </div>
    <div class="callout warning">
      <div class="callout-title">⚠️ The Power Effect</div>
      If the largest eigenvalue of W_hh < 1: Gradients <strong>shrink exponentially</strong> (0.9¹⁰⁰ ≈ 0.00003).<br>
      If > 1: Gradients <strong>explode</strong>.<br>
      <strong>LSTM Solution:</strong> The "Constant Error Carousel" (CEC) lets gradients flow through the cell state without repeated matrix multiplication.
    </div>

    <h3>LSTM Gating Math</h3>
    <div class="list-item">
      <div class="list-num">01</div>
      <div>Forget Gate: fₜ = σ(W_f[hₜ₋₁, xₜ] + b_f)</div>
    </div>
    <div class="list-item">
      <div class="list-num">02</div>
      <div>Input Gate: iₜ = σ(W_i[hₜ₋₁, xₜ] + b_i)</div>
    </div>
    <div class="list-item">
      <div class="list-num">03</div>
      <div>Cell State Update: cₜ = fₜ·cₜ₋₁ + iₜ·tanh(W_c[hₜ₋₁, xₜ] + b_c)</div>
    </div>
  `
},
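The "power effect" in the derivation above can be checked numerically: repeated multiplication by a matrix whose largest eigenvalue is below 1 crushes the gradient, while above 1 it blows up. A NumPy sketch (diagonal matrices chosen for clarity; the tanh' ≤ 1 factors are ignored here, which only understates the vanishing):

```python
import numpy as np

def product_norm(W, T):
    """Spectral norm of the T-step Jacobian product W @ W @ ... @ W."""
    J = np.eye(W.shape[0])
    for _ in range(T):
        J = W @ J
    return np.linalg.norm(J, 2)

W_small = 0.9 * np.eye(4)   # largest eigenvalue 0.9: gradients vanish
W_big   = 1.1 * np.eye(4)   # largest eigenvalue 1.1: gradients explode

print(product_norm(W_small, 100))   # 0.9^100, about 2.7e-5
print(product_norm(W_big, 100))     # 1.1^100, about 1.4e4
```

After 100 time steps the gradient signal is either 5 orders of magnitude too small or 4 orders too large, which is exactly why LSTM's additive cell-state path matters.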
"bert": {
  overview: `
    <h3>BERT (Bidirectional Encoder Representations from Transformers)</h3>
    <p>Pre-trained encoder-only Transformer for understanding language (not generation).</p>

    <h3>Key Innovation: Bidirectional Context</h3>
    <p>Unlike GPT (left-to-right), BERT sees both left AND right context simultaneously.</p>

    <h3>Pre-training Tasks</h3>
    <ul>
      <li><strong>Masked Language Modeling:</strong> Mask 15% of tokens, predict them (e.g., "The cat [MASK] on the mat" → predict "sat")</li>
      <li><strong>Next Sentence Prediction:</strong> Predict if sentence B follows A</li>
    </ul>

    <div class="callout tip">
      <div class="callout-title">💡 Fine-tuning BERT</div>
      1. Start with pre-trained BERT (trained on billions of words)<br>
      2. Add a task-specific head (classification, QA, NER)<br>
      3. Fine-tune on your dataset (10K-100K examples)<br>
      4. Achieves SOTA with minimal data!
    </div>
  `,
  applications: `
    <div class="info-box">
      <div class="box-title">🔍 Search & QA</div>
      <div class="box-content">
        <strong>Google Search:</strong> Uses BERT for understanding queries<br>
        Question answering systems, document retrieval
      </div>
    </div>
    <div class="info-box">
      <div class="box-title">📊 Text Classification</div>
      <div class="box-content">Sentiment analysis, topic classification, spam detection</div>
    </div>
  `
},
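The masked-language-modeling setup can be sketched in a few lines of pure Python: sample roughly 15% of positions, replace them with [MASK], and keep the originals as prediction targets. (The full BERT recipe also replaces some chosen tokens with random words or leaves them unchanged, 80/10/10; that refinement is omitted here.)

```python
import random

def mask_tokens(tokens, mask_prob=0.15, seed=1):
    """Replace ~15% of tokens with [MASK]; return masked seq + MLM targets."""
    rng = random.Random(seed)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            masked.append("[MASK]")
            targets[i] = tok          # the model must recover this token
        else:
            masked.append(tok)
    return masked, targets

tokens = "the cat sat on the mat because it was tired".split()
masked, targets = mask_tokens(tokens)
print(masked)
print(targets)
```

During pre-training the loss is computed only at the masked positions, so the model is forced to use both left and right context to fill each gap.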
"gpt": {
  overview: `
    <h3>GPT (Generative Pre-trained Transformer)</h3>
    <p>Decoder-only Transformer trained to predict the next token (autoregressive language modeling).</p>

    <h3>GPT Evolution</h3>
    <table>
      <tr>
        <th>Model</th>
        <th>Params</th>
        <th>Training Data</th>
        <th>Capability</th>
      </tr>
      <tr>
        <td>GPT-1</td>
        <td>117M</td>
        <td>BooksCorpus</td>
        <td>Basic text generation</td>
      </tr>
      <tr>
        <td>GPT-2</td>
        <td>1.5B</td>
        <td>WebText (40GB)</td>
        <td>Coherent paragraphs</td>
      </tr>
      <tr>
        <td>GPT-3</td>
        <td>175B</td>
        <td>570GB text</td>
        <td>Few-shot learning</td>
      </tr>
      <tr>
        <td>GPT-4</td>
        <td>~1.8T (unconfirmed)</td>
        <td>Multi-modal</td>
        <td>Reasoning, coding, images</td>
      </tr>
    </table>

    <div class="callout insight">
      <div class="callout-title">📈 Emergent Abilities</div>
      As models scale, new capabilities emerge:<br>
      • In-context learning (learn from prompts)<br>
      • Chain-of-thought reasoning<br>
      • Code generation<br>
      • Multi-step problem solving
    </div>
  `,
  applications: `
    <div class="info-box">
      <div class="box-title">💬 ChatGPT & Assistants</div>
      <div class="box-content">
        Conversational AI, customer support, tutoring, brainstorming
      </div>
    </div>
    <div class="info-box">
      <div class="box-title">💻 Code Generation</div>
      <div class="box-content">
        GitHub Copilot, code completion, bug fixing, documentation
      </div>
    </div>
  `
},
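Autoregressive generation is just "predict the next token, append it, repeat". A toy sketch with a hand-written bigram table standing in for the Transformer (all tokens and probabilities are made up); greedy decoding picks the argmax at each step:

```python
# Toy autoregressive "language model": a bigram lookup table. Real GPT
# replaces this lookup with a decoder-only Transformer, but the generation
# loop (predict next token, append, repeat) is the same idea.
bigram = {
    "<s>": {"the": 1.0},
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 1.0},
    "dog": {"ran": 1.0},
    "sat": {"<e>": 1.0},
    "ran": {"<e>": 1.0},
}

def generate(max_len=10):
    tokens = ["<s>"]
    while tokens[-1] != "<e>" and len(tokens) < max_len:
        probs = bigram[tokens[-1]]
        nxt = max(probs, key=probs.get)   # greedy decoding (argmax)
        tokens.append(nxt)
    return tokens[1:-1]                    # strip <s>/<e> markers

print(generate())  # ['the', 'cat', 'sat']
```

Swapping the argmax for sampling from `probs` (optionally with temperature or top-k filtering) gives the varied outputs seen in chat assistants.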
"vit": {
  overview: `
    <h3>Vision Transformer (ViT)</h3>
    <p>Apply the Transformer architecture directly to images by treating them as sequences of patches.</p>

    <h3>How ViT Works</h3>
    <div class="list-item">
      <div class="list-num">01</div>
      <div><strong>Patchify:</strong> Split a 224×224 image into 16×16 patches (14×14 = 196 patches)</div>
    </div>
    <div class="list-item">
      <div class="list-num">02</div>
      <div><strong>Linear Projection:</strong> Flatten each patch → linear embedding (like word embeddings)</div>
    </div>
    <div class="list-item">
      <div class="list-num">03</div>
      <div><strong>Positional Encoding:</strong> Add position information</div>
    </div>
    <div class="list-item">
      <div class="list-num">04</div>
      <div><strong>Transformer Encoder:</strong> Standard Transformer (self-attention, FFN)</div>
    </div>
    <div class="list-item">
      <div class="list-num">05</div>
      <div><strong>Classification:</strong> Use the [CLS] token for the final prediction</div>
    </div>

    <div class="callout tip">
      <div class="callout-title">💡 When ViT Shines</div>
      • <strong>Large Datasets:</strong> Needs 10M+ images (or pre-training on ImageNet-21K)<br>
      • <strong>Transfer Learning:</strong> Pre-trained ViT beats CNNs on many tasks<br>
      • <strong>Long-Range Dependencies:</strong> Global attention vs CNN's local receptive field
    </div>
  `
}
};
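Step 01 (patchify) is a pure reshape: a 224×224×3 image becomes 196 patch vectors of length 16·16·3 = 768, before any learned projection. A NumPy sketch:

```python
import numpy as np

H = W = 224
P = 16                          # patch size
img = np.arange(H * W * 3, dtype=np.float32).reshape(H, W, 3)

# Patchify: (224, 224, 3) -> (14*14, 16*16*3) = (196, 768)
patches = (img.reshape(H // P, P, W // P, P, 3)
              .transpose(0, 2, 1, 3, 4)   # group the 14x14 patch grid first
              .reshape(-1, P * P * 3))

print(patches.shape)  # (196, 768)
```

In the full model each 768-vector is then multiplied by a learned projection matrix and prepended with the [CLS] token before entering the encoder.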
+
function createModuleHTML(module) {
|
| 2792 |
+
const content = MODULE_CONTENT[module.id] || {};
|
| 2793 |
+
|
| 2794 |
+
return `
|
| 2795 |
+
<div class="module" id="${module.id}-module">
|
| 2796 |
+
<button class="btn-back" onclick="switchTo('dashboard')">β Back to Dashboard</button>
|
| 2797 |
+
<header>
|
| 2798 |
+
<h1>${module.icon} ${module.title}</h1>
|
| 2799 |
+
<p class="subtitle">${module.description}</p>
|
| 2800 |
+
</header>
|
| 2801 |
+
|
| 2802 |
+
<div class="tabs">
|
| 2803 |
+
<button class="tab-btn active" onclick="switchTab(event, '${module.id}-overview')">Overview</button>
|
| 2804 |
+
<button class="tab-btn" onclick="switchTab(event, '${module.id}-concepts')">Key Concepts</button>
|
| 2805 |
+
<button class="tab-btn" onclick="switchTab(event, '${module.id}-visualization')">π Visualization</button>
|
| 2806 |
+
<button class="tab-btn" onclick="switchTab(event, '${module.id}-math')">Math</button>
|
| 2807 |
+
<button class="tab-btn" onclick="switchTab(event, '${module.id}-applications')">Applications</button>
|
| 2808 |
+
<button class="tab-btn" onclick="switchTab(event, '${module.id}-summary')">Summary</button>
|
| 2809 |
+
</div>
|
| 2810 |
+
|
| 2811 |
+
<div id="${module.id}-overview" class="tab active">
|
| 2812 |
+
<div class="section">
|
| 2813 |
+
<h2>π Overview</h2>
|
| 2814 |
+
${content.overview || `
|
| 2815 |
+
<p>Complete coverage of ${module.title.toLowerCase()}. Learn the fundamentals, mathematics, real-world applications, and implementation details.</p>
|
| 2816 |
+
<div class="info-box">
|
| 2817 |
+
<div class="box-title">Learning Objectives</div>
|
| 2818 |
+
<div class="box-content">
|
| 2819 |
+
β Understand core concepts and theory<br>
|
| 2820 |
+
β Master mathematical foundations<br>
|
| 2821 |
+
β Learn practical applications<br>
|
| 2822 |
+
β Implement and experiment
|
| 2823 |
+
</div>
|
| 2824 |
+
</div>
|
| 2825 |
+
`}
|
| 2826 |
+
</div>
|
| 2827 |
+
</div>
|
| 2828 |
+
|
| 2829 |
+
<div id="${module.id}-concepts" class="tab">
|
| 2830 |
+
<div class="section">
|
| 2831 |
+
<h2>π― Key Concepts</h2>
|
| 2832 |
+
${content.concepts || `
|
| 2833 |
+
<p>Fundamental concepts and building blocks for ${module.title.toLowerCase()}.</p>
|
| 2834 |
+
<div class="callout insight">
|
| 2835 |
+
<div class="callout-title">π‘ Main Ideas</div>
|
| 2836 |
+
This section covers the core ideas you need to understand before diving into mathematics.
|
| 2837 |
+
</div>
|
| 2838 |
+
`}
|
| 2839 |
+
</div>
|
| 2840 |
+
</div>
|
| 2841 |
+
|
| 2842 |
+
<div id="${module.id}-visualization" class="tab">
|
| 2843 |
+
<div class="section">
|
| 2844 |
+
<h2>π Interactive Visualization</h2>
|
| 2845 |
+
<p>Visual representation to help understand ${module.title.toLowerCase()} concepts intuitively.</p>
|
| 2846 |
+
<div id="${module.id}-viz" class="viz-container">
|
| 2847 |
+
<canvas id="${module.id}-canvas" width="800" height="400" style="border: 1px solid rgba(0, 212, 255, 0.3); border-radius: 8px; background: rgba(0, 212, 255, 0.02);"></canvas>
|
| 2848 |
+
</div>
|
| 2849 |
+
<div class="viz-controls">
|
| 2850 |
+
<button onclick="drawVisualization('${module.id}')" class="btn-viz">π Refresh Visualization</button>
|
| 2851 |
+
<button onclick="toggleVizAnimation('${module.id}')" class="btn-viz">βΆοΈ Animate</button>
|
| 2852 |
+
<button onclick="downloadViz('${module.id}')" class="btn-viz">β¬οΈ Save Image</button>
|
| 2853 |
+
</div>
|
| 2854 |
+
</div>
|
| 2855 |
+
</div>
|
| 2856 |
+
|
| 2857 |
+
<div id="${module.id}-math" class="tab">
|
| 2858 |
+
<div class="section">
|
| 2859 |
+
<h2>π Mathematical Foundation</h2>
|
| 2860 |
+
${content.math || `
|
| 2861 |
+
<p>Rigorous mathematical treatment of ${module.title.toLowerCase()}.</p>
|
| 2862 |
+
<div class="formula">
|
| 2863 |
+
Mathematical formulas and derivations go here
|
| 2864 |
+
</div>
|
| 2865 |
+
`}
|
| 2866 |
+
</div>
|
| 2867 |
+
</div>
|
| 2868 |
+
|
| 2869 |
+
<div id="${module.id}-applications" class="tab">
|
| 2870 |
+
<div class="section">
|
| 2871 |
+
<h2>π Real-World Applications</h2>
|
| 2872 |
+
${content.applications || `
|
| 2873 |
+
<p>How ${module.title.toLowerCase()} is used in practice across different industries.</p>
|
| 2874 |
+
<div class="info-box">
|
| 2875 |
+
<div class="box-title">Use Cases</div>
|
| 2876 |
+
<div class="box-content">
|
| 2877 |
+
Common applications and practical examples
|
| 2878 |
+
</div>
|
| 2879 |
+
</div>
|
| 2880 |
+
`}
|
| 2881 |
</div>
|
| 2882 |
</div>
|
| 2883 |
|
README.md
CHANGED

@@ -8,6 +8,7 @@ Visit our courses directly in your browser:
 
 - [📊 Interactive Statistics Course](https://aashishgarg13.github.io/DataScience/complete-statistics/)
 - [🤖 Machine Learning Guide](https://aashishgarg13.github.io/DataScience/ml_complete-all-topics/)
+- [🧠 Deep Learning Masterclass](https://aashishgarg13.github.io/DataScience/DeepLearning/Deep%20Learning%20Curriculum.html)
 - [📈 Data Visualization](https://aashishgarg13.github.io/DataScience/Visualization/)
 - [🔢 Mathematics for Data Science](https://aashishgarg13.github.io/DataScience/math-ds-complete/)
 - [⚙️ Feature Engineering Guide](https://aashishgarg13.github.io/DataScience/feature-engineering/)

@@ -42,6 +43,16 @@ Essential resources for mastering AI prompt engineering:
 - Visual Learning Aids
 - Step-by-Step Explanations
 
+### 🧠 Deep Learning Masterclass
+- **Location:** `DeepLearning/`
+- **Features:**
+  - **"Paper & Pain" Methodology:** Rigorous mathematical derivations
+  - Neural Network Foundations (MLP, Backprop, Optimizers)
+  - Convolutional Neural Networks (CNNs) & Computer Vision
+  - Generative AI (GANs, Diffusion Models)
+  - Transformers & Large Language Models (LLMs)
+  - Interactive Canvas Visualizations
+
 ### 📈 Data Visualization
 - **Location:** `Visualization/`
 - **Features:**

@@ -82,6 +93,7 @@ The repository supports automatic updates for:
 Visit our GitHub Pages hosted versions:
 1. [Statistics Course](https://aashishgarg13.github.io/DataScience/complete-statistics/)
 2. [Machine Learning Guide](https://aashishgarg13.github.io/DataScience/ml_complete-all-topics/)
+3. [Deep Learning Masterclass](https://aashishgarg13.github.io/DataScience/DeepLearning/Deep%20Learning%20Curriculum.html)
 
 ### Option B: Run Locally (Recommended for Development)
 
@@ -130,6 +142,12 @@ ml_complete-all-topics/
 └── app.js # Interactive components
 ```
 
+### Deep Learning Masterclass
+```
+DeepLearning/
+└── Deep Learning Curriculum.html # All-in-one interactive curriculum
+```
+
 ### Data Visualization
 ```
 Visualization/