add deeplearnin
DeepLearning/Deep Learning Curriculum.html CHANGED
@@ -995,6 +995,48 @@
 W_out = floor((W_in + 2×padding - kernel_size) / stride) + 1<br>
 H_out = floor((H_in + 2×padding - kernel_size) / stride) + 1
 </div>
+ `,
+ math: `
+ <h3>The Mathematical Operation: Cross-Correlation</h3>
+ <p>In deep learning, what we call "convolution" is mathematically cross-correlation: a local dot product of the kernel and each image patch.</p>
+
+ <div class="formula">
+ S(i, j) = (I * K)(i, j) = Σ_m Σ_n I(i+m, j+n) K(m, n)
+ </div>
+
+ <div class="callout insight">
+ <div class="callout-title">📝 Paper & Pain: Manual Convolution</div>
+ <strong>Input (3×3):</strong><br>
+ [1 2 0]<br>
+ [0 1 1]<br>
+ [1 0 2]<br>
+ <br>
+ <strong>Kernel (2×2):</strong><br>
+ [1 0]<br>
+ [0 1]<br>
+ <br>
+ <strong>Calculation:</strong><br>
+ Step 1 (Top-Left): (1×1) + (2×0) + (0×0) + (1×1) = <strong>2</strong><br>
+ Step 2 (Top-Right): (2×1) + (0×0) + (1×0) + (1×1) = <strong>3</strong><br>
+ ... Output is a 2×2 matrix.
+ </div>
+
+ <h3>Backprop through Conv</h3>
+ <p>The gradient is computed with the same sliding-window formula, but with the kernel flipped vertically and horizontally (a true convolution)!</p>
+ `,
+ applications: `
+ <div class="info-box">
+ <div class="box-title">🔍 Feature Extraction</div>
+ <div class="box-content">
+ Early layers learn edges (Gabor-like filters), middle layers learn textures, deep layers learn specific object parts (eyes, wheels).
+ </div>
+ </div>
+ <div class="info-box">
+ <div class="box-title">🎨 Image Processing</div>
+ <div class="box-content">
+ Blurring, sharpening, and edge detection in Photoshop/GIMP are all 2D convolutions with fixed kernels.
+ </div>
+ </div>
 `
 },
 "yolo": {
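A minimal NumPy sketch (ours, not part of the commit) that re-runs this hunk's output-size formula and the manual convolution example; the helper names are our own:

```python
import numpy as np

def out_size(w_in, padding, kernel_size, stride):
    # W_out = floor((W_in + 2*padding - kernel_size) / stride) + 1
    return (w_in + 2 * padding - kernel_size) // stride + 1

def cross_correlate(image, kernel):
    """'Convolution' as deep learning uses it: local dot products, no kernel flip."""
    kh, kw = kernel.shape
    h, w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

image  = np.array([[1, 2, 0], [0, 1, 1], [1, 0, 2]])
kernel = np.array([[1, 0], [0, 1]])
print(out_size(3, 0, 2, 1))            # 2 -> the output is 2x2, as stated
print(cross_correlate(image, kernel))  # [[2. 3.] [0. 3.]] - steps 1 and 2 match
```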
@@ -1113,6 +1155,28 @@
 Tumor localization, cell counting, anatomical structure detection in X-rays/CT scans
 </div>
 </div>
+ `,
+ math: `
+ <h3>Intersection over Union (IoU)</h3>
+ <p>How do we measure whether a predicted box is correct? With the geometric ratio of intersection to union.</p>
+ <div class="formula">
+ IoU = Area of Overlap / Area of Union
+ </div>
+
+ <div class="callout insight">
+ <div class="callout-title">📝 Paper & Pain: Manual IoU</div>
+ <strong>Box A (GT):</strong> [0,0,10,10] (Area=100)<br>
+ <strong>Box B (Pred):</strong> [5,5,15,15] (Area=100)<br>
+ 1. <strong>Intersection:</strong> overlap from [5,5] to [10,10] = 5×5 = 25<br>
+ 2. <strong>Union:</strong> Area A + Area B - Intersection = 100 + 100 - 25 = 175<br>
+ 3. <strong>IoU:</strong> 25 / 175 ≈ <strong>0.143</strong> (a poor match!)
+ </div>
+
+ <h3>YOLO Multi-Part Loss</h3>
+ <p>YOLO uses a composite loss that combines localization, confidence, and classification errors.</p>
+ <div class="formula">
+ L = λ_coord Σ(Localization Loss) + Σ(Confidence Loss) + Σ(Classification Loss)
+ </div>
 `
 },
 "transformers": {
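A short Python sketch (ours, not the curriculum's) of the manual IoU example above, with boxes given as [x1, y1, x2, y2]:

```python
def iou(a, b):
    # Width/height of the intersection rectangle (0 if the boxes are disjoint)
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

print(iou([0, 0, 10, 10], [5, 5, 15, 15]))  # 25 / 175 ≈ 0.1429
```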
@@ -1854,11 +1918,34 @@
 `,
 applications: `
 <div class="info-box">
- <div class="box-title">📸
+ <div class="box-title">📸 Real-World CV</div>
 <div class="box-content">
-
+ Face ID, medical imaging (MRI/CT), autonomous drone navigation, manufacturing defect detection, and satellite imagery analysis
 </div>
 </div>
+ `,
+ math: `
+ <h3>The Parameter Explosion Problem</h3>
+ <p>Why do standard neural networks fail on images? Let's count the parameters for a small image.</p>
+
+ <div class="callout insight">
+ <div class="callout-title">📝 Paper & Pain: MLP vs Images</div>
+ 1. <strong>Input:</strong> 224 × 224 pixels with 3 color channels (RGB)<br>
+ 2. <strong>Input Size:</strong> 224 × 224 × 3 = <strong>150,528 features</strong><br>
+ 3. <strong>Hidden Layer:</strong> Suppose we want just 1000 neurons.<br>
+ 4. <strong>Matrix Size:</strong> [1000, 150528]<br>
+ 5. <strong>Total Weights:</strong> 1000 × 150,528 ≈ <strong>150 million parameters</strong> for just ONE layer!
+ </div>
+
+ <h3>The CNN Solution: Weight Sharing</h3>
+ <p>Instead of every neuron looking at every pixel, we exploit <strong>translation invariance</strong>: if an edge detector works in the top-left, it should work in the bottom-right.</p>
+
+ <div class="formula">
+ Total Params = (Kernel_H × Kernel_W × Input_Channels) × Num_Filters<br>
+ <br>
+ For a 3×3 filter: (3 × 3 × 3) × 64 = <strong>1,728 parameters</strong><br>
+ Reduction: 150M / 1,728 ≈ <strong>87,000× fewer weights!</strong>
+ </div>
 `
 },
 "pooling": {
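A quick Python check of the two counts above (ours, not part of the commit; biases ignored, as in the hunk's formula):

```python
def mlp_params(h, w, c, hidden):
    return h * w * c * hidden              # dense layer: every input feeds every neuron

def conv_params(kh, kw, c_in, n_filters):
    return kh * kw * c_in * n_filters      # one small shared kernel per filter

dense = mlp_params(224, 224, 3, 1000)      # 150,528,000
conv  = conv_params(3, 3, 3, 64)           # 1,728
print(dense, conv, round(dense / conv))    # ratio ≈ 87,111
```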
@@ -1918,6 +2005,37 @@
 Used after conv layers in AlexNet, VGG, and most classic CNNs to progressively reduce spatial dimensions
 </div>
 </div>
+ `,
+ math: `
+ <h3>Max Pooling: Winning Signal Selection</h3>
+ <p>Pooling operations are non-parametric (no weights). They simply select or average values within a local window.</p>
+
+ <div class="callout insight">
+ <div class="callout-title">📝 Paper & Pain: 2×2 Max Pooling</div>
+ <strong>Input (4×4):</strong><br>
+ [1 3 | 2 1]<br>
+ [5 1 | 0 2]<br>
+ -----------<br>
+ [1 1 | 8 2]<br>
+ [0 2 | 4 1]<br>
+ <br>
+ <strong>Output (2×2):</strong><br>
+ Step 1: max(1, 3, 5, 1) = <strong>5</strong><br>
+ Step 2: max(2, 1, 0, 2) = <strong>2</strong><br>
+ Step 3: max(1, 1, 0, 2) = <strong>2</strong><br>
+ Step 4: max(8, 2, 4, 1) = <strong>8</strong><br>
+ <strong>Final:</strong> [[5 2], [2 8]]
+ </div>
+
+ <h3>Backprop through Pooling</h3>
+ <div class="list-item">
+ <div class="list-num">💡</div>
+ <div><strong>Max Pooling:</strong> The gradient is routed ONLY to the neuron that held the maximum value. All others get 0.</div>
+ </div>
+ <div class="list-item">
+ <div class="list-num">💡</div>
+ <div><strong>Average Pooling:</strong> The gradient is distributed evenly among all neurons in the window.</div>
+ </div>
 `
 },
 "cnn-basics": {
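A small NumPy sketch (ours, not part of the commit) of the 2×2 max-pool example, including the winner mask that the backprop rule above routes gradients through:

```python
import numpy as np

def max_pool_2x2(x):
    h, w = x.shape[0] // 2, x.shape[1] // 2
    out = np.zeros((h, w))
    mask = np.zeros_like(x)                    # remembers the winners for backprop
    for i in range(h):
        for j in range(w):
            win = x[2*i:2*i+2, 2*j:2*j+2]
            out[i, j] = win.max()
            r, c = np.unravel_index(win.argmax(), win.shape)
            mask[2*i + r, 2*j + c] = 1         # gradient flows only to this cell
    return out, mask

x = np.array([[1, 3, 2, 1],
              [5, 1, 0, 2],
              [1, 1, 8, 2],
              [0, 2, 4, 1]])
out, mask = max_pool_2x2(x)
print(out)   # [[5. 2.] [2. 8.]]
print(mask)  # 1s mark the winners; every other position receives zero gradient
```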
@@ -2124,24 +2242,56 @@
 Sparked AI renaissance - Google, Facebook, Microsoft pivoted to deep learning after AlexNet
 </div>
 </div>
+ `,
+ math: `
+ <h3>Paper & Pain: Parameter Counting</h3>
+ <p>Understanding AlexNet's 60M parameters:</p>
+ <div class="list-item">
+ <div class="list-num">01</div>
+ <div><strong>Conv Layers:</strong> Only ~2.3 million parameters - they do most of the work on a small memory budget!</div>
+ </div>
+ <div class="list-item">
+ <div class="list-num">02</div>
+ <div><strong>FC Layers:</strong> Over <strong>58 million parameters</strong>. The first FC layer (FC6) alone takes 4096 × (6×6×256) ≈ 37M params!</div>
+ </div>
+ <div class="callout warning">
+ <div class="callout-title">⚠️ The Design Flaw</div>
+ FC layers are the memory bottleneck. Modern models (ResNet, Inception) replace them with Global Average Pooling, cutting parameters by ~90%.
+ </div>
 `
 },
 "vgg": {
 overview: `
 <h3>VGGNet (2014) - The Power of Depth</h3>
 <p>VGG showed that depth matters - 16-19 layers using only small 3×3 filters.</p>
- [12 removed lines, truncated in the page capture]
+ `,
+ concepts: `
+ <h3>Small Filters, Large Receptive Field</h3>
+ <div class="list-item">
+ <div class="list-num">01</div>
+ <div><strong>Uniformity:</strong> Uses 3×3 filters everywhere with stride 1, padding 1.</div>
+ </div>
+ <div class="list-item">
+ <div class="list-num">02</div>
+ <div><strong>Pooling Pattern:</strong> 2×2 max pooling after every 2-3 conv layers.</div>
+ </div>
+ `,
+ math: `
+ <h3>Stacked 3×3 Filters vs One Large Filter</h3>
+ <p>Why stack 3×3 filters instead of using one large filter?</p>
+ <div class="callout insight">
+ <div class="callout-title">📝 Paper & Pain: Parameter Efficiency</div>
+ 1. <strong>Receptive Field:</strong> Two 3×3 layers cover a 5×5 area; three 3×3 layers cover a 7×7 area.<br>
+ 2. <strong>Param Count (C channels):</strong><br>
+ • One 7×7 layer: 7² × C² = 49C² parameters.<br>
+ • Three 3×3 layers: 3 × (3² × C²) = 27C² parameters.<br>
+ <strong>Result:</strong> ~45% fewer weights for the SAME "view" of the image!
+ </div>
+ `,
+ applications: `
+ <div class="info-box">
+ <div class="box-title">🖼️ Feature Backbone</div>
+ VGG is the preferred backbone for Neural Style Transfer and early GANs thanks to its simple, clean feature extraction.
 </div>
 `
 },
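A few lines of Python (ours, not part of the commit) re-deriving the FC6 count and the stacked-3×3 claim from this hunk:

```python
def fc6_params():
    return 4096 * (6 * 6 * 256)              # AlexNet FC6: 37,748,736 ≈ 37M

def single_7x7(c):
    return 7 * 7 * c * c                     # 49·C² weights

def three_3x3(c):
    return 3 * (3 * 3 * c * c)               # 27·C² weights, same 7x7 receptive field

c = 256                                      # any channel count; the ratio is constant
print(fc6_params())                          # 37,748,736
print(1 - three_3x3(c) / single_7x7(c))      # ≈ 0.449 -> ~45% fewer weights
```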
@@ -2169,6 +2319,35 @@
 • Won ImageNet 2015<br>
 • Skip connections now used everywhere (U-Net, Transformers, etc.)
 </div>
+ `,
+ concepts: `
+ <h3>Identity & Projection Shortcuts</h3>
+ <div class="list-item">
+ <div class="list-num">01</div>
+ <div><strong>Identity Shortcut:</strong> Used when dimensions match. y = F(x, {W}) + x</div>
+ </div>
+ <div class="list-item">
+ <div class="list-num">02</div>
+ <div><strong>Projection Shortcut (1×1 Conv):</strong> Used when dimensions change. y = F(x, {W}) + W_s x</div>
+ </div>
+ `,
+ math: `
+ <h3>The Vanishing Gradient Solution</h3>
+ <p>Why do skip connections help? Differentiate the output y = F(x) + x:</p>
+ <div class="formula">
+ ∂y/∂x = ∂F/∂x + 1
+ </div>
+ <div class="callout insight">
+ <div class="callout-title">📝 Paper & Pain: Gradient Flow</div>
+ The "+1" term acts as a <strong>gradient highway</strong>. Even if the weights in F(x) are small (driving ∂F/∂x → 0), the gradient can still flow through the +1 term.<br>
+ This keeps gradients alive even in networks with 1000+ layers!
+ </div>
+ `,
+ applications: `
+ <div class="info-box">
+ <div class="box-title">🏗️ Modern Vision Backbones</div>
+ <div class="box-content">ResNet is the default starting point for nearly all computer vision tasks today (Mask R-CNN, YOLO, etc.).</div>
+ </div>
 `
 },
 "inception": {
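A scalar toy model in Python (ours; a deliberate simplification of the matrix case) showing the "+1" highway numerically:

```python
def grad(f, x, eps=1e-6):
    # Central-difference numerical derivative
    return (f(x + eps) - f(x - eps)) / (2 * eps)

w = 1e-4                                  # tiny weight, so dF/dx ≈ 0
F = lambda x: w * x                       # a "nearly dead" layer
plain    = lambda x: F(x)                 # no skip connection
residual = lambda x: F(x) + x             # y = F(x) + x

print(grad(plain, 2.0))     # ≈ 0.0001 -> the gradient has vanished
print(grad(residual, 2.0))  # ≈ 1.0001 -> the "+1" highway survives
```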
@@ -2180,17 +2359,35 @@
 <div class="formula">
 Input → [1×1 conv] ⊕ [3×3 conv] ⊕ [5×5 conv] ⊕ [3×3 pool] → Concatenate
 </div>
- [8 removed lines, truncated in the page capture]
+ `,
+ concepts: `
+ <h3>Core Innovations</h3>
+ <div class="list-item">
+ <div class="list-num">01</div>
+ <div><strong>1×1 Bottlenecks:</strong> Dimensionality reduction before expensive convolutions.</div>
+ </div>
+ <div class="list-item">
+ <div class="list-num">02</div>
+ <div><strong>Auxiliary Classifiers:</strong> Used during training to combat vanishing gradients in the middle layers.</div>
+ </div>
+ `,
+ math: `
+ <h3>1×1 Convolution Math (Network-in-Network)</h3>
+ <p>A 1×1 convolution acts like a channel-wise MLP: it maps C input channels to C' output channels using 1×1×C parameters per filter.</p>
 <div class="callout insight">
- <div class="callout-title"
-
+ <div class="callout-title">📝 Paper & Pain: Compression</div>
+ Input: 28×28×256 | Target: 28×28×512 with 3×3 filters.<br>
+ <strong>Direct:</strong> 512 × (3×3×256) ≈ 1.18 million params.<br>
+ <strong>Inception (1×1 bottleneck to 64):</strong><br>
+ Step 1 (1×1): 64 × (1×1×256) ≈ 16k params.<br>
+ Step 2 (3×3): 512 × (3×3×64) ≈ 295k params.<br>
+ <strong>Total:</strong> ≈310k params - roughly <strong>3.8× fewer parameters!</strong>
+ </div>
+ `,
+ applications: `
+ <div class="info-box">
+ <div class="box-title">🏎️ Computational Efficiency</div>
+ Inception designs are optimized for running deep networks on limited compute budgets.
 </div>
 `
 },
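The bottleneck arithmetic above, checked in Python (ours, not part of the commit):

```python
def conv_params(kh, kw, c_in, c_out):
    return kh * kw * c_in * c_out

direct = conv_params(3, 3, 256, 512)       # 1,179,648 - one big 3x3 layer
bottleneck = (conv_params(1, 1, 256, 64)   # 16,384  - squeeze to 64 channels
            + conv_params(3, 3, 64, 512))  # 294,912 - 3x3 on the thin tensor
print(direct, bottleneck)                  # 1,179,648 vs 311,296
print(direct / bottleneck)                 # ≈ 3.8x fewer parameters
```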
@@ -2230,6 +2427,33 @@
 • Latency-critical systems<br>
 • Good accuracy with 10-20× speedup
 </div>
+ `,
+ concepts: `
+ <h3>Efficiency Factors</h3>
+ <div class="list-item">
+ <div class="list-num">01</div>
+ <div><strong>Width Multiplier (α):</strong> Thins the network by reducing channels.</div>
+ </div>
+ <div class="list-item">
+ <div class="list-num">02</div>
+ <div><strong>Resolution Multiplier (ρ):</strong> Reduces the input image size.</div>
+ </div>
+ `,
+ math: `
+ <h3>Depthwise Separable Math</h3>
+ <p>Standard convolution complexity: F² × C_in × C_out × H × W</p>
+ <p>Separable complexity: (F² × C_in + C_in × C_out) × H × W</p>
+ <div class="callout insight">
+ <div class="callout-title">📝 Paper & Pain: The 9× Speedup</div>
+ The cost ratio is roughly 1/C_out + 1/F².<br>
+ For 3×3 filters (F=3) and many output channels, that is roughly <strong>1/9th</strong> the computation of a standard conv!
+ </div>
+ `,
+ applications: `
+ <div class="info-box">
+ <div class="box-title">📱 Edge Devices</div>
+ <div class="box-content">Real-time object detection on smartphones, web browsers (TensorFlow.js), and IoT devices.</div>
+ </div>
 `
 },
 "transfer-learning": {
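The cost ratio above, checked in Python (ours; the layer shape is a made-up example):

```python
def standard_cost(f, c_in, c_out, h, w):
    return f * f * c_in * c_out * h * w        # multiply-accumulates, standard conv

def separable_cost(f, c_in, c_out, h, w):
    depthwise = f * f * c_in * h * w           # one FxF filter per input channel
    pointwise = c_in * c_out * h * w           # 1x1 conv mixes channels
    return depthwise + pointwise

args = (3, 64, 128, 56, 56)                    # F, C_in, C_out, H, W (hypothetical)
ratio = separable_cost(*args) / standard_cost(*args)
print(ratio, 1 / 128 + 1 / 9)                  # ≈ 0.119 both ways -> roughly 1/9th
```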