AmberLJC committed 2c24e4a (parent 0d8aaba): Upload todo.md with huggingface_hub
 
# PlainMLP vs ResMLP Comparison: Distant Identity Task

## Objective
Compare a 20-layer PlainMLP and a 20-layer ResMLP on a synthetic "Distant Identity" task (learning Y = X through all 20 layers) to demonstrate the vanishing-gradient problem and how residual connections mitigate it.

## Tasks

### Phase 1: Implementation
- [ ] Implement PlainMLP (20 layers, hidden dim 64, ReLU, Kaiming He init)
- [ ] Implement ResMLP (20 layers, hidden dim 64, residual connections, Kaiming He init)
- [ ] Generate synthetic data (1024 vectors of dim 64 drawn from U(-1, 1), targets Y = X)
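A minimal PyTorch sketch of the Phase 1 pieces. The list fixes depth, width, activation and init; everything else here is an assumption: each PlainMLP layer is `Linear` followed by ReLU except the last, which stays linear so outputs can be negative like the U(-1, 1) targets; each ResMLP layer computes `x + relu(linear(x))`; biases start at zero; and `make_identity_data` is a hypothetical helper name.

```python
import torch
import torch.nn as nn


def _kaiming_init(layers: nn.ModuleList) -> None:
    """Kaiming (He) normal weights for ReLU; zero biases (the bias choice is an assumption)."""
    for layer in layers:
        nn.init.kaiming_normal_(layer.weight, nonlinearity="relu")
        nn.init.zeros_(layer.bias)


class PlainMLP(nn.Module):
    """Stacked Linear layers with ReLU between them and no skip connections."""

    def __init__(self, dim: int = 64, depth: int = 20):
        super().__init__()
        self.layers = nn.ModuleList([nn.Linear(dim, dim) for _ in range(depth)])
        _kaiming_init(self.layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for layer in self.layers[:-1]:
            x = torch.relu(layer(x))
        # Assumed: the last layer stays linear so outputs can be negative like the targets.
        return self.layers[-1](x)


class ResMLP(nn.Module):
    """Residual blocks of the assumed form x + ReLU(Linear(x))."""

    def __init__(self, dim: int = 64, depth: int = 20):
        super().__init__()
        self.layers = nn.ModuleList([nn.Linear(dim, dim) for _ in range(depth)])
        _kaiming_init(self.layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for layer in self.layers:
            x = x + torch.relu(layer(x))  # skip connection carries x past each layer
        return x


def make_identity_data(n: int = 1024, dim: int = 64, seed: int = 0):
    """Hypothetical helper: X ~ U(-1, 1) with shape (n, dim), targets Y = X."""
    gen = torch.Generator().manual_seed(seed)
    x = torch.rand(n, dim, generator=gen) * 2 - 1  # torch.rand is U[0, 1); rescale to [-1, 1)
    return x, x.clone()
```

Under this assumed layout, plain Kaiming init with an unscaled residual branch lets ResMLP activations grow with depth, which the Phase 3 activation statistics should make visible.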
12
+
13
+ ### Phase 2: Training
14
+ - [ ] Train both models for 500 steps with Adam (lr=1e-3)
15
+ - [ ] Record MSE loss at each step
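A sketch of the Phase 2 loop, assuming full-batch training on all 1024 samples (the list does not give a batch size). `train` is a hypothetical helper that accepts any `nn.Module`, so the same function serves both models.

```python
import torch
import torch.nn as nn


def train(model: nn.Module, x: torch.Tensor, y: torch.Tensor,
          steps: int = 500, lr: float = 1e-3) -> list[float]:
    """Full-batch Adam training; returns the MSE loss recorded at every step."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    losses = []
    for _ in range(steps):
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
        losses.append(loss.item())  # loss measured before this step's update
    return losses
```

Calling it once per freshly initialized model, on the same data, yields the two loss curves needed for Phase 4.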

### Phase 3: Final State Analysis
- [ ] Implement PyTorch hooks for gradient and activation capture
- [ ] Perform a forward/backward pass on a new random batch
- [ ] Capture L2 norm of gradients at each layer
- [ ] Capture mean and std of activations at each layer
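One way to do the Phase 3 capture, sketched under an assumption: "gradient at each layer" is read here as the gradient of the loss with respect to each hooked module's output, captured by a tensor hook registered inside a forward hook (weight-gradient norms via `layer.weight.grad.norm()` are another reasonable reading). `capture_layer_stats` is a hypothetical name; pass the submodules to probe, for example the model's 20 `Linear` layers, together with a freshly sampled batch.

```python
import torch
import torch.nn as nn


def capture_layer_stats(model: nn.Module, layers: list[nn.Module],
                        x: torch.Tensor, y: torch.Tensor) -> dict[str, list[float]]:
    """Run one forward/backward pass with hooks and return per-layer statistics.

    For each module in `layers`: mean and std of its output (the activation)
    and the L2 norm of dLoss/d(output) (the gradient reaching that layer).
    """
    n = len(layers)
    stats = {"act_mean": [0.0] * n, "act_std": [0.0] * n, "grad_norm": [0.0] * n}
    handles = []

    def make_hook(i: int):
        def forward_hook(module, inputs, output):
            stats["act_mean"][i] = output.detach().mean().item()
            stats["act_std"][i] = output.detach().std().item()

            def grad_hook(grad):
                stats["grad_norm"][i] = grad.norm(p=2).item()

            output.register_hook(grad_hook)  # fires during loss.backward()
        return forward_hook

    for i, layer in enumerate(layers):
        handles.append(layer.register_forward_hook(make_hook(i)))
    try:
        model.zero_grad()
        loss = nn.functional.mse_loss(model(x), y)
        loss.backward()
    finally:
        for h in handles:
            h.remove()  # leave the model hook-free afterwards
    return stats
```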

### Phase 4: Visualization & Reporting
- [ ] Plot Training Loss vs Steps (both models)
- [ ] Plot Gradient Magnitude vs Layer Depth
- [ ] Plot Activation Mean vs Layer Depth
- [ ] Plot Activation Std vs Layer Depth
- [ ] Write summary report with analysis
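A matplotlib sketch that draws the four Phase 4 plots as panels of one 2×2 figure. It assumes each model's results are collected into a dict with the hypothetical keys `loss`, `grad_norm`, `act_mean` and `act_std`; `plot_comparison` is likewise a hypothetical name. The log scale on the loss and gradient panels is a design choice, since vanishing gradients are expected to span several orders of magnitude.

```python
import matplotlib

matplotlib.use("Agg")  # render to files without needing a display
import matplotlib.pyplot as plt


def plot_comparison(plain: dict, res: dict, path: str = "comparison.png") -> None:
    """Plot PlainMLP vs ResMLP results; each dict holds per-step or per-layer lists."""
    panels = [
        ("loss", "Training Loss vs Steps", "Step", "MSE loss", True),
        ("grad_norm", "Gradient Magnitude vs Layer Depth", "Layer", "Gradient L2 norm", True),
        ("act_mean", "Activation Mean vs Layer Depth", "Layer", "Mean", False),
        ("act_std", "Activation Std vs Layer Depth", "Layer", "Std", False),
    ]
    fig, axes = plt.subplots(2, 2, figsize=(11, 8))
    for ax, (key, title, xlabel, ylabel, log_y) in zip(axes.flat, panels):
        for name, stats in (("PlainMLP", plain), ("ResMLP", res)):
            values = stats[key]
            # Steps are counted from 0, layers from 1.
            xs = range(len(values)) if key == "loss" else range(1, len(values) + 1)
            ax.plot(list(xs), values, label=name)
        if log_y:
            ax.set_yscale("log")
        ax.set(title=title, xlabel=xlabel, ylabel=ylabel)
        ax.legend()
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)
```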

## Expected Outcomes
- PlainMLP: Vanishing gradients, poor learning of identity function
- ResMLP: Stable gradients, successful learning of identity function
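These expectations follow from a standard chain-rule argument, added here as background rather than taken from the original plan. Write $h_0 = x$, let $h_l$ be the output of layer $l$, and take every PlainMLP layer as $h_l = \mathrm{ReLU}(W_l h_{l-1} + b_l)$. Backpropagating through $L$ layers gives

$$
\frac{\partial \mathcal{L}}{\partial h_0}
= W_1^{\top} D_1 \, W_2^{\top} D_2 \cdots W_L^{\top} D_L \,
\frac{\partial \mathcal{L}}{\partial h_L},
\qquad D_l = \mathrm{diag}\big(\mathbb{1}[W_l h_{l-1} + b_l > 0]\big)
$$

For ResMLP, with $h_l = h_{l-1} + F_l(h_{l-1})$ and $J_l = \partial F_l / \partial h_{l-1}$:

$$
\frac{\partial \mathcal{L}}{\partial h_0}
= (I + J_1)^{\top} (I + J_2)^{\top} \cdots (I + J_L)^{\top}
\frac{\partial \mathcal{L}}{\partial h_L}
= \frac{\partial \mathcal{L}}{\partial h_L}
+ \sum_{l=1}^{L} J_l^{\top} \frac{\partial \mathcal{L}}{\partial h_L} + \cdots
$$

The plain product has no identity term, so if its factors shrink the signal on average, the gradient reaching the early layers decays geometrically over $L = 20$ layers. The residual expansion always contains $\partial \mathcal{L}/\partial h_L$ itself, a direct path whose contribution is not attenuated by the layer Jacobians.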