PlainMLP vs ResMLP Comparison - Distant Identity Task
Objective
Compare a 20-layer PlainMLP with a 20-layer ResMLP on a synthetic "Distant Identity" task (targets equal inputs) to demonstrate the vanishing-gradient problem and show how residual connections mitigate it.
Tasks
Phase 1: Implementation
- Implement PlainMLP (20 Linear+ReLU layers, hidden dim 64, Kaiming (He) initialization)
- Implement ResMLP (same depth and width, with a residual skip connection around each layer, Kaiming (He) initialization)
- Generate synthetic data (1024 vectors of dim 64 sampled from U(-1, 1); targets Y = X, i.e. the identity map)
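A minimal sketch of Phase 1. The class and function names (PlainMLP, ResMLP, make_data) follow the task wording but are otherwise my own; the exact init scheme (kaiming_normal_ with zero biases) is one reasonable reading of "Kaiming He init":

```python
import torch
import torch.nn as nn

def _init_linears(layers):
    # Kaiming (He) init for ReLU networks; biases start at zero.
    for layer in layers:
        nn.init.kaiming_normal_(layer.weight, nonlinearity="relu")
        nn.init.zeros_(layer.bias)

class PlainMLP(nn.Module):
    """Depth x (Linear -> ReLU), no skip connections."""
    def __init__(self, dim=64, depth=20):
        super().__init__()
        self.layers = nn.ModuleList(nn.Linear(dim, dim) for _ in range(depth))
        _init_linears(self.layers)

    def forward(self, x):
        for layer in self.layers:
            x = torch.relu(layer(x))
        return x

class ResMLP(nn.Module):
    """Same stack, but each block adds its input back: x + ReLU(Wx + b)."""
    def __init__(self, dim=64, depth=20):
        super().__init__()
        self.layers = nn.ModuleList(nn.Linear(dim, dim) for _ in range(depth))
        _init_linears(self.layers)

    def forward(self, x):
        for layer in self.layers:
            x = x + torch.relu(layer(x))
        return x

def make_data(n=1024, dim=64):
    """Inputs sampled from U(-1, 1); targets are the inputs themselves."""
    X = torch.empty(n, dim).uniform_(-1.0, 1.0)
    return X, X.clone()
```

The residual form `x + relu(layer(x))` keeps an identity path through every block, which is exactly what makes the distant-identity target easy for ResMLP and hard for PlainMLP.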
Phase 2: Training
- Train both models for 500 steps with Adam (lr=1e-3)
- Record MSE loss at each step
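The training phase can be sketched as a single loop; the `train` helper below is illustrative (its name and signature are mine), using full-batch Adam at lr = 1e-3 as specified:

```python
import torch
import torch.nn as nn

def train(model, X, Y, steps=500, lr=1e-3):
    """Train with Adam on MSE loss; return the per-step loss history."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    history = []
    for _ in range(steps):
        opt.zero_grad()
        loss = loss_fn(model(X), Y)
        loss.backward()
        opt.step()
        history.append(loss.item())  # record MSE at every step
    return history
```

Running this on both models with the same data yields the two loss curves compared in Phase 4.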
Phase 3: Final State Analysis
- Implement PyTorch hooks to capture gradients and activations
- Perform a forward/backward pass on a fresh random batch
- Record the L2 norm of the gradient at each layer
- Record the mean and std of the activations at each layer
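One way to implement the analysis pass, assuming the models expose their Linear layers as submodules (the `analyze` name and return format are mine): forward hooks collect activation statistics, and the per-layer weight-gradient L2 norms are read after `backward()`:

```python
import torch
import torch.nn as nn

def analyze(model, x, y):
    """One forward/backward pass on (x, y); returns per-layer
    weight-gradient L2 norms and (mean, std) of each layer's output."""
    act_stats, handles = [], []

    def hook(module, inputs, output):
        # Forward hook: record statistics of this layer's activation.
        act_stats.append((output.mean().item(), output.std().item()))

    linears = [m for m in model.modules() if isinstance(m, nn.Linear)]
    for m in linears:
        handles.append(m.register_forward_hook(hook))

    model.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()

    for h in handles:
        h.remove()  # always detach hooks after use

    grad_norms = [m.weight.grad.norm().item() for m in linears]
    return grad_norms, act_stats
```

For PlainMLP the gradient norms should decay sharply toward the input layers; for ResMLP they should stay roughly flat across depth.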
Phase 4: Visualization & Reporting
- Plot Training Loss vs Steps (both models)
- Plot Gradient Magnitude vs Layer Depth
- Plot Activation Mean vs Layer Depth
- Plot Activation Std vs Layer Depth
- Write summary report with analysis
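All four plots share the same shape (one curve per model), so a single helper covers them; this sketch uses matplotlib with the headless Agg backend, and the function name and arguments are illustrative:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend: render to file, no display needed
import matplotlib.pyplot as plt

def plot_comparison(plain_vals, res_vals, ylabel, title, path, xlabel="Step"):
    """Overlay one metric for both models and save the figure to `path`."""
    plt.figure()
    plt.plot(plain_vals, label="PlainMLP")
    plt.plot(res_vals, label="ResMLP")
    plt.xlabel(xlabel)
    plt.ylabel(ylabel)
    plt.title(title)
    plt.legend()
    plt.savefig(path)
    plt.close()
```

For the loss and gradient-magnitude plots a log y-scale (`plt.yscale("log")`) tends to make the vanishing-gradient gap far more visible; the depth plots would pass `xlabel="Layer"`.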
Expected Outcomes
- PlainMLP: gradients vanish toward the early layers; the identity function is learned poorly
- ResMLP: gradient norms stay stable across depth; the identity function is learned successfully