# PlainMLP vs ResMLP Comparison - Distant Identity Task
## Objective
Compare a 20-layer PlainMLP and a 20-layer ResMLP on a synthetic "Distant Identity" task to demonstrate the vanishing gradient problem and how residual connections solve it.
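In brief: for a plain layer $h_{l+1} = \sigma(W_l h_l)$, the backward pass multiplies one Jacobian per layer,

$$\frac{\partial h_L}{\partial h_1} = \prod_{l=1}^{L-1} D_l W_l, \qquad D_l = \mathrm{diag}\big(\sigma'(W_l h_l)\big),$$

and a product of many such factors tends to shrink (or blow up) exponentially with depth. A residual block $h_{l+1} = h_l + \sigma(W_l h_l)$ instead has Jacobian $I + D_l W_l$, so the identity term gives gradients a path that does not attenuate with depth; Phase 3 below should make this visible.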
## Tasks
### Phase 1: Implementation
- [ ] Implement PlainMLP (20 layers, hidden dim 64, ReLU, Kaiming/He init)
- [ ] Implement ResMLP (20 layers, hidden dim 64, residual connections, Kaiming/He init)
- [ ] Generate synthetic data (1024 vectors, dim 64, entries drawn from U(-1, 1), target Y = X); see the sketch after this list
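A minimal sketch of both models and the dataset. All names are illustrative, and `kaiming_normal_` is an assumption (the spec does not say normal vs. uniform Kaiming init):

```python
import torch
import torch.nn as nn

class PlainMLP(nn.Module):
    """20 Linear+ReLU layers with no skip connections."""
    def __init__(self, dim=64, depth=20):
        super().__init__()
        self.layers = nn.ModuleList([nn.Linear(dim, dim) for _ in range(depth)])
        for layer in self.layers:
            nn.init.kaiming_normal_(layer.weight, nonlinearity="relu")
            nn.init.zeros_(layer.bias)

    def forward(self, x):
        for layer in self.layers:
            x = torch.relu(layer(x))
        return x

class ResMLP(nn.Module):
    """Same stack, but each block adds its input back: x + relu(Wx)."""
    def __init__(self, dim=64, depth=20):
        super().__init__()
        self.layers = nn.ModuleList([nn.Linear(dim, dim) for _ in range(depth)])
        for layer in self.layers:
            nn.init.kaiming_normal_(layer.weight, nonlinearity="relu")
            nn.init.zeros_(layer.bias)

    def forward(self, x):
        for layer in self.layers:
            x = x + torch.relu(layer(x))
        return x

# Synthetic "Distant Identity" data: the target is the input itself.
X = torch.empty(1024, 64).uniform_(-1.0, 1.0)
Y = X.clone()
```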
### Phase 2: Training
- [ ] Train both models for 500 steps with Adam (lr = 1e-3)
- [ ] Record the MSE loss at every step; a minimal loop is sketched below
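A training loop consistent with Phase 2, assuming full-batch updates (the spec does not mention minibatching):

```python
import torch.nn.functional as F

def train(model, X, Y, steps=500, lr=1e-3):
    """Runs Adam for `steps` full-batch updates; returns per-step MSE losses."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    losses = []
    for _ in range(steps):
        opt.zero_grad()
        loss = F.mse_loss(model(X), Y)
        loss.backward()
        opt.step()
        losses.append(loss.item())
    return losses

plain_model, res_model = PlainMLP(), ResMLP()
plain_losses = train(plain_model, X, Y)
res_losses = train(res_model, X, Y)
```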
### Phase 3: Final-State Analysis
- [ ] Implement PyTorch hooks to capture gradients and activations
- [ ] Run one forward/backward pass on a fresh random batch
- [ ] Capture the L2 norm of the gradient at each layer
- [ ] Capture the mean and std of the activations at each layer; see the hook sketch after this list
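One possible hook setup. It interprets "gradient at each layer" as the L2 norm of d(loss)/d(output) for each `Linear` layer, captured by a tensor hook registered inside a forward hook; activation statistics are taken on the same pre-activation outputs. Both choices are assumptions about the spec:

```python
def probe(model, X, Y):
    """One forward/backward pass, recording per-layer activation stats
    and the L2 norm of the gradient w.r.t. each Linear layer's output."""
    n = len(model.layers)
    stats = {"act_mean": [], "act_std": [], "grad_norm": [None] * n}
    handles = []

    def make_fwd_hook(idx):
        def fwd_hook(module, inputs, output):
            # Pre-activation statistics of this layer's output.
            stats["act_mean"].append(output.mean().item())
            stats["act_std"].append(output.std().item())

            def save_grad(grad):
                # Fires during backward with d(loss)/d(output).
                stats["grad_norm"][idx] = grad.norm().item()

            output.register_hook(save_grad)
        return fwd_hook

    for i, layer in enumerate(model.layers):
        handles.append(layer.register_forward_hook(make_fwd_hook(i)))

    loss = F.mse_loss(model(X), Y)
    loss.backward()
    for h in handles:
        h.remove()
    return stats

# Fresh random batch for the final-state analysis (target is still Y = X).
X_new = torch.empty(1024, 64).uniform_(-1.0, 1.0)
stats_plain = probe(plain_model, X_new, X_new)
stats_res = probe(res_model, X_new, X_new)
```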
### Phase 4: Visualization & Reporting
- [ ] Plot Training Loss vs Steps (both models)
- [ ] Plot Gradient Magnitude vs Layer Depth
- [ ] Plot Activation Mean vs Layer Depth
- [ ] Plot Activation Std vs Layer Depth
- [ ] Write a summary report with analysis; a plotting sketch follows this list
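A plotting sketch with matplotlib; the 2x2 layout, log-scale loss axis, and output filename are choices, not part of the spec:

```python
import matplotlib.pyplot as plt

fig, axes = plt.subplots(2, 2, figsize=(12, 8))

# Training loss curves for both models.
axes[0, 0].plot(plain_losses, label="PlainMLP")
axes[0, 0].plot(res_losses, label="ResMLP")
axes[0, 0].set(title="Training Loss vs Steps", xlabel="Step",
               ylabel="MSE", yscale="log")

# Per-layer statistics from the Phase 3 probe.
for name, s in [("PlainMLP", stats_plain), ("ResMLP", stats_res)]:
    axes[0, 1].plot(s["grad_norm"], label=name)
    axes[1, 0].plot(s["act_mean"], label=name)
    axes[1, 1].plot(s["act_std"], label=name)
axes[0, 1].set(title="Gradient L2 Norm vs Layer Depth", xlabel="Layer")
axes[1, 0].set(title="Activation Mean vs Layer Depth", xlabel="Layer")
axes[1, 1].set(title="Activation Std vs Layer Depth", xlabel="Layer")

for ax in axes.flat:
    ax.legend()
plt.tight_layout()
plt.savefig("plain_vs_res_mlp.png")
```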
## Expected Outcomes
- PlainMLP: gradient norms collapse toward the early layers (vanishing gradients), so it learns the identity function poorly
- ResMLP: gradient norms stay stable across depth, so it learns the identity function successfully