	# Progress Report

	## Task: PlainMLP vs ResMLP Comparison on Distant Identity Task

	- [x] Step 1: Setup project directory - DONE
	- [x] Step 2: Implement PlainMLP architecture - DONE
	- [x] Step 3: Implement ResMLP architecture - DONE
	- [x] Step 4: Generate synthetic identity data - DONE
	- [x] Step 5: Train both models for 500 steps - DONE
	- [x] Step 6: Capture activation/gradient statistics - DONE
	- [x] Step 7: Generate all 4 plots - DONE
	- [ ] Step 8: Create summary report - IN PROGRESS
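
The difference between the two architectures in Steps 2 to 4 can be sketched as follows. This is a minimal standard-library illustration, not the code in `experiment_final.py`: the width, depth, ReLU activation, and the helper names (`plain_forward`, `res_forward`, `mse`) are all assumptions made for the example, and the identity task is assumed to mean that the target equals the input.

```python
import random


def relu(v):
    return [max(0.0, x) for x in v]


def linear(weights, v):
    # weights is a list of rows; returns the matrix-vector product.
    return [sum(w * x for w, x in zip(row, v)) for row in weights]


def plain_forward(layers, x):
    # Plain stack (assumed form): each layer replaces the signal, h <- relu(W h).
    h = x
    for w in layers:
        h = relu(linear(w, h))
    return h


def res_forward(layers, x):
    # Residual stack (assumed form): each layer adds a correction, h <- h + relu(W h).
    h = x
    for w in layers:
        f = relu(linear(w, h))
        h = [a + b for a, b in zip(h, f)]
    return h


def mse(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


# Hypothetical sizes, chosen only for this sketch.
dim, depth = 4, 6
rng = random.Random(0)
x = [rng.uniform(-1.0, 1.0) for _ in range(dim)]  # identity task: target == input

# With all-zero weights the residual stack is already the identity map,
# while the plain stack outputs zeros.
zero_layers = [[[0.0] * dim for _ in range(dim)] for _ in range(depth)]
print("plain loss:", mse(plain_forward(zero_layers, x), x))
print("res loss:  ", mse(res_forward(zero_layers, x), x))
```

In this sketch the skip connection gives the residual stack a starting point that already solves the identity task, whereas the plain stack has to learn to pass the signal through every layer. Whether the actual experiment initializes its weights this way is not stated in this report.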

	## Key Results

	| Metric | PlainMLP | ResMLP |
	|--------|----------|--------|
	| Final Loss | 0.3123 | 0.0630 |
	| Loss Improvement vs. PlainMLP | - | 5.0x |
	| Per-Layer Gradient Norm Range | [7.6e-3, 1.0e-2] | [1.9e-3, 3.8e-3] |
	| Per-Layer Activation Std Range | [0.36, 0.95] | [0.13, 0.18] |
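
The 5.0x improvement in the table is consistent with the ratio of the two final losses. The short check below reproduces that arithmetic; the variable names are illustrative only.

```python
# Final losses as reported in the table above.
plain_loss = 0.3123
res_loss = 0.0630

# Improvement factor: how many times lower the ResMLP loss is.
improvement = plain_loss / res_loss
print(f"{improvement:.1f}x")  # prints "5.0x"
```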

	## Files Generated
	- `experiment_final.py` - Main experiment code
	- `results.json` - Numerical results
	- `plots/training_loss.png` - Training loss comparison
	- `plots/gradient_magnitude.png` - Per-layer gradient norms
	- `plots/activation_mean.png` - Per-layer activation means
	- `plots/activation_std.png` - Per-layer activation stds
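
The per-layer statistics behind the gradient and activation plots could be computed along these lines. This is a sketch under stated assumptions, not the author's implementation: the function names `layer_stats` and `grad_norm` are hypothetical, the gradient norm is assumed to be an L2 norm, and the standard deviation is assumed to be the population form.

```python
import math
import statistics


def layer_stats(activations_per_layer):
    """Return a (mean, std) pair for each layer's flattened activations."""
    return [
        (statistics.fmean(a), statistics.pstdev(a))  # population std (assumption)
        for a in activations_per_layer
    ]


def grad_norm(flat_gradient):
    """L2 norm of one layer's flattened gradient (assumed definition)."""
    return math.sqrt(sum(g * g for g in flat_gradient))


# Toy values, not taken from results.json.
toy_activations = [[1.0, 3.0], [0.0, 0.0, 0.0]]
for i, (mean, std) in enumerate(layer_stats(toy_activations)):
    print(f"layer {i}: mean={mean:.3f} std={std:.3f}")
print("grad norm:", grad_norm([3.0, 4.0]))
```

Reporting the minimum and maximum of these per-layer values would give ranges of the kind shown under Key Results.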