# Gradient Clipping Experiment
## Objective
Demonstrate how gradient clipping stabilizes training by preventing sudden large weight updates caused by rare, high-loss data points.
## Task Breakdown
- [ ] Step 1: Implement simple PyTorch model (Embedding + Linear)
- [ ] Step 2: Create imbalanced synthetic dataset (990 'A', 10 'B' targets)
- [ ] Step 3: Training loop WITHOUT gradient clipping - record metrics
- [ ] Step 4: Training loop WITH gradient clipping (threshold=1.0) - record metrics
- [ ] Step 5: Generate comparison plots
- [ ] Step 6: Write summary report with findings
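Steps 1 through 4 could be sketched roughly as follows. This is a minimal illustration, not the experiment's actual code: the names `TinyModel`, `make_dataset`, and `train`, the embedding width, the learning rate, the batch size of 1, and the choice of random input tokens are all assumptions, since the plan does not specify them.

```python
import torch
import torch.nn as nn

VOCAB_SIZE = 2   # assumption: token 0 stands for 'A', token 1 for 'B'
EMBED_DIM = 8    # assumption: the plan does not give an embedding width


class TinyModel(nn.Module):
    """Step 1: Embedding followed by a Linear classifier (hypothetical sizes)."""

    def __init__(self, vocab_size=VOCAB_SIZE, embed_dim=EMBED_DIM):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.out = nn.Linear(embed_dim, vocab_size)

    def forward(self, x):
        return self.out(self.embed(x))


def make_dataset(n_a=990, n_b=10, seed=0):
    """Step 2: shuffled targets with n_a 'A' labels and n_b 'B' labels."""
    gen = torch.Generator().manual_seed(seed)
    targets = torch.cat([torch.zeros(n_a, dtype=torch.long),
                         torch.ones(n_b, dtype=torch.long)])
    targets = targets[torch.randperm(len(targets), generator=gen)]
    # Assumption: inputs are random tokens that carry no signal about the target.
    inputs = torch.randint(0, VOCAB_SIZE, (len(targets),), generator=gen)
    return inputs, targets


def train(inputs, targets, clip_threshold=None, lr=0.1, seed=0):
    """Steps 3 and 4: one SGD step per sample, with optional clipping."""
    torch.manual_seed(seed)  # same initial weights for both runs
    model = TinyModel()
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    history = {"loss": [], "grad_norm": [], "weight_norm": []}
    for x, y in zip(inputs, targets):
        opt.zero_grad()
        loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
        loss.backward()
        # clip_grad_norm_ returns the total norm measured before clipping;
        # with max_norm=inf it measures the norm without rescaling anything.
        max_norm = float("inf") if clip_threshold is None else clip_threshold
        grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
        opt.step()
        weight_norm = torch.sqrt(sum(p.detach().pow(2).sum()
                                     for p in model.parameters()))
        history["loss"].append(loss.item())
        history["grad_norm"].append(float(grad_norm))
        history["weight_norm"].append(weight_norm.item())
    return history
```

Calling `train(inputs, targets)` and `train(inputs, targets, clip_threshold=1.0)` on the same dataset would give the two histories to compare in Step 5.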
## Key Metrics to Track
1. Training loss per step
2. L2 norm of gradients (before clipping)
3. L2 norm of model weights
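One way to compute metrics 2 and 3 is to flatten every parameter (or its gradient) into a single vector and take its L2 norm. The helper names `grad_l2_norm` and `weight_l2_norm` below are hypothetical, and treating all parameters as one global vector is an assumption about how the norms are meant to be aggregated.

```python
import torch
import torch.nn as nn


def grad_l2_norm(model):
    """Global L2 norm over all parameter gradients (call after backward())."""
    grads = [p.grad.detach().flatten()
             for p in model.parameters() if p.grad is not None]
    if not grads:
        return 0.0  # no gradients have been computed yet
    return torch.cat(grads).norm(2).item()


def weight_l2_norm(model):
    """Global L2 norm over all model parameters."""
    weights = [p.detach().flatten() for p in model.parameters()]
    return torch.cat(weights).norm(2).item()


# Usage on a tiny layer with hand-set weights [3, 4].
layer = nn.Linear(2, 1, bias=False)
with torch.no_grad():
    layer.weight.copy_(torch.tensor([[3.0, 4.0]]))
layer(torch.tensor([[1.0, 2.0]])).sum().backward()
w_norm = weight_l2_norm(layer)   # sqrt(3^2 + 4^2)
g_norm = grad_l2_norm(layer)     # gradient equals the input [1, 2]
```

To record the gradient norm before clipping, `grad_l2_norm` would need to be called after `loss.backward()` and before any clipping call.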
## Expected Outcome
- Without clipping: Spiky gradient norms when encountering rare 'B' samples
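- With clipping: The applied gradient norm is capped at the threshold (the norm measured before clipping can still spike), so weight updates stay small and training is more stable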
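The bound on the clipped gradient follows from the rescaling rule used by norm-based clipping: if the global norm exceeds the threshold, every component is scaled by the same factor so the result has norm at most the threshold. The plain-Python sketch below illustrates that rule on a list of numbers; the function name `clip_by_global_norm` is hypothetical, and the small `eps` is an assumption added to avoid division by zero.

```python
import math


def clip_by_global_norm(grads, threshold=1.0, eps=1e-6):
    """Rescale grads so their L2 norm is at most threshold.

    Returns the clipped values and the norm measured before clipping.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    # Scale factor is 1.0 when the norm is already within the threshold.
    scale = min(1.0, threshold / (total_norm + eps))
    return [g * scale for g in grads], total_norm


# A large gradient (as a rare 'B' sample might produce) is shrunk,
# while a small gradient passes through unchanged.
spiky_clipped, spiky_norm = clip_by_global_norm([3.0, 4.0])
small_clipped, small_norm = clip_by_global_norm([0.3, 0.4])
```

Direction is preserved in both cases; only the length of the update changes.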