# Gradient Clipping Experiment
## Objective
Demonstrate how gradient clipping stabilizes training by preventing sudden large weight updates caused by rare, high-loss data points.
## Task Breakdown
- [ ] Step 1: Implement simple PyTorch model (Embedding + Linear)
- [ ] Step 2: Create imbalanced synthetic dataset (990 'A', 10 'B' targets); Steps 1-2 are sketched just after this list
- [ ] Step 3: Training loop WITHOUT gradient clipping - record metrics
- [ ] Step 4: Training loop WITH gradient clipping (threshold=1.0) - record metrics; a single loop covering Steps 3-4 is sketched after the metrics list below
- [ ] Step 5: Generate comparison plots (a plotting sketch closes this document)
- [ ] Step 6: Write summary report with findings
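A minimal sketch of Steps 1-2, assuming a toy two-class setup where 'A' maps to class 0 and the rare 'B' maps to class 1. The names `TinyModel` and `make_imbalanced_dataset`, and the sizes `VOCAB_SIZE` and `EMBED_DIM`, are illustrative choices, not part of the original plan:

```python
import torch
import torch.nn as nn

VOCAB_SIZE = 16   # assumed toy vocabulary size
EMBED_DIM = 8     # assumed embedding width
NUM_CLASSES = 2   # 'A' -> class 0, 'B' -> class 1

class TinyModel(nn.Module):
    """Step 1: an Embedding followed by a single Linear classifier head."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, EMBED_DIM)
        self.linear = nn.Linear(EMBED_DIM, NUM_CLASSES)

    def forward(self, tokens):
        # tokens: (batch,) integer ids -> (batch, NUM_CLASSES) logits
        return self.linear(self.embed(tokens))

def make_imbalanced_dataset(seed=0):
    """Step 2: 990 'A' samples and 10 'B' samples, shuffled together."""
    g = torch.Generator().manual_seed(seed)
    inputs = torch.randint(0, VOCAB_SIZE, (1000,), generator=g)
    targets = torch.zeros(1000, dtype=torch.long)  # class 0 = 'A' (common)
    targets[:10] = 1                               # class 1 = 'B' (rare)
    perm = torch.randperm(1000, generator=g)       # shuffle so 'B' appears sporadically
    return inputs[perm], targets[perm]
```

Because 'B' is so rare, the model quickly learns to predict 'A' everywhere, so each 'B' sample produces an unusually high loss and a large gradient, which is exactly the spike the experiment is meant to expose.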
## Key Metrics to Track
1. Training loss per step
2. L2 norm of gradients (before clipping)
3. L2 norm of model weights
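A sketch of the training loop for Steps 3-4 that records all three metrics above. It relies on `torch.nn.utils.clip_grad_norm_`, which returns the total L2 norm computed before clipping, so one loop serves both runs; the function name `train`, the per-sample SGD setup, and the learning rate are assumptions, not taken from the plan:

```python
import torch
import torch.nn as nn

def train(model, inputs, targets, clip_threshold=None, epochs=5, lr=0.5):
    """Run per-sample SGD and return per-step metric lists.

    clip_threshold=None  -> Step 3 (no clipping)
    clip_threshold=1.0   -> Step 4 (clipping at 1.0)
    """
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    losses, grad_norms, weight_norms = [], [], []

    for _ in range(epochs):
        for x, y in zip(inputs, targets):
            opt.zero_grad()
            loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
            loss.backward()

            # clip_grad_norm_ returns the total L2 norm measured BEFORE
            # clipping, which is exactly metric 2. Passing max_norm=inf
            # makes the call a no-op for the unclipped run.
            max_norm = clip_threshold if clip_threshold is not None else float("inf")
            grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)

            opt.step()

            losses.append(loss.item())                  # metric 1
            grad_norms.append(grad_norm.item())         # metric 2
            weight_norms.append(                        # metric 3
                torch.cat([p.detach().flatten() for p in model.parameters()]).norm().item()
            )
    return losses, grad_norms, weight_norms
```

Calling `train(TinyModel(), *make_imbalanced_dataset())` and then `train(TinyModel(), *make_imbalanced_dataset(), clip_threshold=1.0)` yields the two metric sets to compare in Step 5.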
## Expected Outcome
- Without clipping: Spiky gradient norms when encountering rare 'B' samples
- With clipping: Bounded gradient norms, more stable training
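A minimal plotting sketch for Step 5, assuming the two runs returned metric lists as in the loop above; the three-panel layout and the output filename are assumptions:

```python
import matplotlib.pyplot as plt

def plot_comparison(unclipped, clipped, out_path="comparison.png"):
    """Overlay the two runs for each metric: loss, gradient norm, weight norm."""
    names = ["Training loss", "Gradient L2 norm (pre-clip)", "Weight L2 norm"]
    fig, axes = plt.subplots(3, 1, figsize=(8, 9), sharex=True)
    for ax, name, a, b in zip(axes, names, unclipped, clipped):
        ax.plot(a, label="without clipping", alpha=0.8)
        ax.plot(b, label="with clipping (1.0)", alpha=0.8)
        ax.set_ylabel(name)
        ax.legend()
    axes[-1].set_xlabel("Training step")
    fig.tight_layout()
    fig.savefig(out_path)
```

If the hypothesis holds, the middle panel should show isolated spikes in the unclipped run wherever a 'B' sample was hit, while the clipped run's gradient norm stays at or below 1.0.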