# Gradient Clipping Experiment
## Objective

Demonstrate how gradient clipping stabilizes training by preventing sudden large weight updates caused by rare, high-loss data points.
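For reference, norm-based clipping (as implemented by `torch.nn.utils.clip_grad_norm_` in PyTorch) rescales the gradient `g` to `g * min(1, c / ||g||_2)` for a threshold `c`: a gradient from a rare, high-loss sample is shrunk to norm `c` (1.0 in Step 4 below), while typical gradients under the threshold pass through unchanged.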
## Task Breakdown

- [ ] Step 1: Implement simple PyTorch model (Embedding + Linear)
- [ ] Step 2: Create imbalanced synthetic dataset (990 'A', 10 'B' targets)
- [ ] Step 3: Training loop WITHOUT gradient clipping - record metrics
- [ ] Step 4: Training loop WITH gradient clipping (threshold=1.0) - record metrics (Steps 1-4 are sketched after this list)
- [ ] Step 5: Generate comparison plots (see the plotting sketch under Expected Outcome)
- [ ] Step 6: Write summary report with findings
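A minimal sketch of Steps 1-4, assuming a two-token vocabulary, plain SGD, and one-sample-at-a-time updates; the model sizes, learning rate, and the `train` helper are illustrative choices, not fixed by this plan:

```python
import torch
import torch.nn as nn

# Step 1: simple model (Embedding + Linear); the sizes here are arbitrary.
class TinyModel(nn.Module):
    def __init__(self, vocab_size=2, embed_dim=16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.linear = nn.Linear(embed_dim, vocab_size)

    def forward(self, x):
        return self.linear(self.embed(x))

# Step 2: imbalanced synthetic dataset - 990 'A' (class 0), 10 'B' (class 1),
# shuffled so the rare 'B' targets appear sporadically during training.
torch.manual_seed(0)
inputs = torch.randint(0, 2, (1000,))
targets = torch.cat([torch.zeros(990, dtype=torch.long),
                     torch.ones(10, dtype=torch.long)])
perm = torch.randperm(len(targets))
inputs, targets = inputs[perm], targets[perm]

def train(clip: bool, max_norm: float = 1.0, lr: float = 0.5):
    """Steps 3-4: per-sample training loop, with or without gradient clipping."""
    torch.manual_seed(0)  # identical initial weights for both runs
    model = TinyModel()
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    losses, grad_norms, weight_norms = [], [], []
    for x, y in zip(inputs, targets):
        opt.zero_grad()
        loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
        loss.backward()
        # clip_grad_norm_ returns the total norm measured *before* clipping,
        # which is exactly metric 2 below; max_norm=inf makes it a no-op for
        # the unclipped run while still reporting the norm.
        pre_clip_norm = torch.nn.utils.clip_grad_norm_(
            model.parameters(), max_norm if clip else float("inf"))
        opt.step()
        losses.append(loss.item())
        grad_norms.append(float(pre_clip_norm))
        weight_norms.append(float(torch.cat(
            [p.detach().flatten() for p in model.parameters()]).norm()))
    return losses, grad_norms, weight_norms
```

`train(clip=False)` and `train(clip=True)` then see the same data order and the same initialization, so any divergence between the two runs comes from the clipping itself.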
## Key Metrics to Track

1. Training loss per step
2. L2 norm of gradients (before clipping)
3. L2 norm of model weights
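Concretely, metrics 2 and 3 are global L2 norms that treat all parameters as one flattened vector. A small helper, using the hypothetical name `global_l2_norm` (for gradients this matches the pre-clip value that `clip_grad_norm_` returns in the sketch above):

```python
import torch

def global_l2_norm(tensors):
    """L2 norm of a collection of tensors viewed as one flat vector."""
    return torch.sqrt(sum(t.pow(2).sum() for t in tensors))

# Metric 2: call right after loss.backward() and before any clipping.
#   grad_norm = global_l2_norm(p.grad for p in model.parameters() if p.grad is not None)
# Metric 3: call after opt.step().
#   weight_norm = global_l2_norm(p.detach() for p in model.parameters())
```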
## Expected Outcome

- Without clipping: Spiky gradient norms when encountering rare 'B' samples
- With clipping: Bounded gradient norms, more stable training
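A possible shape for the Step 5 comparison plots, assuming the `train` helper sketched under Task Breakdown; the panel layout and output filename are arbitrary choices:

```python
import matplotlib.pyplot as plt

no_clip = train(clip=False)
with_clip = train(clip=True)

fig, axes = plt.subplots(1, 3, figsize=(14, 4))
titles = ["Training loss", "Gradient L2 norm (pre-clip)", "Weight L2 norm"]
for ax, title, series_a, series_b in zip(axes, titles, no_clip, with_clip):
    ax.plot(series_a, label="no clipping")
    ax.plot(series_b, label="clipping (max_norm=1.0)")
    ax.set_title(title)
    ax.set_xlabel("step")
    ax.legend()
fig.tight_layout()
fig.savefig("clipping_comparison.png")
```

If the runs behave as expected, the middle panel should show isolated spikes in the unclipped curve that sit flattened at the threshold in the clipped one.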