+# Modular Arithmetic SoRL — Experiment Notes
+Generated: 2026-04-23
+## Architecture Sweep Results
+### Goal
+Test whether SoRL stabilizes grokking on modular arithmetic (mod 113).
+Baselines grok but immediately un-grok (classic instability). Does SoRL hold it?
+### Results Summary
+| Model              | Mode     | Best Acc | Final Acc | Notes                        |
+|--------------------|----------|----------|-----------|------------------------------|
+| 1L/1H/32d          | baseline | 100%     | 6.5%      | Grokked epoch 2800, crashed  |
+| 1L/2H/64d          | baseline | 65%      | 14%       | Partial, unstable            |
+| 1L/1H/128d         | baseline | 100%     | 30%       | Grokked, then un-grokked     |
+| 1L/4H/64d          | SoRL     | 100%     | **100%**  | Stable ✓                     |
+| 1L/4H/128d         | SoRL     | 100%     | **100%**  | Stable ✓ (4400 epochs)       |
+| 1L/1H/32d          | SoRL     | TBD      | TBD       | Interrupted                  |
+### Key Finding
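+In the runs completed so far, SoRL stabilizes grokking: baselines find the solution and then lose it, while the SoRL runs hold 100% accuracy through the end of training.
+Caveat: the completed SoRL runs use 4 heads and the baselines use 1 or 2, and the matched 1L/1H/32d SoRL run was interrupted, so the comparison is not yet architecture-matched.
+This mirrors the arithmetic interpretability finding: SoRL externalizes the mechanism,
+making it robust to the weight updates that cause baseline un-grokking.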
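The "find it and lose it" pattern can be checked mechanically from an accuracy curve. The sketch below is illustrative only: `summarize_run`, the 0.99 threshold, and the plain-list input are assumptions, since the notes do not describe the layout of history.json.

```python
def summarize_run(acc_history, grok_threshold=0.99):
    """Classify a run from its per-evaluation accuracy curve.

    acc_history: list of accuracies in [0, 1], in evaluation order.
    grok_threshold: hypothetical cutoff for "has grokked".
    """
    if not acc_history:
        raise ValueError("acc_history must not be empty")

    best = max(acc_history)
    final = acc_history[-1]

    # Position of the first evaluation at or above the threshold, if any.
    # This is an index into the list, not necessarily an epoch number.
    grok_index = next(
        (i for i, acc in enumerate(acc_history) if acc >= grok_threshold),
        None,
    )

    if grok_index is None:
        status = "never grokked"
    elif final >= grok_threshold:
        status = "stable"
    else:
        status = "un-grokked"

    return {"best": best, "final": final,
            "grok_index": grok_index, "status": status}


# Toy curves shaped like the table rows (values are made up for illustration).
baseline_like = summarize_run([0.01, 0.30, 1.00, 0.40, 0.065])
sorl_like = summarize_run([0.01, 0.50, 1.00, 1.00, 1.00])
print(baseline_like["status"], sorl_like["status"])
```

A run counts as "stable" here only if its last evaluation is still above the threshold, which is the same distinction the Best Acc and Final Acc columns draw.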
+### Architecture
+- Task: (a + b) mod 113, p=113, all 12769 pairs, 30% train (seed=42)
+- Qwen3-based SorlModelWrapper, trained from scratch
+- abs_vocab=30, K=1, alpha_info_gain=10, alpha_abs=0.1, alpha_soft_zipf=1.0
+- Full-batch training (batch_size=0), weight_decay=1.0
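A minimal sketch of how the dataset and split described above could be built. The shuffling method is an assumption: the notes give only the 30% fraction and seed=42, so `make_split` and its use of `random.Random` are hypothetical and will not reproduce the exact split produced by train.py.

```python
import random

P = 113  # modulus from the notes


def make_split(p=P, train_frac=0.3, seed=42):
    """Enumerate all (a, b) pairs with label (a + b) mod p and split them."""
    examples = [(a, b, (a + b) % p) for a in range(p) for b in range(p)]

    # Seeded shuffle so the split is reproducible (assumed mechanism).
    rng = random.Random(seed)
    rng.shuffle(examples)

    n_train = int(train_frac * len(examples))
    return examples[:n_train], examples[n_train:]


train_set, test_set = make_split()
print(len(train_set), len(test_set))
```

With p=113 there are 113 * 113 = 12769 pairs in total, so a 30% split leaves roughly 3.8k training examples and 8.9k held-out examples.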
+### Fourier Analysis (Experiment 11)
+Negative result: abstract tokens do NOT encode Fourier structure in the 1L/4H/128d model.
+The DC component dominates completely (non-DC ratio ~0.01 for all groupings).
+Hypothesis: the model has sufficient internal capacity → abstract tokens are redundant.
+The undersized-model sweep was the follow-up to test this capacity hypothesis.
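One way to compute a non-DC ratio like the one quoted above, assuming it means the fraction of spectral power outside the k=0 (DC) bin. The definition used in fourier_analysis.py may differ. `non_dc_ratio` and the input signal (for example, how often an abstract token fires for each residue 0..112) are illustrative assumptions.

```python
import cmath
import math


def dft_power(signal):
    """Power |F_k|^2 of the discrete Fourier transform, computed directly."""
    n = len(signal)
    power = []
    for k in range(n):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        power.append(abs(coeff) ** 2)
    return power


def non_dc_ratio(signal):
    """Fraction of total spectral power that is not in the DC (k=0) bin."""
    power = dft_power(signal)
    total = sum(power)
    return 0.0 if total == 0 else sum(power[1:]) / total


P = 113
flat = [1.0] * P                                                  # pure DC
wave = [math.cos(2 * math.pi * 5 * t / P) for t in range(P)]      # pure frequency 5
mixed = [1.0 + 0.1 * w for w in wave]                             # DC plus weak frequency 5

for name, sig in [("flat", flat), ("wave", wave), ("mixed", mixed)]:
    print(name, round(non_dc_ratio(sig), 4))
```

A signal that is constant across residues scores near 0, and a clean Fourier mode scores near 1. The mixed case, a 10% ripple on a constant, lands near 0.005, so a ratio of ~0.01 corresponds to only a faint periodic component on top of a large constant offset.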
+### SoRL Training Bug (Fixed)
+The original modular train.py had three bugs relative to trainer_ablate.py:
+1. btl not detached → gradient through -10*btl taught the model to forget the baseline
+2. btl not added to the total loss → no SFT anchor
+3. sorl_search not wrapped in torch.no_grad() → memory/gradient instability
+Fixed by matching the trainer_ablate.py pattern exactly.
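The three fixes can be illustrated on a toy scalar parameter. This is a sketch of the pattern only, not the code in train.py or trainer_ablate.py. The loss structure (an info-gain term `btl - sorl_loss` weighted by alpha_info_gain=10) is an assumption inferred from the "-10*btl" remark, and `toy_losses` and `hypothetical_search` are made-up stand-ins.

```python
import torch

ALPHA_INFO_GAIN = 10.0  # matches alpha_info_gain=10 in the notes


def toy_losses(w):
    """Stand-in losses on a scalar parameter (hypothetical, for illustration)."""
    btl = (w - 1.0) ** 2        # plays the role of the baseline / SFT loss
    sorl_loss = (w - 3.0) ** 2  # plays the role of the loss with abstract tokens
    return btl, sorl_loss


def buggy_total(w):
    # Bugs 1 and 2: btl is not detached inside the info-gain term and is not
    # added as its own term, so the total contains -10 * btl and nothing else.
    btl, sorl_loss = toy_losses(w)
    info_gain = btl - sorl_loss
    return -ALPHA_INFO_GAIN * info_gain


def fixed_total(w):
    # Fix: detach btl inside the info-gain term, and add btl as an anchor.
    btl, sorl_loss = toy_losses(w)
    info_gain = btl.detach() - sorl_loss
    return btl - ALPHA_INFO_GAIN * info_gain


def grad_of(total_fn, w0=2.0):
    """Gradient of the total loss with respect to w, evaluated at w0."""
    w = torch.tensor(w0, requires_grad=True)
    total_fn(w).backward()
    return w.grad.item()


def hypothetical_search(w):
    # Bug 3 fix: a search step that only selects tokens should not build
    # an autograd graph, so it runs under torch.no_grad().
    with torch.no_grad():
        return (w * 2.0).round()


print("buggy grad:", grad_of(buggy_total))
print("fixed grad:", grad_of(fixed_total))
```

At w=2 the btl gradient is +2. In the buggy total it enters with weight -10, so descent pushes btl up. In the fixed total it enters with weight +1, so descent pushes btl down while the info-gain term acts only through `sorl_loss`.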
+### Files
+- modular/code/train.py — training script (baseline + SoRL)
+- modular/code/sweep_undersized.txt — architecture sweep jobs
+- modular/code/fourier_analysis.py — Fourier analysis script (experiment 11)
+- modular/<run_name>/ — per-run: history.json, curves.png, config.json, best/, final/