amirali1985 committed on
Commit b02421a · verified
1 Parent(s): 641998d

Upload modular/README.md with huggingface_hub

Files changed (1)
  1. modular/README.md +49 -0
modular/README.md ADDED
@@ -0,0 +1,49 @@
+ # Modular Arithmetic SoRL: Experiment Notes
+ Generated: 2026-04-23
+
+ ## Architecture Sweep Results
+
+ ### Goal
+ Test whether SoRL stabilizes grokking on modular arithmetic (mod 113).
+ Baselines grok but immediately un-grok (classic instability). Does SoRL hold it?
+
+ ### Results Summary
+
+ | Model              | Mode     | Best Acc | Final Acc | Notes                        |
+ |--------------------|----------|----------|-----------|------------------------------|
+ | 1L/1H/32d          | baseline | 100%     | 6.5%      | Grokked epoch 2800, crashed  |
+ | 1L/2H/64d          | baseline | 65%      | 14%       | Partial, unstable            |
+ | 1L/1H/128d         | baseline | 100%     | 30%       | Grokked, then un-grokked     |
+ | 1L/4H/64d          | SoRL     | 100%     | **100%**  | Stable ✓                     |
+ | 1L/4H/128d         | SoRL     | 100%     | **100%**  | Stable ✓ (4400 epochs)       |
+ | 1L/1H/32d          | SoRL     | TBD      | TBD       | Interrupted                  |
+
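The Best Acc and Final Acc columns can be read off a per-epoch accuracy curve. The sketch below is one way to do that, not the project's actual tooling: the function name `summarize_run`, the 0.99 threshold, and the toy curves are all assumptions made for illustration, and the curves are not real run data.

```python
def summarize_run(val_acc, grok_threshold=0.99):
    """Return best accuracy, final accuracy, and whether the run un-grokked.

    val_acc: list of per-epoch validation accuracies in [0, 1].
    grok_threshold: hypothetical cutoff for calling a run "grokked".
    """
    if not val_acc:
        raise ValueError("val_acc must contain at least one epoch")
    best = max(val_acc)
    final = val_acc[-1]
    grokked = best >= grok_threshold
    # "Un-grokked": reached the threshold at some point but ended below it.
    ungrokked = grokked and final < grok_threshold
    return {"best": best, "final": final, "grokked": grokked, "ungrokked": ungrokked}


# Toy curves shaped like the table rows above (illustrative only).
baseline_like = [0.01, 0.30, 1.00, 0.40, 0.065]
sorl_like = [0.01, 0.50, 1.00, 1.00, 1.00]

print(summarize_run(baseline_like))
print(summarize_run(sorl_like))
```

A run that never reaches the threshold (such as the 65% partial baseline) is reported as neither grokked nor un-grokked under this definition.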
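+ ### Key Finding
+ In the completed runs, SoRL stabilizes grokking: baselines find the solution and lose it, while SoRL runs hold 100% accuracy through the end of training.
+ This mirrors the arithmetic interpretability finding: SoRL externalizes the mechanism,
+ making it robust to the weight updates that cause baseline un-grokking.
+ No baseline and SoRL run share an architecture yet; the matched 1L/1H/32d SoRL run was interrupted.
+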
+ ### Architecture
+ - Task: (a + b) mod p with p=113; all 113 × 113 = 12769 pairs, 30% used for training (seed=42)
+ - Qwen3-based SorlModelWrapper, trained from scratch
+ - abs_vocab=30, K=1, alpha_info_gain=10, alpha_abs=0.1, alpha_soft_zipf=1.0
+ - Full-batch training (batch_size=0), weight_decay=1.0
+
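The task setup in the list above can be sketched in a few lines. This is a minimal illustration using only the standard library; the helper name `make_split` and the shuffle-then-slice strategy are assumptions, and the real train.py may build or seed its split differently.

```python
import random

P = 113               # modulus from the notes
TRAIN_FRACTION = 0.3  # 30% of all pairs used for training
SEED = 42


def make_split(p=P, train_fraction=TRAIN_FRACTION, seed=SEED):
    """Enumerate every (a, b, (a + b) mod p) triple and split it randomly."""
    pairs = [(a, b, (a + b) % p) for a in range(p) for b in range(p)]
    rng = random.Random(seed)  # local RNG so the split is reproducible
    rng.shuffle(pairs)
    n_train = int(train_fraction * len(pairs))
    return pairs[:n_train], pairs[n_train:]


train, test = make_split()
print(len(train), len(test))  # 3830 8939
```

Holding out 70% of the pairs is what makes this a generalization test: the model only reaches 100% on the held-out pairs if it learns the modular-addition rule rather than memorizing the training triples.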
+ ### Fourier Analysis (Experiment 11)
+ Negative result: in the 1L/4H/128d model, abstract tokens do NOT encode Fourier structure.
+ The DC component dominates completely (non-DC ratio ~0.01 for all groupings).
+ Hypothesis: the model has sufficient internal capacity, so the abstract tokens are redundant.
+ The undersized-model sweep was the follow-up to test this capacity hypothesis.
+
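To make the "non-DC ratio" concrete, here is a small standard-library sketch. It assumes the ratio means the power at non-zero frequencies divided by total power, for a signal indexed by residue 0..p-1; the exact definition in fourier_analysis.py is not given in these notes, and the function names and test signals below are hypothetical.

```python
import cmath
import math


def dft_power(signal):
    """Power |X_k|^2 of each discrete Fourier coefficient (naive O(n^2) DFT)."""
    n = len(signal)
    power = []
    for k in range(n):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal)
        )
        power.append(abs(coeff) ** 2)
    return power


def non_dc_ratio(signal):
    """Fraction of total spectral power at frequencies k >= 1 (assumed definition)."""
    power = dft_power(signal)
    total = sum(power)
    if total == 0:
        return 0.0
    return sum(power[1:]) / total


p = 113
# Nearly constant signal: almost all power sits in the DC term (k = 0).
flat = [1.0 + 0.01 * (t % 3) for t in range(p)]
# Pure cosine at frequency 5: essentially no DC power.
wave = [math.cos(2 * math.pi * 5 * t / p) for t in range(p)]

print(non_dc_ratio(flat))
print(non_dc_ratio(wave))
```

Under this definition, a ratio near 0.01 means the signal is close to constant across residues, which is what "DC component dominates" describes; a representation built on a few Fourier frequencies would instead give a ratio close to 1.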
+ ### SoRL Training Bugs (Fixed)
+ The original modular train.py had three bugs relative to trainer_ablate.py:
+ 1. btl was not detached, so the gradient through -10*btl taught the model to forget the baseline
+ 2. btl was not added to the total loss, so there was no SFT anchor
+ 3. sorl_search was not wrapped in torch.no_grad(), causing memory/gradient instability
+ Fixed by matching the trainer_ablate.py pattern exactly.
+
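The effect of bugs 1 and 2 can be seen from the sign of the gradient. The derivation below is a sketch under stated assumptions, not the actual loss in train.py: btl is read as a baseline loss term, the only place it entered the buggy loss is taken to be the -10*btl term (with 10 = alpha_info_gain), all other terms are lumped into a remainder R that does not depend on btl, the anchor coefficient is taken to be 1, and sg[·] denotes stop-gradient (`.detach()` in PyTorch).

```latex
% Buggy loss: btl appears only with a negative coefficient and is not detached.
\mathcal{L}_{\text{buggy}}(\theta) = R(\theta) - \alpha\,\ell_{\text{btl}}(\theta),
\qquad \alpha = 10
\\[4pt]
\nabla_\theta \mathcal{L}_{\text{buggy}}
  = \nabla_\theta R - \alpha\,\nabla_\theta \ell_{\text{btl}}
\\[4pt]
% Fixed loss: detach btl inside the info-gain term, add btl back as an anchor.
\mathcal{L}_{\text{fixed}}(\theta)
  = R(\theta) - \alpha\,\operatorname{sg}\!\left[\ell_{\text{btl}}(\theta)\right]
    + \ell_{\text{btl}}(\theta)
\\[4pt]
\nabla_\theta \mathcal{L}_{\text{fixed}}
  = \nabla_\theta R + \nabla_\theta \ell_{\text{btl}}
```

In the buggy form, a gradient-descent step moves along +α∇ℓ_btl, which increases btl, so the optimizer is rewarded for making the baseline prediction worse. After the fix, the detached term contributes no gradient and the added btl term pulls the baseline loss down.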
+ ### Files
+ - modular/code/train.py: training script (baseline + SoRL)
+ - modular/code/sweep_undersized.txt: architecture sweep jobs
+ - modular/code/fourier_analysis.py: Fourier analysis script (experiment 11)
+ - modular/<run_name>/: per-run outputs (history.json, curves.png, config.json, best/, final/)