Space: y-agent/modular-addition-feature-learning
zhuoranyang committed on Feb 18

Commit e518ead · verified · 1 Parent(s): 8df851f

Fix Tab 7: 10K epochs, better neuron selection, fixed timepoints

Files changed (19)

README.md +2 -2
precompute/generate_plots.py +37 -11
precompute/prime_config.py +2 -2
precomputed_results/p_015/p015_phase_align_quad.png +0 -0
precomputed_results/p_015/p015_phase_align_relu.png +0 -0
precomputed_results/p_015/p015_single_freq_quad.png +2 -2
precomputed_results/p_015/p015_single_freq_relu.png +2 -2
precomputed_results/p_023/p023_phase_align_quad.png +0 -0
precomputed_results/p_023/p023_phase_align_relu.png +0 -0
precomputed_results/p_023/p023_single_freq_quad.png +2 -2
precomputed_results/p_023/p023_single_freq_relu.png +2 -2
precomputed_results/p_029/p029_phase_align_quad.png +0 -0
precomputed_results/p_029/p029_phase_align_relu.png +0 -0
precomputed_results/p_029/p029_single_freq_quad.png +2 -2
precomputed_results/p_029/p029_single_freq_relu.png +2 -2
precomputed_results/p_031/p031_phase_align_quad.png +0 -0
precomputed_results/p_031/p031_phase_align_relu.png +0 -0
precomputed_results/p_031/p031_single_freq_quad.png +2 -2
precomputed_results/p_031/p031_single_freq_relu.png +2 -2

README.md

@@ -160,8 +160,8 @@ Each modulus produces ~33 files in `precomputed_results/p_XXX/`:
 | `standard` | ReLU | AdamW | 5e-5 | 0 | 100% | 5,000 | Tabs 1–4 |
 | `grokking` | ReLU | AdamW | 1e-4 | 2.0 | 75% | 50,000 | Tabs 1, 6 |
 | `quad_random` | Quad | AdamW | 5e-5 | 0 | 100% | 5,000 | Tab 5 |
-| `quad_single_freq` | Quad | SGD | 0.1 | 0 | 100% | 5,000 | Tab 7 |
-| `relu_single_freq` | ReLU | SGD | 0.01 | 0 | 100% | 5,000 | Tab 7 |
+| `quad_single_freq` | Quad | SGD | 0.1 | 0 | 100% | 10,000 | Tab 7 |
+| `relu_single_freq` | ReLU | SGD | 0.01 | 0 | 100% | 10,000 | Tab 7 |
 ## Running a Single Experiment

precompute/generate_plots.py

@@ -1603,12 +1603,36 @@ class PlotGenerator:
                         'phi_out': phi_out,
                     })
-            # Select a neuron that shows clear phase alignment
-            # Pick neuron with largest final scale
+            # Select a neuron that shows interesting phase convergence dynamics.
+            # The lottery winner (largest final scale) already has ψ ≈ 2φ from
+            # the start, producing flat boring plots. Instead, pick a neuron that
+            # (a) has significant final scale (top quartile → actually learned),
+            # (b) had the largest initial phase misalignment |ψ₀ - 2φ₀|.
             final_records = [r for r in all_neuron_records if r['epoch'] == epochs[-1]]
             if not final_records:
                 continue
-            best_neuron = max(final_records, key=lambda r: r['scale_in'])['neuron']
+            init_records = [r for r in all_neuron_records if r['epoch'] == epochs[0]]
+            init_by_neuron = {r['neuron']: r for r in init_records}
+            # Keep neurons with final scale in top 25%
+            scales = sorted([r['scale_in'] for r in final_records], reverse=True)
+            scale_threshold = scales[max(0, len(scales) // 4 - 1)] if len(scales) >= 4 else scales[-1]
+            strong_neurons = [r for r in final_records if r['scale_in'] >= scale_threshold]
+            # Among strong neurons, pick the one with largest initial misalignment
+            best_neuron = None
+            best_misalign = -1.0
+            for r in strong_neurons:
+                n = r['neuron']
+                if n not in init_by_neuron:
+                    continue
+                ir = init_by_neuron[n]
+                misalign = abs(normalize_to_pi(ir['phi_out'] - 2 * ir['phi_in']))
+                if misalign > best_misalign:
+                    best_misalign = misalign
+                    best_neuron = n
+            if best_neuron is None:
+                best_neuron = max(final_records, key=lambda r: r['scale_in'])['neuron']
             # Extract trajectory for this neuron
             neuron_records = [r for r in all_neuron_records if r['neuron'] == best_neuron]
@@ -1667,16 +1691,18 @@ class PlotGenerator:
             _save_fig(fig, self._out(f'phase_align_{prefix}.png'))
             # ---- Decoded weights at timepoints ----
+            # Use fixed timepoints matching the notebook figures:
+            #   Quad: steps 0, 1000, 5000   ReLU: steps 0, 5000
             if prefix == 'quad':
-                keys = [0]
-                mid = min(epochs, key=lambda e: abs(e - 1000))
-                end = epochs[-1]
-                if mid not in keys:
-                    keys.append(mid)
-                if end not in keys:
-                    keys.append(end)
+                target_keys = [0, 1000, 5000]
             else:
-                keys = [0, epochs[-1]]
+                target_keys = [0, 5000]
+            # Snap each target to the nearest available checkpoint epoch
+            keys = []
+            for t in target_keys:
+                nearest = min(epochs, key=lambda e: abs(e - t))
+                if nearest not in keys:
+                    keys.append(nearest)
             num_components = min(20, d_mlp)
             n = len(keys)
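
Reviewer note: the new neuron-selection rule in `generate_plots.py` can be tried in isolation. The sketch below is a standalone approximation, not the repository's code. `pick_neuron` is a hypothetical wrapper name, the record dictionaries are toy data, and `normalize_to_pi` is assumed to wrap an angle into [-π, π), since the helper's real definition is not part of this diff.

```python
import math


def normalize_to_pi(angle):
    """Wrap an angle into [-pi, pi). Assumed behaviour; the real helper is not shown in the diff."""
    return (angle + math.pi) % (2 * math.pi) - math.pi


def pick_neuron(records, first_epoch, last_epoch):
    """Hypothetical wrapper around the selection rule added in this commit."""
    final = [r for r in records if r['epoch'] == last_epoch]
    if not final:
        return None
    init_by_neuron = {r['neuron']: r for r in records if r['epoch'] == first_epoch}

    # Keep neurons whose final scale is in the top quarter (same indexing as the diff).
    scales = sorted((r['scale_in'] for r in final), reverse=True)
    threshold = scales[max(0, len(scales) // 4 - 1)] if len(scales) >= 4 else scales[-1]
    strong = [r for r in final if r['scale_in'] >= threshold]

    # Among those, prefer the largest initial misalignment |psi_0 - 2*phi_0|.
    best, best_misalign = None, -1.0
    for r in strong:
        ir = init_by_neuron.get(r['neuron'])
        if ir is None:
            continue
        misalign = abs(normalize_to_pi(ir['phi_out'] - 2 * ir['phi_in']))
        if misalign > best_misalign:
            best, best_misalign = r['neuron'], misalign

    # Fall back to the old rule (largest final scale) if nothing qualified.
    if best is None:
        best = max(final, key=lambda r: r['scale_in'])['neuron']
    return best


# Toy data: 8 neurons; neuron i ends with scale i + 1.
records = []
for i in range(8):
    phi_in0 = 0.5 if i == 7 else 0.0
    phi_out0 = {0: 3.0, 6: 2.0, 7: 1.0}.get(i, 0.1)
    records.append({'neuron': i, 'epoch': 0, 'scale_in': 0.01,
                    'phi_in': phi_in0, 'phi_out': phi_out0})
    records.append({'neuron': i, 'epoch': 10000, 'scale_in': float(i + 1),
                    'phi_in': 0.0, 'phi_out': 0.0})

chosen = pick_neuron(records, 0, 10000)
print(chosen)  # 6
```

In this toy run the old rule would have picked neuron 7 (largest final scale), which starts already aligned. The new rule picks neuron 6, which is in the top quarter by scale and starts misaligned. Neuron 0 starts even further out but is filtered by the scale threshold.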

precompute/prime_config.py

@@ -94,7 +94,7 @@ TRAINING_RUNS = {
         "lr": 0.1,
         "weight_decay": 0,
         "frac_train": 1.0,
-        "num_epochs": 5000,
+        "num_epochs": 10000,
         "save_every": 200,
         "init_scale": 0.02,
         "save_models": True,
@@ -109,7 +109,7 @@ TRAINING_RUNS = {
         "lr": 0.01,
         "weight_decay": 0,
         "frac_train": 1.0,
-        "num_epochs": 5000,
+        "num_epochs": 10000,
         "save_every": 200,
         "init_scale": 0.002,
         "save_models": True,

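Reviewer note: with `num_epochs` raised to 10000 and `save_every` left at 200, the fixed plot timepoints (0, 1000, 5000) should land exactly on saved checkpoints. The sketch below checks that arithmetic. It assumes a checkpoint at epoch 0 and at every multiple of `save_every` up to `num_epochs`, which this diff does not confirm. `checkpoint_epochs` and `snap_to_checkpoints` are hypothetical names. The snapping loop itself mirrors the one added to `generate_plots.py`.

```python
def checkpoint_epochs(num_epochs, save_every):
    """Assumed schedule: epoch 0, then every `save_every` epochs up to `num_epochs`."""
    return list(range(0, num_epochs + 1, save_every))


def snap_to_checkpoints(targets, epochs):
    """Map each target epoch to the nearest available checkpoint, dropping duplicates."""
    keys = []
    for t in targets:
        nearest = min(epochs, key=lambda e: abs(e - t))
        if nearest not in keys:
            keys.append(nearest)
    return keys


epochs = checkpoint_epochs(num_epochs=10000, save_every=200)
quad_keys = snap_to_checkpoints([0, 1000, 5000], epochs)
relu_keys = snap_to_checkpoints([0, 5000], epochs)

# A run with fewer checkpoints collapses distant targets onto the last one.
short_keys = snap_to_checkpoints([0, 1000, 5000], [0, 400, 800])

print(quad_keys, relu_keys, short_keys)
# [0, 1000, 5000] [0, 5000] [0, 800]
```

Under that assumed schedule the decoded-weight panels stop at step 5000 even though training now runs to 10000, which matches the "fixed timepoints" wording in the commit message.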
precomputed_results/p_015/p015_phase_align_quad.png

precomputed_results/p_015/p015_phase_align_relu.png

precomputed_results/p_015/p015_single_freq_quad.png (Git LFS)

SHA256: ccbca685fc93991c9965ca3e8fc3fa87ae3fd03e74fb5c40b9f03468e63dda79 · Pointer size: 131 Bytes · Size of remote file: 143 kB
SHA256: e5906064f6dbfbeca87b012e3a4df82884d4c5349469e352ffe1ea2339a1d857 · Pointer size: 131 Bytes · Size of remote file: 145 kB

precomputed_results/p_015/p015_single_freq_relu.png (Git LFS)

SHA256: fa5f0678750033da499f044eb3d50a94cfd0f2ce00b69dbd76692a8ae6be4aa6 · Pointer size: 131 Bytes · Size of remote file: 105 kB
SHA256: d6c7fe07aa3804a74002a27f11d9d5fc0cf59f4878a782ed599182af615b366c · Pointer size: 131 Bytes · Size of remote file: 103 kB

precomputed_results/p_023/p023_phase_align_quad.png

precomputed_results/p_023/p023_phase_align_relu.png

precomputed_results/p_023/p023_single_freq_quad.png (Git LFS)

SHA256: 48cecf07f923c6d0757652f68c427b4da1b3cde5f6b3dd8e2503bf60d9058f13 · Pointer size: 131 Bytes · Size of remote file: 160 kB
SHA256: 514323ec0a4055901b250f0fc32289d125dd2e1678d947417ad26ce33e531063 · Pointer size: 131 Bytes · Size of remote file: 151 kB

precomputed_results/p_023/p023_single_freq_relu.png (Git LFS)

SHA256: 441e3592d831b1ce488d7445610f296841302f31b208417f9e942d50adad4129 · Pointer size: 131 Bytes · Size of remote file: 125 kB
SHA256: 8e30db53c38ab81e903456d954ee0c279d40fc28b8481d710a2d4e41a3c236bc · Pointer size: 131 Bytes · Size of remote file: 124 kB

precomputed_results/p_029/p029_phase_align_quad.png

precomputed_results/p_029/p029_phase_align_relu.png

precomputed_results/p_029/p029_single_freq_quad.png (Git LFS)

SHA256: a4bc4fa7afb28ff80d3ffa30a460293e12f604a4ee3238eddddb8402cc13f79a · Pointer size: 131 Bytes · Size of remote file: 164 kB
SHA256: 71aa792034d88cd9c5a7dbeb8ffa3088b52817fcf7209d0ddf1c120f1439f0f8 · Pointer size: 131 Bytes · Size of remote file: 163 kB

precomputed_results/p_029/p029_single_freq_relu.png (Git LFS)

SHA256: 85dcb7facbf0161bad07679baeb1a2c96b684176b9706e4547b0808ea122adce · Pointer size: 131 Bytes · Size of remote file: 126 kB
SHA256: 964d258244a653f430aa88e4aa62abcbba753bef38961e6b8c49797ab22bf4bb · Pointer size: 131 Bytes · Size of remote file: 125 kB

precomputed_results/p_031/p031_phase_align_quad.png

precomputed_results/p_031/p031_phase_align_relu.png

precomputed_results/p_031/p031_single_freq_quad.png (Git LFS)

SHA256: 00ed32cbc3259510286729ea0d16cac0f7050de0e35c4bca4e65f5d46b0270dc · Pointer size: 131 Bytes · Size of remote file: 162 kB
SHA256: c5a7ccd341127c493b97e32ea2b02b42a5d3678b7f38db3bc57d8b441abd30d9 · Pointer size: 131 Bytes · Size of remote file: 162 kB

precomputed_results/p_031/p031_single_freq_relu.png (Git LFS)

SHA256: b4873e378e97c8ec6c04d1da29b9df4382d434baf78edbb1e6ca3e6571b646e4 · Pointer size: 131 Bytes · Size of remote file: 127 kB
SHA256: bc893000f749ec845d0cb00c4103a2d68dd4acfbabd55f8f17eb4196624b3848 · Pointer size: 131 Bytes · Size of remote file: 126 kB