Commit 6c7e2bf by amirali1985 · verified · 1 Parent(s): b7fce0d

Upload folder using huggingface_hub

app.py CHANGED
@@ -327,21 +327,44 @@ a small auxiliary vocabulary (e.g. 30 tokens) inserted at regular intervals (eve
327
  detail_btn = gr.Button("Show splits")
328
  detail_table = gr.Dataframe(headers=["Split", "Accuracy", "N"], interactive=False)
329
 
330
- # ── Tab 2: Interpretability ──
331
- with gr.TabItem("Interpretability"):
332
- gr.Markdown("""## Do abstraction tokens encode arithmetic reasoning?
333
 
334
- When humans add multi-digit numbers, they track **carries**: if 7+8=15, they write 5 and carry 1
335
- to the next column. For a chain like 999999+1, the carry cascades through all 6 digits.
336
 
337
- [Quirke et al. (2024)](https://arxiv.org/abs/2402.02619) showed that transformers learn carry/borrow
338
- circuits internally, discoverable only through activation-level analysis (PCA, probing). They define
339
- a **tri-state carry classifier** (their eq. 2): for each digit position, the carry state is either
340
- **0** (no carry), **1** (definite carry), or **U** (uncertain: the digit pair sums to exactly 9,
341
- so whether there's a carry depends on the cascade from the right).
342
 
343
- **SoRL makes these circuits visible as explicit tokens.** We analyze a model trained with 30
344
- abstraction tokens inserted every 4 positions (K=4).
345
  """)
346
 
347
  gr.Markdown("""### 1. Token specialization by difficulty
 
327
  detail_btn = gr.Button("Show splits")
328
  detail_table = gr.Dataframe(headers=["Split", "Accuracy", "N"], interactive=False)
329
 
330
+ # ── Tab 2: Results ──
331
+ with gr.TabItem("Results"):
332
+ gr.Markdown("""## SoRL K=1 abs30: never loses to baseline
333
+
334
+ Our best config, **K=1 (abstraction at every position), vocab size 30**, matches or beats
335
+ the SFT baseline on every data size and every architecture tested. No exceptions.
336
+ """)
337
+ gr.Image("static_figures/fig_data_efficiency.png")
338
+
339
+ gr.Markdown("""**At 10K training examples**, SoRL K=1 abs30 reaches **96.1%** while the baseline
340
+ reaches only 72.4%, a **+24 percentage point** improvement. At 25K, SoRL hits 100% while the
341
+ baseline is at 91.6%. By 50K both reach 100%.
342
+
343
+ K=4 (abstraction every 4th position) fails at 10K data: it doesn't have enough examples to learn
344
+ useful abstractions through search. K=1 is more data-efficient because every position gets a
345
+ scratchpad token.
346
+ """)
347
+
348
+ gr.Markdown("### SoRL helps undersized models the most")
349
+ gr.Image("static_figures/fig_undersized.png")
350
 
351
+ gr.Markdown("""The biggest gains are on **capacity-limited architectures**. A 2L/1H/128d model
352
+ goes from 50% (baseline) to **85%** (SoRL K=1 abs30), a +35pp improvement. The abstraction tokens
353
+ effectively give the model external memory that compensates for its limited hidden dimensions.
354
+ """)
355
+
356
+ # ── Tab 3: Interpretability ──
357
+ with gr.TabItem("Interpretability"):
358
+ gr.Markdown("""## Do abstraction tokens encode carry/borrow circuits?
359
 
360
+ When humans add multi-digit numbers, they track **carries**: if 7+8=15, they write 5 and carry 1.
361
+ [Quirke et al. (2024)](https://arxiv.org/abs/2402.02619) showed transformers learn carry/borrow
362
+ circuits internally, discoverable only through activation-level analysis. They define a **tri-state
363
+ carry classifier** (eq. 2): each position is **0** (no carry), **1** (definite carry), or **U**
364
+ (uncertain: digit sum = 9, carry depends on cascade from right).
365
 
366
+ **SoRL makes these circuits visible as explicit tokens.** We analyze our K=4 abs30 model (which
367
+ has fewer abstraction positions, forcing sharper specialization).
368
  """)
369
 
370
  gr.Markdown("""### 1. Token specialization by difficulty
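
The tri-state carry description in the Interpretability text can be illustrated with a small sketch. This is not code from `app.py` or from Quirke et al.; the function names `carry_states` and `resolve_carries` are hypothetical, and the only rule assumed is the one stated in the Markdown above: a digit pair summing to 10 or more is a definite carry (`1`), a pair summing to exactly 9 is uncertain (`U`), and anything lower is no carry (`0`).

```python
def carry_states(a: int, b: int, n_digits: int) -> list[str]:
    """Classify each digit position of a + b as "0", "1", or "U".

    Index 0 is the units column. Hypothetical helper for illustration only.
    """
    states = []
    for i in range(n_digits):
        digit_a = (a // 10**i) % 10
        digit_b = (b // 10**i) % 10
        pair_sum = digit_a + digit_b
        if pair_sum >= 10:
            states.append("1")   # carries out regardless of the carry-in
        elif pair_sum == 9:
            states.append("U")   # carries out only if a carry arrives from the right
        else:
            states.append("0")   # cannot carry out even with a carry-in
    return states


def resolve_carries(states: list[str]) -> list[int]:
    """Resolve "U" positions by propagating the carry from the units column leftward."""
    carry = 0
    resolved = []
    for state in states:
        if state == "1":
            carry = 1
        elif state == "0":
            carry = 0
        # For "U" the carry-out equals the carry-in, so `carry` is left unchanged.
        resolved.append(carry)
    return resolved
```

For 999999+1 with six digits, `carry_states` gives one definite carry in the units column followed by five `U` positions, and `resolve_carries` turns all six into carries, which is the cascade the removed text describes.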
static_figures/fig_data_efficiency.png ADDED
static_figures/fig_undersized.png ADDED
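
The Results text contrasts K=1 (an abstraction slot at every position) with K=4 (a slot every 4th position). A minimal sketch of that spacing follows, assuming the slot is placed after every K-th input token and using a made-up `<abs>` placeholder; the real SoRL insertion logic, token names, and search procedure are not shown in this diff, so `interleave_abstractions` is purely illustrative.

```python
def interleave_abstractions(tokens: list[str], k: int, placeholder: str = "<abs>") -> list[str]:
    """Insert a placeholder after every k-th token (hypothetical layout, for illustration)."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    out = []
    for position, token in enumerate(tokens, start=1):
        out.append(token)
        if position % k == 0:
            out.append(placeholder)  # assumed: slot follows the k-th token
    return out


def count_slots(tokens: list[str], k: int) -> int:
    """Number of abstraction slots a sequence receives under spacing k."""
    return len(tokens) // k
```

Under this assumed layout a six-token prompt such as `12+34=` gets six slots with K=1 and one slot with K=4, which is consistent with the text's point that K=1 gives every position a scratchpad token.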