amirali1985 committed on
Commit c19bfbf · verified · 1 Parent(s): 2b72200

Upload folder using huggingface_hub

app.py CHANGED
@@ -289,12 +289,49 @@ activation-level probing or SAEs needed. This is what we test on
  detail_table = gr.Dataframe(headers=["Split", "Accuracy", "N"], interactive=False)
 
  with gr.Accordion("Token Interpretability", open=True):
- gr.Markdown("""**Do abstraction tokens encode carry/borrow circuits?**
- Parallels [Quirke et al. §3.2-3.3](https://arxiv.org/abs/2402.02619) — their equations 2 (STn tri-state carry) and 7 (MBn borrow).""")
- interp_model_dd = gr.Dropdown(label="Model", choices=[], allow_custom_value=True)
- interp_btn = gr.Button("Analyze", variant="primary")
- interp_profiles = gr.Markdown("")
- interp_causal = gr.Markdown("")
+ gr.Markdown("""## Do abstraction tokens encode arithmetic reasoning?
+
+ When humans add multi-digit numbers, they track **carries**: since 7+8=15, they write 5 and carry 1 to the next column. [Quirke et al. (2024)](https://arxiv.org/abs/2402.02619) showed that transformers learn the same carry/borrow circuits internally, but these are discoverable only through activation-level analysis (PCA, probing).
+
+ **SoRL makes these circuits visible as explicit tokens.** Below we analyze a model trained with 30 abstraction tokens inserted every 4 positions (K=4). We ask: do specific tokens correlate with carry/borrow operations, and does the model *need* the right tokens to get the right answer?
+
+ ### 1. Token distribution changes with problem difficulty
+
+ Easy problems (S0: no carries) and hard problems (S6: 6 consecutive carries) use **different abstraction tokens**. The model assigns different "scratchpad notes" depending on the arithmetic complexity.
+ """)
+ gr.Image("static_figures/fig1_token_by_difficulty.png", label="Token distribution by cascade depth")
+
+ gr.Markdown("""### 2. Causal verification: token identity matters for hard problems
+
+ Three interventions test whether tokens carry real information:
+ - **Shuffle**: randomly permute the abstraction-token IDs within the sequence (keeps positions, scrambles identity)
+ - **Random**: replace every abstraction token with a random ID
+ - **Knockout**: remove all abstraction tokens entirely
+
+ For easy problems (S0-S2), shuffling barely matters because the model can solve them without carry information. For **deep cascades (S4-S6)**, shuffling drops accuracy by 10-30 percentage points: the model needs the *correct* token identity to propagate carries through multiple digits.
+ """)
+ gr.Image("static_figures/fig2_causal_by_depth.png", label="Causal verification by cascade depth")
+
+ gr.Markdown("""### 3. Key token profiles
+
+ | Token | Appears when... | Interpretation |
+ |-------|----------------|----------------|
+ | **t3** | SA (simple add), addition only, no carry, answer=0 | **"No overflow"**: the leftmost digit is 0 |
+ | **t6** | UC (use carry), addition, sum%10=9 in 92% of cases | **"Sum-of-9 carry"**: Quirke's tri-state U case (eq. 2) |
+ | **t8, t9** | UC (use carry), addition, carry=100% | **"Definite carry"**: Quirke's tri-state 1 case |
+ | **t17** | MB (make borrow) 51%, subtraction | **"Borrow indicator"**: Quirke's MBn (eq. 7) |
+ | **t16** | carry=82%, mixed add/sub | **"Active carry/borrow state"** |
+
+ The model appears to have independently discovered Quirke's tri-state carry classifier (eq. 2): separate tokens for "no carry" (t3), "uncertain/sum=9" (t6), and "definite carry" (t8/t9). This structure emerged purely from the SoRL info-gain objective, without any supervision about carry logic.
+
+ *Model: abs30 K=4, 2L/3H/510d, 100K training examples*
+ """)
+ # Keep interactive analysis as fallback
+ with gr.Accordion("Interactive Analysis (select model)", open=False):
+ interp_model_dd = gr.Dropdown(label="Model", choices=[], allow_custom_value=True)
+ interp_btn = gr.Button("Analyze")
+ interp_profiles = gr.Markdown("")
+ interp_causal = gr.Markdown("")
 
  with gr.Accordion("About This Study", open=False):
  eval_info_md = gr.Markdown("")
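
Note on the causal interventions in section 2 of the added Markdown: the commit describes shuffle, random, and knockout only in prose. A minimal sketch of what they might look like on a token sequence follows. It assumes sequences are plain lists of integer token IDs and that abstraction tokens are identified by membership in a set `abs_ids`. The function names and this representation are hypothetical, not taken from `app.py` or the study's code.

```python
import random
from typing import List, Set


def shuffle_abstractions(seq: List[int], abs_ids: Set[int], rng: random.Random) -> List[int]:
    """Permute abstraction-token IDs among their own positions (identity scrambled, positions kept)."""
    positions = [i for i, tok in enumerate(seq) if tok in abs_ids]
    values = [seq[i] for i in positions]
    rng.shuffle(values)
    out = list(seq)
    for pos, val in zip(positions, values):
        out[pos] = val
    return out


def randomize_abstractions(seq: List[int], abs_ids: Set[int], rng: random.Random) -> List[int]:
    """Replace each abstraction token with an ID drawn uniformly from abs_ids (an assumed sampling pool)."""
    pool = sorted(abs_ids)
    return [rng.choice(pool) if tok in abs_ids else tok for tok in seq]


def knockout_abstractions(seq: List[int], abs_ids: Set[int]) -> List[int]:
    """Drop every abstraction token, leaving only the ordinary tokens in order."""
    return [tok for tok in seq if tok not in abs_ids]
```

Comparing accuracy on the original sequences against each perturbed variant, grouped by cascade depth, would produce the kind of breakdown that `fig2_causal_by_depth.png` is labelled as showing.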
static_figures/fig1_token_by_difficulty.png ADDED
static_figures/fig2_causal_by_depth.png ADDED
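
Note on the tri-state carry logic referenced in section 3 of the added Markdown: the three cases ("no carry", "uncertain/sum=9", "definite carry") follow from column arithmetic. A digit-pair sum of 10 or more always carries, a sum of exactly 9 carries only if a carry arrives from the column to its right, and a sum of 8 or less never carries. The sketch below illustrates that logic. The function names are hypothetical, and measuring cascade depth as the longest run of consecutive carries is an assumption about how the S0 to S6 splits are defined.

```python
from typing import List


def tri_state(a_digit: int, b_digit: int) -> str:
    """Classify one column: '1' = definite carry, 'U' = depends on incoming carry, '0' = no carry."""
    s = a_digit + b_digit
    if s >= 10:
        return "1"
    if s == 9:
        return "U"
    return "0"


def carry_chain(a: int, b: int, n_digits: int) -> List[int]:
    """Carry-out bit of each column for a + b, least significant column first."""
    carries = []
    carry = 0
    for i in range(n_digits):
        state = tri_state((a // 10**i) % 10, (b // 10**i) % 10)
        # A 'U' column only propagates a carry that it receives.
        carry = 1 if state == "1" or (state == "U" and carry == 1) else 0
        carries.append(carry)
    return carries


def cascade_depth(a: int, b: int, n_digits: int) -> int:
    """Longest run of consecutive carrying columns (assumed meaning of the S-level)."""
    longest = run = 0
    for c in carry_chain(a, b, n_digits):
        run = run + 1 if c else 0
        longest = max(longest, run)
    return longest
```

For example, `cascade_depth(999999, 1, 6)` evaluates to 6: the units column produces a definite carry, and each of the five remaining columns is a sum-of-9 column that passes it on.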