Upload folder using huggingface_hub
app.py
CHANGED
|
@@ -289,12 +289,49 @@ activation-level probing or SAEs needed. This is what we test on
|
|
| 289 |
detail_table = gr.Dataframe(headers=["Split", "Accuracy", "N"], interactive=False)
|
| 290 |
|
| 291 |
with gr.Accordion("Token Interpretability", open=True):
|
| 292 |
-
gr.Markdown("""
|
| 293 |
-
|
| 294 |
-
|
| 295 |
-
|
| 296 |
-
|
| 297 |
-
|
| 298 |
|
| 299 |
with gr.Accordion("About This Study", open=False):
|
| 300 |
eval_info_md = gr.Markdown("")
|
|
|
|
| 289 |
detail_table = gr.Dataframe(headers=["Split", "Accuracy", "N"], interactive=False)
|
| 290 |
|
| 291 |
with gr.Accordion("Token Interpretability", open=True):
|
| 292 |
+
gr.Markdown("""## Do abstraction tokens encode arithmetic reasoning?
|
| 293 |
+
|
| 294 |
+
When humans add multi-digit numbers, they track **carries**: if 7+8=15, they write 5 and carry 1 to the next column. [Quirke et al. (2024)](https://arxiv.org/abs/2402.02619) showed that transformers learn the same carry/borrow circuits internally, but those circuits are discoverable only through activation-level analysis (PCA, probing).
|
| 295 |
+
|
| 296 |
+
**SoRL makes these circuits visible as explicit tokens.** Below we analyze a model trained with 30 abstraction tokens inserted every 4 positions (K=4). We ask: do specific tokens correlate with carry/borrow operations, and does the model *need* the right tokens to get the right answer?
|
| 297 |
+
|
| 298 |
+
### 1. Token distribution changes with problem difficulty
|
| 299 |
+
|
| 300 |
+
Easy problems (S0: no carries) and hard problems (S6: 6 consecutive carries) use **different abstraction tokens**. The model assigns different "scratchpad notes" depending on the arithmetic complexity.
|
| 301 |
+
""")
|
| 302 |
+
gr.Image("static_figures/fig1_token_by_difficulty.png", label="Token distribution by cascade depth")
|
| 303 |
+
|
| 304 |
+
gr.Markdown("""### 2. Causal verification: token identity matters for hard problems
|
| 305 |
+
|
| 306 |
+
Three interventions test whether tokens carry real information:
|
| 307 |
+
- **Shuffle**: randomly permute the abstraction-token IDs within the sequence (keeps positions, scrambles identity)
|
| 308 |
+
- **Random**: replace all abstraction tokens with random IDs
|
| 309 |
+
- **Knockout**: remove all abstraction tokens entirely
|
| 310 |
+
|
| 311 |
+
For easy problems (S0-S2), shuffling barely matters because the model can solve them without carry information. For **deep cascades (S4-S6)**, shuffling drops accuracy by 10-30 percentage points, which suggests the model needs the *correct* token identity to propagate carries through multiple digits.
|
| 312 |
+
""")
|
| 313 |
+
gr.Image("static_figures/fig2_causal_by_depth.png", label="Causal verification by cascade depth")
|
| 314 |
+
|
| 315 |
+
gr.Markdown("""### 3. Key token profiles
|
| 316 |
+
|
| 317 |
+
| Token | Appears when... | Interpretation |
|
| 318 |
+
|-------|----------------|----------------|
|
| 319 |
+
| **t3** | SA (simple add), addition only, no carry, answer=0 | **"No overflow"** — the leftmost digit is 0 |
|
| 320 |
+
| **t6** | UC (use carry), addition, sum%10=9 in 92% of cases | **"Sum-of-9 carry"** — Quirke's tri-state U case (eq. 2) |
|
| 321 |
+
| **t8, t9** | UC (use carry), addition, carry=100% | **"Definite carry"** — Quirke's tri-state 1 case |
|
| 322 |
+
| **t17** | MB (make borrow) 51%, subtraction | **"Borrow indicator"** — Quirke's MBn (eq. 7) |
|
| 323 |
+
| **t16** | carry=82%, mixed add/sub | **"Active carry/borrow state"** |
|
| 324 |
+
|
| 325 |
+
These profiles suggest the model has independently discovered Quirke's tri-state carry classifier (eq. 2): separate tokens for "no carry" (t3), "uncertain/sum=9" (t6), and "definite carry" (t8/t9). This structure emerged purely from the SoRL info-gain objective, without any supervision about carry logic.
|
| 326 |
+
|
| 327 |
+
*Model: abs30 K=4, 2L/3H/510d, 100K training examples*
|
| 328 |
+
""")
|
| 329 |
+
# Keep interactive analysis as fallback
|
| 330 |
+
with gr.Accordion("Interactive Analysis (select model)", open=False):
|
| 331 |
+
interp_model_dd = gr.Dropdown(label="Model", choices=[], allow_custom_value=True)
|
| 332 |
+
interp_btn = gr.Button("Analyze")
|
| 333 |
+
interp_profiles = gr.Markdown("")
|
| 334 |
+
interp_causal = gr.Markdown("")
|
| 335 |
|
| 336 |
with gr.Accordion("About This Study", open=False):
|
| 337 |
eval_info_md = gr.Markdown("")
|
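For readers of this diff, the three interventions named in the new "Causal verification" Markdown (shuffle, random, knockout) can be illustrated with a minimal sketch. This is not the evaluation code behind `fig2_causal_by_depth.png`, which is not part of this commit. The ID layout (`ABS_START`, `NUM_ABS`) and all function names are hypothetical, chosen only to make the example self-contained.

```python
import random

# Hypothetical vocabulary layout (an assumption, not taken from the commit):
# IDs 100..129 are the 30 abstraction tokens, everything else is ordinary.
ABS_START, NUM_ABS = 100, 30


def is_abs(tok):
    """True if the token ID falls in the assumed abstraction-token range."""
    return ABS_START <= tok < ABS_START + NUM_ABS


def shuffle_abs(seq, rng):
    """Shuffle: permute abstraction-token IDs among their own positions."""
    positions = [i for i, t in enumerate(seq) if is_abs(t)]
    ids = [seq[i] for i in positions]
    rng.shuffle(ids)
    out = list(seq)
    for i, t in zip(positions, ids):
        out[i] = t
    return out


def randomize_abs(seq, rng):
    """Random: replace each abstraction token with a uniformly random one."""
    return [
        rng.randrange(ABS_START, ABS_START + NUM_ABS) if is_abs(t) else t
        for t in seq
    ]


def knockout_abs(seq):
    """Knockout: drop every abstraction token from the sequence."""
    return [t for t in seq if not is_abs(t)]


# One abstraction token after every 3 ordinary tokens (i.e. every 4th position).
example = [1, 2, 3, 103, 4, 5, 6, 108, 7, 8, 9, 117]
rng = random.Random(0)
shuffled = shuffle_abs(example, rng)
randomized = randomize_abs(example, rng)
knocked_out = knockout_abs(example)
```

Shuffle and random both keep the sequence length and the positions of the abstraction tokens, so any accuracy change is attributable to token identity. Knockout changes the sequence length as well, so it tests a different condition.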
static_figures/fig1_token_by_difficulty.png
ADDED
|
static_figures/fig2_causal_by_depth.png
ADDED
|
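The "Key token profiles" table in the app.py change maps tokens onto a tri-state carry label: no carry, uncertain (digit sum of 9), and definite carry. As a reading aid, the sketch below shows how such labels arise from plain column-wise addition and how an uncertain column resolves once the incoming carry is known. It follows the description in the Markdown text, not the exact form of Quirke's eq. 2, and the function names are hypothetical.

```python
def tri_state(a_digit, b_digit):
    """Label one column: '1' = definite carry, 'U' = depends on carry-in, '0' = no carry."""
    s = a_digit + b_digit
    if s >= 10:
        return "1"
    if s == 9:
        return "U"
    return "0"


def carry_states(a, b, n_digits):
    """Tri-state labels for each column of a + b, least-significant column first."""
    states = []
    for _ in range(n_digits):
        states.append(tri_state(a % 10, b % 10))
        a //= 10
        b //= 10
    return states


def resolve_carries(states):
    """Resolve 'U' columns: a sum-of-9 column carries only if it receives a carry."""
    carry = False
    resolved = []
    for s in states:
        carry = s == "1" or (s == "U" and carry)
        resolved.append(carry)
    return resolved


# 199 + 1 = 200: the units column carries, and the sum-of-9 tens column passes it on.
labels = carry_states(199, 1, 3)
carries = resolve_carries(labels)
```

For `199 + 1`, `labels` is `["1", "U", "0"]` and `carries` is `[True, True, False]`: the middle column carries only because of the carry it receives, which is the case the table associates with t6.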