Upload folder using huggingface_hub
- .gitattributes +2 -0
- app.py +20 -39
- static_figures/fig_k1_causal.png +3 -0
- static_figures/fig_k1_token_difficulty.png +3 -0
.gitattributes
CHANGED
|
@@ -36,3 +36,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
|
| 36 |
static_figures/fig1_token_difficulty_profiles.png filter=lfs diff=lfs merge=lfs -text
|
| 37 |
static_figures/fig1_token_specialization.png filter=lfs diff=lfs merge=lfs -text
|
| 38 |
static_figures/fig2_causal_by_depth.png filter=lfs diff=lfs merge=lfs -text
|
| 36 |
static_figures/fig1_token_difficulty_profiles.png filter=lfs diff=lfs merge=lfs -text
|
| 37 |
static_figures/fig1_token_specialization.png filter=lfs diff=lfs merge=lfs -text
|
| 38 |
static_figures/fig2_causal_by_depth.png filter=lfs diff=lfs merge=lfs -text
|
| 39 |
+
static_figures/fig_k1_causal.png filter=lfs diff=lfs merge=lfs -text
|
| 40 |
+
static_figures/fig_k1_token_difficulty.png filter=lfs diff=lfs merge=lfs -text
|
app.py
CHANGED
|
@@ -358,59 +358,40 @@ effectively give the model external memory that compensates for its limited hidd
|
| 358 |
gr.Markdown("""## Do abstraction tokens encode carry/borrow circuits?
|
| 359 |
|
| 360 |
When humans add multi-digit numbers, they track **carries**: if 7+8=15, they write 5 and carry 1.
|
| 361 |
[Quirke et al. (2024)](https://arxiv.org/abs/2402.02619) showed transformers learn carry/borrow
|
| 362 |
-
circuits internally,
|
| 363 |
-
carry classifier** (eq. 2): each position is **0** (no carry), **1** (definite carry), or **U**
|
| 364 |
-
(uncertain: digit sum = 9, carry depends on cascade from right).
|
| 365 |
|
| 366 |
-
**SoRL makes these circuits visible as explicit tokens.**
|
| 367 |
-
|
| 368 |
""")
|
| 369 |
|
| 370 |
gr.Markdown("""### 1. Token specialization by difficulty
|
| 371 |
|
| 372 |
For each token, we ask: **what kinds of problems does this token appear in?** The heatmap shows
|
| 373 |
-
P(difficulty
|
| 374 |
-
|
| 375 |
-
|
| 376 |
-
**Addition (left):** Token t3 (simple addition, 0% carry) concentrates on S0 (no cascades). Tokens
|
| 377 |
-
t8, t9 (100% carry) spread across S1-S5; they're the carry workhorses. Token t2 peaks at S6
|
| 378 |
-
(the hardest cascade) despite having only 5% local carry; it encodes cascade *propagation*, not
|
| 379 |
-
local carry state.
|
| 380 |
-
|
| 381 |
-
**Subtraction (right):** Token t16 appears 93% of the time in M0 (no borrows), a pure "easy case" marker.
|
| 382 |
-
Tokens t5 and t11 shift toward M4/M5 (deep borrow cascades).
|
| 383 |
""")
|
| 384 |
-
gr.Image("static_figures/
|
| 385 |
|
| 386 |
-
gr.Markdown("""### 2. Causal verification: token identity
|
| 387 |
|
| 388 |
Three interventions test whether tokens carry real information:
|
| 389 |
-
- **Shuffle**: randomly permute token IDs
|
| 390 |
-
- **Random**: replace all tokens with random IDs
|
| 391 |
-
- **Knockout**: remove all abstraction tokens
|
| 392 |
|
| 393 |
-
|
| 394 |
-
|
| 395 |
-
|
| 396 |
""")
|
| 397 |
-
gr.Image("static_figures/
|
| 398 |
-
|
| 399 |
-
gr.Markdown("""### 3. Key token profiles
|
| 400 |
-
|
| 401 |
-
| Token | Appears when... | Interpretation |
|
| 402 |
-
|-------|----------------|----------------|
|
| 403 |
-
| **t3** | Simple add (SA), no carry, answer digit = 0 | **"No overflow"**: leftmost digit is 0 |
|
| 404 |
-
| **t6** | Use carry (UC), input digit sum mod 10 = 9 in 92% of cases | **"Sum-of-9"**: Quirke's tri-state **U** (uncertain carry) |
|
| 405 |
-
| **t8, t9** | Use carry (UC), carry = 100% of cases | **"Definite carry"**: Quirke's tri-state **1** |
|
| 406 |
-
| **t17** | Make borrow (MB) 51%, subtraction only | **"Borrow indicator"**: Quirke's MBn (eq. 7) |
|
| 407 |
-
| **t16** | Carry = 82%, mixed addition/subtraction | **"Active carry/borrow state"** |
|
| 408 |
-
|
| 409 |
-
The model independently discovered Quirke's tri-state carry classifier: separate tokens for
|
| 410 |
-
"no carry" (t3), "uncertain / sum=9" (t6), and "definite carry" (t8/t9).
|
| 411 |
-
This emerged purely from SoRL's info-gain objective, with no supervision about carry logic.
|
| 412 |
|
| 413 |
-
|
| 414 |
""")
|
| 415 |
|
| 416 |
# ── Tab 3: About ──
|
| 358 |
gr.Markdown("""## Do abstraction tokens encode carry/borrow circuits?
|
| 359 |
|
| 360 |
When humans add multi-digit numbers, they track **carries**: if 7+8=15, they write 5 and carry 1.
|
| 361 |
+
For a chain like 999999+1, the carry cascades through all 6 digits. Subtraction has the same
|
| 362 |
+
structure with **borrows** instead of carries.
|
| 363 |
+
|
| 364 |
[Quirke et al. (2024)](https://arxiv.org/abs/2402.02619) showed transformers learn carry/borrow
|
| 365 |
+
circuits internally, but these are only discoverable through activation-level analysis (PCA, probing).
|
| 366 |
|
| 367 |
+
**SoRL makes these circuits visible as explicit tokens.** With K=1 (an abstraction at every
|
| 368 |
+
position), each answer digit gets its own scratchpad token. We analyze whether these tokens
|
| 369 |
+
specialize by problem difficulty.
|
| 370 |
""")
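The carry arithmetic described in this Markdown block can be checked with a short sketch. This is an illustration only, not code from `app.py`: the helper names `carry_chain` and `longest_cascade` are hypothetical, and treating "cascade length" as the longest run of consecutive carries is an assumption that may not match how the app defines its S0-S6 splits.

```python
def carry_chain(a: int, b: int) -> list[int]:
    """Carry out of each digit position of a + b, least significant digit first."""
    carries = []
    carry = 0
    while a > 0 or b > 0:
        total = a % 10 + b % 10 + carry
        carry = total // 10          # 1 if this position overflows, else 0
        carries.append(carry)
        a //= 10
        b //= 10
    return carries


def longest_cascade(a: int, b: int) -> int:
    """Longest run of consecutive carries (assumed definition of cascade length)."""
    longest = run = 0
    for c in carry_chain(a, b):
        run = run + 1 if c else 0
        longest = max(longest, run)
    return longest


print(carry_chain(7, 8))           # 7+8=15: write 5, carry 1
print(longest_cascade(999999, 1))  # the carry passes through all 6 digits
```

Subtraction with borrows would follow the same loop shape, with a borrow flag in place of the carry.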
|
| 371 |
|
| 372 |
gr.Markdown("""### 1. Token specialization by difficulty
|
| 373 |
|
| 374 |
For each token, we ask: **what kinds of problems does this token appear in?** The heatmap shows
|
| 375 |
+
P(difficulty | token). Tokens at the top specialize in easy problems, tokens at the bottom
|
| 376 |
+
specialize in hard cascades.
|
| 377 |
""")
|
| 378 |
+
gr.Image("static_figures/fig_k1_token_difficulty.png")
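For readers who want to see how a P(difficulty | token) table of this kind can be built, here is a minimal sketch. It is not the analysis code behind the figure: the function name `difficulty_given_token`, the token names `tok_a`/`tok_b`, and the toy observations are all hypothetical and carry no real results.

```python
from collections import Counter, defaultdict


def difficulty_given_token(observations):
    """Estimate P(difficulty | token) from (token, difficulty) pairs.

    Each row is normalized by that token's total count, so every row sums to 1.
    """
    counts = defaultdict(Counter)
    for token, difficulty in observations:
        counts[token][difficulty] += 1

    table = {}
    for token, per_difficulty in counts.items():
        total = sum(per_difficulty.values())
        table[token] = {d: n / total for d, n in per_difficulty.items()}
    return table


# Toy data for illustration only (not taken from the model).
toy = [("tok_a", "S0"), ("tok_a", "S0"), ("tok_a", "S1"), ("tok_b", "S6")]
profile = difficulty_given_token(toy)
print(profile["tok_a"])
```

Normalizing per token rather than per difficulty level is what makes each heatmap row a distribution over difficulty, which is how the text reads the figure.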
|
| 379 |
|
| 380 |
+
gr.Markdown("""### 2. Causal verification: token identity is critical
|
| 381 |
|
| 382 |
Three interventions test whether tokens carry real information:
|
| 383 |
+
- **Shuffle**: randomly permute token IDs (keeps positions, scrambles identity)
|
| 384 |
+
- **Random**: replace all tokens with random IDs
|
| 385 |
+
- **Knockout**: remove all abstraction tokens (0% accuracy: total dependence)
|
| 386 |
|
| 387 |
+
**Shuffle drops accuracy by 56-66 percentage points on S5/S6** (5-6 carry cascades).
|
| 388 |
+
Even easy problems (S0) drop ~30pp; with K=1, every position has an abstraction,
|
| 389 |
+
so shuffling disrupts every digit's computation.
|
| 390 |
""")
|
| 391 |
+
gr.Image("static_figures/fig_k1_causal.png")
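The three interventions can be sketched on a plain list of abstraction token IDs. This is one possible reading, not the evaluation code: "Shuffle" is interpreted here as a random permutation of the ID vocabulary (positions kept, identities scrambled), the vocabulary size of 30 is an assumption based on the "abs30" label, and all function names are hypothetical.

```python
import random

VOCAB_SIZE = 30  # assumption: "abs30" means 30 abstraction token IDs


def shuffle_ids(tokens, vocab_size, rng):
    """Relabel every token through one random permutation of the ID vocabulary."""
    perm = list(range(vocab_size))
    rng.shuffle(perm)
    return [perm[t] for t in tokens]


def random_ids(tokens, vocab_size, rng):
    """Replace every token with an independently drawn random ID."""
    return [rng.randrange(vocab_size) for _ in tokens]


def knockout(tokens):
    """Remove all abstraction tokens."""
    return []


rng = random.Random(0)
original = [3, 8, 8, 6]  # toy sequence, not real model output
shuffled = shuffle_ids(original, VOCAB_SIZE, rng)
randomized = random_ids(original, VOCAB_SIZE, rng)
removed = knockout(original)
```

Under this reading, Shuffle keeps the equality structure of the sequence (equal IDs stay equal) while Random destroys it, which is one way the two conditions could separate "which token" from "any token at all".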
|
| 392 |
|
| 393 |
+
gr.Markdown("""
|
| 394 |
+
*Model: K=1 abs30, 2L/3H/510d, 100K training examples. Analysis: 4400 eval examples (200/split).*
|
| 395 |
""")
|
| 396 |
|
| 397 |
# ── Tab 3: About ──
|
static_figures/fig_k1_causal.png
ADDED
static_figures/fig_k1_token_difficulty.png
ADDED